
Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

I would like to reinforce Herb's comments.  Adding complexity to what is already a confusing artifact of the CIF syntax seems inadvisable.  I agree that this should be treated as a matter of style, with the choice left in the users' hands.
Regards,
John
On 6/11/16 4:22 PM, Herbert J. Bernstein wrote:
> Dear Colleagues,
>
>   I am very puzzled about the concerns being raised.  Please bear with me.
>
>   An experimental data CIF, whether it is a core CIF, a macromolecular CIF, or a powder CIF, is in general a container for all the
> information about a particular experiment.  Within that container we may have various tags and values.  In some cases, for that
> experiment, there is only one instance of a value being assigned to that tag, and it is convenient to use a tag-value pair to
> express that assignment.  In some cases, even though we are dealing with a single instance of an experiment, in order to describe that
> single experiment, we need to allow for multiple values to be assigned to the same tag, and CIF provides the loop_ construct to
> allow us to present those multiple values with a single presentation of the name of the tag.  So if there is one author we might say
>
>   _audit_author "Peter Pan"
>   _audit_author_address "Second to the right, and straight on till morning"
>
> but if there were a second author we might say
>
> loop_
>     _audit_author
>     _audit_author_address
>   "Peter Pan" "Second to the right, and straight on till morning"
>   "Winnie the Pooh" "Hundred Aker Wood"
>
> Notice that we don't make this into two separate loops to avoid creating an ambiguity about which author lives at which address.
>
> No information would have been gained or lost if we had only one author and had chosen to use a loop instead, nor would any
> information have been gained or lost in the case of a single author if we had presented the name and address in two separate loops.
>
> We have created rules about which tags belong together in the same loop, but the reality is that there are people who either do not
> know those rules or who choose to ignore them. I would suggest that, as long as the intended result is clear and unambiguous, it is
> not only unnecessary but pointlessly obstructionist to rigidly enforce such rules.
> People like to create very flat CIFs for convenience in data harvesting.  People like to create databases spanning multiple
> sets of results.  People (including the IUCr publications office) like to extract particular sets of tags with their values
> and add "extra" tags we have not thought of.
>
> They are going to do these things whether we like it or not.  Yes, it is helpful if we provide recommended best practices and
> warnings about practices that seem to cause difficulty, but if CIF is to be a maximally useful container, a maximally useful
> framework, we need to focus on being as descriptive as possible and avoid being pointlessly prescriptive.
>
> Which is a very, very long way of saying that I think we should allow looping of tags in all cases in which the intended meaning is
> unambiguous, and focus most of the attention of warnings in software on specific things in specific datasets that seem likely to
> result in ambiguous interpretations of those specific datasets.
>
> I hope that made some sense.
>
> Regards,
>     Herbert
>
> On Fri, Jun 10, 2016 at 10:55 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>
>     Dear all,
>
>     This is not a fully-formed proposal, but more an outline and general idea for approaching the problem with Set categories and
>     definition reuse.  If it is well received then I am prepared to expand it into a full proposal, but I suspect that the
>     discussion will bring out considerations and nuances that will make that smoother and more successful, if indeed such a proposal
>     is ever developed.
>
>     We previously agreed that the problem we are trying to solve arises from data dictionaries making assumptions about what kind of
>     thing is described by a data block (or save frame).  Many data dictionaries do this implicitly; mmCIF has the distinction of
>     doing it explicitly, via its ENTITY category.
>     Either way, this causes difficulty when we want to reuse a definition to describe a different kind of entity.  It would be
>     ideal, therefore, to choose a solution that strikes at the root of this problem, and there are at least two general ways we
>     could do this:
>
>     (1) Express the definition of "entity" applicable to a data block in that data block, by means of appropriate new data items.
>     Suitable choices of default values for the new items could preserve the current meaning of data files that do not present those
>     items.
>
>     (2) Express the definition of "entity" in the relevant dictionaries, but *factor out* category and item definitions into
>     separate dictionaries or dictionary modules.  Thus, two or more dictionaries with different senses of what an entity is -- e.g.
>     the Core and Symmetry dictionaries -- would neither re-use definitions provided by the other; instead, both would re-use (by
>     suitably-structured import) definitions provided by a dictionary module that itself leaves "entity" undefined.
>
>     Of those two, I am more interested in the latter.  A significant advantage of that approach is that at one important level it
>     meets both of the seemingly-conflicting criteria that were earlier presented: *with respect to individual data dictionaries*, it
>     does not require new data name variants to be introduced to support loopability, and it also does not require that such a
>     dictionary permit data files that existing software is at risk of misunderstanding.  It achieves this, essentially, by limiting
>     the scope of some aspects of category key definition to specific dictionaries.  There would thus be no conflict between, for
>     example, SPACE_GROUP being loopable in all data files conforming to the symmetry dictionary and its not being loopable in any
>     data file conforming to the core dictionary.
>
>     One downside would be that the composability of data dictionaries would be restricted (more).
>     For instance, one could not rely simultaneously on the full definitions of SPACE_GROUP items as drawn from the core and
>     symmetry dictionaries.  Another downside would be that correct validation would be even more dependent on identifying the
>     correct dictionary(-ies) against which to validate. I am uncertain how significant a disadvantage either of those would be,
>     however.
>
>     Thoughts?
>
>     Regards,
>
>     John
>
>     --
>     John C. Bollinger, Ph.D.
>     Computing and X-Ray Scientist
>     Department of Structural Biology
>     St. Jude Children's Research Hospital
>     John.Bollinger@StJude.org
>     (901) 595-3166 [office]
>     www.stjude.org
>
>     ________________________________
>
>     Email Disclaimer: www.stjude.org/emaildisclaimer
>     Consultation Disclaimer: www.stjude.org/consultationdisclaimer
>     _______________________________________________
>     ddlm-group mailing list
>     ddlm-group@iucr.org
>     http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
--
John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
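
Herbert's observation in the quoted message, that a tag-value pair, a one-row loop, and two separate one-row loops all carry the same information while two separate multi-row loops do not, can be sketched in Python. This is only an illustration under stated assumptions: the in-memory model (a table as a list of row dictionaries) and the helper names from_pairs, from_loop and combine are hypothetical, and come from neither the CIF specification nor any existing CIF library.

```python
# Hypothetical model: a "table" is a list of rows; each row maps tag -> value.

def from_pairs(pairs):
    """Model a set of tag-value pairs as a table with exactly one row."""
    return [dict(pairs)]


def from_loop(tags, values):
    """Model a loop_: the tags are listed once, then the values row by row."""
    width = len(tags)
    if width == 0 or len(values) % width != 0:
        raise ValueError("loop values do not fill whole rows")
    return [dict(zip(tags, values[i:i + width]))
            for i in range(0, len(values), width)]


def combine(left, right):
    """Merge two tables that hold different tags of the same category.

    In this sketch the merge is accepted only when both tables have a
    single row; otherwise we cannot tell which rows belong together.
    """
    if len(left) == 1 and len(right) == 1:
        return [{**left[0], **right[0]}]
    raise ValueError("ambiguous: cannot tell which rows belong together")


NAME, ADDRESS = "_audit_author", "_audit_author_address"
pan = ("Peter Pan", "Second to the right, and straight on till morning")
pooh = ("Winnie the Pooh", "Hundred Aker Wood")

# One author, three presentations, one meaning.
as_pairs = from_pairs([(NAME, pan[0]), (ADDRESS, pan[1])])
as_loop = from_loop([NAME, ADDRESS], list(pan))
as_two_loops = combine(from_loop([NAME], [pan[0]]),
                       from_loop([ADDRESS], [pan[1]]))
print(as_pairs == as_loop == as_two_loops)

# Two authors in two separate loops: the pairing is no longer recoverable.
try:
    combine(from_loop([NAME], [pan[0], pooh[0]]),
            from_loop([ADDRESS], [pan[1], pooh[1]]))
except ValueError as err:
    print(err)
```

In this sketch the only case rejected is the one Herbert flags as genuinely ambiguous, which is the kind of dataset-specific warning he suggests software should concentrate on.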
