Discussion List Archives


Re: [ddlm-group] Second proposal to allow looping of 'Set' categories

I would like to reinforce Herb's comments. Adding complexity to what is already a confusing artifact of the CIF syntax seems unadvisable. I agree that this should be treated as a matter of style, with the choice left in the users' hands.

Regards,
John
On 6/11/16 4:22 PM, Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> I am very puzzled about the concerns being raised. Please bear with me.
>
> An experimental data CIF, whether it is a core CIF, a macromolecular CIF, or a powder CIF, is in general a container for all the information about a particular experiment. Within that container we may have various tags and values. In some cases, for that experiment, there is only one instance of a value being assigned to that tag, and it is convenient to use a tag-value pair to express that assignment. In some cases, even though we are dealing with a single instance of an experiment, in order to describe that single experiment, we need to allow for multiple values to be assigned to the same tag, and CIF provides the loop_ construct to allow us to present those multiple values with a single presentation of the name of the tag. So if there is one author we might say
>
>   _audit_author "Peter Pan"
>   _audit_author_address "Second to the right, and straight on till morning"
>
> but if there were a second author we might say
>
> loop_
>     _audit_author
>     _audit_author_address
>   "Peter Pan" "Second to the right, and straight on till morning"
>   "Winnie the Pooh" "Hundred Aker Wood"
>
> Notice that we don't make this into two separate loops to avoid creating an ambiguity about which author lives at which address.
>
> No information would have been gained or lost if we had only one author and had chosen to use a loop instead, nor would any information have been gained or lost in the case of a single author if we had presented the name and address in two separate loops.
>
> We have created rules about which tags belong together in the same loop, but the reality is that there are people who either do not know those rules or who choose to ignore them. I would suggest that, as long as the intended result is clear and unambiguous, it is not only unnecessary but pointlessly obstructionist to rigidly enforce such rules. People like to create very flat CIFs for convenience in data harvesting. People like to create databases spanning multiple sets of results. People (including the IUCr publications office) like to extract particular sets of tags with their values and add "extra" tags we have not thought of.
>
> They are going to do these things whether we like it or not. Yes, it is helpful if we provide recommended best practices and warnings about practices that seem to cause difficulty, but if CIF is to be a maximally useful container, a maximally useful framework, we need to focus on being as descriptive as possible and avoid being pointlessly prescriptive.
>
> Which is a very, very long way of saying that I think we should allow looping of tags in all cases in which the intended meaning is unambiguous, and focus most of the attention of warnings in software on specific things in specific datasets that seem likely to result in ambiguous interpretations of those specific datasets.
>
> I hope that made some sense.
>
> Regards,
>     Herbert
>
> On Fri, Jun 10, 2016 at 10:55 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>
>     Dear all,
>
>     This is not a fully-formed proposal, but more an outline and general idea for approaching the problem with Set categories and definition reuse. If it is well received then I am prepared to expand it into a full proposal, but I suspect that the discussion will bring out considerations and nuances that will make that smoother and more successful, if indeed such a proposal is ever developed.
>
>     We previously agreed that the problem we are trying to solve arises from data dictionaries making assumptions about what kind of thing is described by a data block (or save frame). Many data dictionaries do this implicitly; mmCIF has the distinction of doing it explicitly, via its ENTITY category. Either way, this causes difficulty when we want to reuse a definition to describe a different kind of entity. It would be ideal, therefore, to choose a solution that strikes at the root of this problem, and there are at least two general ways we could do this:
>
>     (1) Express the definition of "entity" applicable to a data block in that data block, by means of appropriate new data items. Suitable choices of default values for the new items could preserve the current meaning of data files that do not present those items.
>
>     (2) Express the definition of "entity" in the relevant dictionaries, but *factor out* category and item definitions into separate dictionaries or dictionary modules. Thus, two or more dictionaries with different senses of what an entity is -- e.g. the Core and Symmetry dictionaries -- would neither re-use definitions provided by the other, but would instead both re-use (by suitably-structured import) definitions provided by a dictionary module that itself leaves "entity" undefined.
>
>     Of those two, I am more interested in the latter. A significant advantage of that approach is that at one important level it meets both of the seemingly-conflicting criteria that were earlier presented: *with respect to individual data dictionaries*, it does not require new data name variants to be introduced to support loopability, and it also does not require that such a dictionary permit data files that existing software is at risk of misunderstanding. It achieves this, essentially, by limiting the scope of some aspects of category key definition to specific dictionaries. There would thus be no conflict between, for example, SPACE_GROUP being loopable in all data files conforming to the symmetry dictionary, but not being loopable in any data file conforming to the core dictionary.
>
>     One downside would be that the composability of data dictionaries would be restricted (more). For instance, one could not rely simultaneously on the full definitions of SPACE_GROUP items as drawn from the core and symmetry dictionaries. Another downside would be that correct validation would be even more dependent on identifying the correct dictionary(-ies) against which to validate. I am uncertain how significant a disadvantage either of those would be, however.
>
>     Thoughts?
>
>     Regards,
>
>     John
>
>     --
>     John C. Bollinger, Ph.D.
>     Computing and X-Ray Scientist
>     Department of Structural Biology
>     St. Jude Children's Research Hospital
>     John.Bollinger@StJude.org
>     (901) 595-3166 [office]
>     www.stjude.org
>
>     ________________________________
>
>     Email Disclaimer: www.stjude.org/emaildisclaimer
>     Consultation Disclaimer: www.stjude.org/consultationdisclaimer
>     _______________________________________________
>     ddlm-group mailing list
>     ddlm-group@iucr.org
>     http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group

--
John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
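
Herbert's point that a tag-value pair and a one-row loop carry the same information can be illustrated with a minimal Python sketch. The function name `parse_simple_cif` is hypothetical, and the parser handles only a tiny subset of CIF (tag-value pairs and loop_ blocks, tokenised with `shlex`, whose quoting rules only approximate those of CIF), so it is an illustration under those assumptions rather than a real CIF reader.

```python
import shlex


def parse_simple_cif(text):
    """Parse a tiny CIF subset into {tag: [values]} (hypothetical helper).

    Assumptions: input holds only tag-value pairs and loop_ blocks, and no
    value starts with an underscore. Real CIF quoting rules are richer.
    """
    tokens = shlex.split(text)  # shell-style tokenising, quotes removed
    table = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.lower() == "loop_":
            i += 1
            # Collect the loop header: consecutive tag names.
            tags = []
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i])
                i += 1
            # Collect values until the next tag or loop_ keyword.
            values = []
            while (i < len(tokens)
                   and not tokens[i].startswith("_")
                   and tokens[i].lower() != "loop_"):
                values.append(tokens[i])
                i += 1
            if not tags or len(values) % len(tags) != 0:
                raise ValueError("loop values do not fill whole rows")
            # Values are row-major, so column k is every len(tags)-th item.
            for col, tag in enumerate(tags):
                table[tag] = values[col::len(tags)]
        elif token.startswith("_"):
            if i + 1 >= len(tokens):
                raise ValueError(f"tag {token!r} has no value")
            table[token] = [tokens[i + 1]]  # a pair is a one-row column
            i += 2
        else:
            raise ValueError(f"unexpected token {token!r}")
    return table


PAIRS = '''
_audit_author "Peter Pan"
_audit_author_address "Second to the right, and straight on till morning"
'''

ONE_ROW_LOOP = '''
loop_
    _audit_author
    _audit_author_address
  "Peter Pan" "Second to the right, and straight on till morning"
'''

# Both presentations normalise to the same table.
same = parse_simple_cif(PAIRS) == parse_simple_cif(ONE_ROW_LOOP)
```

With more than one row the equivalence with separate loops no longer holds in general, because pairing of names with addresses then depends on the values sharing a loop; that is the ambiguity the quoted message warns about.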
