Re: [ddlm-group] Second proposal to allow looping of 'Set' categories
- To: ddlm-group@iucr.org
- Subject: Re: [ddlm-group] Second proposal to allow looping of 'Set' categories
- From: "john.westbrook@rcsb.org" <john.westbrook@rcsb.org>
- Date: Sat, 11 Jun 2016 18:14:18 -0400
- In-Reply-To: <CABcsX26ioPbqb_NCj5M2sPr3gd2es3YhYXr2Ub2Z9R5+ewUcYg@mail.gmail.com>
- References: <CAM+dB2fTUYtNQNaFMGQFnNyqnAgmU4koexAu-ZsiKm5L+S7qBg@mail.gmail.com> <BY2PR0401MB09365AE6ADE7ECE63BF2B300E0500@BY2PR0401MB0936.namprd04.prod.outlook.com> <BY2PR0401MB093608D8433F50A7B7BA1E4FE0500@BY2PR0401MB0936.namprd04.prod.outlook.com> <CABcsX26ioPbqb_NCj5M2sPr3gd2es3YhYXr2Ub2Z9R5+ewUcYg@mail.gmail.com>
I would like to reinforce Herb's comments. Adding complexity to what is already
a confusing artifact of the CIF syntax seems inadvisable. I agree that this should
be treated as a matter of style and that the choice should be left in the users' hands.

Regards,

John

On 6/11/16 4:22 PM, Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> I am very puzzled about the concerns being raised. Please bear with me.
>
> An experimental data CIF, whether it is a core CIF, a macromolecular CIF or a powder CIF, is in general a container for all the
> information about a particular experiment. Within that container we may have various tags and values. In some cases, for that
> experiment, there is only one instance of a value being assigned to a given tag, and it is convenient to use a tag-value pair to
> express that assignment. In some cases, even though we are dealing with a single instance of an experiment, in order to describe that
> single experiment we need to allow for multiple values to be assigned to the same tag, and CIF provides the loop_ construct to
> allow us to present those multiple values with a single presentation of the name of the tag.
> So if there is one author we might say
>
>     _audit_author          "Peter Pan"
>     _audit_author_address  "Second to the right, and straight on till morning"
>
> but if there were a second author we might say
>
>     loop_
>     _audit_author
>     _audit_author_address
>     "Peter Pan"        "Second to the right, and straight on till morning"
>     "Winnie the Pooh"  "Hundred Acre Wood"
>
> Notice that we don't make this into two separate loops, to avoid creating an ambiguity about which author lives at which address.
>
> No information would have been gained or lost if we had only one author and had chosen to use a loop instead, nor would any
> information have been gained or lost in the case of a single author if we had presented the name and address in two separate loops.
>
> We have created rules about which tags belong together in the same loop, but the reality is that there are people who either do not
> know those rules or who choose to ignore them. I would suggest that, as long as the intended result is clear and unambiguous, it is
> not only unnecessary but pointlessly obstructionist to rigidly enforce such rules. People like to create very flat CIFs for
> convenience in data harvesting. People like to create databases spanning multiple sets of results. People (including the IUCr
> publications office) like to extract particular sets of tags with their values and add "extra" tags we have not thought of.
>
> They are going to do these things whether we like it or not.
> Yes, it is helpful if we provide recommended best practices and
> warnings about practices that seem to cause difficulty, but if CIF is to be a maximally useful container and a maximally useful
> framework, we need to focus on being as descriptive as possible and avoid being pointlessly prescriptive.
>
> Which is a very, very long way of saying that I think we should allow looping of tags in all cases in which the intended meaning is
> unambiguous, and focus most of the attention of software warnings on specific things in specific datasets that seem likely to
> result in ambiguous interpretations of those specific datasets.
>
> I hope that made some sense.
>
> Regards,
> Herbert
>
> On Fri, Jun 10, 2016 at 10:55 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>
>> Dear all,
>>
>> This is not a fully formed proposal, but more an outline and general idea for approaching the problem with Set categories and
>> definition reuse. If it is well received then I am prepared to expand it into a full proposal, but I suspect that the
>> discussion will bring out considerations and nuances that will make that smoother and more successful, if indeed such a proposal
>> is ever developed.
>>
>> We previously agreed that the problem we are trying to solve arises from data dictionaries making assumptions about what kind of
>> thing is described by a data block (or save frame). Many data dictionaries do this implicitly; mmCIF has the distinction of
>> doing it explicitly, via its ENTITY category. Either way, this causes difficulty when we want to reuse a definition to describe
>> a different kind of entity.
>> It would be ideal, therefore, to choose a solution that strikes at the root of this problem, and
>> there are at least two general ways we could do this:
>>
>> (1) Express the definition of "entity" applicable to a data block in that data block, by means of appropriate new data items.
>> Suitable choices of default values for the new items could preserve the current meaning of data files that do not present those
>> items.
>>
>> (2) Express the definition of "entity" in the relevant dictionaries, but *factor out* category and item definitions into
>> separate dictionaries or dictionary modules. Thus two or more dictionaries with different senses of what an entity is (e.g.
>> the Core and Symmetry dictionaries) would not re-use definitions provided by one another, but would instead both re-use (by
>> suitably structured import) definitions provided by a dictionary module that itself leaves "entity" undefined.
>>
>> Of those two, I am more interested in the latter. A significant advantage of that approach is that at one important level it
>> meets both of the seemingly conflicting criteria that were presented earlier: *with respect to individual data dictionaries*, it
>> does not require new data name variants to be introduced to support loopability, and it also does not require that such a
>> dictionary permit data files that existing software is at risk of misunderstanding. It achieves this, essentially, by limiting
>> the scope of some aspects of category key definition to specific dictionaries. There would thus be no conflict between, for
>> example, SPACE_GROUP being loopable in all data files conforming to the symmetry dictionary, but not being loopable in any data
>> file conforming to the core dictionary.
>>
>> One downside would be that the composability of data dictionaries would be restricted further. For instance, one could not rely
>> simultaneously on the full definitions of SPACE_GROUP items as drawn from the core and symmetry dictionaries.
>> Another downside would be that correct validation would be even more dependent on identifying the correct dictionary(-ies) against which to
>> validate. I am uncertain how significant a disadvantage either of those would be, however.
>>
>> Thoughts?
>>
>> Regards,
>>
>> John
>>
>> --
>> John C. Bollinger, Ph.D.
>> Computing and X-Ray Scientist
>> Department of Structural Biology
>> St. Jude Children's Research Hospital
>> John.Bollinger@StJude.org
>> (901) 595-3166 [office]
>> www.stjude.org
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group

--
John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: john.westbrook@rcsb.org
Ph: (848) 445-4290 Fax: (732) 445-4320
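The two points at issue in this thread can be illustrated together: a single row presented as tag-value pairs or as a one-packet loop_ carries the same information (Herbert's example), while whether a category may hold more than one row could depend on which dictionary a block is validated against (option (2) in John Bollinger's message). What follows is a minimal Python sketch under loudly stated assumptions, not a conforming CIF parser: it handles only a simplified subset of CIF (no semicolon-delimited text fields, no data_ headers, no values beginning with '_'), and it borrows Python's shlex module for quote handling, which only approximates CIF quoting rules. The function names and the two "dictionary views" (SHARED_ITEMS, CORE_LIKE_SETS, SYMMETRY_LIKE_SETS) are hypothetical and are not taken from the core or symmetry dictionaries; the space-group data names are used only for illustration.

```python
import shlex

# Hypothetical sketch, not a conforming CIF parser. Assumptions: input is the
# body of one data block (no data_ header); values are whitespace-separated and
# optionally quoted with '...' or "..." (shlex only approximates CIF quoting);
# no semicolon-delimited text fields; no value begins with '_'; '#' starts a comment.
def parse_block(text):
    """Return {data_name: [value, ...]}, one list entry per row presented."""
    tokens = shlex.split(text, comments=True)
    columns = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.lower() == "loop_":
            i += 1
            names = []
            # The loop header is the run of data names following loop_.
            while i < len(tokens) and tokens[i].startswith("_"):
                names.append(tokens[i].lower())  # data names are case-insensitive
                i += 1
            if not names:
                raise ValueError("loop_ with no data names")
            values = []
            # Loop values run until the next data name or loop_.
            while (i < len(tokens) and not tokens[i].startswith("_")
                   and tokens[i].lower() != "loop_"):
                values.append(tokens[i])
                i += 1
            if len(values) % len(names):
                raise ValueError("loop values do not fill a whole number of rows")
            # Values are stored row by row, so column j is every n-th value from j.
            for j, name in enumerate(names):
                columns[name] = values[j::len(names)]
        elif tok.startswith("_"):
            # A tag-value pair presents exactly one row for that data name.
            if i + 1 >= len(tokens):
                raise ValueError(f"no value for {tok}")
            columns[tok.lower()] = [tokens[i + 1]]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return columns


def rows_per_category(columns, item_category):
    """Count rows per category; items of one category must agree on row count."""
    counts = {}
    for name, values in columns.items():
        category = item_category.get(name)
        if category is None:
            continue  # not defined by this (hypothetical) dictionary view
        if counts.setdefault(category, len(values)) != len(values):
            # e.g. names and addresses split into separate loops of different lengths
            raise ValueError(f"{name}: row count differs from rest of {category}")
    return counts


def set_violations(columns, item_category, set_categories):
    """Set (single-row) categories that this block presents more than once."""
    counts = rows_per_category(columns, item_category)
    return sorted(c for c in set_categories if counts.get(c, 0) > 1)


# Hypothetical shared definitions module in the spirit of option (2): both views
# reuse the same item-to-category map but differ on whether SPACE_GROUP is a Set.
# Data names are illustrative and lower-cased for case-insensitive lookup.
SHARED_ITEMS = {
    "_space_group.name_h-m_alt": "space_group",
    "_space_group.it_number": "space_group",
}
CORE_LIKE_SETS = {"space_group"}  # at most one space group per data block
SYMMETRY_LIKE_SETS = set()        # SPACE_GROUP may be looped


if __name__ == "__main__":
    pairs = ('_audit_author "Peter Pan"\n'
             '_audit_author_address "Second to the right, and straight on till morning"\n')
    loop = ('loop_\n_audit_author\n_audit_author_address\n'
            '"Peter Pan" "Second to the right, and straight on till morning"\n')
    print(parse_block(pairs) == parse_block(loop))  # True
    groups = parse_block("loop_\n_space_group.name_H-M_alt\n_space_group.IT_number\n"
                         "'P 21/c' 14\n'P -1' 2\n")
    print(set_violations(groups, SHARED_ITEMS, CORE_LIKE_SETS))      # ['space_group']
    print(set_violations(groups, SHARED_ITEMS, SYMMETRY_LIKE_SETS))  # []
```

Under these assumptions the single-author block parses identically in either form, which is the sense in which the choice is a matter of style. Whether the two-row space-group loop is an error then depends only on which hypothetical dictionary view is chosen for validation, which also illustrates the validation downside raised in the quoted proposal.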
- References:
- [ddlm-group] Second proposal to allow looping of 'Set' categories (James Hester)
- Re: [ddlm-group] Second proposal to allow looping of 'Set' categories (Bollinger, John C)
- Re: [ddlm-group] Second proposal to allow looping of 'Set' categories (Bollinger, John C)
- Re: [ddlm-group] Second proposal to allow looping of 'Set' categories (Herbert J. Bernstein)