Re: Important CIF items for discussion
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>, CIF Developers <cif-developers@iucr.org>
- Subject: Re: Important CIF items for discussion
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Fri, 11 Jul 2008 17:13:17 -0400
- In-Reply-To: <48777D55.6050606@mcmaster.ca>
- References: <48777D55.6050606@mcmaster.ca>
Dear Colleagues,

David has raised very important issues, but they are based on two CIF principles that have never really held and that, under DDLm, need very serious modification.

The first principle put forward by David is that "CIF (normally) defines only one data item for a given piece of information and the format of that item is the one closest to the natural representation of the quantity (rather than a format in common use or one designed for ease of computation). For example, the CIF specifies which units that are to be used and alternative units are not permitted." The second principle is that "A given piece of information may appear only once in a given datablock."

Certainly it is desirable to avoid easily misunderstood choices of representation, and it is desirable to avoid pointless repetition of the same information, but we already treat these more as sensible guidance than as rigid rules. Even the generally well-accepted "principle" of not allowing alternate units is "violated" in the PDB Exchange dictionary, and will be in the next imgCIF dictionary. One could also question how "natural" the choices of units are. Consider, for example, cell parameters. The "natural" units for angles are radians, but out of respect for common practice, we use degrees.

The second principle is also violated, for example by allowing cell volumes as well as cell edge lengths and angles. The volume is derived and duplicative, and it certainly is possible to generate a CIF in which the cell volume is inconsistent with the cell edge lengths and angles. Unlike U's vs. B's, it is highly likely that both will have been given, along with, in the case of mmCIF, several transformation matrices that also need to be cross-checked.
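To make the cell-volume point concrete, here is a minimal sketch of such a cross-check in Python. It uses the standard triclinic volume formula, with the angles in degrees as CIF gives them. The function names and the relative tolerance are assumptions chosen for illustration only; this is not DDLm method syntax and not taken from any dictionary.

```python
import math


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a unit cell from edge lengths and angles in degrees.

    Standard triclinic formula:
    V = a*b*c*sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
                   + 2*cos(alpha)*cos(beta)*cos(gamma))
    """
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    radicand = 1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    if radicand <= 0.0:
        raise ValueError("angles do not describe a valid unit cell")
    return a * b * c * math.sqrt(radicand)


def volume_is_consistent(reported_volume, a, b, c,
                         alpha_deg, beta_deg, gamma_deg, rel_tol=1e-3):
    """Compare a reported volume with the one derived from the cell.

    rel_tol is an arbitrary illustrative tolerance (hypothetical); a real
    check would take the reported standard uncertainties into account.
    """
    derived = cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg)
    return math.isclose(reported_volume, derived, rel_tol=rel_tol)
```

For a cubic cell with 10 Angstrom edges, volume_is_consistent(1000.0, 10, 10, 10, 90, 90, 90) accepts the reported volume, while a reported volume of 1100.0 for the same cell is rejected.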
What is nice about DDLm is that it now allows us to put the necessary cross-check information into the dictionaries, and that would seem to me the best way to address David's third issue of dealing with intermediate values used in a computation: allow them in, but only if the methods necessary to validate them are given. We only get ourselves in trouble if we use the same data item name for two different or computationally inconsistent meanings, not if we have unique names for related items.

The layering of dictionaries and hierarchy of methods can be viewed as bugs or as features. A CIF being used for a journal submission or an archive submission has to be a complete, fully documented package. In that case, it would not be appropriate for the CIF to depend on a local dictionary not submitted with the data CIF, e.g. using David's option D of _audit_conform_included_dictionary, but clearly it would be helpful to the community to have the IUCr start an archive of "local" dictionaries, preferably with namespace controls, e.g. using the current system of prefixes.

For work within a lab, however, we cannot and should not act as the "CIF police". If someone has a scientific use for odd layering and real-time assembly of virtual dictionaries, why should they not do it? When the CIF is ready to be moved elsewhere it needs to be cleaned up and documented, e.g. by creating the expanded dictionary from the local pieces.

I would suggest replacing the two principles with the following three guidelines:

1. When defining new data items, dictionary developers are advised to avoid unnecessarily duplicative definitions, e.g. two definitions of the same item that differ only in the units used. Exceptions should be justified and fully documented.

2. When relationships exist among multiple definitions, those relationships should be stated clearly and, if at all possible, algorithmically, preferably using DDLm, to allow for automatic validation and, when necessary, generation of values for missing data items.

3. An existing tag should never be used in a way that is inconsistent with definitions used by journals and archives.

Regards,
Herbert

At 11:33 AM -0400 7/11/08, David Brown wrote:
>I have attached to this email a discussion paper concerning three
>issues that have arisen during our evaluation of the new DDLm.
>These are important issues for the future of CIF, and before I ask
>the voting members of COMCIFS to make a decision, I would like to
>see the issues fully discussed and, if possible, a consensus
>reached. The attached document is six pages long, but I hope you
>will take the time to read it and comment on the issues raised.
>
>I apologize if you have received this message more than once by
>being on more than one discussion list. Please ignore the second
>email.
>
>David Brown
>Chair, COMCIFS
>
>Attachment converted: Macintosh HD:CIFprinciplesDiscussion.pdf (PDF
>/«IC») (0033D5BF)
>Attachment converted: Macintosh HD:idbrown 1.vcf (TEXT/ttxt) (0033D5C0)
>_______________________________________________
>comcifs mailing list
>comcifs@iucr.org
>http://scripts.iucr.org/mailman/listinfo/comcifs

--
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================