RE: enumeration values
- Subject: RE: enumeration values
- From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
- Date: Fri, 4 Jun 2010 15:11:38 -0500
- In-Reply-To: <AANLkTikX0ngj8MfE7vAQ9gfN59oILZGtAgaNS1mLwR9M@mail.gmail.com>
- References: <AANLkTil-uCJIT-Rg_07zc3FTtY3ln8FMiOvAhcsGWAPq@mail.gmail.com><AANLkTikX0ngj8MfE7vAQ9gfN59oILZGtAgaNS1mLwR9M@mail.gmail.com>
Dear Richard,

Formal validation against DDL1 dictionaries has always been a dodgy proposition. The dictionary language was designed and the initial core dictionary was written before the requirements for or practice of CIF validation were well developed. For another, less tractable example, see the discussion around http://www.iucr.org/__data/iucr/lists/cif-developers/msg00182.html. Considering the history, it's a credit to the CIF, DDL, and validator designers that validation against DDL1 dictionaries works as well as it does.

As for _atom_site_refinement_flags specifically, the definition text is at odds with the actual formal definition. As far as I can tell, PyCifRW is validating correctly according to the formal definition, exactly as I would expect a dictionary-driven validator to do. It is debatable whether this should be considered an error, and if so, what the nature of the error would be, but I'm inclined to account it an error in the dictionary. As you observe, this has contributed to a proliferation of formally invalid DDL1 CIFs.

This looks like a rough area left over from the faster and looser days of early CIF development. I doubt that it was an oversight per se that the various flag combinations were not explicitly enumerated; that was probably a conscious choice. Rather, I suspect the oversight was in not appreciating that enumerating all 128 (excluding permutations) alternatives was important. Very likely this accounts at least partially for the deprecation of the data name in favor of _atom_site_refinement_flags_posn, _atom_site_refinement_flags_adp, and _atom_site_refinement_flags_occupancy. As you observed, the definitions of these each enumerate all possible combinations of their respective flags (assigning a single flag order for each combination).

There are several things you could do about this, among them:

1) Nothing. Document this data name as one where spurious validation errors may occur.
2) Update your local copy of the dictionary by adding all the missing flag combinations.
3) Pre-process your CIFs to replace _atom_site_refinement_flags with the three data names that jointly supersede it (splitting the values among those data names as appropriate).

With respect to case sensitivity of data values, there is a bit of a gray area. On one hand, CIF clearly says that the data value 'monoclinic' is not the same as the data value 'Monoclinic' *for CIF's purposes*. That does not, however, prevent an application consuming a CIF from treating them equivalently. The question that then arises is on which side of the CIF / application line does validation fall? There are arguments both ways. If you are writing a validation package, then I suggest you issue a warning (as opposed to an error) for case mismatches such as you describe, or that you provide an alternative lax validation mode that accepts this sort of variance.
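A minimal sketch of the warning-versus-error idea, in Python. The function name check_enumerated and its three result strings are hypothetical, and the list of cell settings is an illustrative subset assumed for the example, not the dictionary's full enumeration.

```python
def check_enumerated(value, allowed):
    """Classify a data value against a dictionary enumeration.

    Returns "ok" for an exact match, "warning" when the value matches
    only after case folding, and "error" when it matches neither way.
    (Hypothetical helper; not part of PyCifRW or any CIF library.)
    """
    if value in allowed:
        return "ok"
    folded = {item.casefold() for item in allowed}
    if value.casefold() in folded:
        return "warning"
    return "error"


# Illustrative subset only (an assumption for this example).
CELL_SETTINGS = ["triclinic", "monoclinic", "orthorhombic"]

for candidate in ("monoclinic", "Monoclinic", "monoclinc"):
    print(candidate, "->", check_enumerated(candidate, CELL_SETTINGS))
```

A strict mode would simply treat "warning" as "error"; a lax mode would treat it as "ok".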
If your concern is with writing valid CIFs, however, then you would be wise to use only the exact values enumerated in the dictionary.

Best Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

From: cif-developers-bounces@iucr.org [mailto:cif-developers-bounces@iucr.org] On Behalf Of Richard Gildea

Dear All,

The enumeration entry in cif_core.dic for _atom_site_refinement_flags is as follows:

    loop_
    _enumeration
    _enumeration_detail
    . 'no refinement constraints'
    S 'special-position constraint on site'
    G 'rigid-group refinement of site'
    R 'riding-atom site attached to non-riding atom'
    D 'distance or angle restraint on site'
    T 'thermal displacement constraints'
    U 'Uiso or Uij restraint (rigid bond)'
    P 'partial occupancy constraint'

No combinations are specified explicitly as being allowed, which is in contrast to the entry for _atom_site_refinement_flags_posn:

    loop_
    _enumeration
    _enumeration_detail
    . 'no constraints on positional coordinates'
    D 'distance or angle restraint on positional coordinates'
    G 'rigid-group refinement of positional coordinates'
    R 'riding-atom site attached to non-riding atom'
    S 'special-position constraint on positional coordinates'
    DG 'combination of the above constraints'
    DR 'combination of the above constraints'
    DS 'combination of the above constraints'
    GR 'combination of the above constraints'
    GS 'combination of the above constraints'
    RS 'combination of the above constraints'
    DGR 'combination of the above constraints'
    DGS 'combination of the above constraints'
    DRS 'combination of the above constraints'
    GRS 'combination of the above constraints'
    DGRS 'combination of the above constraints'

The text of the definition for _atom_site_refinement_flags refers to a "concatenated series of single-letter codes", which suggests that combinations of the code are allowed. This is in contrast to the text at the html version of the dictionary (http://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Iatom_site_refinement_flags.html) which states (a little ambiguously) "The data value must be ONE of the following:" (my emphasis), before the list of single letter codes.

Is it an oversight that the allowed combinations are missing from the enumeration list for _atom_site_refinement_flags? As it is, a program validating against the cif_core.dic would (I suspect incorrectly) flag a value of, for example, 'PR' as an invalid value for the item _atom_site_refinement_flags. PyCifRW is one such program that flags these concatenated values as being outside the permitted set. Without the allowed combinations being also included in the list, it makes it hard to programmatically validate such combinations. Alternatively, if combinations are not allowed, then this would render many files as output by SHELXL to be invalid (for example when a hydrogen is riding on a partially occupied atom).

I have a further question about case sensitivity of data values. The cif specification states that the case of data values must be respected - does this mean therefore that, for example, a value of 'Monoclinic' for _symmetry_cell_setting (a commonly encountered case, judging by our local database of CIFs) should be flagged as being outside of the permitted set, or should it be allowed as valid?

Thanks,

Richard

Email Disclaimer: www.stjude.org/emaildisclaimer
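The 128 alternatives for _atom_site_refinement_flags mentioned in the reply (seven single-letter codes, permutations excluded, plus '.') can be generated rather than typed by hand. A minimal Python sketch, assuming one fixed alphabetical order within each combination, as the _atom_site_refinement_flags_posn enumeration uses; flag_combinations and normalize are hypothetical names, not part of PyCifRW or any dictionary tooling.

```python
from itertools import combinations

# The seven single-letter codes from the enumeration, sorted alphabetically.
# The ordering is an assumption modelled on the _posn enumeration.
FLAGS = "DGPRSTU"


def flag_combinations(flags=FLAGS):
    """Return '.' plus every non-empty combination of flags in fixed order."""
    combos = ["."]  # '.' means no refinement constraints
    for size in range(1, len(flags) + 1):
        combos.extend("".join(c) for c in combinations(flags, size))
    return combos


def normalize(value, flags=FLAGS):
    """Reorder a concatenated flag value into the fixed order.

    Returns None if the value contains an unknown or repeated code.
    """
    if value == ".":
        return "."
    if not value or len(set(value)) != len(value):
        return None
    if any(ch not in flags for ch in value):
        return None
    return "".join(ch for ch in flags if ch in value)


allowed = flag_combinations()
print(len(allowed))                  # 128
print(normalize("RP") in allowed)    # True
```

The generated list could be added to a local copy of the dictionary, or normalize could be applied to values before validation so that different orderings of the same codes compare equal.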