RE: enumeration values
- From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
- Date: Fri, 4 Jun 2010 15:11:38 -0500
- Accept-Language: en-US
- In-Reply-To: <AANLkTikX0ngj8MfE7vAQ9gfN59oILZGtAgaNS1mLwR9M@mail.gmail.com>
- References: <AANLkTil-uCJIT-Rg_07zc3FTtY3ln8FMiOvAhcsGWAPq@mail.gmail.com><AANLkTikX0ngj8MfE7vAQ9gfN59oILZGtAgaNS1mLwR9M@mail.gmail.com>
Formal validation against DDL1 dictionaries has always been a dodgy proposition. The dictionary language was designed and the initial core dictionary was written before the requirements for or practice of CIF validation were well developed. For another, less tractable example, see the discussion around http://www.iucr.org/__data/iucr/lists/cif-developers/msg00182.html. Considering the history, it's a credit to the CIF, DDL, and validator designers that validation against DDL1 dictionaries works as well as it does.
As for _atom_site_refinement_flags specifically, the definition text is at odds with the actual formal definition. As far as I can tell, PyCifRW is validating correctly according to the formal definition, exactly as I would expect a dictionary-driven validator to do. It is debatable whether this should be considered an error, and if so, what the nature of the error would be, but I'm inclined to account it an error in the dictionary. As you observe, this has contributed to a proliferation of formally invalid DDL1 CIFs.
This looks like a rough area left over from the faster and looser days of early CIF development. I doubt that it was an oversight per se that the various flag combinations were not explicitly enumerated; that was probably a conscious choice. Rather, I suspect the oversight was in not appreciating that enumerating all 128 alternatives (not counting permutations of the same flags) was important for formal validation. Very likely this accounts at least partially for the deprecation of the data name in favor of _atom_site_refinement_flags_posn, _atom_site_refinement_flags_adp, and _atom_site_refinement_flags_occupancy. As you observed, the definitions of these each enumerate all possible combinations of their respective flags (assigning a single flag order for each combination).
There are several things you could do about this, among them:
1) Nothing. Document this data name as one where spurious validation errors may occur.
2) Update your local copy of the dictionary by adding all the missing flag combinations.
3) Pre-process your CIFs to replace _atom_site_refinement_flags with the three data names that jointly supersede it (splitting the values among those data names as appropriate).
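Options 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not part of PyCifRW or any other package: the function names are hypothetical, and the flag letters and their grouping (D, G, R, S for position; T, U for displacement parameters; P for occupancy) are my reading of the core dictionary, so check them against your own copy. The order of letters within each value, and the use of '.' for "no flags", are also assumptions that must match what the dictionary actually enumerates.

```python
from itertools import combinations

# Assumed flag letters; verify against your copy of the core dictionary.
POSN_FLAGS = "DGRS"   # assumed flags for _atom_site_refinement_flags_posn
ADP_FLAGS = "TU"      # assumed flags for _atom_site_refinement_flags_adp
OCC_FLAGS = "P"       # assumed flag for _atom_site_refinement_flags_occupancy
ALL_FLAGS = "SGRDTUP"  # assumed flags for _atom_site_refinement_flags


def all_flag_combinations(flags=ALL_FLAGS):
    """Option 2: list every subset of the flags, one fixed order per subset.

    The empty subset is written as '.', which is an assumption here.
    """
    combos = []
    for size in range(len(flags) + 1):
        for subset in combinations(flags, size):
            combos.append("".join(subset) or ".")
    return combos


def split_flags(value):
    """Option 3: split a combined value into (posn, adp, occupancy) values.

    The input letters may appear in any order. '.' and '?' are passed
    through unchanged to all three results.
    """
    if value in (".", "?"):
        return (value, value, value)
    unknown = set(value) - set(ALL_FLAGS)
    if unknown:
        raise ValueError("unrecognized flag(s): " + "".join(sorted(unknown)))

    def pick(group):
        # Keep the group's own letter order so each combination has one form.
        return "".join(flag for flag in group if flag in value) or "."

    return pick(POSN_FLAGS), pick(ADP_FLAGS), pick(OCC_FLAGS)


if __name__ == "__main__":
    print(len(all_flag_combinations()))
    print(split_flags("SRU"))
```

With seven flags, all_flag_combinations() yields 2**7 = 128 values, counting the empty combination. For option 2 you would still have to write those values into the dictionary's enumeration list by hand or with your own tooling, and input CIFs that write the same flags in a different order would still fail unless you normalize them first.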
With respect to case sensitivity of data values, there is a bit of a gray area. On one hand, CIF clearly says that the data value 'monoclinic' is not the same as the data value 'Monoclinic' *for CIF's purposes*. On the other hand, that does not prevent an application consuming a CIF from treating them as equivalent. The question that then arises is: on which side of the CIF / application line does validation fall? There are arguments both ways.
If you are writing a validation package, then I suggest you issue a warning (as opposed to an error) for case mismatches such as you describe, or that you provide an alternative lax validation mode that accepts this sort of variance. If your concern is with writing valid CIFs, however, then you would be wise to use only the exact values enumerated in the dictionary.
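As a rough sketch of the warning-versus-error idea, assuming a validator that already has the enumerated values for a data name in hand: the function name, the strict switch, and the returned status strings below are all hypothetical and do not correspond to any existing validator's interface.

```python
def check_enumeration(value, allowed, strict=False):
    """Check a data value against a dictionary's enumerated values.

    Returns (status, canonical), where status is 'ok', 'warning', or
    'error', and canonical is the dictionary's spelling when one matches.
    In strict mode a case mismatch is reported as an error.
    """
    if value in allowed:
        return ("ok", value)

    # Map case-folded spellings back to the dictionary's exact spelling.
    folded = {item.lower(): item for item in allowed}
    match = folded.get(value.lower())
    if match is not None and not strict:
        return ("warning", match)
    return ("error", None)


if __name__ == "__main__":
    allowed = ["triclinic", "monoclinic", "orthorhombic"]
    print(check_enumeration("monoclinic", allowed))
    print(check_enumeration("Monoclinic", allowed))
    print(check_enumeration("Monoclinic", allowed, strict=True))
```

Returning the dictionary's spelling alongside the warning lets a caller rewrite the value when producing a CIF, which keeps the output on the safe side of the line.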
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
Email Disclaimer: www.stjude.org/emaildisclaimer