Discussion List Archives


RE: enumeration values

Dear Richard,

 

Formal validation against DDL1 dictionaries has always been a dodgy proposition.  The dictionary language was designed and the initial core dictionary was written before the requirements for or practice of CIF validation were well developed.  For another, less tractable example, see the discussion around http://www.iucr.org/__data/iucr/lists/cif-developers/msg00182.html.  Considering the history, it's a credit to the CIF, DDL, and validator designers that validation against DDL1 dictionaries works as well as it does.

 

As for _atom_site_refinement_flags specifically, the definition text is at odds with the actual formal definition.  As far as I can tell, PyCifRW is validating correctly according to the formal definition, exactly as I would expect a dictionary-driven validator to do.  It is debatable whether this should be considered an error, and if so, what the nature of the error would be, but I'm inclined to account it an error in the dictionary.  As you observe, this has contributed to a proliferation of formally invalid DDL1 CIFs.

 

This looks like a rough area left over from the faster and looser days of early CIF development.  I doubt that it was an oversight per se that the various flag combinations were not explicitly enumerated -- that was probably a conscious choice.  Rather, I suspect the oversight was in not appreciating that enumerating all 128 (excluding permutations) alternatives was important.  Very likely this accounts at least partially for the deprecation of the data name in favor of _atom_site_refinement_flags_posn, _atom_site_refinement_flags_adp, and _atom_site_refinement_flags_occupancy.  As you observed, the definitions of these each enumerate all possible combinations of their respective flags (assigning a single flag order for each combination).
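To make the count concrete: seven single-letter flags give 2^7 = 128 subsets once the empty set is written as '.'. Below is a minimal Python sketch of the enumeration. It assumes each combination is written in one fixed (alphabetical) order, as in the _atom_site_refinement_flags_posn list. The function name is hypothetical and not part of any CIF package.

```python
from itertools import combinations

# The seven single-letter codes from the _atom_site_refinement_flags enumeration.
FLAGS = "SGRDTUP"


def flag_combinations(flags=FLAGS):
    """Return every subset of `flags` as a string, one fixed order per subset.

    The empty subset is written as '.', following the dictionary's convention
    for "no constraints". Letters within a value are sorted alphabetically
    (an assumed convention, chosen to match the *_posn enumeration).
    """
    letters = sorted(flags)
    values = ["."]
    for size in range(1, len(letters) + 1):
        values.extend("".join(combo) for combo in combinations(letters, size))
    return values


print(len(flag_combinations()))  # 128
```

Calling flag_combinations("DGRS") yields the same 16 values, in the same order, as the enumeration quoted for _atom_site_refinement_flags_posn.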

 

There are several things you could do about this, among them:

 

1) Nothing.  Document this data name as one where spurious validation errors may occur.

 

2) Update your local copy of the dictionary by adding all the missing flag combinations.

 

3) Pre-process your CIFs to replace _atom_site_refinement_flags with the three data names that jointly supersede it (splitting the values among those data names as appropriate).
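Option 3 could be sketched roughly as follows. This is only an illustration, not a tested converter. It assumes that D, G, R and S belong to _atom_site_refinement_flags_posn (as in the enumeration quoted in this thread), and that T and U belong to _atom_site_refinement_flags_adp and P to _atom_site_refinement_flags_occupancy. Please check the last two assignments against your copy of the dictionary. The function name is hypothetical.

```python
# Flag letters per replacement data name, each in the single order the
# dictionary uses for combined values.
POSN_FLAGS = "DGRS"  # from the _atom_site_refinement_flags_posn enumeration
ADP_FLAGS = "TU"     # assumption: verify against your dictionary
OCC_FLAGS = "P"      # assumption: verify against your dictionary


def split_refinement_flags(value):
    """Split one _atom_site_refinement_flags value among the three new names."""
    names = (
        ("_atom_site_refinement_flags_posn", POSN_FLAGS),
        ("_atom_site_refinement_flags_adp", ADP_FLAGS),
        ("_atom_site_refinement_flags_occupancy", OCC_FLAGS),
    )
    if value == "?":
        # An unknown value stays unknown for every replacement item.
        return {name: "?" for name, _ in names}
    unknown = set(value) - set(POSN_FLAGS + ADP_FLAGS + OCC_FLAGS + ".")
    if unknown:
        raise ValueError(f"unrecognised refinement flags: {sorted(unknown)}")
    # Keep only the letters each item allows; '.' means "no flags set".
    return {
        name: "".join(f for f in allowed if f in value) or "."
        for name, allowed in names
    }
```

For example, the SHELXL-style value 'PR' would become 'R' for the positional item, '.' for the ADP item, and 'P' for the occupancy item.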

 

 

With respect to case sensitivity of data values, there is a bit of a gray area.  On one hand, CIF clearly says that the data value 'monoclinic' is not the same as the data value 'Monoclinic' *for CIF's purposes*.  That does not, however, prevent an application consuming a CIF from treating them equivalently.  The question that then arises is on which side of the CIF / application line does validation fall?  There are arguments both ways.

 

If you are writing a validation package, then I suggest you issue a warning (as opposed to an error) for case mismatches such as you describe, or that you provide an alternative lax validation mode that accepts this sort of variance.  If your concern is with writing valid CIFs, however, then you would be wise to use only the exact values enumerated in the dictionary.
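The warning-versus-error idea might look something like the sketch below. It is not how PyCifRW or any other existing validator works. The function name and the strict switch are hypothetical, and the list of _symmetry_cell_setting values is given from memory for illustration only.

```python
import warnings

# Illustrative enumeration for _symmetry_cell_setting; take the real list
# from the dictionary you are validating against.
CELL_SETTINGS = [
    "triclinic", "monoclinic", "orthorhombic", "tetragonal",
    "rhombohedral", "trigonal", "hexagonal", "cubic",
]


def check_enumerated(value, allowed, strict=False):
    """Return True if `value` is acceptable for an enumerated item.

    An exact match always passes. A match that differs only in case fails in
    strict mode; in lax mode it passes but emits a warning.
    """
    if value in allowed:
        return True
    if value.lower() in {a.lower() for a in allowed}:
        if strict:
            return False
        warnings.warn(f"{value!r} matches an enumerated value only if case is ignored")
        return True
    return False
```

With this sketch, 'Monoclinic' passes with a warning in the default lax mode and is rejected when strict=True, while 'monoclinic' passes silently in both modes.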

 

 

Best Regards,

John

--

John C. Bollinger, Ph.D.

Department of Structural Biology

St. Jude Children's Research Hospital

 

 

 

From: cif-developers-bounces@iucr.org [mailto:cif-developers-bounces@iucr.org] On Behalf Of Richard Gildea
Sent: Friday, June 04, 2010 1:43 PM
To: cif-developers@iucr.org
Subject: enumeration values

 

Dear All,

 

The enumeration entry in cif_core.dic for _atom_site_refinement_flags is as follows:

 

loop_ _enumeration
      _enumeration_detail
                     . 'no refinement constraints'
                     S 'special-position constraint on site'
                     G 'rigid-group refinement of site'
                     R 'riding-atom site attached to non-riding atom'
                     D 'distance or angle restraint on site'
                     T 'thermal displacement constraints'
                     U 'Uiso or Uij restraint (rigid bond)'
                     P 'partial occupancy constraint'

 

No combinations are specified explicitly as being allowed, which is in contrast to the entry for _atom_site_refinement_flags_posn:

 

loop_ _enumeration
      _enumeration_detail
                     .    'no constraints on  positional coordinates'
                     D    'distance or angle restraint on positional coordinates'
                     G    'rigid-group refinement of positional coordinates'
                     R    'riding-atom site attached to non-riding atom'
                     S    'special-position constraint on positional coordinates'
                     DG   'combination of the above constraints'
                     DR   'combination of the above constraints'
                     DS   'combination of the above constraints'
                     GR   'combination of the above constraints'
                     GS   'combination of the above constraints'
                     RS   'combination of the above constraints'
                     DGR  'combination of the above constraints'
                     DGS  'combination of the above constraints'
                     DRS  'combination of the above constraints'
                     GRS  'combination of the above constraints'
                     DGRS 'combination of the above constraints'

 

The text of the definition for _atom_site_refinement_flags refers to a "concatenated series of single-letter codes", which suggests that combinations of the codes are allowed.  This is in contrast to the text in the HTML version of the dictionary (http://www.iucr.org/__data/iucr/cifdic_html/1/cif_core.dic/Iatom_site_refinement_flags.html), which states (a little ambiguously) "The data value must be ONE of the following:" (my emphasis) before the list of single-letter codes.

 

Is it an oversight that the allowed combinations are missing from the enumeration list for _atom_site_refinement_flags?  As it stands, a program validating against cif_core.dic would (I suspect incorrectly) flag a value such as 'PR' as invalid for _atom_site_refinement_flags.  PyCifRW is one such program: it flags these concatenated values as being outside the permitted set.  Unless the allowed combinations are also included in the list, it is hard to validate them programmatically.  Alternatively, if combinations are not allowed, then many files output by SHELXL are invalid (for example, when a hydrogen is riding on a partially occupied atom).

 

I have a further question about case sensitivity of data values.  The CIF specification states that the case of data values must be respected.  Does this mean that a value of 'Monoclinic' for _symmetry_cell_setting (a commonly encountered case, judging by our local database of CIFs) should be flagged as being outside the permitted set, or should it be accepted as valid?

 

Thanks,

 

Richard

 

