Discussion List Archives


Re: CIF Infoset

Here are a few IDB comments on the comments of DDB

The core dictionary defines three items which can be looped:
    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location        # Contains the URL where the dictionary can be found
As far as I know these have not been widely used - Acta Cryst. should
start insisting that these be included in submitted papers.  There is no
need to give the dictionary version in anything as ephemeral as a comment.
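
As a rough illustration of what such a loop looks like in a CIF, here is a
minimal Python sketch that writes one out.  The function name is hypothetical,
the URL in the usage note is a placeholder rather than a real dictionary
location, and the sketch assumes that no value contains white space (so no
quoting is needed).

```python
def format_audit_conform_loop(dictionaries):
    """Return CIF text for an _audit_conform loop.

    dictionaries: list of (name, version, location) tuples of strings.
    Assumption: no value contains white space, so values are written
    unquoted, one dictionary per line.
    """
    lines = [
        "loop_",
        "_audit_conform_dict_name",
        "_audit_conform_dict_version",
        "_audit_conform_dict_location",
    ]
    for name, version, location in dictionaries:
        lines.append(f"{name} {version} {location}")
    return "\n".join(lines) + "\n"
```

Calling it with, say, ("cif_core.dic", "2.3", "https://example.org/cif_core.dic")
gives a four-line loop header followed by one line of values.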
    

That sounds like a positive step, but would that go in every data_block or 
is it a global_ thing?
Since each datablock is independent, each would have its own _audit_conform items at least until such time as we develop a better linkage between datablocks.
The problem I see is that the effort invested in implementing it for all 
newly created and submitted CIFs is wasted because it is an 
incomplete solution and no current software uses it or needs it.
There are already editors/browsers that read in the dictionaries and use them to validate a CIF.  They do not yet check the _audit_conform items, so the dictionaries have to be identified to the program by the user (or the program loads all the dictionaries it can find, willy-nilly).  However, we are looking to the future, not just trying to keep up with the past.
So, to try and resolve the namespace of each name, you would need to
(1) check the _audit_conform list of dictionaries in reverse order
(2) check against the list of registered prefixes for accidental matches
(3) check all versions of all publicly accessible dictionaries
(4) then give up.
If an _audit_conform loop is present, it should list all the dictionaries that were used in writing the CIF together with their URLs, so an application should be able to download all the dictionaries it needs.  If there are data names appearing in the CIF that do not appear in these dictionaries, then the items are undefined and the user can do what seems most appropriate.  In an editor written by some of my students, items not located in the dictionary are loaded into a category called 'miscellaneous' where the user can view them and decide whether they are legitimate or the result of a syntactic error.
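
The lookup described above might be sketched roughly as follows in Python.  This is not the code of the editor mentioned; the function name and the representation of a dictionary as a set of lower-cased data names are assumptions made for illustration.

```python
def classify_data_names(cif_names, dictionaries):
    """Split data names into defined ones and a 'miscellaneous' list.

    cif_names: data names found in one data block.
    dictionaries: list of sets of lower-cased data names, one set per
        dictionary listed in the _audit_conform loop (assumed to have
        been downloaded and parsed already).
    Returns (defined, miscellaneous): defined maps each name to the index
    of the first dictionary that defines it; miscellaneous lists the
    names found in none of them, in the order they were seen.
    """
    defined = {}
    miscellaneous = []
    for name in cif_names:
        key = name.lower()  # CIF data names are case-insensitive
        for index, names in enumerate(dictionaries):
            if key in names:
                defined[name] = index
                break
        else:
            # Not defined anywhere: leave it for the user to inspect.
            miscellaneous.append(name)
    return defined, miscellaneous
```

A misspelt name such as _cell_lenght_c would end up in the miscellaneous list, where the user can decide whether it is a legitimate local item or an error.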
If it's important enough to create a name for it then isn't it important 
enough to define its purpose somewhere? Ad hoc data names seem to provide
nothing useful besides a legitimate excuse for laziness in the 
specification. There's no incentive to organize things tidily.
Maybe they were important originally when COMCIFS were exploring 
the field, before dictionaries were introduced, but is it still important 
to be able to make up arbitrary stuff and stick it in a CIF without 
definition?
Who is doing this and how are they using it?
Do they really intend to save it for posterity?
New concepts are continually being developed in crystallography and it is impractical to assign them names until it is clear that the concept has some permanence; otherwise the dictionaries quickly become filled with a legacy of discarded ideas.  Thus people are encouraged to develop software that involves ad hoc names that may later be adopted by CIF or discarded.  Yes, this does lead to potential problems in the archive, though such items can be defined in a local dictionary which is listed in the _audit_conform loop.  In practice this is not likely to be a problem because such items are not usually used in archived CIFs.  We wish to retain the flexibility of CIF to develop with the field and not make people think they have to get the permission of the Academy (COMCIFS) before they try out a new idea.
I had a hazy recollection that  "this is a string" and this_is_a_string
were equally valid CIF constructs containing identical information
content, used for example in space group names. Would they be formally
identical in an infoset? Does the white space in all strings have to be
normalised (is that the right word?)?

We had a discussion of this point while preparing the symmetry_CIF
dictionary and came to the decision that these two strings were not
equivalent, i.e., underscore is not white space.
    
Bummer. I know one program that needs changes made :-(
Because there is a legacy of underscore space group names (etc.) it is wise to be able to read them, but they should not be written.
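
A reader that follows this advice could be sketched as below.  This is only an illustration, with a hypothetical function name, and it assumes it is applied solely to items such as space group names where an underscore has no other meaning.

```python
def read_space_group_name(value):
    """Accept a legacy underscore-separated name and return it with spaces.

    Underscores are treated as separators on input only; the returned
    form, with single spaces, is the one that should be written out.
    """
    # Turn underscores into spaces, then collapse runs of white space.
    return " ".join(value.replace("_", " ").split())
```

With this, "P_21_21_21" is read as "P 21 21 21", while a name already written with spaces, such as "P 21/c", passes through unchanged.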
But perhaps I could also draw your attention to this:
      http://journals.iucr.org/services/cif/stdcodes.html#Appdx4.3
as evidence that underscores do seem to be an
officially sanctioned form of white space in uchar data types.
The instructions in this URL refer to an item in the 2.2 version of the dictionary that has now been replaced in 2.3 by three separate items that are fully enumerated.  Thus this problem is resolved in the latest dictionary version.  Tightening up the dictionaries is an ongoing process.


David
-- 
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy
McMaster University, Hamilton
Ontario, Canada
