Re: CIF Infoset
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <email@example.com>
- Subject: Re: CIF Infoset
- From: David Brown <firstname.lastname@example.org>
- Date: Tue, 07 Sep 2004 14:03:55 -0400
- In-Reply-To: <Pine.LNX.email@example.com>
- References: <Pine.LNX.firstname.lastname@example.org>
Here are a few IDB comments on the comments of DDB.
> That sounds like a positive step, but would that go in every data_block or is it a global_ thing?

Since each datablock is independent, each would have its own _audit_conform items, at least until such time as we develop a better linkage between datablocks. The core dictionary defines three items which can be looped:

    _audit_conform_dict_name
    _audit_conform_dict_version
    _audit_conform_dict_location   # Contains the URL where the dictionary can be found

As far as I know these have not been widely used - Acta Cryst. should start insisting that these be included in submitted papers. There is no need to give the dictionary version in anything as ephemeral as a comment.
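A minimal sketch of how an application might pull the _audit_conform loop out of a data block. It assumes a simplified CIF in which values contain no whitespace, quotes or comments (a real parser must handle all of these); the dictionary names, versions and URLs in the example are purely illustrative, and the function names are hypothetical.

```python
# Illustrative data block; names, versions and URLs are assumptions.
EXAMPLE_CIF = """\
data_example
loop_
_audit_conform_dict_name
_audit_conform_dict_version
_audit_conform_dict_location
cif_core.dic 2.3 ftp://example.org/cif_core.dic
cif_local.dic 1.0 http://example.org/cif_local.dic
_cell_length_a 5.43
"""


def _ends_loop_values(token):
    """True if the token is a data name or a reserved word, not a value."""
    low = token.lower()
    return low.startswith("_") or low == "loop_" or low.startswith("data_")


def audit_conform_rows(cif_text):
    """Return the first _audit_conform loop as a list of dicts, one per row.

    Simplification: tokens are split on whitespace only, so quoted
    strings, text fields and comments are NOT supported.
    """
    tokens = cif_text.split()
    i = 0
    while i < len(tokens):
        if tokens[i].lower() != "loop_":
            i += 1
            continue
        i += 1
        # Collect the data names that head the loop.
        names = []
        while i < len(tokens) and tokens[i].startswith("_"):
            names.append(tokens[i].lower())
            i += 1
        # Collect values until the next data name or reserved word.
        values = []
        while i < len(tokens) and not _ends_loop_values(tokens[i]):
            values.append(tokens[i])
            i += 1
        if names and all(n.startswith("_audit_conform_") for n in names):
            width = len(names)
            return [dict(zip(names, values[k:k + width]))
                    for k in range(0, len(values), width)]
    return []
```

Each returned row gives the name, version and location of one dictionary, which is what a program would need in order to fetch the dictionaries itself rather than asking the user.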
> The problem I see is that the effort invested in implementing it for all newly created and submitted CIFs is wasted because it is an incomplete solution and no current software uses it or needs it.

There are already editor/browsers that read in the dictionaries and use them to validate a CIF. They do not yet check the _audit_conform items, so the dictionaries have to be identified to the program by the user (or the program loads all the dictionaries it can find, willy-nilly). However, we are looking to the future, not just trying to keep up with the past.
> So, to try and resolve the namespace of each name, you would need to (1) check the _audit_conform list of dictionaries in reverse order (2) check against the list of registered prefixes for accidental matches (3) check all versions of all publicly accessible dictionaries (4) then give up.

If an _audit_conform loop is present, it should list all the dictionaries that were used in writing the CIF together with their URLs, so an application should be able to download all the dictionaries it needs. If there are data names appearing in the CIF that do not appear in these dictionaries, then the items are undefined and the user can do what seems most appropriate. In an editor written by some of my students, items not located in the dictionary are loaded into a category called 'miscellaneous' where the user can view them and decide whether they are legitimate or the result of a syntactic error.
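The lookup described above can be sketched as follows. This is not the students' editor, only an illustration under stated assumptions: the dictionaries are already downloaded and reduced to sets of lower-case data names, they are searched in the order given, and the function name and example dictionary contents are hypothetical.

```python
def classify_names(cif_names, dictionaries):
    """Split data names into those defined in a dictionary and the rest.

    dictionaries: mapping of dictionary name -> set of lower-case data
    names it defines, searched in insertion order (an assumption; the
    search order is not specified in the discussion).
    Returns (located, miscellaneous), where located maps each data name
    to the dictionary that defines it.
    """
    located = {}
    miscellaneous = []
    for name in cif_names:
        key = name.lower()  # CIF data names are case-insensitive
        for dict_name, defined in dictionaries.items():
            if key in defined:
                located[name] = dict_name
                break
        else:
            # Not defined anywhere: left for the user to inspect.
            miscellaneous.append(name)
    return located, miscellaneous


# Hypothetical dictionary contents, for illustration only.
EXAMPLE_DICTIONARIES = {
    "cif_core.dic": {"_cell_length_a", "_cell_length_b"},
    "cif_local.dic": {"_local_test_item"},
}
```

A misspelt name such as `_cel_length_b` would end up in the miscellaneous list, where the user can decide whether it is a legitimate ad hoc item or an error.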
> If it's important enough to create a name for it then isn't it important enough to define its purpose somewhere? Ad hoc data names seem to provide nothing useful besides a legitimate excuse for laziness in the specification. There's no incentive to organize things tidily. Maybe they were important originally when COMCIFS were exploring the field, before dictionaries were introduced, but is it still important to be able to make up arbitrary stuff and stick it in a CIF without definition? Who is doing this and how are they using it? Do they really intend to save it for posterity?

New concepts are continually being developed in crystallography and it is impractical to assign them names until it is clear that the concept has some permanence; otherwise the dictionaries quickly become filled with a legacy of discarded ideas. Thus people are encouraged to develop software that involves ad hoc names that may later be adopted by CIF or discarded. Yes, this does lead to potential problems in the archive, though such items can be defined in a local dictionary which is listed in the _audit_conform loop. In practice this is not likely to be a problem because such items are not usually used in archived CIFs. We wish to retain the flexibility of CIF to develop with the field and not make people think they have to get the permission of the Academy (COMCIFS) before they try out a new idea.
>>> I had a hazy recollection that "this is a string" and this_is_a_string were equally valid CIF constructs containing identical information content, used for example in space group names. Would they be formally identical in an infoset? Does the white space in all strings have to be normalised (is that the right word?)?

>> We had a discussion of this point while preparing the symmetry_CIF dictionary and came to the decision that these two strings were not equivalent, i.e., underscore is not white space.

> Bummer. I know one program that needs changes made :-(

Because there is a legacy of underscore space group names (etc.) it is wise to be able to read them, but they should not be written.
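A minimal sketch of the "read but do not write" policy, assuming it is applied only to items known to carry the legacy underscore form, such as space group names, and never to strings in general. The function name and the example symbol are illustrative.

```python
def read_space_group_name(value):
    """Accept a legacy underscore-separated name on input.

    Returns the space-separated form, which is the only form a program
    should write out. Runs of white space are collapsed to one space.
    """
    return " ".join(value.replace("_", " ").split())


LEGACY_NAME = "P_21_21_21"   # legacy form: tolerated when reading
MODERN_NAME = "P 21 21 21"   # form to write

# As strings the two are NOT equivalent: underscore is not white space.
STRINGS_DIFFER = LEGACY_NAME != MODERN_NAME
```

Keeping the conversion in a dedicated reader function leaves ordinary string comparison strict, consistent with the decision that the two forms are not equivalent.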
> But perhaps I could also draw your attention to this: http://journals.iucr.org/services/cif/stdcodes.html#Appdx4.3 as evidence that underscores do seem to be an officially sanctioned form of white space in uchar data types.

The instructions at this URL refer to an item in the 2.2 version of the dictionary that has now been replaced in 2.3 by three separate items that are fully enumerated. Thus this problem is resolved in the latest dictionary version. Tightening up the dictionaries is an ongoing process.
--
Dr. I.D.Brown, Professor Emeritus,
Department of Physics and Astronomy,
McMaster University, Hamilton Ontario, Canada