Re: Self-described CIF proposal
- Subject: Re: Self-described CIF proposal
- From: David Brown <idbrown@xxxxxxxxxxx>
- Date: Mon, 09 Jun 2008 11:24:16 -0400
- In-Reply-To: <483D8673.firstname.lastname@example.org>
- References: <483D8673.email@example.com>
Joe Krahn's request for a mechanism for defining local datanames
within the CIF itself may be easier to implement than might be
imagined, but only in the DDLm dictionaries that are now being prepared
and evaluated. DDLm is a new dictionary definition language that
tidies up a lot of the loose ends that have become apparent in the DDL1
and DDL2 dictionaries in current use. DDLm is cleaner and more
flexible, and programs designed to be used with CIF dictionaries
written in DDLm will still be able to read all the legacy CIFs.
Without changing the way we write CIFs we will be able to access the
advanced DDLm features such as methods.
One characteristic of the DDLm dictionaries is that they are likely to be smaller and more specialized, and will be assembled into a customized virtual dictionary each time a CIF is read. DDLm contains straightforward procedures for importing and merging dictionaries, and these can include private dictionaries as well as the IUCr approved dictionaries. This process is controlled by a head dictionary that contains _import statements listing the different component dictionaries that are to be imported and assembled.
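As a rough illustration of the assembly step described above, here is a minimal Python sketch. It is not DDLm software: the in-memory model (a dictionary mapping data names to definitions), the function name `assemble_virtual_dictionary`, the `library` lookup and the rule that later imports override earlier ones are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical in-memory model: each dictionary maps a data name to its definition.
Definitions = Dict[str, dict]


def assemble_virtual_dictionary(import_list: List[str],
                                library: Dict[str, Definitions]) -> Definitions:
    """Merge the component dictionaries named by a head dictionary's imports.

    Later imports override earlier ones; that precedence rule is an
    assumption of this sketch, not a statement about DDLm itself.
    """
    virtual: Definitions = {}
    for name in import_list:
        if name not in library:
            raise KeyError(f"component dictionary not found: {name}")
        virtual.update(library[name])
    return virtual


# Illustrative component dictionaries: one "approved", one private.
library = {
    "core": {"_atom_site.label": {"type": "text"}},
    "private": {"_atom_site.mass": {"type": "float", "units": "dalton"}},
}

# The import list stands in for the _import statements of a head dictionary.
virtual = assemble_virtual_dictionary(["core", "private"], library)
print(sorted(virtual))
```

The point of the sketch is only that a private dictionary enters the virtual dictionary through the same import list as the approved ones.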
For most CIFs a standard head dictionary would be retrieved from the IUCr web site (or a local directory where a copy is stored), but the head dictionary could be a custom dictionary stored on a local web site, though in this case its URI would have to be stable as long as the CIF itself remained archived or the CIF could become unreadable. Alternatively such a customized head dictionary could be included as a text item in the CIF itself.
In order to locate the correct CIF dictionary, each CIF will include either a number of _audit items identifying the head dictionary, its version and location, or an _audit text item whose value is the head dictionary itself. While most CIFs will opt for one of the IUCr templates, specialty CIFs can create their own custom virtual dictionary using the embedded head dictionary. Such a head dictionary could also include one-off definitions if desired, though such items would only apply to the CIF in which the head dictionary is embedded.
Programs that are DDLm compatible will read in a CIF, look for _audit items to locate the head dictionary, load and assemble the virtual dictionary and then interpret the CIF. It will be possible to use a dictionary written in DDLm to read in legacy CIFs, and the software to do this would be able to exploit the new features of DDLm. Existing CIFs could, of course, still be read by currently available software designed to work with DDL1 and DDL2 dictionaries. It will not, however, be possible to use existing software to read CIFs written with a DDLm dictionary. For this reason DDLm dictionaries will initially be more of a programming language than a language for writing CIFs.
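The lookup step for a DDLm-compatible program might be sketched as follows in Python. The two tag names are placeholders invented for this example (the actual _audit data names are not given here), and `fetch` is a hypothetical callable standing in for whatever retrieves a dictionary from a URI or local directory.

```python
from typing import Callable, Dict

# Placeholder tag names, invented for this sketch; not actual CIF data names.
EMBEDDED_TAG = "_audit.head_dictionary_text"
LOCATION_TAG = "_audit.head_dictionary_uri"


def locate_head_dictionary(cif_items: Dict[str, str],
                           fetch: Callable[[str], str]) -> str:
    """Return the head dictionary text for one parsed CIF data block.

    An embedded head dictionary is used if present; otherwise the
    dictionary is fetched from the location the CIF names.
    """
    embedded = cif_items.get(EMBEDDED_TAG)
    if embedded is not None:
        return embedded
    location = cif_items.get(LOCATION_TAG)
    if location is not None:
        return fetch(location)
    raise LookupError("CIF does not identify a head dictionary")


# A stand-in for a web site or local directory of dictionaries.
store = {"https://example.org/head.dic": "imports: core"}

by_reference = locate_head_dictionary(
    {LOCATION_TAG: "https://example.org/head.dic"}, store.__getitem__)
embedded = locate_head_dictionary(
    {EMBEDDED_TAG: "imports: core private"}, store.__getitem__)
```

Checking the embedded item first is a design choice of the sketch; a CIF is described above as carrying one form or the other, so the order should rarely matter.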
To come back to Joe's point, it is currently possible to include privately defined items in CIFs written with the DDL1 and DDL2 dictionaries, but it is awkward. There is nothing to stop a dictionary definition being included as a text field in a CIF (provided semicolon delimiters are not used) but there is no protocol for extracting this information and including it as part of the dictionary. There are protocols for merging DDL1- and DDL2-based dictionaries but they are external to the DDLs and the dictionaries. On the other hand DDLm expects that dictionaries will be routinely merged and the machinery to do this is built into DDLm.
We are hoping that DDLm will receive COMCIFS approval before the end of the year, along with the first dictionaries. Other dictionaries will follow and DDLm software can be brought into use as the dictionaries are approved.
Those of us who are putting together the first round of DDLm dictionaries and programs would welcome any comments.
Joe Krahn wrote:
CIF relies on dictionaries to parse data correctly. The underlying STAR format does not have a well-defined system for representing general-purpose data, and leaves these details to a higher-level specification. My proposal is to define a "self-described CIF" format. I mentioned this before, but there was not a lot of interest. I assume that this is because most CIF developers are working with standardized databases, where dealing with non-standard self-described data is difficult.

Experimentalists often need to store general-purpose data that cannot always be handled by trying to create a dictionary that covers all possible needs. In my opinion, STAR should be flexible enough to represent data in a manner similar to NetCDF.

The general syntax can be that a CIF data block can contain save-frames that represent data in the same manner as save-frames within a dictionary. Dictionary data that is not in a save-frame will have to be contained in a special save frame, which could be named "dictionary", or some form of 'un-named' tag such as a single underscore.

As a simple example of user-defined data, this could be inserted in a data block that includes a mass for each atom, but also uses the dictionary for everything else. To avoid conflicts, non-standard values used in the context of a standard dictionary could all require a "[user]" prefix.

    data_XXX
    save__atom_site.[user]mass
    _item_description.description 'Atomic mass for this atom.'
    _item_type.code float
    _item_units.code 'unified_atomic_mass'
    save_
    ...

For dictionary-oriented data, this idea can still be useful for tagging a data block with the matching dictionary, for example:

    data_XXX
    save_dictionary
    _dictionary.title mmcif_std.dic
    _dictionary.version 2.0.10
    save_
    ...

Current mmCIF files contain "_audit_conform" entries, but it seems more useful to have a general mechanism rather than identifying the dictionary within dictionary-defined fields.

Of course, this could also be done with some sort of formatted comment on the first or second line of the file. I think this should be a fairly simple extension to CIF. If CIF developers don't want to change CIF, this idea could also be implemented as an alternative STAR implementation, or it could be explicitly defined as a CIF extension rather than a change to CIF itself.

Joe Krahn
_______________________________________________
cif-developers mailing list
firstname.lastname@example.org
http://scripts.iucr.org/mailman/listinfo/cif-developers
begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:email@example.com
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard
- Self-described CIF proposal (Joe Krahn)