Discussion List Archives


Self-described CIF proposal

  • Subject: Self-described CIF proposal
  • From: Joe Krahn <krahn@xxxxxxxxxxxxx>
  • Date: Wed, 28 May 2008 12:21:07 -0400
CIF relies on dictionaries to parse data correctly. The underlying STAR
format does not have a well-defined system for representing
general-purpose data, and leaves these details to a higher-level layer
(in CIF, the dictionaries).
My proposal is to define a "self-described CIF" format. I mentioned this
before, but there was not much interest. I assume this is because most
CIF developers work with standardized databases, where dealing with
non-standard, self-described data is difficult. Experimentalists,
however, often need to store general-purpose data, and trying to create
a dictionary that covers every possible need is not always practical.
In my opinion, STAR should be flexible enough to represent data in a
manner similar to NetCDF, where each file carries the description of its
own variables.

The general syntax could allow a CIF data block to contain save frames
that define data items in the same manner as save frames within a
dictionary. Dictionary-level data that is not in a save frame would have
to be placed in a special save frame, which could be named "dictionary",
or given some form of unnamed tag such as a single underscore.
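A minimal Python sketch of this layout, assuming DDL2-style item definitions: it writes a data block whose first save frame is the special "dictionary" frame, followed by one save frame per user-defined item. The function names, the frame naming, and the simplified quoting rule are illustrative assumptions, not part of any CIF specification.

```python
# Hypothetical sketch: emit a "self-described" CIF data block in which item
# definitions live in save frames, as they would in a DDL2 dictionary.


def format_value(value: str) -> str:
    """Quote a value when an unquoted CIF reader would misread it.
    Simplified: values containing both quote characters are not handled."""
    needs_quotes = (
        value == ""
        or any(ch.isspace() for ch in value)
        or value[0] in "_#$'\"[];"
    )
    if not needs_quotes:
        return value
    quote = '"' if "'" in value else "'"
    return f"{quote}{value}{quote}"


def save_frame(name: str, items: dict[str, str]) -> list[str]:
    """Return the lines of one save frame holding aligned tag/value pairs."""
    width = max((len(tag) for tag in items), default=0)
    lines = [f"save_{name}"]
    for tag, value in items.items():
        lines.append(f"    {tag.ljust(width)}  {format_value(value)}")
    lines.append("save_")
    return lines


def self_described_block(block: str, dictionary: dict[str, str],
                         definitions: dict[str, dict[str, str]]) -> str:
    """Build a data block: the "dictionary" save frame first (assumed name),
    then one save frame per user-defined item definition."""
    lines = [f"data_{block}"]
    lines += save_frame("dictionary", dictionary)
    for frame_name, items in definitions.items():
        lines += save_frame(frame_name, items)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(self_described_block(
        "example",
        {"_dictionary.title": "mmcif_std.dic", "_dictionary.version": "2.0.10"},
        {"_[user]atom_site.mass": {  # hypothetical placement of the prefix
            "_item.name": "_[user]atom_site.mass",
            "_item_description.description": "Atomic mass for this atom.",
            "_item_type.code": "float",
            "_item_units.code": "unified_atomic_mass",
        }},
    ))
```

Naming each frame after the item it defines mirrors the DDL2 dictionary convention, so existing dictionary-reading code could, in principle, be reused on the embedded frames.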

As a simple example of user-defined data, the definition below could be
inserted in a data block that includes a mass for each atom but
otherwise uses the standard dictionary. To avoid conflicts, non-standard
items used in the context of a standard dictionary could all require a
"[user]" prefix.

    _item_description.description 'Atomic mass for this atom.'
    _item_type.code float
    _item_units.code 'unified_atomic_mass'
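Assuming the three lines above sit in a save frame for a hypothetical item `_[user]atom_site.mass`, a reader could use `_item_type.code` to convert the raw values without consulting any external dictionary. The sketch below shows only that step; the type-code mapping is a simplified assumption covering a few mmCIF codes, and `?` and `.` are treated as CIF's unknown and inapplicable markers.

```python
# Hypothetical sketch: convert raw CIF strings using an item definition that
# travels with the data block. Not a complete mmCIF type system.


def _strip_su(value: str) -> str:
    """Drop a trailing standard uncertainty, e.g. '15.999(1)' -> '15.999'."""
    return value.split("(", 1)[0]


# Simplified mapping from _item_type.code values to Python converters
# (an assumption; mmCIF defines more type codes than these).
TYPE_CONVERTERS = {
    "float": lambda v: float(_strip_su(v)),
    "int": lambda v: int(_strip_su(v)),
    "code": str,
    "text": str,
}

USER_PREFIX = "[user]"  # prefix proposed for non-standard items


def is_user_item(name: str) -> bool:
    """True if a data name carries the proposed "[user]" prefix."""
    return name.lstrip("_").startswith(USER_PREFIX)


def convert_column(raw: list[str], definition: dict[str, str]) -> list:
    """Convert one column of raw values using the definition's type code.
    Unknown type codes fall back to plain strings; '?' and '.' become None."""
    code = definition.get("_item_type.code", "text")
    convert = TYPE_CONVERTERS.get(code, str)
    return [None if v in ("?", ".") else convert(v) for v in raw]
```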

For dictionary-oriented data, this idea can still be useful for tagging
a data block with the matching dictionary, for example:

    _dictionary.title           mmcif_std.dic
    _dictionary.version         2.0.10
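A rough sketch of how a reader might pick up such a tag, assuming simple one-line `tag value` pairs (no loops, multi-line text fields, or trailing comments). It falls back to mmCIF's existing `_audit_conform.dict_name` and `_audit_conform.dict_version` items when no `_dictionary.*` tags are present; the helper names are hypothetical.

```python
# Hypothetical sketch: find the dictionary a data block declares.


def read_pairs(text: str) -> dict[str, str]:
    """Collect one-line 'tag value' pairs. Loops, multi-line text fields
    and trailing comments are deliberately not handled."""
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].startswith("_"):
            continue
        tag, value = parts[0], parts[1].strip()
        # Remove matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        pairs[tag] = value
    return pairs


def declared_dictionary(text: str):
    """Return (name, version) for the dictionary a block conforms to,
    preferring the proposed _dictionary.* tags over _audit_conform.*;
    return None if neither is present."""
    pairs = read_pairs(text)
    for name_tag, version_tag in (
        ("_dictionary.title", "_dictionary.version"),
        ("_audit_conform.dict_name", "_audit_conform.dict_version"),
    ):
        if name_tag in pairs:
            return pairs[name_tag], pairs.get(version_tag)
    return None
```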

Current mmCIF files contain "_audit_conform" entries, but a general
mechanism seems more useful than identifying the dictionary through
fields that are themselves defined by that dictionary. Of course, this
could also be done with some sort of formatted comment on the first or
second line of the file.
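The formatted-comment alternative could follow the precedent of the `#\#CIF_1.1` magic comment that CIF 1.1 recommends on a file's first line. The `#\#dictionary` header used below is purely hypothetical syntax, shown only to illustrate how cheaply a reader could check the first two lines.

```python
import re

# Hypothetical header such as "#\#dictionary mmcif_std.dic 2.0.10";
# only the "#\#" magic-comment style has precedent (e.g. "#\#CIF_1.1").
HEADER = re.compile(r"#\\#dictionary\s+(\S+)(?:\s+(\S+))?\s*$")


def dictionary_from_header(text: str):
    """Return (name, version) from a formatted comment on line 1 or 2,
    or None if neither line carries one. The version may be None."""
    for line in text.splitlines()[:2]:
        match = HEADER.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None
```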

I think this should be a fairly simple extension to CIF. If CIF
developers don't want to change CIF, this idea could also be implemented
as an alternative STAR implementation, or it could be explicitly defined
as a CIF extension rather than a change to CIF itself.

Joe Krahn
cif-developers mailing list

International Union of Crystallography
