Discussion List Archives


Re: Self-described CIF proposal

Dear Colleagues,

   We already have several mechanisms for people who need to describe
their own data -- local dictionaries, database schema, XML schema, etc.
There is nothing to stop someone from appending their local dictionaries
to each file.  One would hope that they would go to the trouble to
use a unique namespace to avoid collisions with other local dictionaries,
e.g. by registering their own prefix.
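Here is a minimal Python sketch of the prefix convention. Given the tags that a local dictionary defines and the prefix its authors registered, it lists every tag that falls outside that namespace, meaning any tag that could collide with another local dictionary. The prefix `myLab`, the sample tags and the function `unprefixed_tags` are all hypothetical. The check assumes the `_<prefix>_` placement used by registered prefixes such as `_pdbx_`. It compares case-insensitively because CIF data names are case-insensitive.

```python
def unprefixed_tags(tags, prefix):
    """Return the tags that do not begin with the '_<prefix>_' namespace.

    CIF data names are case-insensitive, so both sides are lower-cased.
    """
    stem = f"_{prefix}_".lower()
    return [tag for tag in tags if not tag.lower().startswith(stem)]


# Hypothetical local dictionary using the (hypothetical) prefix "myLab".
local_tags = ["_myLab_sample.colour", "_mylab_sample.batch", "_sample.batch"]
print(unprefixed_tags(local_tags, "myLab"))  # ['_sample.batch']
```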

   That being said, nothing is gained and much is lost when the same
definitions are given different names in different datasets.  That makes
it much harder to do data mining and discover common intellectual
threads in the scientific literature and encourages wheel reinvention.
When possible, I would urge everyone to try to use existing definitions
and, when they have a truly new definition, to try to contribute it
to the appropriate dictionary.


At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>CIF relies on dictionaries to parse data correctly. The underlying STAR
>format does not have a well-defined system for representing
>general-purpose data, and leaves these details to a higher-level
>
>My proposal is to define a "self-described CIF" format. I mentioned this
>before, but there was not a lot of interest. I assume that this is
>because most CIF developers are working with standardized databases,
>where dealing with non-standard self-described data is difficult.
>
>Experimentalists often need to store general-purpose data that cannot
>always be handled by trying to create a dictionary that covers all
>possible needs. In my opinion, STAR should be flexible enough to
>represent data in a manner similar to NetCDF.
>
>The general syntax can be that a CIF data block can contain save-frames
>that represent data in the same manner as save-frames within a
>dictionary. Dictionary data that is not in a save-frame will have to be
>contained in a special save frame, which could be named "dictionary", or
>some form of 'un-named' tag such as a single underscore.
>
>As a simple example of user-defined data, this could be inserted in a data
>block that includes a mass for each atom, but also uses the dictionary
>for everything else. To avoid conflicts, non-standard values used in the
>context of a standard dictionary could all require a "[user]" prefix.
>     _item_description.description 'Atomic mass for this atom.'
>     _item_type.code float
>     _item_units.code 'unified_atomic_mass'
>     save_
>
>For dictionary-oriented data, this idea can still be useful for tagging
>a data block with the matching dictionary, for example:
>     _dictionary.title           mmcif_std.dic
>     _dictionary.version         2.0.10
>     save_
>
>Current mmCIF files contain "_audit_conform" entries, but it seems more
>useful to have a general mechanism rather than identifying the
>dictionary within dictionary-defined fields. Of course, this could also
>be done with some sort of formatted comment on the first or second line
>of the file.
>
>I think this should be a fairly simple extension to CIF. If CIF
>developers don't want to change CIF, this idea could also be implemented
>as an alternative STAR implementation, or it could be explicitly defined
>as a CIF extension rather than a change to CIF itself.
>Joe Krahn
>cif-developers mailing list
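Here is a minimal Python sketch of how a reader might use the proposed mechanism. It does two things:

* It uses definitions carried in a data block's own save frames to type a user-defined value.
* If a "dictionary" save frame is present, it uses that frame to identify the governing dictionary. Otherwise it falls back to `_audit_conform.dict_name`.

Everything here is an assumption for illustration. The frame names `save_[user]atom_site.mass` and `save_dictionary`, the `_item.name` line, and the helpers `parse_block`, `coerce` and `conforming_dictionary` are all hypothetical. The toy parser handles only one tag-value pair per line. It does not handle loops or multi-line text fields, and it uses shell-style rather than CIF quoting. A real reader would use a full CIF parser.

```python
"""Sketch only: read definitions embedded in a data block's save frames."""
import shlex

# Hypothetical data block following the "[user]" prefix idea.
BLOCK = """\
data_example
save_[user]atom_site.mass
    _item.name                    '_[user]atom_site.mass'
    _item_description.description 'Atomic mass for this atom.'
    _item_type.code               float
    _item_units.code              'unified_atomic_mass'
save_
save_dictionary
    _dictionary.title   mmcif_std.dic
    _dictionary.version 2.0.10
save_
_[user]atom_site.mass 12.011
"""

TYPES = {"float": float, "int": int}  # small subset of DDL2 type codes


def parse_block(text):
    """Split a data block into top-level items and named save frames."""
    items, frames, current = {}, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line '#' comments and the data_ header.
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line == "save_":               # a bare save_ closes the open frame
            current = None
        elif line.startswith("save_"):    # save_NAME opens a new frame
            current = frames.setdefault(line[len("save_"):], {})
        else:
            tag, value = shlex.split(line)  # shell-style quoting, not CIF's
            (items if current is None else current)[tag] = value
    return items, frames


def coerce(tag, value, frames):
    """Convert value using the embedded frame whose _item.name is tag."""
    for frame in frames.values():
        if frame.get("_item.name") == tag:
            return TYPES.get(frame.get("_item_type.code"), str)(value)
    return value  # no embedded definition: keep the raw string


def conforming_dictionary(items, frames):
    """Prefer a 'dictionary' save frame; fall back to _audit_conform."""
    frame = frames.get("dictionary", {})
    if "_dictionary.title" in frame:
        return frame["_dictionary.title"], frame.get("_dictionary.version")
    return (items.get("_audit_conform.dict_name"),
            items.get("_audit_conform.dict_version"))


items, frames = parse_block(BLOCK)
mass = coerce("_[user]atom_site.mass", items["_[user]atom_site.mass"], frames)
print(mass, type(mass).__name__)           # 12.011 float
print(conforming_dictionary(items, frames))  # ('mmcif_std.dic', '2.0.10')
```

Values with no embedded definition are left as raw strings. This keeps the sketch usable on files that rely entirely on an external dictionary.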

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

International Union of Crystallography
