Re: Self-described CIF proposal

Yes, CIF dictionaries are layered, so you can have a local
dictionary that just adds the few terms that you need, but
because they are layered, it is important to avoid namespace
conflicts and use unique prefixes for local dictionaries.
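
To illustrate the layering point, here is a minimal Python sketch, not an
actual CIF dictionary API. It treats each dictionary as a plain mapping from
data name to definition, merges the layers in order, and reports any data
name defined in more than one layer. The function name layer_dictionaries,
the "xyzlab_" prefix, and the sample definitions are all hypothetical.

```python
from typing import Dict, List, Tuple

Definitions = Dict[str, dict]


def layer_dictionaries(
    layers: List[Tuple[str, Definitions]]
) -> Tuple[Definitions, List[str]]:
    """Merge dictionaries in order; later layers override earlier ones.

    Returns the merged definitions plus a list describing every data name
    that appears in more than one layer (a potential namespace conflict).
    """
    merged: Definitions = {}
    owner: Dict[str, str] = {}
    conflicts: List[str] = []
    for layer_name, definitions in layers:
        for data_name, definition in definitions.items():
            key = data_name.lower()  # CIF data names are case-insensitive
            if key in owner:
                conflicts.append(f"{key}: {owner[key]} vs {layer_name}")
            merged[key] = definition
            owner[key] = layer_name
    return merged, conflicts


# Hypothetical stand-in for a standard dictionary (heavily abbreviated).
STANDARD: Definitions = {
    "_atom_site.label_atom_id": {"type": "char"},
    "_atom_site.B_iso_or_equiv": {"type": "float"},
}

# Hypothetical local dictionary using an assumed unique prefix, "xyzlab_".
LOCAL: Definitions = {
    "_atom_site.xyzlab_dist_trp217": {"type": "float"},
    "_atom_site.xyzlab_waters_within_3A": {"type": "int"},
}

if __name__ == "__main__":
    merged, conflicts = layer_dictionaries(
        [("standard", STANDARD), ("local", LOCAL)]
    )
    print(sorted(merged))
    print(conflicts)
```

With a unique prefix on every local name, the conflict list stays empty. A
local dictionary that reused a standard name would override the standard
definition and be reported.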

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Thu, 29 May 2008, Joe Krahn wrote:

> Often it is useful to write out data that does not make sense to
> standardize, and is not intended for a database. For example, maybe I
> want to calculate and write out a derived value for each atom, like
> "distance from Tryptophan 217", or "number of waters within 3.0A". I
> suppose one could use a dictionary with a few generic values like
> "user1", "user2" for temporary working values.
>
> Alternatively, is it possible to write a dictionary that describes just
> the additional entries and inherit the standard dictionary for
> everything else?
>
> Joe Krahn
>
> Herbert J. Bernstein wrote:
>> Dear Colleagues,
>>
>>    We already have several mechanisms for people who need to describe
>> their own data -- local dictionaries, database schema, XML schema, etc.
>> There is nothing to stop someone from appending their local dictionaries
>> to each file.  One would hope that they would go to the trouble to
>> use a unique namespace to avoid collisions with other local dictionaries,
>> e.g. by registering their own prefix.
>>
>>    That being said, nothing is gained and much is lost when the same
>> definitions are given different names in different datasets.  That makes
>> it much harder to do data mining and discover common intellectual
>> threads in the scientific literature and encourages wheel reinvention.
>> When possible, I would urge everyone to try to use existing definitions
>> and when they have a truly new definition to try to contribute it
>> to the appropriate dictionary.
>>
>>    Regards,
>>      Herbert
>>
>>
>> At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>>> CIF relies on dictionaries to parse data correctly. The underlying STAR
>>> format does not have a well-defined system for representing
>>> general-purpose data, and leaves these details to a higher-level
>>> specification.
>>>
>>> My proposal is to define a "self-described CIF" format. I mentioned this
>>> before, but there was not a lot of interest. I assume that this is
>>> because most CIF developers are working with standardized databases,
>>> where dealing with non-standard self-described data is difficult.
>>> Experimentalists often need to store general-purpose data that cannot
>>> always be handled by trying to create a dictionary that covers all
>>> possible needs. In my opinion, STAR should be flexible enough to
>>> represent data in a manner similar to NetCDF.
>>>
>>> The general syntax can be that a CIF data block can contain save-frames
>>> that represent data in the same manner as save-frames within a
>>> dictionary. Dictionary data that is not in a save-frame will have to be
>>> contained in a special save frame, which could be named "dictionary", or
>>> some form of 'un-named' tag such as a single underscore.
>>>
>>> As a simple example of user-defined data, this could be inserted in a data
>>> block that includes a mass for each atom, but also uses the dictionary
>>> for everything else. To avoid conflicts, non-standard values used in the
>>> context of a standard dictionary could all require a "[user]" prefix.
>>>
>>> data_XXX
>>> save__atom_site.[user]mass
>>>     _item_description.description 'Atomic mass for this atom.'
>>>     _item_type.code float
>>>     _item_units.code 'unified_atomic_mass'
>>>     save_
>>> ...
>>>
>>>
>>> For dictionary-oriented data, this idea can still be useful for tagging
>>> a data block with the matching dictionary, for example:
>>>
>>> data_XXX
>>> save_dictionary
>>>     _dictionary.title           mmcif_std.dic
>>>     _dictionary.version         2.0.10
>>>     save_
>>> ...
>>>
>>> Current mmCIF files contain "_audit_conform" entries, but it seems more
>>> useful to have a general mechanism rather than identifying the
>>> dictionary within dictionary-defined fields. Of course, this could also
>>> be done with some sort of formatted comment on the first or second line
>>> of the file.
>>>
>>> I think this should be a fairly simple extension to CIF. If CIF
>>> developers don't want to change CIF, this idea could also be implemented
>>> as an alternative STAR implementation, or it could be explicitly defined
>>> as a CIF extension rather than a change to CIF itself.
>>>
>>> Joe Krahn
>>> _______________________________________________
>>> cif-developers mailing list
>>> cif-developers@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/cif-developers
