Often it is useful to write out data that does not make sense to
standardize, and is not intended for a database. For example, maybe I
want to calculate and write out a derived value for each atom, like
"distance from Tryptophan 217", or "number of waters within 3.0 Å". I
suppose one could use a dictionary with a few generic values like
"user1", "user2" for temporary working values.
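
As a rough illustration of the "user1" idea, here is a minimal Python sketch
that counts waters within 3.0 A of each atom and writes the result as an extra
column in a CIF-style loop. The tag "_atom_site.user1", the helper names, and
the toy coordinates are all assumptions made up for this example; none of them
come from a standard dictionary.

```python
import math

def count_waters_within(atom_xyz, water_xyz, cutoff=3.0):
    """Count water positions within `cutoff` angstroms of one atom."""
    return sum(1 for w in water_xyz if math.dist(atom_xyz, w) <= cutoff)

def write_user_loop(atoms, waters, tag="_atom_site.user1"):
    """Return a CIF-style loop with one hypothetical user column.

    `atoms` is a list of (atom_id, (x, y, z)) tuples.
    """
    lines = ["loop_", "_atom_site.id", tag]
    for atom_id, xyz in atoms:
        lines.append(f"{atom_id} {count_waters_within(xyz, waters)}")
    return "\n".join(lines)

# Toy data, invented for the example.
atoms = [("1", (0.0, 0.0, 0.0)), ("2", (10.0, 0.0, 0.0))]
waters = [(1.0, 1.0, 1.0), (2.9, 0.0, 0.0), (12.5, 0.0, 0.0)]
print(write_user_loop(atoms, waters))
```

A reader without a dictionary entry for "user1" would still have to guess the
type and meaning of the column, which is the gap the question below is about.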

Alternatively, is it possible to write a dictionary that describes just
the additional entries and inherit the standard dictionary for
everything else?
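
To make the question concrete, here is one way the lookup could behave on the
software side, sketched in Python. This is only an assumption about how a
reader might layer a small local dictionary over the standard one, not a
description of any existing CIF mechanism. The two dict literals are stand-ins
for parsed dictionaries, and the "[user]mass" entry is borrowed from the
example in my earlier message quoted below.

```python
from collections import ChainMap

# Stand-in for definitions loaded from the standard dictionary.
standard_defs = {
    "_atom_site.id": {"type": "code"},
    "_atom_site.Cartn_x": {"type": "float", "units": "angstroms"},
}

# Local dictionary describing only the additional entries.
local_defs = {
    "_atom_site.[user]mass": {"type": "float", "units": "unified_atomic_mass"},
}

# Standard definitions are searched first, so a local dictionary can add
# tags but cannot silently redefine a standard one.
combined = ChainMap(standard_defs, local_defs)

def lookup(tag):
    """Return the definition for `tag`, or None if neither dictionary has it."""
    return combined.get(tag)

print(lookup("_atom_site.[user]mass"))
print(lookup("_atom_site.id"))
```

Putting the standard dictionary first is a design choice for this sketch;
swapping the order would let local definitions override standard ones instead.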

Joe Krahn

Herbert J. Bernstein wrote:
> Dear Colleagues,
> 
>    We already have several mechanisms for people who need to describe
> their own data -- local dictionaries, database schema, XML schema, etc.
> There is nothing to stop someone from appending their local dictionaries
> to each file.  One would hope that they would go to the trouble to
> use a unique namespace to avoid collisions with other local dictionaries,
> e.g. by registering their own prefix.
> 
>    That being said, nothing is gained and much is lost when the same
> definitions are given different names in different datasets.  That makes
> it much harder to do data mining and discover common intellectual
> threads in the scientific literature and encourages wheel reinvention.
> When possible, I would urge everyone to try to use existing definitions
> and when they have a truly new definition to try to contribute it
> to the appropriate dictionary.
> 
>    Regards,
>      Herbert
> 
> 
> At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>> CIF relies on dictionaries to parse data correctly. The underlying STAR
>> format does not have a well-defined system for representing
>> general-purpose data, and leaves these details to a higher-level
>> specification.
>>
>> My proposal is to define a "self-described CIF" format. I mentioned this
>> before, but there was not a lot of interest. I assume that this is
>> because most CIF developers are working with standardized databases,
>> where dealing with non-standard self-described data is difficult.
>> Experimentalists often need to store general-purpose data that cannot
>> always be handled by trying to create a dictionary that covers all
>> possible needs. In my opinion, STAR should be flexible enough to
>> represent data in a manner similar to NetCDF.
>>
>> The general syntax can be that a CIF data block can contain save-frames
>> that represent data in the same manner as save-frames within a
>> dictionary. Dictionary data that is not in a save-frame will have to be
>> contained in a special save-frame, which could be named "dictionary", or
>> given some form of 'un-named' tag such as a single underscore.
>>
>> As a simple example of user-defined data, this could be inserted in a data
>> block that includes a mass for each atom, but also uses the dictionary
>> for everything else. To avoid conflicts, non-standard values used in the
>> context of a standard dictionary could all require a "[user]" prefix.
>>
>> data_XXX
>> save__atom_site.[user]mass
>>     _item_description.description 'Atomic mass for this atom.'
>>     _item_type.code float
>>     _item_units.code 'unified_atomic_mass'
>>     save_
>> ...
>>
>>
>> For dictionary-oriented data, this idea can still be useful for tagging
>> a data block with the matching dictionary, for example:
>>
>> data_XXX
>> save_dictionary
>>     _dictionary.title           mmcif_std.dic
>>     _dictionary.version         2.0.10
>>     save_
>> ...
>>
>> Current mmCIF files contain "_audit_conform" entries, but it seems more
>> useful to have a general mechanism rather than identifying the
>> dictionary within dictionary-defined fields. Of course, this could also
>> be done with some sort of formatted comment on the first or second line
>> of the file.
>>
>> I think this should be a fairly simple extension to CIF. If CIF
>> developers don't want to change CIF, this idea could also be implemented
>> as an alternative STAR implementation, or it could be explicitly defined
>> as a CIF extension rather than a change to CIF itself.
>>
>> Joe Krahn
>> _______________________________________________
>> cif-developers mailing list
>> cif-developers@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/cif-developers
> 
> 

