Re: Self-described CIF proposal
- Subject: Re: Self-described CIF proposal
- From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 28 May 2008 12:37:00 -0400
- In-Reply-To: <483D8673.7090609@niehs.nih.gov>
- References: <483D8673.7090609@niehs.nih.gov>
Dear Colleagues,

  We already have several mechanisms for people who need to describe
their own data -- local dictionaries, database schema, XML schema, etc.
There is nothing to stop someone from appending their local dictionaries
to each file. One would hope that they would go to the trouble to use a
unique namespace to avoid collisions with other local dictionaries, e.g.
by registering their own prefix.

  That being said, nothing is gained and much is lost when the same
definitions are given different names in different datasets. That makes
it much harder to do data mining and discover common intellectual
threads in the scientific literature and encourages wheel reinvention.
When possible, I would urge everyone to try to use existing definitions
and when they have a truly new definition to try to contribute it to the
appropriate dictionary.

  Regards,
    Herbert

At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>CIF relies on dictionaries to parse data correctly. The underlying STAR
>format does not have a well-defined system for representing
>general-purpose data, and leaves these details to a higher-level
>specification.
>
>My proposal is to define a "self-described CIF" format. I mentioned this
>before, but there was not a lot of interest. I assume that this is
>because most CIF developers are working with standardized databases,
>where dealing with non-standard self-described data is difficult.
>Experimentalists often need to store general-purpose data that cannot
>always be handled by trying to create a dictionary that covers all
>possible needs. In my opinion, STAR should be flexible enough to
>represent data in a manner similar to NetCDF.
>
>The general syntax can be that a CIF data block can contain save-frames
>that represent data in the same manner as save-frames within a
>dictionary. Dictionary data that is not in a save-frame will have to be
>contained in a special save frame, which could be named "dictionary", or
>some form of 'un-named' tag such as a single underscore.
>
>As simple example of user-defined data, this could be inserted in a data
>block that includes a mass for each atom, but also uses the dictionary
>for everything else. To avoid conflicts, non-standard values used in the
>context of a standard dictionary could all require a "[user]" prefix.
>
>data_XXX
>save__atom_site.[user]mass
>   _item_description.description  'Atomic mass for this atom.'
>   _item_type.code                float
>   _item_units.code               'unified_atomic_mass'
>   save_
>...
>
>
>For dictionary-oriented data, this idea can still be useful for tagging
>a data block with the matching dictionary, for example:
>
>data_XXX
>save_dictionary
>   _dictionary.title    mmcif_std.dic
>   _dictionary.version  2.0.10
>   save_
>...
>
>Current mmCIF files contain "_audit_conform" entries, but it seems more
>useful to have a general mechanism rather than identifying the
>dictionary within dictionary-defined fields. Of course, this could also
>be done with some sort of formatted comment on the first or second line
>of the file.
>
>I think this should be a fairly simple extension to CIF. If CIF
>developers don't want to change CIF, this idea could also be implemented
>as an alternative STAR implementation, or it could be explicitly defined
>as a CIF extension rather than a change to CIF itself.
>
>Joe Krahn
>_______________________________________________
>cif-developers mailing list
>cif-developers@iucr.org
>http://scripts.iucr.org/mailman/listinfo/cif-developers

--
=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================
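
To make the quoted save-frame example concrete, here is a minimal sketch in Python of how a reader might pull the in-file definitions out of a data block and pick out the ones carrying the proposed "[user]" prefix. It is only an illustration under stated assumptions: the function names `read_save_frames` and `user_defined_items` are hypothetical, the tokenizer borrows shell-style quoting from `shlex` rather than the real CIF/STAR quoting rules, and multi-line values, loops, and comments are not handled.

```python
import shlex

# The user-defined item from the quoted proposal, as plain text.
EXAMPLE = """\
data_XXX
save__atom_site.[user]mass
  _item_description.description 'Atomic mass for this atom.'
  _item_type.code float
  _item_units.code 'unified_atomic_mass'
  save_
"""


def read_save_frames(text):
    """Return {frame_name: {tag: value}} for single-line tag/value pairs.

    Assumption: shell-style quoting (shlex) is close enough for this
    example; real CIF quoting rules differ and need a proper parser.
    """
    frames = {}
    current = None  # dict of the save frame being read, or None outside one
    for raw in text.splitlines():
        tokens = shlex.split(raw)
        if not tokens:
            continue
        head = tokens[0]
        if head.lower() == "save_":
            # A bare "save_" closes the current frame.
            current = None
        elif head.lower().startswith("save_"):
            # "save_<name>" opens a frame; the name is what follows "save_".
            current = frames.setdefault(head[len("save_"):], {})
        elif current is not None and head.startswith("_") and len(tokens) == 2:
            current[head] = tokens[1]
    return frames


def user_defined_items(frames, prefix="[user]"):
    """Select frames whose name carries the proposed user prefix."""
    return {name: items for name, items in frames.items() if prefix in name}


if __name__ == "__main__":
    for name, items in user_defined_items(read_save_frames(EXAMPLE)).items():
        print(name)
        for tag, value in items.items():
            print("   ", tag, "=", value)
```

A reader built this way would treat any frame without the prefix as a reference to the standard dictionary, which is where the namespace concern raised in the reply comes in: two files that both define `[user]mass` differently would be indistinguishable without a registered prefix.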