Re: Self-described CIF proposal
- Subject: Re: Self-described CIF proposal
- From: Joe Krahn <krahn@xxxxxxxxxxxxx>
- Date: Thu, 29 May 2008 19:02:12 -0400
- In-Reply-To: <a06240800c46338bc064b@[192.168.2.104]>
- References: <483D8673.7090609@niehs.nih.gov><a06240800c46338bc064b@[192.168.2.104]>
Often it is useful to write out data that does not make sense to
standardize, and is not intended for a database. For example, maybe I
want to calculate and write out a derived value for each atom, like
"distance from Tryptophan 217", or "number of waters within 3.0A".

I suppose one could use a dictionary with a few generic values like
"user1", "user2" for temporary working values. Alternatively, is it
possible to write a dictionary that describes just the additional
entries and inherit the standard dictionary for everything else?

Joe Krahn

Herbert J. Bernstein wrote:
> Dear Colleagues,
>
> We already have several mechanisms for people who need to describe
> their own data -- local dictionaries, database schema, XML schema, etc.
> There is nothing to stop someone from appending their local dictionaries
> to each file. One would hope that they would go to the trouble to
> use a unique namespace to avoid collisions with other local dictionaries,
> e.g. by registering their own prefix.
>
> That being said, nothing is gained and much is lost when the same
> definitions are given different names in different datasets. That makes
> it much harder to do data mining and discover common intellectual
> threads in the scientific literature and encourages wheel reinvention.
> When possible, I would urge everyone to try to use existing definitions
> and when they have a truly new definition to try to contribute it
> to the appropriate dictionary.
>
> Regards,
> Herbert
>
>
> At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>> CIF relies on dictionaries to parse data correctly. The underlying STAR
>> format does not have a well-defined system for representing
>> general-purpose data, and leaves these details to a higher-level
>> specification.
>>
>> My proposal is to define a "self-described CIF" format. I mentioned this
>> before, but there was not a lot of interest. I assume that this is
>> because most CIF developers are working with standardized databases,
>> where dealing with non-standard self-described data is difficult.
>> Experimentalists often need to store general-purpose data that cannot
>> always be handled by trying to create a dictionary that covers all
>> possible needs. In my opinion, STAR should be flexible enough to
>> represent data in a manner similar to NetCDF.
>>
>> The general syntax can be that a CIF data block can contain save-frames
>> that represent data in the same manner as save-frames within a
>> dictionary. Dictionary data that is not in a save-frame will have to be
>> contained in a special save frame, which could be named "dictionary", or
>> some form of 'un-named' tag such as a single underscore.
>>
>> As a simple example of user-defined data, this could be inserted in a data
>> block that includes a mass for each atom, but also uses the dictionary
>> for everything else. To avoid conflicts, non-standard values used in the
>> context of a standard dictionary could all require a "[user]" prefix.
>>
>> data_XXX
>> save__atom_site.[user]mass
>>   _item_description.description 'Atomic mass for this atom.'
>>   _item_type.code float
>>   _item_units.code 'unified_atomic_mass'
>> save_
>> ...
>>
>>
>> For dictionary-oriented data, this idea can still be useful for tagging
>> a data block with the matching dictionary, for example:
>>
>> data_XXX
>> save_dictionary
>>   _dictionary.title mmcif_std.dic
>>   _dictionary.version 2.0.10
>> save_
>> ...
>>
>> Current mmCIF files contain "_audit_conform" entries, but it seems more
>> useful to have a general mechanism rather than identifying the
>> dictionary within dictionary-defined fields. Of course, this could also
>> be done with some sort of formatted comment on the first or second line
>> of the file.
>>
>> I think this should be a fairly simple extension to CIF. If CIF
>> developers don't want to change CIF, this idea could also be implemented
>> as an alternative STAR implementation, or it could be explicitly defined
>> as a CIF extension rather than a change to CIF itself.
>>
>> Joe Krahn
>> _______________________________________________
>> cif-developers mailing list
>> cif-developers@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/cif-developers
>
>
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers
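
As a rough illustration of the save-frame idea in the quoted proposal, the Python sketch below collects the tag/value pairs found inside each save frame of a data block. It is only a sketch under stated assumptions: the function name parse_save_frames and the EXAMPLE text are hypothetical, and it handles single-line values with simple quoting only, not full CIF or STAR syntax (no loops, no multi-line text fields).

```python
def parse_save_frames(text):
    """Return {frame_name: {tag: value}} for each save frame in the text.

    Simplified on purpose: one tag/value pair per line, and quotes are
    stripped only when they wrap the whole value.
    """
    frames = {}
    current = None  # name of the save frame being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower() == "save_":
            current = None  # a bare "save_" closes the frame
        elif line.lower().startswith("save_"):
            current = line[len("save_"):]  # text after "save_" is the frame name
            frames[current] = {}
        elif current is not None and line.startswith("_"):
            parts = line.split(None, 1)
            tag = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            frames[current][tag] = value
    return frames


# Hypothetical input combining the two examples from the proposal.
EXAMPLE = """\
data_XXX
save__atom_site.[user]mass
  _item_description.description 'Atomic mass for this atom.'
  _item_type.code float
  _item_units.code 'unified_atomic_mass'
save_
save_dictionary
  _dictionary.title mmcif_std.dic
  _dictionary.version 2.0.10
save_
"""

frames = parse_save_frames(EXAMPLE)
for name, items in frames.items():
    print(name, items)
```

A reader that finds a "dictionary" frame could use its title and version to pick the matching dictionary, and treat the remaining frames as local definitions layered on top of it.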