Re: Self-described CIF proposal
- Subject: Re: Self-described CIF proposal
- From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 29 May 2008 20:30:43 -0400 (EDT)
- In-Reply-To: <483F35F4.6010808@niehs.nih.gov>
- References: <483D8673.7090609@niehs.nih.gov><a06240800c46338bc064b@[192.168.2.104]><483F35F4.6010808@niehs.nih.gov>
Yes, CIF dictionaries are layered, so you can have a local dictionary
that just adds the few terms that you need, but because they are
layered, it is important to avoid namespace conflicts and use unique
prefixes for local dictionaries.

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Thu, 29 May 2008, Joe Krahn wrote:

> Often it is useful to write out data that does not make sense to
> standardize, and is not intended for a database. For example, maybe I
> want to calculate and write out a derived value for each atom, like
> "distance from Tryptophan 217", or "number of waters within 3.0A". I
> suppose one could use a dictionary with a few generic values like
> "user1", "user2" for temporary working values.
>
> Alternatively, is it possible to write a dictionary that describes just
> the additional entries and inherit the standard dictionary for
> everything else?
>
> Joe Krahn
>
> Herbert J. Bernstein wrote:
>> Dear Colleagues,
>>
>> We already have several mechanisms for people who need to describe
>> their own data -- local dictionaries, database schema, XML schema, etc.
>> There is nothing to stop someone from appending their local dictionaries
>> to each file. One would hope that they would go to the trouble to
>> use a unique namespace to avoid collisions with other local dictionaries,
>> e.g. by registering their own prefix.
>>
>> That being said, nothing is gained and much is lost when the same
>> definitions are given different names in different datasets. That makes
>> it much harder to do data mining and discover common intellectual
>> threads in the scientific literature and encourages wheel reinvention.
>> When possible, I would urge everyone to try to use existing definitions
>> and when they have a truly new definition to try to contribute it
>> to the appropriate dictionary.
>>
>> Regards,
>> Herbert
>>
>>
>> At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>>> CIF relies on dictionaries to parse data correctly. The underlying STAR
>>> format does not have a well-defined system for representing
>>> general-purpose data, and leaves these details to a higher-level
>>> specification.
>>>
>>> My proposal is to define a "self-described CIF" format. I mentioned this
>>> before, but there was not a lot of interest. I assume that this is
>>> because most CIF developers are working with standardized databases,
>>> where dealing with non-standard self-described data is difficult.
>>> Experimentalists often need to store general-purpose data that cannot
>>> always be handled by trying to create a dictionary that covers all
>>> possible needs. In my opinion, STAR should be flexible enough to
>>> represent data in a manner similar to NetCDF.
>>>
>>> The general syntax can be that a CIF data block can contain save-frames
>>> that represent data in the same manner as save-frames within a
>>> dictionary. Dictionary data that is not in a save-frame will have to be
>>> contained in a special save frame, which could be named "dictionary", or
>>> some form of 'un-named' tag such as a single underscore.
>>>
>>> As simple example of user-defined data, this could be inserted in a data
>>> block that includes a mass for each atom, but also uses the dictionary
>>> for everything else. To avoid conflicts, non-standard values used in the
>>> context of a standard dictionary could all require a "[user]" prefix.
>>>
>>> data_XXX
>>> save__atom_site.[user]mass
>>>   _item_description.description 'Atomic mass for this atom.'
>>>   _item_type.code float
>>>   _item_units.code 'unified_atomic_mass'
>>> save_
>>> ...
>>>
>>>
>>> For dictionary-oriented data, this idea can still be useful for tagging
>>> a data block with the matching dictionary, for example:
>>>
>>> data_XXX
>>> save_dictionary
>>>   _dictionary.title mmcif_std.dic
>>>   _dictionary.version 2.0.10
>>> save_
>>> ...
>>>
>>> Current mmCIF files contain "_audit_conform" entries, but it seems more
>>> useful to have a general mechanism rather than identifying the
>>> dictionary within dictionary-defined fields. Of course, this could also
>>> be done with some sort of formatted comment on the first or second line
>>> of the file.
>>>
>>> I think this should be a fairly simple extension to CIF. If CIF
>>> developers don't want to change CIF, this idea could also be implemented
>>> as an alternative STAR implementation, or it could be explicitly defined
>>> as a CIF extension rather than a change to CIF itself.
>>>
>>> Joe Krahn

_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers
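
The layering described in the reply at the top of this message (a small local dictionary stacked on a standard one, with a unique prefix to avoid collisions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the "xyz_" prefix, and the sample tags are hypothetical, each dictionary is modeled as a plain Python mapping from tag to type code, and no real CIF library or actual dictionary contents are used.

```python
# Hypothetical sketch of layering a local CIF dictionary over a base one.
# Dictionaries are modeled as plain {tag: type_code} mappings; this is not
# the API of any real CIF toolkit.

def find_conflicts(base, local):
    """Return the tags that are defined in both layers."""
    return sorted(set(base) & set(local))


def unprefixed(local, prefix):
    """Return local tags whose item name lacks the local prefix."""
    bad = []
    for tag in local:
        # A DDL2-style tag has the form _category.item; check the item part.
        _, _, item = tag.partition(".")
        if not item.startswith(prefix):
            bad.append(tag)
    return sorted(bad)


def layer(base, local, prefix):
    """Merge local definitions over base, refusing any name collision."""
    conflicts = find_conflicts(base, local)
    if conflicts:
        raise ValueError(f"local tags redefine base tags: {conflicts}")
    missing = unprefixed(local, prefix)
    if missing:
        raise ValueError(f"local tags missing prefix {prefix!r}: {missing}")
    merged = dict(base)
    merged.update(local)
    return merged


# Illustrative data only: a two-tag stand-in for a standard dictionary
# and a one-tag local dictionary using the made-up prefix "xyz_".
base = {"_atom_site.id": "code", "_atom_site.type_symbol": "code"}
local = {"_atom_site.xyz_mass": "float"}
merged = layer(base, local, "xyz_")
```

In this sketch the local layer only ever adds tags. A local definition that reuses a standard name, or that omits the prefix, is rejected rather than silently overriding the base layer, which is one way to read the advice about avoiding namespace conflicts.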