Re: Self-described CIF proposal
- Subject: Re: Self-described CIF proposal
- From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 29 May 2008 20:30:43 -0400 (EDT)
- In-Reply-To: <[email protected]>
- References: <[email protected]><a06240800c46338bc064b@[192.168.2.104]><[email protected]>
Yes, CIF dictionaries are layered, so you can have a local
dictionary that adds just the few terms that you need. Because
they are layered, though, it is important to avoid namespace
conflicts by using a unique prefix for each local dictionary.
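
As a rough illustration of the layering described above, here is a minimal
Python sketch. It is not how any particular CIF software works. It assumes a
dictionary can be modelled as a plain mapping from data name to definition
text, that data names compare case-insensitively, and that a local prefix
appears at the start of the category or item part of the name. The prefix
"mylab", the tag names, and the function names are all hypothetical.

```python
def has_prefix(tag, prefix):
    """True if the category or item part of the tag starts with 'prefix_'."""
    parts = tag.lower().lstrip("_").split(".")
    return any(part.startswith(prefix.lower() + "_") for part in parts)


def layer_dictionaries(base, local, prefix):
    """Return the base definitions extended by the local ones.

    Raises ValueError if a local tag collides with an existing tag
    or does not carry the local prefix (an assumed convention here).
    """
    merged = {tag.lower(): text for tag, text in base.items()}
    for tag, text in local.items():
        key = tag.lower()
        if key in merged:
            raise ValueError(f"{tag} collides with an existing definition")
        if not has_prefix(key, prefix):
            raise ValueError(f"{tag} lacks the local prefix {prefix!r}")
        merged[key] = text
    return merged


# Hypothetical base and local dictionaries, one definition each.
base = {"_atom_site.label_atom_id": "Atom identifier."}
local = {"_atom_site.mylab_mass": "Atomic mass for this atom."}
merged = layer_dictionaries(base, local, "mylab")
```

Rejecting unprefixed local tags up front means a later release of the base
dictionary cannot silently change the meaning of a local item.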
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Thu, 29 May 2008, Joe Krahn wrote:
> Often it is useful to write out data that does not make sense to
> standardize, and is not intended for a database. For example, maybe I
> want to calculate and write out a derived value for each atom, like
> "distance from Tryptophan 217", or "number of waters within 3.0A". I
> suppose one could use a dictionary with a few generic values like
> "user1", "user2" for temporary working values.
>
> Alternatively, is it possible to write a dictionary that describes just
> the additional entries and inherit the standard dictionary for
> everything else?
>
> Joe Krahn
>
> Herbert J. Bernstein wrote:
>> Dear Colleagues,
>>
>> We already have several mechanisms for people who need to describe
>> their own data -- local dictionaries, database schema, XML schema, etc.
>> There is nothing to stop someone from appending their local dictionaries
>> to each file. One would hope that they would go to the trouble to
>> use a unique namespace to avoid collisions with other local dictionaries,
>> e.g. by registering their own prefix.
>>
>> That being said, nothing is gained and much is lost when the same
>> definitions are given different names in different datasets. That makes
>> it much harder to do data mining and discover common intellectual
>> threads in the scientific literature and encourages wheel reinvention.
>> When possible, I would urge everyone to try to use existing definitions
>> and when they have a truly new definition to try to contribute it
>> to the appropriate dictionary.
>>
>> Regards,
>> Herbert
>>
>>
>> At 12:21 PM -0400 5/28/08, Joe Krahn wrote:
>>> CIF relies on dictionaries to parse data correctly. The underlying STAR
>>> format does not have a well-defined system for representing
>>> general-purpose data, and leaves these details to a higher-level
>>> specification.
>>>
>>> My proposal is to define a "self-described CIF" format. I mentioned this
>>> before, but there was not a lot of interest. I assume that this is
>>> because most CIF developers are working with standardized databases,
>>> where dealing with non-standard self-described data is difficult.
>>> Experimentalists often need to store general-purpose data that cannot
>>> always be handled by trying to create a dictionary that covers all
>>> possible needs. In my opinion, STAR should be flexible enough to
>>> represent data in a manner similar to NetCDF.
>>>
>>> The general syntax can be that a CIF data block can contain save-frames
>>> that represent data in the same manner as save-frames within a
>>> dictionary. Dictionary data that is not in a save-frame will have to be
>>> contained in a special save frame, which could be named "dictionary", or
>>> some form of 'un-named' tag such as a single underscore.
>>>
>>> As a simple example of user-defined data, this could be inserted in a data
>>> block that includes a mass for each atom, but also uses the dictionary
>>> for everything else. To avoid conflicts, non-standard values used in the
>>> context of a standard dictionary could all require a "[user]" prefix.
>>>
>>> data_XXX
>>> save__atom_site.[user]mass
>>> _item_description.description 'Atomic mass for this atom.'
>>> _item_type.code float
>>> _item_units.code 'unified_atomic_mass'
>>> save_
>>> ...
>>>
>>>
>>> For dictionary-oriented data, this idea can still be useful for tagging
>>> a data block with the matching dictionary, for example:
>>>
>>> data_XXX
>>> save_dictionary
>>> _dictionary.title mmcif_std.dic
>>> _dictionary.version 2.0.10
>>> save_
>>> ...
>>>
>>> Current mmCIF files contain "_audit_conform" entries, but it seems more
>>> useful to have a general mechanism rather than identifying the
>>> dictionary within dictionary-defined fields. Of course, this could also
>>> be done with some sort of formatted comment on the first or second line
>>> of the file.
>>>
>>> I think this should be a fairly simple extension to CIF. If CIF
>>> developers don't want to change CIF, this idea could also be implemented
>>> as an alternative STAR implementation, or it could be explicitly defined
>>> as a CIF extension rather than a change to CIF itself.
>>>
>>> Joe Krahn
>>> _______________________________________________
>>> cif-developers mailing list
>>> [email protected]
>>> http://scripts.iucr.org/mailman/listinfo/cif-developers
>>
>>