Discussion List Archives

Re: Absence of _item.category_id or _item_linked.parent_name in some mmcif definitions

Hi all,

If there are no objections from the group, I propose to create a revised version of
the PDBx/mmCIF dictionary that localizes all of the content for each data item
within that item's definition section (save__item.name).   The historical style
of dictionary composition that consolidates information within parent items
has been a source of confusion and aggravation for many mmCIF adopters.
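
A minimal sketch of what such a localization pass could look like, assuming the dictionary has already been parsed into plain Python dictionaries keyed by save frame name. The `frames` structure and the function name `localize_items` are hypothetical and do not correspond to any particular CIF library; the sample rows are taken from the _atom_site.id save frame quoted below. The sketch also assumes each child item has a single parent.

```python
from copy import deepcopy

# Hypothetical in-memory form of two save frames: each frame maps a DDL2
# category (such as "item" or "item_linked") to a list of row dictionaries.
frames = {
    "_atom_site.id": {
        "item": [
            {"name": "_atom_site.id", "category_id": "atom_site",
             "mandatory_code": "yes"},
            {"name": "_atom_site_anisotrop.id",
             "category_id": "atom_site_anisotrop", "mandatory_code": "yes"},
        ],
        "item_linked": [
            {"child_name": "_atom_site_anisotrop.id",
             "parent_name": "_atom_site.id"},
        ],
    },
    "_atom_site_anisotrop.id": {
        "item": [{"name": "_atom_site_anisotrop.id", "mandatory_code": "yes"}],
    },
}


def localize_items(frames):
    """Return a copy of frames in which each item row and each parent
    link is stored in the save frame of the item it describes."""
    item_rows = {}   # item name -> merged _item row
    parent_of = {}   # child item name -> parent item name (one parent assumed)
    for frame in frames.values():
        for row in frame.get("item", []):
            item_rows.setdefault(row["name"], {}).update(row)
        for link in frame.get("item_linked", []):
            parent_of[link["child_name"]] = link["parent_name"]

    localized = deepcopy(frames)
    # Drop the consolidated link loops before re-adding links per child.
    for frame in localized.values():
        frame.pop("item_linked", None)
    for name, row in item_rows.items():
        localized.setdefault(name, {})["item"] = [dict(row)]
    for child, parent in parent_of.items():
        localized.setdefault(child, {})["item_linked"] = [
            {"child_name": child, "parent_name": parent}
        ]
    return localized


localized = localize_items(frames)
print(localized["_atom_site_anisotrop.id"])
```

After this pass, the save frame for _atom_site_anisotrop.id carries its own category and parent link, and the parent frame describes only the parent item.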

Regards,

John


On 12/21/12 7:35 AM, yayahjb wrote:
> Dear Colleagues,
>
>    This is not a matter of something being missing from the mmCIF dictionary.
> John W. tried explaining this a few years ago.  I'll give it a shot this time,
> but in view of the continuing problems in reading the DDL2 dictionaries,
> I will also suggest a change.  In this case, even though as a technical matter
> it ain't broke, as a practical human engineering matter, something is broke
> so we _should_ fix it.
>
>    In the mmcif dictionary, the definitions of individual tags are _not_
> in general completely contained in a single save frame.  For example,
> as James notes, the category and parent name for  _atom_site_anisotrop.id
> are explicitly given in loops in the _atom_site.id save frame:
>
>        loop_
>      _item.name
>      _item.category_id
>      _item.mandatory_code
>                 '_atom_site.id'                 atom_site            yes
>                 '_atom_site_anisotrop.id'       atom_site_anisotrop  yes
>                 '_geom_angle.atom_site_id_1'    geom_angle           yes
> ...
>
>      loop_
>      _item_linked.child_name
>      _item_linked.parent_name
>                 '_atom_site_anisotrop.id'       '_atom_site.id'
>                 '_geom_angle.atom_site_id_1'    '_atom_site.id'
> ...
>
> The two alternatives to this approach would be either to move the
> information from the parent save frames to the children, or to duplicate
> the information.  Duplication may help in readability, but it is a maintenance
> headache.  The DDLm solution of only requiring/allowing specification
> of parents, rather than children, would be similar to moving this linking
> information from loops in the parents into the children in DDL2.
>
>    My suggestion would be to move to the DDLm approach in the DDL2
> dictionaries, of putting the parent information with the children, rather
> than the children information with the parents in the formal dictionary,
> without duplication to minimize maintenance problems, but that Brian's
> nice formatting program be modified to gather the child information
> so that it is printed with the parent information as well as with each
> child to help people in reading these dictionaries.
>
>    I would suggest doing the same for the DDLm dictionaries:  provide
> print formatting tools to gather child information relating to a parent
> to put with the parent as a useful index.
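
A formatting tool of the kind suggested here could build the child index in a single pass over the child definitions. The sketch below assumes a hypothetical mapping from each child item name to its parent name, as would be available if the links were stored with the children; `children_by_parent` is an illustrative name and not part of any existing tool.

```python
from collections import defaultdict


def children_by_parent(parent_of):
    """Invert a child -> parent mapping into parent -> sorted children,
    suitable for printing as an index alongside each parent definition."""
    index = defaultdict(list)
    for child, parent in parent_of.items():
        index[parent].append(child)
    return {parent: sorted(children) for parent, children in index.items()}


# Links as they would appear when stored with the children.
parent_of = {
    "_atom_site_anisotrop.id": "_atom_site.id",
    "_geom_angle.atom_site_id_1": "_atom_site.id",
}

index = children_by_parent(parent_of)
for parent, children in index.items():
    print(parent, "<-", ", ".join(children))
```

Because the index is derived at print time, the dictionary itself still stores each link only once.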
>
>    Regards,
>      Herbert
>
>
>
> On 12/21/12 1:26 AM, James Hester wrote:
>> I will have a stab at this, although somebody with more experience of the mmCIF development process may wish to comment further.
>>
>> As far as I can tell, programmatically the only way to fix the 'missing parent' problem you identify is indeed to go through the
>> entire dictionary processing '_item_linked.parent_name' and '_item_linked.child_name' loops, which are usually found at the top of
>> the pointer tree (in this case in the _atom_site save frame).  This same save frame also contains a list of category ids for each
>> of the 'id' values.  My approach in PyCIFRW is to repopulate the individual definitions when ingesting the dictionary, to save
>> time later.  The PyCIFRW code and comments for this can be found at
>> https://bitbucket.org/jamesrhester/pycifrw/src/78576030f75bb4f8cb52d84a60e603815ad38afb/pycifrw/CifFile.nw?at=stable
>> starting at line 839, with lines 854-862 describing and discussing your issue.  Note also subsequent lines discussing PDBX.
>>
>> There is a school of thought that the category name is 'implicit' in a DDL2 dataname or save frame name, however IT Vol G states
>> that this is conventional rather than required so I prefer (like you it seems) never to assume this unless given no alternative.
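
The preference described above, using an explicitly declared _item.category_id whenever one is available and falling back to the naming convention only as a last resort, could be sketched as follows. The function names and the `explicit_categories` mapping are hypothetical, and the fallback relies on a convention rather than a requirement of the dictionary language.

```python
def category_from_name(item_name):
    """Guess the category from a DDL2-style data name such as
    '_atom_site.id' by taking the text between the leading underscore
    and the first period. This relies on the naming convention only."""
    if not item_name.startswith("_") or "." not in item_name:
        raise ValueError(f"not a DDL2-style data name: {item_name!r}")
    return item_name[1:].split(".", 1)[0]


def category_of(item_name, explicit_categories):
    """Prefer an explicitly declared category; use the name as a fallback."""
    explicit = explicit_categories.get(item_name)
    if explicit is not None:
        return explicit
    return category_from_name(item_name)


# Hypothetical explicit declarations gathered from the dictionary.
explicit_categories = {"_atom_site.id": "atom_site"}

print(category_of("_atom_site.id", explicit_categories))
print(category_of("_atom_site_anisotrop.id", explicit_categories))
```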
>>
>> An mmCIF/PDB person may wish to comment on the philosophical reasons behind these decisions, which I gather have something to do
>> with taking a relational database view of a CIF file.
>>
>> all the best,
>> James.
>>
>> On Thu, Dec 20, 2012 at 1:32 PM, Richard Gildea <rgildea@gmail.com> wrote:
>>
>>     Dear All,
>>
>>     Certain definitions in the mmcif dictionary (e.g.
>>     _atom_site_anisotrop.id) do not
>>     contain the items _item.category_id or _item_linked.parent_name.
>>     Without these data items, how is it possible to identify
>>     programmatically that _atom_site_anisotrop.id belongs to the
>>     _atom_site_anisotrop category and that it is a pointer to
>>     _atom_site.id (without examining every save frame)?
>>
>>     For quick reference here is the definition in question:
>>
>>     save__atom_site_anisotrop.id
>>          _item_description.description
>>     ;              This data item is a pointer to _atom_site.id in the ATOM_SITE
>>                     category.
>>     ;
>>          _item.name                    '_atom_site_anisotrop.id'
>>          _item.mandatory_code          yes
>>          _item_aliases.alias_name    '_atom_site_aniso_label'
>>          _item_aliases.dictionary      cif_core.dic
>>          _item_aliases.version         2.0.1
>>           save_
>>
>>     Cheers,
>>
>>     Richard
>>
>>     _______________________________________________
>>     cif-developers mailing list
>>     cif-developers@iucr.org
>>     http://mailman.iucr.org/mailman/listinfo/cif-developers
>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148

-- 

John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (848) 445-4290 Fax: (732) 445-4320

International Union of Crystallography
