Discussion List Archives


RE: Absence of _item.category_id or _item_linked.parent_name in some mmcif definitions

I think it is an excellent idea to revise the dictionary to consolidate the data for each item in the same frame.

John B.

-----Original Message-----
From: cif-developers-bounces@iucr.org [mailto:cif-developers-bounces@iucr.org] On Behalf Of John Westbrook
Sent: Friday, December 21, 2012 6:44 AM
To: cif-developers@iucr.org
Subject: Re: Absence of _item.category_id or _item_linked.parent_name in some mmcif definitions

Hi all,

If there are no objections from the group, I propose to create a revised version of the PDBx/mmCIF dictionary that localizes all of the content for each data item within that item's definition section (save__item.name). The historical style of dictionary composition that consolidates information within parent items has been a source of confusion and aggravation for many mmCIF adopters.
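
To illustrate what such a localization involves, here is a minimal Python sketch. It assumes the dictionary has already been parsed into plain Python dicts. The function name localize_items and the 'items'/'links' layout are hypothetical, chosen only for this example, and are not taken from any existing mmCIF tool. The sample data is the _atom_site.id excerpt quoted further down in this thread.

```python
def localize_items(frames):
    """Collect category, mandatory code, and parent for every item.

    frames maps a save-frame name to a dict with two optional keys
    (a hypothetical layout, for illustration only):
      'items': list of (item_name, category_id, mandatory_code)
      'links': list of (child_name, parent_name)
    Returns a dict mapping each item name to its localized record.
    """
    localized = {}

    def record(name):
        # Create the per-item record on first use.
        return localized.setdefault(
            name,
            {"category_id": None, "mandatory_code": None, "parent_name": None},
        )

    for frame in frames.values():
        for name, category_id, mandatory_code in frame.get("items", []):
            entry = record(name)
            # A child's own frame may omit the category, so never
            # overwrite a known value with a missing one.
            if category_id is not None:
                entry["category_id"] = category_id
            if mandatory_code is not None:
                entry["mandatory_code"] = mandatory_code
        for child_name, parent_name in frame.get("links", []):
            record(child_name)["parent_name"] = parent_name
    return localized


# Sample input mirroring the loops held in the _atom_site.id save frame.
frames = {
    "_atom_site.id": {
        "items": [
            ("_atom_site.id", "atom_site", "yes"),
            ("_atom_site_anisotrop.id", "atom_site_anisotrop", "yes"),
            ("_geom_angle.atom_site_id_1", "geom_angle", "yes"),
        ],
        "links": [
            ("_atom_site_anisotrop.id", "_atom_site.id"),
            ("_geom_angle.atom_site_id_1", "_atom_site.id"),
        ],
    },
    # The child's own frame carries no category or parent information.
    "_atom_site_anisotrop.id": {
        "items": [("_atom_site_anisotrop.id", None, "yes")],
    },
}

localized = localize_items(frames)
print(localized["_atom_site_anisotrop.id"])
```

Each record in the result could then be written back into the corresponding item's own save frame, so that a reader or program no longer has to consult the parent frame.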



On 12/21/12 7:35 AM, yayahjb wrote:
> Dear Colleagues,
>    This is not a matter of something being missing from the mmCIF dictionary.
> John W. tried explaining this a few years ago.  I'll give it a shot
> this time, but in view of the continuing problems in reading the DDL2
> dictionaries, I will also suggest a change.  In this case, even though
> as a technical matter it ain't broke, as a practical human engineering
> matter, something is broke so we _should_ fix it.
>    In the mmcif dictionary, the definitions of individual tags are
> _not_ in general completely contained in a single save frame.  For
> example, as James notes, the category and parent name for
> _atom_site_anisotrop.id are explicitly given in loops in the _atom_site.id save frame:
>        loop_
>      _item.name
>      _item.category_id
>      _item.mandatory_code
>                 '_atom_site.id'                 atom_site            yes
>                 '_atom_site_anisotrop.id'       atom_site_anisotrop  yes
>                 '_geom_angle.atom_site_id_1'    geom_angle           yes
> ...
>      loop_
>      _item_linked.child_name
>      _item_linked.parent_name
>                 '_atom_site_anisotrop.id'       '_atom_site.id'
>                 '_geom_angle.atom_site_id_1'    '_atom_site.id'
> ...
> The two alternatives to this approach would be either to move the
> information from the parent save frames to the children, or to
> duplicate the information.  Duplication may help in readability, but
> it is a maintenance headache.  The DDLm solution of only
> requiring/allowing specification of parents, rather than children
> would be similar to moving this linking information from loops in the parents into the children in DDL2.
>    My suggestion would be to move to the DDLm approach in the DDL2
> dictionaries, of putting the parent information with the children,
> rather than the children information with the parents in the formal
> dictionary, without duplication to minimize maintenance problems, but
> that Brian's nice formatting program be modified to gather the child
> information so that it is printed with the parent information as well
> as with each child to help people in reading these dictionaries.
>    I would suggest doing the same for the DDLm dictionaries:  provide
> print formatting tools to gather child information relating to a
> parent to put with the parent as a useful index.
>    Regards,
>      Herbert
> On 12/21/12 1:26 AM, James Hester wrote:
>> I will have a stab at this, although somebody with more experience of the mmCIF development process may wish to comment further.
>> As far as I can tell, programmatically the only way to fix the
>> 'missing parent' problem you identify is indeed to go through the
>> entire dictionary processing '_item_linked.parent_name' and
>> '_item_linked.child_name' loops, which are usually found at the top
>> of the pointer tree (in this case in the _atom_site save frame).
>> This same save frame also contains a list of category ids for each of
>> the 'id' values.  My approach in PyCIFRW is to repopulate the
>> individual definitions when ingesting the dictionary, to save time
>> later.  The PyCIFRW code and comments for this can be found at
>> https://bitbucket.org/jamesrhester/pycifrw/src/78576030f75bb4f8cb52d84a60e603815ad38afb/pycifrw/CifFile.nw?at=stable
>> starting at line 839, with lines 854-862 describing and discussing your issue.  Note also subsequent lines discussing PDBX.
>> There is a school of thought that the category name is 'implicit' in
>> a DDL2 dataname or save frame name; however, IT Vol G states that this is conventional rather than required, so I prefer (like you, it seems) never to assume this unless given no alternative.
>> An mmCIF/PDB person may wish to comment on the philosophical reasons
>> behind these decisions, which I gather have something to do with taking a relational database view of a CIF file.
>> all the best,
>> James.
>> On Thu, Dec 20, 2012 at 1:32 PM, Richard Gildea <rgildea@gmail.com> wrote:
>>     Dear All,
>>     Certain definitions in the mmcif dictionary (e.g.
>>     _atom_site_anisotrop.id) do not
>>     contain the items _item.category_id or _item_linked.parent_name.
>>     Without these data items, how is it possible to identify
>>     programmatically that _atom_site_anisotrop.id belongs to the
>>     _atom_site_anisotrop category and that it is a pointer to
>>     _atom_site.id (without examining every save frame)?
>>     For quick reference here is the definition in question:
>>     save__atom_site_anisotrop.id
>>          _item_description.description
>>     ;              This data item is a pointer to _atom_site.id in the ATOM_SITE
>>                     category.
>>     ;
>>          _item.name                    '_atom_site_anisotrop.id'
>>          _item.mandatory_code          yes
>>          _item_aliases.alias_name      '_atom_site_aniso_label'
>>          _item_aliases.dictionary      cif_core.dic
>>          _item_aliases.version         2.0.1
>>           save_
>>     Cheers,
>>     Richard
>>     _______________________________________________
>>     cif-developers mailing list
>>     cif-developers@iucr.org
>>     http://mailman.iucr.org/mailman/listinfo/cif-developers
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148


John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (848) 445-4290 Fax: (732) 445-4320