Re: Absence of _item.category_id or _item_linked.parent_name in some mmcif definitions
- Subject: Re: Absence of _item.category_id or _item_linked.parent_name in some mmcif definitions
- From: John Westbrook <jwest@xxxxxxxxxxxxxxxx>
- Date: Fri, 21 Dec 2012 07:43:45 -0500
- In-Reply-To: <50D4579E.6010000@gmail.com>
- References: <CACCG97GtT4pdy+2NwmDrwPqCcLzhQMMqQaQuD2-uLggKDi-P-A@mail.gmail.com> <CAM+dB2dv9=xQPBok-hKEMFNVck+_0L6PK2xmc6qzJuy-uyjWYQ@mail.gmail.com> <50D4579E.6010000@gmail.com>
Hi all,

If there are no objections from the group, I propose to create a revised
version of the PDBx/mmCIF dictionary that localizes all of the content for
each data item within that item's definition section (save__item.name).

The historical style of dictionary composition that consolidates information
within parent items has been a source of confusion and aggravation for many
mmCIF adopters.

Regards,

John

On 12/21/12 7:35 AM, yayahjb wrote:
> Dear Colleagues,
>
> This is not a matter of something being missing from the mmCIF dictionary.
> John W. tried explaining this a few years ago. I'll give it a shot this time,
> but in view of the continuing problems in reading the DDL2 dictionaries,
> I will also suggest a change. In this case, even though as a technical matter
> it ain't broke, as a practical human engineering matter, something is broke
> so we _should_ fix it.
>
> In the mmcif dictionary, the definitions of individual tags are _not_
> in general completely contained in a single save frame. For example,
> as James notes, the category and parent name for _atom_site_anisotrop.id
> are explicitly given in loops in the _atom_site.id save frame:
>
>     loop_
>     _item.name
>     _item.category_id
>     _item.mandatory_code
>     '_atom_site.id'                 atom_site            yes
>     '_atom_site_anisotrop.id'       atom_site_anisotrop  yes
>     '_geom_angle.atom_site_id_1'    geom_angle           yes
>     ...
>
>     loop_
>     _item_linked.child_name
>     _item_linked.parent_name
>     '_atom_site_anisotrop.id'       '_atom_site.id'
>     '_geom_angle.atom_site_id_1'    '_atom_site.id'
>     ...
>
> The two alternatives to this approach would be either to move the
> information from the parent save frames to the children, or to duplicate
> the information. Duplication may help in readability, but it is a maintenance
> headache. The DDLm solution of only requiring/allowing specification
> of parents, rather than children, would be similar to moving this linking
> information from loops in the parents into the children in DDL2.
>
> My suggestion would be to move to the DDLm approach in the DDL2
> dictionaries, of putting the parent information with the children, rather
> than the children information with the parents in the formal dictionary,
> without duplication to minimize maintenance problems, but that Brian's
> nice formatting program be modified to gather the child information
> so that it is printed with the parent information as well as with each
> child to help people in reading these dictionaries.
>
> I would suggest doing the same for the DDLm dictionaries: provide
> print formatting tools to gather child information relating to a parent
> to put with the parent as a useful index.
>
> Regards,
> Herbert
>
> On 12/21/12 1:26 AM, James Hester wrote:
>> I will have a stab at this, although somebody with more experience of the mmCIF development process may wish to comment further.
>>
>> As far as I can tell, programmatically the only way to fix the 'missing parent' problem you identify is indeed to go through the
>> entire dictionary processing '_item_linked.parent_name' and '_item_linked.child_name' loops, which are usually found at the top of
>> the pointer tree (in this case in the _atom_site save frame). This same save frame also contains a list of category ids for each
>> of the 'id' values. My approach in PyCIFRW is to repopulate the individual definitions when ingesting the dictionary, to save
>> time later. The PyCIFRW code and comments for this can be found at
>> https://bitbucket.org/jamesrhester/pycifrw/src/78576030f75bb4f8cb52d84a60e603815ad38afb/pycifrw/CifFile.nw?at=stable
>> starting at line 839, with lines 854-862 describing and discussing your issue. Note also subsequent lines discussing PDBX.
>>
>> There is a school of thought that the category name is 'implicit' in a DDL2 dataname or save frame name; however, IT Vol G states
>> that this is conventional rather than required, so I prefer (like you, it seems) never to assume this unless given no alternative.
>>
>> An mmCIF/PDB person may wish to comment on the philosophical reasons behind these decisions, which I gather have something to do
>> with taking a relational database view of a CIF file.
>>
>> all the best,
>> James.
>>
>> On Thu, Dec 20, 2012 at 1:32 PM, Richard Gildea <rgildea@gmail.com> wrote:
>>
>>     Dear All,
>>
>>     Certain definitions in the mmcif dictionary (e.g.
>>     _atom_site_anisotrop.id) do not
>>     contain the items _item.category_id or _item_linked.parent_name.
>>     Without these data items, how is it possible to identify
>>     programmatically that _atom_site_anisotrop.id
>>     belongs to the
>>     _atom_site_anisotrop category and that it is a pointer to
>>     _atom_site.id (without examining every save
>>     frame)?
>>
>>     For quick reference here is the definition in question:
>>
>>     save__atom_site_anisotrop.id
>>         _item_description.description
>>     ;       This data item is a pointer to _atom_site.id in the ATOM_SITE
>>             category.
>>     ;
>>         _item.name                  '_atom_site_anisotrop.id'
>>         _item.mandatory_code        yes
>>         _item_aliases.alias_name    '_atom_site_aniso_label'
>>         _item_aliases.dictionary    cif_core.dic
>>         _item_aliases.version       2.0.1
>>     save_
>>
>>     Cheers,
>>
>>     Richard
>>
>>     _______________________________________________
>>     cif-developers mailing list
>>     cif-developers@iucr.org
>>     http://mailman.iucr.org/mailman/listinfo/cif-developers
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148

--
John Westbrook, Ph.D.
RCSB, Protein Data Bank
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
174 Frelinghuysen Rd
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (848) 445-4290  Fax: (732) 445-4320
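
The repopulation step discussed in the thread (walking the _item and _item_linked loops held in a parent save frame and copying the values into each child definition) might be sketched roughly as below. This is only an illustration under stated assumptions, not PyCIFRW's code and not the PDBx tooling: the dictionary is assumed to be already parsed into a plain Python dict keyed by save frame name, and the keys "item" and "item_linked", the tuple layouts, and the function name localize_definitions are all hypothetical. A value of None stands for an attribute that the frame does not give.

```python
def localize_definitions(frames):
    """Collect, per data item, the attributes scattered over parent loops.

    `frames` is a hypothetical pre-parsed dictionary:
      frames[frame_name]["item"]        -> rows of (name, category_id, mandatory_code)
      frames[frame_name]["item_linked"] -> rows of (child_name, parent_name)
    Returns {item_name: {"category_id": ..., "mandatory_code": ..., "parent_name": ...}}
    containing only the attributes that were actually found.
    """
    localized = {}
    for frame in frames.values():
        # Rows of the _item loop may describe items defined in other save frames.
        for name, category_id, mandatory_code in frame.get("item", []):
            entry = localized.setdefault(name, {})
            for key, value in (("category_id", category_id),
                               ("mandatory_code", mandatory_code)):
                if value is not None:
                    # Keep the first value seen; never overwrite with a gap.
                    entry.setdefault(key, value)
        # Rows of the _item_linked loop give the parent of each child item.
        for child_name, parent_name in frame.get("item_linked", []):
            localized.setdefault(child_name, {}).setdefault("parent_name", parent_name)
    return localized


# Sample data transcribed from the loops quoted in the thread.
frames = {
    "_atom_site.id": {
        "item": [
            ("_atom_site.id", "atom_site", "yes"),
            ("_atom_site_anisotrop.id", "atom_site_anisotrop", "yes"),
            ("_geom_angle.atom_site_id_1", "geom_angle", "yes"),
        ],
        "item_linked": [
            ("_atom_site_anisotrop.id", "_atom_site.id"),
            ("_geom_angle.atom_site_id_1", "_atom_site.id"),
        ],
    },
    # The child frame itself carries no category_id and no parent_name.
    "_atom_site_anisotrop.id": {
        "item": [("_atom_site_anisotrop.id", None, "yes")],
    },
}

localized = localize_definitions(frames)
print(localized["_atom_site_anisotrop.id"])
```

The sketch takes the category and parent only from explicit loop rows and never infers a category from the data name, in line with the point made above that IT Vol G treats the name-based convention as conventional rather than required.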