Re: [ddlm-group] DDLm aliases (subject changed). .. .. .. .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] DDLm aliases (subject changed). .. .. .. .
- From: John Westbrook <jwest@rcsb.rutgers.edu>
- Date: Thu, 27 Jan 2011 14:26:31 -0500
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA54166D7D1EEB@SJMEMXMBS11.stjude.sjcrh.local>
- References: <AANLkTi=ATdNovWFiecEwDrbtMdTwZ7guvYuBCGrdnb-i@mail.gmail.com> <8F77913624F7524AACD2A92EAF3BFA54166D7D1EDE@SJMEMXMBS11.stjude.sjcrh.local > <4D404DAA.8070804@mcmaster.ca> <a06240802c96600c48956@[192.168.2.102]> <8F77913624F7524AACD2A92EAF3BFA54166D7D1EE1@SJMEMXMBS11.stjude.sjcrh.local > <a06240800c9668e1faa7c@[192.168.2.102]> <8F77913624F7524AACD2A92EAF3BFA54166D7D1EE8@SJMEMXMBS11.stjude.sjcrh.local > <a06240802c9674292646e@[192.168.2.102]><8F77913624F7524AACD2A92EAF3BFA54166D7D1EEB@SJMEMXMBS11.stjude.sjcrh.local>
Hi John and Herbert,

I do not wish to complicate the discussion, but I have a somewhat different perspective on the issue of normalization. Certainly in the development of the mmCIF dictionary, anything approaching normalization was so at odds with familiar data organization that it was not practical. As a result, mmCIF has a highly denormalized organization in which each category mirrors the organization of a typical data file.

To cope with this data organization style, parent-child relationships were introduced between common identifiers in key and non-key roles. A further practical complication comes from having to track multiple nomenclatures composed of natural keys, some of these having unusual null-value rules. To better address this in software, we have added DDL2 extensions to define parent/child linking groups. See

  http://mmcif.pdb.org/dictionaries/mmcif_ddl.dic/Data/history.html

  categories - pdbx_item_link_group and pdbx_item_link_group_list

The groups defined in these categories allow validation of common items between categories with multiple connecting relationships. For instance, tables of bonds, angles and torsions have multiple independent collections of natural keys times the number of nomenclatures. In some cases the validation must make independent comparisons of each group against the same group of parent items.

I raise this issue because it is an unavoidable consequence of denormalization. And, as Herbert points out, the denormalized organization is important in data harvesting and generally maintaining a connection to laboratory practice.

In the original design of DDLm there was an emphasis on adopting simple rather than complex category keys. This has been an issue of some concern for me, as this does not map well to our data, which is rich in complex natural keys.

John

On 1/27/11 1:24 PM, Bollinger, John C wrote:
>
> On Thursday, January 27, 2011 10:16 AM, Herbert J. Bernstein wrote:
>
>> Let me just talk about the category join issue.
>> The current documentation
>> is vague about the issue of how one should match up the keys, and John B.'s
>> interpretation may well be what was intended, but I think for the join
>> to actually be useful, it has to be extended to cover the normalization
>> and denormalization cases, in which the choice of keys depends on
>> the degree of normalization.
>
> I don't think the draft is so vague, in that section 4.3 describes joined categories as being applicable to "categories have equivalent category keys", and it remarks that "the keys of joined categories may be used interchangeably in the instance document." It may be that it would be useful to expand the scope of the feature as Herbert suggests, but I don't think the draft can be read to already define it that way.
>
> I always thought that the origin of this feature was the old ATOM_SITE vs. ATOM_SITE_ANISO issue, where the choice of whether to join does not involve normalization. (Instead, it involves whether null anisotropic displacement parameters are explicitly recorded for atoms refined isotropically, and it relates to the way small-molecule structural results have traditionally been tabulated.) In fact, the DDLm draft refers specifically to that case. For that and similar cases, the current definition is already useful.
>
>> This actually gets back to an old
>> disagreement between CCP4 and the PDB, which could finally be resolved
>> with a liberal (i.e. denormalization-friendly) interpretation
>> of category join.
>
> Can you summarize this disagreement, please? Is it still an issue, or has it effectively been settled?
>
>> When you normalize a category, you often strip out several columns
>> that were originally key components in the larger category, and put them
>> entirely in the child category, so there is less repetition in the
>> parent category.
>> If we are to allow the option of using the
>> dictionary with the normalized categories with fewer key components to be
>> presented as the original wider, flatter denormalized categories,
>> then we need to interpret the _category.parent_join in a way
>> that permits more key components in the denormalized presentation,
>
> Agreed.
>
> The fundamental question is whether we do want to allow a denormalized presentation in such cases. What are the advantages? I currently see this one:
>
> () if denormalizing joins are allowed then some normalizations can be performed in existing dictionaries that otherwise could not be performed without invalidating existing instance documents. At least in principle.
>
> I do not, however, see any special advantage inherent generally in multiplying the ways in which future instance documents can be written.
>
> Herbert argues that a denormalized presentation is more convenient for "data harvest", but I'm not clear on what he means by that term as distinguished from "database loads," which he presents as an alternative use. I'm also not clear whether merely _allowing_ denormalized presentation is sufficient to serve the data harvesting use case. Once I understand this argument better, I may agree that there is an advantage here.
>
> Are there other advantages?
>
> I see this disadvantage:
>
> () if denormalizing joins are allowed then that introduces a new type of validity error that CIF authors may inadvertently introduce into their files and that CIF validators must test for: duplicate parent-category keys with different parent-category attributes. That's a reasonably complicated problem because "different" depends in part on the semantics of the non-key items' types.
>
> There might be other disadvantages, but I have not yet identified any.
>
> If we decide we do want to allow denormalized presentation in such cases, then we can surely come up with suitable semantics.
> Herbert presented one possibility, but before we discuss details let's first settle whether we even need to go there.
>
>
> Regards,
>
> John
>
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
>
>
> Email Disclaimer: www.stjude.org/emaildisclaimer
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

--
******************************************************************
John Westbrook, Ph.D.
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (732) 445-4290 Fax: (732) 445-4320
******************************************************************
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
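
Illustration: the validity error raised in the quoted message above (duplicate parent-category keys with different parent-category attributes in a denormalized presentation) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not anything taken from the DDLm draft or from existing CIF software: the function name, the row layout (a list of dicts) and the item names atom_label and type_symbol are all hypothetical, and values are compared as plain strings, which sidesteps the point made above that "different" depends on the semantics of each item's type.

```python
from collections import defaultdict


def find_conflicting_parents(rows, key_items, parent_items):
    """Report parent keys whose repeated parent attributes disagree.

    rows         -- denormalized rows, each a dict of item name -> string value
    key_items    -- names of the items forming the parent-category key
    parent_items -- names of the non-key parent items repeated in every row

    Assumption: values are compared by exact string equality. A real
    validator would need type-aware comparison (numbers, null values, ...).
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[name] for name in key_items)
        attrs = tuple(row.get(name) for name in parent_items)
        seen[key].add(attrs)
    # A key is in conflict when it was seen with more than one attribute tuple.
    return {key: sorted(vals, key=repr) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical denormalized rows: the parent attribute type_symbol is
# repeated alongside each child row, and the two C1 rows disagree.
rows = [
    {"atom_label": "C1", "type_symbol": "C", "aniso_u11": "0.021"},
    {"atom_label": "C1", "type_symbol": "N", "aniso_u11": "0.034"},
    {"atom_label": "O1", "type_symbol": "O", "aniso_u11": "0.050"},
]

conflicts = find_conflicting_parents(rows, ["atom_label"], ["type_symbol"])
for key, values in conflicts.items():
    print("conflicting parent attributes for key", key, ":", values)
```

With a composite natural key, key_items would simply list several item names; the check itself does not change, although each nomenclature or link group would presumably need its own pass.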