Discussion List Archives


Re: [ddlm-group] DDLm aliases (subject changed)

Let me be blunt -- DDLm will be rejected in practice if it does not
support denormalized categories with compound keys well.  At the same time,
there is no reason why we should need to leave CIF presentation behind
when we normalize.   John W.'s DDL2 solves the problem by mapping the
parent-child relationships carefully in both directions.  I believe
what I have suggested in the handling of keys in a parent join would
allow us to adopt the DDLm approach of the parent categories being
ignorant of the structure, or even the existence of their child
categories, while preserving the ability to have denormalized
presentations.
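
To make the idea concrete, here is a minimal sketch of a child-driven join, assuming categories are held as lists of row dictionaries. The category and column names (entity, atom, entity_id) and the function denormalize are hypothetical illustrations, not taken from any DDLm or mmCIF dictionary. Only the child side carries the link, so the parent rows need know nothing about their children.

```python
def denormalize(child_rows, parent_rows, link):
    """Widen each child row with the non-key attributes of its parent.

    `link` maps a child column name to the parent key column it refers to.
    The link lives entirely on the child side; parent rows hold no
    reference to any child category.
    Assumption: parent and child share no non-key column names.
    """
    parent_key_cols = list(link.values())
    # Index parent rows by their (possibly compound) key.
    index = {tuple(p[c] for c in parent_key_cols): p for p in parent_rows}
    wide = []
    for row in child_rows:
        key = tuple(row[c] for c in link)
        parent = index[key]  # a KeyError here means a dangling child reference
        merged = dict(row)
        for col, val in parent.items():
            if col not in parent_key_cols:
                merged[col] = val
        wide.append(merged)
    return wide


# Hypothetical normalized categories.
entity = [
    {"id": "1", "type": "polymer"},
    {"id": "2", "type": "water"},
]
atom = [
    {"atom_id": "CA", "entity_id": "1"},
    {"atom_id": "CB", "entity_id": "1"},
    {"atom_id": "O", "entity_id": "2"},
]

# Denormalized presentation: one wide row per atom.
wide = denormalize(atom, entity, {"entity_id": "id"})
```

The wide rows repeat the parent attribute on every child row, which is the flat presentation a typical data file uses, while the dictionary itself can stay normalized.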

   -- Herbert

At 2:26 PM -0500 1/27/11, John Westbrook wrote:
>Hi John and Herbert,
>
>I do not wish to complicate the discussion but I have a somewhat 
>different perspective on
>the issue of normalization.   Certainly in the development of 
>the mmCIF dictionary
>anything approaching normalization was so at odds with familiar data 
>organization that
>it was not practical.   As a result mmCIF has a highly denormalized 
>organization in which
>each category mirrors the organization of a typical data file.   To 
>cope with this
>data organization style, parent-child relationships were introduced 
>between common
>identifiers in key and non-key roles.   A further practical 
>complication comes from
>having to track multiple nomenclatures composed of natural keys, some 
>of these having
>unusual null-value rules.
>
>To better address this in software we have added DDL2 extensions to 
>define parent/child linking groups -
>
>See -
>
>http://mmcif.pdb.org/dictionaries/mmcif_ddl.dic/Data/history.html
>
>categories - pdbx_item_link_group and pdbx_item_link_group_list
>
>The groups defined in these categories allow validation of common 
>items between categories
>with multiple connecting relationships.   For instance, tables of 
>bonds, angles and torsions
>have multiple independent collections of natural keys times the 
>number of nomenclatures.
>In some cases the validation must make independent comparisons of 
>each group against
>the same group of parent items.
>
>I raise this issue because it is an unavoidable consequence of 
>denormalization.  And,
>as Herbert points out, the denormalized organization is important in 
>data harvesting
>and generally maintaining a connection to laboratory practice.
>
>In the original design of DDLm there was an emphasis on adopting 
>simple rather than
>complex category keys.  This has been an issue of some concern for 
>me as this does
>not map well to our data which is rich in complex natural keys.
>
>John
>
>
>
>
>
>On 1/27/11 1:24 PM, Bollinger, John C wrote:
>>
>>  On Thursday, January 27, 2011 10:16 AM, Herbert J. Bernstein wrote:
>>
>>>  Let me just talk about the category join issue.  The current documentation
>>>  is vague about the issue of how one should match up the keys, and John B.'s
>>>  interpretation may well be what was intended, but I think for the join
>>>  to actually be useful, it has to be extended to cover the normalization
>>>  and denormalization cases, in which the choice of keys depends on
>>>  the degree of normalization.
>>
>>  I don't think the draft is so vague, in that section 4.3 describes 
>>joined categories as being applicable to "categories have 
>>equivalent category keys", and it remarks that "the keys of joined 
>>categories may be used interchangeably in the instance document." 
>>It may be that it would be useful to expand the scope of the 
>>feature as Herbert suggests, but I don't think the draft can be 
>>read to already define it that way.
>>
>>  I always thought that the origin of this feature was the old 
>>ATOM_SITE vs. ATOM_SITE_ANISO issue, where the choice of whether to 
>>join does not involve normalization.  (Instead, it involves whether 
>>null anisotropic displacement parameters are explicitly recorded 
>>for atoms refined isotropically, and it relates to the way 
>>small-molecule structural results have traditionally been 
>>tabulated.)  In fact, the DDLm draft refers specifically to that 
>>case.  For that and similar cases, the current definition is 
>>already useful.
>>
>>>    This actually gets back to an old
>>>  disagreement between CCP4 and the PDB, which could finally be resolved
>>>  with a liberal (i.e. denormalization-friendly) interpretation
>>>  of category join.
>>
>>  Can you summarize this disagreement, please?  Is it still an 
>>issue, or has it effectively been settled?
>>
>>>  When you normalize a category, you often strip out several columns
>>>  that were originally key components in the larger category, and put them
>>>  entirely in the child category, so there is less repetition in the
>>>  parent category.  If we are to allow the option of using the
>>>  dictionary with the normalized categories with fewer key components to be
>>>  presented as the original wider, flatter denormalized categories,
>>>  then we need to interpret the _category.parent_join in a way
>>>  that permits more key components in the denormalized presentation,
>>
>>  Agreed.
>>
>>  The fundamental question is whether we do want to allow a 
>>denormalized presentation in such cases.  What are the advantages? 
>>I currently see this one:
>>
>>  () if denormalizing joins are allowed then some normalizations can 
>>be performed in existing dictionaries that otherwise could not be 
>>performed without invalidating existing instance documents.  At 
>>least in principle.
>>
>>  I do not, however, see any special advantage inherent generally in 
>>multiplying the ways in which future instance documents can be 
>>written.
>>
>>  Herbert argues that a denormalized presentation is more convenient 
>>for "data harvest", but I'm not clear on what he means by that term 
>>as distinguished from "database loads," which he presents as an 
>>alternative use.  I'm also not clear whether merely _allowing_ 
>>denormalized presentation is sufficient to serve the data 
>>harvesting use case.  Once I understand this argument better, I may 
>>agree that there is an advantage here.
>>
>>  Are there other advantages?
>>
>>  I see this disadvantage:
>>
>>  () if denormalizing joins are allowed then that introduces a new 
>>type of validity error that CIF authors may inadvertently introduce 
>>into their files and that CIF validators must test for: duplicate 
>>parent-category keys with different parent-category attributes. 
>>That's a reasonably complicated problem because "different" depends 
>>in part on the semantics of the non-key items' types.
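
A minimal sketch of the check described above, assuming a denormalized presentation held as a list of row dictionaries. The function normalize_parent, the column names, and the `same` comparison hook are all hypothetical; plain equality stands in for whatever type-aware comparison the non-key items would actually need.

```python
def normalize_parent(wide_rows, key_cols, attr_cols, same=lambda a, b: a == b):
    """Collapse a denormalized table to one parent row per key.

    Returns (parents, conflicts). A conflict is recorded whenever two rows
    share a parent key but disagree on a parent attribute.
    `same` is a placeholder for a type-aware comparison; defaulting to
    plain equality is an assumption of this sketch.
    """
    parents = {}
    conflicts = []
    for row in wide_rows:
        key = tuple(row[c] for c in key_cols)
        attrs = {c: row[c] for c in attr_cols}
        # The first row seen for a key defines the parent attributes.
        seen = parents.setdefault(key, attrs)
        for col in attr_cols:
            if not same(seen[col], attrs[col]):
                conflicts.append((key, col, seen[col], attrs[col]))
    return parents, conflicts


# Hypothetical wide rows; the second row spells the parent attribute differently.
rows = [
    {"atom_id": "CA", "entity_id": "1", "type": "polymer"},
    {"atom_id": "CB", "entity_id": "1", "type": "Polymer"},
    {"atom_id": "O", "entity_id": "2", "type": "water"},
]

# Strict comparison reports the mismatch for parent key ("1",).
parents, conflicts = normalize_parent(rows, ["entity_id"], ["type"])

# A case-insensitive comparison, standing in for type semantics, does not.
_, relaxed_conflicts = normalize_parent(
    rows, ["entity_id"], ["type"], same=lambda a, b: a.lower() == b.lower()
)
```

Whether the second row counts as an error depends entirely on the comparison supplied, which is the complication noted above.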
>>
>>  There might be other disadvantages, but I have not yet identified any.
>>
>>  If we decide we do want to allow denormalized presentation in 
>>such cases, then we can surely come up with suitable semantics. 
>>Herbert presented one possibility, but before we discuss details 
>>let's first settle whether we even need to go there.
>>
>>
>>  Regards,
>>
>>  John
>>
>>  --
>>  John C. Bollinger, Ph.D.
>>  Department of Structural Biology
>>  St. Jude Children's Research Hospital
>>
>>
>>  Email Disclaimer:  www.stjude.org/emaildisclaimer
>>
>>  _______________________________________________
>>  ddlm-group mailing list
>>  ddlm-group@iucr.org
>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>--
>******************************************************************
>    John Westbrook, Ph.D.
>    Rutgers, The State University of New Jersey
>    Department of Chemistry and Chemical Biology
>    610 Taylor Road
>    Piscataway, NJ 08854-8087
>    e-mail: jwest@rcsb.rutgers.edu
>    Ph:  (732) 445-4290  Fax: (732) 445-4320
>******************************************************************


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================
