Re: [ddlm-group] DDLm aliases (subject changed)

As John W. noted, the normalization issue is alive and well, and
important.  Having a join that can handle only normalized cases
is a mistake.


=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Thu, 27 Jan 2011, Bollinger, John C wrote:

> On Thursday, January 27, 2011 10:16 AM, Herbert J. Bernstein wrote:
>
>>> Let me just talk about the category join issue.  The current documentation
>>> is vague about the issue of how one should match up the keys, and John B.'s
>>> interpretation may well be what was intended, but I think for the join
>>> to actually be useful, it has to be extended to cover the normalization
>>> and denormalization cases, in which the choice of keys depends on
>>> the degree of normalization.
>
> I don't think the draft is so vague, in that section 4.3 describes joined categories as being applicable to "categories have equivalent category keys", and it remarks that "the keys of joined categories may be used interchangeably in the instance document."  It may be that it would be useful to expand the scope of the feature as Herbert suggests, but I don't think the draft can be read to already define it that way.
>
> I always thought that the origin of this feature was the old ATOM_SITE vs. ATOM_SITE_ANISO issue, where the choice of whether to join does not involve normalization.  (Instead, it involves whether null anisotropic displacement parameters are explicitly recorded for atoms refined isotropically, and it relates to the way small-molecule structural results have traditionally been tabulated.)  In fact, the DDLm draft refers specifically to that case.  For that and similar cases, the current definition is already useful.
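
The equivalent-key case described above can be sketched as follows. This is only an illustration under stated assumptions: the function name join_on_equivalent_keys, the dict-based table layout, and all values are hypothetical, and the column names are simplified stand-ins for the ATOM_SITE and ATOM_SITE_ANISO items. Both tables share the same key (the atom label), so no normalization question arises; atoms refined isotropically simply get null anisotropic parameters in the joined presentation.

```python
def join_on_equivalent_keys(parent, child, child_columns):
    """Join two tables whose keys are equivalent.

    Both tables are dicts mapping the shared key to a row dict.
    Parent rows with no matching child row get None for every
    child column (the explicit "null" entries of the joined form).
    """
    joined = {}
    for key, row in parent.items():
        merged = dict(row)
        child_row = child.get(key, {})
        for col in child_columns:
            merged[col] = child_row.get(col)
        joined[key] = merged
    return joined


# Hypothetical data: C1 was refined anisotropically, H1 isotropically.
atom_site = {
    "C1": {"type_symbol": "C", "U_iso_or_equiv": 0.021},
    "H1": {"type_symbol": "H", "U_iso_or_equiv": 0.032},
}
atom_site_aniso = {
    "C1": {"U_11": 0.020, "U_22": 0.023, "U_33": 0.019},
}

joined = join_on_equivalent_keys(
    atom_site, atom_site_aniso, ["U_11", "U_22", "U_33"]
)
```

In the joined table the row for H1 carries None for U_11, U_22 and U_33, which corresponds to recording null anisotropic displacement parameters explicitly.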
>
>>>  This actually gets back to an old
>>> disagreement between CCP4 and the PDB, which could finally be resolved
>>> with a liberal (i.e. denormalization-friendly) interpretation
>>> of category join.
>
> Can you summarize this disagreement, please?  Is it still an issue, or has it effectively been settled?
>
>>> When you normalize a category, you often strip out several columns
>>> that were originally key components in the larger category, and put them
>>> entirely in the child category, so there is less repetition in the
>>> parent category.  If we are to allow the normalized categories of a
>>> dictionary, which have fewer key components, to be presented as
>>> the original wider, flatter denormalized categories,
>>> then we need to interpret the _category.parent_join in a way
>>> that permits more key components in the denormalized presentation,
>
> Agreed.
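
A minimal sketch of the denormalized presentation discussed above, assuming a hypothetical parent category keyed by sample_id and a child category keyed by (sample_id, scan_id). The category and item names, the helper denormalize, and the data are all invented for illustration; they are not taken from the DDLm draft or any dictionary. The point is only that the wide rows need more key components than the parent category does.

```python
def denormalize(parent_rows, child_rows, parent_key):
    """Present a normalized parent/child pair as one wide table.

    parent_key lists the key columns of the parent category. Each
    child row is merged with the parent row that shares those key
    values, so parent attributes repeat once per child row.
    Raises KeyError if a child row has no matching parent row.
    """
    parents = {tuple(p[k] for k in parent_key): p for p in parent_rows}
    wide = []
    for c in child_rows:
        p = parents[tuple(c[k] for k in parent_key)]
        wide.append({**p, **c})
    return wide


# Normalized form: the parent key is (sample_id,).
samples = [{"sample_id": "s1", "colour": "red"}]
# The child key is (sample_id, scan_id).
measurements = [
    {"sample_id": "s1", "scan_id": 1, "intensity": 10.5},
    {"sample_id": "s1", "scan_id": 2, "intensity": 11.0},
]

wide_rows = denormalize(samples, measurements, ["sample_id"])
```

Each wide row is identified by (sample_id, scan_id) rather than by sample_id alone, and the parent attribute colour is repeated on every row.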
>
> The fundamental question is whether we do want to allow a denormalized presentation in such cases.  What are the advantages?  I currently see this one:
>
> () if denormalizing joins are allowed then some normalizations can be performed in existing dictionaries that otherwise could not be performed without invalidating existing instance documents.  At least in principle.
>
> I do not, however, see any special advantage inherent generally in multiplying the ways in which future instance documents can be written.
>
> Herbert argues that a denormalized presentation is more convenient for "data harvest", but I'm not clear on what he means by that term as distinguished from "database loads," which he presents as an alternative use.  I'm also not clear whether merely _allowing_ denormalized presentation is sufficient to serve the data harvesting use case.  Once I understand this argument better, I may agree that there is an advantage here.
>
> Are there other advantages?
>
> I see this disadvantage:
>
> () if denormalizing joins are allowed then that introduces a new type of validity error that CIF authors may inadvertently introduce into their files and that CIF validators must test for: duplicate parent-category keys with different parent-category attributes.  That's a reasonably complicated problem because "different" depends in part on the semantics of the non-key items' types.
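
The validity check mentioned above might look roughly like this. It is a sketch, not a proposed validator design: find_conflicting_parents, the column names, and the data are hypothetical. The comparison is passed in as a function because, as noted, whether two values count as "different" depends on the item's type; here a numeric tolerance stands in for that type-aware comparison.

```python
import math


def find_conflicting_parents(wide_rows, parent_key, parent_attrs,
                             same=lambda a, b: a == b):
    """Report duplicate parent keys that carry different parent attributes.

    Returns a list of (key, attribute, first_value, later_value) tuples.
    Each row is compared with the first row seen for the same parent key,
    using `same` to decide whether two attribute values agree.
    """
    seen = {}
    conflicts = []
    for row in wide_rows:
        key = tuple(row[k] for k in parent_key)
        attrs = {a: row[a] for a in parent_attrs}
        if key not in seen:
            seen[key] = attrs
            continue
        for name in parent_attrs:
            if not same(seen[key][name], attrs[name]):
                conflicts.append((key, name, seen[key][name], attrs[name]))
    return conflicts


# Hypothetical denormalized rows; density is a parent-category attribute.
rows = [
    {"sample_id": "s1", "scan_id": 1, "density": 1.20},
    {"sample_id": "s1", "scan_id": 2, "density": 1.2000001},
    {"sample_id": "s2", "scan_id": 1, "density": 2.5},
    {"sample_id": "s2", "scan_id": 2, "density": 3.5},
]

# Exact comparison flags both samples.
strict = find_conflicting_parents(rows, ["sample_id"], ["density"])

# A tolerance-based comparison flags only the genuine disagreement.
tolerant = find_conflicting_parents(
    rows, ["sample_id"], ["density"],
    same=lambda a, b: math.isclose(a, b, rel_tol=1e-6),
)
```

With exact equality both s1 and s2 are reported; with the tolerance only s2 is, which illustrates why the comparison rule has to be settled per item type before such a check is well defined.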
>
> There might be other disadvantages, but I have not yet identified any.
>
> If we decide we do want to allow denormalized presentation in such cases, then we can surely come up with suitable semantics.  Herbert presented one possibility, but before we discuss details let's first settle whether we even need to go there.
>
> Regards,
>
> John
>
>> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
>
> Email Disclaimer:  www.stjude.org/emaildisclaimer
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
