Discussion List Archives

Re: [ddlm-group] DDLm aliases (subject changed)

Let me just talk about the category join issue.  The current documentation
is vague about the issue of how one should match up the keys, and John B.'s
interpretation may well be what was intended, but I think for the join
to actually be useful, it has to be extended to cover the normalization
and denormalization cases, in which the choice of keys depends on
the degree of normalization.  This actually gets back to an old
disagreement between CCP4 and the PDB, which could finally be resolved
with a liberal (i.e. denormalization-friendly) interpretation
of category join.

When you normalize a category, you often strip out several columns
that were originally key components in the larger category, and put them
entirely in the child category, so there is less repetition in the
parent category.  If we are to allow a dictionary whose normalized
categories have fewer key components to be presented as the
original wider, flatter denormalized categories, then we need to
interpret _category.parent_join in a way that permits more key
components in the denormalized presentation:

    If a tag in the child category has a linked tag in the parent
category (either directly or by having a common directly or indirectly
linked tag), then the tag from the parent category
must be used in the joined presentation.

    If a tag in the child category has no linked tag in the
parent category, then an attempt will be made to construct
a tag using dotted notation combining the parent category
name with the child category object name.  If that composite
name does not conflict with an existing tag, then that
composite name will be used in the joined presentation.  If
there is a conflict, the child category tag name will be
used in the joined presentation.

    If a tag from the child category is a member of the key of
the child category and a joined presentation includes that
item, then it will automatically be added to the key of the
joined presentation.
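
The three rules above can be sketched in Python as follows.  This is
only an illustration of one reading of the proposal, not an existing
implementation: the function name join_presentation, the pre-resolved
`links` mapping (child tag to its directly or indirectly linked parent
tag), and the parent key and link values in the example are all
assumptions.  In particular, adding a child key item to the joined key
under its joined name is my reading of the third rule.

```python
def join_presentation(parent_category, parent_tags, parent_key,
                      child_tags, child_key, links, existing_tags):
    """Return (columns, key) for a joined presentation (hypothetical helper)."""
    columns = list(parent_tags)
    key = list(parent_key)
    for tag in child_tags:
        if tag in links:
            # Rule 1: a linked tag exists, so the parent tag must be used.
            name = links[tag]
        else:
            # Rule 2: try parent category name + child object name.
            object_id = tag.split(".", 1)[1]
            candidate = "_" + parent_category + "." + object_id
            # On a conflict with an existing tag, keep the child tag name.
            name = tag if candidate in existing_tags else candidate
        if name not in columns:
            columns.append(name)
        # Rule 3: child key members carried into the join extend the key.
        if tag in child_key and name not in key:
            key.append(name)
    return columns, key


# Illustrative values only; the real dictionary may differ.
parent_tags = ["_alias.definition_id", "_alias.xref_code"]
child_tags = ["_alias_ensemble.ensemble_id",
              "_alias_ensemble.definition_id",
              "_alias_ensemble.xref_code"]
links = {"_alias_ensemble.definition_id": "_alias.definition_id",
         "_alias_ensemble.xref_code": "_alias.xref_code"}

columns, key = join_presentation(
    "alias", parent_tags, parent_tags,
    child_tags, child_tags, links,
    existing_tags=set(parent_tags) | set(child_tags))
print(columns)
print(key)
```

With these inputs the unlinked child key item is presented under the
composite name _alias.ensemble_id and is added to the joined key, so
the joined presentation has one more key component than the parent.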

This interpretation of the join semantics would allow flexible
use of normalized and denormalized presentations of data
without having to clutter parent categories with definitions
from child categories that are not needed, and indeed cannot
be used, in the normalized presentation.  It would allow greatly
simplified flat tables for such things as data harvest, and clean,
normalized tables for database loads.
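
To make the flat versus normalized distinction concrete, here is a
small Python sketch that flattens normalized parent and child rows
into one wide table.  The function name denormalize, the row layout
(one dict per row), the link and rename mappings, and all the data
values are hypothetical and chosen only for illustration.

```python
def denormalize(parent_rows, child_rows, links, rename):
    """Join each child row to its matching parent rows (hypothetical helper).

    links maps a child tag to the parent tag it is linked to;
    rename maps an unlinked child tag to its name in the flat table.
    """
    flat = []
    for child in child_rows:
        for parent in parent_rows:
            # Rows match when every linked child item equals the parent item.
            if all(child[c] == parent[p] for c, p in links.items()):
                row = dict(parent)
                for tag, value in child.items():
                    if tag not in links:
                        row[rename.get(tag, tag)] = value
                flat.append(row)
    return flat


# Made-up data: two parent rows, three child rows.
parent_rows = [
    {"_alias.definition_id": "_old_name_a", "_alias.xref_code": "d1"},
    {"_alias.definition_id": "_old_name_b", "_alias.xref_code": "d1"},
]
child_rows = [
    {"_alias_ensemble.ensemble_id": "local_e1",
     "_alias_ensemble.definition_id": "_old_name_a",
     "_alias_ensemble.xref_code": "d1"},
    {"_alias_ensemble.ensemble_id": "local_e2",
     "_alias_ensemble.definition_id": "_old_name_a",
     "_alias_ensemble.xref_code": "d1"},
    {"_alias_ensemble.ensemble_id": "local_e1",
     "_alias_ensemble.definition_id": "_old_name_b",
     "_alias_ensemble.xref_code": "d1"},
]
links = {"_alias_ensemble.definition_id": "_alias.definition_id",
         "_alias_ensemble.xref_code": "_alias.xref_code"}
rename = {"_alias_ensemble.ensemble_id": "_alias.ensemble_id"}

flat = denormalize(parent_rows, child_rows, links, rename)
for row in flat:
    print(row)
```

The flat table repeats the parent items once per child row, which is
the repetition that normalization removes; the normalized tables stay
as they are for database loads.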

I'll send an updated draft reflecting John's other comments in the
next message, but the issue of allowing denormalizing joins is
a separable discussion.


At 9:34 AM -0600 1/27/11, Bollinger, John C wrote:
>On Wednesday, January 26, 2011 9:50 PM, Herbert J. Bernstein wrote:
>>So, to pull it all together, see below.  Please review and see
>>what I have missed, mistyped or failed to convert from some
>>earlier incarnation.  Comments, corrections and suggestions
>>greatly appreciated.
>>I have not yet included the type change for _dictionary_xref.format
>>because I am not sure a single word code would be sufficient
>>to describe the format of any given dictionary, so for the moment
>>it is still Text.
>Herbert's latest version looks good to me.  See below for comments 
>and tentative corrections:
>>     _definition.id      alias_ensemble
>>     _definition.scope   Category
>>     _category.parent_id  alias
>>     _category.parent_join  Yes
>>     _category_key.primitive  ['_alias_ensemble.ensemble_id',
>>                               '_alias_ensemble.definition_id',
>>                               '_alias_ensemble.xref_code']
>>      save_
>As I understand the use of _category.parent_join, I think its value 
>for this category needs to be 'No', because the parent category has 
>a different (narrower) key structure.
>>      _definition.id   '_alias_ensemble.definition_id'
>>      _definition.class  Attribute
>>      _definition.update 2011-01-21
>>      _description.text
>>      Identifier tag of a definition associated with
>>      an xref code by which to group this tag with
>>      other tags.
>>      A given tag may belong to multiple ensembles
>>      and may be cited against multiple dictionaries.
>>      Note that the tag does not have to be a valid
>>      tag under DDLm tag construction rules, but
>>      it should be a valid tag under the rules of
>>      some DDL.
>I would prefer to describe this a bit differently:
>      Together with _alias_ensemble.xref_code, identifies
>      an alias belonging to an ensemble.  An alias may
>      belong to any number of ensembles, including zero.
>I omit the bit about tag construction rules, as no DDL yet proposed 
>defines any such rules; allowable tags are defined by CIF.  As James 
>earlier observed, DDLm can define any tag allowed by CIF, even if 
>that name is not in the subset addressable by dREL.  Similarly, DDL1 
>and DDL2 can both define any data name allowed by CIF1, which 
>collectively are a subset of those allowed by CIF2.  See also below.
>>      _name.category_id alias_ensemble
>>      _name.object_id   definition_id
>>      _name.linked_item_id
>>      _type.purpose     Key
>>      _type.container   Single
>>      _type.contents    Code
>>       save_
>Shouldn't this item's _type.contents be 'Tag' to agree with the 
>linked item's?  Alternatively, if 'Tag' signifies something more 
>specific than "data name allowed by CIF2" then perhaps 
>_alias.definition_id needs to be changed instead.  I presume that 
>these questions are related to the comments about DDL tag 
>construction rules in this item's proposed description.
>>      _definition.id   '_alias_ensemble.ensemble_id'
>>      _definition.class  Attribute
>>      _definition.update 2011-01-26
>>      _description.text
>>      A code identifying an ensemble of related tags.
>>      To help ensure that dictionaries can be merged,
>>      each code should either begin with an IUCr-registered
>>      prefix or if not prefixed, have been approved
>>      by COMCIFS.  The special prefix 'local_' may be
>>      used for purely internal purposes of an organization.
>Is it needful or appropriate to repeat the definition text of the 
>linked item here?  As long as we do adopt the ENSEMBLE category, the 
>importance of _alias_ensemble.ensemble_id is primarily that it 
>associates an alias with one of the ensembles defined elsewhere in 
>the dictionary.  I suggest this alternative description text:
>      Identifies an ensemble to which the alias identified by
>      ( _alias_ensemble.definition_id, _alias_ensemble.xref_code )
>      belongs.
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>Email Disclaimer:  www.stjude.org/emaildisclaimer
>ddlm-group mailing list

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769
