Discussion List Archives


Re: [ddlm-group] DDLm aliases (subject changed)

On Tuesday, February 01, 2011 7:34 PM, Herbert J. Bernstein wrote:

>   I have no objection to fine distinctions among particular aliases.
>The question is whether that belongs in a top level alias category,
>in a normalized subcategory, or both.

To the extent that the question is answered by the choice of category keys, it is fundamentally a data modeling issue, and only incidentally a presentation issue.  On the other hand, if the range of allowed presentation styles Herbert would like to have is provided strictly by a suitable choice of category joining rules, then at instantiation time, it is a presentation issue alone.  Even in that case, however, there is a non-trivial technical question of whether to adopt joining rules that would allow such a range in the first place.

>   I freely admit that I don't understand John B.'s point in
>the first paragraph you quote.

My apologies.  I will attempt to clarify, but first, please understand that this discussion ignores the possibility of presentations relying on denormalizing category joins.  I'm talking about the definitions' meaning, not their presentation.  With that said:

Current COMCIFS policy is to avoid defining the same data name with different semantics in different dictionaries.  By extension, the community is encouraged to follow the same practice.  But what if COMCIFS later discovers a reason to change its policy?  Or what if someone in another discipline becomes interested in adopting STAR and DDLm, but cannot rely on the same policy?  Or what if STAR and DDLm are adopted in another community with a central authority comparable to COMCIFS, but one that does not care to avoid creating its own, conflicting, definitions of data names already defined in crystallographic dictionaries?  Or what if two separate, unsanctioned dictionaries define the same data name incompatibly?

For those or other reasons, it may now be, or may in the future become, the case that some data names' semantics are not globally well-defined.  Then, to convey information about a particular definition, one would need to identify it by both dictionary and data name.  The crux of my previous comments, then, is that there is a technical/policy decision at this point: will DDLm be designed to accommodate these possibilities, or will it incorporate an assumption that data names are globally unique?  The current design of aliases does not support either alternative very well, but the needed changes depend on which alternative is chosen.

Case 1.

Suppose we decide DDLm should accommodate conflicting external definitions of the same data name.  Now consider the current attributes of the ALIAS category: definition_id and dictionary_uri, with the category key being definition_id.  This is not a proper relation, because, by hypothesis, dictionary_uri is not dependent on definition_id.
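
To make the dependency problem concrete, here is a minimal sketch in Python.  The data name and dictionary URIs are hypothetical, invented only for illustration; the helper simply tests whether one attribute functionally determines another across a set of rows.

```python
from collections import defaultdict


def functional_dependency_holds(rows, determinant, dependent):
    """Return True if every determinant value maps to exactly one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical alias references under the Case 1 hypothesis: the same data
# name is defined (possibly incompatibly) in two different dictionaries.
alias_rows = [
    {"definition_id": "_example.name", "dictionary_uri": "http://example.org/dict_a.dic"},
    {"definition_id": "_example.name", "dictionary_uri": "http://example.org/dict_b.dic"},
]

fd_ok = functional_dependency_holds(alias_rows, "definition_id", "dictionary_uri")
print(fd_ok)
```

With these rows the check fails, which is the sense in which definition_id alone cannot serve as the key while dictionary_uri remains an attribute.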

So suppose we remove dictionary_uri from ALIAS.  There is still a problem: what is the meaning of rows in the ALIAS category?  In relational terms, what is the predicate of ALIAS?  I see these options:
a) "In some dictionary there is a definition of definition_id that is equivalent to the defined item's."
b) "All definitions of definition_id in any dictionary are equivalent to the defined item's."
The first is not very useful, and the second is inconsistent with our assumptions.  Choice (a) could be helped by having some other category, perhaps ALIAS_DICTIONARY, that defines which dictionary or dictionaries each alias data name is drawn from.  However, that does not accommodate the possibility of different items in one dictionary referencing conflicting definitions of the same data name.  We don't necessarily need to provide for that, though, so there is a secondary technical/policy decision to be made if we follow this route.

Suppose we want to make DDLm as flexible as possible in this area by providing for dictionaries to reference conflicting definitions of the same data names.  Then, to be of any use, the ALIAS category needs some kind of dictionary identifier or an equivalent as part of its key.  Several alternatives are possible, but suppose we choose xref_code for that purpose.  ALIAS's predicate is then "The definition of definition_id in the dictionary identified by xref_code is equivalent to the defined item's."  That is well-defined, meaningful, and useful.  If dictionary_uri is still omitted, then in a global sense it complies with Boyce-Codd normal form, but probably not with fourth normal form.  Per-definition it is fully normalized all the way to domain-key normal form.  Additional normalization questions do arise as we add other attributes, but I do not address them further here.
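
As a rough sketch of the composite key described above (Python, with a hypothetical data name and hypothetical xref_code values "dict_a" and "dict_b"), each defined item carries its own set of ALIAS rows keyed on the pair (definition_id, xref_code):

```python
def declare_alias(alias_rows, definition_id, xref_code):
    """Add one ALIAS row, enforcing uniqueness of the composite key."""
    key = (definition_id, xref_code)
    if key in alias_rows:
        raise ValueError(f"duplicate alias key: {key}")
    alias_rows.add(key)
    return key


# ALIAS rows are scoped to the defining item, so keep one set per item.
# Both item names are hypothetical.
aliases_item_x = set()
aliases_item_y = set()

# Two items in one dictionary reference conflicting definitions of the
# same data name; the xref_code component keeps the references distinct.
declare_alias(aliases_item_x, "_example.name", "dict_a")
declare_alias(aliases_item_y, "_example.name", "dict_b")
```

Each row then reads as the predicate quoted above: the definition of "_example.name" in the dictionary identified by the given xref_code is equivalent to the defined item's.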

Alternatively, suppose we don't care to provide for dictionaries to reference conflicting definitions of the same data names.  The ALIAS_DICTIONARY approach described above would be well-normalized and sufficient to support this sub-case, but an alternative would be to add a dictionary identifier to ALIAS, just as described in the previous paragraph.

Case 2.

Suppose we decide DDLm should rely on an assumption that all definitions of any given data name are semantically equivalent.  As before, consider the current attributes and key of the ALIAS category.  ALIAS's predicate would then be "The definition of definition_id is in the dictionary located at dictionary_uri, and it is equivalent to the defined item's."  This is not ideal because the location of the alias definition is not dependent on the defined item.  Whether that formally constitutes a normalization problem depends on how we map categories to a relational model, but as a practical matter, it would result in repetition of the association between definition_id and dictionary_uri when multiple definitions declare aliases to the same data name.

That wouldn't be too bad under the assumption that a data name may be defined equivalently in multiple dictionaries, supposing that we don't care whether dictionaries declare all the dictionaries in which an alias data name is defined.  It would be less satisfactory under the assumption that each data name is defined in exactly one dictionary, because that opens the way for logical inconsistency in a valid dictionary.  If avoiding both repetition and possible inconsistency were desired, then dictionary_uri would need to be moved.  The ALIAS_DICTIONARY approach described in the previous case would serve this purpose.
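
The repetition and the possible inconsistency can be illustrated with a small Python sketch.  It assumes the stricter reading, that each data name is defined in exactly one dictionary, and all item names and URIs are hypothetical.  The function splits denormalized rows into an ALIAS-like set and an ALIAS_DICTIONARY-like mapping, reporting any rows that disagree about where an alias is defined.

```python
def normalize_aliases(rows):
    """Split (defined_item, definition_id, dictionary_uri) rows into two tables.

    Returns (alias, alias_dictionary, conflicts), where conflicts lists each
    row whose dictionary_uri disagrees with the first one seen for that name.
    """
    alias = set()
    alias_dictionary = {}
    conflicts = []
    for defined_item, definition_id, dictionary_uri in rows:
        alias.add((defined_item, definition_id))
        known = alias_dictionary.setdefault(definition_id, dictionary_uri)
        if known != dictionary_uri:
            conflicts.append((definition_id, known, dictionary_uri))
    return alias, alias_dictionary, conflicts


# Three hypothetical definitions alias the same data name; the third
# disagrees about which dictionary that name lives in.
rows = [
    ("_item.x", "_example.name", "http://example.org/dict_a.dic"),
    ("_item.y", "_example.name", "http://example.org/dict_a.dic"),
    ("_item.z", "_example.name", "http://example.org/dict_b.dic"),
]

alias, alias_dictionary, conflicts = normalize_aliases(rows)
```

Once the association is stored once per data name, as in the mapping here, the disagreement in the third row has nowhere to be recorded, which is the practical benefit of moving dictionary_uri out of ALIAS.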

>  I am just trying to allow
>DDLm to handle a reasonable range of presentations of
>aliases as they exist in the DDL1 core versus DDL2 core world
>as well as dealing with the aliases we will have to add for
>the stricter DDLm tag naming rules and to do so in a way that one
>set of DDLm dictionaries can do the entire job.

These are excellent objectives.  However, "handl[ing] a reasonable range of presentations" and "dealing with the aliases we will have to add" are independent issues on different levels, and it will facilitate action to address them independently.



ddlm-group mailing list
