Discussion List Archives

Re: [ddlm-group] DDLm aliases (subject changed)

I disagree -- once the keys of the parent category and child category
are settled, the question of keys for the denormalized presentation
is settled -- it has to be the union of the two sets of keys.  The
rest really are just presentation issues, e.g. providing mechanisms
for alternate tag names in the denormalized presentation, whether
we need to explicitly say what is implicitly forced by allowing
denormalization at all, etc.
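
A minimal sketch of the union-of-keys point, in Python. The category contents, the attribute names "note" and "xref_code", and the helper functions are all hypothetical, chosen only to illustrate the join; nothing here is taken from an actual DDLm dictionary.

```python
# Hypothetical keys for a parent category and a child category.
PARENT_KEY = ("definition_id",)
CHILD_KEY = ("definition_id", "xref_code")


def union_of_keys(parent_key, child_key):
    """Key of the joined presentation: the union of both key sets, in order."""
    return tuple(dict.fromkeys(parent_key + child_key))


def denormalize(parent_rows, child_rows, parent_key):
    """Attach to each child row the parent row that shares its parent-key values."""
    index = {tuple(row[k] for k in parent_key): row for row in parent_rows}
    return [{**index[tuple(c[k] for k in parent_key)], **c} for c in child_rows]


# Illustrative data only.
parent_rows = [{"definition_id": "_cell_length_a", "note": "older tag style"}]
child_rows = [
    {"definition_id": "_cell_length_a", "xref_code": "dict_one"},
    {"definition_id": "_cell_length_a", "xref_code": "dict_two"},
]

joined_key = union_of_keys(PARENT_KEY, CHILD_KEY)
joined = denormalize(parent_rows, child_rows, PARENT_KEY)

# Every joined row is identified uniquely by the union of the two keys.
key_values = {tuple(row[k] for k in joined_key) for row in joined}
assert len(key_values) == len(joined)
```

The parent key alone no longer identifies a row of the joined presentation, because the parent's values repeat once per child row; the union of the two keys does.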

Once you accept that we are working with relations and following
Codd's rules, almost everything we are discussing is a matter of
taste in completely equivalent presentations of the same

As for the rest, in which you wish to extend CIF to cover more
general choice of domains with conflicting uses of the same tags,
we already have the prefix mechanism, which is similar to the
approach in XML, except with an _ instead of a ":".  I think
it would be a very bad idea to move CIF in the direction of
needing access to particular dictionaries to understand
which of several alternative meanings of, say, _cell.length_a
we intended in a particular data cif.  It would make publishing
journals and running archives even more difficult tasks than they
now are.  I think the original COMCIFS decision of a global
name space was a wise choice for the major applications of
CIF, and would suggest we stick to it for DDLm.

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769


On Wed, 2 Feb 2011, Bollinger, John C wrote:

> On Tuesday, February 01, 2011 7:34 PM, Herbert J. Bernstein wrote:
>>   I have no objection to fine distinctions among particular aliases.
>> The question is whether that belongs in a top level alias category,
>> in a normalized subcategory, or both.
> To the extent that the question is answered by the choice of category 
> keys, it is fundamentally a data modeling issue, and only incidentally a 
> presentation issue.  On the other hand, if the range of allowed 
> presentation styles Herbert would like to have is provided strictly by a 
> suitable choice of category joining rules, then at instantiation time, 
> it is a presentation issue alone.  Even in that case, however, there is 
> a non-trivial technical question of whether to adopt joining rules that 
> would allow such a range in the first place.
>>   I freely admit that I don't understand John B.'s point in
>> the first paragraph you quote.
> My apologies.  I will attempt to clarify, but first, please understand 
> that this discussion ignores the possibility of presentations relying on 
> denormalizing category joins.  I'm talking about the definitions' 
> meaning, not their presentation.  With that said:
> Current COMCIFS policy is to avoid defining the same data name with 
> different semantics in different dictionaries.  By extension, the 
> community is encouraged to follow the same practice.  But what if 
> COMCIFS later discovers a reason to change its policy?  Or what if 
> someone in another discipline becomes interested in adopting STAR and 
> DDLm, but cannot rely on the same policy?  Or what if STAR and DDLm are 
> adopted in another community with a central authority comparable to 
> COMCIFS, but which does not care to avoid creating their own, 
> conflicting, definitions of data names already defined in 
> crystallographic dictionaries?  Or what if two separate, unsanctioned 
> dictionaries define the same data name incompatibly?
> For those or other reasons, it may now be, or in the future it may 
> become the case that some data names' semantics are not globally 
> well-defined.  Then to convey information about a particular definition, 
> one would need to identify it by both dictionary and data name.  The 
> crux of my previous comments, then, is that there is a technical/policy 
> decision at this point: will DDLm be designed to accommodate these 
> possibilities, or will it incorporate an assumption that data names are 
> globally unique?  The current design of aliases does not support either 
> alternative very well, but the needed changes depend on which 
> alternative is chosen.
> Case 1.
> Suppose we decide DDLm should accommodate conflicting external 
> definitions of the same data name.  Now consider the current attributes 
> of the ALIAS category: definition_id and dictionary_uri, with the 
> category key being definition_id.  This is not a proper relation, 
> because, by hypothesis, dictionary_uri is not dependent on 
> definition_id.
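
A minimal sketch of the dependency problem in Case 1, in Python. The example URIs and the helper name are hypothetical; the rows stand for what a dictionary author would want to record if the same data name were defined in two dictionaries.

```python
from collections import defaultdict


def dependency_violations(rows, determinant, dependent):
    """Return each determinant value that maps to more than one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return {value: deps for value, deps in seen.items() if len(deps) > 1}


# Illustrative rows: one data name, defined in two (hypothetical) dictionaries.
alias_rows = [
    {"definition_id": "_cell.length_a",
     "dictionary_uri": "http://example.org/dict_one.dic"},
    {"definition_id": "_cell.length_a",
     "dictionary_uri": "http://example.org/dict_two.dic"},
]

conflicts = dependency_violations(alias_rows, "definition_id", "dictionary_uri")

# Keying on definition_id alone cannot hold both rows: one overwrites the other.
keyed_on_definition_id = {row["definition_id"]: row for row in alias_rows}
```

Here conflicts is non-empty, so dictionary_uri is not functionally dependent on definition_id, and a category keyed on definition_id alone can retain only one of the two rows.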
> So suppose we remove dictionary_uri from ALIAS.  There is still a 
> problem: what is the meaning of rows in the ALIAS category?  In 
> relational terms, what is the predicate of ALIAS?  I see these options:
> a) "In some dictionary there is a definition of definition_id that is
>    equivalent to the defined item's."
> b) "All definitions of definition_id in any dictionary are equivalent
>    to the defined item's."
> The first is not very useful, and the second is inconsistent with our
> assumptions.
> Choice (a) could be helped by having some other category, perhaps 
> ALIAS_DICTIONARY, that defines which dictionary(-ies) each alias data 
> name is drawn from.  However, that does not accommodate the possibility 
> of different items in one dictionary referencing conflicting definitions 
> of the same data name.  We don't necessarily need to provide for that, 
> though, so there is a secondary technical/policy decision to be made if 
> we follow this route.
> Suppose we want to make DDLm as flexible as possible in this area by 
> providing for dictionaries to reference conflicting definitions of the 
> same data names.  Then to be any use, the ALIAS category needs some kind 
> of dictionary identifier or an equivalent as part of its key.  Several 
> alternatives are possible, but suppose we choose xref_code for that 
> purpose.  ALIAS's predicate is then "The definition of definition_id in 
> the dictionary identified by xref_code is equivalent to the defined 
> item's."  That is well-defined, meaningful, and useful.  If 
> dictionary_uri is still omitted then in a global sense it complies with 
> Boyce-Codd normal form, but probably not with fourth normal form. 
> Per-definition it is fully normalized all the way to domain-key normal 
> form.  Additional normalization questions do arise as we add other 
> attributes, but I do not address them further here.
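
A minimal sketch of ALIAS with the composite key (definition_id, xref_code), in Python. Because every attribute is part of the key, the category for one defined item is modelled here as a set of tuples. The class, the helper functions and the xref_code values are hypothetical.

```python
from typing import NamedTuple


class Alias(NamedTuple):
    """One ALIAS row; both attributes together form the key."""
    definition_id: str
    xref_code: str


def add_alias(aliases, definition_id, xref_code):
    """Add a row; return False if that composite key is already present."""
    row = Alias(definition_id, xref_code)
    if row in aliases:
        return False
    aliases.add(row)
    return True


def predicate(row):
    """Spell out what a single row asserts."""
    return (f"The definition of {row.definition_id} in the dictionary "
            f"identified by {row.xref_code} is equivalent to the defined item's.")


aliases = set()
first = add_alias(aliases, "_cell_length_a", "dict_one")
second = add_alias(aliases, "_cell_length_a", "dict_two")   # same name, other dictionary
repeat = add_alias(aliases, "_cell_length_a", "dict_one")   # duplicate key, rejected
```

The same data name can appear once per dictionary, which is what allows a dictionary to reference one of several conflicting definitions unambiguously.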
> Alternatively, suppose we don't care to provide for dictionaries to 
> reference conflicting definitions of the same data names.  The 
> ALIAS_DICTIONARY approach described above would be well-normalized and 
> sufficient to support this sub-case, but an alternative would be to add 
> a dictionary identifier to ALIAS, just as described in the previous 
> paragraph.
> ---
> Case 2.
> Suppose we decide DDLm should rely on an assumption that all definitions 
> of any given data name are semantically equivalent.  As before, consider 
> the current attributes and key of the ALIAS category.  ALIAS's predicate 
> would then be "The definition of definition_id is in the dictionary 
> located at dictionary_uri, and it is equivalent to the defined item's." 
> This is nonideal because the location of the alias definition is not 
> dependent on the defined item.  Whether that formally constitutes a 
> normalization problem depends on how we map categories to a relational 
> model, but as a practical matter, it would result in repetition of the 
> association between definition_id and dictionary_uri when multiple 
> definitions declare aliases to the same data name.
> That wouldn't be too bad under the assumption that a data name may be 
> defined equivalently in multiple dictionaries, supposing that we don't 
> care whether dictionaries declare all the dictionaries in which an alias 
> data name is defined.  It would be less satisfactory under the 
> assumption that each data name is defined in exactly one dictionary, 
> because that opens the way for logical inconsistency in a valid 
> dictionary.  If avoiding both repetition and possible inconsistency were 
> desired, then dictionary_uri would need to be moved.  The 
> ALIAS_DICTIONARY approach described in the previous case would serve 
> this purpose.
> ---
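
A minimal sketch of moving dictionary_uri out of ALIAS in Case 2, in Python. It assumes the stricter sub-case in which each data name is defined in exactly one dictionary. The function name, the input tuple layout (defined item, alias data name, dictionary URI) and the example URIs are hypothetical.

```python
def normalize(rows):
    """Split denormalized alias rows into ALIAS and ALIAS_DICTIONARY-like tables.

    Raises ValueError if one alias data name is given two different locations.
    """
    alias = set()            # (defined_item, definition_id) pairs
    alias_dictionary = {}    # definition_id -> dictionary_uri, stored once
    for defined_item, definition_id, dictionary_uri in rows:
        alias.add((defined_item, definition_id))
        known = alias_dictionary.setdefault(definition_id, dictionary_uri)
        if known != dictionary_uri:
            raise ValueError(
                f"{definition_id} is located at both {known} and {dictionary_uri}"
            )
    return alias, alias_dictionary


# Two defined items alias the same data name, so the URI is repeated.
rows = [
    ("_item.one", "_old_name", "http://example.org/old.dic"),
    ("_item.two", "_old_name", "http://example.org/old.dic"),
]
alias, alias_dictionary = normalize(rows)

# With dictionary_uri left in ALIAS, nothing prevents the repeats from disagreeing.
inconsistent = rows + [("_item.three", "_old_name", "http://example.org/other.dic")]
try:
    normalize(inconsistent)
    detected = False
except ValueError:
    detected = True
```

After the split, the association between an alias data name and its dictionary is stored once, so the repetition disappears and the inconsistency cannot be expressed.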
>>  I am just trying to allow
>> DDLm to handle a reasonable range of presentations of
>> aliases as they exist in the DDL1 core versus DDL2 core world
>> as well as dealing with the aliases we will have to add for
>> the stricter DDLm tag naming rules and to do so in a way that one
>> set of DDLm dictionaries can do the entire job.
> These are excellent objectives.  However, "Handl[ing] a reasonable range 
> of presentations" and "dealing with the aliases we will have to add" are 
> independent issues on different levels, and it will facilitate action to 
> address them independently.
> John
> Email Disclaimer:  www.stjude.org/emaildisclaimer
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group