Re: [ddlm-group] DDLm aliases (subject changed)

Bottom line (literally)

>I see no reason why DDLm instance documents (i.e. dictionaries)
>should have different presentation rules than the instance documents
>they themselves describe.  Given a valid, possibly-denormalized
>instance document and a dictionary with which it complies, it must
>be possible to programmatically normalize the instance to the form
>described by the dictionary (else the document contains
>inconsistencies and therefore is invalid).  DDLm dictionaries are
>instance documents of DDLm, so there is no need for different
>behavior with respect to them.
>
>Although I think the same applies to DDLm's own presentation, I am
>concerned about what would happen if DDLm were presented in a
>denormalized form that contained inconsistencies.  Rather than
>expend continuing effort to ensure that a denormalized presentation
>of DDLm remains consistent, I would rather expend effort to express
>and maintain DDLm in its self-defined normalized form.  In any case,
>I emphasize again that allowing a denormalized presentation is not
>at all the same thing as defining a denormalized model.
>
>None of the foregoing settles just what presentation rules DDLm
>should actually require with respect to joined categories.  Should
>denormalizing joins be permitted?  There is a cost/benefit analysis
>to be performed here, but I'm not up to attempting it at the moment.

Which seems to leave us entirely with a matter of taste:  does
anybody want to have a denormalized version of alias and its
subcategories?  I am happy to do it either way.  I just need
to use the sets.   If nobody else speaks up, I'll just make a
guess and start programming on that basis.  We can then look
at the result and figure out whether to use it or redo it in
Madrid.

   Herbert


At 4:22 PM -0600 2/1/11, Bollinger, John C wrote:
>Dear Herbert,
>
>On Monday, January 31, 2011 3:09 PM, Herbert J. Bernstein wrote:
>
>>At 1:20 PM -0600 1/31/11, Bollinger, John C wrote:
>
>[...]
>
>>This discussion began with adding what we were then calling styles
>>to group related sets of tags.  One tag could have multiple styles.
>>In normalized form, that would mean creating a new relation with
>>the tags and the styles as components of a composite key, so the
>>same key could be repeated with multiple styles and the same
>>style could be repeated with multiple keys.
>
>Indeed so.  This is what the ALIAS_DEFINITION_SET category provides 
>(by whichever name it's now going).
>
>>Placing that directly in the alias category instead of
>>in a separate relation _is_ a denormalization.
>
>In a formal sense, I think you're saying that the result would not 
>satisfy second normal form because _alias.dictionary_uri would 
>depend on only part of the key (_alias.definition_id).  I agree. 
>That does rely on _alias.dictionary_uri not being part of a 
>candidate key, but the current definition assumes that.
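The partial-dependency argument can be sketched in Python, assuming a hypothetical denormalized ALIAS presentation keyed on (definition_id, definition_set_id) that also carries dictionary_uri. The rows, URIs, and helper names below are illustrative, not drawn from any DDLm dictionary. Checking sample rows can only refute a dependency, never prove one, so this shows the shape of the argument rather than a validator.

```python
from itertools import combinations

# Hypothetical denormalized ALIAS rows (illustrative values only).
# Composite key: (definition_id, definition_set_id); dictionary_uri is non-key.
ROWS = [
    {"definition_id": "_cell.length_a", "definition_set_id": "S1",
     "dictionary_uri": "http://example.org/core.dic"},
    {"definition_id": "_cell.length_a", "definition_set_id": "S2",
     "dictionary_uri": "http://example.org/core.dic"},
    {"definition_id": "_array_data.data", "definition_set_id": "S1",
     "dictionary_uri": "http://example.org/img.dic"},
]
KEY = ("definition_id", "definition_set_id")


def determines(rows, lhs, rhs):
    """True if, within these rows, the attributes in lhs determine rhs."""
    seen = {}
    for row in rows:
        k = tuple(row[a] for a in lhs)
        if seen.setdefault(k, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, attr):
    """Proper, non-empty subsets of key that already determine attr."""
    return [subset
            for size in range(1, len(key))
            for subset in combinations(key, size)
            if determines(rows, subset, attr)]


# dictionary_uri follows from definition_id alone: a 2NF violation.
print(partial_dependencies(ROWS, KEY, "dictionary_uri"))
```

Moving dictionary_uri into a relation keyed on definition_id alone, with set membership held in a separate (definition_id, definition_set_id) relation, removes the partial dependency; that is the ALIAS / ALIAS_DEFINITION_SET split discussed here.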
>
>If the only attributes were _alias.definition_id and 
>_alias.definition_set_id, however, and both were elements of the 
>key, then the category would comply even with domain-key normal 
>form.  One might in that case complain that the meaning of the ALIAS 
>category was changed, and that would be true, but it would be as 
>normalized as can be.
>
>>   You happen to
>>have preferred to use the xref_code, but adding that to the
>>alias category key is and was a denormalization.  In CIF, until
>>now at least, COMCIFS has tried to maintain a global name space,
>>with a given tag having one meaning across multiple dictionaries.
>>That is why there is a prefix registration system, so adding
>>the dictionary to the alias key should not be necessary.
>
>So this is exactly one of the conversations I said we needed to 
>have: "What is the entity being modeled, and what assumptions are 
>being made about it?  [... T]his question could be framed as 'Should 
>a dictionary identifier be added to the ALIAS category key?'"  Thank 
>you for indulging me.
>
>Xref_code, or some other dictionary identifier, is a different case 
>than definition_set_id.  Whereas there is no viable argument for 
>definition_set_id being part of a candidate key for ALIAS as that 
>category is currently defined, there *are* arguments for xref_code 
>being part of a candidate key.  We can choose how we want to model 
>things, but the decision is not arbitrary: it has technical, 
>semantic, and policy implications.
>
>From a technical perspective, the question can again be reframed as
>"does a definition_id determine the dictionary in which its
>definition appears?"  Inasmuch as the ALIAS definition does not presently
>include dictionary_uri in the category key, DDLm as currently
>constituted appears to say "yes."  I think that's erroneous.  At
>minimum, COMCIFS' intention seems to be to redefine many mmCIF data
>names in a DDLm dictionary, and Herbert has expressed plans to do
>similarly for imgCIF.  Herbert nevertheless offers a contrasting
>view:
>
>>The idea in CIF is that you _don't_ use the same tag name with
>>different meanings in different dictionaries, but with the introduction
>>of DDL2 and mmCIF we ended up with 2 versions of the same core definitions
>>having the same meanings but different tag names.  Thus we needed to
>>have aliases to relate the DDL2 dotted notation versions of the
>>tags to the DDL1 undotted notations of the tags.
>
>I understand the original impetus for aliases.  Interpreting DDL2, 
>however, I conclude that the concept was broadened during 
>development, and that the assumption of data names having global 
>scope was intentionally avoided.  Others here were closer to the 
>process than I, but I observe that the description of the DDL2 
>ITEM_ALIASES category specifically says "Each alias name is 
>*identified by* the name and version of the dictionary to which it 
>belongs" (emphasis added).  Indeed, the category key is 
>(_item_aliases.alias_name, _item_aliases.dictionary, 
>_item_aliases.version).  That's even broader than anything currently 
>under discussion for DDLm.  ITG remarks that 
>"_item_aliases.dictionary [... is] provided to distinguish between 
>dictionaries [...]," which would not be necessary if a given data 
>name could be assumed to be defined in only one dictionary, or even 
>to be defined equivalently in every dictionary where it appears.
>
>As much as the idea may be to globally avoid data name clashes, it 
>is not necessary to assume that they are successfully avoided. 
>Rejecting that assumption not only protects against failures and 
>policy changes in the CIF community, but it also makes DDLm a better 
>candidate for adoption in disciplines with less central authority. 
>Furthermore, although we do not need to follow DDL2 here, it does 
>establish a precedent for scoping aliases to specific dictionaries. 
>These are all good reasons to choose that, for DDLm's purposes, 
>definition_id PLUS some form of dictionary identifier are required 
>to uniquely identify an alias definition.  Are there good reasons to 
>choose otherwise?
>
>Supposing that we do adopt the view that unique identification of 
>definitions requires at least definition_id and a dictionary 
>identifier, ALIAS is not even a proper relation unless a dictionary 
>identifier (such as xref_code) is added to the category key.
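A minimal sketch of why ALIAS stops being a proper relation once the same data name can be defined in two dictionaries, assuming hypothetical rows and hypothetical xref_code values "CORE" and "MM". Keyed on definition_id alone, two distinct rows share a key; adding the dictionary identifier restores uniqueness.

```python
from collections import Counter

# Hypothetical alias rows: one data name defined in two dictionaries.
# The xref_code values "CORE" and "MM" are illustrative assumptions.
ALIASES = [
    {"definition_id": "_cell.length_a", "xref_code": "CORE"},
    {"definition_id": "_cell.length_a", "xref_code": "MM"},
]


def duplicate_keys(rows, key):
    """Return the key values that identify more than one row."""
    counts = Counter(tuple(row[a] for a in key) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Keyed on definition_id alone, the two rows collide.
print(duplicate_keys(ALIASES, ("definition_id",)))
# Adding a dictionary identifier to the key makes each row unique.
print(duplicate_keys(ALIASES, ("definition_id", "xref_code")))
```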
>
>[...]
>
>>I would be very happy having fully normalized DDLm dictionaries, but
>>I can cope with denormalized dictionaries, just as I have to cope
>>with denormalized datafiles -- indeed, for some search procedures,
>>I deliberately denormalize dictionaries internally.  It
>>sounds like John B. wants to stick to fully normalized DDLm dictionaries.
>
>Hmm.  I would be happy to see dictionaries define data models that 
>comply with higher normalization forms, but that is a design 
>decision that should rest with their authors and maintainers.  I 
>would in particular like DDLm itself to describe a highly normalized 
>model for its own domain (dictionaries), though exactly which form 
>would be most appropriate is an open question.  Ensuring that DDLm 
>describes a well-normalized data model does not force other DDLm 
>dictionaries to describe equally normalized models.  *Presentation* 
>of these models, on the other hand, remains a separate issue, 
>discussed next.
>
>>While this has some impact on software developers, it has very little
>>direct impact on users -- so what do people think:
>>
>>    Should all DDLm dictionaries be fully normalized (if so, to which level
>>of normalization) or
>>
>>    Should DDLm dictionaries be allowed the same flexibility as
>>data files in being denormalized?
>
>I see no reason why DDLm instance documents (i.e. dictionaries) 
>should have different presentation rules than the instance documents 
>they themselves describe.  Given a valid, possibly-denormalized 
>instance document and a dictionary with which it complies, it must 
>be possible to programmatically normalize the instance to the form 
>described by the dictionary (else the document contains 
>inconsistencies and therefore is invalid).  DDLm dictionaries are 
>instance documents of DDLm, so there is no need for different 
>behavior with respect to them.
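One way to picture "programmatically normalize" is the sketch below: project a denormalized loop onto each target category, and reject the document if two rows that share a key disagree on a dependent value. The category layout (ALIAS keyed on definition_id alone, as the current definition assumes, plus an ALIAS_DEFINITION_SET membership relation), the rows, and the exception name are all hypothetical.

```python
# Hypothetical denormalized loop: each row joins an alias with one
# definition set it belongs to (illustrative values only).
JOINED = [
    {"definition_id": "_cell.length_a",
     "dictionary_uri": "http://example.org/core.dic",
     "definition_set_id": "S1"},
    {"definition_id": "_cell.length_a",
     "dictionary_uri": "http://example.org/core.dic",
     "definition_set_id": "S2"},
]


class InconsistentInstance(ValueError):
    """Two rows share a key but disagree on a dependent attribute."""


def normalize(rows, key, dependents=()):
    """Project rows onto key + dependents, keeping one row per key value.

    Raises InconsistentInstance when repeated copies of a key carry
    different dependent values, i.e. the document contradicts itself.
    """
    out = {}
    for row in rows:
        k = tuple(row[a] for a in key)
        projected = {a: row[a] for a in key + dependents}
        if out.setdefault(k, projected) != projected:
            raise InconsistentInstance(f"conflicting values for key {k}")
    return list(out.values())


alias = normalize(JOINED, ("definition_id",), ("dictionary_uri",))
alias_definition_set = normalize(JOINED, ("definition_id", "definition_set_id"))
print(alias)
print(alias_definition_set)
```

If a third row gave _cell.length_a a different dictionary_uri, normalize would raise, which corresponds to the "contains inconsistencies and therefore is invalid" case.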
>
>Although I think the same applies to DDLm's own presentation, I am 
>concerned about what would happen if DDLm were presented in a 
>denormalized form that contained inconsistencies.  Rather than 
>expend continuing effort to ensure that a denormalized presentation 
>of DDLm remains consistent, I would rather expend effort to express 
>and maintain DDLm in its self-defined normalized form.  In any case, 
>I emphasize again that allowing a denormalized presentation is not 
>at all the same thing as defining a denormalized model.
>
>None of the foregoing settles just what presentation rules DDLm 
>should actually require with respect to joined categories.  Should 
>denormalizing joins be permitted?  There is a cost/benefit analysis 
>to be performed here, but I'm not up to attempting it at the moment.
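For concreteness, a denormalizing join here would present two normalized categories as a single loop. A minimal sketch, assuming hypothetical ALIAS and ALIAS_DEFINITION_SET rows and a simple nested-loop inner join (chosen for clarity, not efficiency):

```python
# Hypothetical normalized categories (illustrative values only).
ALIAS = [
    {"definition_id": "_cell.length_a",
     "dictionary_uri": "http://example.org/core.dic"},
]
ALIAS_DEFINITION_SET = [
    {"definition_id": "_cell.length_a", "definition_set_id": "S1"},
    {"definition_id": "_cell.length_a", "definition_set_id": "S2"},
]


def natural_join(left, right, on):
    """Inner join on the shared attributes named in `on`."""
    joined = []
    for left_row in left:
        for right_row in right:
            if all(left_row[a] == right_row[a] for a in on):
                joined.append({**left_row, **right_row})
    return joined


rows = natural_join(ALIAS, ALIAS_DEFINITION_SET, ("definition_id",))
for row in rows:
    print(row["definition_id"], row["definition_set_id"], row["dictionary_uri"])
```

The joined loop is easier to scan, but dictionary_uri is now repeated once per set membership, so every copy must stay consistent; that is one side of the cost/benefit question raised above.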
>
>
>John
>
>--
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>
>
>Email Disclaimer:  www.stjude.org/emaildisclaimer
>
>_______________________________________________
>ddlm-group mailing list
>ddlm-group@iucr.org
>http://scripts.iucr.org/mailman/listinfo/ddlm-group


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================