
Re: [ddlm-group] DDLm aliases (subject changed)

Hi Herb,

I am trying to follow your discussion with respect to the appropriate
basis for alias.  From my perspective, it is important to retain
item_name, dictionary name, and version as the identifier.  There
are currently no conventions on dictionary naming that distinguish
version details.  Also, it is a DDL2 convention that aliases refer
specifically to the same item of data (i.e. semantically equivalent
items).
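
As a rough illustration of the identifier described above, an alias can
be keyed on the triple of item name, dictionary name, and version.  The
sketch below is Python rather than DDLm; the names AliasKey, aliases,
and register_alias are hypothetical, and the dictionary name and
version strings are made up for the example.

```python
from typing import Dict, NamedTuple


class AliasKey(NamedTuple):
    """Hypothetical composite identifier for an alias."""
    item_name: str
    dictionary: str
    version: str


# Maps each alias key to the data name it is semantically equivalent to.
aliases: Dict[AliasKey, str] = {}


def register_alias(key: AliasKey, canonical_name: str) -> None:
    """Record an alias, rejecting a conflicting reuse of the same key."""
    existing = aliases.get(key)
    if existing is not None and existing != canonical_name:
        raise ValueError(f"{key} already maps to {existing}")
    aliases[key] = canonical_name


# The same item name under two (illustrative) dictionary versions
# yields two distinct keys, so neither entry overwrites the other.
register_alias(AliasKey("_cell_length_a", "cif_core.dic", "2.4"), "_cell.length_a")
register_alias(AliasKey("_cell_length_a", "cif_core.dic", "2.3"), "_cell.length_a")
```

Dropping version (or dictionary) from the key would collapse these two
entries into one, which is the loss of information the three-part
identifier is meant to avoid.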

John



>> As much as the idea may be to globally avoid data name clashes, it
>> is not necessary to assume that they are successfully avoided.
>> Rejecting that assumption not only protects against failures and
>> policy changes in the CIF community, but it also makes DDLm a better
>> candidate for adoption in disciplines with less central authority.
>> Furthermore, although we do not need to follow DDL2 here, it does
>> establish a precedent for scoping aliases to specific dictionaries.
>> These are all good reasons to choose that, for DDLm's purposes,
>> definition_id PLUS some form of dictionary identifier are required
>> to uniquely identify an alias definition.  Are there good reasons to
>> choose otherwise?
>>
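
The question quoted above, whether definition_id alone identifies an
alias definition, can be checked mechanically against any set of alias
rows.  The following is a minimal sketch under assumed inputs: the
function name ambiguous_ids, the row layout (definition_id,
dictionary_id), and the sample values are all hypothetical.

```python
from collections import defaultdict


def ambiguous_ids(rows):
    """Return definition_ids that appear under more than one dictionary."""
    seen = defaultdict(set)
    for definition_id, dictionary_id in rows:
        seen[definition_id].add(dictionary_id)
    return {d: sorted(s) for d, s in seen.items() if len(s) > 1}


# Made-up rows: one data name is defined in two different dictionaries.
rows = [
    ("_example.name", "dict_a"),
    ("_example.name", "dict_b"),
    ("_example.other", "dict_a"),
]
clashes = ambiguous_ids(rows)
```

Any non-empty result means definition_id by itself cannot serve as the
key, and a dictionary identifier has to be part of it.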


On 2/1/11 5:59 PM, Herbert J. Bernstein wrote:
> Bottom line (literally)
>
>> I see no reason why DDLm instance documents (i.e. dictionaries)
>> should have different presentation rules than the instance documents
>> they themselves describe.  Given a valid, possibly-denormalized
>> instance document and a dictionary with which it complies, it must
>> be possible to programmatically normalize the instance to the form
>> described by the dictionary (else the document contains
>> inconsistencies and therefore is invalid).  DDLm dictionaries are
>> instance documents of DDLm, so there is no need for different
>> behavior with respect to them.
>>
>> Although I think the same applies to DDLm's own presentation, I am
>> concerned about what would happen if DDLm were presented in a
>> denormalized form that contained inconsistencies.  Rather than
>> expend continuing effort to ensure that a denormalized presentation
>> of DDLm remains consistent, I would rather expend effort to express
>> and maintain DDLm in its self-defined normalized form.  In any case,
>> I emphasize again that allowing a denormalized presentation is not
>> at all the same thing as defining a denormalized model.
>>
>> None of the foregoing settles just what presentation rules DDLm
>> should actually require with respect to joined categories.  Should
>> denormalizing joins be permitted?  There is a cost/benefit analysis
>> to be performed here, but I'm not up to attempting it at the moment.
>
> Which seems to leave us entirely with a matter of taste:  does
> anybody want to have a denormalized version of alias and its
> subcategories?  I am happy to do it either way.  I just need
> to use the sets.   If nobody else speaks up, I'll just make a
> guess and start programming on that basis.  We can then look
> at the result and figure out whether to use it or redo it in
> Madrid.
>
>     Herbert
>
>
> At 4:22 PM -0600 2/1/11, Bollinger, John C wrote:
>> Dear Herbert,
>>
>> On Monday, January 31, 2011 3:09 PM, Herbert J. Bernstein wrote:
>>
>>> At 1:20 PM -0600 1/31/11, Bollinger, John C wrote:
>>
>> [...]
>>
>>> This discussion began with adding what we were then calling styles
>>> to group related sets of tags.  One tag could have multiple styles.
>>> In normalized form, that would mean creating a new relation with
>>> the tags and the styles as components of a composite key, so the
>>> same key could be repeated with multiple styles and the same
>>> style could be repeated with multiple keys.
>>
>> Indeed so.  This is what the ALIAS_DEFINITION_SET category provides
>> (by whichever name it's now going).
>>
>>> Placing that directly in the alias category instead of
>>> in a separate relation _is_ a denormalization.
>>
>> In a formal sense, I think you're saying that the result would not
>> satisfy second normal form because _alias.dictionary_uri would
>> depend on only part of the key (_alias.definition_id).  I agree.
>> That does rely on _alias.dictionary_uri not being part of a
>> candidate key, but the current definition assumes that.
>>
>> If the only attributes were _alias.definition_id and
>> _alias.definition_set_id, however, and both were elements of the
>> key, then the category would comply even with domain-key normal
>> form.  One might in that case complain that the meaning of the ALIAS
>> category was changed, and that would be true, but it would be as
>> normalized as can be.
>>
>>>    You happen to
>>> have preferred to use the xref_code, but adding that to the
>>> alias category key is and was a denormalization.  In CIF, until
>>> now at least, COMCIFS has tried to maintain a global name space,
>>> with a given tag having one meaning across multiple dictionaries.
>>> That is why there is a prefix registration system, so adding
>>> the dictionary to the alias key should not be necessary.
>>
>> So this is exactly one of the conversations I said we needed to
>> have: "What is the entity being modeled, and what assumptions are
>> being made about it?  [... T]his question could be framed as 'Should
>> a dictionary identifier be added to the ALIAS category key?'"  Thank
>> you for indulging me.
>>
>> Xref_code, or some other dictionary identifier, is a different case
>> than definition_set_id.  Whereas there is no viable argument for
>> definition_set_id being part of a candidate key for ALIAS as that
>> category is currently defined, there *are* arguments for xref_code
>> being part of a candidate key.  We can choose how we want to model
>> things, but the decision is not arbitrary: it has technical,
>> semantic, and policy implications.
>>
>> From a technical perspective, the question can again be reframed as
>> "does a definition_id determine the dictionary in which its
>> definition appears?"  Inasmuch as the definition does not presently
>> include dictionary_uri in the category key, DDLm as currently
>> constituted appears to say "yes."  I think that's erroneous.  At
>> minimum, COMCIFS' intention seems to be to redefine many mmCIF data
>> names in a DDLm dictionary, and Herbert has expressed plans to do
>> similarly for imgCIF.  Herbert nevertheless offers a contrasting
>> view:
>>
>>> The idea in CIF is that you _don't_ use the same tag name with
>>> different meanings in different dictionaries, but with the introduction
>>> of DDL2 and mmCIF we ended up with 2 versions of the same core definitions
>>> having the same meanings but different tag names.  Thus we needed to
>>> have aliases to relate the DDL2 dotted notation versions of the
>>> tags to the DDL1 undotted notations of the tags.
>>
>> I understand the original impetus for aliases.  Interpreting DDL2,
>> however, I conclude that the concept was broadened during
>> development, and that the assumption of data names having global
>> scope was intentionally avoided.  Others here were closer to the
>> process than I, but I observe that the description of the DDL2
>> ITEM_ALIASES category specifically says "Each alias name is
>> *identified by* the name and version of the dictionary to which it
>> belongs" (emphasis added).  Indeed, the category key is
>> (_item_aliases.alias_name, _item_aliases.dictionary,
>> _item_aliases.version).  That's even broader than anything currently
>> under discussion for DDLm.  ITG remarks that
>> "_item_aliases.dictionary [... is] provided to distinguish between
>> dictionaries [...]," which would not be necessary if a given data
>> name could be assumed to be defined in only one dictionary, or even
>> to be defined equivalently in every dictionary where it appears.
>>
>> As much as the idea may be to globally avoid data name clashes, it
>> is not necessary to assume that they are successfully avoided.
>> Rejecting that assumption not only protects against failures and
>> policy changes in the CIF community, but it also makes DDLm a better
>> candidate for adoption in disciplines with less central authority.
>> Furthermore, although we do not need to follow DDL2 here, it does
>> establish a precedent for scoping aliases to specific dictionaries.
>> These are all good reasons to choose that, for DDLm's purposes,
>> definition_id PLUS some form of dictionary identifier are required
>> to uniquely identify an alias definition.  Are there good reasons to
>> choose otherwise?
>>
>> Supposing that we do adopt the view that unique identification of
>> definitions requires at least definition_id and a dictionary
>> identifier, ALIAS is not even a proper relation unless a dictionary
>> identifier (such as xref_code) is added to the category key.
>>
>> [...]
>>
>>> I would be very happy having fully normalized DDLm dictionaries, but
>>> I can cope with denormalized dictionaries, just as I have to cope
>>> with denormalized datafiles -- indeed, for some search procedures,
>>> I deliberately denormalize dictionaries internally.  It
>>> sounds like John B. wants to stick to fully normalized DDLm dictionaries.
>>
>> Hmm.  I would be happy to see dictionaries define data models that
>> comply with higher normalization forms, but that is a design
>> decision that should rest with their authors and maintainers.  I
>> would in particular like DDLm itself to describe a highly normalized
>> model for its own domain (dictionaries), though exactly which form
>> would be most appropriate is an open question.  Ensuring that DDLm
>> describes a well-normalized data model does not force other DDLm
>> dictionaries to describe equally normalized models.  *Presentation*
>> of these models, on the other hand, remains a separate issue,
>> discussed next.
>>
>>> While this has some impact on software developers, it has very little
>>> direct impact on users -- so what do people think:
>>>
>>>     Should all DDLm dictionaries be fully normalized (if so, to which level
>>> of normalization) or
>>>
>>>     Should DDLm dictionaries be allowed the same flexibility as
>>> data files in being denormalized?
>>
>> I see no reason why DDLm instance documents (i.e. dictionaries)
>> should have different presentation rules than the instance documents
>> they themselves describe.  Given a valid, possibly-denormalized
>> instance document and a dictionary with which it complies, it must
>> be possible to programmatically normalize the instance to the form
>> described by the dictionary (else the document contains
>> inconsistencies and therefore is invalid).  DDLm dictionaries are
>> instance documents of DDLm, so there is no need for different
>> behavior with respect to them.
>>
>> Although I think the same applies to DDLm's own presentation, I am
>> concerned about what would happen if DDLm were presented in a
>> denormalized form that contained inconsistencies.  Rather than
>> expend continuing effort to ensure that a denormalized presentation
>> of DDLm remains consistent, I would rather expend effort to express
>> and maintain DDLm in its self-defined normalized form.  In any case,
>> I emphasize again that allowing a denormalized presentation is not
>> at all the same thing as defining a denormalized model.
>>
>> None of the foregoing settles just what presentation rules DDLm
>> should actually require with respect to joined categories.  Should
>> denormalizing joins be permitted?  There is a cost/benefit analysis
>> to be performed here, but I'm not up to attempting it at the moment.
>>
>>
>> John
>>
>> --
>> John C. Bollinger, Ph.D.
>> Department of Structural Biology
>> St. Jude Children's Research Hospital
>>
>>
>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>

-- 
******************************************************************
   John Westbrook, Ph.D.
   Rutgers, The State University of New Jersey
   Department of Chemistry and Chemical Biology
   610 Taylor Road
   Piscataway, NJ 08854-8087
   e-mail: jwest@rcsb.rutgers.edu
   Ph:  (732) 445-4290  Fax: (732) 445-4320
******************************************************************
