Discussion List Archives

Re: [ddlm-group] DDLm aliases (subject changed)

Dear John,

   I have no objection to fine distinctions among particular aliases.
The question is whether that belongs in a top level alias category,
in a normalized subcategory, or both.  A given tag may have
multiple aliases, and each alias may well appear in multiple
dictionaries.  Hopefully each alias has only one semantic definition,
and the whole ensemble of aliases of a given tag have very closely
related, if not identical, semantic definitions.  However, various
sets of aliases may well follow different syntactic rules, which
is why, in most cases, we ended up with aliases.
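
   As a minimal sketch of that many-to-many shape, one alias name can be
recorded against several dictionaries while still resolving to a single
tag.  The dictionary labels, the table, and the helper name aliases_of
are hypothetical, chosen only for illustration, and are not taken from
any DDLm dictionary:

```python
from collections import defaultdict

# Hypothetical alias table: (alias name, dictionary label) -> canonical tag.
# The dictionary labels are placeholders, not real dictionary identifiers.
ALIASES = {
    ("_cell_length_a", "core_ddl1"): "_cell.length_a",
    ("_cell_length_a", "other_ddl1"): "_cell.length_a",
    ("_cell.length_a", "core_ddl2"): "_cell.length_a",
}


def aliases_of(tag):
    """Return {alias name: sorted dictionary labels} for one canonical tag."""
    result = defaultdict(set)
    for (alias, dictionary), canonical in ALIASES.items():
        if canonical == tag:
            result[alias].add(dictionary)
    return {alias: sorted(dicts) for alias, dicts in result.items()}
```

Calling aliases_of("_cell.length_a") groups the undotted and dotted
spellings under the one tag, each with the dictionaries that carry it.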

   I freely admit that I don't understand John B.'s point in
the first paragraph you quote.  I am just trying to allow
DDLm to handle a reasonable range of presentations of
aliases as they exist in the DDL1 core versus DDL2 core world
as well as dealing with the aliases we will have to add for
the stricter DDLm tag naming rules and to do so in a way that one
set of DDLm dictionaries can do the entire job.

   Regards,
     Herbert


At 7:23 PM -0500 2/1/11, John Westbrook wrote:
>Hi Herb,
>
>I am trying to follow your discussion with respect to the appropriate
>basis for alias.    From my perspective, it is important to retain
>item_name, dictionary name, and version as the identifier.  There
>are currently no conventions on dictionary naming that distinguish
>version details.    Also, it is a DDL2 convention that aliases refer
>specifically to the same item of data (e.g. semantically equivalent
>items).
>
>John
>
>
>
>>>  As much as the idea may be to globally avoid data name clashes, it
>>>is not necessary to assume that they are successfully avoided.
>>>Rejecting that assumption not only protects against failures and
>>>policy changes in the CIF community, but it also makes DDLm a better
>>>candidate for adoption in disciplines with less central authority.
>>>Furthermore, although we do not need to follow DDL2 here, it does
>>>establish a precedent for scoping aliases to specific dictionaries.
>>>These are all good reasons to choose that, for DDLm's purposes,
>>>definition_id PLUS some form of dictionary identifier are required
>>>to uniquely identify an alias definition.  Are there good reasons to
>>>choose otherwise?
>>>
>
>
>On 2/1/11 5:59 PM, Herbert J. Bernstein wrote:
>>  Bottom line (literally)
>>
>>>  I see no reason why DDLm instance documents (i.e. dictionaries)
>>>  should have different presentation rules than the instance documents
>>>  they themselves describe.  Given a valid, possibly-denormalized
>>>  instance document and a dictionary with which it complies, it must
>>>  be possible to programmatically normalize the instance to the form
>>>  described by the dictionary (else the document contains
>>>  inconsistencies and therefore is invalid).  DDLm dictionaries are
>>>  instance documents of DDLm, so there is no need for different
>>>  behavior with respect to them.
>>>
>>>  Although I think the same applies to DDLm's own presentation, I am
>>>  concerned about what would happen if DDLm were presented in a
>>>  denormalized form that contained inconsistencies.  Rather than
>>>  expend continuing effort to ensure that a denormalized presentation
>>>  of DDLm remains consistent, I would rather expend effort to express
>>>  and maintain DDLm in its self-defined normalized form.  In any case,
>>>  I emphasize again that allowing a denormalized presentation is not
>>>  at all the same thing as defining a denormalized model.
>>>
>>>  None of the foregoing settles just what presentation rules DDLm
>>>  should actually require with respect to joined categories.  Should
>>>  denormalizing joins be permitted?  There is a cost/benefit analysis
>>>  to be performed here, but I'm not up to attempting it at the moment.
>>
>>  Which seems to leave us entirely with a matter of taste:  does
>>  anybody want to have a denormalized version of alias and its
>>  subcategories?  I am happy to do it either way.  I just need
>>  to use the sets.   If nobody else speaks up, I'll just make a
>>  guess and start programming on that basis.  We can then look
>>  at the result and figure out whether to use it or redo it in
>>  Madrid.
>>
>>      Herbert
>>
>>
>>  At 4:22 PM -0600 2/1/11, Bollinger, John C wrote:
>>>  Dear Herbert,
>>>
>>>  On Monday, January 31, 2011 3:09 PM, Herbert J. Bernstein wrote:
>>>
>>>>  At 1:20 PM -0600 1/31/11, Bollinger, John C wrote:
>>>
>>>  [...]
>>>
>>>>  This discussion began with adding what we were then calling styles
>>>>  to group related sets of tags.  One tag could have multiple styles.
>>>>  In normalized form, that would mean creating a new relation with
>>>>  the tags and the styles as components of a composite key, so the
>>>>  same key could be repeated with multiple styles and the same
>>>>  style could be repeated with multiple keys.
>>>
>>>  Indeed so.  This is what the ALIAS_DEFINITION_SET category provides
>>>  (by whichever name it's now going).
>>>
>>>>  Placing that directly in the alias category instead of
>>>>  in a separate relation _is_ a denormalization.
>>>
>>>  In a formal sense, I think you're saying that the result would not
>>>  satisfy second normal form because _alias.dictionary_uri would
>>>  depend on only part of the key (_alias.definition_id).  I agree.
>>>  That does rely on _alias.dictionary_uri not being part of a
>>>  candidate key, but the current definition assumes that.
>>>
>>>  If the only attributes were _alias.definition_id and
>>>  _alias.definition_set_id, however, and both were elements of the
>>>  key, then the category would comply even with domain-key normal
>>>  form.  One might in that case complain that the meaning of the ALIAS
>>>  category was changed, and that would be true, but it would be as
>>>  normalized as can be.
>>>
>>>>     You happen to
>>>>  have preferred to use the xref_code, but adding that to the
>>>>  alias category key is and was a denormalization.  In CIF, until
>>>>  now at least, COMCIFS has tried to maintain a global name space,
>>>>  with a given tag having one meaning across multiple dictionaries.
>>>>  That is why there is a prefix registration system, so adding
>>>>  the dictionary to the alias key should not be necessary.
>>>
>>>  So this is exactly one of the conversations I said we needed to
>>>  have: "What is the entity being modeled, and what assumptions are
>>>  being made about it?  [... T]his question could be framed as 'Should
>>>  a dictionary identifier be added to the ALIAS category key?'"  Thank
>>>  you for indulging me.
>>>
>>>  Xref_code, or some other dictionary identifier, is a different case
>>>  than definition_set_id.  Whereas there is no viable argument for
>>>  definition_set_id being part of a candidate key for ALIAS as that
>>>  category is currently defined, there *are* arguments for xref_code
>>>  being part of a candidate key.  We can choose how we want to model
>>>  things, but the decision is not arbitrary: it has technical,
>>>  semantic, and policy implications.
>>>
>>>  From a technical perspective, the question can be again reframed as
>>>  "does a definition_id determine the dictionary in which its
>>>  definition appears?"  Inasmuch as the definition does not presently
>>>  include dictionary_uri in the category key, DDLm as currently
>>>  constituted appears to say "yes."  I think that's erroneous.  At
>>>  minimum, COMCIFS' intention seems to be to redefine many mmCIF data
>>>  names in a DDLm dictionary, and Herbert has expressed plans to do
>>>  similarly for imgCIF.  Herbert nevertheless offers a contrasting
>>>  view:
>>>
>>>>  The idea in CIF is that you _don't_ use the same tag name with
>>>>  different meanings in different dictionaries, but with the introduction
>>>>  of DDL2 and mmCIF we ended up with 2 versions of the same core definitions
>>>>  having the same meanings but different tag names.  Thus we needed to
>>>>  have aliases to relate the DDL2 dotted notation versions of the
>>>>  tags to the DDL1 undotted notations of the tags.
>>>
>>>  I understand the original impetus for aliases.  Interpreting DDL2,
>>>  however, I conclude that the concept was broadened during
>>>  development, and that the assumption of data names having global
>>>  scope was intentionally avoided.  Others here were closer to the
>>>  process than I, but I observe that the description of the DDL2
>>>  ITEM_ALIASES category specifically says "Each alias name is
>>>  *identified by* the name and version of the dictionary to which it
>>>  belongs" (emphasis added).  Indeed, the category key is
>>>  (_item_aliases.alias_name, _item_aliases.dictionary,
>>>  _item_aliases.version).  That's even broader than anything currently
>>>  under discussion for DDLm.  ITG remarks that
>>>  "_item_aliases.dictionary [... is] provided to distinguish between
>>>  dictionaries [...]," which would not be necessary if a given data
>>>  name could be assumed to be defined in only one dictionary, or even
>>>  to be defined equivalently in every dictionary where it appears.
>>>
>>>  As much as the idea may be to globally avoid data name clashes, it
>>>  is not necessary to assume that they are successfully avoided.
>>>  Rejecting that assumption not only protects against failures and
>>>  policy changes in the CIF community, but it also makes DDLm a better
>>>  candidate for adoption in disciplines with less central authority.
>>>  Furthermore, although we do not need to follow DDL2 here, it does
>>>  establish a precedent for scoping aliases to specific dictionaries.
>>>  These are all good reasons to choose that, for DDLm's purposes,
>>>  definition_id PLUS some form of dictionary identifier are required
>>>  to uniquely identify an alias definition.  Are there good reasons to
>>>  choose otherwise?
>>>
>>>  Supposing that we do adopt the view that unique identification of
>>>  definitions requires at least definition_id and a dictionary
>>>  identifier, ALIAS is not even a proper relation unless a dictionary
>>>  identifier (such as xref_code) is added to the category key.
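
The key question above can be checked mechanically.  The following is a
rough sketch only: the helper name is_key and the sample rows are
hypothetical, and the dictionary labels dictA and dictB are placeholders.
It tests whether a set of columns takes a distinct combination of values
in every row, which is the minimum a category key must satisfy:

```python
def is_key(rows, columns):
    """True if the given columns are distinct, taken together, in every row."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical ALIAS rows in which one definition_id appears in two
# dictionaries (dictA and dictB are placeholder identifiers).
rows = [
    {"definition_id": "_cell_length_a", "xref_code": "dictA"},
    {"definition_id": "_cell_length_a", "xref_code": "dictB"},
    {"definition_id": "_cell.length_a", "xref_code": "dictB"},
]
```

With these rows, is_key(rows, ["definition_id"]) is False, while
is_key(rows, ["definition_id", "xref_code"]) is True.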
>>>
>>>  [...]
>>>
>>>>  I would be very happy having fully normalized DDLm dictionaries, but
>>>>  I can cope with denormalized dictionaries, just as I have to cope
>>>>  with denormalized datafiles -- indeed, for some search procedures,
>>>>  I deliberately denormalize dictionaries internally.  It
>>>>  sounds like John B. wants to stick to fully normalized DDLm dictionaries.
>>>
>>>  Hmm.  I would be happy to see dictionaries define data models that
>>>  comply with higher normalization forms, but that is a design
>>>  decision that should rest with their authors and maintainers.  I
>>>  would in particular like DDLm itself to describe a highly normalized
>>>  model for its own domain (dictionaries), though exactly which form
>>>  would be most appropriate is an open question.  Ensuring that DDLm
>>>  describes a well-normalized data model does not force other DDLm
>>>  dictionaries to describe equally normalized models.  *Presentation*
>>>  of these models, on the other hand, remains a separate issue,
>>>  discussed next.
>>>
>>>>  While this has some impact on software developers, it has very little
>>>>  direct impact on users -- so what do people think:
>>>>
>>>>      Should all DDLm dictionaries be fully normalized (if so, to
>>>>  which level of normalization) or
>>>>
>>>>      Should DDLm dictionaries be allowed the same flexibility as
>>>>  data files in being denormalized?
>>>
>>>  I see no reason why DDLm instance documents (i.e. dictionaries)
>>>  should have different presentation rules than the instance documents
>>>  they themselves describe.  Given a valid, possibly-denormalized
>>>  instance document and a dictionary with which it complies, it must
>>>  be possible to programmatically normalize the instance to the form
>>>  described by the dictionary (else the document contains
>>>  inconsistencies and therefore is invalid).  DDLm dictionaries are
>>>  instance documents of DDLm, so there is no need for different
>>>  behavior with respect to them.
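
A minimal sketch of such programmatic normalization follows.  It assumes
that definition_id alone determines dictionary_uri, as the ALIAS
definition discussed earlier in this thread implies.  The function name
normalize and the row layout are hypothetical.  Joined rows are split
into an alias relation and a set-membership relation, and the input is
rejected if it is inconsistent:

```python
def normalize(rows):
    """Split joined rows into two relations.

    Returns (alias, memberships), where alias maps definition_id to
    dictionary_uri and memberships is a sorted list of
    (definition_id, definition_set_id) pairs.  Raises ValueError if one
    definition_id is presented with two different dictionary_uri values.
    """
    alias = {}
    memberships = set()
    for row in rows:
        def_id = row["definition_id"]
        uri = row["dictionary_uri"]
        # setdefault stores the first uri seen and returns the stored one,
        # so a later mismatch exposes an inconsistent presentation.
        if alias.setdefault(def_id, uri) != uri:
            raise ValueError(f"inconsistent dictionary_uri for {def_id}")
        memberships.add((def_id, row["definition_set_id"]))
    return alias, sorted(memberships)
```

A denormalized presentation that repeats the same dictionary_uri on
every row for a given definition_id normalizes cleanly.  One that
disagrees with itself raises an error, which corresponds to the
"inconsistent and therefore invalid" case described above.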
>>>
>>>  Although I think the same applies to DDLm's own presentation, I am
>>>  concerned about what would happen if DDLm were presented in a
>>>  denormalized form that contained inconsistencies.  Rather than
>>>  expend continuing effort to ensure that a denormalized presentation
>>>  of DDLm remains consistent, I would rather expend effort to express
>>>  and maintain DDLm in its self-defined normalized form.  In any case,
>>>  I emphasize again that allowing a denormalized presentation is not
>>>  at all the same thing as defining a denormalized model.
>>>
>>>  None of the foregoing settles just what presentation rules DDLm
>>>  should actually require with respect to joined categories.  Should
>>>  denormalizing joins be permitted?  There is a cost/benefit analysis
>>>  to be performed here, but I'm not up to attempting it at the moment.
>>>
>>>
>>>  John
>>>
>>>  --
>>>  John C. Bollinger, Ph.D.
>>>  Department of Structural Biology
>>>  St. Jude Children's Research Hospital
>>>
>>>
>>>  Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>
>>>  _______________________________________________
>>>  ddlm-group mailing list
>>>  ddlm-group@iucr.org
>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>
>--
>******************************************************************
>    John Westbrook, Ph.D.
>    Rutgers, The State University of New Jersey
>    Department of Chemistry and Chemical Biology
>    610 Taylor Road
>    Piscataway, NJ 08854-8087
>    e-mail: jwest@rcsb.rutgers.edu
>    Ph:  (732) 445-4290  Fax: (732) 445-4320
>******************************************************************


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================
