Re: [ddlm-group] DDLm aliases (subject changed). .. .. .. .. .. .... .
- To: ddlm-group@iucr.org
- Subject: Re: [ddlm-group] DDLm aliases (subject changed). .. .. .. .. .. .... .
- From: John Westbrook <jwest@rcsb.rutgers.edu>
- Date: Tue, 01 Feb 2011 19:23:18 -0500
Hi Herb,

I am trying to follow your discussion with respect to the appropriate
basis for alias. From my perspective, it is important to retain
item_name, dictionary name, and version as the identifier. There are
currently no conventions on dictionary naming that distinguish version
details. Also, it is a DDL2 convention that aliases refer specifically
to the same item of data (e.g. semantically equivalent items).

John

>> As much as the idea may be to globally avoid data name clashes, it
>> is not necessary to assume that they are successfully avoided.
>> Rejecting that assumption not only protects against failures and
>> policy changes in the CIF community, but it also makes DDLm a better
>> candidate for adoption in disciplines with less central authority.
>> Furthermore, although we do not need to follow DDL2 here, it does
>> establish a precedent for scoping aliases to specific dictionaries.
>> These are all good reasons to choose that, for DDLm's purposes,
>> definition_id PLUS some form of dictionary identifier are required
>> to uniquely identify an alias definition. Are there good reasons to
>> choose otherwise?

On 2/1/11 5:59 PM, Herbert J. Bernstein wrote:
> Bottom line (literally)
>
>> I see no reason why DDLm instance documents (i.e. dictionaries)
>> should have different presentation rules than the instance documents
>> they themselves describe. Given a valid, possibly-denormalized
>> instance document and a dictionary with which it complies, it must
>> be possible to programmatically normalize the instance to the form
>> described by the dictionary (else the document contains
>> inconsistencies and therefore is invalid). DDLm dictionaries are
>> instance documents of DDLm, so there is no need for different
>> behavior with respect to them.
>>
>> Although I think the same applies to DDLm's own presentation, I am
>> concerned about what would happen if DDLm were presented in a
>> denormalized form that contained inconsistencies.
>> Rather than expend continuing effort to ensure that a denormalized
>> presentation of DDLm remains consistent, I would rather expend effort
>> to express and maintain DDLm in its self-defined normalized form. In
>> any case, I emphasize again that allowing a denormalized presentation
>> is not at all the same thing as defining a denormalized model.
>>
>> None of the foregoing settles just what presentation rules DDLm
>> should actually require with respect to joined categories. Should
>> denormalizing joins be permitted? There is a cost/benefit analysis
>> to be performed here, but I'm not up to attempting it at the moment.
>
> Which seems to leave us entirely with a matter of taste: does
> anybody want to have a denormalized version of alias and its
> subcategories? I am happy to do it either way. I just need
> to use the sets. If nobody else speaks up, I'll just make a
> guess and start programming on that basis. We can then look
> at the result and figure out whether to use it or redo it in
> Madrid.
>
> Herbert
>
>
> At 4:22 PM -0600 2/1/11, Bollinger, John C wrote:
>> Dear Herbert,
>>
>> On Monday, January 31, 2011 3:09 PM, Herbert J. Bernstein wrote:
>>
>>> At 1:20 PM -0600 1/31/11, Bollinger, John C wrote:
>>
>> [...]
>>
>>> This discussion began with adding what we were then calling styles
>>> to group related sets of tags. One tag could have multiple styles.
>>> In normalized form, that would mean creating a new relation with
>>> the tags and the styles as components of a composite key, so the
>>> same key could be repeated with multiple styles and the same
>>> style could be repeated with multiple keys.
>>
>> Indeed so. This is what the ALIAS_DEFINITION_SET category provides
>> (by whichever name it's now going).
>>
>>> Placing that directly in the alias category instead of
>>> in a separate relation _is_ a denormalization.
>>
>> In a formal sense, I think you're saying that the result would not
>> satisfy second normal form because _alias.dictionary_uri would
>> depend on only part of the key (_alias.definition_id). I agree.
>> That does rely on _alias.dictionary_uri not being part of a
>> candidate key, but the current definition assumes that.
>>
>> If the only attributes were _alias.definition_id and
>> _alias.definition_set_id, however, and both were elements of the
>> key, then the category would comply even with domain-key normal
>> form. One might in that case complain that the meaning of the ALIAS
>> category was changed, and that would be true, but it would be as
>> normalized as can be.
>>
>>> You happen to
>>> have preferred to use the xref_code, but adding that to the
>>> alias category key is and was a denormalization. In CIF, until
>>> now at least, COMCIFS has tried to maintain a global name space,
>>> with a given tag having one meaning across multiple dictionaries.
>>> That is why there is a prefix registration system, so adding
>>> the dictionary to the alias key should not be necessary.
>>
>> So this is exactly one of the conversations I said we needed to
>> have: "What is the entity being modeled, and what assumptions are
>> being made about it? [... T]his question could be framed as 'Should
>> a dictionary identifier be added to the ALIAS category key?'" Thank
>> you for indulging me.
>>
>> Xref_code, or some other dictionary identifier, is a different case
>> than definition_set_id. Whereas there is no viable argument for
>> definition_set_id being part of a candidate key for ALIAS as that
>> category is currently defined, there *are* arguments for xref_code
>> being part of a candidate key. We can choose how we want to model
>> things, but the decision is not arbitrary: it has technical,
>> semantic, and policy implications.
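
The second-normal-form point quoted above can be made concrete with a small check. The following Python sketch is an illustration only and is not part of the original message. It tests whether one set of attributes functionally determines another in a table of rows. The attribute names follow the thread; the row values, the dictionary names (`dict_A.dic`, `dict_B.dic`), and the helper `determines` are hypothetical.

```python
def determines(rows, lhs, rhs):
    """Return True if the attributes in `lhs` functionally determine `rhs`.

    `rows` is a list of dicts; `lhs` is a list of attribute names and
    `rhs` is a single attribute name.
    """
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in lhs)
        # setdefault stores the first value seen for this key and returns it;
        # a later, different value means the dependency does not hold.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical ALIAS rows whose key is (definition_id, definition_set_id).
alias_rows = [
    {"definition_id": "_example_tag_a", "definition_set_id": "set1",
     "dictionary_uri": "dict_A.dic"},
    {"definition_id": "_example_tag_a", "definition_set_id": "set2",
     "dictionary_uri": "dict_A.dic"},
    {"definition_id": "_example.tag_a", "definition_set_id": "set1",
     "dictionary_uri": "dict_B.dic"},
]

# Does part of the key already determine dictionary_uri?
partial_dependency = determines(alias_rows, ["definition_id"], "dictionary_uri")
```

If `definition_id` alone determines `dictionary_uri` while the key is the pair (`definition_id`, `definition_set_id`), the dependency is on only part of the key, which is the situation described as failing second normal form. If the same `definition_id` may legitimately appear in two dictionaries, the check returns False and a dictionary identifier would have to join the key.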
>>
>> From a technical perspective, the question can be again reframed as
>> "does a definition_id determine the dictionary in which its
>> definition appears?" Inasmuch as the definition does not presently
>> include dictionary_uri in the category key, DDLm as currently
>> constituted appears to say "yes." I think that's erroneous. At
>> minimum, COMCIFS' intention seems to be to redefine many mmCIF data
>> names in a DDLm dictionary, and Herbert has expressed plans to do
>> similarly for imgCIF. Herbert nevertheless offers a contrasting
>> view:
>>
>>> The idea in CIF is that you _don't_ use the same tag name with
>>> different meanings in different dictionaries, but with the introduction
>>> of DDL2 and mmCIF we ended up with 2 versions of the same core definitions
>>> having the same meanings but different tag names. Thus we needed to
>>> have aliases to relate the DDL2 dotted notation versions of the
>>> tags to the DDL1 undotted notations of the tags.
>>
>> I understand the original impetus for aliases. Interpreting DDL2,
>> however, I conclude that the concept was broadened during
>> development, and that the assumption of data names having global
>> scope was intentionally avoided. Others here were closer to the
>> process than I, but I observe that the description of the DDL2
>> ITEM_ALIASES category specifically says "Each alias name is
>> *identified by* the name and version of the dictionary to which it
>> belongs" (emphasis added). Indeed, the category key is
>> (_item_aliases.alias_name, _item_aliases.dictionary,
>> _item_aliases.version). That's even broader than anything currently
>> under discussion for DDLm. ITG remarks that
>> "_item_aliases.dictionary [... is] provided to distinguish between
>> dictionaries [...]," which would not be necessary if a given data
>> name could be assumed to be defined in only one dictionary, or even
>> to be defined equivalently in every dictionary where it appears.
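
As an illustration of the three-part key quoted above (alias name, dictionary, version), here is a minimal Python sketch that is not part of the original message. It only shows why a composite key lets the same alias name map to different items in different dictionaries. The tag names, dictionary names, versions, and the `resolve` helper are all hypothetical.

```python
from typing import NamedTuple, Optional


class AliasKey(NamedTuple):
    """Composite key modelled on the DDL2 ITEM_ALIASES category key."""
    alias_name: str
    dictionary: str
    version: str


# Hypothetical alias table: the same alias_name appears under two
# dictionaries and resolves to a different item in each.
ALIASES = {
    AliasKey("_example_tag_a", "dict_A", "1.0"): "_example.tag_a",
    AliasKey("_example_tag_a", "dict_B", "2.0"): "_other.tag_a",
}


def resolve(alias_name: str, dictionary: str, version: str) -> Optional[str]:
    """Return the item an alias refers to, or None if no such alias exists."""
    return ALIASES.get(AliasKey(alias_name, dictionary, version))
```

With a key of `alias_name` alone, the second entry would silently overwrite the first. Under the global-name-space assumption that overwrite could never occur, so the extra key components matter only if that assumption is allowed to fail.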
>>
>> As much as the idea may be to globally avoid data name clashes, it
>> is not necessary to assume that they are successfully avoided.
>> Rejecting that assumption not only protects against failures and
>> policy changes in the CIF community, but it also makes DDLm a better
>> candidate for adoption in disciplines with less central authority.
>> Furthermore, although we do not need to follow DDL2 here, it does
>> establish a precedent for scoping aliases to specific dictionaries.
>> These are all good reasons to choose that, for DDLm's purposes,
>> definition_id PLUS some form of dictionary identifier are required
>> to uniquely identify an alias definition. Are there good reasons to
>> choose otherwise?
>>
>> Supposing that we do adopt the view that unique identification of
>> definitions requires at least definition_id and a dictionary
>> identifier, ALIAS is not even a proper relation unless a dictionary
>> identifier (such as xref_code) is added to the category key.
>>
>> [...]
>>
>>> I would be very happy having fully normalized DDLm dictionaries, but
>>> I can cope with denormalized dictionaries, just as I have to cope
>>> with denormalized datafiles -- indeed, for some search procedures,
>>> I deliberately denormalize dictionaries internally. It
>>> sounds like John B. wants to stick to fully normalized DDLm dictionaries.
>>
>> Hmm. I would be happy to see dictionaries define data models that
>> comply with higher normalization forms, but that is a design
>> decision that should rest with their authors and maintainers. I
>> would in particular like DDLm itself to describe a highly normalized
>> model for its own domain (dictionaries), though exactly which form
>> would be most appropriate is an open question. Ensuring that DDLm
>> describes a well-normalized data model does not force other DDLm
>> dictionaries to describe equally normalized models. *Presentation*
>> of these models, on the other hand, remains a separate issue,
>> discussed next.
>>
>>> While this has some impact on software developers, it has very little
>>> direct impact on users -- so what do people think:
>>>
>>> Should all DDLm dictionaries be fully normalized (if so, to which level
>>> of normalization) or
>>>
>>> Should DDLm dictionaries be allowed the same flexibility as
>>> data files in being denormalized?
>>
>> I see no reason why DDLm instance documents (i.e. dictionaries)
>> should have different presentation rules than the instance documents
>> they themselves describe. Given a valid, possibly-denormalized
>> instance document and a dictionary with which it complies, it must
>> be possible to programmatically normalize the instance to the form
>> described by the dictionary (else the document contains
>> inconsistencies and therefore is invalid). DDLm dictionaries are
>> instance documents of DDLm, so there is no need for different
>> behavior with respect to them.
>>
>> Although I think the same applies to DDLm's own presentation, I am
>> concerned about what would happen if DDLm were presented in a
>> denormalized form that contained inconsistencies. Rather than
>> expend continuing effort to ensure that a denormalized presentation
>> of DDLm remains consistent, I would rather expend effort to express
>> and maintain DDLm in its self-defined normalized form. In any case,
>> I emphasize again that allowing a denormalized presentation is not
>> at all the same thing as defining a denormalized model.
>>
>> None of the foregoing settles just what presentation rules DDLm
>> should actually require with respect to joined categories. Should
>> denormalizing joins be permitted? There is a cost/benefit analysis
>> to be performed here, but I'm not up to attempting it at the moment.
>>
>>
>> John
>>
>> --
>> John C. Bollinger, Ph.D.
>> Department of Structural Biology
>> St. Jude Children's Research Hospital
>>
>>
>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>

--
******************************************************************
John Westbrook, Ph.D.
Rutgers, The State University of New Jersey
Department of Chemistry and Chemical Biology
610 Taylor Road
Piscataway, NJ 08854-8087
e-mail: jwest@rcsb.rutgers.edu
Ph: (732) 445-4290 Fax: (732) 445-4320
******************************************************************
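
The argument quoted in this message, that a valid denormalized presentation must be programmatically normalizable and that an inconsistent one is invalid, can be sketched in a few lines of Python. This is an illustration only, not part of the original thread. It assumes a denormalized ALIAS presentation in which `dictionary_uri` is repeated on every (`definition_id`, `definition_set_id`) row. The function name `normalize` and the sample values are hypothetical.

```python
def normalize(rows):
    """Split denormalized (definition_id, definition_set_id, dictionary_uri)
    rows into two relations, rejecting inconsistent input.

    Returns (alias, alias_sets) where `alias` maps definition_id to its
    dictionary_uri and `alias_sets` is a set of (definition_id, set_id) pairs.
    """
    alias = {}
    alias_sets = set()
    for definition_id, set_id, dictionary_uri in rows:
        # The repeated attribute must agree on every row for the same key part.
        if alias.setdefault(definition_id, dictionary_uri) != dictionary_uri:
            raise ValueError(
                f"inconsistent dictionary_uri for {definition_id!r}"
            )
        alias_sets.add((definition_id, set_id))
    return alias, alias_sets


# Hypothetical consistent, denormalized presentation.
rows = [
    ("_example_tag_a", "set1", "dict_A.dic"),
    ("_example_tag_a", "set2", "dict_A.dic"),
    ("_example_tag_b", "set1", "dict_A.dic"),
]
alias, alias_sets = normalize(rows)
```

The only work the normalized form saves is the consistency check inside the loop, which is the continuing maintenance effort the quoted text prefers to avoid for DDLm itself.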