Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .. .

Dear John B.

   The existing alias mechanism has been used in the past to carry actual 
information about tags from entry to entry and dictionary to dictionary, 
allowing one side or the other of such a definition to be less complete 
than it might otherwise have been, but also creating the sort of conflicts 
noted.  Any software aside, it is certainly confusing to users of the 
actual dictionaries to have such conflicting information around.  Just 
think of this as going back to Codd and trying to make sure we only have 
to update our data in one place and not several. It reduces the chance of 
error to gather the information about a group of tags with the same 
meaning, but different names in one place.

   As David and John W. note, there are important differences in data file 
construction rules, dare I say "styles", for DDL1, DDL2 and DDLm 
dictionary-based data files.  However, once we have a tag-by-tag style 
identification so we get the right tags, there is also no reason we cannot 
give our validators and writers knowledge of which of the several overall 
styles we are trying to conform to, even the ever-troubling difference in 
approach as to what can and cannot be looped, and whether the use of the 
period for category identification is mandatory (DDL2), optional (DDLm) or 
not normally used (DDL1).
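
   As a rough illustration of tag-by-tag style identification, here is a
minimal Python sketch of a validator that accepts only the data names
belonging to one chosen style.  Everything in it is an assumption made
for illustration: the tag_style labels, the placeholder data names
(_widget.size and its aliases) and the function names are hypothetical
and do not come from any existing dictionary or software.  It covers only
name selection, not the looping or period rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alias:
    name: str       # the aliased data name
    tag_style: str  # hypothetical style label, e.g. "DDL1" or "DDL2"


# Hypothetical DDLm definition id mapped to its aliases (placeholder names).
DEFINITIONS = {
    "_widget.size": [
        Alias("_widget_size", "DDL1"),
        Alias("_widget.size_value", "DDL2"),
    ],
}


def accepted_names(definitions, style):
    """Return the set of data names considered valid under `style`.

    For style "DDLm" only the definition ids themselves are accepted;
    for any other style only aliases labelled with that style are accepted.
    """
    if style == "DDLm":
        return set(definitions)
    return {
        alias.name
        for aliases in definitions.values()
        for alias in aliases
        if alias.tag_style == style
    }


def validate(tags, definitions, style):
    """Return the tags that are not valid under `style`, in input order."""
    ok = accepted_names(definitions, style)
    return [tag for tag in tags if tag not in ok]
```

Under these assumptions, validate(["_widget_size", "_widget.size"],
DEFINITIONS, "DDL1") reports only the DDLm name as a failure, which is the
strict DDL1 behaviour described in this thread.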

   David has made a nice case for even more precise detail in the
alias category.  Rather than trying to overload the URI or depend
on other external resources, I urge that we provide the details
we need in the alias category itself.  Extending
the key is an interesting issue.  We may need to add the
implicit concept from DDL2 to DDLm or to make a subcategory.

> I think you slightly mischaracterize the problem here, which is to 
> maintain and use two or more related sub-dictionaries within the 
> framework of one DDLm dictionary.  The fact that one uses DDL1 formalism 
> and the other DDL2 formalism is a distinction that could be used in this 
> particular case, but not in general.  There is no reason to suppose that 
> any distinction weaker than dictionary identifiers (i.e. 
> _alias.dictionary_uri) would suffice.

Umm, I really don't follow your logic.  I agree that DDL1 versus
DDL2 is _not_ a sufficient range of possible distinctions.  How
does it then follow that the dictionary_uri will do the job?
Especially once we start acquiring multipurpose dictionaries,
the dictionary_uri becomes impossible to use for this purpose,
and very confusing if that URI leads to other URIs.  It is a
lot simpler to just put the information needed directly in
the alias category.

The rest of what you are saying is that, in addition to using
the dictionary_uri as a real URI, it should also be overloaded
as a style, and that all dictionaries are registered with the
IUCr, so no confusion will result. I just checked the IUCr
web page and it does not have the very critically important
PDBx dictionary from wwPDB, and with the DDLm import mechanism
we are likely to end up with a very large number of cached
variants of dictionaries in various states of assembly.  Why
burden the IUCr with trying to untangle all that just to avoid
putting the real information we need (or at least I need) in
the alias category?

   I would suggest we give David the extra alias category tags
he is asking for as well as my tag_style identifier, with
reasonable default assumptions when they are not used.  If
I fail in what David calls my "noble" goal of DDLm-ing the
imgCIF dictionary and he fails in whatever use he makes of
the extended alias category, what will it have cost those
who choose not to use these features in their dictionaries?
If we succeed, on the other hand, you will have a few extra
useful tools you might decide to use in the future.
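
   To make the combination concrete, the following Python sketch shows
one possible shape for an alias table keyed on the pair (dictionary_uri,
definition_id), with an optional tag_style that falls back to a default
when omitted.  The class, the "urn:example:..." URIs, the placeholder data
names and especially the default rule (a period in the name suggests
DDL2, otherwise DDL1) are assumptions chosen for illustration; the thread
does not specify what the defaults should be.

```python
from typing import Dict, List, Optional, Tuple

AliasKey = Tuple[str, str]  # (dictionary_uri, definition_id)


def default_style(definition_id: str) -> str:
    """Assumed fallback: a period in the data name suggests DDL2, else DDL1."""
    return "DDL2" if "." in definition_id else "DDL1"


class AliasTable:
    """Alias rows keyed on (dictionary_uri, definition_id)."""

    def __init__(self) -> None:
        self._rows: Dict[AliasKey, str] = {}

    def add(self, dictionary_uri: str, definition_id: str,
            tag_style: Optional[str] = None) -> None:
        key = (dictionary_uri, definition_id)
        if key in self._rows:
            raise ValueError(f"duplicate alias key: {key}")
        # An explicit tag_style wins; otherwise apply the assumed default.
        self._rows[key] = tag_style or default_style(definition_id)

    def names_with_style(self, style: str) -> List[str]:
        """Return the distinct alias names carrying `style`, sorted."""
        return sorted({name for (_, name), s in self._rows.items() if s == style})


table = AliasTable()
table.add("urn:example:dict-a", "_widget_size")
table.add("urn:example:dict-b", "_widget_size")  # same name, other dictionary
table.add("urn:example:dict-b", "_widget.size_value")
table.add("urn:example:dict-c", "_pdbx_widget.size", tag_style="prefixed")
```

With the composite key, the same data name may be listed once per source
dictionary, while a dictionary that never sets tag_style still gets a
usable style from the default rule.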


   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Thu, 20 Jan 2011, Bollinger, John C wrote:

>
> On Thursday, January 20, 2011 4:09 AM, Herbert J. Bernstein wrote:
>
>>   If a DDLm dictionary is to be a fully functional replacement for, 
>> say, a DDL1 dictionary, a dictionary against which one can validate the 
>> use of purely DDL1 tags, we need a way to not only specify the desired 
>> DDL1 tag as an alias to the DDLm tag used in the dictionary, but also 
>> to specify that we do _not_ want to accept the DDLm tag used as the 
>> save frame name as the valid name.  As David has noted, in order not to 
>> still be maintaining both a DDL1 and a DDLm dictionary, we want this 
>> information _in_ the DDLm dictionary, so simply aliasing back to some 
>> other DDL1 dictionary to use it as a way to say -- "use that dictionary 
>> URI as the style indicator" is suboptimal. Worse, it is a source of 
>> future errors and confusion in that it is defining properties of the tag 
>> that may end up disagreeing with the properties we wish to actually have 
>> that we defined in the DDLm dictionary.
>
>
> We appear to have an important difference of understanding here, which 
> requires a clarification of the semantics of the DDLm aliasing.  My 
> inference from the somewhat terse definitions of these attributes is 
> that _alias.dictionary_uri serves primarily as provenance information 
> and a namespace identifier.  I do not take it that the actual 
> definition, if any, in the referenced dictionary is intended to 
> contribute anything to the DDLm definition of the item.  The DDLm 
> definition is self-contained, and it is the responsibility of the author 
> of the DDLm dictionary to ensure that his definition is consistent with 
> those of the aliased items.  To the extent that there are any dictionary 
> compatibility problems involved, we already have them.
>
>
>> OK, so far, so good -- all we need then is John B.'s tag-by-tag style 
>> preference flag to say, for this dictionary we want to be DDL1'ish.
>
>
> I don't think you are interpreting my suggestion as I intended, but I'll 
> comment more fully on that elsewhere, if necessary.
>
>
>> Ah, but now we say, we are in the situation of maintaining the core 
>> (David's problem) in which we have to maintain a dictionary for 
>> validation against both DDL1 and DDL2 tag names.  Now there are times 
>> when we wish the DDL1 alias to be the preferred alias and for both the 
>> DDL2 and DDLm tags to fail a validation check and other times when we 
>> wish the DDL2 alias to be the preferred alias and for both the DDL1 and 
>> DDLm tags to fail a validation check. Now it becomes simpler to just 
>> have a common style key, such as "DDL1" or "DDL2" and to select just 
>> the way we do for alternate conformers on that key.
>
>
> I think you slightly mischaracterize the problem here, which is to 
> maintain and use two or more related sub-dictionaries within the 
> framework of one DDLm dictionary.  The fact that one uses DDL1 formalism 
> and the other DDL2 formalism is a distinction that could be used in this 
> particular case, but not in general.  There is no reason to suppose that 
> any distinction weaker than dictionary identifiers (i.e. 
> _alias.dictionary_uri) would suffice.
>
> Consider, for instance, bringing the symmetry dictionary also into the 
> combined, DDLm-form core and mmCIF dictionary.  If I want to select only 
> symCIF aliases, or only mmCIF aliases for that matter, then how would I 
> do it?  I could define tag_styles that serve, but at that point those 
> tag styles are filling exactly the same role that the dictionary URIs 
> could and would.  Dictionary URIs will always suffice for this 
> particular job, however, because they express exactly the distinction 
> that is required.
>
> That's why I asked for use cases that don't map onto distinguishing tags 
> based on dictionary or attributes.
>
>
>> OK, that was not so bad, but now we are at, say, the PDB and in 
>> addition to having DDL1 and DDL2 style tags from the core, we also have 
>> prefixed tags (pdbx) that should eventually get promoted to be 
>> prefix-free.  Now we can use the styles to validate for strict use of 
>> the prefixes when we are producing output that we want to be certain 
>> actually does use the prefixes, or relax the validation to allow both 
>> the prefixed and promoted tags, or go strict again on the far side to 
>> be sure we are only producing promoted tags.
>
>
> So here we are getting into potential use cases such as I had requested, 
> but this one by itself doesn't yet persuade me.  As I study the topic, I 
> suspect I am becoming harder to persuade.  I have realized that "group 
> of tags" is a fairly good minimal description of "dictionary" in the 
> sense that we are (I am) using the term, so I am having more difficulty 
> seeing the tag_style proposal as introducing anything new.  In this 
> particular case, I don't see why pdbx tag aliases should not anyway have 
> their own distinguishing dictionary_uri, and if they did, I don't see 
> why that dictionary URI would not support all of the proposed operations 
> just as well as tag_style might.  Even if they didn't, the pdbx alias is 
> a characteristic of the tags themselves, so there are multiple 
> straightforward ways that an application could perform the PDB-specific 
> validation you describe without relying on tag_style or dictionary_uri.
>
>
>> Note that none of these style based input validation choices are based 
>> on the choice of dictionary -- it is one dictionary, so it does not 
>> really help to be maintaining the styles dictionary by dictionary. 
>> The grain of identification is too coarse, and involves multiple 
>> maintenance issues when in reality only one, nice new, DDLm dictionary 
>> needs to be maintained.
>
>
> I see no maintenance issue here.  The DDLm dictionary could indeed be 
> the only one maintained, and to the extent that stand-alone versions of 
> its sub-dictionaries were desired, they could be generated 
> programmatically from the DDLm version -- provided that we retain a 
> mechanism for identifying which aliases represent tags in which 
> sub-dictionary.  The _alias.dictionary_uri attribute does that nicely.
>
> To some extent, this argument seems to revolve around the idea of a 
> dictionary URI necessarily referring to a physical, independently 
> addressable dictionary.  As I said before, I see no reason to place that 
> limitation on the item's use.  Not restricting it in that way would 
> provide considerable freedom, and I propose that we in fact do clarify 
> that DDLm places no such restriction.
>
> To the limited extent that there might be any need to retrieve the 
> source dictionaries of aliases, the IUCr dictionary register already 
> provides a mechanism for doing so.  If it were desired to record that 
> information directly in DDLm dictionaries, then it would be useful to 
> distinguish identifier from location, as XML Schema does (namespace URI 
> vs. schema location), and to record the location associated with each 
> identifier once per DDLm dictionary rather than at every use of the 
> identifier.
>
>
>> On the output side, essentially the same issues arise, but there are 
>> fewer users, but as I said, it is a harmless addition to the DDLm spec 
>> for those who do not wish to be aware of it, and for those of us for 
>> whom it is useful, it really is useful.
>
>
> On the output side, the same congruence between tag_style and 
> dictionary_uri still applies.
>
> Users do not need to be aware of the feature to be negatively affected 
> by its inclusion.
>
> As for whether it is useful, as far as I am concerned that depends on 
> whether an essential and useful difference between tag_style and 
> dictionary_uri can be drawn.  So far, all the proposed uses that have 
> been raised seem natural fits for dictionary_uri.
>
>
>> The fundamental disagreement is on whether we will have to have a DDL1 
>> dictionary, a DDL2 dictionary, a DDLm dictionary, a prefix dictionary, 
>> etc., and plant them on assorted web sites, or just one DDLm dictionary 
>> that handles everything and can be local or remote or in local and 
>> remote pieces without changing the behavior of the validation or of the 
>> output.
>
>
> No, I think at this point the fundamental disagreement is about the 
> meaning and semantics of _alias.dictionary_uri.  As I conceive it, use 
> of a dictionary URI to group aliases serves every purpose so far 
> proposed for tag_style, cleanly and naturally, *and does not require or 
> imply independent existence or maintenance of any other dictionary*. 
> It is in fact exceedingly similar to my current understanding of 
> tag_style, especially when tag_style is paired with a registry of 
> allowed values.  Thus arises my strengthening objection to adding a new 
> attribute that to me appears redundant.
>
> Let's settle the question of _alias.dictionary_uri first.  That will be 
> worthwhile in its own right, and the results will bear directly on 
> whether a new attribute is warranted.  Specifically, I propose the 
> following DDLm changes:
>
> 1) In the ALIAS category, attribute _category_key.generic is replaced 
> by:
>    _category_key.primitive [ '_alias.dictionary_uri' '_alias.definition_id' ]
>
> This is useful to cover cases where the same data name appears in more 
> than one dictionary, and we want to mark both appearances as aliases of 
> the defined item.  I raise it in this context, however, because it 
> emphasizes _alias.dictionary_uri's use as a namespace for the alias. 
> In conjunction with this change, it might be appropriate to change the 
> _type.purpose value for one of these tags.
>
> 2) The definition text for _alias.dictionary_uri is amended to 
> "Specifies the universal resource identifier of the abstract or physical 
> dictionary containing the definition of an item aliased to the item in 
> the current definition.  This serves to categorize and fully identify 
> the alias, but does not imply that the URI can be used to retrieve a 
> physical dictionary defining it."
>
> This clarifies the attribute's meaning in the direction that makes the 
> most sense to me.  Alternative clarifications are possible, of course.
>
>
> Best Regards,
>
> John
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
>
>
> Email Disclaimer:  www.stjude.org/emaildisclaimer
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>