
Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .

Dear John,

   Clearly there will be cases in which a true DDL1 dictionary
will be necessary for DDL1 validation and a true DDL2 dictionary will be
necessary for DDL2 validation, but I think there is a significant
possibility of being able to create a single DDLm dictionary that
can be used in different modes, but with exactly the same text
and methods in the dictionary itself, for true DDL1, true DDL2 and DDLm
validation.  Personally, my intention is to do precisely that
for imgCIF, so that there will be one new DDLm imgCIF dictionary
that can be used to validate existing DDL2-style CBFs, that
will be suitable to validate new CIF2-style CBFs, and, as
a bonus, will add the ability to have DDL1-style CBFs.  You may
or may not choose to do that for the mmCIF dictionary and PDB files that
you generate, but I suggest you may find it useful to make a
DDLm mmCIF dictionary that can be used to validate the wild
and hairy varieties (styles) of CIFs that depositors are likely to come
up with to give you data.

A disadvantage of the current alias mechanism is that it
is unclear about the priority to be given to tags from
other dictionaries.   That is fine when all you want from
the input validation is to do a translation of the foreign
dictionary tags to the local equivalents, but in the world
we are now entering it is likely to be very useful to be
able to work in that older mode, but also to be able to
validate against the style of the older dictionaries without
having to keep maintaining them, and, when filling in missing
values, to be able to generate the old-style tags rather
than the new-style tags, so as not to give indigestion to legacy
software that does not know about DDLm.  Perhaps 10 or 20
years from now, all data and software will have been converted
to just use DDLm style tags, but in the meantime, flexible
alias mechanisms will help with the transition.  _alias.tag_style
makes the alias mechanism more flexible and less dependent
on keeping and maintaining the old dictionaries.  It is just
an option.  Perhaps it will only be used in the imgCIF
dictionary, but it is a useful option, and I suspect it
will get used elsewhere as well.
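
To make the idea concrete, here is a minimal sketch in Python of how a
style key might drive both output and strict validation from a single
dictionary.  Everything in it is an assumption for illustration only: the
ALIASES table, the function names and the in-memory layout are
hypothetical and are not part of any DDLm specification or existing
library.  The tag names are the ones quoted further down in this thread,
and treating _atom_site.Cartn_z as the DDLm name is likewise only an
assumption for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory view of the alias entries of one DDLm dictionary:
# DDLm tag -> {tag style -> alias}.  A style with no entry means the DDLm
# name already follows that style's conventions.
ALIASES: Dict[str, Dict[str, str]] = {
    "_diffrn_standards_decay_percent": {
        "DDL1": "_diffrn_standards_decay_%",
        "DDL2": "_diffrn_standards.decay_%",
    },
    # Assumed here to be a DDLm name that already looks like DDL2,
    # so only a DDL1-style alias is listed.
    "_atom_site.Cartn_z": {
        "DDL1": "_atom_site_Cartn_z",
    },
}


def output_tag(ddlm_tag: str, style: Optional[str] = None) -> str:
    """Return the name to write for ddlm_tag under the chosen style.

    With no style, or with no alias for that style, fall back to the
    DDLm name itself.
    """
    if style is None:
        return ddlm_tag
    return ALIASES.get(ddlm_tag, {}).get(style, ddlm_tag)


def accepted_tags(style: Optional[str] = None) -> Set[str]:
    """Names that pass a strict check for the chosen style."""
    return {output_tag(tag, style) for tag in ALIASES}


def rejected(tags: List[str], style: Optional[str] = None) -> List[str]:
    """Return the tags that fail strict validation for the chosen style."""
    allowed = accepted_tags(style)
    return [tag for tag in tags if tag not in allowed]
```

The point of the sketch is only that the choice is made tag by tag
within one dictionary, so nothing has to be fetched from, or kept in
step with, a separate DDL1 or DDL2 dictionary.  It compares names
exactly, so a real implementation would still need to handle the
case-insensitivity of CIF data names.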

Regards,
   Herbert


=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Thu, 20 Jan 2011, John Westbrook wrote:

>
> Herbert and David,
>
> Could I ask for some clarification on the requirements for the aliasing
> mechanism?     In particular, is this intended to provide more than naming
> correspondence between the current dictionary and some prior dictionary?
> In DDL2 we have used ITEM_ALIASES as in the example below to provide name
> correspondences between our dictionary, the CIF core dictionary, and other
> recognized variant dictionaries.   The image dictionary has done this
> similarly I believe.
>
> I am confused by the last set of messages about how this will be used
> backwardly for validation.   The semantics of new dictionaries may
> enforce a new potentially stricter set of rules that are not necessarily
> backwardly compatible.   I am wondering what is expected here.
>
> John
>
>
>
> save__atom_site.Cartn_z
>     _item_description.description
> ;              The z atom-site coordinate in angstroms specified according to
>                a set of orthogonal Cartesian axes related to the cell axes as
>                specified by the description given in
>                _atom_sites.Cartn_transform_axes.
> ;
>     _item.name                  '_atom_site.Cartn_z'
>     _item.category_id             atom_site
>     _item.mandatory_code          no
>     _item_aliases.alias_name    '_atom_site_Cartn_z'
>     _item_aliases.dictionary      cif_core.dic
>     _item_aliases.version         2.0.1
>     loop_
>     _item_dependent.dependent_name
>                                 '_atom_site.Cartn_x'
>                                 '_atom_site.Cartn_y'
>     _item_related.related_name  '_atom_site.Cartn_z_esd'
>     _item_related.function_code   associated_esd
>     _item_sub_category.id         cartesian_coordinate
>     _item_type.code               float
>     _item_type_conditions.code    esd
>     _item_units.code              angstroms
>      save_
>
> On 1/20/11 6:10 AM, Herbert J. Bernstein wrote:
>> Dear Colleagues,
>>
>> There is an important part of James' suggestions that,
>> if Brian is willing, I think it would be a good idea
>> to add to the _alias.tag_style proposal, and that is
>> a central registry of styles to facilitate dictionary
>> merging. The ground rules would be:
>>
>> COMCIFS approval for any style, such as DDL1, DDL2,
>> DDLm, etc., unless prefixed by a prefix from Brian's
>> prefix registry, e.g. pdbx_. The special prefix
>> local_ could be used for styles for use purely
>> locally, i.e. for private dictionaries for which
>> collisions on merging are not a concern.
>>
>> Regards,
>> Herbert
>>
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
>>
>> On Thu, 20 Jan 2011, Herbert J. Bernstein wrote:
>>
>>> Dear Colleagues,
>>>
>>> If a DDLm dictionary is to be a fully functional replacement
>>> for, say, a DDL1 dictionary, a dictionary against which one
>>> can validate the use of purely DDL1 tags, we need a way to
>>> not only specify the desired DDL1 tag as an alias to the
>>> DDLm tag used in the dictionary, but also to specify that
>>> we do _not_ want to accept the DDLm tag used as the save
>>> frame name as the valid name. As David has noted, in
>>> order not to still be maintaining both a DDL1 and a
>>> DDLm dictionary, we want this information _in_ the DDLm
>>> dictionary, so simply aliasing back to some other
>>> DDL1 dictionary to use it as a way to say -- "use that dictionary
>>> URI as the style indicator" is suboptimal. Worse, it is a
>>> source of future errors and confusion in that it is
>>> defining properties of the tag that may end up disagreeing
>>> with the properties we wish to actually have that we
>>> defined in the DDLm dictionary.
>>>
>>> OK, so far, so good -- all we need then is John B.'s tag-by-tag
>>> style preference flag to say, for this dictionary we want to
>>> be DDL1'ish.
>>>
>>> Ah, but now we say, we are in the situation of maintaining
>>> the core (David's problem) in which we have to maintain
>>> a dictionary for validation against both DDL1 and DDL2
>>> tag names. Now there are times when we wish the DDL1
>>> alias to be the preferred alias and for both the DDL2
>>> and DDLm tags to fail a validation check and other times when
>>> we wish the DDL2 alias to be the preferred alias and
>>> for both the DDL1 and DDLm tags to fail a validation check.
>>> Now it becomes simpler to just have a common style key,
>>> such as "DDL1" or "DDL2" and to select just the way we
>>> do for alternate conformers on that key.
>>>
>>> OK, that was not so bad, but now we are at, say, the PDB
>>> and in addition to having DDL1 and DDL2 style tags from
>>> the core, we also have prefixed tags (pdbx) that should
>>> eventually get promoted to be prefix-free. Now we can
>>> use the styles to validate for strict use of
>>> the prefixes when we are producing output that we want
>>> to be certain actually does use the prefixes, or
>>> relax the validation to allow both the prefixed and promoted
>>> tags, or go strict again on the far side to be sure we
>>> are only producing promoted tags.
>>>
>>> Note that none of these style based input validation choices
>>> are based on the choice of dictionary -- it is one dictionary,
>>> so it does not really help to be maintaining the styles
>>> dictionary by dictionary. The grain of identification is
>>> too coarse, and involves multiple maintenance issues when
>>> in reality only one, nice new, DDLm dictionary needs to be
>>> maintained.
>>>
>>> On the output side, essentially the same issues arise, but
>>> there are fewer users, but as I said, it is a harmless
>>> addition to the DDLm spec for those who do not wish to
>>> be aware of it, and for those of us for whom it is
>>> useful, it really is useful.
>>>
>>> The fundamental disagreement is on whether we will have
>>> to have a DDL1 dictionary, a DDL2 dictionary, a DDLm
>>> dictionary, a prefix dictionary, etc., and plant them
>>> on assorted web sites, or just one DDLm dictionary that
>>> handles everything and can be local or remote or in
>>> local and remote pieces without changing the behavior
>>> of the validation or of the output.
>>>
>>> I hope that those who are uncomfortable with this change
>>> will reconsider and support it. Thanks to David's clear
>>> thinking it is a clean, simple and useful idea, much
>>> better than my original import suggestion.
>>>
>>> Please support it.
>>>
>>> Regards,
>>> Herbert
>>>
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>>
>>> On Thu, 20 Jan 2011, James Hester wrote:
>>>
>>>> I'm trying to get a grip on what problem the tag_style proposal
>>>> solves. I'll just emphasise at the outset in case there are any
>>>> misconceptions that it is incorrect to suppose that the dREL method
>>>> knows or needs to know anything about the particular syntax in which
>>>> an input or output value is expressed; dREL is concerned purely with
>>>> describing relationships.
>>>>
>>>> Here are the two scenarios that I think are being discussed under the
>>>> rubric of DDLm compatibility with CIF1:
>>>>
>>>> Scenario 1: given a DDLm dictionary, a program wishes to generate and
>>>> (validate/insert) the value for some given CIF1 dataname in a CIF1
>>>> datafile, using other CIF1 tags found in that datafile. We are all
>>>> agreed (I think) that locating the relevant DDLm dictionary entries
>>>> for a CIF1 dataname is a simple and well-defined task. The formatting
>>>> of the eventual output value of the DDLm method is also not in the
>>>> purview of the dictionary, but rather of the application that is using
>>>> the dictionary. The particular CIF1 tag to put in the datafile is
>>>> also not an issue, as that was given at the beginning. So the
>>>> tag_style proposal is not relevant here.
>>>>
>>>> Scenario 2: given a CIF2 datafile, a DDLm application wishes to
>>>> produce an equivalent CIF1 datafile. For many of the CIF2 datanames
>>>> found in the CIF2 datafile, there are multiple possible datanames
>>>> listed as aliases. How is the application to ensure that it writes a
>>>> set of datanames from DDL1 dictionaries only or DDL2 dictionaries
>>>> only? The simple solution alluded to by John B would be to do as
>>>> follows: for each dictionary URI mentioned in the alias list, use the
>>>> IUCr CIF dictionary register (and/or other canonical sources) to
>>>> determine the DDL version of that dictionary. DDL conformance is a
>>>> standard entry in the dictionary register. The latest dictionary
>>>> version as given in the dictionary register could be selected where
>>>> multiple versions are presented (URL for the register is
>>>> ftp://ftp.iucr.org/pub/cifdics/cifdic.register).
>>>>
>>>> Of course, any program wanting to do such conversions efficiently
>>>> would pregenerate a DDL version - dictionary table once and refer to
>>>> that. I therefore see no use, either in terms of efficiency or new
>>>> functionality, for the tag_style attribute.
>>>>
>>>> Please advise if I have misunderstood the problem.
>>>>
>>>> James.
>>>> On Thu, Jan 20, 2011 at 11:20 AM, Herbert J. Bernstein
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>> No, a tag style is simply supposed to identify a grouping of alias
>>>>> tag choices that belong together, so you can decide to put out
>>>>> those particular versions of tags.  It is just a text string,
>>>>> just like an alternate conformer identifier.
>>>>>
>>>>> The same tag name could be marked with as many tag styles as
>>>>> you choose.  It is just text.  But you could not give multiple
>>>>> aliases for the same DDLm tag for the same tag style when allowing
>>>>> DDLm missing value generation or you would not know which version to put
>>>>> out, and for validation, there is no reason not to use different
>>>>> styles for the different alternatives.
>>>>>
>>>>> The way I will write the extraction algorithm, if you choose
>>>>> a tag style, you will get the DDLm name for the tags that don't
>>>>> have an alias for the chosen style, but the tag alias given for the
>>>>> specified style if there is one.  That way a dictionary that is
>>>>> intended to support DDL1, DDL2 and DDLm for which the DDLm
>>>>> tags happen to be primarily consistent with DDL2 conventions,
>>>>> then for the tags that conform to DDL2 conventions, you will
>>>>> not need a DDL2 style alias, just a DDL1 style alias.  You will
>>>>> only need both a DDL1 style alias and a DDL2 style alias for
>>>>> a tag for which the DDLm tag is different from both, e.g.
>>>>> for _diffrn_standards_decay_% (DDL1), _diffrn_standards.decay_%
>>>>> (DDL2) and _diffrn_standards_decay_percent (DDLm).  When you
>>>>> want DDLm output and validation, you don't specify a style at all.
>>>>>
>>>>> This will be very nice to allow an automatic cleanup for dictionaries
>>>>> using a prefix, say pdbx, for tags that later get promoted
>>>>> to not need a prefix.
>>>>>
>>>>> Regards,
>>>>>   Herbert
>>>>>
>>>>> =====================================================
>>>>>  Herbert J. Bernstein, Professor of Computer Science
>>>>>    Dowling College, Kramer Science Center, KSC 121
>>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>>                  +1-631-244-3035
>>>>>                  yaya@dowling.edu
>>>>> =====================================================
>>>>>
>>>>> On Wed, 19 Jan 2011, Bollinger, John C wrote:
>>>>>
>>>>>> On Wednesday, January 19, 2011 3:47 PM, Herbert J. Bernstein wrote:
>>>>>>>   The definition_id most certainly does not exhibit the tag
>>>>>>> style.  For example, there is no way to distinguish DDLm
>>>>>>> tag style from DDL2 or DDL2 tag style from context.  That
>>>>>>> is intentionally inherent in the design of DDLm.
>>>>>>
>>>>>> Then I'm afraid I don't quite comprehend the meaning of "tag style".  I
>>>>>> would like to do, so that I can form a well-founded opinion about it.
>>>>>>
>>>>>> As I thought I had understood the idea, the tag style is proposed to
>>>>>> identify the set of DDL conventions with which the given alias complies.
>>>>>> If that were indeed what it was intended to mean, however, then (1) as
>>>>>> you observe, some names would comply with more than one set of
>>>>>> conventions, but also (2) a set of candidate tag styles, at least, could
>>>>>> be computed for any alias name.
>>>>>>
>>>>>> What would be the significance of marking an alias that conforms with
>>>>>> both DDL2 and DDLm conventions with tag style DDL2?
>>>>>>
>>>>>> Might it ever be needful or useful to mark the same alias with more than
>>>>>> one tag style?
>>>>>>
>>>>>>>   As for defining a hypothetical URI, that can break,
>>>>>>> or at least time out programs trying to get additional
>>>>>>> information about an aliased tag from that URI.  URIs
>>>>>>> should be for things that really exist on the web,
>>>>>>> not a substitute for a tag that really defines something
>>>>>>> different, in this case the style of tags.
>>>>>>
>>>>>> I don't think the issue is nearly so clear cut.  I would hold, for example, that the primary purpose of a URI is to *identify*
>>>>>> a resource.  That's what the "I" stands for, as I'm sure you're aware.  RFC 3986 (Uniform Resource Identifier (URI): General
>>>>>> Syntax) explicitly provides that a URI may identify an abstract resource.  RFC 2396 (now obsoleted by 3986) says the same.
>>>>>>  Although many URIs fulfill their purpose by serving as resolvable web addresses, some, even among those formatted as URLs, do
>>>>>> not.  Examples of the latter abound in various XML communities.
>>>>>>
>>>>>> Personally, however, I think a bit more like you do: a URL ought to refer to a retrievable resource on the web.  For an
>>>>>> abstract or virtual resource, therefore, I prefer to use a URN.  For something like your virtual DDL1 imgCIF dictionary, I
>>>>>> might choose something like urn:x-imgCIF:DDL1.  If a URN were used, then programs assuming a resolvable URL might still break,
>>>>>> but only if they were poorly crafted indeed would they hang pending a time out.  The whole issue could largely be mooted by
>>>>>> clarifying the purpose and intended usage of _alias.dictionary_uri in its definition.  That need not prevent programs from
>>>>>> attempting to resolve dictionary URIs, but if it specified that dictionary URIs might be permanently unresolvable then
>>>>>> programmers would know to prepare for that possibility.
>>>>>>
>>>>>>>   We already do something very similar to this with
>>>>>>> alternate conformers and with NMR model numbers.  It
>>>>>>> really is a simple concept for organizing information
>>>>>>> that belongs in groups, in this case the group of
>>>>>>> DDL1 or DDL2 or DDLm or ... style tags.
>>>>>>
>>>>>> I think that makes it a bit clearer to me what you want to do, but I'm still interested in the answers to my questions above.
>>>>>>  I'm a bit uncomfortable with defining generic groups of aliases with per-dictionary semantics, if that's indeed what you're
>>>>>> proposing.  For one thing, it does not play well with dictionary merging.  For another, the meaning of the groupings is nowhere
>>>>>> defined, at least not without adding at least one more data name to DDLm for that purpose.
>>>>>>
>>>>>> On the other hand, data names have at least one natural grouping: the dictionaries in which they are defined.  This grouping is
>>>>>> already modeled in DDLm, and as far as I can tell, it is conceptually a perfect fit for what you want to do.
>>>>>>
>>>>>> That doesn't necessarily mean that there is no use for a more general grouping mechanism.  I am curious indeed whether there
>>>>>> are use cases for grouping data names that do not align well with dictionaries or dictionary-defined attributes.  Can anyone
>>>>>> suggest some?
>>>>>>
>>>>>>>  It solves
>>>>>>> a very real problem for me with imgCIF.  It does
>>>>>>> no harm to anybody else.  If nobody uses it in
>>>>>>> another dictionary, it still would have been a useful
>>>>>>> addition to DDLm.
>>>>>>
>>>>>> I very much want you to have a solution to your problem, and I have suggested one that still seems absolutely natural to me.
>>>>>>  It may be that there are better alternatives, and perhaps even that tag style would be one such.  Of the latter, however, I am
>>>>>> not yet persuaded.
>>>>>>
>>>>>> Perhaps "harm" is too charged a word, but adding an additional attribute to DDLm certainly does cost everyone else.  Every DDLm
>>>>>> application must support all the DDLm attributes, so every additional attribute places a development and maintenance burden on
>>>>>> multiple developers.  That incrementally slows software release cycles and introduces additional space for bugs and
>>>>>> incompatibilities to hide.  It's a small cost for most people, but everyone pays it.  The proposed tag style is no different in
>>>>>> that regard from any other DDLm attribute, of course, but that doesn't mean that its cost should be ignored.
>>>>>>
>>>>>> As for whether it would be a useful addition to DDLm, that is exactly what I am trying to decide.  Potential use cases such as
>>>>>> I solicited above would help me make that decision.
>>>>>>
>>>>>>>   In the end, I suspect that both core and mmCIF DDLm
>>>>>>> dictionaries will be built this way, because it
>>>>>>> makes it simpler and clearer and allows multi-purpose
>>>>>>> dictionaries to be self-contained and avoid the
>>>>>>> maintenance headache David spotted.
>>>>>>
>>>>>> If by "multi-purpose dictionaries" you mean defining multiple virtual dictionaries via a single DDLm dictionary, such as you
>>>>>> plan, then I still see the dictionary_uri as the natural way to use aliases for that purpose.  If there is a broader concept
>>>>>> here then please help me see it.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> John
>>>>>>
>>>>>> --
>>>>>> John C. Bollinger, Ph.D.
>>>>>> Department of Structural Biology
>>>>>> St. Jude Children's Research Hospital
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>>>>
>>>>>> _______________________________________________
>>>>>> ddlm-group mailing list
>>>>>> ddlm-group@iucr.org
>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>>>
>>
>>
>
> -- 
> ******************************************************************
>   John Westbrook, Ph.D.
>   Rutgers, The State University of New Jersey
>   Department of Chemistry and Chemical Biology
>   610 Taylor Road
>   Piscataway, NJ 08854-8087
>   e-mail: jwest@rcsb.rutgers.edu
>   Ph:  (732) 445-4290  Fax: (732) 445-4320
> ******************************************************************
>
