
Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .
From: "Herbert J. Bernstein" <[email protected]>
Date: Thu, 20 Jan 2011 15:12:12 -0500 (EST)
In-Reply-To: <[email protected]>
References: <[email protected]><[email protected]><8F77913624F7524AACD2A92EAF3BFA54166D7D1ECE@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><[email protected]><[email protected]><8F77913624F7524AACD2A92EAF3BFA54166D7D1ED0@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><8F77913624F7524AACD2A92EAF3BFA54166D7D1ED1@SJMEMXMBS11.stjude.sjcrh.local><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]> <[email protected]>

Dear David,

   If we give it a try, we might succeed.  If we don't even
try we definitely won't succeed.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Thu, 20 Jan 2011, David Brown wrote:

> Dear Colleagues,
> 
> It is unlikely that we will be able to abandon our DDL1 and DDL2
> dictionaries as lightly as Herbert suggests, though the goal is a noble
> one.  We must remember that it is not just the tags that are different
> between the different dictionaries but also the structure.  DDLm is more
> structured than DDL2, which in turn is more structured than DDL1.  Many CIFs
> based on DDL1 have bent the rules while we were learning how best to
> structure CIF, and it is possible that some problems will occur in reading
> early CIFs because of this.  We might also find cases where a CIF includes a
> loop that is not allowed in the DDLm dictionaries, but we will only discover
> this by experiment, which requires working software.  Similarly, any datafile
> written using a DDLm dictionary will be able to produce a CIF filled with
> DDL1 dictionary datanames, but the structure will still correspond to DDLm.
> In most cases this is unlikely to be a problem, but we will only find out
> when we have working software.
> 
> The aliases should contain the following information: the tag, the
> dictionary in which it appears, the version of this dictionary, the DDL in
> which the dictionary is written (a given dictionary may be written using
> different DDLs; for example, the symmetry dictionary was written in DDL2
> and parts converted to DDL1), a flag to indicate whether the dataname is
> deprecated (needed for writing files) and a pointer to where the named
> dictionary can be found.  This may be a public archive or a local file that
> in turn points either to a local source or the public archive, depending on
> the local institution.  There may be rare occasions when someone may want to
> write a program to produce a CIF in an earlier version that is compatible
> with software that is unaware of the later datanames.
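A minimal sketch, in Python, of an alias record holding the six pieces of information listed above. The class and field names (`AliasRecord`, `ddl`, `location`, and so on) and the helper `current_aliases` are hypothetical illustrations, not DDLm attributes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AliasRecord:
    """One alias entry; field names are illustrative, not DDLm attribute names."""
    tag: str                 # the aliased data name, e.g. '_atom_site_Cartn_z'
    dictionary: str          # dictionary in which the tag appears
    dictionary_version: str  # version of that dictionary
    ddl: str                 # DDL the dictionary is written in: 'DDL1', 'DDL2' or 'DDLm'
    deprecated: bool         # needed when writing files, not when reading
    location: str            # public archive URL or a local file that points onward


def current_aliases(aliases: List[AliasRecord], ddl: str) -> List[AliasRecord]:
    """Return the non-deprecated aliases that come from dictionaries written in `ddl`."""
    return [a for a in aliases if a.ddl == ddl and not a.deprecated]
```

Keeping the DDL as its own field, rather than inferring it from the dictionary name, follows the point that one dictionary may exist in more than one DDL.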
> 
> The goal, as I say, is noble and is worth shooting for.  Whether we reach
> the goal can only be determined when we have working software, but we should
> design the system on the assumption that it will work.
> 
> David
> 
> 
> 
> 
> John Westbrook wrote:
> 
> Herbert and David,
> 
> Could I ask for some clarification on the requirements for the aliasing
> mechanism?  In particular, is this intended to provide more than a naming
> correspondence between the current dictionary and some prior dictionary?
> In DDL2 we have used ITEM_ALIASES as in the example below to provide name
> correspondences between our dictionary, the CIF core dictionary, and other
> recognized variant dictionaries.   The image dictionary has done this
> similarly I believe.
> 
> I am confused by the last set of messages about how this will be used
> backwardly for validation.   The semantics of new dictionaries may
> enforce a new potentially stricter set of rules that are not necessarily
> backwardly compatible.   I am wondering what is expected here.
> 
> John
> 
> 
> 
> save__atom_site.Cartn_z
>      _item_description.description
> ;              The z atom-site coordinate in angstroms specified according to
>                 a set of orthogonal Cartesian axes related to the cell axes as
>                 specified by the description given in
>                 _atom_sites.Cartn_transform_axes.
> ;
>      _item.name                  '_atom_site.Cartn_z'
>      _item.category_id             atom_site
>      _item.mandatory_code          no
>      _item_aliases.alias_name    '_atom_site_Cartn_z'
>      _item_aliases.dictionary      cif_core.dic
>      _item_aliases.version         2.0.1
>      loop_
>      _item_dependent.dependent_name
>                                  '_atom_site.Cartn_x'
>                                  '_atom_site.Cartn_y'
>      _item_related.related_name  '_atom_site.Cartn_z_esd'
>      _item_related.function_code   associated_esd
>      _item_sub_category.id         cartesian_coordinate
>      _item_type.code               float
>      _item_type_conditions.code    esd
>      _item_units.code              angstroms
>       save_
> 
> On 1/20/11 6:10 AM, Herbert J. Bernstein wrote:
> 
> Dear Colleagues,
> 
> There is an important part of James' suggestions that,
> if Brian is willing, I think it would be a good idea
> to add to the _alias.tag_style proposal, and that is
> a central registry of styles to facilitate dictionary
> merging. The ground rules would be:
> 
> COMCIFS approval for any style, such as DDL1, DDL2,
> DDLm, etc., unless prefixed by a prefix from Brian's
> prefix registry, e.g. pdbx_. The special prefix
> local_ could be used for styles for use purely
> locally, i.e. for private dictionaries for which
> collisions on merging are not a concern.
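A minimal sketch of how a tool might classify a style name under the ground rules above. The sets `APPROVED_STYLES` and `REGISTERED_PREFIXES` are hypothetical stand-ins for the COMCIFS-approved list and Brian's prefix registry; only the `pdbx_` and `local_` prefixes come from the message itself.

```python
# Hypothetical stand-ins: the real lists would be held by COMCIFS and the prefix registry.
APPROVED_STYLES = {"DDL1", "DDL2", "DDLm"}
REGISTERED_PREFIXES = {"pdbx_"}


def style_status(style: str) -> str:
    """Classify a tag-style name under the proposed ground rules."""
    if style.startswith("local_"):
        return "local"        # private use only; merge collisions are not a concern
    if any(style.startswith(prefix) for prefix in REGISTERED_PREFIXES):
        return "prefixed"     # governed by the prefix owner, no COMCIFS approval needed
    if style in APPROVED_STYLES:
        return "approved"     # approved by COMCIFS
    return "unapproved"       # would need COMCIFS approval before use


print(style_status("DDL2"))          # approved
print(style_status("pdbx_legacy"))   # prefixed
print(style_status("local_lab"))     # local
```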
> 
> Regards,
> Herbert
> 
> 
> On Thu, 20 Jan 2011, Herbert J. Bernstein wrote:
> 
> Dear Colleagues,
> 
> If a DDLm dictionary is to be a fully functional replacement
> for, say, a DDL1 dictionary, a dictionary against which one
> can validate the use of purely DDL1 tags, we need a way to
> not only specify the desired DDL1 tag as an alias to the
> DDLm tag used in the dictionary, but also to specify that
> we do _not_ want to accept the DDLm tag used as the save
> frame name as the valid name. As David has noted, in
> order not to still be maintaining both a DDL1 and a
> DDLm dictionary, we want this information _in_ the DDLm
> dictionary, so simply aliasing back to some other
> DDL1 dictionary to use it as a way to say -- "use that dictionary
> URI as the style indicator" is suboptimal. Worse, it is a
> source of future errors and confusion in that it is
> defining properties of the tag that may end up disagreeing
> with the properties we wish to actually have that we
> defined in the DDLm dictionary.
> 
> OK, so far, so good -- all we need then is John B.'s tag-by-tag
> style preference flag to say, for this dictionary we want to
> be DDL1'ish.
> 
> Ah, but now we say, we are in the situation of maintaining
> the core (David's problem) in which we have to maintain
> a dictionary for validation against both DDL1 and DDL2
> tag names. Now there are times when we wish the DDL1
> alias to be the preferred alias and for both the DDL2
> and DDLm tags to fail a validation check and other times when
> we wish the DDL2 alias to be the preferred alias and
> for both the DDL1 and DDLm tags to fail a validation check.
> Now it becomes simpler to just have a common style key,
> such as "DDL1" or "DDL2" and to select just the way we
> do for alternate conformers on that key.
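A rough sketch of the style-key selection described above, assuming a hypothetical in-memory table that maps each DDLm name to its aliases keyed by style. The decay names are the example Herbert gives elsewhere in this thread; `preferred_name` and `style_violations` are illustrative helpers, not part of any DDLm specification.

```python
from typing import Dict, List, Optional

# Hypothetical dictionary content: DDLm name -> {style key: alias}.
DEFINITIONS: Dict[str, Dict[str, str]] = {
    "_diffrn_standards_decay_percent": {
        "DDL1": "_diffrn_standards_decay_%",
        "DDL2": "_diffrn_standards.decay_%",
    },
}


def preferred_name(ddlm_name: str, style: Optional[str]) -> str:
    """Name a validator should accept for one definition under a style key."""
    if style is None:                  # no style selected: plain DDLm validation
        return ddlm_name
    return DEFINITIONS[ddlm_name].get(style, ddlm_name)


def style_violations(tags: List[str], style: Optional[str]) -> List[str]:
    """Known tags (DDLm names or aliases) that are not the preferred name for `style`."""
    owner = {}                         # any known name -> its DDLm definition
    for ddlm_name, aliases in DEFINITIONS.items():
        for name in [ddlm_name, *aliases.values()]:
            owner[name] = ddlm_name
    return [t for t in tags
            if t in owner and t != preferred_name(owner[t], style)]
```

With the key set to "DDL1", both the DDL2 alias and the DDLm name are reported; switching the key to "DDL2" reports the DDL1 alias and the DDLm name instead, much as one would switch alternate conformers.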
> 
> OK, that was not so bad, but now we are at, say, the PDB
> and in addition to having DDL1 and DDL2 style tags from
> the core, we also have prefixed tags (pdbx) that should
> eventually get promoted to be prefix-free. Now we can
> use the styles to validate for strict use of
> the prefixes when we are producing output that we want
> to be certain actually does use the prefixes, or
> relax the validation to allow both the prefixed and promoted
> tags, or go strict again on the far side to be sure we
> are only producing promoted tags.
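One way to picture the three validation regimes for prefixed and promoted tags, as a sketch only: the mode names (`prefixed`, `relaxed`, `promoted`) and the `_example.*` tags are hypothetical.

```python
from typing import List, Set

# Hypothetical pairs: promoted (prefix-free) name -> legacy pdbx_-prefixed alias.
PROMOTED = {
    "_example.value": "_example.pdbx_value",
}


def allowed_names(mode: str) -> Set[str]:
    """Names accepted in each validation regime."""
    promoted = set(PROMOTED)
    prefixed = set(PROMOTED.values())
    if mode == "prefixed":      # strict: output must still use the prefixes
        return prefixed
    if mode == "relaxed":       # transition: accept either form
        return promoted | prefixed
    if mode == "promoted":      # strict: only promoted names
        return promoted
    raise ValueError(f"unknown validation mode: {mode!r}")


def rejected(tags: List[str], mode: str) -> List[str]:
    """Tags covered by a promotion pair that the chosen mode does not accept."""
    known = set(PROMOTED) | set(PROMOTED.values())
    accepted = allowed_names(mode)
    return [t for t in tags if t in known and t not in accepted]
```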
> 
> Note that none of these style based input validation choices
> are based on the choice of dictionary -- it is one dictionary,
> so it does not really help to be maintaining the styles
> dictionary by dictionary. The grain of identification is
> too coarse, and involves multiple maintenance issues when
> in reality only one, nice new, DDLm dictionary needs to be
> maintained.
> 
> On the output side, essentially the same issues arise, but
> there are fewer users, but as I said, it is a harmless
> addition to the DDLm spec for those who do not wish to
> be aware of it, and for those of us for whom it is
> useful, it really is useful.
> 
> The fundamental disagreement is on whether we will have
> to have a DDL1 dictionary, a DDL2 dictionary, a DDLm
> dictionary, a prefix dictionary, etc., and plant them
> on assorted web sites, or just one DDLm dictionary that
> handles everything and can be local or remote or in
> local and remote pieces without changing the behavior
> of the validation or of the output.
> 
> I hope that those who are uncomfortable with this change
> will reconsider and support it. Thanks to David's clear
> thinking it is a clean, simple and useful idea, much
> better than my original import suggestion.
> 
> Please support it.
> 
> Regards,
> Herbert
> 
> 
> On Thu, 20 Jan 2011, James Hester wrote:
> 
> I'm trying to get a grip on what problem the tag_style proposal
> solves. I'll just emphasise at the outset, in case there are any
> misconceptions, that it is incorrect to suppose that the dREL method
> knows or needs to know anything about the particular syntax in which
> an input or output value is expressed; dREL is concerned purely with
> describing relationships.
> 
> Here are the two scenarios that I think are being discussed under the
> rubric of DDLm compatibility with CIF1:
> 
> Scenario 1: given a DDLm dictionary, a program wishes to generate and
> (validate/insert) the value for some given CIF1 dataname in a CIF1
> datafile, using other CIF1 tags found in that datafile. We are all
> agreed (I think) that locating the relevant DDLm dictionary entries
> for a CIF1 dataname is a simple and well-defined task. The formatting
> of the eventual output value of the DDLm method is also not in the
> purview of the dictionary, but rather of the application that is using
> the dictionary. The particular CIF1 tag to put in the datafile is
> also not an issue, as that was given at the beginning. So the
> tag_style proposal is not relevant here.
> 
> Scenario 2: given a CIF2 datafile, a DDLm application wishes to
> produce an equivalent CIF1 datafile. For many of the CIF2 datanames
> found in the CIF2 datafile, there are multiple possible datanames
> listed as aliases. How is the application to ensure that it writes a
> set of datanames from DDL1 dictionaries only or DDL2 dictionaries
> only? The simple solution alluded to by John B would be to do as
> follows: for each dictionary URI mentioned in the alias list, use the
> IUCr CIF dictionary register (and/or other canonical sources) to
> determine the DDL version of that dictionary. DDL conformance is a
> standard entry in the dictionary register. The latest dictionary
> version as given in the dictionary register could be selected where
> multiple versions are presented (URL for the register is
> ftp://ftp.iucr.org/pub/cifdics/cifdic.register).
> 
> Of course, any program wanting to do such conversions efficiently
> would pregenerate a DDL version - dictionary table once and refer to
> that. I therefore see no use, either in terms of efficiency or new
> functionality, for the tag_style attribute.
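A sketch of the pregenerated DDL-version table James describes, assuming numeric dotted version strings and placeholder `example.org` URIs. The real register at ftp://ftp.iucr.org/pub/cifdics/cifdic.register is not parsed here, and no particular register format is assumed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table built once from the dictionary register:
# (dictionary URI, version) -> DDL conformance. URIs are placeholders.
DDL_TABLE: Dict[Tuple[str, str], str] = {
    ("http://example.org/core_ddl1.dic", "1.0"): "DDL1",
    ("http://example.org/core_ddl1.dic", "2.0"): "DDL1",
    ("http://example.org/core_ddl2.dic", "1.0"): "DDL2",
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.0.1' into (2, 0, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def pick_alias(aliases: List[Tuple[str, str, str]], ddl: str) -> Optional[str]:
    """From (alias name, dictionary URI, version) triples, return the alias whose
    dictionary conforms to `ddl`, preferring the latest dictionary version."""
    candidates = [(version_key(ver), name)
                  for name, uri, ver in aliases
                  if DDL_TABLE.get((uri, ver)) == ddl]
    return max(candidates)[1] if candidates else None
```

The lookup table is built once and reused, which is the efficiency argument made above; no per-alias style attribute is consulted.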
> 
> Please advise if I have misunderstood the problem.
> 
> James.
> On Thu, Jan 20, 2011 at 11:20 AM, Herbert J. Bernstein
> <[email protected]> wrote:
> 
> No, a tag style is simply supposed to identify a grouping of alias
> tag choices that belong together, so you can decide to put out
> those particular versions of tags.  It is just a text string,
> just like an alternate conformer identifier.
> 
> The same tag name could be marked with as many tag styles as
> you choose.  It is just text.  But you could not give multiple
> aliases for the same DDLm tag for the same tag style when allowing
> DDLm missing value generation or you would not know which version to put
> out, and for validation, there is no reason not to use different
> styles for the different alternatives.
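The constraint in the previous paragraph (no two aliases of one DDLm tag may share a style, while one alias may carry several styles) can be checked mechanically. A hedged sketch, with rows as hypothetical `(ddlm_name, style, alias)` tuples:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ambiguous_styles(rows: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Report (DDLm name, style) pairs that have more than one distinct alias."""
    seen = defaultdict(set)
    for ddlm_name, style, alias in rows:
        seen[(ddlm_name, style)].add(alias)
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}
```

An empty result means every chosen style yields at most one output name per DDLm tag.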
> 
> The way I will write the extraction algorithm, if you choose
> a tag style, you will get the DDLm name for the tags that don't
> have an alias for the chosen style, but the tag alias given for the
> specified style if there is one.  That way, in a dictionary that is
> intended to support DDL1, DDL2 and DDLm and for which the DDLm
> tags happen to be primarily consistent with DDL2 conventions,
> the tags that conform to DDL2 conventions will
> not need a DDL2 style alias, just a DDL1 style alias.  You will
> only need both a DDL1 style alias and a DDL2 style alias for
> a tag for which the DDLm tag is different from both, e.g.
> for _diffrn_standards_decay_% (DDL1), _diffrn_standards.decay_%
> (DDL2) and _diffrn_standards_decay_percent (DDLm).  When you
> want DDLm output and validation, you don't specify a style at all.
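A minimal sketch of the extraction rule described above: the alias for the chosen style if there is one, otherwise the DDLm name, and the DDLm name when no style is chosen. The `ALIASES` table is illustrative; the decay names come from the paragraph above, and treating `_atom_site.Cartn_z` as a DDL2-consistent DDLm name with only a DDL1 alias is an assumption made for the example.

```python
from typing import Dict, Optional

# Hypothetical alias table: DDLm name -> {style: alias}.
ALIASES: Dict[str, Dict[str, str]] = {
    # DDLm name already follows DDL2 conventions, so only a DDL1 alias is given
    "_atom_site.Cartn_z": {"DDL1": "_atom_site_Cartn_z"},
    # DDLm name differs from both, so both aliases are needed
    "_diffrn_standards_decay_percent": {
        "DDL1": "_diffrn_standards_decay_%",
        "DDL2": "_diffrn_standards.decay_%",
    },
}


def output_name(ddlm_name: str, style: Optional[str] = None) -> str:
    """Alias for the chosen style if one exists, otherwise the DDLm name."""
    if style is None:                 # DDLm output: no style selected
        return ddlm_name
    return ALIASES.get(ddlm_name, {}).get(style, ddlm_name)
```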
> 
> This will be very nice to allow an automatic cleanup for dictionaries
> using a prefix, say pdbx, for tags that later get promoted so as
> not to need a prefix.
> 
> Regards,
>   Herbert
> 
> 
> On Wed, 19 Jan 2011, Bollinger, John C wrote:
> 
> On Wednesday, January 19, 2011 3:47 PM, Herbert J. Bernstein wrote:
>
>   The definition_id most certainly does not exhibit the tag
> style.  For example, there is no way to distinguish DDLm
> tag style from DDL2 or DDL2 tag style from context.  That
> is intentionally inherent in the design of DDLm.
> 
> Then I'm afraid I don't quite comprehend the meaning of "tag style".  I
> would like to, so that I can form a well-founded opinion about it.
> 
> As I thought I had understood the idea, the tag style is proposed to
> identify the set of DDL conventions with which the given alias complies.
> If that were indeed what it was intended to mean, however, then (1) as
> you observe, some names would comply with more than one set of
> conventions, but also (2) a set of candidate tag styles, at least, could
> be computed for any alias name.
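To make point (2) concrete, here is a deliberately simplified sketch that derives candidate styles from the form of a name alone. It assumes the only distinguishing convention is a period separating category from attribute (DDL2 and DDLm) versus no period (DDL1); real dictionaries have exceptions, and `candidate_styles` is a hypothetical helper.

```python
from typing import Set


def candidate_styles(name: str) -> Set[str]:
    """Simplified guess at which DDL naming conventions a data name satisfies."""
    if not name.startswith("_"):
        raise ValueError(f"not a data name: {name!r}")
    body = name[1:]
    if "." not in body:
        return {"DDL1"}                  # no category/attribute separator
    category, _, attribute = body.partition(".")
    if category and attribute and "." not in attribute:
        return {"DDL2", "DDLm"}          # _category.attribute form fits both
    return set()                         # malformed under this heuristic
```

Under this heuristic `_atom_site.Cartn_z` yields both DDL2 and DDLm, which is the overlap behind question (1), while the DDLm name `_diffrn_standards_decay_percent` quoted earlier in the thread would be classed as DDL1 only, so form alone cannot settle the style.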
> 
> What would be the significance of marking an alias that conforms with
> both DDL2 and DDLm conventions with tag style DDL2?
> 
> Might it ever be needful or useful to mark the same alias with more than
> one tag style?
>
>   As for defining a hypothetical URI, that can break,
> or at least time out programs trying to get additional
> information about an aliased tag from that URI.  URIs
> should be for things that really exist on the web,
> not a substitute for a tag that really defines something
> different, in this case the style of tags.
> 
> I don't think the issue is nearly so clear cut.  I would hold, for example,
> that the primary purpose of a URI is to *identify* a resource.  That's what
> the "I" stands for, as I'm sure you're aware.  RFC 3986 (Uniform Resource
> Identifier (URI): Generic Syntax) explicitly provides that a URI may
> identify an abstract resource.  RFC 2396 (now obsoleted by 3986) says the
> same.  Although many URIs fulfill their purpose by serving as resolvable web
> addresses, some, even among those formatted as URLs, do not.  Examples of
> the latter abound in various XML communities.
> 
> Personally, however, I think a bit more like you do: a URL ought to refer to
> a retrievable resource on the web.  For an abstract or virtual resource,
> therefore, I prefer to use a URN.  For something like your virtual DDL1
> imgCIF dictionary, I might choose something like urn:x-imgCIF:DDL1.  If a
> URN were used, then programs assuming a resolvable URL might still break,
> but only if they were poorly crafted indeed would they hang pending a time
> out.  The whole issue could largely be mooted by clarifying the purpose and
> intended usage of _alias.dictionary_uri in its definition.  That need not
> prevent programs from attempting to resolve dictionary URIs, but if it
> specified that dictionary URIs might be permanently unresolvable then
> programmers would know to prepare for that possibility.
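A sketch of what preparing for that possibility could look like in a client: only schemes that can actually be fetched are tried, with a timeout, and a URN such as `urn:x-imgCIF:DDL1` is treated purely as an identifier. The function name `fetch_dictionary` and the set of fetchable schemes are assumptions.

```python
from typing import Optional
from urllib.error import URLError
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumption: only these schemes are worth trying to retrieve.
RESOLVABLE_SCHEMES = {"http", "https", "ftp"}


def fetch_dictionary(uri: str, timeout: float = 5.0) -> Optional[str]:
    """Return dictionary text, or None if the URI only names a dictionary or cannot be fetched."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in RESOLVABLE_SCHEMES:
        return None          # e.g. a URN: identifies the dictionary, nothing to download
    try:
        with urlopen(uri, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (URLError, OSError, ValueError):
        return None          # unreachable or timed out: treat as unresolvable
```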
>
>   We already do something very similar to this with
> alternate conformers and with NMR model numbers.  It
> really is a simple concept for organizing information
> that belongs in groups, in this case the group of
> DDL1 or DDL2 or DDLm or ... style tags.
> 
> I think that makes it a bit clearer to me what you want to do, but I'm still
> interested in the answers to my questions above.  I'm a bit uncomfortable
> with defining generic groups of aliases with per-dictionary semantics, if
> that's indeed what you're proposing.  For one thing, it does not play well
> with dictionary merging.  For another, the meaning of the groupings is
> nowhere defined, at least not without adding at least one more data name to
> DDLm for that purpose.
> 
> On the other hand, data names have at least one natural grouping: the
> dictionaries in which they are defined.  This grouping is already modeled in
> DDLm, and as far as I can tell, it is conceptually a perfect fit for what
> you want to do.
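A sketch of John's alternative: group aliases by the dictionary URI they point to and select output names by URI rather than by a separate style attribute. The `_category.*` names and both helper functions are hypothetical; `urn:x-imgCIF:DDL1` is the example URN from above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aliases_by_dictionary(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Group (definition name, alias, dictionary URI) rows into URI -> {definition: alias}."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for definition, alias, uri in rows:
        groups[uri][definition] = alias
    return dict(groups)


def name_for_dictionary(groups: Dict[str, Dict[str, str]], uri: str, definition: str) -> str:
    """Alias registered under `uri`, falling back to the definition's own name."""
    return groups.get(uri, {}).get(definition, definition)
```

Here the dictionary URI plays the role the tag style would play, which is the heart of the disagreement in this thread.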
> 
> That doesn't necessarily mean that there is no use for a more general
> grouping mechanism.  I am curious indeed whether there are use cases for
> grouping data names that do not align well with dictionaries or
> dictionary-defined attributes.  Can anyone suggest some?
>
>  It solves
> a very real problem for me with imgCIF.  It does
> no harm to anybody else.  If nobody uses it in
> another dictionary, it still would have been a useful
> addition to DDLm.
> 
> I very much want you to have a solution to your problem, and I have
> suggested one that still seems absolutely natural to me.  It may be that
> there are better alternatives, and perhaps even that tag style would be one
> such.  Of the latter, however, I am not yet persuaded.
> 
> Perhaps "harm" is too charged a word, but adding an additional attribute to
> DDLm certainly does cost everyone else.  Every DDLm application must support
> all the DDLm attributes, so every additional attribute places a development
> and maintenance burden on multiple developers.  That incrementally slows
> software release cycles and introduces additional space for bugs and
> incompatibilities to hide.  It's a small cost for most people, but everyone
> pays it.  The proposed tag style is no different in that regard from any
> other DDLm attribute, of course, but that doesn't mean that its cost should
> be ignored.
> 
> As for whether it would be a useful addition to DDLm, that is exactly what
> I am trying to decide.  Potential use cases such as I solicited above would
> help me make that decision.
>
>   In the end, I suspect that both core and mmCIF DDLm
> dictionaries will be built this way, because it
> makes it simpler and clearer and allows multi-purpose
> dictionaries to be self-contained and avoid the
> maintenance headache David spotted.
> 
> If by "multi-purpose dictionaries" you mean defining multiple virtual
> dictionaries via a single DDLm dictionary, such as you plan, then I still
> see the dictionary_uri as the natural way to use aliases for that purpose.
> If there is a broader concept here then please help me see it.
> 
> 
> Regards,
> 
> John
> 
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
> 
> 
> _______________________________________________
> ddlm-group mailing list
> [email protected]
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 
> 
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> 
>
> 
> 
> 
> ...unlikely to be a problem.  The file will be more structured than a
> DDL1-compliant CIF, but there may be some legacy software that will not be
> able to read it.  Again, we can only gauge the extent of the problem by
> experiment.  No amount of hot-air emails will solve that problem.  If all
> goes well we may be able to abandon the older dictionaries in favour of
> DDLm.  Let's hope.
> 
> The information that should be available in an alias is: the tag, the
> dictionary in which the tag first appears (including the version number),
> the DDL that the dictionary conforms to, a flag to indicate if a particular
> name has been deprecated, and a pointer to where the dictionary can be
> found.  It is better to keep these pieces of information separate (as is
> done in the mmCIF dictionary), since using a single item to convey two
> distinct types of information is inelegant and can lead to problems if there
> is conflict between the two meanings.  The pointer might be to a public
> archive, but it may make more sense for it to point to a local source that
> in turn can point either to a local source or an archive.  The deprecation
> flag is not needed on reading, but is needed on writing to ensure that only
> the current tag is used.  Alternatively, if the output file is to conform to
> a particular version of the CIF dictionary (so as to use an early piece of
> software that is only aware of the deprecated name), the file can be written
> in this version.  It is unlikely that general software would include this as
> an option, but the dictionary should make it possible.
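A hedged sketch of the output choice described above: by default write the non-deprecated name, or, when targeting an older dictionary version, the most recently introduced name that already existed in that version. It assumes each name records the dictionary version in which it first appears plus the deprecation flag, and that versions are numeric dotted strings; `NameRecord` and `name_for_output` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NameRecord:
    name: str
    first_version: str   # dictionary version in which the name first appears
    deprecated: bool     # deprecation flag: used on writing, ignored on reading


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '2.0.1' into (2, 0, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def name_for_output(records: List[NameRecord], target_version: Optional[str] = None) -> Optional[str]:
    """Current name by default; otherwise the newest name already present in target_version."""
    if target_version is None:
        current = [r for r in records if not r.deprecated]
        return current[0].name if current else None
    target = version_key(target_version)
    known = [r for r in records if version_key(r.first_version) <= target]
    if not known:
        return None      # the item did not exist in that dictionary version
    return max(known, key=lambda r: version_key(r.first_version)).name
```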
> 
> I agree with Herbert that the possibility of using the DDLm dictionaries to
> manage the whole CIF archive is worthy of pursuing, even if it eventually
> proves not to be possible.
> 
> David
> 
> 
> 
> 
>

