Re: [ddlm-group] Objectives of CIF2 syntax discussion. .. .. .

Dear Colleagues,

It is unlikely that we will be able to abandon our DDL1 and DDL2 dictionaries as lightly as Herbert suggests, though the goal is a noble one.  We must remember that it is not just the tags that differ between the dictionaries but also the structure.  DDLm is more structured than DDL2, which in turn is more structured than DDL1.  Many CIFs based on DDL1 bent the rules while we were learning how best to structure CIF, and it is possible that some problems will occur in reading early CIFs because of this.  We might also find cases where a CIF includes a loop that is not allowed in the DDLm dictionaries, but we will only discover this by experiment, which requires working software.  Similarly, any datafile written using a DDLm dictionary will be able to produce a CIF filled with DDL1 dictionary datanames, but the structure will still correspond to DDLm.  In most cases this is unlikely to be a problem, but we will only find out when we have working software.

The aliases should contain the following information: the tag, the dictionary in which it appears, the version of this dictionary, the DDL in which the dictionary is written (a given dictionary may be written using different DDLs; for example, the symmetry dictionary was written in DDL2 and parts converted to DDL1), a flag to indicate whether the dataname is deprecated (needed for writing files) and a pointer to where the named dictionary can be found.  This may be a public archive or a local file that in turn points either to a local source or the public archive, depending on the local institution.  There may be rare occasions when someone may want to write a program to produce a CIF in an earlier version that is compatible with software that is unaware of the later datanames.
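
As an illustration only, the items listed above could be held in a single record per alias. This is a minimal sketch, not a proposed DDLm definition: the class name and field names are invented here, and the example values are borrowed from the mmCIF save frame quoted later in this thread, with the DDL of cif_core.dic assumed to be DDL1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AliasRecord:
    """One alias of a data name. Field names are hypothetical, not DDLm attributes."""
    tag: str                         # the aliased data name
    dictionary: str                  # dictionary in which the tag appears
    dictionary_version: str          # version of that dictionary
    ddl: str                         # DDL in which that dictionary is written
    deprecated: bool = False         # only consulted when writing files
    location: Optional[str] = None   # pointer to where the dictionary can be found


# Example values taken from the _item_aliases entries quoted later in the thread;
# the DDL value is an assumption made for this sketch.
example = AliasRecord(
    tag="_atom_site_Cartn_z",
    dictionary="cif_core.dic",
    dictionary_version="2.0.1",
    ddl="DDL1",
)
```

Keeping each item in its own field mirrors the point that the tag, the dictionary, its version and its DDL are distinct pieces of information.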

The goal, as I say, is noble and is worth shooting for.  Whether we reach the goal can only be determined when we have working software, but we should design the system on the assumption that it will work.

David




John Westbrook wrote:
Herbert and David,

Could I ask for some clarification on the requirements for the aliasing
mechanism?  In particular, is this intended to provide more than naming
correspondence between the current dictionary and some prior dictionary?
In DDL2 we have used ITEM_ALIASES as in the example below to provide name
correspondences between our dictionary, the CIF core dictionary, and other
recognized variant dictionaries.   The image dictionary has done this
similarly I believe.

I am confused by the last set of messages about how this will be used
backwardly for validation.   The semantics of new dictionaries may
enforce a new potentially stricter set of rules that are not necessarily
backwardly compatible.   I am wondering what is expected here.

John



save__atom_site.Cartn_z
     _item_description.description
;              The z atom-site coordinate in angstroms specified according to
                a set of orthogonal Cartesian axes related to the cell axes as
                specified by the description given in
                _atom_sites.Cartn_transform_axes.
;
     _item.name                  '_atom_site.Cartn_z'
     _item.category_id             atom_site
     _item.mandatory_code          no
     _item_aliases.alias_name    '_atom_site_Cartn_z'
     _item_aliases.dictionary      cif_core.dic
     _item_aliases.version         2.0.1
     loop_
     _item_dependent.dependent_name
                                 '_atom_site.Cartn_x'
                                 '_atom_site.Cartn_y'
     _item_related.related_name  '_atom_site.Cartn_z_esd'
     _item_related.function_code   associated_esd
     _item_sub_category.id         cartesian_coordinate
     _item_type.code               float
     _item_type_conditions.code    esd
     _item_units.code              angstroms
      save_
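
A pure naming correspondence of the kind shown in this save frame can be sketched as a lookup table. The code below is a minimal illustration under stated assumptions: the table is written by hand rather than parsed from a dictionary, and the function name `resolve` is invented for the example.

```python
# (alias name, alias dictionary, alias version) -> name in the current dictionary.
# The single entry is copied from the _item_aliases values in the save frame above.
ALIASES = {
    ("_atom_site_Cartn_z", "cif_core.dic", "2.0.1"): "_atom_site.Cartn_z",
}


def resolve(tag, dictionary, version):
    """Return the current-dictionary name for an alias.

    If no alias is recorded, the tag is returned unchanged. This is name
    mapping only; it performs no validation of values or structure.
    """
    return ALIASES.get((tag, dictionary, version), tag)
```

Such a table answers "what is this tag called here?" but says nothing about whether a file using the older name satisfies the newer dictionary's rules, which is the question raised above.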

On 1/20/11 6:10 AM, Herbert J. Bernstein wrote:
Dear Colleagues,

There is an important part of James' suggestions that,
if Brian is willing, I think it would be a good idea
to add to the _alias.tag_style proposal, and that is
a central registry of styles to facilitate dictionary
merging. The ground rules would be:

COMCIFS approval for any style, such as DDL1, DDL2,
DDLm, etc., unless prefixed by a prefix from Brian's
prefix registry, e.g. pdbx_. The special prefix
local_ could be used for styles for use purely
locally, i.e. for private dictionaries for which
collisions on merging are not a concern.
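
The ground rules above amount to a three-way check on a style name. The following is a rough sketch only: the two sets are hand-written stand-ins for the COMCIFS-approved styles and for the prefix registry, and the function name and return labels are invented for illustration.

```python
APPROVED_STYLES = {"DDL1", "DDL2", "DDLm"}   # stand-in for COMCIFS-approved styles
REGISTERED_PREFIXES = {"pdbx_"}              # stand-in for the prefix registry
LOCAL_PREFIX = "local_"                      # styles for purely local use


def classify_style(style: str) -> str:
    """Classify a style name under the ground rules sketched in the message."""
    if style.startswith(LOCAL_PREFIX):
        return "local"            # private use; merging collisions not a concern
    if any(style.startswith(prefix) for prefix in REGISTERED_PREFIXES):
        return "prefixed"         # covered by a registered prefix
    if style in APPROVED_STYLES:
        return "approved"         # centrally approved style
    return "needs-approval"       # would require COMCIFS approval
```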

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 20 Jan 2011, Herbert J. Bernstein wrote:

Dear Colleagues,

If a DDLm dictionary is to be a fully functional replacement
for, say, a DDL1 dictionary, a dictionary against which one
can validate the use of purely DDL1 tags, we need a way to
not only specify the desired DDL1 tag as an alias to the
DDLm tag used in the dictionary, but also to specify that
we do _not_ want to accept the DDLm tag used as the save
frame name as the valid name. As David has noted, in
order not to still be maintaining both a DDL1 and a
DDLm dictionary, we want this information _in_ the DDLm
dictionary, so simply aliasing back to some other
DDL1 dictionary to use it as a way to say -- "use that dictionary
URI as the style indicator" is suboptimal. Worse, it is a
source of future errors and confusion in that it is
defining properties of the tag that may end up disagreeing
with the properties we wish to actually have that we
defined in the DDLm dictionary.

OK, so far, so good -- all we need then is John B.'s tag-by-tag
style preference flag to say, for this dictionary we want to
be DDL1'ish.

Ah, but now we say, we are in the situation of maintaining
the core (David's problem) in which we have to maintain
a dictionary for validation against both DDL1 and DDL2
tag names. Now there are times when we wish the DDL1
alias to be the preferred alias and for both the DDL2
and DDLm tags to fail a validation check and other times when
we wish the DDL2 alias to be the preferred alias and
for both the DDL1 and DDLm tags to fail a validation check.
Now it becomes simpler to just have a common style key,
such as "DDL1" or "DDL2" and to select just the way we
do for alternate conformers on that key.
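
Selecting on a common style key for validation might look like the sketch below. It is only an illustration: the table layout and the name `is_valid` are assumptions, the tag names are the _diffrn_standards example given elsewhere in this thread, and tags that have no alias for the requested style are left out of the sketch.

```python
from typing import Optional

# DDLm name -> {style key: alias for that style}; hand-written example data.
ALIASES_BY_STYLE = {
    "_diffrn_standards_decay_percent": {
        "DDL1": "_diffrn_standards_decay_%",
        "DDL2": "_diffrn_standards.decay_%",
    },
}


def is_valid(tag: str, style: Optional[str] = None) -> bool:
    """Strict validation of a tag name against one dictionary.

    With no style, only the DDLm names pass. With a style, only the alias
    registered for that style passes, so the other styles' names and the
    DDLm name itself fail the check.
    """
    if style is None:
        return tag in ALIASES_BY_STYLE
    return any(aliases.get(style) == tag for aliases in ALIASES_BY_STYLE.values())
```

With style "DDL1" the DDL1 alias passes while the DDL2 and DDLm names fail, and the same single table serves the DDL2 case by changing only the key.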

OK, that was not so bad, but now we are at, say, the PDB
and in addition to having DDL1 and DDL2 style tags from
the core, we also have prefixed tags (pdbx) that should
eventually get promoted to be prefix-free. Now we can
use the styles to validate for strict use of
the prefixes when we are producing output that we want
to be certain actually does use the prefixes, or
relax the validation to allow both the prefixed and promoted
tags, or go strict again on the far side to be sure we
are only producing promoted tags.

Note that none of these style based input validation choices
are based on the choice of dictionary -- it is one dictionary,
so it does not really help to be maintaining the styles
dictionary by dictionary. The grain of identification is
too coarse, and involves multiple maintenance issues when
in reality only one, nice new, DDLm dictionary needs to be
maintained.

On the output side, essentially the same issues arise, but
there are fewer users, but as I said, it is a harmless
addition to the DDLm spec for those who do not wish to
be aware of it, and for those of us for whom it is
useful, it really is useful.

The fundamental disagreement is on whether we will have
to have a DDL1 dictionary, a DDL2 dictionary, a DDLm
dictionary, a prefix dictionary, etc., and plant them
on assorted web sites, or just one DDLm dictionary that
handles everything and can be local or remote or in
local and remote pieces without changing the behavior
of the validation or of the output.

I hope that those who are uncomfortable with this change
will reconsider and support it. Thanks to David's clear
thinking it is a clean, simple and useful idea, much
better than my original import suggestion.

Please support it.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 20 Jan 2011, James Hester wrote:

I'm trying to get a grip on what problem the tag_style proposal
solves. I'll just emphasise at the outset in case there are any
misconceptions that it is incorrect to suppose that the dREL method
knows or needs to know anything about the particular syntax in which
an input or output value is expressed; dREL is concerned purely with
describing relationships.

Here are the two scenarios that I think are being discussed under the
rubric of DDLm compatibility with CIF1:

Scenario 1: given a DDLm dictionary, a program wishes to generate and
(validate/insert) the value for some given CIF1 dataname in a CIF1
datafile, using other CIF1 tags found in that datafile. We are all
agreed (I think) that locating the relevant DDLm dictionary entries
for a CIF1 dataname is a simple and well-defined task. The formatting
of the eventual output value of the DDLm method is also not in the
purview of the dictionary, but rather of the application that is using
the dictionary. The particular CIF1 tag to put in the datafile is
also not an issue, as that was given at the beginning. So the
tag_style proposal is not relevant here.

Scenario 2: given a CIF2 datafile, a DDLm application wishes to
produce an equivalent CIF1 datafile. For many of the CIF2 datanames
found in the CIF2 datafile, there are multiple possible datanames
listed as aliases. How is the application to ensure that it writes a
set of datanames from DDL1 dictionaries only or DDL2 dictionaries
only? The simple solution alluded to by John B would be to do as
follows: for each dictionary URI mentioned in the alias list, use the
IUCr CIF dictionary register (and/or other canonical sources) to
determine the DDL version of that dictionary. DDL conformance is a
standard entry in the dictionary register. The latest dictionary
version as given in the dictionary register could be selected where
multiple versions are presented (URL for the register is
ftp://ftp.iucr.org/pub/cifdics/cifdic.register).

Of course, any program wanting to do such conversions efficiently
would pregenerate a DDL version - dictionary table once and refer to
that. I therefore see no use, either in terms of efficiency or new
functionality, for the tag_style attribute.
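
The pregenerated table could be as simple as the sketch below. It makes several assumptions: the table is typed in by hand rather than read from the register (whose file format is not reproduced here), dictionaries are identified by short names instead of full URIs, and the function name is invented.

```python
# Hypothetical, hand-written stand-in for "dictionary -> DDL version"
# information that would be derived once from the dictionary register.
DDL_OF_DICTIONARY = {
    "cif_core.dic": "DDL1",
    "mmcif_std.dic": "DDL2",
}


def aliases_for_ddl(aliases, ddl):
    """Keep only the aliases whose dictionary is written in the given DDL.

    aliases: list of (alias_name, dictionary) pairs for one DDLm data name.
    Aliases from dictionaries missing from the table are dropped.
    """
    return [name for name, dictionary in aliases
            if DDL_OF_DICTIONARY.get(dictionary) == ddl]


candidates = [
    ("_atom_site_Cartn_z", "cif_core.dic"),
    ("_atom_site.Cartn_z", "mmcif_std.dic"),
]
ddl1_names = aliases_for_ddl(candidates, "DDL1")
```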

Please advise if I have misunderstood the problem.

James.
On Thu, Jan 20, 2011 at 11:20 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
No, a tag style is simply supposed to identify a grouping of alias
tag choices that belong together, so you can decide to put out
those particular versions of tags.  It is just a text string,
just like an alternate conformer identifier.

The same tag name could be marked with as many tag styles as
you choose.  It is just text.  But you could not give multiple
aliases for the same DDLm tag for the same tag style when allowing
DDLm missing value generation or you would not know which version to put
out, and for validation, there is no reason not to use different
styles for the different alternatives.

The way I will write the extraction algorithm, if you choose
a tag style, you will get the DDLm name for the tags that don't
have an alias for the chosen style, but the tag alias given for the
specified style if there is one.  That way a dictionary that is
intended to support DDL1, DDL2 and DDLm for which the DDLm
tags happen to be primarily consistent with DDL2 conventions,
then for the tags that conform to DDL2 conventions, you will
not need a DDL2 style alias, just a DDL1 style alias.  You will
only need both a DDL1 style alias and a DDL2 style alias for
a tag for which the DDLm tag is different from both, e.g.
for _diffrn_standards_decay_% (DDL1), _diffrn_standards.decay_%
(DDL2) and _diffrn_standards_decay_percent (DDLm).  When you
want DDLm output and validation, you don't specify a style at all.
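
The extraction rule just described can be sketched in a few lines. This is an illustrative reading of the message, not the actual implementation: the table layout and function name are assumptions, and the second entry assumes, purely for the example, a DDLm name that already follows DDL2 conventions.

```python
from typing import Optional

# DDLm name -> {style: alias}. A plain dict allows at most one alias per style
# for each DDLm tag, which matches the restriction stated in the message.
ALIASES_BY_STYLE = {
    "_diffrn_standards_decay_percent": {
        "DDL1": "_diffrn_standards_decay_%",
        "DDL2": "_diffrn_standards.decay_%",
    },
    # Hypothetical entry: the DDLm name is assumed to match the DDL2
    # convention, so only a DDL1-style alias is recorded.
    "_atom_site.Cartn_z": {
        "DDL1": "_atom_site_Cartn_z",
    },
}


def output_name(ddlm_name: str, style: Optional[str] = None) -> str:
    """Return the alias for the chosen style, or the DDLm name if none exists.

    With no style specified, the DDLm name is always returned.
    """
    if style is None:
        return ddlm_name
    return ALIASES_BY_STYLE.get(ddlm_name, {}).get(style, ddlm_name)
```

For the second entry, asking for "DDL2" falls through to the DDLm name, which is why no DDL2-style alias needs to be stored for it.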

This will be very nice to allow an automatic cleanup for dictionaries
using a prefix, say pdbx, for tags that later get promoted
to not need a prefix.

Regards,
  Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Wed, 19 Jan 2011, Bollinger, John C wrote:

On Wednesday, January 19, 2011 3:47 PM, Herbert J. Bernstein wrote:
  The definition_id most certainly does not exhibit the tag
style.  For example, there is no way to distinguish DDLm
tag style from DDL2 or DDL2 tag style from context.  That
is intentionally inherent in the design of DDLm.
Then I'm afraid I don't quite comprehend the meaning of "tag style".  I
would like to do so, so that I can form a well-founded opinion about it.

As I thought I had understood the idea, the tag style is proposed to
identify the set of DDL conventions with which the given alias complies.
If that were indeed what it was intended to mean, however, then (1) as
you observe, some names would comply with more than one set of
conventions, but also (2) a set of candidate tag styles, at least, could
be computed for any alias name.

What would be the significance of marking an alias that conforms with
both DDL2 and DDLm conventions with tag style DDL2?

Might it ever be needful or useful to mark the same alias with more than
one tag style?

  As for defining a hypothetical URI, that can break,
or at least time out, programs trying to get additional
information about an aliased tag from that URI.  URIs
should be for things that really exist on the web,
not a substitute for a tag that really defines something
different, in this case the style of tags.
I don't think the issue is nearly so clear cut.  I would hold, for example, that the primary purpose of a URI is to *identify*
a resource.  That's what the "I" stands for, as I'm sure you're aware.  RFC 3986 (Uniform Resource Identifier (URI): General
Syntax) explicitly provides that a URI may identify an abstract resource.  RFC 2396 (now obsoleted by 3986) says the same.
 Although many URIs fulfill their purpose by serving as resolvable web addresses, some, even among those formatted as URLs, do
not.  Examples of the latter abound in various XML communities.

Personally, however, I think a bit more like you do: a URL ought to refer to a retrievable resource on the web.  For an
abstract or virtual resource, therefore, I prefer to use a URN.  For something like your virtual DDL1 imgCIF dictionary, I
might choose something like urn:x-imgCIF:DDL1.  If a URN were used, then programs assuming a resolvable URL might still break,
but only if they were poorly crafted indeed would they hang pending a time out.  The whole issue could largely be mooted by
clarifying the purpose and intended usage of _alias.dictionary_uri in its definition.  That need not prevent programs from
attempting to resolve dictionary URIs, but if it specified that dictionary URIs might be permanently unresolvable then
programmers would know to prepare for that possibility.
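
One way a program could prepare for permanently unresolvable dictionary URIs is to look at the scheme before attempting any retrieval. The sketch below uses only the Python standard library; the set of schemes treated as retrievable and the function name are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Schemes this sketch is willing to try fetching; anything else (for example
# a URN) is treated purely as an identifier and never dereferenced.
RESOLVABLE_SCHEMES = {"http", "https", "ftp"}


def may_be_retrievable(dictionary_uri: str) -> bool:
    """Return True only if the URI uses a scheme worth trying to fetch."""
    return urlparse(dictionary_uri).scheme.lower() in RESOLVABLE_SCHEMES
```

A caller would skip the network entirely for something like urn:x-imgCIF:DDL1 and would still need a timeout and error handling for the schemes it does try.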

  We already do something very similar to this with
alternate conformers and with NMR model numbers.  It
really is a simple concept for organizing information
that belongs in groups, in this case the group of
DDL1 or DDL2 or DDLm or ... style tags.
I think that makes it a bit clearer to me what you want to do, but I'm still interested in the answers to my questions above.
 I'm a bit uncomfortable with defining generic groups of aliases with per-dictionary semantics, if that's indeed what you're
proposing.  For one thing, it does not play well with dictionary merging.  For another, the meaning of the groupings is nowhere
defined, at least not without adding at least one more data name to DDLm for that purpose.

On the other hand, data names have at least one natural grouping: the dictionaries in which they are defined.  This grouping is
already modeled in DDLm, and as far as I can tell, it is conceptually a perfect fit for what you want to do.

That doesn't necessarily mean that there is no use for a more general grouping mechanism.  I am curious indeed whether there
are use cases for grouping data names that do not align well with dictionaries or dictionary-defined attributes.  Can anyone
suggest some?

 It solves
a very real problem for me with imgCIF.  It does
no harm to anybody else.  If nobody uses it in
another dictionary, it still would have been a useful
addition to DDLm.
I very much want you to have a solution to your problem, and I have suggested one that still seems absolutely natural to me.
 It may be that there are better alternatives, and perhaps even that tag style would be one such.  Of the latter, however, I am
not yet persuaded.

Perhaps "harm" is too charged a word, but adding an additional attribute to DDLm certainly does cost everyone else.  Every DDLm
application must support all the DDLm attributes, so every additional attribute places a development and maintenance burden on
multiple developers.  That incrementally slows software release cycles and introduces additional space for bugs and
incompatibilities to hide.  It's a small cost for most people, but everyone pays it.  The proposed tag style is no different in
that regard from any other DDLm attribute, of course, but that doesn't mean that its cost should be ignored.

As for whether it would be a useful addition to DDLm, that is exactly what I am trying to decide.  Potential use cases such as
I solicited above would help me make that decision.

  In the end, I suspect that both core and mmCIF DDLm
dictionaries will be built this way, because it
makes it simpler and clearer and allows multi-purpose
dictionaries to be self-contained and avoid the
maintenance headache David spotted.
If by "multi-purpose dictionaries" you mean defining multiple virtual dictionaries via a single DDLm dictionary, such as you
plan, then I still see the dictionary_uri as the natural way to use aliases for that purpose.  If there is a broader concept
here then please help me see it.


Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital




Email Disclaimer:  www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group


--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

[...] unlikely to be a problem.  The file will be more structured than a file that conforms to a DDL1-compliant CIF, but there may be some legacy software that will not be able to read it.  Again we can only gauge the extent of the problem by experiment.  No amount of hot air emails will solve that problem.  If all goes well we may be able to abandon the older dictionaries in favour of DDLm.  Let's hope.

The information that should be available in an alias is the tag, the dictionary in which the tag first appears (including the version number), the DDL that the dictionary conforms to, a flag to indicate if a particular name has been deprecated, and a pointer to where the dictionary can be found.  It is better to keep these pieces of information separate (as is done in the mmCIF dictionary), since using a single item to convey two distinct types of information is inelegant and can lead to problems if there is conflict between the two meanings.  The pointer might be to a public archive, but it may make more sense for it to point to a local source that in turn can point either to a local source or an archive.  The deprecation flag is not needed on reading, but is needed on writing to ensure that only the current tag is used.  Alternatively, if the output file is to conform to a particular version of the CIF dictionary (so as to use an early piece of software that is only aware of the deprecated name), the file can be written in this version.  It is unlikely that general software would include this as an option, but the dictionary should make it possible.
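
The different roles of the deprecation flag on reading and on writing can be sketched as follows. Everything here is hypothetical: the tuple layout, the function names, and the tags and version numbers in the example data are invented for illustration.

```python
from typing import NamedTuple, Optional, Sequence, Set


class Alias(NamedTuple):
    tag: str
    dictionary_version: str   # version in which this tag is the one to use
    deprecated: bool


def tags_for_reading(aliases: Sequence[Alias]) -> Set[str]:
    """On reading, every known name is accepted; the flag is ignored."""
    return {alias.tag for alias in aliases}


def tag_for_writing(aliases: Sequence[Alias],
                    target_version: Optional[str] = None) -> str:
    """On writing, use the current tag unless an earlier version is requested."""
    for alias in aliases:
        if target_version is None and not alias.deprecated:
            return alias.tag
        if target_version is not None and alias.dictionary_version == target_version:
            return alias.tag
    raise LookupError("no suitable tag for the requested output")


# Invented example data: one deprecated name and its current replacement.
names = [Alias("_old_name", "1.0", True), Alias("_new_name", "2.0", False)]
```

Here `tag_for_writing(names)` picks the current name, while `tag_for_writing(names, "1.0")` serves the rare case of software that only knows the deprecated one.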

I agree with Herbert that the possibility of using the DDLm dictionaries to manage the whole CIF archive is worthy of pursuing, even if it eventually proves not to be possible.

David



begin:vcard
fn:I.David Brown
n:Brown;I.David
org:McMaster University;Brockhouse Institute for Materials Research
adr:;;King St. W;Hamilton;Ontario;L8S 4M1;Canada
email;internet:idbrown@mcmaster.ca
title:Professor Emeritus
tel;work:+905 525 9140 x 24710
tel;fax:+905 521 2773
version:2.1
end:vcard
