Discussion List Archives

Re: [ddlm-group] DDLm aliases (subject changed)

  • To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
  • Subject: Re: [ddlm-group] DDLm aliases (subject changed)
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Sun, 23 Jan 2011 12:19:12 +1100
I believe this discussion arose out of a misconception, but will end
up producing something useful.  First of all, we should be clear that
the only well-defined meaning of "DDL1/DDL2/DDLm tag" is "a dataname
defined in a dictionary written using DDL1/DDL2/DDLm".  Note in
particular that all DDL1 and DDL2 tags are consistent with CIF1
syntax, and writing a DDLm dictionary with CIF1-compatible tags is
also not troublesome.  This means that it is simple to write a DDLm
imgCIF or coreCIF dictionary where datanames satisfy CIF1 syntax
rules.  A datafile in CIF1 syntax can then refer to the DDLm
dictionary as the reference for the datanames.

On the other hand, it is not possible to write a DDLm dictionary that
can serve as a DDL1 or DDL2 dictionary, because the DDL languages are
different and incompatible.  Simply rewriting the tags does not change
the fact that the tag is defined in a DDLm dictionary and therefore is
interpretable using DDLm semantics *only*. The concept of a virtual
dictionary generated from a "master" DDLm dictionary but with
DDLm/DDL1/DDL2 flavours is therefore meaningless and should be
abandoned.

Nevertheless, an important use case for rewriting tags has been
identified by Herbert: transitioning from the use of tags with a local
identifier to those using a "global" (i.e. no namespace) identifier.
With something like the tag_style proposal in place, the DDLm
dictionary writer can write the dictionary as if it were a global
dictionary (this may particularly help with dREL methods) and include
a "local" tag_style which gives an alternate dataname that includes
the local section. In tandem with this, any datafiles containing
datanames defined in this local dictionary would use the audit
category to specify both a dictionary *and* a style.  If only "local"
datanames are in use, then the style would be "local"; if the
dictionary becomes a standard, no rewriting is necessary, and
datafiles can now just use the default value of style ("standard").  I
think this is a compelling use case, but still have to think through
how dictionary merging will work.
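
The rewriting step described above might be sketched as follows, in
Python.  This is only an illustration under stated assumptions: the
table layout, the function names and the example datanames (including
the "_[local]_" prefix) are invented here and are not part of any DDLm
proposal.

```python
# Hypothetical dictionary content: canonical ("standard") dataname mapped to
# its alternate spelling under each named style. All names are invented.
TAG_STYLES = {
    "_beam.flux": {"local": "_[local]_beam.flux"},
    "_beam.divergence": {"local": "_[local]_beam.divergence"},
}


def build_reverse_map(tag_styles, style):
    """Map datanames as written under `style` back to canonical datanames."""
    reverse = {}
    for canonical, alternates in tag_styles.items():
        if style == "standard":
            written = canonical
        else:
            # Fall back to the canonical name if no alternate is defined.
            written = alternates.get(style, canonical)
        # CIF datanames are case-insensitive, so compare in lower case.
        reverse[written.lower()] = canonical
    return reverse


def canonicalise(items, tag_styles, style="standard"):
    """Rewrite the datanames of a datafile to their canonical form.

    `items` maps datanames as written in the datafile to values; names not
    known to the dictionary are passed through unchanged.
    """
    reverse = build_reverse_map(tag_styles, style)
    return {reverse.get(name.lower(), name): value
            for name, value in items.items()}


# A datafile that declared style "local" in its audit category (assumed).
local_file = {"_[local]_beam.flux": "1.2e10", "_cell.length_a": "5.43"}
print(canonicalise(local_file, TAG_STYLES, style="local"))
```

Only the names change; the values are untouched.  Once the dictionary
becomes a standard, the same datafile content would be read with the
default style and no rewriting would take place.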

The second future use case is that of datanames in a DDLm dictionary
containing non-ASCII code points.  These, and only these, DDLm
datanames are not CIF1-compatible.  A style could therefore be added
giving the "ASCII" equivalent dataname.

As John W was suggesting (at least reading between the lines), the
above two use cases are semantically distinct from aliases.  Aliases
point to definitions in a dictionary and state that the aliased
dataname is the equivalent dataname in a different dictionary.  As the
dictionary DDL languages may be different, there are no explicit
guarantees that all semantic properties (e.g. category relationships)
can be preserved in making this translation.  On the other hand, the
tag_style use is a simple rewriting of the dataname preserving perfect
semantic identity.
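
The alias side of this distinction can be illustrated with a small
Python sketch.  It assumes aliases are keyed on the pair (xref code,
dataname), as in John's proposal quoted below, and reuses the two rows
from his example; the function name and table layout are made up for
illustration.

```python
# Alias table keyed on (xref_code, aliased dataname), pointing at the
# dataname defined in the current DDLm dictionary. The two rows are taken
# from the example in John's proposal; the layout itself is an assumption.
ALIASES = {
    ("core", "_diffrn_standards_decay_%"): "_diffrn_standards.decay_percent",
    ("mmcif", "_diffrn_standards.decay_%"): "_diffrn_standards.decay_percent",
}


def resolve_alias(xref_code, dataname, aliases=ALIASES):
    """Return the DDLm dataname equivalent to `dataname` as defined in the
    dictionary identified by `xref_code`, or None if no alias is recorded.
    """
    # CIF datanames are case-insensitive, so compare in lower case.
    return aliases.get((xref_code, dataname.lower()))


print(resolve_alias("core", "_diffrn_standards_decay_%"))
print(resolve_alias("mmcif", "_diffrn_standards_decay_%"))
```

The second call finds nothing, because the alias is scoped to the
dictionary in which it appears.  The lookup only locates the equivalent
definition; it says nothing about whether category relationships or
other semantics carry over.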

Therefore, I believe that the tag_style tag should not be conflated
with aliases, but should be created in a separate category.  Note also
that "local" and "ASCII" are not mutually exclusive designations, so
some further work is necessary to get everything to work together
properly (e.g. how do I transition between "local + ASCII", "local",
"ASCII" and "standard+ASCII"?).  I also think that "style" is probably
not the best terminology to use - perhaps "presentation" or "view"
would be better.
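
One possible way to model designations that are not mutually exclusive
is as combinable flags, sketched here in Python.  Everything in this
sketch is an assumption made for illustration: the class and function
names, the idea of storing one spelling per combination, and the
example datanames.

```python
from enum import Flag


class Presentation(Flag):
    """Independent designations that can be combined with `|`."""
    STANDARD = 0
    LOCAL = 1
    ASCII = 2


# Hypothetical dictionary content: one alternate spelling per combination
# of designations for a dataname containing a non-ASCII code point.
ALTERNATES = {
    "_sample.température": {
        Presentation.LOCAL: "_[local]_sample.température",
        Presentation.ASCII: "_sample.temperature",
        Presentation.LOCAL | Presentation.ASCII: "_[local]_sample.temperature",
    },
}


def written_name(canonical, presentation, alternates=ALTERNATES):
    """Return the spelling of `canonical` under the given presentation."""
    if presentation == Presentation.STANDARD:
        return canonical
    try:
        return alternates[canonical][presentation]
    except KeyError:
        raise LookupError(
            f"no {presentation} spelling recorded for {canonical}"
        ) from None


print(written_name("_sample.température", Presentation.LOCAL | Presentation.ASCII))
```

Storing a spelling for every combination keeps lookup trivial but grows
quickly as designations are added; composing one rewriting rule per
designation would be an alternative, at the cost of having to define
the order in which the rules apply.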

I have so far no objection in principle to normalising out the
dictionary using dictionary_xref as John has proposed.

On Sat, Jan 22, 2011 at 7:47 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> This can be made to work, but for my uses, there are
> some minor issues:
>
> 1.  I will be grouping the primary DDLm tag.  With the
> _definition.xref_code removed, the primary DDLm tag
> will have to be aliased; and
>
> 2.  With multiple xref codes for a given tag (e.g.
> DDL2 and DDLm), it would be more appropriate to
> normalize and put the tags and xref codes into
> a sub-category, rather than to keep repeating the
> same tag.  This would have the advantage of allowing
> the alias category to return to a non-compound key
> and would also allow all the grouping of
> tags in a dictionary to be gathered on a separate
> block, if desired.
>
> For these reasons, I suggest
>
> 1.  Leave _alias.dictionary_uri, but deprecate it in
> favor of:
>
> 2.  Create an ALIAS_XREF category with the
> following tags, forming a composite key
>
> _alias_xref.definition_id
>     a tag identifier belonging to a group
> _alias_xref.xref_code
>     a code identifying a real or virtual dictionary
>     or other logical group of tags to which the tag
>     belongs
>
> The other tags that John proposes for David's uses
> actually fit better in terms of normalization in this sub-
> category, than on the top level, but that is a decision
> for David to make.  I am happy either way.
>
> The addition to the ddl dictionary would be:
>
> save_ALIAS_XREF
>
>   _definition.id      alias_xref
>   _definition.scope   Category
>   _definition.class   List
>   _definition.update  2011-01-21
>   _description.text
> ;
>    The attributes used to specify the actual dictionary,
>    virtual dictionary, or other logical grouping of
>    tags indicated by an xref code to which a given tag belongs.
>
>    Tags for which no xref group is defined fall under the
>    default xref code, which is the one specified by
>    a null value.
>
> ;
>    _category.parent_id  alias
>    _category_key.primitive  ['_alias_xref.definition_id',
>                              '_alias_xref.xref_code']
>     save_
>
> save_alias_xref.definition_id
>     _definition.id   '_alias_xref.definition_id'
>     _definition.class  Attribute
>     _definition.update 2011-01-21
>     _description.text
> ;
>     Identifier tag of a definition associated with
>     an xref code by which to group this tag with
>     other tags.  A single tag may be associated
>     with multiple xref codes.  An xref code does
>     not have to be associated with a particular
>     dictionary, nor with a particular DDL format.
>
>     Note that the tag does not have to be a valid
>     tag under DDLm tag construction rules, but
>     it should be a valid tag under the rules of
>     some DDL.
> ;
>     _name.category_id alias_xref
>     _name.object_id   definition_id
>     _type.purpose     Key
>     _type.container   Single
>     _type.contents    Code
>      save_
>
> save_alias_xref.xref_code
>     _definition.id   '_alias_xref.xref_code'
>     _definition.class  Attribute
>     _definition.update 2011-01-21
>     _description.text
> ;
>     A code identifying the actual dictionary,
>     virtual dictionary or other logical grouping
>     to which the identifier tag belongs.
> ;
>     _name.category_id alias_xref
>     _name.object_id   xref_code
>     _type.purpose     Key
>     _type.container   Single
>     _type.contents    Code
>      save_
>
>
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
>
> On Fri, 21 Jan 2011, Bollinger, John C wrote:
>
>>
>> On Friday, January 21, 2011 8:57 AM, David Brown wrote:
>> [...]
>>> I would like to know exactly what I am voting on.  There seems to be
>>> general agreement on the information that is needed for an alias, the
>>> only dispute is the format in which it will appear.  If the various
>>> pieces of information I listed each had their own item, this would be
>>> agreeable and we could delegate someone to come up with the requisite
>>> DDLm save frames, but if this information is to be included, explicitly
>>> or implicitly, in a smaller number of items, then I would like to see
>>> the definitions and descriptions so that I could understand how each
>>> piece of information would be retrieved.  John B, can you supply us
>>> with an example of what your normalized item(s) would look like?
>>
>>
>> Indeed, here is the formal proposal I promised, at the end of which is
>> an example:
>>
>>
>> Proposal: Extended Alias Attributes
>> ===================================
>>
>> Introduction / Rationale
>> ------------------------
>>
>> This proposal aims primarily to provide all the ALIAS attributes that several members of this group have recently agreed are needed (at least in principle).  However, attributes that are properties of dictionaries rather than of individual data names are normalized out of the ALIAS category and into the DICTIONARY_XREF category.  The description of the DICTIONARY_XREF category is slightly modified to be explicitly consistent with this usage and with the concept of referencing logical dictionaries that have no independent physical manifestation.
>>
>>
>> Proposed Actions
>> ----------------
>>
>> 1) Replace _alias.dictionary_uri with:
>>
>> _alias.xref_code: Specifies a code that identifies the logical or physical dictionary in which the alias is defined.  This serves to categorize and fully identify the alias.
>>    _type.purpose     Identify
>>    _type.container   Single
>>    _type.contents    Code
>>
>> 2) Add these attributes:
>>
>> _alias.dictionary_version: Specifies the first version of the
>> dictionary identified by _alias.xref_code that defines the alias.
>>    _type.purpose     Identify
>>    _type.container   Single
>>    _type.contents    Code
>>
>> _alias.deprecated: Specifies whether use of the alias is deprecated.
>>    _type.purpose     State
>>    _type.container   Single
>>    _type.contents    YesorNo
>>
>> 3) In the ALIAS category, replace attribute _category_key.generic with:
>>    _category_key.primitive [ '_alias.xref_code' '_alias.definition_id' ]
>>
>> 4) Modify the definition of _dictionary_xref.format by changing its
>> _type.contents attribute to "Code".
>>
>> 5) Remove _definition.xref_code (its purpose will be served via the
>> alias mechanism)
>>
>> 6) Modify the description of the DICTIONARY_XREF category to: "The
>> DICTIONARY_XREF attributes identify and describe logical or physical
>> dictionaries to which items in the current dictionary are
>> cross-referenced using the _alias.xref_code attribute."
>>
>>
>> Comments
>> --------
>>
>> Here is the resulting correspondence between DDLm data names and David's
>> list of alias attributes:
>>
>> "The tag" -> _alias.definition_id (unchanged by this proposal)
>>
>> "the dictionary in which it appears" -> a row/instance of
>> DICTIONARY_XREF, identified by _alias.xref_code (added by this proposal)
>>
>> "the version of this dictionary" -> _alias.dictionary_version (added by
>> this proposal)
>>
>> "the DDL in which the dictionary is written" -> _dictionary_xref.format
>> (type attributes modified by this proposal)
>>
>> "a flag to indicate whether the dataname is deprecated" ->
>> _alias.deprecated (added by this proposal)
>>
>> "a pointer to where the named dictionary can be found" ->
>> _dictionary_xref.uri (unchanged by this proposal)
>>
>>
>> Although this proposal chooses the existing DICTIONARY_XREF category as
>> the normalized location for alias attributes that depend only on
>> dictionary, it would also be possible to instead introduce a new,
>> parallel category for this purpose.  If the _definition.xref_code is
>> merged into the alias feature as I propose, however, then
>> DICTIONARY_XREF no longer has any other purpose.  On the other hand, it
>> is not essential to drop _definition.xref_code.
>>
>> As in my previous proposal concerning _alias.dictionary_uri, the key for
>> the ALIAS category is expanded to a compound one containing the
>> dictionary identifier and the data name.  This allows one data name's
>> appearances in multiple dictionaries all to be aliased to the same
>> defined name, without implying that all possible definitions of the name
>> are aliased.  Essentially, it scopes the alias to the dictionary in
>> which it appears.  DDL2's similar ITEM_ALIASES category is keyed not
>> only to name and dictionary identifier, but also to dictionary version;
>> the last seems needless, even in DDL2, because we can assume that once
>> introduced into a dictionary, data names are not removed or incompatibly
>> changed.
>>
>> The type attributes of _dictionary_xref.format are changed so that this
>> attribute represents a computer-interpretable code describing at least
>> the DDL compliance level of the referenced dictionary.  Allowed values
>> could be defined so that they encompass other information as well, very
>> much like the proposed tag_style might do.  It might be desirable for
>> DDLm to enumerate allowed values for this attribute, but it would be
>> more flexible to have an external register, such as Herbert proposed for
>> tag_style.  I presently take no position on the best course in that
>> regard, but this proposal does not provide enumerated values.
>>
>> This proposal is offered for comment.  Although I would be willing to
>> have a vote on it as it stands, it could likely be improved.  I am open
>> to changing some of the details if that will contribute to broader
>> acceptance.
>>
>>
>> Example
>> -------
>>
>> loop_
>>    _dictionary_xref.code
>>    _dictionary_xref.date
>>    _dictionary_xref.format
>>    _dictionary_xref.name
>>    _dictionary_xref.uri
>>    core  '2010-Jun-29'  DDL1  cif_core.dic  ftp://ftp.iucr.org/pub/cif_core.dic
>>    mmcif '2005-Jun-27'  DDL2  mmcif_std.dic ftp://ftp.iucr.org/pub/cif_mm.dic
>>
>> [...]
>>
>> save_diffrn_standards.decay_percent
>>    _definition.id             '_diffrn_standards.decay_percent'
>>
>> [...]
>>
>>    loop_
>>        _alias.xref_code
>>        _alias.definition_id
>>        _alias.dictionary_version
>>        _alias.deprecated
>>        core  '_diffrn_standards_decay_%' . no
>>        mmcif '_diffrn_standards.decay_%' . no
>>
>> save_
>>
>>
>> Regards,
>>
>> John
>>
>> --
>> John C. Bollinger, Ph.D.
>> Department of Structural Biology
>> St. Jude Children's Research Hospital
>>
>>
>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148