
[ddlm-group] Second proposal to allow looping of 'Set' categories

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] Second proposal to allow looping of 'Set' categories
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 8 Jun 2016 13:59:08 +1000
Introduction
============

The previous proposal
(http://www.iucr.org/__data/iucr/lists/ddlm-group/msg01428.html) was
deemed inadequate (see discussion in that thread). The two key issues are

(i) current software must not misinterpret files produced
according to any new semantic principles.

(ii) we wish to minimise the number of datanames that software must
potentially check when searching for a particular item of information,
as each new dataname requires an update to all CIF reading software
that processes aliases of that dataname (unfortunately the vast bulk
of software does not use the latest version of the dictionary to find
aliases).

Please carefully consider and improve the following proposal:

Proposal #2 for allowing loopable 'Set' categories
==================================================

Step 1
------

(a) A new dataname '_audit.schema' (or similar) is defined, and all
CIF reading software is expected, after a transitional period, to
check its value; if missing, the value defaults to 'Structural',
corresponding to all current CIF1 datafiles.  Here is a sketchy
definition:

_definition.id    '_audit.schema'
_description.text
;
     This dataname identifies the type of information contained in the datablock. Each
     possible value of this dataname corresponds to a list of 'Set' categories that may be
     looped, that is, may have more than one row in the category loop and therefore more
     than a single value for each dataname in that category.
;
loop_
_enumeration_state.code
_enumeration_state.detail
    'Structural'          [ ]
    'Space group tables'  [ space_group ]
_enumeration.default      'Structural'

(b) The 'Set' _definition.class attribute is updated to read as follows ("magic keys"):

                Set
               
;                 Datanames from a Set category usually appear as part of a key-value
                  pair or in a single-row loop, in which case instance files may
                  omit datanames that are linked to the Set category's key (if
                  such a key is defined).
;
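
As a rough illustration of the defaulting rule in Step 1(a), the sketch
below shows how reading software might decide which 'Set' categories are
allowed to be looped. It is only a sketch under stated assumptions: the
dict-based datablock, the function name and the decision to reject
unrecognised values are all hypothetical and not part of the proposal;
only the two enumeration values and the 'Structural' default come from
the definition above.

```python
# Assumption: a datablock is modelled as a plain dict of dataname -> value.
DEFAULT_SCHEMA = "Structural"

# Enumeration from the sketch definition:
# schema value -> 'Set' categories that may have more than one row.
LOOPABLE_SETS = {
    "Structural": set(),
    "Space group tables": {"space_group"},
}


def loopable_set_categories(datablock):
    """Return the 'Set' categories that may be looped in this datablock."""
    # A missing _audit.schema means a legacy (CIF1-style) datablock.
    schema = datablock.get("_audit.schema", DEFAULT_SCHEMA)
    if schema not in LOOPABLE_SETS:
        # Hypothetical policy: refuse rather than risk misinterpretation.
        raise ValueError(f"unrecognised _audit.schema value: {schema!r}")
    return LOOPABLE_SETS[schema]


# Illustrative datablocks with made-up contents.
legacy_block = {"_cell.length_a": "5.43"}
tables_block = {"_audit.schema": "Space group tables"}

print(loopable_set_categories(legacy_block))
print(loopable_set_categories(tables_block))
```

Refusing an unknown value is one way to respect issue (i) from the
introduction; other policies are of course possible.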

Step 2
------

Approval of new values for _audit.schema should consider the
possible impact on the community in light of adoption rates of
the _audit.schema dataname and number of categories affected by
the changes.

Whenever a new value for _audit.schema is approved, the list of
newly-looped categories is added to the above enumeration list, and:

(i) The newly-looped categories are given a category key, probably in
a separate dictionary

(ii) All looped categories that depend on any newly-looped categories
are updated to always include key dataname(s) that point to the
depended-upon categor(ies).  For example, "atom_site" would have an
"atom_site.cell_id" dataname added if cell parameters were looped.
The precise meaning of 'depends' is that, if the depended-upon
category loop has multiple rows, then the dependent category would
need to include the key dataname pointing to the depended-upon
category in order to uniquely identify a row.  Again, these extra
key datanames would appear in a separate dictionary.

Discussion
==========

Effects on current standards
----------------------------

This proposal affects the DDLm/dREL standards only, and has no
implications for DDL2 or DDL1 dictionaries.  DDLm dictionaries will
still reproduce DDL1 behaviour, that is, all CIF1 files remain
semantically valid after application of DDLm aliases.

DDLm
----

It is no longer possible to specify Set categories as children of
other Set categories, as this would stop the parent becoming looped.
As the Set-Set parent-child relationship had no semantic meaning
(only organisational), this has no semantic implications.  Where
looping of the parent necessarily implies looping of the child, the
parent-child relationship can remain, but in this case it would
additionally allow optional merging of the parent and child loops,
which may not be intended.

dREL
----

dREL item methods reference datanames from 'Set' categories directly,
in "category.object" notation. All dREL methods can be considered to
operate on the current row of the category within which they are
defined, which means that the current value of any future Set category
child key dataname is available whenever such a category.object
reference is made. If dREL is tweaked to say that any category.object
references use the current value of the child key for that category
(and a default key is assumed for non-looped 'Set' categories), the
whole system works, and indeed is simplified in many cases, as the
explicit "category[foreign_key].object" notation can often be dropped
where a single key dataname to that category is defined. Some
categories use more than one key to the same category (e.g. geom_bond
has two datanames for the two atoms at each end of a bond), in which
case an explicit reference would still be necessary.
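
The lookup rule described above can be sketched in Python (not dREL) as
follows. This is only an illustration under stated assumptions: the
in-memory layout (category name -> list of row dicts), the function
name, the "id" key of the looped category and the numeric values are
all hypothetical. The caller names the child key explicitly, which also
covers the geom_bond-style case of several keys to the same category.

```python
def resolve(categories, current_row, target, obj, child_key, target_key="id"):
    """Resolve a bare `target.obj` reference made from `current_row`."""
    rows = categories[target]
    if child_key not in current_row:
        # Non-looped 'Set' category: its single row acts as the default.
        if len(rows) != 1:
            raise LookupError(f"{target} is looped but {child_key} is absent")
        return rows[0][obj]
    # Looped 'Set' category: select the row named by the child key.
    wanted = current_row[child_key]
    matches = [row for row in rows if row[target_key] == wanted]
    if len(matches) != 1:
        raise LookupError(f"no unique {target} row with key {wanted!r}")
    return matches[0][obj]


# Looped case: two cells, and the atom_site row names the one it belongs to.
looped = {"cell": [{"id": "lt", "volume": 157.5},
                   {"id": "rt", "volume": 160.1}]}
print(resolve(looped, {"label": "Si1", "cell_id": "rt"},
              "cell", "volume", "cell_id"))

# Legacy case: one cell row, no child key in the referring row.
unlooped = {"cell": [{"volume": 160.1}]}
print(resolve(unlooped, {"label": "Si1"}, "cell", "volume", "cell_id"))
```

Both calls take the same form, which is the simplification the
paragraph above is pointing at.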

Other notes
===========

Datafiles conforming to any of the schemas can be automatically
transformed to datafiles conforming to any of the other schemas by
splitting items that now need to be single-valued into separate
datablocks, filtering all dependent loops in each new datablock using
the corresponding value of the child key, then dropping the child key
from the filtered loop.
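
A minimal sketch of this split-and-filter transformation, assuming a
datablock is held as a dict mapping category names to lists of row
dicts. The function name, the "id" key on the looped category and the
sample values are hypothetical; the "cell" / "atom_site.cell_id"
pairing follows the example given in Step 2. Updating _audit.schema in
the new datablocks is not handled here.

```python
def split_by_set_category(block, set_cat, set_key, child_keys):
    """Split `block` into one datablock per row of the looped `set_cat`.

    child_keys maps each dependent category to the name of its child key.
    """
    new_blocks = []
    for set_row in block[set_cat]:
        new_block = {}
        for cat, rows in block.items():
            if cat == set_cat:
                # The formerly looped category becomes single-valued.
                new_block[cat] = [dict(set_row)]
            elif cat in child_keys:
                ck = child_keys[cat]
                # Keep only rows pointing at this set row, then drop the key.
                new_block[cat] = [
                    {k: v for k, v in row.items() if k != ck}
                    for row in rows
                    if row[ck] == set_row[set_key]
                ]
            else:
                # Independent categories are copied unchanged.
                new_block[cat] = [dict(row) for row in rows]
        new_blocks.append(new_block)
    return new_blocks


# Made-up example: two cells, each with its own atom_site row.
block = {
    "cell": [{"id": "lt", "length_a": 5.40}, {"id": "rt", "length_a": 5.43}],
    "atom_site": [
        {"label": "Si1", "cell_id": "lt", "fract_x": 0.0},
        {"label": "Si1", "cell_id": "rt", "fract_x": 0.125},
    ],
    "audit": [{"creation_method": "hand"}],
}
blocks = split_by_set_category(block, "cell", "id", {"atom_site": "cell_id"})
print(len(blocks))
```

For simplicity the sketch leaves the key of the formerly looped
category in place; a real converter would probably drop or ignore it.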

The _audit.schema dataname acts differently to the dictionary
versioning datanames. _audit.schema provides a concise, precise
description of the compatibility of datablock contents, and permits
machine transformation between different schemas.  In contrast, the
dictionary versioning mechanism cannot indicate whether or not a given
datablock will be incorrectly interpreted against a later or earlier
dictionary. Given that we have undertaken not to change the meanings
of datanames, it is reasonable for a programmer to assume that
datanames mean what the dictionary consulted at program creation time
says, regardless of the dictionary version stated in a datafile.

Space_group presents no legacy issues as it behaves precisely as
described here.  Furthermore, the original vision of the symmetry
dictionary authors can be safely implemented to e.g. include
transformation matrices between different cell settings in a single
datablock.

Actions
=======

Approve the updated DDLm _definition.class definition in this group,
with a note to COMCIFS.

Develop the definition of _audit.schema to link a CIF2 list to
the enumerated states rather than a text string.

Approve the _audit.schema dataname through cif_core and COMCIFS.

Write clear documentation for these enhancements and distribute
to cif-developers and on the CIF website.

Update dREL implementations to properly interpret Set category
references.

Create one or more datafiles to test software conformance.

Advertise the new dataname and actively work with authors of
popular CIF-reading software to update software.
