[ddlm-group] Second proposal to allow looping of 'Set' categories

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] Second proposal to allow looping of 'Set' categories
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 8 Jun 2016 13:59:08 +1000

The previous proposal
(http://www.iucr.org/__data/iucr/lists/ddlm-group/msg01428.html) was
deemed inadequate (see discussion in that thread). The two key issues are

(i) current software must not misinterpret files produced
according to any new semantic principles.

(ii) we wish to minimise the number of datanames that software must
potentially check when searching for a particular item of information,
as each new dataname is a required update to all CIF reading software
that processes aliases of that dataname (unfortunately the vast bulk
of software does not use the latest version of the dictionary to find

Please carefully consider and improve the following proposal:

Proposal #2 for allowing loopable 'Set' categories

Step 1

(a) A new dataname '_audit.schema' (or similar) is defined, and all
CIF reading software is expected, after a transitional period, to
check its value; if missing, the value defaults to 'Structural',
corresponding to all current CIF1 datafiles.  Here is a sketchy

_definition.id    '_audit.schema'
     This dataname identifies the type of information contained in the datablock. Each
     possible value of this dataname is a list of 'Set' categories that may have more than a
     single value for each dataname in that category (that is, may have more than one row in
     the category loop).
    'Structural'          [ ]
    'Space group tables'  [ space_group ]
_enumeration.default      'Structural'
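
How reading software might act on this dataname can be sketched roughly as follows. This is an illustration only: the lookup table mirrors the two enumerated states above, while the function name and the modelling of a datablock as a plain Python dict are assumptions, not part of any existing CIF library.

```python
# Hypothetical sketch: which 'Set' categories may be looped in a datablock.
# The table mirrors the enumeration sketched above; a datablock is modelled
# as a plain dict from dataname to value (an assumption for illustration).
SCHEMA_LOOPABLE_SETS = {
    "Structural": frozenset(),
    "Space group tables": frozenset({"space_group"}),
}
DEFAULT_SCHEMA = "Structural"  # applies when _audit.schema is absent


def loopable_set_categories(datablock):
    """Return the Set categories allowed to have more than one row."""
    schema = datablock.get("_audit.schema", DEFAULT_SCHEMA)
    if schema not in SCHEMA_LOOPABLE_SETS:
        # Unknown schema: the software cannot interpret the block safely.
        raise ValueError(f"unrecognised _audit.schema value: {schema!r}")
    return SCHEMA_LOOPABLE_SETS[schema]
```

Rejecting unrecognised values, rather than falling back to the default, is one possible design choice; it reflects issue (i) above, that software must not misinterpret files written under a schema it does not know.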

(b) The 'Set' _definition.class attribute is updated to read as follows ("magic keys"):

;                 Datanames from a Set category usually appear as part of a key-value
                  pair or in a single-row loop, in which case instance files may
                  omit datanames that are linked to the Set category's key (if
                  such a key is defined).

Step 2

Approval of new values for _audit.schema should consider the
possible impact on the community in light of adoption rates of
the _audit.schema dataname and number of categories affected by
the changes.

Whenever a new value for _audit.schema is approved, the list of
newly-looped categories is added to the above enumeration list, and:

(i) The newly-looped categories are given a category key, probably in
a separate dictionary

(ii) All looped categories that depend on any newly-looped categories
are updated to always include key dataname(s) that point to the
newly-looped categor(ies).  For example, "atom_site" would have an
"atom_site.cell_id" dataname added if cell parameters were looped.
The precise meaning of 'depends' is that, if the depended-upon
category loop has multiple rows, then the dependent category would
need to include the key dataname pointing to the depended-upon
category in order to uniquely identify a row.  Again, these extra
key datanames would appear in a separate dictionary.
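
The meaning of 'depends' given above can be illustrated with a small uniqueness check. This is a sketch under assumptions: rows are modelled as Python dicts, the helper name is hypothetical, and the "a"/"b" cell identifiers and coordinate values are invented purely for illustration.

```python
def key_is_unique(rows, key_names):
    """True if the values of key_names identify each row uniquely."""
    seen = set()
    for row in rows:
        key = tuple(row[name] for name in key_names)
        if key in seen:
            return False
        seen.add(key)
    return True


# Illustrative atom_site loop in a datablock where 'cell' has two rows.
atom_site = [
    {"_atom_site.label": "C1", "_atom_site.cell_id": "a", "_atom_site.fract_x": 0.10},
    {"_atom_site.label": "C1", "_atom_site.cell_id": "b", "_atom_site.fract_x": 0.12},
]

# The label alone no longer identifies a row once 'cell' is looped ...
label_only = key_is_unique(atom_site, ["_atom_site.label"])
# ... but the label together with the new child key does.
label_and_cell = key_is_unique(
    atom_site, ["_atom_site.label", "_atom_site.cell_id"]
)
```

In this sense atom_site 'depends' on cell: with a multi-row cell loop, the original key fails the check and the extended key passes it.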


Effects on current standards

This proposal affects the DDLm/dREL standards only, and has no
implications for DDL2 or DDL1 dictionaries.  DDLm dictionaries will
still reproduce DDL1 behaviour, that is, all CIF1 files remain
semantically valid after application of DDLm aliases.


It is no longer possible to specify Set categories as children of
other Set categories, as this would stop the parent becoming looped.
As the Set-Set parent-child relationship had no semantic meaning
(only organisational), this has no semantic implications.  Where
looping of the parent necessarily implies looping of the child, the
parent-child relationship can remain, but in this case it would
additionally allow optional merging of the parent and child loops,
which may not be intended.


dREL item methods reference datanames from 'Set' categories directly,
in "category.object" notation. All dREL methods can be considered to
operate on the current row of the category within which they are
defined, which means that the current value of any future Set category
child key dataname is available whenever such a category.object
reference is made. If dREL is amended so that any category.object
reference uses the current value of the child key for that category,
then, assuming a default key for non-looped 'Set' categories, the
whole system works and is indeed simplified in many cases, as the
explicit "category[foreign_key].object" notation can often be dropped
where a single key dataname to that category is defined. Some
categories use more than one key to the same category (e.g. geom_bond
has two datanames for the two atoms at each end of a bond) in which
case an explicit reference would still be necessary.
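
The suggested resolution rule for a bare category.object reference might look something like the following Python sketch. It is not dREL and not taken from any dREL implementation: the function name, the dict-based row model, and the treatment of a single-row Set category as the 'default key' case are all assumptions made for illustration.

```python
def lookup_set_object(set_rows, set_key, current_row, child_key, obj):
    """Resolve a bare 'category.object' reference from within current_row."""
    if len(set_rows) == 1 and child_key not in current_row:
        # Non-looped Set category: its single row plays the default-key role.
        return set_rows[0][obj]
    wanted = current_row[child_key]
    for row in set_rows:
        if row[set_key] == wanted:
            return row[obj]
    raise KeyError(f"no row with {set_key} = {wanted!r}")


# Looped case: a method running on an atom_site row sees the matching cell.
cells = [{"id": "a", "volume": 100.0}, {"id": "b", "volume": 120.0}]
atom = {"label": "C1", "cell_id": "b"}
looped_volume = lookup_set_object(cells, "id", atom, "cell_id", "volume")  # 120.0

# Legacy case: one cell row, no child key present in the atom_site row.
legacy_volume = lookup_set_object(
    [{"volume": 100.0}], "id", {"label": "C1"}, "cell_id", "volume"
)  # 100.0
```

A category such as geom_bond, with two keys into the same category, would have no single child key to pass here, which is why the explicit notation remains necessary in that case.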

Other notes

Datafiles conforming to any of the schemas can be automatically
transformed to datafiles conforming to any of the other schemas by
splitting items that now need to be single-valued into separate
datablocks, filtering all dependent loops in each new datablock using
the corresponding value of the child key, then dropping the child key
from the filtered loop.
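
The transformation just described (split, filter, drop the child key) could be sketched as below. This is a minimal illustration under assumptions: one looped Set category and one dependent loop, both modelled as lists of Python dicts, with a hypothetical function name and invented example values.

```python
def split_by_set_key(set_rows, set_key, dependent_rows, child_key):
    """Split one looped datablock into one single-valued block per Set row."""
    blocks = []
    for set_row in set_rows:
        key_value = set_row[set_key]
        # Keep only dependent rows pointing at this Set row, minus the child key.
        filtered = [
            {name: value for name, value in row.items() if name != child_key}
            for row in dependent_rows
            if row[child_key] == key_value
        ]
        blocks.append({"set": dict(set_row), "loop": filtered})
    return blocks


cells = [{"id": "a", "volume": 100.0}, {"id": "b", "volume": 120.0}]
atoms = [
    {"label": "C1", "cell_id": "a", "fract_x": 0.10},
    {"label": "C1", "cell_id": "b", "fract_x": 0.12},
    {"label": "O1", "cell_id": "b", "fract_x": 0.40},
]
blocks = split_by_set_key(cells, "id", atoms, "cell_id")
```

Each element of `blocks` then stands for one new datablock in which the Set category has a single row and the dependent loop no longer carries the child key.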

The _audit.schema dataname acts differently to the dictionary
versioning datanames. _audit.schema provides a concise, precise
description of the compatibility of datablock contents, and permits
machine transformation between different schemas.  In contrast, the
dictionary versioning mechanism cannot indicate whether or not a given
datablock will be incorrectly interpreted against a later or earlier
dictionary, and given that we have undertaken not to change the
meanings of datanames, it is reasonable for a programmer to assume
that datanames mean what the dictionary they are referring to at
program creation time says, regardless of the dictionary version
stated in a datafile.

Space_group presents no legacy issues as it behaves precisely as
described here.  Furthermore, the original vision of the symmetry
dictionary authors can be safely implemented to e.g. include
transformation matrices between different cell settings in a single


Approve the updated DDLm _definition.class definition in this group,
with note to COMCIFS.

Develop the definition of _audit.schema to link a CIF2 list to
the enumerated states rather than a text string.

Approve the _audit.schema dataname through cif_core and COMCIFS.

Write clear documentation for these enhancements and distribute
to cif-developers and on CIF website.

Update dREL implementations to properly interpret Set category

Create one or more datafiles to test software conformance

Advertise the new dataname and actively work with authors of
popular CIF-reading software to update software.
