[ddlm-group] dREL changes and removal of keys in cif_core

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] dREL changes and removal of keys in cif_core
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Thu, 1 Sep 2016 15:29:54 +1000
Dear DDLm group,

Below is a document following on from the hub and spoke / _audit.schema proposal. Its main effects are an expansion of the category[key].name syntax in dREL and the removal of synthetic primary keys from cif_core. Because this removes datanames, it should be agreed before the cif_core dictionary is formally approved and these draft datanames become official. For reasons which I will provide in the COMCIFS forum, the timetable for cif_core approval is becoming tight, so I would welcome timely comments on this (hopefully final) proposal. If no objections are forthcoming, I will advise the cif_core DMG and expedite the presentation of the latest cif_core dictionary to COMCIFS for final approval.

James.

Proposal to adjust key operation in dREL/DDLm
=============================================

Introduction
============

This note follows through on the consequences for dREL of additional
keys being added to loops as per the hub-spoke proposal.

The current draft cif_core dictionary is written around the
requirement that each category has a single dataname that
acts as a key. Where necessary, this dataname is synthesised
from compound keys (e.g. '_refln.hkl = [_refln.h,_refln.k,_refln.l]').
This requirement appears to have arisen to cater to the
dREL construction 'category[key].name'.

In a situation where additional datanames might join the category key,
the previous primary key no longer functions as a key, and so DDLm
and/or dREL have to be retooled to allow for this eventuality.

  
Motivating example
==================

The most common use of the category[key].name construction is searching
for atomic position by atomic label, for example the assignment to 'xf'
below:

;
     With b as geom_bond
     xc = List()
     For [label,symop] in [[b.atom_site_1,b.site_symmetry_1],
                           [b.atom_site_2,b.site_symmetry_2]] {
         xf    = SymEquiv(symop, _atom_site[label].fract_xyz)
         xc ++= _atom_sites_Cartn_transform.matrix * xf
     }
     _geom_bond.distance = Norm(xc[0] - xc[1])
;

'label' is a key value in the atom_site list, and 'symop' is a key
value in the symmetry operator list (inside the SymEquiv function, not
shown here).

Imagine now that we have switched to the 'variant' _audit.schema, in
which the atom_site list has an additional variant id tag for
e.g. different possible structural solutions.  How do we adjust or
rewrite the DDLm and dREL above to cater to this?

Proposal to Enhance dREL/DDLm
=============================

The following adjustments would require minimal changes to dREL code
when moving between different schemas, and significantly declutter the
current dictionary.
  
1.  The category[key].name construction is enhanced to become
    category[dataname_1=value1,dataname_2=value2,...].name or
    category[dataname_1=local_dataname_1,...].name, where 'dataname_n'
    is an object_id in 'category' and 'local_dataname_1' is an
    object_id in the category of the dREL method.
    'dataname_1=local_dataname_1' can become simply 'local_dataname_1',
    if an unambiguous sibling dataname is available in the referenced
    category. 'local_dataname_1' can be omitted altogether if both the
    present and referenced category have unambiguous sibling key
    datanames.  A single 'value' can be provided instead of
    'dataname=value' if there is only one key component in the
    referenced category that cannot be matched with a sibling in the
    present category.

2.  'Synthetic' primary keys are no longer needed and are removed from
    the dictionary.
     
In the above example, atom_site has only a single key, so the usage of
'label' is unambiguous and can remain as is.  In our 'variant' schema
scenario, both atom_site and geom_bond are provided with an additional
key linking to the variant 'hub' category. Again, no change in the
above dREL is required, as there is an unambiguous link between the
two variant key datanames and the only remaining element of the compound
key is the atom_site label.
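
The matching rules in point 1 can be sketched in Python. This is an
illustration only, not part of any dREL implementation: the function
name 'lookup', the 'siblings' mapping (standing in for whatever link
information DDLm would supply) and the dataname 'variant' are all
assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def lookup(rows: List[Row], key_names: List[str], siblings: Dict[str, str],
           current_row: Row, explicit: Optional[Dict[str, Any]] = None,
           bare_value: Any = None) -> Row:
    """Find the single row of a referenced category matching a compound key.

    rows        -- rows of the referenced category
    key_names   -- object_ids making up the referenced category's key
    siblings    -- referenced key object_id -> sibling object_id in the
                   category of the running method (hypothetical mapping)
    current_row -- the row the method is being evaluated for
    explicit    -- 'dataname=value' pairs given inside the brackets
    bare_value  -- a single value given without 'dataname='
    """
    wanted = dict(explicit or {})
    unmatched = []
    for name in key_names:
        if name in wanted:
            continue                      # given explicitly
        if name in siblings:
            # implicit match through an unambiguous sibling dataname
            wanted[name] = current_row[siblings[name]]
        else:
            unmatched.append(name)
    # A bare value is allowed only when exactly one key component is left
    if bare_value is not None and len(unmatched) == 1:
        wanted[unmatched.pop()] = bare_value
    if unmatched:
        raise KeyError(f"key components not specified: {unmatched}")
    matches = [r for r in rows if all(r[k] == v for k, v in wanted.items())]
    if len(matches) != 1:
        raise KeyError(f"expected one matching row, found {len(matches)}")
    return matches[0]


# 'variant' schema: atom_site and geom_bond both carry a variant key
atom_site = [
    {"variant": "a", "label": "C1", "fract_xyz": (0.1, 0.2, 0.3)},
    {"variant": "b", "label": "C1", "fract_xyz": (0.15, 0.2, 0.3)},
]
bond = {"variant": "b", "atom_site_1": "C1", "atom_site_2": "O1"}

# Equivalent of _atom_site[label] with label = b.atom_site_1
row = lookup(atom_site, ["variant", "label"], {"variant": "variant"},
             bond, bare_value=bond["atom_site_1"])
```

The call is the same shape as in the single-key schema; only the key
description handed to the resolver differs, which is why the dREL
method text itself need not change.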

Discussion
==========

The only changes to the current cif_core dictionary are to remove
unnecessary synthetic keys and to rewrite the dREL that assumed a
certain internal structure for these keys (I have done this already).
The major cost is in rewriting dREL implementations to perform the key
matching. This cost can be deferred until different _audit.schemas are
approved, as the current cif_core does not actually use synthetic keys
for lookup (although it creates them).

The current cif_core dREL approach is untidy in a situation where
different categories may depend on different subsets of the keys of a
third category (I gather this occurs in the macromolecular context).
So, for example, Category A may have a compound key composed of items
1,2,3,4,5. Category B may have a compound key of items 1,2,3 and
Category C of items 2,3,4. In the current cif_core approach, each of
A, B and C has primary key values created from the values of the
respective compound key items.  So in order for category A to index
the appropriate entry in category B or C, it must use its 'master'
primary key to find the values of the appropriate items, then use
these to construct the 'master' key of the target category, and
finally use this key to find the row in the target category.
The suggested approach allows each key equivalence to be expressed
explicitly and directly in the lookup, if it is not already
available from DDLm.

For this reason, redefinition and/or renaming of the primary key was
not considered as a viable option when adding new key components to
categories.  As well as causing an undesirable proliferation of
datanames, the above key construction step would have to be rewritten
in dREL for each _audit.schema and for every method that accessed
external categories affected by the schema.
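
The difference between the two approaches for the Category A/B/C
example can be sketched in Python as follows. The item names 'i1' to
'i5', the 'key' column holding the synthetic key, and both function
names are hypothetical and chosen only for illustration.

```python
def synthetic_lookup(a_key, a_rows, target_rows, target_items):
    """Current approach: go through synthetic 'master' keys."""
    # 1. use A's master key to find the values of the relevant items
    a_row = next(r for r in a_rows if r["key"] == a_key)
    # 2. construct the target's master key; this step depends on the
    #    internal structure of that key
    target_key = [a_row[item] for item in target_items]
    # 3. use the constructed key to find the row in the target category
    return next(r for r in target_rows if r["key"] == target_key)


def direct_lookup(a_row, target_rows, target_items):
    """Suggested approach: match each key component directly."""
    return next(r for r in target_rows
                if all(r[item] == a_row[item] for item in target_items))


a_rows = [{"key": [1, 7, 3, 9, 5],
           "i1": 1, "i2": 7, "i3": 3, "i4": 9, "i5": 5}]
b_rows = [{"key": [1, 7, 3], "i1": 1, "i2": 7, "i3": 3, "value": "b-row"},
          {"key": [2, 7, 3], "i1": 2, "i2": 7, "i3": 3, "value": "other"}]
c_rows = [{"key": [7, 3, 9], "i2": 7, "i3": 3, "i4": 9, "value": "c-row"}]

b_old = synthetic_lookup([1, 7, 3, 9, 5], a_rows, b_rows, ["i1", "i2", "i3"])
b_new = direct_lookup(a_rows[0], b_rows, ["i1", "i2", "i3"])
c_new = direct_lookup(a_rows[0], c_rows, ["i2", "i3", "i4"])
```

Both routes find the same row, but only the first needs to know how
the target's synthetic key is assembled, and so only the first would
need rewriting when a schema adds a key component.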

