Re: [ddlm-group] Proposal to update dREL, part II
- To: ddlm-group <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Proposal to update dREL, part II
- From: James Hester <jamesrhester@gmail.com>
- Date: Tue, 9 Oct 2018 17:43:32 +1100
- In-Reply-To: <DM6PR04MB455619B3F65822D08D557122E01C0@DM6PR04MB4556.namprd04.prod.outlook.com>
- References: <CAM+dB2c-rOHjEDqQcZ3+AJsoKP9JwERvV2f=B5_HMVXoM3kKMw@mail.gmail.com><DM6PR04MB455619B3F65822D08D557122E01C0@DM6PR04MB4556.namprd04.prod.outlook.com>
Dear DDLm group,
With respect to proposal 3, I agree in principle that the proposed syntax extension seems to yield an improvement, but the details are not completely clear to me. Specifically,
- May the _category.key_id be used in the expanded syntax, even if it is not named as a _category_key.name?
- More generally, which attributes are permitted to be used to index a category? Must they be among those whose names are listed in the category’s own _category_key.name attribute, or is this to become a more general facility?
- Is it necessary to specify a complete key when this syntax is used for a category with a compound natural key?
- In the proposed syntax, are the key names given as simple attribute names or as full CIF item names?
----
With respect to proposal 4, I agree with the general idea that dREL should prefer to avoid requiring method implementations to explicitly express category keys that can reliably be determined from context. How that applies here depends to some extent on proposal 3, however.
Additionally, before considering going forward with this proposal, I think we need to describe more formally the cases in which the key values can be conveyed implicitly. For example, the description remarks that “this short cut is not possible where more than one data name is linked to the same category key”, but I’m not confident that I know how to recognize all such cases programmatically.
Also relevant: are we assuming that linked items are always [components of] their categories’ keys? Does anything break under this proposal if non-key attributes are linked?
Best,
John
--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]
From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Monday, September 17, 2018 5:39 PM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] Proposal to update dREL, part II
It appears that after preparing part II I completely forgot to send it to the group. The marked-up version of this second proposal is available at https://github.com/COMCIFS/dREL/blob/master/drel_changes_2.rst
Proposed changes to dREL, part II
=================================
Introduction
------------
dREL is a machine-actionable language describing data relationships
and designed to be embedded in DDLm dictionaries. The language is
defined both explicitly in the dREL publication [1] and implicitly by
the dREL code appearing in the DDLm core CIF dictionary. Note that
the code in the core CIF dictionary significantly expands the language
presented in the paper, for example, by adding category methods.
The present changes were foreshadowed in the discussion about allowing
set methods to become looped [2]. They are aimed at removing the
current dREL-imposed requirement that all categories must have a
single data name that acts as a key.
Proposal 3: compound key specification
--------------------------------------
dREL as published permits a particular row in a loop to be specified
by providing the value of the key for that loop using the syntax
``<category>[keyvalue]``, so for example, ``atom_site['O1']`` would be the
row in the atom_site loop for which ``_atom_site.label`` (the key data
name for category ``atom_site``) is 'O1'. We propose expanding
this syntax to allow multiple key values to be specified:
``<category>[name1=value1,name2=value2]`` would specify the row of
``<category>`` for which category objects ``name1`` and ``name2`` take
values of ``value1`` and ``value2`` respectively.
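
As a rough illustration of the intended semantics (not the reference transformation), the lookup can be sketched in Python. The list-of-dicts representation of a loop, the helper ``select_row`` and the sample values are all assumptions made for this example; only the names ``name1``, ``name2`` and ``d3`` echo the proposal text.

```python
def select_row(rows, **key_values):
    """Return the one row whose named objects equal the given values.

    `rows` is a list of dicts, one per loop row. This in-memory layout is
    an assumption of the sketch, not something dREL prescribes.
    """
    matches = [
        row for row in rows
        if all(row.get(name) == value for name, value in key_values.items())
    ]
    if len(matches) != 1:
        raise KeyError(f"{key_values} matches {len(matches)} rows, expected 1")
    return matches[0]


# Hypothetical two-key category; the values are made up.
category = [
    {"name1": "x", "name2": 1, "d3": 10.0},
    {"name1": "x", "name2": 2, "d3": 20.0},
    {"name1": "y", "name2": 1, "d3": 30.0},
]

# Rough equivalent of the dREL expression category[name1='x',name2=2].d3
row = select_row(category, name1="x", name2=2)
print(row["d3"])  # 20.0
```

Raising an error when the supplied names do not select exactly one row is a design choice of this sketch; the proposal does not say how an incomplete key should be handled.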
Explanation
~~~~~~~~~~~
The current core CIF dictionary treats multi-key categories by
defining a synthetic data name for each such category. These synthetic
data names are currently just a list of the values of the multiple
keys. Having such single-dataname keys allows the dREL syntax to
be unambiguous for all Loop categories.
This approach is suboptimal because:
(1) The synthetic data names have no scientific relevance
(2) A considerable amount of DDLm machinery has been developed simply
because of the resulting inhomogeneous lists. Without
these synthetic data names, there would be *no* need in the current
core dictionary for ragged and nested dimensions and multiple
data types within a single list, and therefore no requirement
for DDLm and dREL implementors to cope with such structures.
(3) dREL methods wishing to index into a multi-key category have to
construct the synthetic keys from the individual values; the new
syntax would save that line of boilerplate
(4) If a set category becomes looped, a number of looped categories
will acquire a new key data name. If single-key loops remain a
dREL requirement, previously single-key loops will require a new,
synthetic data name to be created. Note that it could be argued
that this is the way the system was designed to work.
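
Points (2) and (3) can be made concrete with a small Python sketch. The synthetic object name ``id``, the helper names and the data are hypothetical; the sketch only contrasts the two styles of indexing and is not taken from any dictionary.

```python
# Each row carries the two real keys plus a synthetic key "id" that merely
# lists their values. Note that the synthetic value mixes a string and an
# integer in a single list, the kind of inhomogeneous list point (2) refers to.
rows = [
    {"name1": "x", "name2": 1, "id": ["x", 1], "d3": 10.0},
    {"name1": "x", "name2": 2, "id": ["x", 2], "d3": 20.0},
]


def by_synthetic_key(rows, key_value):
    """Current approach: match against the synthetic list-valued key."""
    return next(row for row in rows if row["id"] == key_value)


def by_named_keys(rows, **key_values):
    """Proposed approach: match each real key by name."""
    return next(
        row for row in rows
        if all(row[name] == value for name, value in key_values.items())
    )


label, number = "x", 2

# Current style: the method must first assemble the synthetic key.
synthetic = [label, number]
old_style = by_synthetic_key(rows, synthetic)

# Proposed style: the individual values are used directly.
new_style = by_named_keys(rows, name1=label, name2=number)

assert old_style is new_style
```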
The previous syntax will still be acceptable in those situations where
there is a single key, or where the values of the remaining keys are
unambiguous in context (see next proposal).
This proposed syntax has been included in the example EBNF for dREL,
and the transformation to Python code implements the proposed semantics.
Proposal 4: elide keys where they are clear from context
--------------------------------------------------------
If category A contains data names which are parents or children of key
data names in category B, dREL methods in category A do not need to
explicitly specify the key values of category B when accessing rows of
category B.
Explanation
~~~~~~~~~~~
If ``b.k1`` and ``b.k2`` are the keys of category B, and data names ``a.a1``
and ``a.a2`` of category A are linked through ``_name.linked_item_id`` DDLm
declarations to those keys respectively, then any dREL method in category A
can simply write ``b.d3`` to access a specific value of data name ``d3`` in
category B. This is equivalent to writing ``b[k1=a.a1,k2=a.a2].d3`` under
proposal 3.
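
One way to picture this resolution is the following Python sketch. The ``links_in_a`` mapping stands in for the ``_name.linked_item_id`` declarations, and the helper names, row layout and values are assumptions of the example rather than part of the proposal.

```python
def keys_from_context(links, target_category, current_row):
    """Collect key values for `target_category` from the current row.

    `links` maps a data name of the current category to the
    (category, key object) pair it is linked to.
    """
    return {
        key_object: current_row[local_name]
        for local_name, (category, key_object) in links.items()
        if category == target_category
    }


def select_row(rows, key_values):
    """Return the single row matching every name/value pair."""
    matches = [
        row for row in rows
        if all(row[name] == value for name, value in key_values.items())
    ]
    if len(matches) != 1:
        raise KeyError(f"{key_values} matches {len(matches)} rows, expected 1")
    return matches[0]


# Hypothetical link declarations for category a: a1 -> b.k1, a2 -> b.k2.
links_in_a = {"a1": ("b", "k1"), "a2": ("b", "k2")}

b_rows = [
    {"k1": "x", "k2": 1, "d3": 10.0},
    {"k1": "x", "k2": 2, "d3": 20.0},
]

# The row of category a for which the method is currently being evaluated.
current_a_row = {"a1": "x", "a2": 2}

# Writing plain b.d3 inside a method of category a would then behave like
# b[k1=a.a1,k2=a.a2].d3
implied = keys_from_context(links_in_a, "b", current_a_row)
value = select_row(b_rows, implied)["d3"]
print(value)  # 20.0
```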
Note that this short cut is not possible where more than one data name
is linked to the same category key; for example, in ``geom_bond``
two data names are linked to ``atom_site.label``.
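
A possible programmatic check for this situation is sketched below in Python. It assumes the link declarations of a category have already been gathered into a mapping; the helper ``ambiguous_keys`` is hypothetical, and the ``geom_bond`` object names are given for illustration and should be checked against the dictionary. The check covers only the case described above and is not claimed to be exhaustive.

```python
from collections import defaultdict


def ambiguous_keys(links, target_category):
    """Return the keys of `target_category` that several local names link to.

    `links` maps a data name of the current category to the
    (category, key object) pair it is linked to. An empty result means the
    short cut is usable as far as this particular check is concerned.
    """
    by_key = defaultdict(list)
    for local_name, (category, key_object) in links.items():
        if category == target_category:
            by_key[key_object].append(local_name)
    return {
        key_object: sorted(names)
        for key_object, names in by_key.items()
        if len(names) > 1
    }


# Illustrative links for geom_bond: two data names point at atom_site.label.
geom_bond_links = {
    "atom_site_label_1": ("atom_site", "label"),
    "atom_site_label_2": ("atom_site", "label"),
}

print(ambiguous_keys(geom_bond_links, "atom_site"))
# {'label': ['atom_site_label_1', 'atom_site_label_2']}
```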
Note that partial resolution is also possible: where an explicit
``[name=value]`` reference supplies only some of the keys, the missing
key values may be resolved using linked data names.
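
Under one reading of partial resolution, explicitly supplied key values are kept and any remaining keys are filled in from linked data names. The Python sketch below follows that reading; the function ``complete_keys``, the simplified ``links`` mapping (local name to key object of the target category) and the values are assumptions of the example.

```python
def complete_keys(key_names, explicit, links, current_row):
    """Fill in keys missing from `explicit` using linked data names.

    `key_names` lists the key objects of the target category, `explicit`
    holds the values written in the [name=value] reference, and `links`
    maps a data name of the current category to the key object it is
    linked to.
    """
    resolved = dict(explicit)
    for key in key_names:
        if key in resolved:
            continue  # an explicitly written value always wins
        candidates = [local for local, target in links.items() if target == key]
        if len(candidates) != 1:
            raise KeyError(
                f"key {key!r} cannot be resolved: {len(candidates)} linked names"
            )
        resolved[key] = current_row[candidates[0]]
    return resolved


# Hypothetical case: b has keys k1 and k2; the method writes b[k1='y'] and
# leaves k2 to be taken from the linked data name a2 of the current row.
keys = complete_keys(
    key_names=["k1", "k2"],
    explicit={"k1": "y"},
    links={"a2": "k2"},
    current_row={"a2": 2},
)
print(keys)  # {'k1': 'y', 'k2': 2}
```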
Discussion
----------
The net result of the above two proposals is to make looping Set
categories relatively painless. A dREL reference like ``cell.vector_a``
may remain untouched when multiple cells are present, as long as the
category within which the dREL method appears has only a single
data name that is a child of the single key data name of ``cell``.
However, in situations where the ``<category>[value]`` syntax has
been used and ``<category>`` acquires a new key data name because
some other category has become looped, dREL methods will need
to be rewritten to explicitly specify the key data name that
``value`` corresponds to. Going forward, the ``[key=value]``
syntax should be preferred to minimise the need to rewrite
methods in advanced looping applications.
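
The rewriting issue can be illustrated with a short Python sketch of a bare ``<category>[value]`` lookup. The helper and the extra key name ``structure_id`` are hypothetical, and the sketch deliberately ignores the case where the remaining keys are resolved from context.

```python
def select_by_value(rows, key_names, value):
    """Bare [value] lookup: only meaningful when there is exactly one key."""
    if len(key_names) != 1:
        raise ValueError(
            f"bare [value] is ambiguous: category has keys {key_names}"
        )
    (key_name,) = key_names
    matches = [row for row in rows if row[key_name] == value]
    if len(matches) != 1:
        raise KeyError(f"{value!r} matches {len(matches)} rows, expected 1")
    return matches[0]


# Before looping: atom_site has the single key 'label', so atom_site['O1'] works.
rows_before = [
    {"label": "O1", "x": 0.1},
    {"label": "C1", "x": 0.2},
]
found = select_by_value(rows_before, ["label"], "O1")

# After some other category becomes looped, atom_site gains a second key
# (here the made-up 'structure_id'), and the bare form no longer says which
# key the value belongs to.
rows_after = [
    {"structure_id": "s1", "label": "O1", "x": 0.1},
    {"structure_id": "s2", "label": "O1", "x": 0.3},
]
try:
    select_by_value(rows_after, ["structure_id", "label"], "O1")
    needs_rewrite = False
except ValueError:
    needs_rewrite = True
```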
We should also be aware that the dREL methods in our dictionaries are
curated, and therefore we can apply style guidelines to prefer the
explicit notation of proposal 3 as we see fit.
[1] Spadaccini et al. (2012) *J. Chem. Inf. Model.* **52**(8), pp. 1917-1925
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group