[ddlm-group] Third and final proposal to enhance dREL

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] Third and final proposal to enhance dREL
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Mon, 10 Sep 2018 16:29:51 +1000
Dear DDLm group,

Please see below my final proposal for enhancements to dREL. This one is somewhat more substantial, but concerns only built-in functions and an altered execution context for an unused aspect of dREL/DDLm.  I urge you to try and pick holes or find better ways to encode validation in dREL, as these ideas are being presented in public for the first time. I will leave this on the table for a while, meanwhile implementing these built-in functions and trying them out with some Validation methods.  Given a positive outcome of any discussion here and at Github, I would plan to fold these changes into a '1.0' version of DDLm for publication on our website and inclusion in the next edition of Volume G.

This proposal may also be read as more nicely formatted text at  https://github.com/COMCIFS/dREL/blob/master/drel_changes_3.rst , and comments in the 'Issues' tab there are welcome.

James.
====

Proposed changes to dREL, part III
==================================

Introduction
------------

dREL is a machine-actionable language describing data relationships
and designed to be embedded in DDLm dictionaries. The language is
defined both explicitly in the dREL publication [1] and implicitly by
the dREL code appearing in the DDLm core CIF dictionary. Note that
the code in the core CIF dictionary significantly expands the language
presented in the paper, for example, by adding category methods.

The present proposals concern the built-in functions that are
supported by dREL. No syntax changes or enhancements are proposed.

Proposal 5: enhance meaning of 'Validation' methods
---------------------------------------------------

It is proposed that a ``_method.purpose`` of ``Validation`` will imply
that all DDLm attribute categories may appear as variables within the
associated dREL method, and that the values of these attributes are
those for the data name being validated.  Additional predefined
variables ``value`` and ``category`` are bound to the particular value
and loop being validated.

Explanation
~~~~~~~~~~~

DDLm requires that each method have an associated ``_method.purpose``.
The current DDLm attributes dictionary defines ``Evaluation``, ``Validation``
and ``Definition``.  The ``Validation`` purpose is given as

     method compares an evaluation with existing item value

This type of method appears only once in the DDLm core dictionary, to
show how the crystal system can be checked against the cell parameters.
This could equally well be performed by creating a data name whose
``Evaluation`` dREL method returns ``True`` if the conditions are met.
The present proposal therefore suggests repurposing Validation methods for
more general validation by providing them with access to all attributes
of the definition of any data name that they are used to check. This allows
the methods to confirm, for example, that a value for a data name matches
the list of allowed enumerated values.

Note that currently, all categories from the dictionary in which a
dREL method appears can appear as pre-defined variables in that dREL
method, with values obtained from an associated data block. This proposal
enhances that list with the attributes for the definition of the data name
being checked.

The ``category`` and ``value`` pre-bound variables are required in order
to represent the generic value(s) being checked.

Example
~~~~~~~

The following code finds enumerated values that are not allowed. It
would appear in the definition for a data name
``valid.bad_enumerated_value``.  The ``enumeration_set`` variable
contains the contents of the ``enumeration_set`` category in the
definition for a given data name, and ``value`` is bound by the
execution environment to the particular value being checked.  The
execution environment is responsible for collating values of this
data name for each data name in the data block being checked.::

    # Flag a value that is not listed in the enumeration
    bad = 'True'
    # Loop over enumerated states in the definition for this
    # data name
    loop e as enumeration_set {
        if (value == e.state) bad = 'False'
        }
    valid.bad_enumerated_value = bad
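
As a rough illustration of the execution context proposed above, the following Python sketch models how an environment might bind the attribute categories, ``value`` and ``category`` before running a validation method once per value. It is only a sketch under stated assumptions: ``run_validation``, ``bad_enumerated_value`` and the dictionary layout are hypothetical names chosen for this example and are not part of dREL or of any DDLm software.

```python
# Hypothetical model of the proposed Validation context; none of these
# Python names come from dREL or from existing DDLm software.

def run_validation(method, definition, category_rows, object_id):
    """Run `method` once for each value of one data name.

    definition    -- attribute categories of the data name being checked,
                     e.g. {"enumeration_set": [{"state": "cubic"}, ...]}
    category_rows -- rows (dicts) of the loop holding the data name
    object_id     -- key under which each row stores the value
    """
    results = []
    for row in category_rows:
        context = dict(definition)           # attribute categories as variables
        context["value"] = row[object_id]    # pre-bound `value`
        context["category"] = category_rows  # pre-bound `category`
        results.append(method(context))
    return results


def bad_enumerated_value(ctx):
    """Return True when `value` is absent from the definition's enumeration."""
    allowed = [e["state"] for e in ctx["enumeration_set"]]
    return ctx["value"] not in allowed


# Toy definition and data, invented for this example.
definition = {"enumeration_set": [{"state": "cubic"}, {"state": "hexagonal"}]}
rows = [{"system": "cubic"}, {"system": "octagonal"}]
flags = run_validation(bad_enumerated_value, definition, rows, "system")
print(flags)
```

In this sketch the second row is flagged because ``octagonal`` is not among the enumerated states. The method itself never names the data name it is checking, which is the point of the pre-bound variables.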

Proposal 5: Extra validation functions
--------------------------------------

It is proposed to add the following functions to the list of those
allowed in dREL methods:

Reference(name,attribute)
    Returns the value of ``attribute`` in the
    dictionary definition of ``name``.  Both ``attribute`` and
    ``name`` are either string literals or string-valued variables. Where
    the result would be a loop, an appropriate dREL category object is
    returned.

Instance(category)
    Returns an instance of category ``category`` in the data block
    provided to the dREL method.

PacketData(container,object)
    Returns specific data corresponding to ``object`` in ``container``.
    The functional equivalent of the syntax ``cat.obj``,
    where ``cat`` is the value of ``container`` and ``obj`` is the value of
    ``object``. If ``container`` is a category, the row must be
    unambiguous from context, if necessary using the resolution rules
    of the proposals in Part II.

Lookup(category,keys)
    The functional equivalent of ``cat[k1=val1,k2=val2,...]``
    where ``cat`` is the value of ``category`` and ``keys`` is the dictionary
    ``{'k1':val1,'k2':val2,...}``.

Known(name)
    Evaluates to true if a value for the object referenced by
    ``name`` can be found, and false otherwise.  Also returns false if
    ``name`` does not resolve to a ``category.object`` reference, or if
    the particular row of a multi-row category is unknown.
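
The intended behaviour of these five functions can be sketched in Python over toy data. This is an illustrative model only: the lower-case function names, the ``DICTIONARY`` and ``DATA_BLOCK`` structures, the optional ``row`` argument of ``known`` and the sample values are all assumptions made for this example, not a reference implementation.

```python
# Toy stand-ins for a DDLm dictionary and a data block (both hypothetical).
DICTIONARY = {
    "atom_site.type_symbol": {
        "_name.category_id": "atom_site",
        "_name.object_id": "type_symbol",
        "_name.linked_item_id": "atom_type.symbol",
    },
    "atom_type.symbol": {
        "_name.category_id": "atom_type",
        "_name.object_id": "symbol",
    },
}
DATA_BLOCK = {
    "atom_type": [{"symbol": "C", "number_in_cell": 4},
                  {"symbol": "O", "number_in_cell": 2}],
    "atom_site": [{"label": "C1", "type_symbol": "C"}],
}


def reference(name, attribute):
    """Value of `attribute` in the dictionary definition of `name`."""
    return DICTIONARY[name][attribute]


def instance(category):
    """Rows of `category` as found in the data block."""
    return DATA_BLOCK[category]


def packet_data(container, obj):
    """Functional form of `cat.obj` for a single row (a dict here)."""
    return container[obj]


def lookup(category, keys):
    """Functional form of cat[k1=val1,...]: the one row matching `keys`."""
    matches = [row for row in category
               if all(row.get(k) == v for k, v in keys.items())]
    if len(matches) != 1:
        raise KeyError(keys)
    return matches[0]


def known(name, row=None):
    """True if `name` resolves to category.object and a value can be found."""
    category, sep, obj = name.partition(".")
    if not sep or category not in DATA_BLOCK:
        return False
    rows = DATA_BLOCK[category]
    if row is None:
        # Without an explicit row, only a single-row category is unambiguous.
        if len(rows) != 1:
            return False
        row = rows[0]
    return obj in row


# Collect the values of the data name linked to atom_site.type_symbol.
linked = reference("atom_site.type_symbol", "_name.linked_item_id")
parent_rows = instance(reference(linked, "_name.category_id"))
symbols = [packet_data(r, reference(linked, "_name.object_id"))
           for r in parent_rows]
print(symbols)
```

Because every category and object name is passed as a string, the same code can serve any data name, which is the generality the proposal asks for.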

Explanation
~~~~~~~~~~~

A dREL method for checking conformance to requirements arising out of
DDLm attributes (for example, that a value is drawn from a list of values
of a different data name) cannot have 'hard-coded' ``<category>.<object>``
names, as the method would no longer be applicable to all data names.
The above functions are therefore required to provide access into categories
and data in a generalised way. 

Examples
~~~~~~~~

``Reference('atom_type.symbol','enumeration_set')``
    Return the contents
    of the ``enumeration_set`` loop in the definition of ``atom_type.symbol``.

``Loop i as Instance( Reference( name.linked_item_id,'_name.category_id'))``
    Loop over all rows of the category
    containing the data name held in variable
    ``name.linked_item_id``. Note that ``name.linked_item_id`` is not
    enclosed in quotes and is therefore assigned the value given in the
    definition of the data name being validated. The ``Reference`` function returns
    a string naming the category of the linked data name, and the ``Instance``
    function takes that string and returns a category object that is populated
    with the values in the data file.

Checking that a value is one of the values of the linked data name. ::

    # Check that the value is one of those of the linked data name.
    result = 'False'
    linked_object = Reference(name.linked_item_id,'_name.object_id')
    loop i as Instance(Reference(name.linked_item_id,'_name.category_id')) {
        if (PacketData(i,linked_object) == value) result = 'True'
    }
    valid.is_child_key = result

Finding and returning repeated values of a key data name as the
value of data name ``valid.not_unique``. Note the use of variable
``category`` to refer to the loop being checked. ::

    # Find key values that are not unique.
    not_unique = []
    # Compound key values seen so far
    seen = []
    # Get the object id for each key data name
    keylist = []
    Loop k as category_key {
        keylist ++= Reference(k.name,'_name.object_id') # Append
        }
    Loop c as category {
        # Build the compound key value for this row
        new_val = []
        for ko in keylist {
            new_val ++= PacketData(c,ko) # Append
            }
        if (new_val in seen) {
            not_unique ++= new_val
            }
        else {
            seen ++= new_val
            }
        }
    valid.not_unique = not_unique
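
The uniqueness check can also be sketched in Python, which makes the bookkeeping explicit: one list of key object ids and a separate list of compound key values already seen. The function name ``repeated_keys`` and the sample rows are hypothetical, and reporting each repeated key only once is a choice made for this sketch rather than something the proposal specifies.

```python
def repeated_keys(rows, key_objects):
    """Return the compound key values that occur in more than one row."""
    seen = []        # compound keys encountered so far
    not_unique = []  # compound keys encountered more than once
    for row in rows:
        # Build the compound key for this row from the key object ids.
        key = [row[obj] for obj in key_objects]
        if key in seen:
            if key not in not_unique:
                not_unique.append(key)
        else:
            seen.append(key)
    return not_unique


# Invented sample loop with a two-part key.
rows = [
    {"id": "a", "n": 1},
    {"id": "b", "n": 1},
    {"id": "a", "n": 1},
    {"id": "a", "n": 2},
]
duplicates = repeated_keys(rows, ["id", "n"])
print(duplicates)
```

Only the pair ``['a', 1]`` is reported here, since the other rows differ in at least one key component.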
    
Proposal 6: Extension of 'in' to substrings
-------------------------------------------

It is proposed that the construction ``<string1> in <string2>`` be interpreted
as a boolean statement that returns true if ``<string1>`` is a substring of
``<string2>``.

Explanation
~~~~~~~~~~~

``in`` in dREL is currently only applied to testing membership in a
List or Array.  dREL as published proposes using the ``Substr``
function to test for membership of a string in another string. This
could be more economically performed using the ``in`` keyword without
compromising the use for Lists or Arrays. This also accords with the
use of ``in`` in Python.
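
For comparison, this is how Python's ``in`` behaves for the two cases; the proposal would give dREL the same pair of meanings. The variable names and sample strings are chosen only for this illustration.

```python
system = "monoclinic"

# String operands: `in` tests for a substring.
substring_hit = "mono" in system
substring_miss = "tri" in system

# List operands: `in` tests for whole-element membership, as before.
choices = ["mono", "tri"]
member_hit = "mono" in choices
member_miss = "mon" in choices   # a substring of an element is not a member

print(substring_hit, substring_miss, member_hit, member_miss)
```

The type of the right-hand operand selects the meaning, so existing List and Array tests are unaffected.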

Proposal 7: Removal of built-in functions
-----------------------------------------

The following functions are proposed for removal from the list of
provided functions:

TopLo, TopHi (sorting low->high, high->low)
    functionality duplicated by combinations of sort() and reverse()

Substr
    functionality replaced by Proposal 6.
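
The duplication noted for the sorting functions can be illustrated with Python's analogous built-ins, ``sorted`` and ``reversed``. This is an analogy only: it does not use the dREL functions themselves, and the sample values are invented.

```python
values = [3, 1, 2]

# Low-to-high ordering comes from a plain sort.
low_to_high = sorted(values)

# High-to-low ordering is the same sort followed by a reversal.
high_to_low = list(reversed(sorted(values)))

print(low_to_high, high_to_low)
```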


[1] Spadaccini et al. (2012) *J. Chem. Inf. Model.* **52**(8), pp. 1917-1925
