Re: [ddlm-group] Third and final proposal to enhance dREL

  • To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
  • Subject: Re: [ddlm-group] Third and final proposal to enhance dREL
  • From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
  • Date: Mon, 17 Sep 2018 14:49:14 +0000
  • In-Reply-To: <CAM+dB2dWmcO6xGph9a6QvpdMHN+EGHS3B_btU6sLp-E_Esa-oA@mail.gmail.com>
  • References: <CAM+dB2dWmcO6xGph9a6QvpdMHN+EGHS3B_btU6sLp-E_Esa-oA@mail.gmail.com>

Dear James and DDLm group,

 

First, I note that although I received the e-mail “Proposal to update dREL, part I” and, obviously, the “Third and final proposal to enhance dREL”, I do not recall, nor can I find any record of, a second (part II) proposal, which presumably would have contained the items numbered 3 and 4.

 

With regard to this part III proposal, however:

 

1. I’m not sure I follow the intended purpose of the “enhance meaning of 'Validation' methods” item.  As I understand it, the proposal is to expose all the details of each item’s definition to dREL for the use of validation methods.  But the example of checking an item’s value against the allowed values of its enumerated type is something that I would expect a DDLm-based validator to do on its own initiative, without any need for a dREL method to be defined in the dictionary.  More generally, I consider it the role and responsibility of a DDLm-based validator to validate all the per-item and inter-item characteristics that the relevant dictionary defines via DDLm semantics.

 

2. The proposed new functions seem also to be aimed at supporting validation of DDLm-based semantics via methods expressed in data dictionaries.  Here too, I am inclined to think that the method behaviors that these are intended to support are not appropriate for expression in data dictionaries.  It ought not to be necessary, and I’m not presently seeing how it would be advantageous.

 

3. Overall, I have previously understood “Validation” methods as being aimed at supporting item cross validations that cannot be expressed via DDLm attributes.  It is unclear to me why or in what circumstances it would be necessary or appropriate for such validations to depend on DDLm attributes. As far as I can see, the semantics of DDLm ought to be handled at a different level -- dictionary authors should not be responsible for providing for them.  In a strategic sense, not only do I not think we _need_ to provide for externalizing validation of DDLm semantics, I don’t think we _want_ to do that.  However, it is possible that there are good use cases that I have not considered, so I am prepared to be persuaded.

 

 

John

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Monday, September 10, 2018 1:30 AM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] Third and final proposal to enhance dREL

 

Dear DDLm group,

 

Please see below my final proposal for enhancements to dREL. This one is somewhat more substantial, but concerns only built-in functions and an altered execution context for an unused aspect of dREL/DDLm.  I urge you to try and pick holes or find better ways to encode validation in dREL, as these ideas are being presented in public for the first time. I will leave this on the table for a while, meanwhile implementing these built-in functions and trying them out with some Validation methods.  Given a positive outcome of any discussion here and at Github, I would plan to fold these changes into a '1.0' version of DDLm for publication on our website and inclusion in the next edition of Volume G.

 

This proposal may also be read as more nicely formatted text at https://github.com/COMCIFS/dREL/blob/master/drel_changes_3.rst , and comments in the 'Issues' tab there are welcome.

 

James.

====

 

Proposed changes to dREL, part III

==================================

 

Introduction

------------

 

dREL is a machine-actionable language describing data relationships

and designed to be embedded in DDLm dictionaries. The language is

defined both explicitly in the dREL publication [1] and implicitly by

the dREL code appearing in the DDLm core CIF dictionary. Note that

the code in the core CIF dictionary significantly expands the language

presented in the paper, for example, by adding category methods.

 

The present proposals concern the built-in functions that are

supported by dREL. No syntax changes or enhancements are proposed.

 

Proposal 5: enhance meaning of 'Validation' methods

---------------------------------------------------

 

It is proposed that a ``_method.purpose`` of ``Validation`` will imply

that all DDLm attribute categories may appear as variables within the

associated dREL method, and that the values of these attributes are

those for the data name being validated.  Additional predefined

variables ``value`` and ``category`` are bound to the particular value

and loop being validated.

 

Explanation

~~~~~~~~~~~

 

DDLm requires that each method have an associated ``_method.purpose``.

The current DDLm attributes dictionary defines ``Evaluation``, ``Validation``

and ``Definition``.  The ``Validation`` purpose is given as

 

     method compares an evaluation with existing item value

 

This type of method appears only once in the DDLm core dictionary, to

show how the crystal system can be checked against the cell parameters.

This could equally well be performed by creating a data name whose

``Evaluation`` dREL method returns ``True`` if the conditions are met.

The present proposal therefore suggests repurposing Validation methods for

more general validation by providing them with access to all attributes

of the definition of any data name that they are used to check. This allows

the methods to confirm, for example, that a value for a data name matches

the list of allowed enumerated values.

 

Note that currently, all categories from the dictionary in which a

dREL method appears can appear as pre-defined variables in that dREL

method, with values obtained from an associated data block. This proposal

enhances that list with the attributes for the definition of the data name

being checked.

 

The ``category`` and ``value`` pre-bound variables are required in order

to represent the generic value(s) being checked.

 

Example

~~~~~~~

 

The following code finds enumerated values that are not allowed. It

would appear in the definition for a data name

``valid.bad_enumerated_values``.  The ``enumeration_set`` variable

contains the contents of the ``enumeration_set`` category in the

definition for a given data name, and ``value`` is bound by the

execution environment to the particular value being checked.  The

execution environment is responsible for collating values of this

data name for each data name in the data block being checked. ::

 

    # Check that a value is listed in an enumeration
    found = 'False'
    # Loop over the enumerated states in the definition for this
    # data name
    loop e as enumeration_set {
        if (value == e.state) found = 'True'
    }
    # Note: as written, the result is 'True' when the value IS listed
    valid.bad_enumerated_values = found
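
For readers less familiar with dREL, the same check can be sketched in Python. This is only an illustration of the intended logic, not part of the proposal: the ``enumeration_set`` rows, the example states, and the helper names ``is_enumerated`` and ``bad_enumerated_values`` are all assumptions made for the sketch.

```python
# Hypothetical stand-in for the ``enumeration_set`` loop of one definition.
# Each row has a ``state`` entry, mirroring ``e.state`` in the dREL example.
enumeration_set = [
    {"state": "triclinic"},
    {"state": "monoclinic"},
    {"state": "cubic"},
]


def is_enumerated(value, enumeration_set):
    """Return True if ``value`` equals the state of any enumeration row."""
    return any(value == row["state"] for row in enumeration_set)


def bad_enumerated_values(values, enumeration_set):
    """Collect, in order, the values that are NOT listed in the enumeration."""
    return [v for v in values if not is_enumerated(v, enumeration_set)]
```

Here the execution environment's job of collating values is played by the caller, which passes every value of the data name being checked to ``bad_enumerated_values``.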

 

Proposal 5: Extra validation functions

--------------------------------------

 

It is proposed to add the following functions to the list of those

allowed in dREL methods:

 

Reference(name,attribute)

    The value of ``attribute`` in the

    dictionary definition of ``name`` is returned.  Both ``attribute`` and

    ``name`` are either string literals or string-valued variables. Where

    the result would be a loop, an appropriate dREL category object would

    be returned.

 

Instance(category)

    Returns an instance of category ``category`` in the data block

    provided to the dREL method.

 

PacketData(container,object)

    Returns specific data corresponding to ``object`` in ``container``.

    The functional equivalent of the syntax ``cat.obj``,

    where ``cat`` is the value of ``container`` and ``obj`` is the value of

    ``object``. If ``container`` is a category, the row must be

    unambiguous from context, if necessary using the resolution rules

    of the proposals in Part II.

 

Lookup(category,keys)

    The functional equivalent of ``cat[k1=val1,k2=val2,...]``

    where ``cat`` is the value of ``category`` and ``keys`` is the dictionary

    ``{'k1':val1,'k2':val2,...}``.

 

Known(name)

    Evaluates to true if a value for the object referenced by

    ``name`` can be found, false otherwise.  If ``name`` does not resolve

    to a ``category.object`` reference, or the particular row of a

    multi-row category is unknown, the function returns false.

 

Explanation

~~~~~~~~~~~

 

A dREL method for checking conformance to requirements arising out of

DDLm attributes (for example, that a value is drawn from a list of values

of a different data name) cannot have 'hard-coded' ``<category>.<object>``

names, as the method would no longer be applicable to all data names.

The above functions are therefore required to provide access into categories

and data in a generalised way. 
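
One way to read the five function descriptions above is as operations on two lookup tables: the dictionary definitions and the data block. The Python sketch below models that reading under loud assumptions: ``DEFINITIONS`` and ``DATA_BLOCK`` are invented toy data, the lower-case function names are hypothetical stand-ins for the proposed built-ins, ``lookup`` takes a category name, ``packet_data`` handles only a single row, and ``known`` has no row context, so it returns false for any multi-row category.

```python
# Toy dictionary: data name -> {attribute: value}. Every entry is invented
# for illustration and is not copied from a real DDLm dictionary.
DEFINITIONS = {
    "atom_site.type_symbol": {
        "_name.category_id": "atom_site",
        "_name.object_id": "type_symbol",
        "_name.linked_item_id": "atom_type.symbol",
    },
    "atom_type.symbol": {
        "_name.category_id": "atom_type",
        "_name.object_id": "symbol",
    },
}

# Toy data block: category name -> list of rows (packets).
DATA_BLOCK = {
    "cell": [{"length_a": 5.4}],
    "atom_type": [{"symbol": "C"}, {"symbol": "O"}],
    "atom_site": [
        {"label": "C1", "type_symbol": "C"},
        {"label": "O1", "type_symbol": "O"},
    ],
}


def reference(name, attribute):
    """Value of ``attribute`` in the dictionary definition of ``name``."""
    return DEFINITIONS[name][attribute]


def instance(category):
    """Rows of ``category`` in the data block (an empty list if absent)."""
    return DATA_BLOCK.get(category, [])


def packet_data(container, obj):
    """Equivalent of ``cat.obj`` for one row (packet) of a category."""
    return container[obj]


def lookup(category, keys):
    """Equivalent of ``cat[k1=val1,...]``: the unique row matching ``keys``."""
    matches = [
        row for row in instance(category)
        if all(row.get(k) == v for k, v in keys.items())
    ]
    if len(matches) != 1:
        raise KeyError(f"{category}: expected one row for {keys}, found {len(matches)}")
    return matches[0]


def known(name):
    """True only if ``name`` is ``category.object`` and its row is unambiguous."""
    category, sep, obj = name.partition(".")
    if not sep:
        return False  # does not resolve to a category.object reference
    rows = instance(category)
    return len(rows) == 1 and obj in rows[0]
```

For example, ``lookup("atom_site", {"label": "O1"})`` returns the second ``atom_site`` row, and ``known("atom_site.label")`` is false because no particular row has been selected.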

 

Examples

~~~~~~~~

 

``Reference('atom_type.symbol','enumeration_set')``

    Return the contents

    of the ``enumeration_set`` loop in the definition of ``atom_type.symbol``.

 

``Loop i as Instance(Reference(name.linked_item_id,'_name.category_id'))``

    Loop over all rows of the category

    containing the data name contained in variable

    ``name.linked_item_id``. Note that ``name.linked_item_id`` is not

    contained in quotes and therefore will be assigned the value given in the

    definition of the data name being validated. The ``Reference`` function returns

    a string naming the category of the linked data name, and the ``Instance``

    function takes that string and returns a category object that is populated

    with the values in the data file.

 

Finding values that are not child values. ::

 

    # Find values that are not those of the linked data name.
    result = 'False'
    linked_object = Reference(name.linked_item_id,'_name.object_id')
    loop i as Instance(Reference(name.linked_item_id,'_name.category_id')) {
        if (PacketData(i,linked_object) == value) result = 'True'
    }
    valid.is_child_key = result
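
The linked-item check can also be sketched in Python to make the data flow explicit: look up the linked data name's category and object in the dictionary, then scan that category in the data block. The ``definitions`` and ``data_block`` tables and the function name ``is_child_value`` are assumptions made for this sketch only.

```python
# Invented toy inputs: one linked (parent) definition and a small data block.
definitions = {
    "atom_type.symbol": {
        "_name.category_id": "atom_type",
        "_name.object_id": "symbol",
    },
}
data_block = {
    "atom_type": [{"symbol": "C"}, {"symbol": "O"}],
}


def is_child_value(value, linked_name, definitions, data_block):
    """Return True if ``value`` occurs among the values of ``linked_name``."""
    linked_def = definitions[linked_name]
    category = linked_def["_name.category_id"]  # where the linked item lives
    obj = linked_def["_name.object_id"]         # which column to compare
    # Scan every row of the linked category for a matching value.
    return any(row.get(obj) == value for row in data_block.get(category, []))
```

A value such as ``"N"`` that is absent from the linked category makes the function return false, which is the condition a validator would report.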

 

Finding and returning repeated values of a key data name as the

value of data name ``valid.not_unique``. Note the use of variable

``category`` to refer to the loop being checked. ::

 

    # Find key values that are not unique.
    not_unique = []
    # Object ids of the key data names
    keylist = []
    # Key values already encountered
    seen = []
    # Get the object id for each key data name
    Loop k as category_key {
        keylist ++= Reference(k.name,'_name.object_id') # Append
    }
    Loop c as category {
        new_val = []
        for ko in keylist {
            new_val ++= PacketData(c,ko) # Append
        }
        if (new_val in seen) {
            not_unique ++= new_val
        }
        else {
            seen ++= new_val
        }
    }
    valid.not_unique = not_unique
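
The uniqueness check reduces to remembering each compound key value as rows are visited and reporting any that recur. A minimal Python sketch of that idea follows; the function name ``repeated_keys`` and the row layout (one dict per row) are assumptions for illustration, and the list of key values already seen is kept separate from the list of key object ids.

```python
def repeated_keys(rows, key_objects):
    """Return the compound key values that occur more than once in ``rows``.

    ``rows`` is a list of dicts (one per row of the loop being checked) and
    ``key_objects`` lists the object ids that together form the key.
    """
    seen = []        # compound keys encountered so far
    not_unique = []  # compound keys found a second (or later) time
    for row in rows:
        key = [row[obj] for obj in key_objects]
        if key in seen:
            not_unique.append(key)
        else:
            seen.append(key)
    return not_unique
```

With rows keyed on ``id`` and ``n``, a third row that repeats the first row's key is reported once, as ``[["a", 1]]``.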

    

Proposal 6: Extension of 'in' to substrings

-------------------------------------------

 

It is proposed that the construction ``<string1> in <string2>`` be interpreted

as a boolean statement that returns true if ``<string1>`` is a substring of

``<string2>``.

 

Explanation

~~~~~~~~~~~

 

``in`` in dREL is currently only applied to testing membership in a

List or Array.  dREL as published proposes using the ``Substr``

function to test for membership of a string in another string. This

could be more economically performed using the ``in`` keyword without

compromising the use for Lists or Arrays. This also accords with the

use of ``in`` in Python.
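
Since the proposal appeals to Python's behaviour, a short Python sketch shows the two meanings side by side. The helper names below are hypothetical, and the sketch says nothing about how a dREL implementation would dispatch on operand type.

```python
def is_substring(needle, haystack):
    """String ``in`` string: true if ``needle`` occurs anywhere in ``haystack``."""
    return needle in haystack


def is_member(item, collection):
    """Item ``in`` list: true only if ``item`` equals a whole element."""
    return item in collection


# Substring test on a string.
mono_in_string = is_substring("mono", "monoclinic")          # True
# Membership test on a list: "mono" is not itself an element.
mono_in_list = is_member("mono", ["monoclinic", "cubic"])    # False
```

The second result illustrates why list membership is not compromised: against a list, ``in`` still compares whole elements rather than searching inside them.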

 

Proposal 7: Removal of built-in functions

-----------------------------------------

 

The following functions are proposed for removal from the list of

provided functions:

 

TopLo, TopHi (sorting low->high, high->low)

    functionality duplicated by combinations of sort() and reverse()

 

Substr

    functionality replaced by Proposal 6.
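
Assuming TopLo and TopHi do no more than order a list as the parenthetical above describes, the duplication can be illustrated with Python's analogous built-ins. The variable names are neutral on purpose, because the text does not spell out the exact semantics of either function.

```python
values = [3, 1, 2]

# Low -> high ordering: a plain sort.
low_to_high = sorted(values)

# High -> low ordering: the same sort followed by a reversal.
high_to_low = list(reversed(sorted(values)))
```

Both orderings come from one sort plus an optional reversal, which is the sense in which dedicated functions would be redundant.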

 

 

[1] Spadaccini et al. (2012) *J. Chem. Inf. Model.* **52**(8), pp. 1917-1925

 



