
[ddlm-group] Refocusing discussion on dREL use for validation

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] Refocusing discussion on dREL use for validation
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Thu, 18 Oct 2018 16:16:39 +1100
Dear DDLm group,

This email is intended as a bit of a reset on the discussion regarding my 3rd round of proposals for enhancing dREL. John has pointed out a certain lack of focus and clear vision in the previous emails, so I propose here to focus on a particular task: that of validating a data file's contents relative to information in a domain DDLm dictionary.  I restrict the meaning of 'validation' here to checking conformance to DDLm attributes, and explicitly exclude checking the sort of mathematical relationships that are currently covered by dREL methods, such as cell volume matching cell parameters.  Examples of the type of validation I wish to discuss are therefore checking that a value is drawn from the acceptable set of enumerated values, that values taken by a child data name are drawn from the set of parent data name values, or that a value falls within a specified range.
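To make the three kinds of check concrete, here is a minimal sketch in Python (not dREL). The function names are hypothetical, and treating the range bounds as inclusive is an assumption made for illustration only.

```python
from typing import Iterable, List, Optional


def in_enumerated_set(value: str, allowed: Iterable[str]) -> bool:
    """True if the value is one of the enumerated states."""
    return value in set(allowed)


def in_range(value: float,
             lower: Optional[float],
             upper: Optional[float]) -> bool:
    """True if lower <= value <= upper; None means 'no bound'.

    Inclusive bounds are an assumption of this sketch.
    """
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def children_not_in_parent(child_values: Iterable[str],
                           parent_values: Iterable[str]) -> List[str]:
    """Return the child values that do not appear among the parent values."""
    parents = set(parent_values)
    return [v for v in child_values if v not in parents]
```

Each function needs information that lives in the dictionary (the allowed states, the bounds, the parent-child link) as well as values from the data file.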

First: do we agree that such validation is useful? I think yes, as CheckCIF does check that certain data names have allowed values, but if not, then the rest of this project is pointless.

I believe that expressing these checks in a programming-language-agnostic way is important, as this would avoid us being pinned to particular environments and systems over time.  Furthermore, I think that dREL would be a good choice, as it is tightly matched to the dictionary environment, and tools that transform it to <insert your favourite language+CIF environment here> can be re-used.

So, given that we wish to use dREL, can we make it work for our simple task of checking enumerated values?  dREL as currently conceived executes in a well-defined environment. If a dREL definition is located in the definition for object 'd' in category 'c', with supplied data block 'f', that environment can be described as follows:

The following immutable bindings have been made:
(i) a single packet of category 'c' is bound to 'c'
(ii) values for all objects 'o' in 'c' are bound to 'c.o' using values from 'f', except for 'd'
(iii) all other categories are available through their names, and after a packet is specified, individual data are accessed in the same way as 'c'

In addition, dREL engines need to make use of the following semantic information from the dictionary in which the definition appears:
(i) category keys are used to identify packets in categories other than 'c'
(ii) linked items could be used to resolve key values (not yet agreed with this group)
(iii) item type and dimension are determined using type information for the relevant data name
(iv) the correspondence between a data name in the data file and category.object in the dREL method
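The bindings described above can be modelled roughly as follows. This is a sketch in Python under stated assumptions, not a description of any actual dREL engine: the function name `build_bindings` and the layout of `data_block` (category name mapped to a list of packets) are hypothetical, and key-based packet lookup and type handling are left out.

```python
from types import MappingProxyType


def build_bindings(data_block, category, target_object, packet_index=0):
    """Model the bindings seen by a method defined for category.target_object.

    data_block: {category_name: [{object_name: value, ...}, ...]}
                (hypothetical in-memory layout of data block 'f')
    """
    # A single packet of 'c' is bound to 'c' ...
    packet = dict(data_block[category][packet_index])
    # ... with every object available except 'd', the item being evaluated.
    packet.pop(target_object, None)
    bindings = {category: MappingProxyType(packet)}

    # All other categories are reachable by name; a packet must be
    # selected before individual values can be read.
    for other, packets in data_block.items():
        if other != category:
            bindings[other] = tuple(MappingProxyType(dict(p)) for p in packets)

    # Read-only views stand in for 'immutable bindings'.
    return MappingProxyType(bindings)
```

For example, with `f = {"cell": [{"length_a": 5.0, "volume": 125.0}], "atom_site": [{"label": "C1"}]}`, calling `build_bindings(f, "cell", "volume")` exposes `cell.length_a` and the `atom_site` packets but not `cell.volume`. Nothing in the result refers to the contents of the dictionary definitions themselves.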

Given this environment, we cannot write a dREL method for checking enumerated values of even a single, specific data name, because no explicit access to domain dictionary contents is exposed in the dREL method - neither through built-in functions, nor through syntactic constructs, nor through pre-existing bindings (feel free to try). Furthermore, if we wish to write a single dREL method for all enumerated value data names (which is much more economical), then we no longer even have bindings to 'c'.  Therefore, my initial proposal posited enhancing the execution environment to remove these restrictions, with the change flagged by the value of the '_method.purpose' attribute. I think this is a low-impact solution to this conundrum, but I would welcome alternative suggestions.
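To show what a single, generic enumerated-value check needs as input, here is a minimal Python sketch. It is not dREL and does not represent the syntax of my proposal; the function name, the `"states"` key and both data layouts are hypothetical.

```python
def check_enumerations(dictionary, data_block):
    """Return (category, object, bad_value) for every value outside its set.

    dictionary: {(category, object): {"states": [...]}}   (hypothetical layout)
    data_block: {category: [{object: value, ...}, ...]}   (hypothetical layout)
    """
    failures = []
    for (category, obj), definition in dictionary.items():
        states = definition.get("states")
        if states is None:
            continue  # this data name has no enumerated set
        for packet in data_block.get(category, []):
            if obj in packet and packet[obj] not in states:
                failures.append((category, obj, packet[obj]))
    return failures
```

The loop runs over dictionary definitions rather than over one fixed category, and it reads both the dictionary and the data block. Neither is possible with the bindings listed above.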

For example, one attempt you might make would be to locate our validation dREL method in the DDLm attribute dictionary definition for '_enumeration_set.state'. However, according to the above environment this just binds the values of DDLm attributes to those found in a particular definition of interest in the DDLm domain dictionary, and our data file is inaccessible from a dREL point of view and therefore can't be validated.



T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148