Discussion List Archives

Re: [ddlm-group] Refocusing discussion on dREL use for validation

  • To: "james.r.hester@gmail.com" <james.r.hester@gmail.com>, "Group finalisingDDLm and associated dictionaries" <ddlm-group@iucr.org>
  • Subject: Re: [ddlm-group] Refocusing discussion on dREL use for validation
  • From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
  • Date: Thu, 18 Oct 2018 14:39:50 +0000
  • In-Reply-To: <CAM+dB2d0LJ6ibgCNOr1u6j1jsJonvbD9SscD9u+O6ynri0qS7w@mail.gmail.com>
  • References: <CAM+dB2d0LJ6ibgCNOr1u6j1jsJonvbD9SscD9u+O6ynri0qS7w@mail.gmail.com>
Dear DDLm group,

Please see my comments in-line below.

On Thursday, October 18, 2018 12:17 AM, James Hester wrote:

> This email is intended as a bit of a reset on the discussion regarding my 3rd round of proposals for enhancing dREL. John has pointed out a certain lack of focus and clear vision in the previous emails, so I propose here to focus on a particular task: that of validating a data file's contents relative to information in a domain DDLm dictionary.  I restrict the meaning of 'validation' here to checking conformance to DDLm attributes, and explicitly exclude checking the sort of mathematical relationships that are currently covered by dREL methods, such as cell volume matching cell parameters.  Examples of the type of validation I wish to discuss are therefore checking that a value is drawn from the acceptable set of enumerated values, or that values taken by a child data name are drawn from the set of parent data name values, or that a value falls within a specified range.
>
> First: do we agree that such validation is useful? I think yes, as CheckCIF does check that certain data names have allowed values, but if not, then the rest of this project is pointless.


Yes.  My previous pushback was primarily about where the responsibility for such validation should reside, and it reflects a bias toward minimizing changes.  It is not about the propriety or usefulness of the validations themselves, which I fully accept.
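
For concreteness, the three kinds of attribute-conformance check named in the quoted paragraph above (enumerated values, child values drawn from parent values, and range limits) can be sketched in Python as follows. This is only an illustration: the function names and argument shapes are hypothetical and do not correspond to any DDLm attribute names or existing CIF software.

```python
# Hypothetical helpers; the arguments stand in for information that a real
# validator would read from the domain dictionary and the data file.

def check_enumerated(value, allowed_states):
    """Return True if the value is one of the enumerated states."""
    return value in allowed_states


def check_child_in_parent(child_values, parent_values):
    """Return the child values that do not occur among the parent values."""
    parents = set(parent_values)
    return [v for v in child_values if v not in parents]


def check_range(value, lower=None, upper=None):
    """Return True if the value lies within the inclusive bounds.

    A bound of None means that side is unbounded.
    """
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

An empty list from `check_child_in_parent` means every child value was found among the parent values.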


> I believe that expressing these checks in a programming-language-agnostic way is important, as this would avoid us being pinned to particular environments and systems over time.  Furthermore, I think that dREL would be a good choice, as it is tightly matched to the dictionary environment and tools that transform it to <insert your favourite language+CIF environment here> can be re-used.


As I'm sure is clear by now, I do not attribute much importance to expressing such checks in dREL.  In particular, I do not consider that objective a sufficient justification for a broad suite of additions and changes.  I do agree that dREL is well matched to the dictionary environment, so it seems a reasonable choice from that perspective.

Before moving on, however, I'd like to point out that if changing dREL is on the table then there is a broad spectrum of possible approaches, including such things as requiring dREL to perform type checking automatically or adding a validateType() function.  These are not the sort of thing presented in the previous proposal, but they should not be dismissed out of hand, especially if we look at the question from the point of view of maintaining the dREL language in general, as opposed to specifically enabling it to serve the purpose we're discussing.  The more features we add, the fewer implementations we can expect, and that could easily mean that in-principle tool independence is actually single-tool dependence in practice.


> So, given that we wish to use dREL, can we make it work for our simple task of checking enumerated values?  dREL as currently conceived executes in a well-defined environment, which can be described as follows, if a dREL definition is located in the definition for object 'd' in category 'c', with supplied data block 'f':
>
> The following immutable bindings have been made:
> (i) a single packet of category 'c' is bound to 'c'
> (ii) values for all objects 'o' in 'c' are bound to 'c.o' using values from 'f', except for 'd'
> (iii) all other categories are available through their names, and after a packet is specified, individual data are accessed in the same way as 'c'
>
> In addition, dREL engines need to make use of the following semantic information from the dictionary in which the definition appears:
> (i) category keys are used to identify packets in categories other than 'c'
> (ii) linked items could be used to resolve key values (not yet agreed with this group)
> (iii) item type and dimension are determined using type information for the relevant data name
> (iv) correspondence between data name in the data file and category.object in the dREL
>
> Given this environment, we cannot write a dREL method for checking enumerated values of even a single, specific data name, because no explicit access to domain dictionary contents is exposed in the dREL method - neither through built-in functions, nor through syntactic constructs, nor through pre-existing bindings (feel free to try). Furthermore, if we wish to write a single dREL method for all enumerated value data names (which is much more economical), then we no longer even have bindings to 'c'.
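
The environment described in the quoted text can be modelled roughly as below. This is a sketch under stated assumptions, not a description of any dREL engine: the data block 'f' is represented as a plain dict of category name to list of packets, the "single packet" is arbitrarily taken to be the first one, and `build_environment` is a hypothetical name. It shows that only data-file values are bound, so the set of enumerated states, which lives in the dictionary, is not reachable from a method.

```python
from types import MappingProxyType


def build_environment(data_block, category, target_object):
    """Model the immutable bindings for a method defined on category.target_object."""
    packet = data_block[category][0]  # assumption: use the first packet of 'c'
    # Objects of 'c' are bound as c.o, except for the target object 'd'.
    c_binding = {k: v for k, v in packet.items() if k != target_object}
    env = {category: MappingProxyType(c_binding)}
    # All other categories are available through their names.
    for name, packets in data_block.items():
        if name != category:
            env[name] = tuple(MappingProxyType(p) for p in packets)
    return MappingProxyType(env)


# Data block 'f' with category 'c' (objects 'd' and 'o') and one other category.
f = {
    "c": [{"d": "x", "o": 1}],
    "other": [{"id": 7}],
}
env = build_environment(f, "c", "d")

bound_names = sorted(env)   # category names only
c_objects = dict(env["c"])  # 'd' itself is not bound
```

Nothing in `env` carries dictionary attributes such as the list of allowed states, which is the gap the quoted paragraph identifies.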


Thank you.  That is exactly how I would have liked to start the discussion, and I am pleased to be there now.


> Therefore, my initial proposal posited enhancing the execution environment to remove these restrictions, with the change flagged by the value of the '_method.purpose' attribute. I think this is a low-impact solution to this conundrum, but I would welcome alternative suggestions.


There is room to extend dREL, but do we need to do that to solve the particular narrow problem we are considering?  I think not.  The challenge revolves around the fact that we seem to want methods that access data from multiple ontological levels, and a large component of the previously proposed solution can be characterized as adding introspective capabilities to dREL to support that.  But that's not the only way to get the wanted data.

For example, one alternative would be for validation methods in the domain dictionary to be generated, with appropriate ontology-level data inserted literally, by a method residing in the DDLm dictionary.  That would take the form of an evaluation method for attribute _method.expression.  It would have multiple advantages, among them that
 (i) the information about how to encode DDLm requirements in methods would be presented in the DDLm dictionary itself.
 (ii) there would be no need for any explicit validation methods (for this purpose) in domain dictionaries or external dictionaries.  They would be generated at need.
 (iii) method implementations would automatically track dictionary changes.
 (iv) dREL can support this already, I think, or with minimal changes at most.  I am optimistic enough to think it plausible that some existing dREL implementations would support it out of the box.
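
A minimal sketch of the "generated at need" idea, written in Python rather than dREL: a generator takes the ontology-level data (the enumerated states) from a definition and inserts it literally into the text of a validation method, which is then compiled only when requested. The definition layout (`category`, `object`, `states` keys) and both function names are hypothetical, and the generated text is Python standing in for whatever dREL a real `_method.expression` evaluation method would emit.

```python
def generate_enum_check_source(category, obj, states):
    """Return method source text with the allowed states inserted literally."""
    literal = ", ".join(repr(s) for s in sorted(states))
    return (
        f"# generated check for {category}.{obj}\n"
        f"def validate(packet):\n"
        f"    return packet[{obj!r}] in [{literal}]\n"
    )


def method_at_need(definition):
    """Generate and compile the validation method for one definition on demand."""
    source = generate_enum_check_source(
        definition["category"], definition["object"], definition["states"]
    )
    namespace = {}
    exec(source, namespace)  # stand-in for handing generated text to a dREL engine
    return namespace["validate"]


definition = {"category": "c", "object": "d", "states": ["b", "a"]}
source = generate_enum_check_source("c", "d", definition["states"])
validate = method_at_need(definition)
```

Because the method text is produced from the definition each time it is needed, a change to the enumerated states is picked up without editing any stored method.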

Another alternative would be even simpler: to write a separate code generator for the wanted dREL methods, and to incorporate all the resulting methods into the domain dictionary.  One would simply re-run the validation method generator each time the dictionary is updated, as a late step in the process of issuing a new release.  That would somewhat enlarge domain dictionaries, but probably not all that much if the only validations we generate are those we are specifically discussing at the moment -- validating values of items having enumerated types.
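
The separate-generator alternative might look something like the following sketch, again in Python and under assumptions: the dictionary is modelled as a dict of data name to definition, enumerated states sit under a hypothetical `states` key, and methods are dicts with `purpose` and `expression` keys loosely mirroring `_method.purpose` and `_method.expression`. The `"Validation"` purpose value and the placeholder method text are illustrative only; a real generator would emit dREL.

```python
def enum_check_text(name, states):
    """Placeholder method text; a real generator would emit dREL here."""
    literal = ", ".join(repr(s) for s in sorted(states))
    return f"check {name} in [{literal}]"


def regenerate_validation_methods(dictionary):
    """Return a copy of the dictionary with fresh generated validation methods.

    Previously generated validation methods are dropped and rebuilt from the
    current enumerated states; methods with other purposes are kept unchanged.
    """
    updated = {}
    for name, definition in dictionary.items():
        new_def = dict(definition)
        methods = [
            m for m in definition.get("methods", [])
            if m.get("purpose") != "Validation"
        ]
        states = definition.get("states")
        if states:
            methods.append(
                {"purpose": "Validation", "expression": enum_check_text(name, states)}
            )
        new_def["methods"] = methods
        updated[name] = new_def
    return updated


dictionary = {
    "c.d": {"states": ["a", "b"],
            "methods": [{"purpose": "Evaluation", "expression": "..."}]},
    "c.o": {},
}
release_1 = regenerate_validation_methods(dictionary)

# A later dictionary revision adds a state; re-running refreshes the method.
revised = dict(release_1)
revised["c.d"] = dict(release_1["c.d"], states=["a", "b", "c"])
release_2 = regenerate_validation_methods(revised)
```

Re-running the generator on its own output leaves the dictionary unchanged, so it can safely be a routine late step of each release.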


Regards,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
John.Bollinger@StJude.org
(901) 595-3166 [office]
www.stjude.org




