Re: [ddlm-group] Third and final proposal to enhance dREL

Dear DDLm group,

Please find my comments inline below.

On Monday, September 17, 2018 7:39 PM, James Hester wrote:
>
> On Tue, 18 Sep 2018 at 00:49, Bollinger, John C <John.Bollinger@stjude.org> wrote:
>> 1. I’m not sure I follow the intended purpose of the “enhance meaning of 'Validation' methods” item.  As I understand it, the proposal is to expose all the details of each item’s definition to dREL for the use of validation methods.  But the example of checking an item’s value against the allowed values of its enumerated type is something that I would expect a DDLm-based validator to do at its own initiative, without need of a dREL method being defined in the dictionary.  More generally, I consider it the role and responsibility of a DDLm-based validator to validate all the per-item and inter-item characteristics that the relevant dictionary defines via DDLm semantics.
>
> If a dictionary is viewed as a data file that provides (ontological) data conforming to the DDLm attributes, then a Validation dREL method applied to the dictionary fulfills the same function as a dREL method for validating that a data file contains data that are consistent with the domain dictionaries. Following your argument, dREL is not necessary in domain dictionaries either, because calculations are more properly the domain of 'dictionary-aware software'.


I am comfortable with considering domain dictionaries to be data files conforming to the ontology-domain formalisms defined by DDLm, but that does not imply that arguments about dictionaries and data files can be cleanly shifted between the ontological level and the domain level, or vice versa.  DDLm is distinguished from other dictionaries and data files because it provides the formal definition of its own semantics instead of relying for that on some other dictionary.  As a result, although we can use DDLm semantics to understand domain dictionaries, we cannot use them to understand DDLm itself, because that would be tautological.

And that bears directly on my point.  In order for DDLm and DDLm-based dictionaries to be useful at all, we need an entry point into ontological space: some prior and external comprehension of DDLm semantics.  That we can use such a comprehension to test the completeness and consistency of DDLm's self-expression is a bit of a sideshow: we do not need to do that, because we have taken a comprehension of DDLm for granted.  Domain dictionaries do not need to use dREL to express specific cases of the semantics that follow, according to DDLm, from their definitions, because such methods can be understood in the first place only in a context in which they are redundant with the required external comprehension of DDLm.

I have argued, furthermore, that such redundant dREL methods are not only unneeded but undesirable.  This is a more subjective consideration, and open to debate.  From my perspective, the inherent redundancy is an invitation to introduce inconsistencies with domain dictionaries, and any significant exercise of such redundancies furthermore carries unwanted costs in storage space and possibly processing resources.

I do still remain open to the possibility that dREL access to DDLm attributes of items' definitions could have some utility other than redundantly expressing DDLm semantics, but I have not yet seen or conceived any examples.


>> 2. The proposed new functions seem also to be aimed at supporting validation of DDLm-based semantics via methods expressed in data dictionaries.  Here too, I am inclined to think that the method behaviors that these are intended to support are not appropriate for expression in data dictionaries.  It ought not to be necessary, and I’m not presently seeing how it would be advantageous.
>
> The use case I'm thinking of is that these 'validation' dREL methods would appear in dictionaries full of validation data names. A validator would then evaluate each of these data names in order to check that a domain dictionary is correctly written, in the same way as CheckCIF runs through a series of checks on a data file. By expressing the conditions for validity in dREL, the specification is not bound to a particular concrete programming language or set of CIF access libraries.


I'm hearing that my understanding of the purpose of the proposed methods is accurate.  I'm not persuaded that this is a good or useful purpose.

I return to my earlier point.  In order to understand a dictionary expressed in DDLm, you need to already understand DDLm semantics.  In particular, any implementations of the proposed functions would necessarily be based on such an understanding.  Since that understanding already has to be in there somewhere, I do not see the appeal of using it to re-express itself in dREL, and especially not in the form of many specific cases instead of a small number of general ones.

Expressing DDLm-derived constraints that way is not freeing.  That the dREL is not bound to a concrete programming language or CIF library is moot, because the dREL is redundant in the first place.  You need some tool that *is* bound to programming language and libraries to process it, and such a tool needs to be able to perform the same validations without the dREL.
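
To make the "small number of general cases" point concrete, here is a minimal sketch of a validator that derives its checks directly from items' definitions, with no per-item method in the dictionary.  Everything about the representation is an assumption for illustration: the dictionary is modelled as a plain Python mapping, the item names are hypothetical, and only the `_enumeration_set.state` and `_enumeration.range` attributes are consulted.

```python
# Hypothetical in-memory form of a DDLm-based dictionary:
# data name -> attributes of that item's definition.
# The item names and values are illustrative, not taken from a real dictionary.
DICTIONARY = {
    "_example.setting": {"_enumeration_set.state": ["a", "b", "c"]},
    "_example.fraction": {"_enumeration.range": (0.0, 1.0)},
}


def validate(data, dictionary):
    """Return a list of problems found in `data`, using only the definitions."""
    problems = []
    for name, value in data.items():
        definition = dictionary.get(name)
        if definition is None:
            problems.append(f"{name}: not defined in dictionary")
            continue
        # General rule 1: a value must be one of the enumerated states, if any.
        states = definition.get("_enumeration_set.state")
        if states is not None and value not in states:
            problems.append(f"{name}: {value!r} is not an allowed state")
        # General rule 2: a value must lie within the enumeration range, if any.
        bounds = definition.get("_enumeration.range")
        if bounds is not None:
            low, high = bounds
            if not (low <= value <= high):
                problems.append(f"{name}: {value!r} is outside [{low}, {high}]")
    return problems
```

Two general rules cover every item that uses those attributes; nothing item-specific has to be written down anywhere except the definition itself.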


>> 3. Overall, I have previously understood “Validation” methods as being aimed at supporting item cross validations that cannot be expressed via DDLm attributes.  It is unclear to me why or in what circumstances it would be necessary or appropriate for such validations to depend on DDLm attributes. As far as I can see, the semantics of DDLm ought to be handled at a different level -- dictionary authors should not be responsible for providing for them.  In a strategic sense, not only do I not think we _need_ to provide for externalizing validation of DDLm semantics, I don’t think we _want_ to do that.  However, it is possible that there are good use cases that I have not considered, so I am prepared to be persuaded.
>
> I think your understanding of the current intention of 'Validation' methods is correct, because the single example of their use in current dictionaries is to check that cell parameters match the crystal system. However, as I wrote in the proposal, the same result can be achieved by defining a separate data name (e.g. '_valid.crystal_system') and using a normal 'Evaluation' method, so that use of 'Validation' appears a bit pointless.


Yes, I followed that observation in your proposal, and I agree that cross validations could be defined as you describe.  I do not take that as rendering 'Validation' methods pointless, however.  They serve at least two related purposes: to enable cross validations to be defined _without_ introducing synthetic data names, and to bind such validations directly to the items being validated.

If you want to argue that one or both of those is undesirable then let's do have that discussion.  Be aware that one of the points I already see myself raising concerns the propriety of defining items that are inappropriate for explicit use in data files.


> Note that I am not proposing that domain dictionary authors would ever need to use these 'Validation' methods. I am instead proposing that these methods would have a niche use, e.g. in a dictionary listing a series of datanames whose dREL methods validate the use of DDLm attributes. This niche use is similar to the way in which quite a few DDLm attributes and attribute values are only ever used in the DDLm attribute definition dictionary itself.  If the word 'Validation' is not appropriate, we can choose a word with less baggage, such as 'Technical'.  Whatever the name, having a list of checks that can be run over domain dictionaries in a form that allows use in any environment supported by dREL would be useful. My experiments with the Lark generator suggest to me that generating code from dREL is a lot easier than one might think.


I don't think I'm catching your vision there.  So riddle me this: what would prevent all these 'Technical' methods themselves being machine generated, whether in dREL or in some other form, from the dictionary to be validated?  (Or whose associated data files are to be validated?) And if such an external representation can be so generated, then why does it need to be externalized at all?
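
As a rough illustration of the machine-generation question, the sketch below builds one check per item mechanically from the definitions.  It is only a sketch under stated assumptions: the dictionary is a plain Python mapping, the item names are hypothetical, and the generated checks are Python callables rather than dREL text.

```python
def generate_checks(dictionary):
    """Build one membership check per item that defines enumerated states."""
    checks = {}
    for name, definition in dictionary.items():
        states = definition.get("_enumeration_set.state")
        if states is None:
            continue  # nothing to generate for this item
        allowed = frozenset(states)
        # Bind `allowed` as a default argument so each check keeps its own set.
        checks[name] = lambda value, allowed=allowed: value in allowed
    return checks


# Hypothetical dictionary content, for illustration only.
example_dictionary = {
    "_example.setting": {"_enumeration_set.state": ["a", "b", "c"]},
    "_example.colour": {"_enumeration_set.state": ["red", "blue"]},
    "_example.free_text": {},
}
example_checks = generate_checks(example_dictionary)
```

Every generated check is determined entirely by the definition it came from, which is the sense in which an externalized copy carries no information beyond the dictionary being validated.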

Also, if we're talking about validating domain dictionaries, not data files, then wouldn't the appropriate place for any dREL be the dictionaries' dictionary, i.e. DDLm itself?  And would not dREL appearing there _naturally_ have access to all the details you're proposing to expose via new functions?


> Another driver for this is the 'CheckCIF for raw data' project. I would prefer that any checks for raw data are written in dREL, to maintain independence from a particular set of libraries or language.  I would also envisage eventually rewriting CheckCIF checks in dREL to put it on a more robust footing. However, these CheckCIF-type projects only really need the proposed 'Known' built-in function, so you may wish to comment on that separately.


Those checks that are inherent in the DDLm semantics of items' definitions are *already* expressed in a form that is independent of libraries or (programming) language: DDLm!  If there are any desired checks that are not inherent in DDLm semantics but nevertheless are based on attributes of items' definitions then I would be truly interested to hear about them.  As for other checks, I haven't yet recognized a reason to think that dREL is not sufficient as-is.

Now I do see that it might be desirable to be able to write dREL methods associated with particular items but not residing (directly) in those items' dictionaries.  I also see that it might be useful to be able to associate identifiers with dREL methods, especially if they are physically separated from the associated item(s).  I don't think I like defining synthetic data items for this purpose, but we should be able to come up with an alternative if this is something worth pursuing.


Regards,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
John.Bollinger@StJude.org
(901) 595-3166 [office]
www.stjude.org

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group