Re: [ddlm-group] Handling of missing and null in dREL

Dear John and DDLm group,

Thanks for the detailed comments. See my replies inline below.

On Wed, 22 Jan 2020 at 09:36, Bollinger, John C <John.Bollinger@stjude.org> wrote:

Dear James and DDLm group,

 

A Happy New Year and best wishes to you all.

 

Regarding the proposed specifications for the evaluation of dREL expressions and function calls involving null and missing values,

 

- I am supposing that the dREL behaviors described are meant to be applicable to values that have emerged as missing or null from the computation of unspecified values by applicable methods and the substitution of default values for null where that applies.  Perhaps the context will make that clear, but it’s important to account for the fact that for these particular values there is a potential to confuse the lexical value that is syntactically associated with a given data name in a given data file with the value to which that data name evaluates in a dREL method.


A very salient point. How dREL might interface with a surrounding dictionary and data file is obviously an important topic which I have covered in the draft dREL chapter. I am pondering sharing the whole draft with this group for review.

 

- Regarding "A value of missing implies that all values in the domain of the data name are possible": I suspect that that wording is chosen to preclude the possibility that a missing value could be outside the domain of the data name.  It also conveys a sense that nothing but its domain is known about the value, however, which is not necessarily true for computable quantities where some inputs to the computation are known.  I might instead say something more along the lines of "A value of missing implies that no specific value for the data name is available."


Agreed.

 

- The three-value logic described for expressions involving missing values seems sensible to me.

 

- I am uncertain at the moment what the implication is of the sort function returning missing, but I suggest that it should be specified to sort all missing values to one end (the tail end would be my choice) despite the inconsistency inherent in that.  As an alternative, a function that could be used to filter the missing values from sort()'s and other functions' inputs might satisfactorily address the same needs.


I think defining a function that drops 'missing' from an array would be preferable for consistency.  
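
A minimal Python sketch of the filtering idea, for illustration only. The names `MISSING`, `drop_missing` and `sort_values` are hypothetical and not part of dREL; `sort_values` assumes the drafted rule that sorting a list containing missing yields missing.

```python
class _Missing:
    """Hypothetical sentinel standing in for the dREL 'missing' value ('?' in CIF)."""

    def __repr__(self):
        return "missing"


MISSING = _Missing()


def drop_missing(values):
    """Return a new list with every missing element removed (hypothetical helper)."""
    return [v for v in values if v is not MISSING]


def sort_values(values):
    """Mimic the drafted 'sort': the result is missing if any element is missing."""
    if any(v is MISSING for v in values):
        return MISSING
    return sorted(values)


data = [3.2, MISSING, 1.5, 2.0]
unfiltered = sort_values(data)               # missing propagates
filtered = sort_values(drop_missing(data))   # well-defined ordering
```

With such a filter the sort itself needs no special ordering rule for missing; the method author decides explicitly whether missing elements are discarded first.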

 

- I agree with Herbert that it is necessary to have a function or operation that can be used to test whether a value is missing, and the same for testing whether a value is null.


OK, sounds good. 

 

- Semantics: "The behaviour of arithmetic operators and built-in mathematical functions is undefined for null arguments" is manifestly untrue in light of the following sentence, "These functions will therefore return missing if any arguments are null," which constitutes a definition of the behavior.  The bit about being undefined should probably just be dropped.


Fair enough.  I was just trying to explain the logic behind the behaviour, but if you think it does not need explanation I can leave it out. 

 

- Following from my first point, it should be reasonable to suppose that the only null values that dREL has to be concerned about are those that convey a sense of “not applicable”.  In view of that, I would choose altogether different semantics for expressions involving null and mathematical functions with null arguments, much more parallel to the semantics for expressions involving missing: any arithmetic, equality, or relational expression of which at least one operand is null evaluates to null, and any built-in mathematical function with at least one null argument returns null.  Note that this does not preclude an ISNULL operator or function (however spelled) that is considered to be outside all the aforementioned categories.


So I think your suggestion is that the default null behaviour is as you describe, and if this is not correct for a particular data name, then the dREL method can use ISNULL to separately specify how null values are treated. My original suggestion is equivalent to saying that null has to be treated explicitly in all cases to avoid returning 'missing'.  So I think your suggestion is good, because some subset of dataname null behaviour can be mopped up by the default behaviour, which means less code.
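
The difference between the default and an explicit override might be sketched in Python as follows. This is an illustration under stated assumptions, not dREL: `NULL`, `is_null`, `add` and `add_null_as_zero` are hypothetical names, and treating null as zero is just one example of a data-name-specific interpretation.

```python
class _Null:
    """Hypothetical sentinel standing in for the dREL 'null' value ('.' in CIF)."""

    def __repr__(self):
        return "null"


NULL = _Null()


def is_null(value):
    """Stand-in for an ISNULL test; it sits outside the propagation rules."""
    return value is NULL


def add(a, b):
    """Default behaviour as proposed: any null operand makes the result null."""
    if is_null(a) or is_null(b):
        return NULL
    return a + b


def add_null_as_zero(a, b):
    """A method overriding the default for a hypothetical data name
    where null is interpreted as 0."""
    a = 0 if is_null(a) else a
    b = 0 if is_null(b) else b
    return a + b


default_result = add(1, NULL)
override_result = add_null_as_zero(1, NULL)
```

Only the data names that need a non-default interpretation have to call the test explicitly; the rest rely on propagation.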
 

- Although it probably would follow from a sufficiently clear description of the rules for missing and null values individually, it might nevertheless be worthwhile to explicitly describe how expressions with both a missing and a null operand behave.  (Per my formulation of the results of expressions involving null, such expressions would yield null.)


OK
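
One possible reading of the mixed case, sketched in Python for illustration. The sentinels and the `combine` helper are hypothetical; the precedence of null over missing follows the parenthetical remark above and is an assumption about the eventual wording, not settled dREL behaviour.

```python
import operator


class _Sentinel:
    """Hypothetical marker type for the special values 'missing' and 'null'."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


MISSING = _Sentinel("missing")  # '?' in a CIF file
NULL = _Sentinel("null")        # '.' in a CIF file


def combine(a, b, op):
    """Apply a binary arithmetic operator with null taking precedence over missing."""
    if a is NULL or b is NULL:
        return NULL
    if a is MISSING or b is MISSING:
        return MISSING
    return op(a, b)


mixed = combine(MISSING, NULL, operator.add)
only_missing = combine(MISSING, 2, operator.mul)
plain = combine(3, 2, operator.mul)
```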

 

 

Best Regards,

 

John

 

--

John C. Bollinger, Ph.D.

Computing and X-Ray Scientist

Department of Structural Biology

St. Jude Children's Research Hospital

John.Bollinger@StJude.org

(901) 595-3166 [office]

www.stjude.org

 

 

 

From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Tuesday, January 14, 2020 12:09 AM
To: ddlm-group <ddlm-group@iucr.org>
Subject: [ddlm-group] Handling of missing and null in dREL

 


Dear DDLm group,

 

Happy New Year to all.

 

In preparing the Volume G chapter on dREL I noticed that, while 'missing' and 'null' have been added to dREL as values, there was no discussion about how they would behave when appearing in expressions. I have therefore composed the following two paragraphs, and would appreciate any insight you might have as to problems with the behaviour as stated.  While the behaviour for 'missing' is a standard 3-valued logic, the behaviour of null has essentially been dreamt up by me.  Note that 'missing' refers to a data value of '?' in a CIF file and null refers to '.'.

 

If there are no objections, these paragraphs will eventually find their way into the standard via Volume G.

 

thanks,

James.

===========

 

Computations with missing and null values

A value of missing implies that all values in the domain of the data name are possible. Unless otherwise stated, missing values propagate: where missing is an argument to an arithmetic operator or mathematical function, the result will also be missing. Logical operations follow the rules of three-valued logic: A OR missing is true iff A is true, otherwise the result is missing. Likewise, A AND missing is false iff A is false, otherwise the result is missing. An equality test between two values, where at least one is missing, will result in missing. Comparisons between values, where at least one is missing, result in missing. The latter behaviour means that the built-in 'sort' function will return missing if any of the elements of the sorted list have a value of missing. dREL methods may not explicitly test for missing values; the result of comparing missing with missing is missing.

The behaviour of arithmetic operators and built-in mathematical functions is undefined for null arguments. These functions will therefore return missing if any arguments are null. Logical operations and comparisons with null also behave identically to missing, with the important exception that equality with null can be tested. dREL methods can therefore invoke particular behaviour where null is part of the domain of a data value, and this will often serve to specify the interpretation of null in the context of the defined data name.
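
The three-valued logic described for missing in the first draft paragraph can be sketched in Python as below. This is an illustration only: `MISSING`, `or3`, `and3` and `eq3` are hypothetical names, and Python's `True`/`False` stand in for dREL logical values.

```python
class _Missing:
    """Hypothetical sentinel standing in for the dREL 'missing' value ('?' in CIF)."""

    def __repr__(self):
        return "missing"


MISSING = _Missing()


def or3(a, b):
    """A OR missing is true iff A is true; otherwise the result is missing."""
    if a is True or b is True:
        return True
    if a is MISSING or b is MISSING:
        return MISSING
    return False


def and3(a, b):
    """A AND missing is false iff A is false; otherwise the result is missing."""
    if a is False or b is False:
        return False
    if a is MISSING or b is MISSING:
        return MISSING
    return True


def eq3(a, b):
    """An equality test with at least one missing operand yields missing."""
    if a is MISSING or b is MISSING:
        return MISSING
    return a == b
```

Because `eq3(MISSING, MISSING)` itself yields missing, equality cannot be used to detect a missing value, which is why a separate test function was requested earlier in the thread.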

 

--

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148





_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
