Re: [ddlm-group] Handling of missing and null in dREL
- To: "Bollinger, John C" <John.Bollinger@stjude.org>
- Subject: Re: [ddlm-group] Handling of missing and null in dREL
- From: James Hester <firstname.lastname@example.org>
- Date: Wed, 5 Feb 2020 12:05:16 +1100
- Cc: Group finalising DDLm and associated dictionaries <email@example.com>
- In-Reply-To: <DM6PR04MB44912D74B36430AFFED9BCEEE00D0@DM6PR04MB4491.namprd04.prod.outlook.com>
- References: <CAM+dB2dFjxUnYRwVUH1gN6SKoSg8CQMRvfw-kHJmTk2axJ4+aA@mail.gmail.com><DM6PR04MB44912D74B36430AFFED9BCEEE00D0@DM6PR04MB4491.namprd04.prod.outlook.com>
Dear James and DDLm group,
A Happy New Year and best wishes to you all.
Regarding the proposed specifications for the evaluation of dREL expressions and function calls involving null and missing values,
- I assume that the dREL behaviors described are meant to apply to values that emerge as missing or null after unspecified values have been computed by applicable methods and, where that applies, default values have been substituted for null. Perhaps the context will make that clear, but it’s important to account for the fact that, for these particular values, the lexical value syntactically associated with a given data name in a given data file can be confused with the value to which that data name evaluates in a dREL method.
- Regarding "A value of missing implies that all values in the domain of the data name are possible": I suspect that wording was chosen to preclude the possibility of a missing value lying outside the domain of the data name. However, it also conveys a sense that nothing beyond its domain is known about the value, which is not necessarily true for computable quantities when some inputs to the computation are known. I might instead say something along the lines of "A value of missing implies that no specific value for the data name is available."
- The three-value logic described for expressions involving missing values seems sensible to me.
- I am not yet sure what the implications of the sort function returning missing are, but I suggest specifying that it sorts all missing values to one end (the tail end would be my choice), despite the inconsistency inherent in that. Alternatively, a function that filters missing values out of the inputs to sort() and other functions might address the same needs satisfactorily.
- I agree with Herbert that it is necessary to have a function or operation that can be used to test whether a value is missing, and the same for testing whether a value is null.
- Regarding the semantics of null: the statement "The behaviour of arithmetic operators and built-in mathematical functions is undefined for null arguments" is contradicted by the very next sentence, "These functions will therefore return missing if any arguments are null," which defines the behavior. The part about being undefined should probably just be dropped.
- Following from my first point, it should be reasonable to suppose that the only null values dREL has to be concerned with are those that convey a sense of “not applicable”. In view of that, I would choose quite different semantics for expressions involving null and for mathematical functions with null arguments, closely paralleling the semantics for expressions involving missing: any arithmetic, equality, or relational expression with at least one null operand evaluates to null, and any built-in mathematical function with at least one null argument returns null. Note that this does not preclude an ISNULL operator or function (however spelled) that is considered to be outside all the aforementioned categories.
- Although it probably would follow from a sufficiently clear description of the rules for missing and null values individually, it might nevertheless be worthwhile to explicitly describe how expressions with both a missing and a null operand behave. (Per my formulation of the results of expressions involving null, such expressions would yield null.)
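The alternative semantics proposed in the points above can be sketched in Python. This is a minimal illustration under stated assumptions, not dREL code: MISSING and NULL are hypothetical sentinels, and is_missing, is_null, sort_missing_last and drop_missing are hypothetical names (the message deliberately leaves the spelling open). Null dominates missing in mixed expressions, per the last point, and sorting places missing values at the tail. How sort() should treat null is not stated, so the sketch does not handle it.

```python
import math


class _Sentinel:
    """Hypothetical marker object for a dREL special value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


MISSING = _Sentinel("missing")  # hypothetical stand-in for '?'
NULL = _Sentinel("null")        # hypothetical stand-in for '.'


def is_missing(x):
    """Hypothetical test for missing; sits outside the arithmetic/relational rules."""
    return x is MISSING


def is_null(x):
    """Hypothetical test for null; sits outside the arithmetic/relational rules."""
    return x is NULL


def _special(*args):
    """Null dominates: null if any argument is null, else missing if any is missing, else None."""
    if any(is_null(a) for a in args):
        return NULL
    if any(is_missing(a) for a in args):
        return MISSING
    return None


def add_alt(a, b):
    """Arithmetic: null or missing operands propagate."""
    s = _special(a, b)
    return a + b if s is None else s


def eq_alt(a, b):
    """Equality: null if any operand is null, missing if any is missing."""
    s = _special(a, b)
    return a == b if s is None else s


def lt_alt(a, b):
    """Relational comparison with the same propagation rules."""
    s = _special(a, b)
    return a < b if s is None else s


def sqrt_alt(x):
    """Built-in mathematical function: a null argument yields null."""
    s = _special(x)
    return math.sqrt(x) if s is None else s


def sort_missing_last(values):
    """Sort present values and move every missing value to the tail.

    Assumes no element is null; the message does not specify that case.
    """
    present = [v for v in values if not is_missing(v)]
    absent = [v for v in values if is_missing(v)]
    return sorted(present) + absent


def drop_missing(values):
    """Hypothetical filter removing missing values before other functions see them."""
    return [v for v in values if not is_missing(v)]
```

Here add_alt(MISSING, NULL) gives null, matching the remark about mixed operands, whereas add_alt(MISSING, 2) gives missing.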
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
(901) 595-3166 [office]
Dear DDLm group,
Happy New Year to all.
In preparing the Volume G chapter on dREL I noticed that, while 'missing' and 'null' have been added to dREL as values, there was no discussion about how they would behave when appearing in expressions. I have therefore composed the following two paragraphs, and would appreciate any insight you might have as to problems with the behaviour as stated. While the behaviour for 'missing' is a standard 3-valued logic, the behaviour of null has essentially been dreamt up by me. Note that 'missing' refers to a data value of '?' in a CIF file and null refers to '.'.
If there are no objections, these paragraphs will eventually find their way into the standard via Volume G.
Computations with missing and null values
A value of missing implies that all values in the domain of the data name are possible. Unless otherwise stated, missing values propagate: where missing is an argument to an arithmetic operator or mathematical function, the result will also be missing. Logical operations follow the rules of three-valued logic: A OR missing is true iff A is true, otherwise the result is missing. Likewise, A AND missing is false iff A is false, otherwise the result is missing. An equality test between two values, where at least one is missing, will result in missing. Comparisons between values, where at least one is missing, result in missing. The latter behaviour means that the built-in 'sort' function will return missing if any of the elements of the sorted list have a value of missing. dREL methods may not explicitly test for missing values; the result of comparing missing with missing is missing.
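As a rough illustration, the rules for missing in the paragraph above correspond to Kleene's strong three-valued logic. The Python sketch below assumes a hypothetical MISSING sentinel and uses Python booleans for dREL truth values; the function names (or3, and3, eq3, lt3, add3, sort3) are illustrative only and are not part of dREL.

```python
class _Missing:
    """Hypothetical marker for dREL's missing value ('?' in a CIF file)."""

    def __repr__(self):
        return "missing"


MISSING = _Missing()


def or3(a, b):
    """A OR missing is true iff A is true; otherwise the result is missing."""
    if a is True or b is True:
        return True
    if a is MISSING or b is MISSING:
        return MISSING
    return False


def and3(a, b):
    """A AND missing is false iff A is false; otherwise the result is missing."""
    if a is False or b is False:
        return False
    if a is MISSING or b is MISSING:
        return MISSING
    return True


def eq3(a, b):
    """Equality with a missing operand is missing, including missing == missing."""
    if a is MISSING or b is MISSING:
        return MISSING
    return a == b


def lt3(a, b):
    """Comparisons with a missing operand are missing."""
    if a is MISSING or b is MISSING:
        return MISSING
    return a < b


def add3(a, b):
    """Arithmetic propagates missing."""
    if a is MISSING or b is MISSING:
        return MISSING
    return a + b


def sort3(values):
    """Comparisons with missing are missing, so sort returns missing if any element is."""
    if any(v is MISSING for v in values):
        return MISSING
    return sorted(values)
```

For example, or3(True, MISSING) gives True while or3(False, MISSING) gives missing, and sort3([3, MISSING, 1]) gives missing rather than a list.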
The behaviour of arithmetic operators and built-in mathematical functions is undefined for null arguments. These functions will therefore return missing if any arguments are null. Logical operations and comparisons with null also behave identically to missing, with the important exception that equality with null can be tested. dREL methods can therefore invoke particular behaviour where null is part of the domain of a data value, and this will often serve to specify the interpretation of null in the context of the defined data name.
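The draft rules for null in the paragraph above can be sketched the same way. MISSING and NULL are hypothetical sentinels and the function names are illustrative only. The draft does not say what an equality test between null and missing yields; the sketch assumes missing still propagates there. The hypothetical value_or_default shows how a method could use the equality test to give null a specific interpretation.

```python
class _Sentinel:
    """Hypothetical marker object for a dREL special value."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


MISSING = _Sentinel("missing")  # '?' in a CIF file
NULL = _Sentinel("null")        # '.' in a CIF file


def _unknown(x):
    """Under the draft, null behaves like missing outside equality tests."""
    return x is MISSING or x is NULL


def add_draft(a, b):
    """Arithmetic: returns missing if either operand is missing or null."""
    if _unknown(a) or _unknown(b):
        return MISSING
    return a + b


def lt_draft(a, b):
    """Comparison: null behaves identically to missing."""
    if _unknown(a) or _unknown(b):
        return MISSING
    return a < b


def or_draft(a, b):
    """Three-valued OR with null treated as missing."""
    if a is True or b is True:
        return True
    if _unknown(a) or _unknown(b):
        return MISSING
    return False


def eq_draft(a, b):
    """Equality: the one operation in which null can be tested."""
    if a is MISSING or b is MISSING:
        return MISSING  # assumption: missing still propagates through equality
    if a is NULL or b is NULL:
        return a is b  # true only when both operands are null
    return a == b


def value_or_default(value, default):
    """Hypothetical method body that interprets null as 'use the default'."""
    if eq_draft(value, NULL) is True:
        return default
    return value
```

Under these assumptions value_or_default(NULL, 1.0) gives 1.0, while value_or_default(MISSING, 1.0) still gives missing.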
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
Email Disclaimer: www.stjude.org/emaildisclaimer
Consultation Disclaimer: www.stjude.org/consultationdisclaimer
_______________________________________________ ddlm-group mailing list firstname.lastname@example.org http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group