Re: [ddlm-group] CIF2 semantics
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF2 semantics
- From: James Hester <jamesrhester@gmail.com>
- Date: Thu, 28 Jul 2011 13:09:27 +1000
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA543C16565B26@11.stjude.org>
- References: <CAM+dB2eL5jrEFBcmGpDe6RTvpv4qfmxXa722XXzaS_zgCjsxKw@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA543C16565B24@11.stjude.org><a06240803ca54b9a20900@149.72.36.242><CAM+dB2eT83aTPYc_Dg2aQAsp9VoWTpBA79RPLne61LFWfcFEZQ@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA543C16565B26@11.stjude.org>
On Thu, Jul 28, 2011 at 12:00 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
(James wrote):
>Note the following consequences of the CIF1 approach, which I hope we
>all accept:
>(1) A delimited numerical value is invalid if the dictionary specifies
>that 'numb' is expected
Agreed.
>(2) A delimited numerical value is a valid number if the dictionary/DDL
>allows numbers to be derived from character strings (e.g. by giving a
>POSIX regex in the DDL2 _item_list_type.construct and a primitive code
>of 'char')
I'm not sure I follow. Certainly a delimited numerical value is valid for a given data item if it matches the dictionary definition, which could be formulated as you describe, but how do you go from there to the value formally being a number as far as CIF is concerned?
My apologies for the lack of clarity. I simply meant that a dictionary or DDL could define a type, e.g. "float" which it defines to act like a number (or a butterfly - that's up to the dictionary). This is formally different to the CIF 'numb' type. In any case consequence (2) above is probably a red herring as no dictionaries actually do that at present.
>(3) Dictionary-blind pretty-printers as hypothesised by John B below may
>make mistakes in their pretty-printing if they assume 'numb' wrongly.
>Likewise, other dictionary-blind software cannot rely on apparent 'numb'
>values really being 'numb'. Successful behaviour after assuming 'numb'
>type is likely, but not guaranteed.
I agree that this is a consequence of the CIF 1.1 specifications. I'm not certain that it's desirable, however, given the substantial body of software that ignores that result in favor of CIF 1.0 behavior. I think it might be more consistent for the specifications to actually require numeric strings to be quoted to be interpreted as 'char' values (or at least to be faithfully preserved), which now is only a practical consideration.
I'm not sure what the difference between CIF1.1 and CIF1.0 behaviour is - I thought that 'numb' type had always been the same?
[edit]
>Is everyone happy with my analysis above? Are we OK with accepting the same semantics for CIF2?
Reserving judgment on the advantages of 'numb', I agree that the rest of the analysis is accurate for CIF 1.1. I could probably accept the same semantics for CIF2, but I think there may be room to do better. Consider this:
a) Undelimited values that lex as numbers according to the CIF specifications are in fact handled as numbers (possibly with associated s.u.s) in the abstract data model.
b) CIF2 specifies a standard number format for use in coercing numbers to strings in the event that a dictionary specifies a 'char' subtype for such a value, limits its values via a regex, or otherwise needs a string representation.
It is implicit in that approach that the literal character sequence of unquoted numeric values will not be faithfully preserved in some cases, but that is the practical reality today, CIF specifications notwithstanding. The main advantage to be gained is better agreement with the behavior of existing software. Although it is likely that the formatting existing software performs would not agree exactly with whatever format might be chosen for (b), I anticipate that for many programs and libraries it would be a lot easier to change formats than to change the underlying data model.
That would also make for more flexible use of regexes in validating 'numb' values, in that the regexes used in such contexts could safely make some assumptions about the form of the strings matched against them.
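The point about regexes can be illustrated with a small Python sketch. No coercion format was actually proposed in this thread, so `canonical_text` (here simply Python's `repr()` of the float) and the `DECIMAL_ONLY` constraint are hypothetical stand-ins chosen only for illustration.

```python
import re

def canonical_text(value: float) -> str:
    # Hypothetical canonical form: Python's repr() of the float.
    # The thread does not specify any particular format.
    return repr(value)

# Hypothetical dictionary constraint: plain decimal notation only.
DECIMAL_ONLY = re.compile(r"^-?\d+\.\d+$")

def validate(raw_text: str) -> tuple:
    """Return (matches as written, matches after coercion to the canonical form)."""
    as_written = bool(DECIMAL_ONLY.match(raw_text))
    after_coercion = bool(DECIMAL_ONLY.match(canonical_text(float(raw_text))))
    return as_written, after_coercion

for raw in ["2.50", "1e3", "+5"]:
    print(raw, validate(raw))
```

Under these assumptions, a value written as 1e3 fails the regex as typed but passes once coerced, which is the kind of assumption about string form that a fixed format would let a dictionary regex make.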
I'm not convinced that existing software ever gets into trouble with the current 'numb' definition. Any programs that really do require numbers (e.g. structure display programs) are not affected by the need to keep the string representation. The only programs that will have trouble are those which input and manipulate strings that look like numbers, if the parsing part of the program aggressively converts to numbers before the part that expects strings can stop it. Strings in datafiles are mostly meant for human consumption, so I would have thought that such bugs would long ago have been discovered and dealt with. While the explicit data model I've described may sound overly complex, I suspect that it is actually what pragmatic programmers have long ago implemented (e.g. CIFtbx).
So I don't think specifying a coercion format is a solution, because I don't think there is a problem. I also think it highly unlikely that a set of rules could be found such that a number that looks like a float would reliably be coerced back to its original form, with the appropriate number of significant digits, identically on all platforms.
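The round-trip concern can be seen in a few lines of Python. This is only a sketch: it uses Python's built-in `float()` and `repr()` as one possible coercion rule, which is an assumption and not anything defined by CIF.

```python
def round_trip(text: str) -> str:
    """Convert a numeric string to a float and back using repr()."""
    return repr(float(text))

# Number-like strings as they might appear undelimited in a data file.
samples = ["1.50", "0.10", "1e3", "+5.0", "100."]

for text in samples:
    result = round_trip(text)
    status = "preserved" if result == text else "changed"
    print(f"{text!r} -> {result!r} ({status})")
```

With this rule every sample comes back changed: trailing zeros, the leading sign and the exponent notation are all lost, so the significant digits as written cannot be recovered from the number alone.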
I don't think the current specification is broken; rather, it is a somewhat subtle solution which:
(1) Keeps non-delimited number-like strings available as strings, increasing the robustness of the system
(2) Prevents delimited number-like strings from being interpreted as numbers, giving a further avenue for safety and matching human readability expectations
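A minimal Python sketch of a data model with these two properties might look as follows. The class name `CifValue`, its fields, and the simplified `_NUMBER` pattern are all hypothetical; the pattern only approximates CIF number syntax and is not taken from the specification.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern for a number with an optional s.u. in parentheses.
# An approximation for illustration, not the grammar from the CIF specification.
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")

@dataclass(frozen=True)
class CifValue:
    text: str        # character sequence exactly as it appeared in the file
    delimited: bool  # True if the value was quoted in the file

    def as_string(self) -> str:
        # The original text is always available, whatever it looks like.
        return self.text

    def as_number(self) -> Optional[float]:
        # Delimited values are never treated as numbers.
        if self.delimited or not _NUMBER.match(self.text):
            return None
        # Drop any s.u. in parentheses before converting.
        return float(self.text.split("(")[0])

bare = CifValue("1.50", delimited=False)
quoted = CifValue("1.50", delimited=True)
print(bare.as_string(), bare.as_number())      # string kept, number available
print(quoted.as_string(), quoted.as_number())  # string kept, no number
```

Because the text is stored alongside the parsed value, a program that needs the number gets it and a program that needs the string gets the characters as written, without any coercion step.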
James.
John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148