Re: [ddlm-group] CIF2 semantics


On Tuesday, August 09, 2011 5:08 AM, Herbert J. Bernstein wrote:

>   The heart of the difference lies in "Therefore, datavalues that can
>be interpreted as numbers (CIF1.1 'numb' type) must retain knowledge
>of the source string so that the DDLs and dictionaries are free to
>interpret them as 'char' type (see Vol G 2.2.7.4.7.1(17), reproduced
>below)"  I do not see any requirement in Volume G that we must retain
>the original source string for _any_ data value, just that we must
>faithfully preserve the information in that data value.  In the case
>of a number, that is the numeric value, not the particular choice of
>character string used to represent that numeric value, so we may
>freely change from 123 to 1.23e2 and back.


James's argument is a practical one, predicated on a hypothetical CIF parser that operates without dictionary knowledge yet supports dictionary-based applications.  The hypothetical processor's output is a complex object conforming to the abstract CIF data model that James is attempting to define.  He postulates that it is both necessary and sufficient for such an abstract data model to retain the original character sequence of every data value, including those having numeric lexical form.

James's argument is supported by Vol G 2.2.5.2, wherein it is specified that given

_unknown_data_name 1

and a dictionary definition assigning type 'char' to that name, "the value should be stored as the literal character 1."  As a practical matter, then, a CIF 1.1 processor must retain the original character sequence at least until it is known whether there is a definition.  Therefore James's abstract data model, output by a processor ignorant of any dictionary yet supporting dictionary-based applications, indeed must retain the original character sequence.
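
To make the retention requirement concrete, here is a minimal sketch in Python of a dictionary-ignorant parser's output value.  Everything in it is an assumption made for illustration: the class name RawValue, the resolve() method, and the simplified numeric pattern, which only approximates the CIF 1.1 'numb' grammar.  It is not the API of any existing CIF library.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Simplified stand-in for the CIF 1.1 'numb' lexical form (an assumption):
# optional sign, digits with optional fraction, optional exponent,
# optional standard uncertainty in parentheses.
_NUMB = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?(\(\d+\))?$")
_SU = re.compile(r"\(\d+\)$")


@dataclass(frozen=True)
class RawValue:
    """A parsed value that keeps its original character sequence."""

    text: str  # exactly the characters that appeared in the file

    def resolve(self, dict_type: Optional[str] = None) -> Union[str, float]:
        """Interpret the value once the dictionary type (if any) is known."""
        if dict_type == "char":
            # An explicit dictionary declaration overrides the numeric guess.
            return self.text
        if _NUMB.match(self.text):
            # No overriding declaration: treat a number-like string as numb.
            return float(_SU.sub("", self.text))
        if dict_type == "numb":
            raise ValueError(f"{self.text!r} is not a valid number")
        return self.text


value = RawValue("1")          # parsed with no dictionary available
print(value.resolve())         # numeric interpretation by default
print(value.resolve("char"))   # the literal character 1, per a 'char' definition
```

Because the source text is kept, the choice between the two interpretations can be deferred until a dictionary is consulted, which is the property the argument above depends on.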


>   Saying, as 2.2.7.4.7.1.(17), that "it may be assumed that a
>character string interpretable as a number should be taken to
>represent an item of type 'numb'" does _not_ say that we need
>to retain the original source string


No, but saying, as 2.2.7.4.7.1.(17)'s next sentence does, "However, an explicit dictionary declaration of type will override such an assumption," _does_ require the original source string to be retained in the event that a dictionary definition declares type 'char' for the value.  James's hypothetical processor must retain the original source string because it doesn't yet know whether there is such a definition.

[...]


>   If we wish to preserve a particular string, we should quote it, but
>then it is type char, not type numb.


That is unquestionably the most pragmatic approach for writing CIFs, but what do or should the CIF specifications require when that approach is not taken?  There seems to be more of a sore spot here than I appreciated before this discussion, but I now believe that CIF 1.1's approach to determining data types is flawed, and furthermore that it is implemented inconsistently in practice.  It is unsatisfactory that CIF 1.1 does not require dictionaries to be used, yet mandates a different, incompatible data-typing analysis when they are used than when they are not.

I think CIF2 can and should adopt a different position, wherein there are three primitive base data types for values (char, numb, and null), and each value is assigned to one of these base types upon parsing, based on its lexical form.  If a dictionary definition requires a different base type than the one in which a value was expressed, then the value must be coerced to the needed type after parsing, according to a standard set of platform-independent coercion rules.  That would achieve a separation between the (optional) dictionary layer and the underlying data model that CIF 1.1 lacks.  Without something like that, one needs the kind of workaround that James describes, at least in principle.
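
A rough sketch of how that two-stage scheme might look, again in Python and again under loudly stated assumptions.  The names Value, parse_value, and coerce are hypothetical.  The lexical rules are simplified, with unquoted ? and . taken to be the null forms.  The coercion rules shown are placeholders, since no standard set has been defined.

```python
import re
from typing import NamedTuple, Union

# Simplified lexical patterns (assumptions, not the full CIF grammar).
_INT = re.compile(r"^[+-]?\d+$")
_REAL = re.compile(r"^[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?$")


class Value(NamedTuple):
    base: str                            # 'char', 'numb', or 'null'
    data: Union[str, int, float, None]   # the typed payload


def parse_value(token: str, quoted: bool = False) -> Value:
    """Stage 1: assign a base type from lexical form alone, no dictionary."""
    if quoted:
        return Value("char", token)
    if token in ("?", "."):              # assumed null forms
        return Value("null", None)
    if _INT.match(token):
        return Value("numb", int(token))
    if _REAL.match(token):
        return Value("numb", float(token))
    return Value("char", token)


def coerce(value: Value, target: str) -> Value:
    """Stage 2: apply placeholder coercion rules if a dictionary demands it."""
    if value.base == target or value.base == "null":
        return value                     # assumed rule: null stays null
    if target == "char":
        # Assumed canonical form: Python's repr of the parsed number.
        return Value("char", repr(value.data))
    if target == "numb":
        reparsed = parse_value(value.data)
        if reparsed.base != "numb":
            raise ValueError(f"cannot coerce {value.data!r} to numb")
        return reparsed
    raise ValueError(f"unsupported coercion to {target!r}")


print(coerce(parse_value("1"), "char"))        # numb 1 coerced to char
print(coerce(parse_value("1", True), "numb"))  # quoted '1' coerced to numb
```

Under these placeholder rules the parsed value carries no source string, so a value written as 1.23e2 would coerce to the characters 123.0 rather than to its original spelling.  Any real coercion rules would have to specify such a canonical form explicitly.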


John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


