Discussion List Archives


Re: [ddlm-group] Semantics of whitespace-delimited values

Dear All,

On Tuesday, July 07, 2015 9:18 AM, James Hester wrote:
> [...]
>What do others think?  If there is a body of CIF1 applications out there that have been designed to raise errors when values expected to be numeric are enclosed by delimiters, this proposal would represent a further annoying change from CIF1, and it would be good to have some idea of how many such applications there are.  I speculate that many applications ignore the delimiter status, for reasons both of laziness, the authority of the dictionary definitions, and the philosophy of writing liberal parsers.

I am certain that there are CIF parsers that distinguish numeric values from other values based on the CIF 1.1 convention for numbers, as I have written at least two such parsers myself.  The local applications that use these will indeed raise errors when values expected to be numeric are enclosed by delimiters.  I do not know whether such parsers are common, however, or whether the most widely-used parsers do this.  In practice, it has never presented a problem because CIF 1.1 writers are very consistent about presenting numbers in the conventional form (more so even than they are about presenting ? in conventional form).
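For concreteness, the kind of check described above might look roughly like the following sketch. It is not taken from any actual parser: the regular expression is only a simplified approximation of the CIF 1.1 conventional numeric form, and the names NUMERIC_RE, is_conventional_number and require_number are hypothetical.

```python
import re

# Simplified approximation (an assumption, not the normative grammar) of the
# CIF 1.1 conventional numeric form: optional sign, digits with an optional
# decimal point, optional exponent, optional standard uncertainty in
# parentheses, e.g. 1.234(5) or -5e3.
NUMERIC_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?(?:\(\d+\))?")


def is_conventional_number(text: str, delimited: bool) -> bool:
    """Return True only for whitespace-delimited values in numeric form."""
    return (not delimited) and NUMERIC_RE.fullmatch(text) is not None


def require_number(text: str, delimited: bool) -> str:
    """Raise ValueError when a value expected to be numeric is quoted
    or does not match the conventional numeric form."""
    if not is_conventional_number(text, delimited):
        raise ValueError(f"expected an unquoted number, got {text!r}")
    return text
```

With a check of this shape, 1.234(5) presented bare is accepted, while the same characters presented as '1.234(5)' are rejected purely because of the delimiters.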

I am especially interested in knowing whether there are validating parsers that make this distinction.  In particular, do any parsers interpret a DDL1 definition having a _type attribute of 'numb' to require values to be presented in whitespace-delimited form?  Do any parsers interpret a DDL2 type having _item_type_list.primitive_code of 'numb' as requiring values of that type to be presented in whitespace-delimited form?  These would constitute dictionary-level use of a distinction between whitespace-delimited values and other values, thereby raising the question of whether they are special cases, or whether dictionaries may more generally require that certain data values be presented unquoted (or whether it is erroneous for parsers to make such a distinction at all).

At minimum, I don't think we can avoid special cases for the . and ? values.  Their use and interpretation is too deeply ensconced in CIF software and practice to consider any variance to be acceptable, so they are a de facto part of the CIF 1.1 syntax, despite ITVG ascribing their special interpretations to mere convention.  The same might be true of CIF 1.1's conventional numeric format, which is the reason for my questions above.  I'm inclined to be more general, however, by saying broadly that in both CIF 1.1 and CIF 2.0, whether a data value is presented in whitespace-delimited form is a property that can influence its interpretation.  That avoids a need for explicit special cases, or for determining post hoc (for CIF 1.1) exactly what aspects of these conventions have the force of requirement.
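As an illustration of treating presentation as a property of the value, a parser could hand each value to the application together with a flag recording whether it was whitespace-delimited. The sketch below is only one possible shape for that; CifValue, whitespace_delimited and null_kind are hypothetical names, not part of any existing CIF API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CifValue:
    text: str                   # the characters of the value, delimiters removed
    whitespace_delimited: bool  # True if the value was presented unquoted


def null_kind(value: CifValue) -> Optional[str]:
    """Give ? and . their special meanings only when presented unquoted."""
    if not value.whitespace_delimited:
        return None  # a quoted '?' or '.' is ordinary character data
    return {"?": "unknown", ".": "inapplicable"}.get(value.text)
```

Under this arrangement the . and ? cases need no separate syntax rule: they follow from the same flag that an application could consult for numbers.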



_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
