Discussion List Archives


Re: [ddlm-group] How to specify syntax of a number in CIF2

Hi James,


I support your enumerated objectives, but your proposed changes seem at odds with objective (2), with widespread DDL1 practice, and possibly even with standard DDL2 practice.  I’m up for clarifying and for making recommendations, but not for making changes that invalidate significant bodies of current software or practice.  Moreover, the proposed additions don’t address all the issues.


Without getting into specific language, this is what I think I would like to see:


() a clarification for ITVG (10) explaining that "values that are to be interpreted as numbers" refers specifically to values interpreted, for whatever reason, according to the data type 'numb' described in section


() a clarification for ITVG that explicitly narrows its scope to items not defined in a dictionary.


() a clarification that the CIF 1.1 <Numeric> production and its related component productions provide the details of the conventional data type 'numb', as opposed to being the only allowed form for numeric data values, regardless of actual data type.


() a clarification that a dictionary *can*, without restriction, ascribe any significance to whether a value is presented quoted, paired with a recommendation that dictionaries *not* do so, and perhaps a description of the limited ways in which the current DDLs and dictionaries do do so.


() a secondary recommendation that dictionaries that do ascribe significance to whether a value is presented quoted do so as broadly and uniformly as possible.  Examples of broad and uniform would be overall dictionary-level, or even DDL-level recognition of the conventional CIF null values as distinct from their quoted analogs, and similarly-scoped specifications that numbers be presented unquoted.  We especially want to discourage such distinctions being drawn on an item-by-item basis, but I don’t think that’s a major problem because none of our DDLs has a means to express that.


() an adjustment to the prose definition of DDL1's '_type' attribute, which is anyway either incomplete or inconsistent in version 1.4.1 of that dictionary, as it pertains to type numb.  This could provide format details for the general case, to be narrowed where necessary by other definition attributes.


() a recommendation to CIF authors (but mostly to their proxies, authors of software that outputs CIF) that numeric data values be presented unquoted wherever their data types permit.


() a recommendation to authors of software that reads CIF to accept quoted numeric data values, even when their data types do not actually allow it.  This is not meant to preclude software issuing diagnostic messages warning about malformed numeric values in the event that values are presented quoted when their items' definitions demand otherwise.


() a recommendation to CIF dictionary authors that the defined format for numeric data types be consistent with the ITVG numeric syntax wherever possible.
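
To make the reader-side recommendation concrete, here is a minimal Python sketch of a lenient reader: it accepts a numeric value whether or not it was quoted, but issues a diagnostic when the item's definition does not permit quoting. The function name read_numeric, the quoted and quoting_allowed parameters, and the regular expression are all hypothetical illustrations, not part of any existing CIF library, and the pattern is only an approximation of the conventional 'numb' form.

```python
import re
import warnings
from typing import Optional

# Assumption: a rough approximation of the conventional numeric form,
# with an optional standard uncertainty in parentheses, e.g. 1.234(5).
_NUMERIC = re.compile(
    r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?(?:\([0-9]+\))?"
)


def read_numeric(text: str, quoted: bool,
                 quoting_allowed: bool = False) -> Optional[str]:
    """Return text if it looks numeric, otherwise None.

    A quoted value is still accepted; a warning is issued when the
    (hypothetical) item definition says quoting is not allowed.
    """
    if _NUMERIC.fullmatch(text) is None:
        return None
    if quoted and not quoting_allowed:
        warnings.warn(
            f"numeric value {text!r} was presented quoted",
            stacklevel=2,
        )
    return text
```

The point of the design is that quoting never causes a read failure on its own; it only changes whether a diagnostic is emitted.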



Looking forward to the next edition of ITVG -- and with apologies to the section 2.2 authors, many of whom I know are receiving this -- I think section 2.2 would benefit from a thorough rewrite.  Minor tweaks here and there won’t really suffice.  The current version is a concatenation of two distinct documents with overlapping subject matter coverage, drawing on document history and lineage that extends to a time before that of some of the material it is intended to specify.  It is needlessly repetitive, and it emphasizes some details that these days are of minimal importance.  As the discussion here has shown, it is also tricky to interpret in places, and it struggles a bit to accommodate both DDL1 and DDL2 practices.  It will face an even bigger challenge in the next edition, with the addition of CIF 2.0 syntax and DDLm (albeit probably in a different section).  I suggest, therefore, that we not worry at this point about prose for that edition, but instead work on making the best we can of the current edition by providing written interpretations and, if necessary, corrigenda.








From: ddlm-group [mailto:ddlm-group-bounces@iucr.org] On Behalf Of James Hester
Sent: Monday, August 03, 2015 9:13 PM
To: ddlm-group
Subject: [ddlm-group] How to specify syntax of a number in CIF2


Dear All,

The preceding discussion around possible semantic distinctions between whitespace and non-whitespace delimited strings has thrown up an unresolved semantic issue in CIF2.  In a nutshell, a programmer wishing to write a number in CIF2 currently has no specification anywhere as to how that number should be presented, and neither do CIF2 readers know how to interpret strings as numbers.

In CIF1.1, the syntax description is included in the BNF, and the DDL2 system additionally permits each dictionary to specify the text syntax of the types used in that particular dictionary using _item_type_list.construct.

In making this specification, I think we should preserve the following behaviour:

(1) DDL dictionaries are format agnostic (i.e. they could be used to define ontologies for other file formats) - our DDLs are advanced and potentially useful to other communities

(2) DDL dictionaries determine whether or not a value should be interpreted as a number (as they define the nature of a dataitem)

In a practical sense, software written in consultation with a dictionary is happy to specify that it expects a number when it calls an API routine to obtain a datavalue, as this knowledge is available at program writing time.  So the onus is on the API routine to look at the sequence of characters that form the requested datavalue and decide if it can return something that the calling software understands as a number.
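
As a rough illustration of such an API routine, the Python sketch below takes the character sequence of a datavalue and either returns a number or reports that it cannot. The name as_number and the regular expression are assumptions made for this example; the pattern only approximates the CIF 1.1 numeric form (optional sign, digits with optional decimal point, optional exponent, optional standard uncertainty in parentheses) and is not a normative grammar.

```python
import re
from typing import Optional, Tuple

# Assumption: approximate, delimiter-agnostic pattern for a number with an
# optional standard uncertainty (s.u.) in parentheses, e.g. "1.234(5)".
_NUMBER = re.compile(
    r"(?P<value>[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)"
    r"(?:\((?P<su>[0-9]+)\))?"
)


def as_number(text: str) -> Optional[Tuple[float, Optional[int]]]:
    """Interpret text as a number, or return None if it does not conform.

    Returns (value, su_digits), where su_digits is the integer written in
    parentheses (not scaled to the value's precision), or None if absent.
    """
    match = _NUMBER.fullmatch(text)
    if match is None:
        return None
    su = match.group("su")
    return float(match.group("value")), (int(su) if su is not None else None)
```

The routine never looks at how the value was delimited in the file; it sees only the characters, which is what makes the check delimiter-agnostic.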

So I would suggest the following be inserted into "Common semantic features" in our online specs and the next edition of Vol G:


A datavalue may only be interpreted as a real number if it conforms to the following syntax:

<insert delimiter-agnostic CIF1 syntax expressions here>

A datavalue may only be interpreted as an integer if it conforms to the following syntax:

<insert suitable delimiter-agnostic integer EBNF expressions here>

What do you think?




