Discussion List Archives

[ddlm-group] How to specify syntax of a number in CIF2

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] How to specify syntax of a number in CIF2
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Tue, 4 Aug 2015 12:12:38 +1000
Dear All,

The preceding discussion around possible semantic distinctions between whitespace and non-whitespace delimited strings has thrown up an unresolved semantic issue in CIF2.  In a nutshell, a programmer wishing to write a number in CIF2 currently has no specification anywhere as to how that number should be presented, and neither do CIF2 readers know how to interpret strings as numbers.

In CIF1.1, the syntax description is included in the BNF, and the DDL2 system additionally permits each dictionary to specify the text syntax of the types used in that particular dictionary using _item_type_list.construct.

In making this specification, I think we should preserve the following behaviour:

(1) DDL dictionaries are format agnostic (i.e. they could be used to define ontologies for other file formats) - our DDLs are advanced and potentially useful to other communities
(2) DDL dictionaries determine whether or not a value should be interpreted as a number (as they define the nature of a dataitem)

In a practical sense, software written in consultation with a dictionary is happy to specify that it expects a number when it calls an API routine to obtain a datavalue, as this knowledge is available at program-writing time.  So the onus is on the API routine to look at the sequence of characters that forms the requested datavalue and decide whether it can return something that the calling software understands as a number.
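
To make the division of labour concrete, here is a minimal sketch in Python of such an API routine. Everything in it is hypothetical: the names `RawBlock`, `NotANumberError` and `get_real` are invented for illustration, and Python's built-in `float()` is only a stand-in for whatever syntax the specification eventually adopts. The point is just that the caller states the expected type and the routine decides from the raw characters whether it can comply.

```python
from typing import Callable, Dict

# Hypothetical in-memory data block: data names mapped to the raw character
# strings handed over by a parser, with any delimiters already stripped.
RawBlock = Dict[str, str]


class NotANumberError(ValueError):
    """Raised when the stored characters cannot be returned as a number."""


def get_real(block: RawBlock, name: str,
             convert: Callable[[str], float] = float) -> float:
    """Return the value of `name` as a float, or raise NotANumberError.

    `convert` stands in for the syntax rule still to be specified. The
    default, float(), is a placeholder only: it accepts forms such as
    'inf' or '1_000' that a CIF rule would presumably reject.
    """
    raw = block[name]
    try:
        return convert(raw)
    except ValueError as exc:
        raise NotANumberError(f"{name}: {raw!r} is not a number") from exc
```

A caller that knows from the dictionary that a dataitem is numeric would then write something like `get_real(block, "_cell.length_a")` and never inspect the characters itself.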

So I would suggest the following be inserted into "Common semantic features" in our online specs and the next edition of Vol G:

====
A datavalue may only be interpreted as a real number if it conforms to the following syntax:

<insert delimiter-agnostic CIF1 syntax expressions here>

A datavalue may only be interpreted as an integer if it conforms to the following syntax:

<insert suitable delimiter-agnostic integer EBNF expressions here>
=====
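
For illustration only, the kind of check this wording implies could be sketched in Python as below. The two patterns are my own assumption, loosely modelled on the CIF1.1 numeric productions, and are not proposed text for the specification; in particular they deliberately leave out the parenthesised standard uncertainty that CIF1.1 allows after a number. The function names `as_integer` and `as_real` are hypothetical. The check looks only at the characters of the value, so it gives the same answer whether or not the value was quoted in the file.

```python
import re
from typing import Optional

# Assumed patterns, not normative. [0-9] is used rather than \d so that
# only ASCII digits are accepted.
_INTEGER = re.compile(r"[+-]?[0-9]+")
_REAL = re.compile(
    r"[+-]?"                          # optional sign
    r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)"   # digits with optional point, or .digits
    r"(?:[eE][+-]?[0-9]+)?"           # optional exponent
)


def as_integer(text: str) -> Optional[int]:
    """Return text as an int if the whole string matches, else None."""
    return int(text) if _INTEGER.fullmatch(text) else None


def as_real(text: str) -> Optional[float]:
    """Return text as a float if the whole string matches, else None."""
    return float(text) if _REAL.fullmatch(text) else None
```

With these patterns, strings such as `-42`, `5.` and `.5e-3` are accepted as real numbers, while `inf`, the empty string and `1.234(5)` are not.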

What do you think?

James.
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group
