[ddlm-group] Semantics of whitespace-delimited values

  • To: ddlm-group <ddlm-group@iucr.org>
  • Subject: [ddlm-group] Semantics of whitespace-delimited values
  • From: James Hester <jamesrhester@gmail.com>
  • Date: Wed, 8 Jul 2015 00:17:32 +1000
Dear All,

One issue that has not been discussed in the context of the CIF2 syntax is the special interpretation of whitespace-delimited values.  In CIF1.1 as recorded in Volume G, a whitespace-delimited question mark and a whitespace-delimited period have a special interpretation as "unknown" and "default/not applicable/null" respectively.  Furthermore, only a whitespace-delimited value matching a specified syntax (which includes optional appended esd values) may be interpreted as a numeric value, and it would, strictly speaking, be a semantic error for a CIF processor to interpret a numeric value enclosed in delimiters as a number.
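
To make the CIF1.1 behaviour described above concrete, here is a minimal Python sketch.  It is an illustration only: the function name interpret_cif1, the UNKNOWN and NULL markers and the regular expression are my own assumptions, and the pattern is a simplified approximation of a number with an optional appended esd, not the grammar given in Volume G.

```python
import re

# Markers for the two special whitespace-delimited values (names are illustrative).
UNKNOWN = object()  # stands for a bare "?"
NULL = object()     # stands for a bare "."

# Simplified numeric pattern (an assumption, not the Volume G grammar):
# optional sign, digits with an optional decimal point, optional exponent,
# then an optional esd in parentheses, e.g. "1.234(5)".
NUMERIC = re.compile(
    r"^(?P<value>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)"
    r"(?:\((?P<esd>\d+)\))?$"
)

def interpret_cif1(text, delimited):
    """Interpret one data value the CIF1.1 way.

    `delimited` is True when the value was enclosed in quotes or a
    semicolon text field, False when it was whitespace-delimited.
    """
    if delimited:
        # Delimited values are always character strings, even "?" or "1.5".
        return text
    if text == "?":
        return UNKNOWN
    if text == ".":
        return NULL
    match = NUMERIC.match(text)
    if match:
        # Return the number and the esd digits (or None) as a pair.
        return (float(match.group("value")), match.group("esd"))
    return text
```

With this sketch, interpret_cif1("1.234(5)", False) gives a number with an esd, while interpret_cif1("1.234(5)", True) gives back the plain string, which is the delimiter-dependent behaviour under discussion.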

I have no issue with question mark or period, as these are necessary for semantic completeness. 

What I would like to discuss for CIF2.0 is the following:
(i) The interpretation of a data value as numeric is determined solely by the dictionary with no regard to the particular delimiters used in the CIF file;
(ii) A convention is encouraged for CIF writers whereby numeric values are not enclosed by delimiters;
(iii) The precise construction of numeric values is moved into the DDLm attribute dictionary.
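
As a rough sketch of how points (i) and (iii) might look to a CIF reader, the Python below decides whether a value is numeric purely from a dictionary lookup.  Everything named here is hypothetical: DICTIONARY_TYPES is a hand-written stand-in for information that would really come from a DDLm dictionary, the two data names and the "Real"/"Text" labels are only illustrative, and the numeric pattern is a simplified placeholder for whatever construction the attribute dictionary would define.

```python
import re

# Markers for the two special whitespace-delimited values (names are illustrative).
UNKNOWN = object()  # a bare "?"
NULL = object()     # a bare "."

# Hypothetical stand-in for type information read from a DDLm dictionary.
DICTIONARY_TYPES = {
    "_cell.length_a": "Real",
    "_chemical.name_common": "Text",
}

# Placeholder for the numeric construction that point (iii) would move
# into the attribute dictionary: a number with an optional esd.
NUMERIC = re.compile(
    r"^(?P<value>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)"
    r"(?:\((?P<esd>\d+)\))?$"
)

def interpret_cif2(name, text, delimited):
    """Interpret one data value using only the dictionary type for `name`."""
    # "?" and "." keep their special meaning, but only when undelimited.
    if not delimited and text == "?":
        return UNKNOWN
    if not delimited and text == ".":
        return NULL
    if DICTIONARY_TYPES.get(name) == "Real":
        # Delimiters are ignored here: the dictionary alone says "numeric".
        match = NUMERIC.match(text)
        if match is None:
            raise ValueError(f"{name}: {text!r} is not a valid number")
        return (float(match.group("value")), match.group("esd"))
    # Anything the dictionary does not mark as numeric stays a string.
    return text
```

Under this scheme a quoted '5.431(2)' and an unquoted 5.431(2) for the same data name yield the same number, and an unquoted 42 for a text item remains the string "42".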

The advantage of this simpler scheme is a clean separation between syntax and human-relevant semantics.  The only CIF applications that can have a use for the CIF1 scheme are those that are written without reference to a dictionary, most obviously pretty-printers that might want to tabulate numbers by lining up decimal points instead of left-justifying.  Even if such formatting applications get it wrong, they will not change the meaning of the file, and so I would view point (ii) as sufficient support for such applications.  Conversely, any application that wishes to operate on a number, as opposed to operating on the textual representation of the number, will of necessity need to know what this number means and will therefore be written with reference to a dictionary, making it unnecessary to signal "numericness" using whitespace-delimited data values.

What do others think?  If there is a body of CIF1 applications out there that have been designed to raise errors when values expected to be numeric are enclosed by delimiters, this proposal would represent a further annoying change from CIF1, and it would be good to have some idea of how many such applications there are.  I speculate that many applications ignore the delimiter status, for reasons of laziness, the authority of the dictionary definitions, and the philosophy of writing liberal parsers.

all the best,
James.


--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148