Discussion List Archives

Re: [ddlm-group] How to specify syntax of a number in CIF2

Hi Simon,

Yes, we would be defining/preserving the base numeric types. A DDLm dictionary has no mechanism (currently) to specify in a machine-readable way the character syntax of a general data type (i.e. those listed in _type.contents), so the partial answer to the second part of your question is no, DDLm dictionaries do not have the freedom to extend the basic set of types.  There is a mechanism for defining new types based on dREL character string transformations but I'm not across it yet.

As far as these number types go, I don't actually see that the dictionary needs to care about the particular representation.  I'm going to go into my analysis of this in my talk at Rovinj and in the paper that I am drafting, but in a nutshell, a dictionary deals with mathematical and scientific meaning, and can be independent of any datafile format.  So if DDLm _type.contents is 'Real', the CIF-dictionary-aware application deals with whatever its particular programming framework uses to represent a 'Real'.  Hence my requirement that we specify, somewhere in the CIF file format documents, what can be interpreted as a 'Real'.

Unlike DDLm and DDL1, DDL2 dictionaries can choose to define character regexes for all types.  I would choose to interpret such data definitions as saying "*If* the datavalue is provided as a sequence of characters, this is the regex it should match".
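
As a rough illustration of that reading, here is a minimal Python sketch. The function name `conforms` and the example pattern are hypothetical, and Python's `re` dialect is only assumed to be close enough to the regex flavour a DDL2 dictionary would use. The pattern is applied only when the value arrives as text; a value already held as a native type is left alone.

```python
import re


def conforms(value, construct: str) -> bool:
    """Check a value against a dictionary-supplied regex (hypothetical helper).

    The regex constrains the character representation only, so it is
    applied only when the value was actually supplied as characters.
    """
    if isinstance(value, str):
        return re.fullmatch(construct, value) is not None
    # Already a native type (e.g. read from a non-text format):
    # the character-level regex has nothing to say about it.
    return True


# Hypothetical integer pattern, for illustration only.
INT_CONSTRUCT = r"[+-]?[0-9]+"

print(conforms("12", INT_CONSTRUCT))
print(conforms("1x", INT_CONSTRUCT))
print(conforms(12, INT_CONSTRUCT))
```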

all the best,
James.

On 4 August 2015 at 21:05, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:
I would encourage this (especially as CIF2 supports Unicode and thus potentially widens the actual character set that could be interpreted as 'numbers'). By this we would be defining/preserving the base numeric types, while still giving a dictionary the freedom to extend/define its own numeric types according to whatever character sequences its domain prefers?

Cheers

Simon


From: James Hester <jamesrhester@gmail.com>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Tuesday, 4 August 2015, 3:12
Subject: [ddlm-group] How to specify syntax of a number in CIF2

Dear All,

The preceding discussion around possible semantic distinctions between whitespace and non-whitespace delimited strings has thrown up an unresolved semantic issue in CIF2.  In a nutshell, a programmer wishing to write a number in CIF2 currently has no specification anywhere as to how that number should be presented, and neither do CIF2 readers know how to interpret strings as numbers.

In CIF1.1, the syntax description is included in the BNF, and the DDL2 system additionally permits each dictionary to specify the text syntax of the types used in that particular dictionary using _item_type_list.construct.

In making this specification, I think we should preserve the following behaviour:

(1) DDL dictionaries are format agnostic (i.e. they could be used to define ontologies for other file formats) - our DDLs are advanced and potentially useful to other communities
(2) DDL dictionaries determine whether or not a value should be interpreted as a number (as they define the nature of a dataitem)

In a practical sense, software written in consultation with a dictionary is happy to specify that it expects a number when it calls an API routine to obtain a datavalue, as this knowledge is available at program writing time.  So the onus is on the API routine to look at the sequence of characters that makes up the requested datavalue and decide if it can return something that the calling software understands as a number.
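
To make that division of labour concrete, here is a minimal Python sketch of such an API routine. Everything in it is an assumption for illustration: the name `get_number`, the 'Real'/'Integer' labels as arguments, and the two patterns, which are only loosely modelled on the CIF1.1 numeric productions and omit the trailing standard uncertainty in parentheses. The caller states the type it expects; the routine inspects the characters and either returns a native number or refuses.

```python
import re

# Assumed patterns, not an agreed specification. ASCII digits are listed
# explicitly because Python's \d would also accept other Unicode digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")
_REAL = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def get_number(raw: str, want: str = "Real"):
    """Return `raw` as a native number of the requested kind, or raise.

    `want` is the type the calling software expects, known to it from
    the dictionary at program writing time.
    """
    if want == "Integer":
        if _INTEGER.fullmatch(raw):
            return int(raw)
    elif want == "Real":
        if _REAL.fullmatch(raw):
            return float(raw)
    else:
        raise ValueError(f"unsupported type requested: {want!r}")
    raise ValueError(f"{raw!r} cannot be interpreted as {want}")


print(get_number("1.5e3"))
print(get_number("-42", "Integer"))
```

The routine checks the syntax itself before converting, rather than relying on `float()` or `int()`, since those built-ins accept strings (non-ASCII digits, for example) that a file format specification might not.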

So I would suggest the following be inserted into "Common semantic features" in our online specs and the next edition of Vol G:

====
A datavalue may only be interpreted as a real number if it conforms to the following syntax:

<insert delimiter-agnostic CIF1 syntax expressions here>

A datavalue may only be interpreted as an integer if it conforms to the following syntax:

<insert suitable delimiter-agnostic integer EBNF expressions here>
=====

What do you think?

James.
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://mailman.iucr.org/cgi-bin/mailman/listinfo/ddlm-group



