[ddlm-group] THREAD 3: The alphabet of non-delimited strings.

  • To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
  • Subject: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
  • From: Nick Spadaccini <nick@csse.uwa.edu.au>
  • Date: Fri, 18 Sep 2009 14:15:50 +0800

As I have written before, non-delimited strings (non-DS) that are not of the
Number or Measured types cause problems. Everything you need to include in a
string can be handled by the delimited string types. With the introduction
of compound data structures, restrictions have to be imposed on the allowed
alphabet of non-delimited strings so the scanner is not "fooled".

If you HAVE to use non-delimited strings then the alphabet is restricted to,

non-DS = [A-Za-z0-9./-()+?][A-Za-z0-9_./-()+?]*

(Allowing / is in deference to James; I don't see a great need for it.)

The square brackets [] are part of the regexp and are not allowed characters.
This covers all numerics, including Measured, and excludes _ as the first
character. None of the token delimiters are included in the alphabet.
Note also that the classic symop example x,y+1/2,z IS NOT allowed, because the
comma is outside the alphabet, though a quick scan of the IUCr CIF archive
shows many submissions already quite sensibly use "x,y+1/2,z".
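
As a rough illustration of the restricted alphabet, here is a minimal Python
sketch. It assumes that "A_Z" in the regexp is meant as the range A-Z, and it
moves the hyphen to the end of each character class so that Python reads it
as a literal. The name is_non_ds is hypothetical and not part of any proposal.

```python
import re

# Character classes transcribed from the regexp in the post (assumption:
# "A_Z" means the range A-Z). The '-' is placed last so it is a literal.
FIRST = r"[A-Za-z0-9./()+?-]"    # first character: no underscore
REST = r"[A-Za-z0-9_./()+?-]"    # later characters: underscore allowed
NON_DS = re.compile(FIRST + REST + "*")


def is_non_ds(text: str) -> bool:
    """Return True if the whole string lies in the restricted alphabet."""
    return NON_DS.fullmatch(text) is not None


if __name__ == "__main__":
    for value in ["1.234(5)", "-x+1/2", "a_b", "_abc", "x,y+1/2,z"]:
        print(value, is_non_ds(value))
```

Under these assumptions a Measured-style value such as 1.234(5) is accepted,
while x,y+1/2,z is rejected because of the commas and so has to be quoted.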

One level of simplification will be in the definition of datanames (DN). We
could simply define a data name as

DN = _{non-DS}

All CIF data names in the new DDLm dictionaries are consistent with this
restriction. A small amount of remediation (which has to be undertaken
anyway) will need to be done for existing domain dictionaries written in
either DDL1 or DDL2.
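
The data name definition can be sketched the same way: an underscore followed
by a non-DS. This is only an illustration under the same reading of the
regexp as above (A_Z taken as A-Z, hyphen treated as a literal), and the name
is_data_name is hypothetical.

```python
import re

# non-DS alphabet as read from the post; the hyphen is last so it is literal.
NON_DS = r"[A-Za-z0-9./()+?-][A-Za-z0-9_./()+?-]*"

# DN = _{non-DS}: a single leading underscore, then a non-DS.
DATA_NAME = re.compile("_" + NON_DS)


def is_data_name(text: str) -> bool:
    """Return True if text is an underscore followed by a valid non-DS."""
    return DATA_NAME.fullmatch(text) is not None


if __name__ == "__main__":
    for name in ["_atom_site.label", "atom_site.label", "__double", "_"]:
        print(name, is_data_name(name))
```

One consequence of this reading is that a name starting with two underscores
is rejected, since a non-DS may not begin with _.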

A further simplification is that one can write the scanner to look for token
terminating characters, rather than DEMANDING that each token be preceded or
followed by whitespace.
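
A minimal sketch of that idea in Python follows. It is not a CIF scanner. It
assumes that any non-whitespace character outside the alphabet (for example a
bracket or a comma) ends the current token and is emitted as a one-character
token of its own. The actual set of token delimiters is not listed here, so
that behaviour, and the name scan, are assumptions made for illustration.

```python
import re

# Either an optional leading underscore plus a non-DS (covering data names),
# or any single non-whitespace character outside the alphabet.
TOKEN = re.compile(r"_?[A-Za-z0-9./()+?-][A-Za-z0-9_./()+?-]*|\S")


def scan(text: str) -> list[str]:
    """Split text into tokens without requiring whitespace between them."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    print(scan("[1.5,2.5]"))
    print(scan("_cell.length_a 5.43(2)"))
```

Because the comma and the brackets are outside the alphabet, [1.5,2.5] splits
into five tokens even though it contains no whitespace.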


cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au