[ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: Nick Spadaccini <[email protected]>
- Date: Fri, 18 Sep 2009 14:15:50 +0800
As I have written before, non-delimited strings (non-DS) that are not of the
Number or Measured types cause problems. Everything you need to include in a
string can be handled by the delimited string types. With the introduction
of compound data structures, restrictions have to be imposed on the allowed
alphabet of non-delimited strings so the scanner is not "fooled".
If you HAVE to use non-delimited strings then the alphabet is restricted to
non-DS = [A-Za-z0-9./-()+?][A-Za-z0-9_./-()+?]*
(Allowing for / is in deference to James, I don't see a great need for it.)
The square brackets [] are part of the regexp notation and are not
themselves allowed characters. This covers all numerics, including Measured,
and excludes _ as the first character. None of the token delimiters are
included in the alphabet.
Note also that the classic symop example x,y+1/2,z IS NOT allowed (the comma
is not in the alphabet), though a quick scan of the IUCr CIF archive shows
that many submissions already, quite sensibly, use "x,y+1/2,z".
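As an illustration, the alphabet above can be checked with a short Python
sketch. It takes the letter range to be A-Z and escapes the hyphen so that it
is read as a literal character rather than as a range; both are assumptions
about the intended meaning, and the names NON_DS and is_non_ds are
hypothetical.

```python
import re

# First character: the alphabet without "_".
# Following characters: the same alphabet plus "_".
# The hyphen is escaped so it is a literal, not a range (an assumption).
FIRST = r"[A-Za-z0-9./\-()+?]"
REST = r"[A-Za-z0-9_./\-()+?]"
NON_DS = re.compile(FIRST + REST + "*")


def is_non_ds(text):
    """Return True if the whole of text is a valid non-delimited string."""
    return NON_DS.fullmatch(text) is not None


print(is_non_ds("1.234(5)"))   # a number with an uncertainty in parentheses
print(is_non_ds("x,y+1/2,z"))  # the symop: contains commas
print(is_non_ds("_abc"))       # starts with an underscore
```

Under these assumptions the first call accepts its argument and the other two
reject theirs, so the symop has to be written as a delimited string.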
One level of simplification will be in the definition of datanames (DN). We
could simply define a data name as
DN = _{non-DS}
All CIF data names in the new DDLm dictionaries are consistent with this
restriction. A small amount of remediation (which has to be undertaken
anyway) will need to be done for existing domain dictionaries written in
either DDL1 or DDL2.
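A minimal sketch of the data name rule, again in Python and again assuming
the letter range A-Z and a literal hyphen. The names DATA_NAME and
is_data_name are hypothetical, as are the sample names used to exercise them.

```python
import re

# non-DS alphabet, hyphen escaped so it is a literal (an assumption).
NON_DS_PATTERN = r"[A-Za-z0-9./\-()+?][A-Za-z0-9_./\-()+?]*"

# DN = _{non-DS}: one leading underscore followed by a non-DS.
DATA_NAME = re.compile("_" + NON_DS_PATTERN)


def is_data_name(text):
    """Return True if the whole of text has the form _{non-DS}."""
    return DATA_NAME.fullmatch(text) is not None


print(is_data_name("_cell.length_a"))  # underscore then a valid non-DS
print(is_data_name("_name[1]"))        # square brackets are outside the alphabet
print(is_data_name("__name"))          # the non-DS part may not start with "_"
```

Only the first of these is accepted. A name such as the second is the kind of
case that a check like this would flag for remediation.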
A further simplification is that one can write the scanner to look for
token-terminating characters, rather than DEMANDING that each token be
followed or preceded by whitespace.
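A rough Python sketch of such a scanner follows. It is a simplification that
handles only non-DS values: data names and delimited strings are left out,
and it assumes that any non-whitespace character outside the alphabet ends
the current token and is emitted as a one-character token of its own. The
actual set of token delimiters is not defined here, so that treatment, like
the names ALPHABET and scan, is hypothetical.

```python
import string

# Characters allowed inside a non-DS; "_" may not start one.
ALPHABET = set(string.ascii_letters + string.digits + "_./-()+?")


def scan(text):
    """Split text into non-DS runs and single terminating characters."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace separates tokens but is not required between them.
            i += 1
        elif ch in ALPHABET and ch != "_":
            # Consume until the first character outside the alphabet.
            start = i
            while i < len(text) and text[i] in ALPHABET:
                i += 1
            tokens.append(text[start:i])
        else:
            # Any other character terminates and stands alone (assumption).
            tokens.append(ch)
            i += 1
    return tokens


print(scan("[1.5(2),abc ?]"))
print(scan("x,y+1/2,z"))
```

With this approach "[1.5(2),abc ?]" splits at the brackets and the comma
without needing spaces around them, and the unquoted symop breaks into
separate tokens at each comma.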
cheers
Nick
--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering
The University of Western Australia t: +61 (0)8 6488 3452
35 Stirling Highway f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
MBDP M002
CRICOS Provider Code: 00126G
e: [email protected]