[ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Fri, 18 Sep 2009 14:15:50 +0800
As I have written before, non-delimited strings (non-DS) that are not of the Number or Measured types cause problems. Everything you need to include in a string can be handled by the delimited string types. With the introduction of compound data structures, restrictions have to be imposed on the allowed alphabet of non-delimited strings so the scanner is not "fooled".

If you HAVE to use non-delimited strings then the alphabet is restricted to

non-DS = [A-Za-z0-9./-()+?][A-Za-z0-9_./-()+?]*

(Allowing for / is in deference to James; I don't see a great need for it.) The square brackets [] are part of the regexp and are not allowed characters. This will cover all numerics including Measured, and excludes _ as the first character. None of the token delimiters are included in the alphabet.

Note also that the classic symop example x,y+1/2,z IS NOT allowed, though a quick scan of the IUCr CIF archive shows many submissions already quite sensibly use "x,y+1/2,z".

One level of simplification will be in the definition of data names (DN). We could simply define a data name as

DN = _{non-DS}

All CIF data names in the new DDLm dictionaries are consistent with this restriction. A small amount of remediation (which has to be undertaken anyway) will need to be done for existing domain dictionaries written in either DDL1 or DDL2.

A further simplification is that one can write the scanner to look for token-terminating characters, rather than DEMANDING that a token be followed or preceded by whitespace.

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering
The University of Western Australia
35 Stirling Highway
CRAWLEY, Perth, WA 6009 AUSTRALIA
MBDP M002
CRICOS Provider Code: 00126G
t: +61 (0)8 6488 3452
f: +61 (0)8 6488 1089
w3: www.csse.uwa.edu.au/~nick
e: Nick.Spadaccini@uwa.edu.au
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
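For readers who want to try the proposed alphabet, the non-DS and DN definitions in the message above can be sketched as Python regular expressions. This is only an illustration under stated assumptions, not a reference scanner: the names FIRST, REST, NON_DS, DATA_NAME, is_non_ds and is_data_name are hypothetical, and the hyphen is assumed to be meant as a literal character, so it is moved to the end of each character class where Python does not read it as a range.

```python
import re

# Hypothetical names; character classes transcribed from the message.
# Assumption: "-" is a literal member of the alphabet, so it is placed
# last in each class to avoid being parsed as a range operator.
FIRST = r"[A-Za-z0-9./()+?-]"   # first character: no underscore
REST = r"[A-Za-z0-9_./()+?-]"   # later characters: underscore allowed

NON_DS = re.compile(rf"{FIRST}{REST}*")
DATA_NAME = re.compile(rf"_{FIRST}{REST}*")  # DN = _{non-DS}


def is_non_ds(token: str) -> bool:
    """Return True if the whole token fits the proposed non-DS alphabet."""
    return NON_DS.fullmatch(token) is not None


def is_data_name(token: str) -> bool:
    """Return True if the token is an underscore followed by a non-DS."""
    return DATA_NAME.fullmatch(token) is not None
```

Under this sketch a value such as 1.234(5) is accepted, while x,y+1/2,z is rejected because the comma is not in the alphabet, which matches the symop remark in the message.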
- Follow-Ups:
- Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings. (Herbert J. Bernstein)