
[ddlm-group] Python-type eliding for triple-quoted strings

To: ddlm-group <[email protected]>
Subject: [ddlm-group] Python-type eliding for triple-quoted strings
From: James Hester <[email protected]>
Date: Tue, 4 Jan 2011 00:17:41 +1100

I am going to divide Ralf's proposal into two parts, each of which
separately solves the problem of representing every possible string in
a CIF file.

Proposal A: strings can be delimited by three quotes or three
apostrophes ("cooked strings" hereafter) or else by three quotes or
three apostrophes immediately preceded by the letter 'r' ("raw
strings").  Both cooked and raw strings define two special sequences:
<backslash><delimiter> and <backslash><backslash>.  When these
sequences are encountered in a cooked string, the first backslash is
removed and the second character loses any special meaning (as a
delimiter or as an eliding backslash).  When these sequences are
encountered in a raw string, they function as in a cooked string,
except that the initial <backslash> is not removed.  Note that I have
deliberately excluded the following Python escape sequences from this
proposal, as they are not syntactically relevant: \newline, \a, \b,
\f, \n, \r, \t, \v, \ooo, \xhh.
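A minimal Python sketch of how a lexer might apply the Proposal A rules when reading a single triple-quoted value. The function name `read_triple_quoted` and its return convention (decoded value plus the index just past the closing delimiter) are hypothetical and chosen only for illustration. The sketch also assumes, as in Python, that the string is closed by the first unescaped run of three delimiter characters.

```python
def read_triple_quoted(text: str, pos: int = 0) -> tuple[str, int]:
    """Read one Proposal A string starting at text[pos].

    Hypothetical helper (sketch only). Returns (value, end), where end is
    the index just past the closing triple delimiter.
    """
    raw = text.startswith("r", pos)        # optional 'r' prefix marks a raw string
    if raw:
        pos += 1
    delim = text[pos:pos + 1]
    if delim not in ('"', "'") or not text.startswith(delim * 3, pos):
        raise ValueError("expected three quotes or three apostrophes")
    closer = delim * 3
    out = []
    i = pos + 3
    while i < len(text):
        if text.startswith(closer, i):     # first unescaped triple closes the string
            return "".join(out), i + 3
        ch = text[i]
        nxt = text[i + 1:i + 2]            # "" at end of input
        if ch == "\\" and nxt in ("\\", delim):
            if raw:
                out.append(ch)             # raw: the initial backslash is kept
            out.append(nxt)                # either way, nxt loses its special meaning
            i += 2
        else:
            out.append(ch)                 # everything else, including a lone
            i += 1                         # backslash as in \n, is taken literally
    raise ValueError("unterminated string")
```

In the raw case the returned value is simply the text between the delimiters; the two special sequences only affect where the string ends.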

Under Proposal A, the sequence <backslash><delimiter> is represented
as <backslash><backslash><backslash><delimiter> in a cooked string.
In a raw string, it may be left as <backslash><delimiter>.  However,
a raw string whose value ends with <delimiter> must end with
<backslash><delimiter>, and that backslash remains part of the value.
A raw string also cannot end with a single <backslash>.
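The encoding rules above can be checked with a small sketch. `encode_cooked` escapes every backslash and then every delimiter, so a <backslash><delimiter> pair in the value comes out as three backslashes followed by the delimiter. `raw_ok` tests whether a value can be written verbatim between r and the triple delimiters. Both names are hypothetical, and the sketch assumes the Python-style rule that the first unescaped run of three delimiters closes the string.

```python
def encode_cooked(value: str, delim: str = '"') -> str:
    """Write any value as a Proposal A cooked string (hypothetical helper)."""
    # Backslashes must be doubled first; otherwise the backslashes added
    # in front of delimiters would themselves be doubled.
    body = value.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim * 3 + body + delim * 3


def raw_ok(value: str, delim: str = '"') -> bool:
    """True if r + delim*3 + value + delim*3 reads back as exactly value."""
    closer = delim * 3
    body = value + closer
    i = 0
    while i < len(value):
        if body.startswith(closer, i):
            return False   # an unescaped delimiter run would end the string early
        # A backslash consumes the next character. Only \\ and \<delim> are
        # special, but skipping any other character gives the same result.
        i += 2 if body[i] == "\\" else 1
    # i overshoots len(value) only when a final lone backslash has swallowed
    # the first character of the closing delimiter.
    return i == len(value)
```

For example, `raw_ok('a"')` is false, while `raw_ok('a\\"')` (the value a\") is true, which is the situation described above.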

Proposal B: strings can be delimited by three quotes or three
apostrophes or else by three quotes or three apostrophes immediately
preceded by the letter 'u' ("unicode strings").  In a non-unicode
string, no special behaviour is defined (as in the current CIF2
proposal).  In a Unicode string, the escapes \uxxxx and \Uxxxxxxxx
(four and eight hexadecimal digits respectively, as in Python) denote
the corresponding Unicode code point.
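For Proposal B, a sketch of the escape expansion in a Unicode string might look like the following. It assumes Python's convention of exactly four hexadecimal digits after \u and eight after \U. The function name is hypothetical, and the sketch leaves every other backslash sequence untouched, because the proposal as summarised defines no other escapes.

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits (Python's convention).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_unicode_escapes(body: str) -> str:
    r"""Expand \uXXXX and \UXXXXXXXX in a Proposal B 'u' string (hypothetical helper).

    Any other backslash sequence is left as it is. chr() raises ValueError
    for code points above U+10FFFF.
    """
    def to_char(m):
        return chr(int(m.group(1) or m.group(2), 16))

    return _ESCAPE.sub(to_char, body)
```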


I believe that this scheme is not particularly appropriate for the CIF
context, which is unsurprising given that Python literals are designed
for embedding in programs and CIF literals are intended to encapsulate
arbitrary data.  My criticisms are as follows:

(1) Many of the <backslash><character> sequences in non-raw strings
already have a meaning in IUCr markup or LaTeX markup.
(2) The lexer must be informed of the
(3) Raw strings will include the <backslash><delimiter> sequence in
the datavalue, meaning that the
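Criticism (1) can be seen directly with Python's own string literals, which apply the full Python escape set. The sample text below is invented for illustration. It mixes IUCr Greek-letter markup (\a for alpha, \b for beta) with a LaTeX macro, and `ast.literal_eval` is used only as a convenient way to run Python's standard escape processing over it.

```python
import ast

# Invented sample text: IUCr markup \a (alpha), \b (beta) and LaTeX \theta.
cif_text = r"\a-helix and \b-sheet, angle $\theta$"

# Read as a cooked (non-raw) Python literal: \a, \b and \t are escapes.
cooked = ast.literal_eval('"""' + cif_text + '"""')
# Read as a raw Python literal: every backslash survives.
raw = ast.literal_eval('r"""' + cif_text + '"""')

print([c for c in cooked if not c.isprintable()])  # ['\x07', '\x08', '\t']
print(raw == cif_text)                              # True
```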


-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group
