[ddlm-group] Python-type eliding for triple-quoted strings
- To: ddlm-group <ddlm-group@iucr.org>
- Subject: [ddlm-group] Python-type eliding for triple-quoted strings
- From: James Hester <jamesrhester@gmail.com>
- Date: Tue, 4 Jan 2011 00:17:41 +1100
I am going to divide Ralf's proposal into two parts, which both separately solve the problem of representing every possible string in a CIF file.

Proposal A: strings can be delimited by three quotes or three apostrophes ("cooked strings" hereafter) or else by three quotes or three apostrophes immediately preceded by the letter 'r' ("raw strings"). Both cooked and raw strings define two special sequences: <backslash><delimiter> and <backslash><backslash>. When these sequences are encountered in a cooked string, the first backslash is removed and the second character no longer has any special meaning (delimiter or elide). When these sequences are encountered in a raw string, they function as for a cooked string, but the initial <backslash> is not removed.

Note that I have deliberately excluded the following escape sequences from this proposal as they are not syntactically relevant: \newline, \a, \b, \f, \n, \r, \t, \v, \ooo, \xhh

Under Proposal A, the sequence <backslash><delimiter> is represented as <backslash><backslash><backslash><delimiter> in a cooked string. In a raw string, it may be left as <backslash><delimiter>. In a raw string, a string terminating with <delimiter> must contain <backslash><delimiter> as the last two characters. A raw string cannot finish with a single <backslash>.

Proposal B: strings can be delimited by three quotes or three apostrophes or else by three quotes or three apostrophes immediately preceded by the letter 'u' ("unicode strings"). In a non-unicode string, no special behaviour is defined (as in the current CIF2 proposal). In a Unicode string, the escapes \uxxxx and \Uxxxxxx are defined as the corresponding Unicode code point.

I believe that this scheme is not particularly appropriate for the CIF context, which is unsurprising given that Python literals are designed for embedding in programs and CIF literals are intended to encapsulate arbitrary data.
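To make the Proposal A rules concrete, here is a minimal Python sketch of a scanner for cooked and raw strings. It is an illustration under stated assumptions, not a proposed reference implementation: the function name scan_triple_quoted is hypothetical, and <delimiter> is taken to mean the single quote or apostrophe character that makes up the triple delimiter.

```python
def scan_triple_quoted(text, start=0):
    """Scan one Proposal A string literal beginning at text[start].

    Returns (datavalue, index just past the closing delimiter).
    Assumption: <delimiter> is the single quote/apostrophe character.
    """
    raw = text.startswith("r", start)
    pos = start + 1 if raw else start
    for quote in ('"', "'"):
        if text.startswith(quote * 3, pos):
            break
    else:
        raise ValueError("not a triple-quoted string")
    pos += 3

    out = []
    while pos < len(text):
        ch = text[pos]
        nxt = text[pos + 1:pos + 2]
        if ch == "\\" and nxt in ("\\", quote):
            # Special sequence: the second character loses its special
            # meaning. Cooked strings drop the first backslash; raw
            # strings keep it in the datavalue.
            if raw:
                out.append(ch)
            out.append(nxt)
            pos += 2
        elif text.startswith(quote * 3, pos):
            return "".join(out), pos + 3
        else:
            out.append(ch)
            pos += 1
    raise ValueError("unterminated string")


if __name__ == "__main__":
    # Cooked: <backslash><delimiter> yields a bare delimiter.
    print(scan_triple_quoted('"""a\\"b"""'))
    # Raw: the same body keeps its backslash in the datavalue.
    print(scan_triple_quoted('r"""a\\"b"""'))
```

Under these assumptions, a raw string whose body ends in a single <backslash> is never terminated, because the backslash consumes the first character of the closing delimiter; the sketch raises ValueError in that case.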
My criticisms are as follows:

(1) Many of the <backslash><character> sequences in non-raw strings already have a meaning as IUCr markup or LaTeX markup

(2) The lexer must be informed of the

(2) Raw strings will include the <backslash><delimiter> sequence in the datavalue, meaning that the