Re: [ddlm-group] Python-type eliding for triple-quoted strings

My apologies, I sent this email instead of saving it for further
editing.  You may disregard the contents until I resend it at a later
date.

James.

On Tue, Jan 4, 2011 at 12:17 AM, James Hester <jamesrhester@gmail.com> wrote:
> I am going to divide Ralf's proposal into two parts, each of which
> separately solves the problem of representing every possible string in
> a CIF file.
>
> Proposal A: strings can be delimited by three quotes or three
> apostrophes ("cooked strings" hereafter) or else by three quotes or
> three apostrophes immediately preceded by the letter 'r' ("raw
> strings").  Both cooked and raw strings define two special sequences:
> <backslash><delimiter> and <backslash><backslash>.  When these
> sequences are encountered in a cooked string, the first backslash is
> removed and the second character no longer has any special meaning
> (delimiter or elide).  When these sequences are encountered in a raw
> string, they function as for a cooked string, but the initial
> <backslash> is not removed. Note that I have deliberately excluded the
> following escape sequences from this proposal as they are not
> syntactically relevant: \newline, \a, \b, \f, \n, \r, \t, \v, \ooo, \xhh.
>
> Under Proposal A, the sequence <backslash><delimiter> is represented
> as <backslash><backslash><backslash><delimiter> in a cooked string.
> In a raw string, it may be left as <backslash><delimiter>.  A raw
> string whose content ends with <delimiter> must contain
> <backslash><delimiter> as its last two characters.  A raw string
> cannot finish with a single <backslash>.
>
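A minimal sketch of how a lexer might apply Proposal A, offered as an
illustration only and not as part of the proposal. It assumes, as in
Python, that <delimiter> in the special sequences means a single quote
or apostrophe character of the kind used to open the string, and that
the string closes at the first three unescaped such characters. The
function name read_string is hypothetical.

```python
def read_string(text, pos=0):
    """Read one triple-quoted string starting at text[pos].

    Returns (datavalue, index just past the closing delimiter).
    Assumption: <delimiter> in an escape is one quote character.
    """
    raw = text.startswith("r", pos)
    if raw:
        pos += 1
    quote = text[pos:pos + 1]
    closing = quote * 3
    if quote not in ('"', "'") or not text.startswith(closing, pos):
        raise ValueError("expected a triple-quoted string")
    i = pos + 3
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "\\" and text[i + 1:i + 2] in (quote, "\\"):
            # Special sequence: the second character loses its meaning.
            # Cooked strings drop the leading backslash, raw strings keep it.
            if raw:
                out.append("\\")
            out.append(text[i + 1])
            i += 2
        elif text.startswith(closing, i):
            return "".join(out), i + 3
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated string")


# Cooked: <backslash><backslash><backslash><delimiter> yields <backslash><delimiter>.
cooked_value, _ = read_string('"""' + '\\\\\\"' + '"""')
# Raw: <backslash><delimiter> is kept as written.
raw_value, _ = read_string('r"""' + '\\"' + '"""')
```

Under these assumptions, a raw string whose content ends in a single
<backslash> fails to terminate, because the backslash combines with the
first character of the closing delimiter.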
> Proposal B: strings can be delimited by three quotes or three
> apostrophes or else by three quotes or three apostrophes immediately
> preceded by the letter 'u' ("Unicode strings").  In a non-Unicode
> string, no special behaviour is defined (as in the current CIF2
> proposal).  In a Unicode string, the escapes \uxxxx and \Uxxxxxx
> denote the corresponding Unicode code points.
>
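A minimal sketch of the escape expansion in Proposal B, again only as
an illustration. It assumes that each x is one hexadecimal digit, with
four digits after \u and six after \U as written above (Python itself
uses eight digits after \U). The function name decode_unicode_escapes
is hypothetical.

```python
import re

# Assumption: 4 hex digits follow \u and 6 hex digits follow \U.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{6})")


def decode_unicode_escapes(body):
    """Replace \\uxxxx and \\Uxxxxxx in the body of a 'u' string."""
    def substitute(match):
        digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above 0x10FFFF.
        return chr(int(digits, 16))
    return _ESCAPE.sub(substitute, body)


angstrom = decode_unicode_escapes("\\u00C5")
```

Any other <backslash><character> sequence is left untouched here, since
the proposal defines no other special behaviour.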
>
> I believe that this scheme is not particularly appropriate for the CIF
> context, which is unsurprising given that Python literals are designed
> for embedding in programs and CIF literals are intended to encapsulate
> arbitrary data.  My criticisms are as follows:
>
> (1) Many of the <backslash><character> sequences in non-raw strings
> already have a meaning as IUCr markup or LaTeX markup
> (2) The lexer must be informed of the
> (3) Raw strings will include the <backslash><delimiter> sequence in
> the datavalue, meaning that the
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>


