Re: [ddlm-group] Focusing the elide discussion

Dear James

Sorry for the delay in replying to this request. A long train
journey yesterday gave me the opportunity to review the
discussions on this point. In order of preference I would
rank the proposals as roughly:

F'    - requires least handling of special escapes
E     - allows the generic handling of Unicode character set and
        long lines as native CIF2 features (but only within "special"
        i.e. triple-quote delimited strings)

I think these allow embedding of any string in a reasonably clean way.

C     - I quite like, provided the post-elide escape could be a sequence
        (e.g. borrowing from TeX, the trigraph "{} is read as a
         double-quote; the literal sequence
        <doublequote><open brace><close brace> would be represented by "{}{}
        and any other sequence would have no special meaning). If those with
        greater experience argue that this imposes too great a load on the
        initial lexical scan, or can demonstrate that this leads too
        quickly to a proliferation of unreadable punctuation marks,
        this would drop quickly down the list of preferred approaches.
F     - because I'm not sure what is gained just by protecting the
        escape character everywhere; but on the other hand it may seem
        an easy procedure to describe to potential implementors
B     - carries an unwelcome overhead in requiring the escape character
        (here, backslash) to be encoded everywhere


P     - brings in unnecessary syntactic overhead when we can achieve a
        closed system by simpler means.
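
To make the trigraph idea under C concrete, here is a rough sketch of how
a reader might undo it, written in Python purely for illustration. The
function name decode_post_elide and the left-to-right, longest-match-first
scanning order are my own assumptions; nothing here is part of any agreed
proposal, and it deals only with the string content, not with finding the
closing delimiter.

```python
def decode_post_elide(text: str) -> str:
    """Decode the hypothetical '"{}' trigraph sketched under proposal C.

    Assumed rules (illustrative only):
      "{}{}  -> the literal three characters "{}
      "{}    -> a single double-quote
      anything else is copied through unchanged
    """
    out = []
    i = 0
    while i < len(text):
        # Test the longer sequence first so that "{}{} is not
        # misread as a trigraph followed by a stray {}.
        if text.startswith('"{}{}', i):
            out.append('"{}')
            i += 5
        elif text.startswith('"{}', i):
            out.append('"')
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Under these assumptions three consecutive double-quotes inside the string
would be written as "{}"{}"{} and braces that do not follow a double-quote
would pass through untouched.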

Best wishes

On Thu, Jan 13, 2011 at 12:20:09AM +1100, James Hester wrote:
> By my count there are 6 distinct proposals for eliding triple-quoted
> strings on the table, which I have listed below.  In order to get an
> idea of where we all stand and which proposals are most likely to
> succeed, I'd like to invite you all to reply to this email with a list
> of proposals which you would find acceptable.  If you like, you can
> rank them in order of preference.  In the list below I've given short
> descriptions, but you should refer to the original emails for the full
> details.  The opinions of COMCIFS voting members are of course most
> significant at this juncture, but I for one am interested in the
> thoughts of the other members as well.
> Proposal P (for Python): Ralf's original proposal to do everything as in Python
> Proposal A: <backslash><delimiter> elides the delimiter, no other
> sequences are significant
> Proposal B: \uxxxx to represent Unicode characters, no other sequences
> are significant
> Proposal C: as yet unspecified character post-elides the delimiter
> where necessary
> Proposal D: as for C, except post-elide character is given immediately
> before opening triple delimiter
> Proposal E: (John B's suggestion) \uxxxx for Unicode character
> together with \<newline> and \\
> Proposal F: (Simon's proposal) \<newline> and \\ only
> Proposal F': (My slight tweak of Simon's proposal) \<newline> only
> when not preceded by \
> I find proposal P unacceptable, and would rank the others in order of
> preference roughly as follows:
> Best: F', F, C
> Bearable: A, B, E
> In a pinch: D
