Re: [ddlm-group] Triple-quoted strings

To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Triple-quoted strings
From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
Date: Mon, 9 Nov 2009 07:55:51 -0500 (EST)
In-Reply-To: <C71DA0FD.12369%nick@csse.uwa.edu.au>
References: <C71DA0FD.12369%nick@csse.uwa.edu.au>

Dear Colleagues,

   There is no CIF requirement for a leading space.  Under the CIF 1.1
line-folding protocol, the following is a perfectly valid alternate
representation of the string "abcdefgh":

;\
abc\
def\
gh\
;
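
As a rough illustration of the folding rule used in the example above,
here is a minimal Python sketch. The function name unfold_text_field is
hypothetical. It assumes the caller has already removed the opening
semicolon and the closing newline-semicolon, and it handles only the
simple case in which a trailing reverse solidus (optionally followed by
blanks) marks a folded line.

```python
def unfold_text_field(raw: str) -> str:
    """Undo line folding in the body of a semicolon-delimited text field.

    `raw` is the field content with the opening ';' and the closing
    newline-';' already stripped (an assumption of this sketch).
    """
    lines = raw.split("\n")
    # Folding is only in effect when the first line is a lone backslash.
    if lines[0].rstrip() != "\\":
        return raw
    body = lines[1:]
    out = []
    for index, line in enumerate(body):
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Folded line: drop the backslash and join with the next line.
            out.append(stripped[:-1])
        else:
            out.append(line)
            if index < len(body) - 1:
                out.append("\n")  # a genuine line break in the value
    return "".join(out)


# The folded field from the example above, minus its delimiters.
folded = "\\\nabc\\\ndef\\\ngh\\"
print(unfold_text_field(folded))
```

A field whose first line is not a lone reverse solidus is returned
unchanged, so unfolded CIF 1.1 text fields pass through as they are.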



=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Mon, 9 Nov 2009, Nick Spadaccini wrote:

> On 30/10/09 3:14 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
>
>> Triple-quotes have the advantage that simple strings can be inline,
>> instead of block format. Maybe it is reasonable to also update semicolon
>> quoting as I suggested below, so that quoted lines drop the initial
>> space for all intervening lines, allowing for "pretty" block text
>> formatting?
>
> Not sure I appreciate what is being requested here. STAR has no requirement
> of a leading space, this must be a CIF thing. I am guessing it is yet
> another throwback to the Fortran-esque view of the world. A space to
> separate tokens at the END of the previous line will be lost if you apply
> GOFFIO (Good Old Fashion Fortran IO). Hence it has to be the leading
> character of the next line.
>
>> Herbert J. Bernstein wrote:
>>> The current proposal to deal with nested quotes, for all the quoted
>>> strings in CIF2 is to work Python-style:
>>>
>>>    1.  The reverse-solidus (\) is used to escape all quote marks and
>>> the reverse solidus itself
>> Is the escape required for all instances of reverse-solidus, or only
>> when writing a literal \" ? If it is mandatory, shouldn't it be
>> mandatory for quotes as well?
>
> Parsed this several times in my head (I am getting old), and can't decipher
> what you are asking. Clarify please.
>
>>>    2.  All quoted strings will terminate on the first unescaped
>>> closing quote, independent of whether it is followed by a blank.
>> If the old quoting rules are to be dropped, why use triple quotes at
>> all? Plain old quotes, with all embedded quotes escaped, is sufficient.
>> Python has triple quotes because single-quoted strings can have variable
>> substitutions within the string, which is not relevant for CIF.
>
> True for Python. The new CIF2 string states that a reverse solidus before AN
> ALLOWED character protects that character from interpretation as a token
> terminator. (ALL ALLOWED characters are protected in this way). The triple
> quote is a CIF multiline comment. A newline is not an ALLOWED character
> within a single or double quote string - hence you cannot achieve the
> equivalent of a triple quote by protecting a newline character.
>
>>>    3.  All CIF2 writers are required to follow any closing quote with
>>> a separator appropriate to the context, i.e. on the top level with
>>> whitespace, or in a list by a comma or a close brace, etc.
>>>    4.  The reverse-solidus itself would always be passed to the
>>> application to decide what to do with it.
>>>
>>> Thus the following would be used to nest treble quotes
>>>
>>> """ here we are inside \""" treble quotes \""" """
>>>
>>> and the application would receive
>>>
>>> here we are inside \""" treble quotes \"""
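
A minimal Python sketch of rules 1, 2 and 4 as listed above may make the
nesting example easier to follow. The function name read_triple_quoted
is hypothetical, and this is only one reading of the proposal, not a
reference implementation: the scanner stops at the first unescaped
closing delimiter and hands the reverse solidus through untouched.

```python
def read_triple_quoted(text: str, start: int = 0):
    """Scan a treble-quoted string beginning at text[start].

    Returns (value, end), where `value` still contains every reverse
    solidus (rule 4) and `end` is the index just past the closing
    delimiter.
    """
    delim = '"""'
    if not text.startswith(delim, start):
        raise ValueError("no opening treble quote at this position")
    i = start + len(delim)
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Rule 1: the reverse solidus protects the next character.
            i += 2
            continue
        if text.startswith(delim, i):
            # Rule 2: first unescaped closing delimiter ends the string.
            return text[start + len(delim):i], i + len(delim)
        i += 1
    raise ValueError("unterminated treble-quoted string")


source = '""" here we are inside \\""" treble quotes \\""" """'
value, end = read_triple_quoted(source)
print(value)
```

Rule 3 (the writer must follow the closing quote with a separator) is
not checked here; a caller could inspect text[end] for that.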
>> What is the rationale for #4? IMHO, it will be much less problematic to
>> hide data-formatting issues from applications calling the CIF I/O routines.
>>
>> It would be much safer to avoid "corrupting" the data with details of
>> the format. Is it also required for the application to escape its own
>> strings when writing data? For example, if writing a literal """, does
>> the CIF library escape it for me, and I later read back a different
>> string, \""", and have to remove the reverse-solidus myself? Or, does it
>> return an error saying it is an invalid string?
>>
>> Normally, escape sequences specific to the format are encoded and
>> decoded by the low-level routines, and essentially hidden from the user.
>> Otherwise, it adds unnecessary complications to applications using
>> the I/O library. A well-designed system should just accept raw strings
>> from the caller, insert any quotes and/or escape sequences, and do the
>> reverse when the data is read back in. The CIF2 library would
>> automatically know that strings with brackets must be quoted, even
>> though that was not required for CIF1.
>>
>> Just my 2 cents.
>
> I can understand that the applications need to know how to interpret the
> format. The problem is, there are existing multiple formats encoded in CIFs.
> If we could just assume one, say the C formatting style that \n=newline,
> \t=tab etc it would be easy.
>
> However your statement that the en/de-coding is done by low level routines
> needs clarification. If you read from a file "hello\nworld" in Python and
> print it, it is printed as a raw string. None of the decoding is done for
> you. It is not until you print the eval() of the string that the decoding is
> done - that is it is left to the invocation of an application. That is all
> we are saying here. Trouble is there will be several eval() routines for
> different encodings. Which eval() routine will depend on another data item
> to tell you which encoding, eg
>
> _abstract.encoding "Latex"
>
> _abstract """This abstract will be in \latex,
> and in {\bf this} is in bold typeface"""
>
> The encoding could come from the dictionary, but that would require ALL
> _abstract values be encoded in Latex.
>
> But I agree the set of eval() routines needs to be provided (or at least
> their specification).
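
The idea of choosing a decoding routine from a companion data item can
be sketched in Python as follows. Everything named here is an
assumption made for illustration: the DECODERS table, the "C" encoding
tag and the tiny escape table are hypothetical, and only the "Latex"
tag comes from the example above. The parser is assumed to hand over
the raw string with every reverse solidus intact.

```python
# Hypothetical subset of C-style escapes handled by this sketch.
C_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def decode_c_style(raw: str) -> str:
    """Translate \\n, \\t and \\\\ and leave any other sequence untouched."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw) and raw[i + 1] in C_ESCAPES:
            out.append(C_ESCAPES[raw[i + 1]])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def decode_verbatim(raw: str) -> str:
    """Leave markup such as LaTeX for a later typesetting step."""
    return raw


# Encoding tag (as found in a data item such as _abstract.encoding)
# mapped to the routine that interprets the raw value.
DECODERS = {"C": decode_c_style, "Latex": decode_verbatim}


def decode_value(raw: str, encoding: str) -> str:
    try:
        decoder = DECODERS[encoding]
    except KeyError:
        raise ValueError("no decoder registered for " + repr(encoding))
    return decoder(raw)


print(decode_value("hello\\nworld", "C"))      # two lines of output
print(decode_value("{\\bf this}", "Latex"))    # markup passed through
```

Keeping the table outside the parser means a new encoding only needs a
new entry, which matches the point that the parser itself does no
decoding.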
>
>> Joe Krahn
>>
>>>
>>>    -- HJB
>>>
>>> On Tue, 27 Oct 2009, Joe Krahn wrote:
>>>
>>>> I just joined the DDL list. Here is my view on triple-quoted strings. I
>>>> hope I'm not missing something already covered on the list.
>>>>
>>>> IMHO, triple-quoting is not a very good solution for multi-line text.
>>>> You still have to define a way to escape a literal triple-quote. Why not
>>>> just stick with the existing newline-semicolon method, and only have to
>>>> define a special escape code for a semicolon at the beginning of a line?
>>>> It is backward-compatible because older CIFs simply do not have
>>>> embedded newline-semicolon. With triple-quotes, you have to deal with
>>>> the possibility of """ being a quoted quote character.
>>>>
>>>> Another possibility is to remove a leading space from all lines within a
>>>> semicolon-quoted block. Most CIF text blocks are already formatted that way,
>>>> so that the first line of text with the semicolon is at the same
>>>> indentation level as the remaining lines. For example:
>>>>
>>>> ;11111
>>>>  22222
>>>>  33333
>>>> ;
>>>>
>>>> is a quoted block representing:
>>>>
>>>> 11111
>>>> 22222
>>>> 33333
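
The leading-space suggestion above could be sketched in Python like
this. The function name strip_leading_space is hypothetical, and the
sketch assumes the field has already been split into lines with the
opening semicolon and the closing semicolon line removed. It describes
the proposal only, not current CIF behaviour.

```python
def strip_leading_space(field_lines):
    """Drop one leading space from every line after the first.

    The first line follows the opening ';' directly, so it has no
    alignment space to remove.
    """
    if not field_lines:
        return []
    first, rest = field_lines[0], field_lines[1:]
    return [first] + [
        line[1:] if line.startswith(" ") else line
        for line in rest
    ]


block = ["11111", " 22222", " 33333"]
print("\n".join(strip_leading_space(block)))
```

Only a single space is removed per line, so text that is deliberately
indented further keeps its remaining indentation.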
>>>>
>>>>
>>>> Even though some people might like triple-quoting due to being Python
>>>> users, I think that the alternatives are a better fit into the existing
>>>> syntax.
>>>>
>>>> Joe Krahn
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
>
>
>