# Re: [ddlm-group] Triple-quoted strings

On 30/10/09 3:14 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:

> Triple-quotes have the advantage that simple strings can be inline,
> instead of block format. Maybe it is reasonable to also update semicolon
> quoting as I suggested below, so that quoted lines drop the initial
> space for all intervening lines, allowing for "pretty" block text
> formatting?

Not sure I appreciate what is being requested here. STAR has no requirement
for a leading space; this must be a CIF thing. I am guessing it is yet
another throwback to the Fortran-esque view of the world. A space to
separate tokens at the END of the previous line will be lost if you apply
GOFFIO (Good Old Fashioned Fortran IO). Hence it has to be the leading
character of the next line.
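
For concreteness, the leading-space idea can be sketched in Python. This is
only an illustration of the behaviour shown in Joe's 11111/22222/33333
example further down, not agreed CIF or STAR behaviour. The function name
strip_leading_space is made up for the sketch, and it assumes the text has
already been pulled out from between the semicolon delimiters.

```python
def strip_leading_space(field_text):
    """Drop one leading space from every line after the first.

    `field_text` is the content of a semicolon-delimited text field with
    the opening and closing semicolons already removed (an assumption of
    this sketch).
    """
    lines = field_text.split("\n")
    first, rest = lines[0], lines[1:]
    # Only a single space is removed; lines without one are left untouched.
    rest = [line[1:] if line.startswith(" ") else line for line in rest]
    return "\n".join([first] + rest)


raw = "11111\n 22222\n 33333"
print(strip_leading_space(raw))
```

Lines that do not start with a space are passed through unchanged here; an
actual rule would have to say whether such lines are an error.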

> Herbert J. Bernstein wrote:
>> The current proposal to deal with nested quotes, for all the quoted
>> strings in CIF2, is to work Python-style:
>>
>>    1.  The reverse-solidus (\) is used to escape all quote marks and
>> the reverse solidus itself
> Is the escape required for all instances of reverse-solidus, or only
> when writing a literal \" ? If it is mandatory, shouldn't it be
> mandatory for quotes as well?

Parsed this several times in my head (I am getting old), and can't decipher it.

>>    2.  All quoted strings will terminate on the first unescaped
>> closing quote, independent of whether it is followed by a blank.
> If the old quoting rules are to be dropped, why use triple quotes at
> all? Plain old quotes, with all embedded quotes escaped, is sufficient.
> Python has triple quotes because single-quoted strings can have variable
> substitutions within the string, which is not relevant for CIF.

True for Python. The new CIF2 string definition states that a reverse solidus
before AN ALLOWED character protects that character from interpretation as a
token terminator. (ALL ALLOWED characters are protected in this way.) The
triple quote is a CIF multiline comment. A newline is not an ALLOWED character
within a single or double quote string - hence you cannot achieve the
equivalent of a triple quote by protecting a newline character.
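
A rough Python sketch of rules 1, 2 and 4 as quoted in this message, under
some assumptions: scan_quoted is a hypothetical helper, not part of any
existing CIF library; it accepts both ' and " in single and tripled form;
and the newline restriction is applied only to the single-line forms. The
reverse solidus is kept in the returned value, so any further decoding is
left to the application.

```python
def scan_quoted(text, start=0):
    # Returns (value, index just past the closing delimiter).
    # Try the tripled delimiters first so that """ is not read as "" + ".
    for delim in ('"""', "'''", '"', "'"):
        if text.startswith(delim, start):
            break
    else:
        raise ValueError("no opening quote at position %d" % start)

    multiline = len(delim) == 3
    i = start + len(delim)
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "\n" and not multiline:
            raise ValueError("newline inside a single-line quoted string")
        if (ch == "\\" and i + 1 < len(text)
                and (multiline or text[i + 1] != "\n")):
            # Protected character: keep the reverse solidus AND the
            # character it protects (rule 4).
            chars.append(text[i:i + 2])
            i += 2
        elif text.startswith(delim, i):
            # First unescaped closing delimiter ends the string (rule 2).
            return "".join(chars), i + len(delim)
        else:
            chars.append(ch)
            i += 1
    raise ValueError("unterminated quoted string")


sample = r'''""" here we are inside \""" treble quotes \""" """'''
value, end = scan_quoted(sample)
print(repr(value))
```

Applied to Herbert's treble-quote example, the value handed back still
contains the two \""" sequences, which is the behaviour rule 4 describes.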

>>    3.  All CIF2 writers are required to follow any closing quote with
>> a separator appropriate to the context, i.e. on the top level with
>> whitespace, or in a list by a comma or a close brace, etc.
>>    4.  The reverse-solidus itself would always be passed to the
>> application to decide what to do with it.
>>
>> Thus the following would be used to nest treble quotes:
>>
>> """ here we are inside \""" treble quotes \""" """
>>
>> and the application would receive
>>
>> here we are inside \""" treble quotes \"""
> What is the rationale for #4? IMHO, it will be much less problematic to
> hide data-formatting issues from applications calling the CIF I/O routines.
>
> It would be much safer to avoid "corrupting" the data with details of
> the format. Is the application also required to escape its own
> strings when writing data? For example, if writing a literal """, does
> the CIF library escape it for me, and I later read back a different
> string, \""", and have to remove the reverse-solidus myself? Or, does it
> return an error saying it is an invalid string?
>
> Normally, escape sequences specific to the format are encoded and
> decoded by the low-level routines, and essentially hidden from the user.
> Otherwise, it adds unnecessary complications to applications using
> the I/O library. A well-designed system should just accept raw strings
> from the caller, insert any quotes and/or escape sequences, and do the
> reverse when the data is read back in. The CIF2 library would
> automatically know that strings with brackets must be quoted, even
> though that was not required for CIF1.
>
> Just my 2 cents.
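
The round trip Joe describes, where the library escapes on write and
unescapes on read so that the caller only ever sees raw strings, might look
like the sketch below. It illustrates the alternative to rule 4, not the
current proposal. The names escape_for_cif and unescape_from_cif are
invented here, and only single-line values are considered.

```python
import re


def escape_for_cif(value, quote='"'):
    # Escape the reverse solidus first, so the ones added for the quote
    # marks are not doubled afterwards.
    escaped = value.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + escaped + quote


def unescape_from_cif(token, quote='"'):
    # Strip the delimiters, then drop the reverse solidus from every
    # escaped pair, keeping the protected character.
    body = token[len(quote):-len(quote)]
    return re.sub(r"\\(.)", r"\1", body, flags=re.DOTALL)


original = 'a literal """ and a \\ in one value'
token = escape_for_cif(original)
print(token)
print(unescape_from_cif(token) == original)
```

With this design the caller never sees the escapes, at the cost of fixing
one escape convention for every string in the file.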

I can understand that the applications need to know how to interpret the
format. The problem is that multiple formats already exist, encoded in CIFs.
If we could just assume one, say the C formatting style where \n=newline,
\t=tab etc., it would be easy.

However, your statement that the en/de-coding is done by low-level routines
needs clarification. If you read "hello\nworld" from a file in Python and
print it, it is printed as a raw string. None of the decoding is done for
you. It is not until you print the eval() of the string that the decoding is
done - that is, it is left to the invocation of an application. That is all
we are saying here. Trouble is, there will be several eval() routines for
different encodings. Which eval() routine to use will depend on another data
item that tells you which encoding, e.g.

_abstract.encoding "Latex"

_abstract """This abstract will be in \latex,
and in {\bf this} is in bold typeface"""

The encoding could come from the dictionary, but that would require ALL
_abstract values be encoded in Latex.
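
As a sketch of what "several eval() routines" selected by a companion data
item could look like in Python: the decoder names, the DECODERS table and the
encoding labels "c" and "latex" are all assumptions made for illustration,
and the LaTeX decoder deliberately does nothing except hand the markup on
unchanged.

```python
import re

# Hypothetical C-style table: only a handful of escapes are handled.
_C_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def decode_c_style(raw):
    # Replace known escapes; any other reverse-solidus pair is left alone.
    return re.sub(r"\\(.)",
                  lambda m: _C_ESCAPES.get(m.group(1), m.group(0)),
                  raw)


def decode_latex(raw):
    # Leave the markup for a LaTeX-aware application to interpret.
    return raw


DECODERS = {"c": decode_c_style, "latex": decode_latex}


def decode_value(raw, encoding=None):
    # With no encoding item present, the raw string is returned untouched.
    if encoding is None:
        return raw
    try:
        decoder = DECODERS[encoding.lower()]
    except KeyError:
        raise ValueError("no decoder registered for %r" % encoding)
    return decoder(raw)


block = {
    "_abstract.encoding": "Latex",
    "_abstract": "This abstract will be in \\latex,\n"
                 "and in {\\bf this} is in bold typeface",
}
print(decode_value(block["_abstract"], block.get("_abstract.encoding")))
print(decode_value("hello\\nworld", "C"))
```

The same raw string gives different results depending on the encoding item,
which is why the parser itself cannot safely decode anything.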

But I agree the set of eval() routines needs to be provided (or at least
their specification).

> Joe Krahn
>
>>
>>    -- HJB
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>> =====================================================
>>
>> On Tue, 27 Oct 2009, Joe Krahn wrote:
>>
>>> I just joined the DDL list. Here is my view on triple-quoted strings. I
>>> hope I'm not missing something already covered on the list.
>>>
>>> IMHO, triple-quoting is not a very good solution for multi-line text.
>>> You still have to define a way to escape a literal triple-quote. Why not
>>> just stick with the existing newline-semicolon method, and only have to
>>> define a special escape code for a semicolon at the beginning of a line?
>>> It is backward-compatible because older CIFs simply do not have
>>> embedded newline-semicolon. With triple-quotes, you have to deal with
>>> the possibility of """ being a quoted quote character.
>>>
>>> Another possibility is to remove a leading space from all lines within a
>>> semicolon-quoted block. Most CIF text blocks are already formatted that way,
>>> so that the first line of text with the semicolon is at the same
>>> indentation level as the remaining lines. For example:
>>>
>>> ;11111
>>>  22222
>>>  33333
>>> ;
>>>
>>> is a quoted block representing:
>>>
>>> 11111
>>> 22222
>>> 33333
>>>
>>>
>>> Even though some people might like triple-quoting due to being Python
>>> users, I think that the alternatives are a better fit into the existing
>>> syntax.
>>>
>>> Joe Krahn
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>

cheers

Nick

--------------------------------
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G