Re: [ddlm-group] Triple-quoted strings

Triple-quotes have the advantage that simple strings can be inline, 
instead of block format. Maybe it is reasonable to also update semicolon 
quoting as I suggested below, so that quoted lines drop the initial 
space for all intervening lines, allowing for "pretty" block text 
formatting?
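
To make the leading-space idea concrete, here is a minimal Python sketch
of how a reader might decode such a block. The function name
decode_semicolon_block is hypothetical, and the handling of lines that do
not start with a space (left unchanged here) is an assumption, not
something the proposal specifies.

```python
def decode_semicolon_block(text: str) -> str:
    """Decode a semicolon-delimited block, dropping one leading space
    from every line after the first (assumed rule, not a CIF standard)."""
    lines = text.split("\n")
    if len(lines) < 2 or not lines[0].startswith(";") or lines[-1] != ";":
        raise ValueError("not a semicolon-delimited block")
    first = lines[0][1:]  # text following the opening semicolon
    # Assumption: a line without a leading space is passed through as-is.
    rest = [ln[1:] if ln.startswith(" ") else ln for ln in lines[1:-1]]
    return "\n".join([first] + rest)


block = ";11111\n 22222\n 33333\n;"
print(decode_semicolon_block(block))
```

A writer would do the reverse, prefixing each continuation line with one
space, which also keeps a semicolon from ever landing in column one.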

Herbert J. Bernstein wrote:
> The current proposal to deal with nested quotes, for all the quoted
> strings in CIF2 is to work python-style:
> 
>    1.  The reverse-solidus (\) is used to escape all quote marks and
> the reverse solidus itself
Is the escape required for all instances of reverse-solidus, or only 
when writing a literal \" ? If it is mandatory, shouldn't it be 
mandatory for quotes as well?

>    2.  All quoted strings will terminate on the first unescaped
> closing quote, independent of whether it is followed by a blank.
If the old quoting rules are to be dropped, why use triple quotes at 
all? Plain old quotes, with all embedded quotes escaped, are sufficient. 
Python has triple quotes because single-quoted strings can have variable 
substitutions within the string, which is not relevant for CIF.

>    3.  All CIF2 writers are required to follow any closing quote with
> a separator appropriate to the context, i.e. on the top level with
> whitespace, or in a list by a comma or a close brace, etc.
>    4.  The reverse-solidus itself would always be passed to the
> application to decide what to do with it.
> 
> Thus the following would be used to nest treble quotes
> 
> """ here we are inside \""" treble quotes \""" """
> 
> and the application would receive
> 
> here we are inside \""" treble quotes \"""
What is the rationale for #4? IMHO, it will be much less problematic to 
hide data-formatting issues from applications calling the CIF I/O routines.
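
For reference, this is how I read rules 1, 2 and 4 of the proposal, as a
minimal Python sketch. The function name scan_quoted and its signature
are my own, and the whitespace just inside the delimiters is kept, which
the proposal does not spell out.

```python
from typing import Tuple


def scan_quoted(text: str, start: int = 0, delim: str = '"""') -> Tuple[str, int]:
    """Return (raw_content, index_after_closing_delimiter).

    The string ends at the first unescaped closing delimiter (rule 2).
    Backslashes are left in the returned content (rule 4).
    """
    if not text.startswith(delim, start):
        raise ValueError("no opening delimiter at start")
    i = start + len(delim)
    begin = i
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if text.startswith(delim, i):
            return text[begin:i], i + len(delim)
        i += 1
    raise ValueError("unterminated quoted string")


source = r'""" here we are inside \""" treble quotes \""" """'
content, end = scan_quoted(source)
print(repr(content))
```

Note that the value handed to the caller still contains the escape
characters, which is exactly the behaviour I am questioning.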

It would be much safer to avoid "corrupting" the data with details of 
the format. Is the application also required to escape its own strings 
when writing data? For example, if I write a literal """, does the CIF 
library escape it for me, so that I later read back a different string, 
\""", and have to remove the reverse-solidus myself? Or does it return 
an error saying it is an invalid string?

Normally, escape sequences specific to the format are encoded and 
decoded by the low-level routines, and essentially hidden from the user. 
Otherwise, it adds unnecessary complications to applications using 
the I/O library. A well-designed system should just accept raw strings 
from the caller, insert any quotes and/or escape sequences, and do the 
reverse when the data is read back in. The CIF2 library would 
automatically know that strings with brackets must be quoted, even 
though that was not required for CIF1.
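
A minimal sketch of the round trip I have in mind, in Python. The names
encode_value and decode_value are hypothetical and not part of any
existing CIF library. It assumes plain double-quoted strings with
backslash escapes, and unlike rule 4 it removes the escapes on reading.

```python
def encode_value(value: str) -> str:
    """Quote a raw string for output, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def decode_value(token: str) -> str:
    """Reverse encode_value: strip the quotes and undo the escapes."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("token is not a double-quoted string")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling escape at end of string")
            out.append(body[i + 1])  # keep the escaped character only
            i += 2
        elif ch == '"':
            raise ValueError("unescaped quote inside string")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The caller writes a literal """ and reads the same three characters back.
raw = '"""'
assert decode_value(encode_value(raw)) == raw
```

With this split, the application never sees a reverse-solidus that it
did not put there itself.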

Just my 2 cents.

Joe Krahn

> 
>    -- HJB
> =====================================================
>   Herbert J. Bernstein, Professor of Computer Science
>     Dowling College, Kramer Science Center, KSC 121
>          Idle Hour Blvd, Oakdale, NY, 11769
> 
>                   +1-631-244-3035
>                   yaya@dowling.edu
> =====================================================
> 
> On Tue, 27 Oct 2009, Joe Krahn wrote:
> 
>> I just joined the DDL list. Here is my view on triple-quoted strings. I
>> hope I'm not missing something already covered on the list.
>>
>> IMHO, triple-quoting is not a very good solution for multi-line text.
>> You still have to define a way to escape a literal triple-quote. Why not
>> just stick with the existing newline-semicolon method, and only have to
>> define a special escape code for a semicolon at the beginning of a line?
>> It is backward-compatible because older CIFs simply do not have
>> embedded newline-semicolon. With triple-quotes, you have to deal with
>> the possibility of """ being a quoted quote character.
>>
>> Another possibility is to remove a leading space from all lines within a
>> semicolon-quoted block. Most CIF text blocks are already formatted that way,
>> so that the first line of text with the semicolon is at the same
>> indentation level as the remaining lines. For example:
>>
>> ;11111
>>  22222
>>  33333
>> ;
>>
>> is a quoted block representing:
>>
>> 11111
>> 22222
>> 33333
>>
>>
>> Even though some people might like triple-quoting due to being Python
>> users, I think that the alternatives are a better fit into the existing
>> syntax.
>>
>> Joe Krahn
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
