
Re: [ddlm-group] Technical issues with Proposal P

Hi Herbert:

You posit an additional requirement, that the internal representation
of any string may not contain artefacts of the syntactical
representation ("with a meaning depending on the type given in the
dictionary").  Given this additional requirement, we can confidently
say that my first example string finishes with an accented e and that
the second string contains an accented o followed by two double
quotes.

In line with this requirement, if in fact I wanted to finish the first
string with a double quote, then I am forbidden to use a double-quoted
raw string.  Likewise, if there are any triple double quotes
internally I cannot use a double-quoted raw string.  If there are both
triple quotes and triple double quotes in my string, I cannot use raw
strings at all for my text and either have to double up all my
backslashes in 'cooked' strings, or revert to <semicolon><eol>
digraphs.  If my string contains <semicolon><eol> digraphs, then my
only choice is to use the "cooked" strings of the Python proposal.
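
To make these constraints concrete, here is a minimal sketch. It
assumes that the raw and cooked strings of Proposal P follow the rules
of Python's own string literals, which is an assumption made for
illustration and not the text of the proposal. The helper value_of and
the constants DQ, SQ and BS are hypothetical names. The sketch
evaluates the three spellings John quotes below, then shows that a
value ending in a double quote has no triple-double-quoted raw
spelling.

```python
import ast

DQ = '"'    # one double quote
SQ = "'"    # one single quote
BS = "\\"   # one backslash


def value_of(literal_source):
    """Evaluate one Python string literal; return None if it is not valid syntax."""
    try:
        return ast.literal_eval(literal_source)
    except SyntaxError:
        return None


# The three spellings from John's message, assembled character by character
# so that no extra layer of escaping obscures them.
cooked = DQ * 3 + "C" + BS + DQ + DQ * 3         # """C\""""
raw = "r" + DQ * 3 + "C" + BS + DQ + DQ * 3      # r"""C\""""
doubled = DQ * 3 + "C" + BS * 3 + DQ + DQ * 3    # """C\\\""""

# Cooked: the backslash is consumed, leaving C followed by a double quote.
assert value_of(cooked) == "C" + DQ
# Raw: the backslash is kept, leaving C, backslash, double quote.
assert value_of(raw) == "C" + BS + DQ
# Tripling the backslash in the cooked form reproduces the raw value.
assert value_of(doubled) == value_of(raw)

# A value that ends in a double quote cannot be written as a
# triple-double-quoted raw string: the spelling is not valid syntax ...
assert value_of("r" + DQ * 3 + "C" + DQ + DQ * 3) is None    # r"""C""""
# ... although the triple-single-quoted raw form still accepts it.
assert value_of("r" + SQ * 3 + "C" + DQ + SQ * 3) == "C" + DQ  # r'''C"'''
```

Under these assumed rules the writer, not the reader, has to inspect
every value before choosing a delimiter, which is the burden described
above.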

This additional requirement would have to be added to Proposal P, and
everybody would just have to hope that CIF programmers are all
sufficiently on the ball to detect any problem strings - or more
likely they will simplify the code and just "cook" everything, making
raw strings rather pointless.

I frankly cannot understand why anyone would think that such a fragile
scheme is superior to the spare elegance of Proposals F and F'
(particularly F'), but at least we have a resolution of this
particular technical issue.

On Wed, Feb 23, 2011 at 9:49 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> Dear James,
>  I don't understand the question.  In its internal representation
> the string is what it is.  What is the ambiguity?  If a
> string is presented to an application by an API and it contains
> \", then it contains the two characters backslash and double quote
> with a meaning that depends on the type specified in the dictionary.
> There are no delimiters in the internal representation, so a double
> quote is as good or bad a character as any other.  How is it a delimiter
> internally?  Is there some rule that we are not supposed to have strings
> whose internal representation contains a delimiter?
>  Regards,
>    Herbert
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
> On Wed, 23 Feb 2011, James Hester wrote:
>> I would point out that nobody has yet addressed, let alone answered my
>> question.  I am *not* confused about going from syntax to internal
>> representation, as it appears Simon briefly was.  I am concerned about
>> how a CIF application will disambiguate the character sequence
>> <backslash><delimiter> *in the internal representation*.
>> I am however glad that we all seem to agree that the particular
>> delimiters used to express data values should not be significant
>> beyond the parser.
>> On Wed, Feb 23, 2011 at 3:32 AM, Bollinger, John C
>> <John.Bollinger@stjude.org> wrote:
>>> On Tuesday, February 22, 2011 7:51 AM, Herbert J. Bernstein wrote:
>>>>   From the point of view of writing a pure "CIF2" application that is
>>>> not aware of the whitespace, particular quote marks, comments, etc, those
>>>> two strings are identical.
>>>>   From the point of view of a more general CIF API, in which comments,
>>>> magic numbers, and particular quote marks are visible, those two strings
>>>> are different in precisely the same way that the strings 'ABC' and "ABC"
>>>> are different, and 13.4 and 1.34e1 are different.
>>>>   This is _not_ an ambiguity.  It is a matter of whether we are looking
>>>> for the information in a file or looking for the representations of the data
>>>> in the file.
>>> Herbert is right about this.  It doesn't matter which syntactic variant
>>> was used to express a data value in an input CIF.  Once the value is parsed,
>>> the result is the value.  In particular, under proposal P, """C\""""
>>> expresses a different value than does r"""C\"""", whereas """C\\\"""" and
>>> r"""C\"""" express the same value.  The fact that the character sequence C"
>>> cannot be expressed via Python raw string format is irrelevant.  An
>>> application receiving these values does not need to know and should not care
>>> in which form the value was expressed in a CIF, if indeed it was ever
>>> expressed in CIF format at all.
>>> However, although there is no technical issue here, the fact that an
>>> experienced and successful Python and CIF practitioner such as James raised
>>> the question is illuminating.  It demonstrates that the complexity of the
>>> syntax and semantics provided by proposal P would be likely to be a source
>>> of confusion for developers and users both.
>>> Regards,
>>> John
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
