
Re: [ddlm-group] Technical issues with Proposal P

Dear Simon,

   I make mistakes on this, too.  That is why I like having IDLE
handy and sticking to Python syntax.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Tue, 22 Feb 2011, SIMON WESTRIP wrote:

> Dear Herbert - I've just realized I confused myself by misreading your example
> and treating it as equivalent to my own example! Sorry about that.
> 
> Cheers
> 
> Simon
> 
> 
> _______________________________________________________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Tuesday, 22 February, 2011 14:51:03
> Subject: Re: [ddlm-group] Technical issues with Proposal P
> 
> Dear Herbert
> 
> I'm still a bit confused. Following python semantics,
> a CIF application reading the following items
> 
> _item_a """C\""""
> _item_b r"""C\""""
> 
> should return values of
> 
> C" for _item_a
> C\" for _item_b
> 
> Are you suggesting that the application should then *assume* that in the case of
> _item_b the use of the backslash was purely to escape the final quote and should
> discard the backslash from the value, thus assuming a value of C" ?
> 
> Cheers
> 
> Simon
> 
> _______________________________________________________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Tuesday, 22 February, 2011 13:51:02
> Subject: Re: [ddlm-group] Technical issues with Proposal P
> 
> Dear Simon,
> 
>   From the point of view of writing a pure "CIF2" application
> that is not aware of the whitespace, particular quote marks,
> comments, etc., those two strings are identical.
> 
>   From the point of view of a more general CIF API, in which
> comments, magic numbers, and particular quote marks are exposed,
> those two strings are different in precisely the same way that
> the strings 'ABC' and "ABC" are different, and 13.4 and
> 1.34e1 are different.
> 
>   This is _not_ an ambiguity.  It is a matter of whether
> we are looking for the information in a file or looking
> for the representations of the data in the file.
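The distinction between content and representation can be sketched in Python. This is only an illustration: it assumes Python literal rules stand in for the CIF lexer, and the names same_content and same_representation are hypothetical, not part of any CIF API.

```python
import ast

# Pairs of source spellings taken from the examples above.
pairs = [
    ("'ABC'", '"ABC"'),
    ("13.4", "1.34e1"),
]

def same_content(a: str, b: str) -> bool:
    """True when both spellings evaluate to the same value."""
    return ast.literal_eval(a) == ast.literal_eval(b)

def same_representation(a: str, b: str) -> bool:
    """True only when the spellings are character-for-character identical."""
    return a == b

results = [(same_content(a, b), same_representation(a, b)) for a, b in pairs]
for (a, b), (content, representation) in zip(pairs, results):
    print(f"{a} vs {b}: same content={content}, same representation={representation}")
```

An application interested only in the information would use the first comparison; a tool that copies files faithfully would use the second.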
> 
>   Regards,
>     Herbert
> 
> 
> 
> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> 
> > So
> > """\\\"""" and r"""\""""
> > should strictly be treated as different, despite any recommendations you may
> > have made to the contrary?
> >
> >
> > ____________________________________________________________________________
> > From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> > To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> > Sent: Tuesday, 22 February, 2011 12:46:57
> > Subject: Re: [ddlm-group] Technical issues with Proposal P
> >
> > > So what is r"""C\"""" ?
> > >
> > > Is it C\" or is it C" ?
> >
> > """C\"""" is C"
> >
> > r"""C\"""" is C\"
> >
> > You can test this with IDLE.  It is very clearly defined and
> > reproducible Python string behavior, and I believe helps to make
> > the case for sticking to the Python approach.  It is very easy
> > for any software developer or user to work out how the boundary
> > cases are being handled.
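The two boundary cases can be checked in a script as well as in IDLE. The sketch below assumes Python 3 and uses the illustrative variable names cooked and raw.

```python
# Cooked (non-raw) triple-quoted string: the lexer processes the \" elide.
cooked = """C\""""

# Raw triple-quoted string: the backslash is kept in the value.
raw = r"""C\""""

print(repr(cooked), len(cooked))
print(repr(raw), len(raw))
```

The cooked value is the two characters C and a double quote; the raw value is the three characters C, backslash and double quote.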
> >
> > Regards,
> >   Herbert
> >
> >
> > On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >
> > > I am a little confused:
> > >
> > > So what is r"""C\"""" ?
> > >
> > > Is it C\" or is it C" ?
> > >
> > > Python says it should be C\", so CIF2 should say it's C\" if CIF2 is adopting
> > > Python?
> > >
> > > Or are you suggesting that we should adopt a fuzzy interpretation of Python?
> > >
> > > Cheers
> > >
> > > Simon
> > >
> > > ____________________________________________________________________________
> > > From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> > > To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> > > Sent: Tuesday, 22 February, 2011 12:01:23
> > > Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >
> > > Dear Colleagues,
> > >
> > >   Working under the assumption of Ralf's proposal, rather
> > > than Simon's, we have several very distinct string presentations
> > > to consider:  a (non-raw) treble quoted string, a raw treble
> > > quoted string, a unicode treble quoted string and a raw unicode
> > > treble quoted string.  As for Python 3, under CIF2, because
> > > the "native" character encoding is UTF-8, under reasonable coding
> > > constraints, this collapses to just two cases the application
> > > needs to deal with:  non-raw (i.e. cooked) versus raw.  The intent of
> > > the cooked is for the lexer to process the elides, so the response
> > > I gave is, I believe, correct -- just push the string through IDLE.
> > > The intent of the raw is precisely to push through the string
> > > with the backslashes still in place, e.g. for TeX text in which
> > > you don't want to double up your backslashes.  While I personally
> > > would recommend against such a use of raw, it is not ambiguous.
> > > It gives the application a very well-defined string of characters
> > > to deal with.  Yes, there are applications that are intended to
> > > deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif, etc.),
> > > but I agree that the cleanest design is for an application to
> > > only make use of the string content, not the representation.
> > >
> > >   Thus, for most applications, I would recommend that they treat
> > >
> > >   """\\\"""" and r"""\""""
> > >
> > > as equivalent, but that applications intended, for example,
> > > to make faithful copies of the representations
> > > treat them as different.
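A minimal Python 3 sketch of that recommendation follows. It assumes ast.literal_eval as a stand-in for the lexer, and the variable names are illustrative only. The two source spellings differ, yet both yield the same two-character content (backslash followed by double quote).

```python
import ast

# Source text of the two literals, held in raw single-quoted strings
# so that every backslash below is part of the spelling itself.
cooked_src = r'"""\\\""""'
raw_src = r'r"""\""""'

cooked_value = ast.literal_eval(cooked_src)
raw_value = ast.literal_eval(raw_src)

print(cooked_value == raw_value)  # content comparison
print(cooked_src == raw_src)      # representation comparison
```

A content-only application would rely on the first comparison; a faithful-copy tool such as those mentioned above would rely on the second.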
> > >
> > >   We have had, and will continue to have this subtle problem
> > > with all versions of CIF in the handling of things such as
> > > magic numbers, comments, white space, line folding, and choices
> > > of quoting characters.  I don't see how the introduction of
> > > the Python treble quote makes the situation any worse or
> > > any more or less ambiguous.
> > >
> > >   Regards,
> > >     Herbert
> > >
> > >
> > > On Tue, 22 Feb 2011, James Hester wrote:
> > >
> > > > I will focus this email on the technical issues and try to return to
> > > > the other issues at a later date (I've changed the subject
> > > > accordingly)
> > > >
> > > > [edit]
> > > >
> > > > My apologies for not being clear: my examples of embedded elides
> > > > already give the internal representation of the strings, deliberately
> > > > leaving out the particular delimiters that might have been used to
> > > > produce those strings.  Herbert mistakenly thought I was giving
> > > > triple-double-quote delimited strings and asking what the internal
> > > > representation was. So, unfortunately, IDLE cannot help here, as the
> > > > internal representation is not in question.
> > > >
> > > > My question therefore remains: how does the CIF application interpret
> > > > these strings? Is the <backslash><delimiter> in my examples simply an
> > > > elide that could not be removed from a raw string and therefore should
> > > > be ignored, or is it a character sequence intended for the application
> > > > (eg a LaTeX accent on the o or e)?
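The source of the question can be illustrated with Python's own raw-string rules, which are assumed here as a stand-in for Proposal P. The helper parse_literal is hypothetical. A raw triple-double-quoted literal whose content ends in a double quote can only be written with a backslash before that quote, and the backslash then stays in the value.

```python
import ast

def parse_literal(src: str):
    """Return the value of a Python string literal, or None if it does not parse."""
    try:
        return ast.literal_eval(src)
    except SyntaxError:
        return None

# With the backslash: parses, and the backslash is part of the value.
with_backslash = parse_literal(r'r"""are\""""')

# Without the backslash: the fourth quote is left dangling, so it does not parse.
without_backslash = parse_literal('r"""are""""')

print(repr(with_backslash))
print(without_backslash)
```

From the value alone, an application cannot tell whether the backslash was needed only to keep the literal well-formed or was meant as content.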
> > > >
> > > > In your answer you may assume that the CIF application knows that the
> > > > string was a raw string delimited by triple double quotes (even though
> > > > requiring communication of such information would be a very
> > > > unfortunate loss of clean design).
> > > >
> > > > Those strings again:
> > > >
> > > > <start> I have no idea what the last characters of this string are\"<finish>
> > > > <start> Does this string have two\""" or three internal quotes?<finish>
> > > >
> > > >
> > > > Herbert writes:
> > > >>   Now for your two examples of embedded elides of quotes:
> > > >>
> > > > >> <start> I have no idea what the last characters of this string are\"<finish>
> > > >>
> > > >> is, internally, as a C-string
> > > >>
> > > >> I have no idea what the last characters of this string are"\0
> > > >>
> > > >> <start> Does this string have two\""" or three internal quotes?<finish>
> > > >>
> > > >> is, internally as a C-string
> > > >>
> > > >> Does this string have two""" or three internal quotes?\0
> > > >>
> > > >> I settled that by simply cranking up IDLE and doing:
> > > >>
> > > > >> >>> print """I have no idea what the last characters of this string are\""""
> > > > >> I have no idea what the last characters of this string are"
> > > > >> >>> print """Does this string have two\""" or three internal quotes?"""
> > > > >> Does this string have two""" or three internal quotes?
> > > >>
> > > >> As you well know, having IDLE around is a big help.
> > > >>
> > > >>   Thank you again for taking the time to clarify your position
> > > >> on Ralf's proposal.  I think I now understand why you prefer Simon's
> > > >> proposal.
> > > >>
> > > >>   Regards,
> > > >>     Herbert
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> > > >>> One technical issue with Proposal P that has not been resolved is how
> > > >>> a CIF application is supposed to interpret the sequence
> > > >>> <backslash><double quote> when encountered in a string returned from
> > > >>> the parser.  Is this sequence:
> > > >>> (a) a terminator elide sequence that was left in a raw string, so
> > > >>> corresponds to <double quote>?
> > > >>> (b) something with meaning for the application so should be
> > > >>> <backslash><double quote>?
> > > >>>
> > > >>> Please therefore advise how a CIF application will disambiguate the
> > > >>> following string content from a Proposal P parser:
> > > >>>
> > > > >>> <start> I have no idea what the last characters of this string are\"<finish>
> > > > >>> <start> Does this string have two\""" or three internal quotes?<finish>
> > > >>>
> > > >>> James
> > > >>>
> > > >
> > > >
> > > >
> > > > --
> > > > T +61 (02) 9717 9907
> > > > F +61 (02) 9717 3145
> > > > M +61 (04) 0249 4148
> > > > _______________________________________________
> > > > ddlm-group mailing list
> > > > ddlm-group@iucr.org
> > > > http://scripts.iucr.org/mailman/listinfo/ddlm-group
> > > >
> > >
> > >
> >
> >
> 
>