Discussion List Archives


Re: [ddlm-group] Technical issues with Proposal P

So
"""\\\"""" and r"""\""""
should strictly be treated as different, despite any recommendations you may
have made to the contrary?
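
For reference, the two spellings in question can be compared directly in
Python 3. This is only a sketch of what Python itself does with them; the
variable names are illustrative, and nothing here settles how CIF2 should
treat the two forms.

```python
# Cooked (non-raw) literal: \\ becomes one backslash, \" becomes one quote.
cooked = """\\\""""
# Raw literal: the backslash and the quote are both kept as written.
raw = r"""\""""

print(cooked == raw)   # True: same string content
print(list(cooked))    # ['\\', '"'], i.e. backslash then double quote
```

The content is identical; only the source representation differs, which is
the distinction a faithful-copy tool would have to record separately.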



From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Tuesday, 22 February, 2011 12:46:57
Subject: Re: [ddlm-group] Technical issues with Proposal P

> So what is r"""C\"""" ?
>
> Is it C\" or is it C" ?

"""C\"""" is C"

r"""C\"""" is C\"

You can test this with IDLE.  It is very clearly defined and
reproducible Python string behavior, and I believe helps to make
the case for sticking to the Python approach.  It is very easy
for any software developer or user to work out how the boundary
cases are being handled.
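
A minimal sketch of that check, written for Python 3 rather than the
Python 2 IDLE of the time (so print is a function here); the variable
names are illustrative only.

```python
# Cooked (non-raw) triple-quoted string: the lexer processes the \" elide.
cooked = """C\""""
# Raw triple-quoted string: the backslash stays in the string content.
raw = r"""C\""""

print(cooked)                  # C"
print(raw)                     # C\"
print(len(cooked), len(raw))   # 2 3
```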

Regards,
  Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
  Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                +1-631-244-3035
                yaya@dowling.edu
=====================================================

On Tue, 22 Feb 2011, SIMON WESTRIP wrote:

> I am a little confused:
>
> So what is r"""C\"""" ?
>
> Is it C\" or is it C" ?
>
> Python says it should be C\", so CIF2 should say it's C\" if CIF2 is adopting
> Python?
>
> Or are you suggesting that we should adopt a fuzzy interpretation of Python?
>
> Cheers
>
> Simon
>
> ____________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Tuesday, 22 February, 2011 12:01:23
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> Dear Colleagues,
>
>   Working under the assumption of Ralf's proposal, rather
> than Simon's, we have several very distinct string presentations
> to consider:  a (non-raw) treble quoted string, a raw treble
> quoted string, a unicode treble quoted string and a raw unicode
> treble quoted string.  As for Python 3, under CIF2, because
> the "native" character encoding is UTF-8, under reasonable coding
> constraints, this collapses to just two cases the application
> needs to deal with:  non-raw (i.e. cooked) versus raw.  The intent of
> the cooked is for the lexer to process the elides, so the response
> I gave is, I believe, correct -- just push the string through IDLE.
> The intent of the raw is precisely to push through the string
> with the backslashes still in place, e.g. for TeX text in which
> you don't want to double-up your backslashes.  While I personally
> would recommend against such a use of raw, it is not ambiguous.
> It gives the application a very well-defined string of characters
> to deal with.  Yes, there are applications that are intended to
> deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif, etc.)
> but, I agree that the cleanest design is for an application to
> only make use of the string content, not the representation.
>
>   Thus, for most applications, I would recommend that they treat
>
>   """\\\"""" and r"""\""""
>
> as equivalent, but that applications intended, for example,
> to make faithful copies of the representations
> treat them as different.
>
>   We have had, and will continue to have this subtle problem
> with all versions of CIF in the handling of things such as
> magic number, comments, white space, line folding, and choices
> of quoting characters.  I don't see how the introduction of
> the Python treble quote makes the situation any worse or
> any more or less ambiguous.
>
>   Regards,
>     Herbert
>
> =====================================================
>   Herbert J. Bernstein, Professor of Computer Science
>     Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                   +1-631-244-3035
>                   yaya@dowling.edu
> =====================================================
>
> On Tue, 22 Feb 2011, James Hester wrote:
>
> > I will focus this email on the technical issues and try to return to
> > the other issues at a later date (I've changed the subject
> > accordingly)
> >
> > [edit]
> >
> > My apologies for not being clear: my examples of embedded elides
> > already give the internal representation of the strings, deliberately
> > leaving out the particular delimiters that might have been used to
> > produce those strings.  Herbert mistakenly thought I was giving
> > triple-double-quote delimited strings and asking what the internal
> > representation was. So, unfortunately, IDLE cannot help here, as the
> > internal representation is not in question.
> >
> > My question therefore remains: how does the CIF application interpret
> > these strings? Is the <backslash><delimiter> in my examples simply an
> > elide that could not be removed from a raw string and therefore should
> > be ignored, or is it a character sequence intended for the application
> > (eg a LaTeX accent on the o or e)?
> >
> > In your answer you may assume that the CIF application knows that the
> > string was a raw string delimited by triple double quotes (even though
> > requiring communication of such information would be a very
> > unfortunate loss of clean design).
> >
> > Those strings again:
> >
> > <start> I have no idea what the last characters of this string are\"<finish>
> > <start> Does this string have two\""" or three internal quotes?<finish>
> >
> >
> > Herbert writes:
> >>   Now for your two examples of embedded elides of quotes:
> >>
> >> <start> I have no idea what the last characters of this string are\"<finish>
> >>
> >> is, internally, as a C-string
> >>
> >> I have no idea what the last characters of this string are"\0
> >>
> >> <start> Does this string have two\""" or three internal quotes?<finish>
> >>
> >> is, internally as a C-string
> >>
> >> Does this string have two""" or three internal quotes?\0
> >>
> >> I settled that by simply cranking up IDLE and doing:
> >>
> >> >>> print """I have no idea what the last characters of this string are\""""
> >> I have no idea what the last characters of this string are"
> >> >>> print """Does this string have two\""" or three internal quotes?"""
> >> Does this string have two""" or three internal quotes?
> >>
> >> As you well know, having IDLE around is a big help.
> >>
> >>   Thank you again for taking the time to clarify your position
> >> on Ralf's proposal.  I think I now understand why you prefer Simon's
> >> proposal.
> >>
> >>   Regards,
> >>     Herbert
> >>
> >>
> >>
> >>
> >>
> >
> >>> One technical issue with Proposal P that has not been resolved is how
> >>> a CIF application is supposed to interpret the sequence
> >>> <backslash><double quote> when encountered in a string returned from
> >>> the parser.  Is this sequence:
> >>> (a) a terminator elide sequence that was left in a raw string, so
> >>> corresponds to <double quote>?
> >>> (b) something with meaning for the application so should be
> >>> <backslash><double quote>?
> >>>
> >>> Please therefore advise how a CIF application will disambiguate the
> >>> following string content from a Proposal P parser:
> >>>
> >>> <start> I have no idea what the last characters of this string are\"<finish>
> >>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>
> >>> James
> >>>
> >
> >
> >
> > --
> > T +61 (02) 9717 9907
> > F +61 (02) 9717 3145
> > M +61 (04) 0249 4148
> > _______________________________________________
> > ddlm-group mailing list
> > ddlm-group@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >
>
>
