
Re: [ddlm-group] Technical issues with Proposal P

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Technical issues with Proposal P
From: "Herbert J. Bernstein" <[email protected]>
Date: Thu, 24 Feb 2011 08:50:54 -0500 (EST)
In-Reply-To: <[email protected]>

Dear Simon,

   The Python cooked strings are something many people are familiar with.
Any use of the treble quote is something new to CIF, with implications
for both users and developers.  Use of the straight Python versions should
reduce the learning curve for both communities and the costs of converting
CIF 1.1 data to CIF2.  I don't deny that there can be better ways to do
the same thing.  This reminds me of when IBM came up with a better
keyboard for computers, shifting a few keys.  It drove everybody nuts,
not because there was anything wrong with it, but because it was just
sufficiently different to slow down typing and increase the error
rate.  Somebody totally new to typing on a computer keyboard had
no problem, but it certainly was not worth the costs involved for people
who had established habits.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Thu, 24 Feb 2011, SIMON WESTRIP wrote:

> The way I see it, by adopting Proposal P we will not be providing
> anything new in terms of raw strings (i.e. all other delimiters
> delimit raw strings) - rather we are giving people the opportunity to
> use 'cooked' strings. If this boils down to a matter of taste, I'm not
> convinced it justifies the potential confusion for users or the extra
> burden on developers.
> 
> Cheers
> 
> Simon
> 
> 
> __________________________________________________________________________________________
> From: Herbert J. Bernstein <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Thursday, 24 February, 2011 13:02:08
> Subject: Re: [ddlm-group] Technical issues with Proposal P
> 
> Dear Simon,
> 
>   Yes, the closest approximation to the current line folding
> would be a cooked python style treble-quoted string.
> 
>   The main use for the raw strings is for people who don't
> like having to double-up backslashes to present things like
> TeX.  Not my taste, but some people like it, and there
> is no downside that I can see in giving them the capability.
> 
>   Regards,
>     Herbert
> 
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
> 
>                 +1-631-244-3035
>                 [email protected]
> =====================================================
> 
> On Thu, 24 Feb 2011, SIMON WESTRIP wrote:
> 
> > A possible minor inconvenience of proposal P:
> >
> > Given that CIF strings are essentially 'raw' as CIF2 now stands (and CIF1 strings too
> > can be reconciled with the raw variant), and as I understand it python raw strings do
> > not support line continuation,
> > under proposal P a string will have to be 'cooked' in order to employ line folding?
> >
> > Please forgive me if this seems a little trivial, but I am really struggling to see any
> > benefit in adopting proposal P, especially for the end user. Maybe someone can help by
> > providing an example where the use of cooked strings will make life easier for the end user?
> >
> > Cheers
> >
> > Simon
> >
> > ________________________________
> > From: SIMON WESTRIP <[email protected]>
> > To: Group finalising DDLm and associated dictionaries <[email protected]>
> > Sent: Wednesday, 23 February, 2011 13:06:15
> > Subject: Re: [ddlm-group] Technical issues with Proposal P
> >
> > I don't know if this will help much in making a choice about proposal P, but
> > it might be worth looking at current practice in the case of one CIF user group -
> > namely authors submitting CIFs to journals.
> >
> > In many respects an F-type scheme has been used for preparing the text sections of
> > CIFs for publication for many years. The backslash is used to escape accents and Greek
> > letters as well as itself, e.g. \"u for u-umlaut, \a for alpha, \\a for \a...
> > In addition, we have the line-folding protocol, although that is rarely necessary and
> > very rarely applied manually. Although these 'common semantic features' will not be
> > a part of CIF2, they may well remain in use at the application level. Under scheme F,
> > a field containing this markup delimited by semicolons could readily be dropped
> > into a field delimited by triple quotes (though double-backslash control sequences
> > would be returned as single-backslash control sequences - but fortunately there are few
> > of these in use, e.g. \\rightarrow...). Under a P-type proposal, extra care would be
> > required when choosing between the 'raw' and 'cooked' variants before dumping the
> > contents of a semicolon-delimited field into a triple-quoted field.
> >
> > I've mentioned before that handling the transition from CIF1's 'common semantic
> > features' to CIF2 Unicode will require care in any case; I've yet to be convinced that
> > complex python semantics will help here, or offer any real benefit in general, given
> > that by only adopting them for one means of delimiting a data value, great care has to
> > be taken when switching delimiters (and for no obvious reason or benefit if your only
> > concern when working with a raw CIF is to complete it for publication purposes).
> >
> > Cheers
> >
> > Simon
> >
> > ________________________________
> > From: Herbert J. Bernstein <[email protected]>
> > To: Group finalising DDLm and associated dictionaries <[email protected]>
> > Sent: Wednesday, 23 February, 2011 4:40:39
> > Subject: Re: [ddlm-group] Technical issues with Proposal P
> >
> > Dear James,
> >
> >   I don't see any reason to disagree with your summary of limitations
> > on the use of raw strings.  That is why I find it easier to use cooked
> > strings.  But there are people who like them, so, since, once you
> > have Python-style triple quotes at all, it is easy to support them,
> > I would be inclined to do so, rather than have a one-size fits all
> > solution.
> >
> >   The real question is whether to support the cooked Python strings
> > with all Python elides or the more limited set of elides in proposal
> > F.  I repeat my suggestion that people try working with both as
> > I have for years, and we can see which proves more or less confusing
> > in what situations.  I have always found the mixing of things
> > like TeX with its backslashes with the line folding protocol without
> > doubling up the TeX backslashes very confusing.  Maybe it is just the
> > way my head works.  Maybe other people will have a different view.
> > We won't know until people other than me try it.
> >
> >   Regards,
> >     Herbert
> >
> > At 1:35 PM +1100 2/23/11, James Hester wrote:
> > >I absolutely agree that it is up to the application to decide on the
> > >meanings of strings given to it, with reference to the dictionary.  I
> > >am very happy that we agree on this iron separation between syntax and
> > >content.
> > >
> > >In a situation that <backslash><delimiter> has meaning to a CIF
> > >application, it follows that you cannot in general use raw strings to
> > >express a data value that
> > >(i) contains both triple double quotes and triple single quotes
> > >(ii) contains triple double quotes and terminates with a single quote
> > >(iii) contains triple single quotes and terminates with a double quote
> > >
> > >Furthermore, you cannot use triple-double-quote delimited raw strings
> > >to delimit a string terminating in a double quote and/or containing
> > >triple double quotes, likewise for single quote.
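
The restrictions listed above can be sketched as a small check. This assumes Python-style raw triple-quoted strings and, as in the scenario described, that a backslash cannot be used to elide a delimiter because <backslash><delimiter> already means something to the application. The function name `choose_raw_delimiter` is hypothetical, and other raw-string restrictions (such as a trailing backslash) are deliberately ignored.

```python
def choose_raw_delimiter(value):
    """Return a raw triple-quote delimiter that can hold `value` without
    any backslash elides, or None if neither delimiter works.

    Hypothetical helper: it covers only the cases listed in the text.
    """
    for delim in ('"""', "'''"):
        quote = delim[0]
        # The value must not contain the delimiter itself, and must not
        # end with the delimiter's quote character (it would merge with
        # the closing delimiter).
        if delim not in value and not value.endswith(quote):
            return delim
    return None


print(choose_raw_delimiter("plain text"))
print(choose_raw_delimiter("has \"\"\" and ''' inside"))
```

A result of None corresponds to cases (i) to (iii): such a value would have to be written as a cooked string or with a different delimiter altogether.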
> > >
> > >These idiosyncrasies would need to be documented if we were to adopt
> > >Proposal P, and we would need to be confident that CIF2 implementers
> > >and users would not make inadvertent errors in their selection of
> > >quotes and string types. As the manifestation of an error would
> > >typically be nothing more than a stray accent, there is no way, beyond
> > >careful proof-reading, that such mistakes would be caught.
> > >
> > >Proposals F and F' give less opportunity for error and are simpler to use.
> > >
> > >On Wed, Feb 23, 2011 at 11:27 AM, Herbert J. Bernstein
> > ><[email protected]> wrote:
> > >>  Dear James,
> > >>
> > >>  I am really lost here.  I believe it is up to the application
> > >>  to decide on the meaning of strings given as tag value, hopefully
> > >>  using a dictionary to inform that decision.  I really don't
> > >>  see what the use of r""" versus """ versus ; versus ' versus "
> > >>  to get the string into its internal form has to do with its
> > >>  meaning to the application, unless the application is one
> > >>  of these CIF copy/transform applications that violate some
> > >>  of the CIF rules to see through to the original representation
> > >>  rather than stopping with the data, and then we are moving outside the
> > >>  rules of CIF itself to more general text processing.
> > >>
> > >>  As I said, one of the nicer uses for the raw treble quote strings
> > >>  is to bring TeX into an application without having to double-up
> > >>  backslashes.  That is a very clear case in which the application
> > >>  has very different backslash processing than Python and you
> > >>  want to suppress most of the Python processing.  If what you
> > >>  are trying to do is to elide the quote marks, then you will have
> > >>  an easier time using the regular treble quotes.
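
A minimal sketch of the TeX case, assuming Python 3 escape processing stands in for the "cooked" behaviour under discussion. The TeX fragment and variable names are illustrative only.

```python
# Raw string: TeX control sequences pass through untouched.
raw_tex = r"""\alpha \rightarrow \beta"""

# Cooked string with every backslash doubled: same resulting value.
doubled = """\\alpha \\rightarrow \\beta"""

# Cooked string without doubling: \a, \r and \b are Python escapes
# (bell, carriage return, backspace), so the TeX is silently damaged.
undoubled = """\alpha \rightarrow \beta"""

print(raw_tex == doubled)
print(repr(undoubled))
```

The raw form and the doubled cooked form give the application the same characters; only the undoubled cooked form differs, and it does so without any error being raised.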
> > >>
> > >>  Regards,
> > >>    Herbert
> > >>  =====================================================
> > >>  Herbert J. Bernstein, Professor of Computer Science
> > >>    Dowling College, Kramer Science Center, KSC 121
> > >>        Idle Hour Blvd, Oakdale, NY, 11769
> > >>
> > >>                  +1-631-244-3035
> > >>                  [email protected]
> > >>  =====================================================
> > >>
> > >>  On Wed, 23 Feb 2011, James Hester wrote:
> > >>
> > >>>  Dear Herbert,
> > >>>
> > >>>  Because raw strings must retain any eliding backslashes in the string
> > >>>  (unlike cooked strings), a backslash in the internal string
> > >>>  representation may indeed be an artefact of the syntax proposed in
> > >>>  Proposal P.  Or might not.  The application can't always tell. See my
> > >>>  other email for a way to resolve this.
> > >>>
> > >>>  If everything is so clear, could you please just answer the following
> > >>>  rephrased questions? "The CIF application" refers to an application
> > >>>  for which <backslash><delimiter> means "accent the letter preceding
> > >>>  the backslash".
> > >>>
> > >>>  Should the CIF application interpret the first string as finishing
> > >>>  with a double quote, or with an accented e?
> > >>>  Should the CIF application interpret the second string as containing
> > >>>  an accented o, followed by two double quotes, or a letter o followed
> > >>>  by three quotes?
> > >>>
> > >>>  On Wed, Feb 23, 2011 at 10:16 AM, Herbert J. Bernstein
> > >>>  <[email protected]> wrote:
> > >>>>
> > >>>>  Dear James,
> > >>>>
> > >>>>  I still don't understand. Neither python nor I think \"
> > >>>>  from a raw string is an artifact of anything.  It is
> > >>>>  just a backslash followed by a double quotemark.  The
> > >>>>  point of the raw string is to provide a quick and
> > >>>>  convenient way to input something like TeX without
> > >>>>  having to double-up the backslashes.  Personally, I am
> > >>>>  happy to double up the backslashes, but I can see the
> > >>>>  value to people who have to deal with lots of TeX in
> > >>>>  not needing to do so.
> > >>>>
> > >>>>>  Does the first string finish with a double quote, or with an accented e?
> > >>>>>  Does the second string contain an accented o, followed by two double
> > >>>>>  quotes, or a letter o followed by three quotes?
> > >>>>
> > >>>>  are not questions related to the quoting mechanism used, but
> > >>>>  purely to the application.  Working purely in CIF1.1 all
> > >>>>  of the following are equivalent, external representations:
> > >>>>
> > >>>>  Set 1
> > >>>>  ;\
> > >>>>  I have no idea what the last characters of this string are\"\
> > >>>>  ;
> > >>>>  'I have no idea what the last characters of this string are\"'
> > >>>>  "I have no idea what the last characters of this string are\""
> > >>>>
> > >>>>  and in all cases the last 2 characters are backslash followed by
> > >>>>  double quote
> > >>>>
> > >>>>  Set 2
> > >>>>  ;\
> > >>>>  Does this string have two\""" or three internal quotes?\
> > >>>>  ;
> > >>>>  'Does this string have two\""" or three internal quotes?'
> > >>>>
> > >>>>  and in both cases there are three internal quotes
> > >>>>
> > >>>>  I don't see how this differs in any way from
> > >>>>
> > >>>>  r'''I have no idea what the last characters of this string are\"'''
> > >>>>  or
> > >>>>  '''I have no idea what the last characters of this string are\\"'''
> > >>>>  or
> > >>>>  """I have no idea what the last characters of this string are\\\""""
> > >>>>
> > >>>>  and
> > >>>>
> > >>>>  r'''Does this string have two\""" or three internal quotes?'''
> > >>>>  or
> > >>>>  '''Does this string have two\\""" or three internal quotes?'''
> > >>>>  or
> > >>>>  """Does this string have two\\\"\"\" or three internal quotes?"""
> > >>>>
> > >>>>  There are very real problems with the raw string that are noted
> > >>>>  in the Python documentation, but they do have their uses.  This
> > >>>>  ambiguity is not one of the problems.
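
The equivalences claimed above can be checked directly. This is a minimal sketch assuming Python 3 literal rules; the variable names are illustrative, and it says nothing about how a CIF application should then interpret the resulting characters.

```python
# Set 1: three spellings, one value ending in backslash + double quote.
s1_raw = r'''I have no idea what the last characters of this string are\"'''
s1_single = '''I have no idea what the last characters of this string are\\"'''
s1_double = """I have no idea what the last characters of this string are\\\""""

# Set 2: three spellings, one value with a backslash followed by
# three double quotes in the middle.
s2_raw = r'''Does this string have two\""" or three internal quotes?'''
s2_single = '''Does this string have two\\""" or three internal quotes?'''
s2_double = """Does this string have two\\\"\"\" or three internal quotes?"""

print(s1_raw == s1_single == s1_double)
print(s2_raw == s2_single == s2_double)
print(repr(s1_raw[-2:]))
```

Both comparisons hold, so the parser hands the application the same character sequence whichever spelling was used.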
> > >>>>
> > >>>>  Regards,
> > >>>>  Herbert
> > >>>>
> > >>>>  =====================================================
> > >>>>  Herbert J. Bernstein, Professor of Computer Science
> > >>>>    Dowling College, Kramer Science Center, KSC 121
> > >>>>        Idle Hour Blvd, Oakdale, NY, 11769
> > >>>>
> > >>>>                  +1-631-244-3035
> > >>>>                  [email protected]
> > >>>>  =====================================================
> > >>>>
> > >>>>  On Wed, 23 Feb 2011, James Hester wrote:
> > >>>>
> > >>>>>  I am trying to focus relentlessly on a particular and very real
> > >>>>>  technical issue.  I repeat that I am not concerned about the
> > >>>>>  transformation from surface syntax to a sequence of characters.  I
> > >>>>>  accept that that is well-defined and unambiguous for all proposals on
> > >>>>>  the table.  If you think that IDLE can resolve this problem, you
> > >>>>>  haven't understood my question.
> > >>>>>
> > >>>>>  My question relates to the next step: how does the CIF application
> > >>>>>  downstream from the parser interpret this sequence of characters?
> > >>>>>  Under all previous incarnations of CIF, it was safe to assume that no
> > >>>>>  artefacts of syntactical representation were left in the string, so
> > >>>>>  the string had purely domain-specific meaning.  However, with the
> > >>>>>  introduction of raw strings, <backslash><delimiter> will escape the
> > >>>>>  delimiter, but the <backslash> is required to remain in the string.
> > >>>>>  So the downstream application must decide between artefacts of the
> > >>>>>  syntactical representation (<backslash><delimiter>) that have remained
> > >>>>>  in raw strings, and domain-specific character sequences
> > >>>>>  (<backslash><delimiter>).  Here those examples are again (remember
> > >>>>>  this is the character sequence after syntactic processing):
> > >>>>>
> > >>>>>  <start> I have no idea what the last characters of this string
> > >>>>>  are\"<finish>
> > >>>>>  <start> Does this string have two\""" or three internal quotes?<finish>
> > >>>>>
> > >>>>>  Assume the domain-specific meaning of <backslash><quote> when found in
> > >>>>>  a datavalue is to accent the letter preceding the <backslash>.
> > >>>>>
> > >>>>>  Does the first string finish with a double quote, or with an accented e?
> > >>>>>  Does the second string contain an accented o, followed by two double
> > >>>>>  quotes, or a letter o followed by three quotes?
> > >>>>>
> > >>>>>
> > >>>>>  On Wed, Feb 23, 2011 at 8:01 AM, SIMON WESTRIP
> > >>>>>  <[email protected]> wrote:
> > >>>>>>
> > >>>>>>  Dear all
> > >>>>>>
> > >>>>>>  Reviewing the exchanges in this thread ("Technical issues with Proposal
> > >>>>>>  P"), it seems that the 'technical issues' might better be described as
> > >>>>>>  'potentially confusing issues' :-)
> > >>>>>>  That is, under proposal P, there is no ambiguity about how the string
> > >>>>>>  should be read, but there is potential for misinterpretation by the user
> > >>>>>>  (e.g. an erroneous assumption that by using a backslash to escape a
> > >>>>>>  quotation mark, the backslash will not be included as part of the parsed
> > >>>>>>  data value (in the raw variant)).
> > >>>>>>  So, as John says, perhaps this simply demonstrates that "the complexity
> > >>>>>>  of the syntax and semantics provided by proposal P would be likely to be
> > >>>>>>  a source of confusion for developers and users both", and maybe therein
> > >>>>>>  lies the merit of this particular thread? It reinforces those arguments
> > >>>>>>  against proposal P that suggest that the introduction of a more complex
> > >>>>>>  syntax for one of the delimiter types is a potential source of confusion
> > >>>>>>  for many existing CIF users.
> > >>>>>>
> > >>>>>>  Cheers
> > >>>>>>
> > >>>>>>  Simon
> > >>>>>>  ________________________________
> > >>>>>>  From: Herbert J. Bernstein <[email protected]>
> > >>>>>>  To: Group finalising DDLm and associated dictionaries
> > >>>>>>  <[email protected]>
> > >>>>>>  Sent: Tuesday, 22 February, 2011 20:22:57
> > >>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >>>>>>
> > >>>>>>  Dear Simon,
> > >>>>>>
> > >>>>>>    I make mistakes on this, too.  That is why I like having IDLE
> > >>>>>>  handy and sticking to Python syntax.
> > >>>>>>
> > >>>>>>    Regards,
> > >>>>>>      Herbert
> > >>>>>>
> > >>>>>>  =====================================================
> > >>>>>>  Herbert J. Bernstein, Professor of Computer Science
> > >>>>>>    Dowling College, Kramer Science Center, KSC 121
> > >>>>>>          Idle Hour Blvd, Oakdale, NY, 11769
> > >>>>>>
> > >>>>>>                  +1-631-244-3035
> > >>>>>>                  [email protected]
> > >>>>>>  =====================================================
> > >>>>>>
> > >>>>>>  On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> > >>>>>>
> > >>>>>>>  Dear Herbert - I've just realized I confused myself by misreading your
> > >>>>>>>  example
> > >>>>>>>  and treating it as equivalent to my own example! Sorry about that.
> > >>>>>>>
> > >>>>>>>  Cheers
> > >>>>>>>
> > >>>>>>>  Simon
> > >>>>>>>
> > >>>>>>>  ________________________________
> > >>>>>>>  From: SIMON WESTRIP <[email protected]>
> > >>>>>>>  To: Group finalising DDLm and associated dictionaries
> > >>>>>>>  <[email protected]>
> > >>>>>>>  Sent: Tuesday, 22 February, 2011 14:51:03
> > >>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >>>>>>>
> > >>>>>>>  Dear Herbert
> > >>>>>>>
> > >>>>>>>  I'm still a bit confused. Following python semantics,
> > >>>>>>>  a CIF application reading the following items
> > >>>>>>>
> > >>>>>>>  _item_a """C\""""
> > >>>>>>>  _item_b r"""C\""""
> > >>>>>>>
> > >>>>>>>  should return values of
> > >>>>>>>
> > >>>>>>>  C" for _item_a
> > >>>>>>>  C\" for _item_b
> > >>>>>>>
> > >>>>>>>  Are you suggesting that the application should then *assume* that in
> > >>>>>>>  the case of _item_b the use of the backslash was purely to escape the
> > >>>>>>>  final quote and should discard the backslash from the value, thus
> > >>>>>>>  assuming a value of C" ?
> > >>>>>>>
> > >>>>>>>  Cheers
> > >>>>>>>
> > >>>>>>>  Simon
> > >>>>>>>
> > >>>>>>>  ________________________________
> > >>>>>>>  From: Herbert J. Bernstein <[email protected]>
> > >>>>>>>  To: Group finalising DDLm and associated dictionaries
> > >>>>>>>  <[email protected]>
> > >>>>>>>  Sent: Tuesday, 22 February, 2011 13:51:02
> > >>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >>>>>>>
> > >>>>>>>  Dear Simon,
> > >>>>>>>
> > >>>>>>>    From the point of view of writing a pure "CIF2" application
> > >>>>>>>  that is not aware of the whitespace, particular quote marks,
> > >>>>>>>  comments, etc, those two strings are identical.
> > >>>>>>>
> > >>>>>>>    From the point of view of a more general CIF API, in which
> > >>>>>>>  comments, magic numbers, and particular quote marks matter, those
> > >>>>>>>  two strings are different in precisely the same way that
> > >>>>>>>  the strings 'ABC' and "ABC" are different, and 13.4 and
> > >>>>>>>  1.34e1 are different.
> > >>>>>>>
> > >>>>>>>    This is _not_ an ambiguity.  It is a matter of whether
> > >>>>>>>  we are looking for the information in a file or looking
> > >>>>>>>  for the representations of the data in the file.
> > >>>>>>>
> > >>>>>>>    Regards,
> > >>>>>>>      Herbert
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>  =====================================================
> > >>>>>>>  Herbert J. Bernstein, Professor of Computer Science
> > >>>>>>>    Dowling College, Kramer Science Center, KSC 121
> > >>>>>>>          Idle Hour Blvd, Oakdale, NY, 11769
> > >>>>>>>
> > >>>>>>>                  +1-631-244-3035
> > >>>>>>>                  [email protected]
> > >>>>>>>  =====================================================
> > >>>>>>>
> > >>>>>>>  On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> > >>>>>>>
> > >>>>>>>>  So
> > >>>>>>>>  """\\\"""" and r"""\""""
> > >>>>>>>>  should strictly be treated as different, despite any recommendations
> > >>>>>>>>  you may have made to the contrary?
> > >>>>>>>>
> > >>>>>>>>____________________________________________________________________________
> > >>>>>>>>  From: Herbert J. Bernstein <[email protected]>
> > >>>>>>>>  To: Group finalising DDLm and associated dictionaries
> > >>>>>>>>  <[email protected]>
> > >>>>>>>>  Sent: Tuesday, 22 February, 2011 12:46:57
> > >>>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >>>>>>>>
> > >>>>>>>>>  So what is r"""C\"""" ?
> > >>>>>>>>>
> > >>>>>>>>>  Is it C\" or is it C" ?
> > >>>>>>>>
> > >>>>>>>>  """C\"""" is C"
> > >>>>>>>>
> > >>>>>>>>  r"""C\"""" is C\"
> > >>>>>>>>
> > >>>>>>>>  You can test this with IDLE.  It is very clearly defined and
> > >>>>>>>>  reproducible Python string behavior, and I believe helps to make
> > >>>>>>>>  the case for sticking to the Python approach.  It is very easy
> > >>>>>>>>  for any software developer or user to work out how the boundary
> > >>>>>>>>  cases are being handled.
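
For readers without IDLE to hand, a minimal sketch of the same check, assuming Python 3 (the thread predates it, but the literal rules for these two cases are the same). The variable names are illustrative.

```python
# Cooked: \" is an escape for a double quote, so the backslash is consumed.
cooked = """C\""""

# Raw: \" still stops the quote from starting the closing delimiter,
# but the backslash remains part of the value.
raw = r"""C\""""

print(repr(cooked), len(cooked))
print(repr(raw), len(raw))
```

The cooked value has two characters (C and a double quote); the raw value has three (C, a backslash, and a double quote).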
> > >>>>>>>>
> > >>>>>>>>  Regards,
> > >>>>>>>>    Herbert
> > >>>>>>>>
> > >>>>>>>>  =====================================================
> > >>>>>>>>  Herbert J. Bernstein, Professor of Computer Science
> > >>>>>>>>    Dowling College, Kramer Science Center, KSC 121
> > >>>>>>>>          Idle Hour Blvd, Oakdale, NY, 11769
> > >>>>>>>>
> > >>>>>>>>                  +1-631-244-3035
> > >>>>>>>>                  [email protected]
> > >>>>>>>>  =====================================================
> > >>>>>>>>
> > >>>>>>>>  On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> > >>>>>>>>
> > >>>>>>>>>  I am a little confused:
> > >>>>>>>>>
> > >>>>>>>>>  So what is r"""C\"""" ?
> > >>>>>>>>>
> > >>>>>>>>>  Is it C\" or is it C" ?
> > >>>>>>>>>
> > >>>>>>>>>  Python says it should be C\", so CIF2 should say it's C\" if CIF2 is
> > >>>>>>>>>  adopting Python?
> > >>>>>>>>>
> > >>>>>>>>>  Or are you suggesting that we should adopt a fuzzy interpretation of
> > >>>>>>>>>  Python?
> > >>>>>>>>>
> > >>>>>>>>>  Cheers
> > >>>>>>>>>
> > >>>>>>>>>  Simon
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>  ________________________________
> > >>>>>>>>>  From: Herbert J. Bernstein <[email protected]>
> > >>>>>>>>>  To: Group finalising DDLm and associated dictionaries
> > >>>>>>>>>  <[email protected]>
> > >>>>>>>>>  Sent: Tuesday, 22 February, 2011 12:01:23
> > >>>>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >>>>>>>>>
> > >>>>>>>>>  Dear Colleagues,
> > >>>>>>>>>
> > >>>>>>>>>    Working under the assumption of Ralf's proposal, rather
> > >>>>>>>>>  than Simon's, we have several very distinct string presentations
> > >>>>>>>>>  to consider:  a (non-raw) treble quoted string, a raw treble
> > >>>>>>>>>  quoted string, a unicode treble quoted string and a raw unicode
> > >>>>>>>>>  treble quoted string.  As for Python 3, under CIF2, because
> > >>>>>>>>>  the "native" character encoding is UTF-8, under reasonable coding
> > >>>>>>>>>  constraints, this collapses to just two cases the application
> > >>>>>>>>>  needs to deal with:  non-raw (i.e. cooked) versus raw.  The intent
> > >>>>>>>>>  of the cooked is for the lexer to process the elides, so the response
> > >>>>>>>>>  I gave is, I believe, correct -- just push the string through IDLE.
> > >>>>>>>>>  The intent of the raw is precisely to push through the string
> > >>>>>>>>>  with the backslashes still in place, e.g. for TeX text in which
> > >>>>>>>>>  you don't want to double-up your backslashes.  While I personally
> > >>>>>>>>>  would recommend against such a use of raw, it is not ambiguous.
> > >>>>>>>>>  It gives the application a very well-defined string of characters
> > >>>>>>>>>  to deal with.  Yes, there are applications that are intended to
> > >>>>>>>>>  deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif,
> > >>>>>>>>>  etc.), but I agree that the cleanest design is for an application to
> > >>>>>>>>>  only make use of the string content, not the representation.
> > >>>>>>>>>
> > >>>>>>>>>    Thus, for most applications, I would recommend that they treat
> > >>>>>>>>>
> > >>>>>>>>>    """\\\"""" and r"""\""""
> > >>>>>>>>>
> > >>>>>>>>>  as equivalent, but for applications that are, for example,
> > >>>>>>>>>  intended to do faithful copies of the representations that
> > >>>>>>>>>  they treat them as different.
> > >>>>>>>>>
> > >>>>>>>>>    We have had, and will continue to have this subtle problem
> > >>>>>>>>>  with all versions of CIF in the handling of things such as
> > >>>>>>>>>  magic numbers, comments, white space, line folding, and choices
> > >>>>>>>>>  of quoting characters.  I don't see how the introduction of
> > >>>>>>>>>  the Python treble quote makes the situation any worse or
> > >>>>>>>>>  any more or less ambiguous.
> > >>>>>>>>>
> > >>>>>>>>>    Regards,
> > >>>>>>>>>      Herbert
> > >>>>>>>>>
> > >>>>>>>>>  =====================================================
> > >>>>>>>>>    Herbert J. Bernstein, Professor of Computer Science
> > >>>>>>>>>      Dowling College, Kramer Science Center, KSC 121
> > >>>>>>>>>          Idle Hour Blvd, Oakdale, NY, 11769
> > >>>>>>>>>
> > >>>>>>>>>                    +1-631-244-3035
> > >>>>>>>>>                    [email protected]
> > >>>>>>>>>  =====================================================
> > >>>>>>>>>
> > >>>>>>>>>  On Tue, 22 Feb 2011, James Hester wrote:
> > >>>>>>>>>
> > >>>>>>>>>>  I will focus this email on the technical issues and try to return
> > >>>>>>>>>>  to the other issues at a later date (I've changed the subject
> > >>>>>>>>>>  accordingly)
> > >>>>>>>>>>
> > >>>>>>>>>>  [edit]
> > >>>>>>>>>>
> > >>>>>>>>>>  My apologies for not being clear: my examples of embedded elides
> > >>>>>>>>>>  already give the internal representation of the strings,
> > >>>>>>>>>>  deliberately leaving out the particular delimiters that might have
> > >>>>>>>>>>  been used to produce those strings.  Herbert mistakenly thought I was
> > >>>>>>>>>>  giving triple-double-quote delimited strings and asking what the
> > >>>>>>>>>>  internal representation was. So, unfortunately, IDLE cannot help
> > >>>>>>>>>>  here, as the internal representation is not in question.
> > >>>>>>>>>>
> > >>>>>>>>>>  My question therefore remains: how does the CIF application
> > >>>>>>>>>>  interpret these strings? Is the <backslash><delimiter> in my examples
> > >>>>>>>>>>  simply an elide that could not be removed from a raw string and
> > >>>>>>>>>>  therefore should be ignored, or is it a character sequence intended
> > >>>>>>>>>>  for the application (e.g. a LaTeX accent on the o or e)?
> > >>>>>>>>>>
> > >>>>>>>>>>  In your answer you may assume that the CIF application knows that
> > >>>>>>>>>>  the string was a raw string delimited by triple double quotes (even
> > >>>>>>>>>>  though requiring communication of such information would be a very
> > >>>>>>>>>>  unfortunate loss of clean design).
> > >>>>>>>>>>
> > >>>>>>>>>>  Those strings again:
> > >>>>>>>>>>
> > >>>>>>>>>>  <start> I have no idea what the last characters of this string
> > >>>>>>>>>>  are\"<finish>
> > >>>>>>>>>>
> > >>>>>>>>>>  <start> Does this string have two\""" or three internal
> > >>>>>>>>>>  quotes?<finish>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>  Herbert writes:
> > >>>>>>>>>>>
> > >>>>>>>>>>>  Now for your two examples of embedded elides of quotes:
> > >>>>>>>>>>>
> > >>>>>>>>>>> <start> I have no idea what the last characters of this string
> > >>>>>>>>>>> are\"<finish>
> > >>>>>>>>>>>
> > >>>>>>>>>>> is, internally, as a C-string
> > >>>>>>>>>>>
> > >>>>>>>>>>> I have no idea what the last characters of this string are"\0
> > >>>>>>>>>>>
> > >>>>>>>>>>> <start> Does this string have two\""" or three internal
> > >>>>>>>>>>> quotes?<finish>
> > >>>>>>>>>>>
> > >>>>>>>>>>> is, internally as a C-string
> > >>>>>>>>>>>
> > >>>>>>>>>>> Does this string have two""" or three internal quotes?\0
> > >>>>>>>>>>>
> > >>>>>>>>>>> I settled that by simply cranking up IDLE and doing:
> > >>>>>>>>>>>
> > >>>>>>>>>>> >>> print """I have no idea what the last characters of this string are\""""
> > >>>>>>>>>>> I have no idea what the last characters of this string are"
> > >>>>>>>>>>> >>> print """Does this string have two\""" or three internal quotes?"""
> > >>>>>>>>>>> Does this string have two""" or three internal quotes?
> > >>>>>>>>>>>
> > >>>>>>>>>>> As you well know, having IDLE around is a big help.
> > >>>>>>>>>>>
> > >>>>>>>>>>>  Thank you again for taking the time to clarify your position
> > >>>>>>>>>>> on Ralf's proposal.  I think I now understand why you prefer
> > >>>>>>>>>>> Simon's proposal.
> > >>>>>>>>>>>
> > >>>>>>>>>>>  Regards,
> > >>>>>>>>>>>    Herbert
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>> One technical issue with Proposal P that has not been resolved is
> > >>>>>>>>>>>> how a CIF application is supposed to interpret the sequence
> > >>>>>>>>>>>> <backslash><double quote> when encountered in a string returned
> > >>>>>>>>>>>> from the parser.  Is this sequence:
> > >>>>>>>>>>>> (a) a terminator elide sequence that was left in a raw string, so
> > >>>>>>>>>>>> corresponds to <double quote>?
> > >>>>>>>>>>>> (b) something with meaning for the application so should be
> > >>>>>>>>>>>> <backslash><double quote>?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Please therefore advise how a CIF application will disambiguate
> > >>>>>>>>>>>> the following string content from a Proposal P parser:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> <start> I have no idea what the last characters of this string are\"<finish>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> James
> > >>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>  --
> > >>>>>>>>>>  T +61 (02) 9717 9907
> > >>>>>>>>>>  F +61 (02) 9717 3145
> > >>>>>>>>>>  M +61 (04) 0249 4148
> > >>>>>>>>>>  _______________________________________________
> > >>>>>>>>>>  ddlm-group mailing list
> > >>>>>>>>>>  [email protected]
> > >>>>>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
> > >>>>>>>>>>
> 
>
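
For readers who want to reproduce the quoted IDLE check, here is a minimal
sketch in Python 3 syntax (the quoted session used the Python 2 print
statement). It shows only how Python itself treats the two test strings as
cooked and as raw triple-quoted literals; it does not claim to be what a
Proposal P parser would return. The variable names are illustrative only.

```python
# Cooked (ordinary) triple-quoted literals: the backslash elides the
# following quote and is then dropped from the resulting value.
cooked_end = """I have no idea what the last characters of this string are\""""
cooked_mid = """Does this string have two\""" or three internal quotes?"""

# Raw triple-quoted literals: the backslash still keeps the quote from
# taking part in the terminator, but it remains in the resulting value.
raw_end = r"""I have no idea what the last characters of this string are\""""
raw_mid = r"""Does this string have two\""" or three internal quotes?"""

for label, value in [
    ("cooked_end", cooked_end),
    ("raw_end", raw_end),
    ("cooked_mid", cooked_mid),
    ("raw_mid", raw_mid),
]:
    # Print the last few characters, or the run around the internal quotes,
    # so the presence or absence of the backslash is easy to see.
    tail = value[-6:] if label.endswith("end") else value[22:35]
    print(f"{label:10s} length={len(value):3d} shows {tail!r}")
```

In the cooked case the application never sees the backslash. In the raw case
it receives <backslash><double quote> and has to decide between readings (a)
and (b) in James's question, which is the ambiguity under discussion.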

_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group

Follow-Ups:

Re: [ddlm-group] Technical issues with Proposal P (Bollinger, John C)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

References:

[ddlm-group] Technical issues with Proposal P (James Hester)

Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

Re: [ddlm-group] Technical issues with Proposal P (James Hester)

Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)

Re: [ddlm-group] Technical issues with Proposal P (James Hester)

Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)

Re: [ddlm-group] Technical issues with Proposal P (James Hester)

Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)

Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
