
Re: [ddlm-group] Technical issues with Proposal P

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Technical issues with Proposal P
From: "Herbert J. Bernstein" <[email protected]>
Date: Thu, 24 Feb 2011 08:02:08 -0500 (EST)
In-Reply-To: <[email protected]>

Dear Simon,

   Yes, the closest approximation to the current line-folding protocol
would be a cooked, Python-style treble-quoted string.
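As a rough illustration of the line-folding point, here is a Python 3 sketch that assumes CIF2 cooked and raw strings would follow Python's rules exactly. It parses the same two-line value written once as a cooked and once as a raw triple-quoted literal. In the cooked form, a backslash at the end of a line acts as a continuation, much as a trailing backslash does under the CIF1 line-folding protocol. In the raw form, both the backslash and the newline stay in the value. The variable names are illustrative only.

```python
import ast

# Python source for the same value written as a cooked and as a raw
# triple-quoted literal, each broken over two lines with a trailing backslash.
cooked_src = '"""first part \\\nsecond part"""'
raw_src = 'r"""first part \\\nsecond part"""'

cooked = ast.literal_eval(cooked_src)
raw = ast.literal_eval(raw_src)

# Cooked: backslash-newline is a line continuation and disappears.
print(repr(cooked))  # 'first part second part'
# Raw: the backslash and the newline both remain in the value.
print(repr(raw))     # 'first part \\\nsecond part'
```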

   The main use for the raw strings is for people who don't
like having to double-up backslashes to present things like
TeX.  Not my taste, but some people like it, and there
is no downside that I can see in giving them the capability.
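The following Python 3 sketch shows the TeX use case. The TeX fragment is an arbitrary example. A raw literal and a cooked literal with doubled backslashes produce the same characters. A cooked literal with single backslashes silently turns `\f` and `\a` into control characters.

```python
# The same TeX fragment written three ways.
raw_tex = r"""$\frac{1}{2}\alpha$"""          # raw: backslashes kept as typed
doubled_tex = """$\\frac{1}{2}\\alpha$"""     # cooked: each backslash doubled
undoubled_tex = """$\frac{1}{2}\alpha$"""     # cooked, backslashes not doubled

print(raw_tex == doubled_tex)   # True
# In a cooked string, \f is a form feed and \a is a bell character.
print(repr(undoubled_tex))      # '$\x0crac{1}{2}\x07lpha$'
```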

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Thu, 24 Feb 2011, SIMON WESTRIP wrote:

> A possible minor inconvenience of proposal P:
> 
> Given that CIF strings are essentially 'raw' as CIF2 now stands (and CIF1 strings too can
> be reconciled with the raw variant), and as I understand it Python raw strings do not
> support line continuation, does this mean that under proposal P a string will have to be
> 'cooked' in order to employ line folding?
> 
> Please forgive me if this seems a little trivial, but I am really struggling to see any
> benefit in adopting proposal P, especially for the end user. Maybe someone can help by
> providing an example where the use of cooked strings will make life easier for the end user?
> 
> Cheers
> 
> Simon
> 
> __________________________________________________________________________________________
> From: SIMON WESTRIP <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Wednesday, 23 February, 2011 13:06:15
> Subject: Re: [ddlm-group] Technical issues with Proposal P
> 
> I don't know if this will help much in making a choice about proposal P, but
> it might be worth looking at current practice in the case of one CIF user group -
> namely authors submitting CIFs to journals.
> 
> In many respects an F-type scheme has been used for preparing the text sections of
> CIFs for publication for many years. The backslash is used to escape accents and Greek
> letters as well as itself, e.g. \"u for u-umlaut, \a for alpha, \\a for \a...
> In addition, we have the line-folding protocol, although that is rarely necessary and
> very rarely applied manually. Although these 'common semantic features' will not be
> a part of CIF2, they may well remain in use at the application level. Under scheme F,
> a field containing this markup delimited by semicolons could readily be dropped
> into a field delimited by triple quotes (though double-backslash control sequences
> would be returned as single-backslash control sequences - but fortunately there are few of
> these in use, e.g. \\rightarrow...). Under a P-type proposal, extra care would be required
> when choosing between the 'raw' and 'cooked' variants before dumping the contents of
> a semicolon-delimited field into a triple-quoted field.
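The Python 3 sketch below illustrates the "extra care" Simon mentions. It assumes that P-type cooked strings follow Python's own escape rules, and the sample text is made up. If CIF1-style markup such as `\"u` and `\a` is copied verbatim into a cooked triple-quoted field, the value changes: `\"` loses its backslash and `\a` becomes a bell character. The raw variant preserves the markup.

```python
import ast

# CIF1 'common semantic features' markup as it might appear in a
# semicolon-delimited field (hypothetical sample text).
field = r'M\"uller, \a-helix'

# Paste the field text verbatim between triple quotes, cooked and raw.
cooked = ast.literal_eval('"""' + field + '"""')
raw = ast.literal_eval('r"""' + field + '"""')

print(repr(cooked))   # 'M"uller, \x07-helix'
print(raw == field)   # True
```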
> 
> I've mentioned before that handling the transition from CIF1's 'common semantic features'
> to CIF2 Unicode will require care in any case; I've yet to be convinced that complex
> Python semantics will help here, or offer any real benefit in general, given that by
> adopting them for only one means of delimiting a data value, great care has to be taken
> when switching delimiters (and for no obvious reason or benefit if your only concern when
> working with a raw CIF is to complete it for publication purposes).
> 
> Cheers
> 
> Simon
> 
> __________________________________________________________________________________________
> From: Herbert J. Bernstein <[email protected]>
> To: Group finalising DDLm and associated dictionaries <[email protected]>
> Sent: Wednesday, 23 February, 2011 4:40:39
> Subject: Re: [ddlm-group] Technical issues with Proposal P
> 
> Dear James,
> 
>   I don't see any reason to disagree with your summary of limitations
> on the use of raw strings.  That is why I find it easier to use cooked
> strings.  But there are people who like them, and since, once you
> have Python-style triple quotes at all, it is easy to support them,
> I would be inclined to do so rather than have a one-size-fits-all
> solution.
> 
>   The real question is whether to support the cooked Python strings
> with all Python elides or the more limited set of elides in proposal
> F.  I repeat my suggestion that people try working with both as
> I have for years, and we can see which proves more or less confusing
> in what situations.  I have always found the mixing of things
> like TeX with its backslashes with the line folding protocol without
> doubling up the TeX backslashes very confusing.  Maybe it is just the
> way my head works.  Maybe other people will have a different view.
> We won't know until people other than me try it.
> 
>   Regards,
>     Herbert
> 
> At 1:35 PM +1100 2/23/11, James Hester wrote:
> >I absolutely agree that it is up to the application to decide on the
> >meanings of strings given to it, with reference to the dictionary.  I
> >am very happy that we agree on this iron separation between syntax and
> >content.
> >
> >In a situation that <backslash><delimiter> has meaning to a CIF
> >application, it follows that you cannot in general use raw strings to
> >express a data value that
> >(i) contains both triple double quotes and triple single quotes
> >(ii) contains triple double quotes and terminates with a single quote
> >(iii) contains triple single quotes and terminates with a double quote
> >
> >Furthermore, you cannot use triple-double-quote delimited raw strings
> >to delimit a string terminating in a double quote and/or containing
> >triple double quotes, likewise for single quote.
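James's cases can be checked mechanically. The sketch below uses a hypothetical helper, `raw_round_trips`, and assumes that raw CIF2 strings behave exactly like Python 3 raw triple-quoted literals. A raw literal's value is always the text between its delimiters. A value is therefore expressible as a raw literal exactly when pasting it between the delimiters parses back to the same value.

```python
import ast

def raw_round_trips(value: str, quote: str) -> bool:
    """Hypothetical helper: True if `value` can be written as a raw
    triple-quoted literal delimited by three `quote` characters.

    A raw literal's value is the text between its delimiters, so the
    only candidate spelling is the value pasted in verbatim.
    """
    source = "r" + quote * 3 + value + quote * 3
    try:
        return ast.literal_eval(source) == value
    except (SyntaxError, ValueError):
        # Unterminated or malformed literal: not expressible this way.
        return False


both = 'both """ and \'\'\''
print(raw_round_trips("plain text", '"'))          # True
print(raw_round_trips('ends with a quote"', '"'))  # False: ends with the delimiter character
print(raw_round_trips('ends with a quote"', "'"))  # True
print(raw_round_trips(both, '"'), raw_round_trips(both, "'"))  # False False
```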
> >
> >These idiosyncrasies would need to be documented if we were to adopt
> >Proposal P, and we would need to be confident that CIF2 implementers
> >and users would not make inadvertent errors in their selection of
> >quotes and string types. As the manifestation of an error would
> >typically be nothing more than a stray accent, there is no way, beyond
> >careful proof-reading, that such mistakes would be caught.
> >
> >Proposals F and F' give less opportunity for error and are simpler to use.
> >
> >On Wed, Feb 23, 2011 at 11:27 AM, Herbert J. Bernstein
> ><[email protected]> wrote:
> >>  Dear James,
> >>
> >>  I am really lost here.  I believe it is up to the application
> >>  to decide on the meaning of strings given as tag value, hopefully
> >>  using a dictionary to inform that decision.  I really don't
> >>  see what the use of r""" versus """ versus ; versus ' versus "
> >>  to get the string into its internal form has to do with its
> >>  meaning to the application, unless the application is one
> >>  of these CIF copy/transform applications that violate some
> >>  of the CIF rules to see through to the original representation
> >>  rather than stopping with the data, and then we are moving outside the
> >>  rules of CIF itself to more general text processing.
> >>
> >>  As I said, one of the nicer uses for the raw treble quote strings
> >>  is to bring TeX into an application without having to double-up
> >>  backslashes.  That is a very clear case in which the application
> >>  has very different backslash processing from Python, and you
> >>  want to suppress most of the Python processing.  If what you
> >>  are trying to do is to elide the quote marks, then you will have
> >>  an easier time using the regular treble quotes.
> >>
> >>  Regards,
> >>    Herbert
> >>
> >>  On Wed, 23 Feb 2011, James Hester wrote:
> >>
> >>>  Dear Herbert,
> >>>
> >>>  Because raw strings must retain any eliding backslashes in the string
> >>>  (unlike cooked strings), a backslash in the internal string
> >>>  representation may indeed be an artefact of the syntax proposed in
> >>>  Proposal P.  Or might not.  The application can't always tell. See my
> >>>  other email for a way to resolve this.
> >>>
> >>>  If everything is so clear, could you please just answer the following
> >>>  rephrased questions? "The CIF application" refers to an application
> >>>  for which <backslash><delimiter> means "accent the letter preceding
> >>>  the backslash".
> >>>
> >>>  Should the CIF application interpret the first string as finishing
> >>>  with a double quote, or with an accented e?
> >>>  Should the CIF application interpret the second string as containing
> >>>  an accented o, followed by two double quotes, or a letter o followed
> >>>  by three quotes?
> >>>
> >>>  On Wed, Feb 23, 2011 at 10:16 AM, Herbert J. Bernstein
> >>>  <[email protected]> wrote:
> >>>>
> >>>>  Dear James,
> >>>>
> >>>>  I still don't understand. Neither python nor I think \"
> >>>>  from a raw string is an artifact of anything.  It is
> >>>>  just a backslash followed by a double quotemark.  The
> >>>>  point of the raw string is to provide a quick and
> >>>>  convenient way to input something like TeX without
> >>>>  having to double-up the backslashes.  Personally, I am
> >>>>  happy to double up the backslashes, but I can see the
> >>>>  value to people who have to deal with lots of TeX in
> >>>>  not needing to do so.
> >>>>
> >>>>>  Does the first string finish with a double quote, or with an accented e?
> >>>>>  Does the second string contain an accented o, followed by two double
> >>>>>  quotes, or a letter o followed by three quotes?
> >>>>
> >>>>  are not questions related to the quoting mechanism used, but
> >>>>  purely to the application.  Working purely in CIF1.1 all
> >>>>  of the following are equivalent external representations:
> >>>>
> >>>>  Set 1
> >>>>  ;\
> >>>>  I have no idea what the last characters of this string are\"\
> >>>>  ;
> >>>>  'I have no idea what the last characters of this string are\"'
> >>>>  "I have no idea what the last characters of this string are\""
> >>>>
> >>>>  and in all cases the last 2 characters are backslash followed by
> >>>>  double quote
> >>>>
> >>>>  Set 2
> >>>>  ;\
> >>>>  Does this string have two\""" or three internal quotes?\
> >>>>  ;
> >>>>  'Does this string have two\""" or three internal quotes?'
> >>>>
> >>>>  and in both cases there are three internal quotes
> >>>>
> >>>>  I don't see how this differs in any way from
> >>>>
> >>>>  r'''I have no idea what the last characters of this string are\"'''
> >>>>  or
> >>>>  '''I have no idea what the last characters of this string are\\"'''
> >>>>  or
> >>>>  """I have no idea what the last characters of this string are\\\""""
> >>>>
> >>>>  and
> >>>>
> >>>>  r'''Does this string have two\""" or three internal quotes?'''
> >>>>  or
> >>>>  '''Does this string have two\\""" or three internal quotes?'''
> >>>>  or
> >>>>  """Does this string have two\\\"\"\" or three internal quotes?"""
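Herbert's claim that the three Python spellings in each group yield the same characters can be checked directly in Python 3. The literals below are copied verbatim from the message.

```python
# Herbert's three spellings of each value, copied verbatim.
a1 = r'''I have no idea what the last characters of this string are\"'''
a2 = '''I have no idea what the last characters of this string are\\"'''
a3 = """I have no idea what the last characters of this string are\\\""""

b1 = r'''Does this string have two\""" or three internal quotes?'''
b2 = '''Does this string have two\\""" or three internal quotes?'''
b3 = """Does this string have two\\\"\"\" or three internal quotes?"""

# Each group yields one and the same sequence of characters.
print(a1 == a2 == a3, a1[-2:])         # True \"
print(b1 == b2 == b3, b1.count('"'))   # True 3
```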
> >>>>
> >>>>  There are very real problems with the raw string that are noted
> >>>>  in the Python documentation, but they do have their uses.  This
> >>>>  ambiguity is not one of the problems.
> >>>>
> >>>>  Regards,
> >>>>  Herbert
> >>>>
> >>>>
> >>>>  On Wed, 23 Feb 2011, James Hester wrote:
> >>>>
> >>>>>  I am trying to focus relentlessly on a particular and very real
> >>>>>  technical issue.  I repeat that I am not concerned about the
> >>>>>  transformation from surface syntax to a sequence of characters.  I
> >>>>>  accept that that is well-defined and unambiguous for all proposals on
> >>>>>  the table.  If you think that IDLE can resolve this problem, you
> >>>>>  haven't understood my question.
> >>>>>
> >>>>>  My question relates to the next step: how does the CIF application
> >>>>>  downstream from the parser interpret this sequence of characters?
> >>>>>  Under all previous incarnations of CIF, it was safe to assume that no
> >>>>>  artefacts of syntactical representation were left in the string, so
> >>>>>  the string had purely domain-specific meaning.  However, with the
> >>>>>  introduction of raw strings, <backslash><delimiter> will escape the
> >>>>>  delimiter, but the <backslash> is required to remain in the string.
> >>>>>  So the downstream application must decide between artefacts of the
> >>>>>  syntactical representation (<backslash><delimiter>) that have remained
> >>>>>  in raw strings, and domain-specific character sequences
> >>>>>  (<backslash><delimiter>).  Here those examples are again (remember
> >>>>>  this is the character sequence after syntactic processing):
> >>>>>
> >>>>>  <start> I have no idea what the last characters of this string
> >>>>>  are\"<finish>
> >>>>>  <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>
> >>>>>  Assume the domain-specific meaning of <backslash><quote> when found in
> >>>>>  a datavalue is to accent the letter preceding the <backslash>.
> >>>>>
> >>>>>  Does the first string finish with a double quote, or with an accented e?
> >>>>>  Does the second string contain an accented o, followed by two double
> >>>>>  quotes, or a letter o followed by three quotes?
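James's question concerns the step after parsing. The following is a minimal Python 3 sketch that assumes the hypothetical application rule he describes: a letter followed by `\"` receives a diaeresis. Once the raw literal has been parsed, the application sees only the characters. It therefore applies the accent whether or not the writer meant the backslash merely as an escape. `apply_diaeresis_markup` is an illustrative name, not part of any CIF library.

```python
import re
import unicodedata

def apply_diaeresis_markup(text: str) -> str:
    """Hypothetical application rule: <letter><backslash><double quote>
    puts a diaeresis (U+0308) on the letter."""
    marked = re.sub(r'([A-Za-z])\\"', lambda m: m.group(1) + "\u0308", text)
    # Compose letter + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", marked)


# Values as a raw-string writer might produce them, intending plain quotes.
first = r"""I have no idea what the last characters of this string are\""""
second = r"""Does this string have two\""" or three internal quotes?"""

print(apply_diaeresis_markup(first))   # I have no idea what the last characters of this string arë
print(apply_diaeresis_markup(second))  # Does this string have twö"" or three internal quotes?
```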
> >>>>>
> >>>>>
> >>>>>  On Wed, Feb 23, 2011 at 8:01 AM, SIMON WESTRIP
> >>>>>  <[email protected]> wrote:
> >>>>>>
> >>>>>>  Dear all
> >>>>>>
> >>>>>>  Reviewing the exchanges in this thread ("Technical issues with Proposal P"),
> >>>>>>  it seems that the 'technical issues' might better be described as
> >>>>>>  'potentially confusing issues' :-)
> >>>>>>  That is, under proposal P, there is no ambiguity about how the string
> >>>>>>  should be read, but there is potential for misinterpretation by the user
> >>>>>>  (e.g. an erroneous assumption that by using a backslash to escape a
> >>>>>>  quotation mark, the backslash will not be included as part of the parsed
> >>>>>>  data value (in the raw variant)).
> >>>>>>  So, as John says, perhaps this simply demonstrates that "the complexity
> >>>>>>  of the syntax and semantics provided by proposal P would be likely to be
> >>>>>>  a source of confusion for developers and users both", and maybe therein
> >>>>>>  lies the merit of this particular thread? It reinforces those arguments
> >>>>>>  against proposal P that suggest that the introduction of a more complex
> >>>>>>  syntax for one of the delimiter types is a potential source of confusion
> >>>>>>  for many existing CIF users.
> >>>>>>
> >>>>>>  Cheers
> >>>>>>
> >>>>>>  Simon
> >>>>>>  ________________________________
> >>>>>>  From: Herbert J. Bernstein <[email protected]>
> >>>>>>  To: Group finalising DDLm and associated dictionaries
> >>>>>>  <[email protected]>
> >>>>>>  Sent: Tuesday, 22 February, 2011 20:22:57
> >>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>
> >>>>>>  Dear Simon,
> >>>>>>
> >>>>>>    I make mistakes on this, too.  That is why I like having IDLE
> >>>>>>  handy and sticking to Python syntax.
> >>>>>>
> >>>>>>    Regards,
> >>>>>>      Herbert
> >>>>>>
> >>>>>>
> >>>>>>  On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>
> >>>>>>>  Dear Herbert - I've just realized I confused myself by misreading your
> >>>>>>>  example and treating it as equivalent to my own example! Sorry about that.
> >>>>>>>
> >>>>>>>  Cheers
> >>>>>>>
> >>>>>>>  Simon
> >>>>>>>
> >>>>>>>  ________________________________________________________________________________
> >>>>>>>  From: SIMON WESTRIP <[email protected]>
> >>>>>>>  To: Group finalising DDLm and associated dictionaries
> >>>>>>>  <[email protected]>
> >>>>>>>  Sent: Tuesday, 22 February, 2011 14:51:03
> >>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>>  Dear Herbert
> >>>>>>>
> >>>>>>>  I'm still a bit confused. Following python semantics,
> >>>>>>>  a CIF application reading the following items
> >>>>>>>
> >>>>>>>  _item_a """C\""""
> >>>>>>>  _item_b r"""C\""""
> >>>>>>>
> >>>>>>>  should return values of
> >>>>>>>
> >>>>>>>  C" for _item_a
> >>>>>>>  C\" for _item_b
> >>>>>>>
> >>>>>>>  Are you suggesting that the application should then *assume* that in
> >>>>>>>  the case of _item_b the use of the backslash was purely to escape the
> >>>>>>>  final quote and should discard the backslash from the value, thus
> >>>>>>>  assuming a value of C" ?
> >>>>>>>
> >>>>>>>  Cheers
> >>>>>>>
> >>>>>>>  Simon
> >>>>>>>
> >>>>>>>  ________________________________________________________________________________
> >>>>>>>  From: Herbert J. Bernstein <[email protected]>
> >>>>>>>  To: Group finalising DDLm and associated dictionaries
> >>>>>>>  <[email protected]>
> >>>>>>>  Sent: Tuesday, 22 February, 2011 13:51:02
> >>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>>  Dear Simon,
> >>>>>>>
> >>>>>>>    From the point of view of writing a pure "CIF2" application
> >>>>>>>  that is not aware of the whitespace, particular quote marks,
> >>>>>>>  comments, etc., those two strings are identical.
> >>>>>>>
> >>>>>>>    From the point of view of a more general CIF API, in which
> >>>>>>>  comments, magic numbers, and particular quote marks are preserved,
> >>>>>>>  those two strings are different in precisely the same way that
> >>>>>>>  the strings 'ABC' and "ABC" are different, and 13.4 and
> >>>>>>>  1.34e1 are different.
> >>>>>>>
> >>>>>>>    This is _not_ an ambiguity.  It is a matter of whether
> >>>>>>>  we are looking for the information in a file or looking
> >>>>>>>  for the representations of the data in the file.
> >>>>>>>
> >>>>>>>    Regards,
> >>>>>>>      Herbert
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>  On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>
> >>>>>>>>  So
> >>>>>>>>  """\\\"""" and r"""\""""
> >>>>>>>>  should strictly be treated as different, despite any recommendations
> >>>>>>>>  you may have made to the contrary?
> >>>>>>>>
> >>>>>>>>  ____________________________________________________________________________
> >>>>>>>>  From: Herbert J. Bernstein <[email protected]>
> >>>>>>>>  To: Group finalising DDLm and associated dictionaries
> >>>>>>>>  <[email protected]>
> >>>>>>>>  Sent: Tuesday, 22 February, 2011 12:46:57
> >>>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>
> >>>>>>>>>  So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>>  Is it C\" or is it C" ?
> >>>>>>>>
> >>>>>>>>  """C\"""" is C"
> >>>>>>>>
> >>>>>>>>  r"""C\"""" is C\"
> >>>>>>>>
> >>>>>>>>  You can test this with IDLE.  It is very clearly defined and
> >>>>>>>>  reproducible Python string behavior, and I believe helps to make
> >>>>>>>>  the case for sticking to the Python approach.  It is very easy
> >>>>>>>>  for any software developer or user to work out how the boundary
> >>>>>>>>  cases are being handled.
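The two cases above can be reproduced in Python 3 as well as in IDLE. This sketch simply restates Herbert's example as code.

```python
cooked = """C\""""   # escape is processed: the backslash is removed
raw = r"""C\""""     # raw: the backslash stays, though it still stops that quote closing the string

print(cooked, len(cooked))  # C" 2
print(raw, len(raw))        # C\" 3
```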
> >>>>>>>>
> >>>>>>>>  Regards,
> >>>>>>>>    Herbert
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>  On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>>
> >>>>>>>>>  I am a little confused:
> >>>>>>>>>
> >>>>>>>>>  So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>>  Is it C\" or is it C" ?
> >>>>>>>>>
> >>>>>>>>>  Python says it should be C\", so CIF2 should say it's C\" if CIF2 is
> >>>>>>>>>  adopting Python?
> >>>>>>>>>
> >>>>>>>>>  Or are you suggesting that we should adopt a fuzzy interpretation of
> >>>>>>>>>  Python?
> >>>>>>>>>
> >>>>>>>>>  Cheers
> >>>>>>>>>
> >>>>>>>>>  Simon
> >>>>>>>>>
> >>>>>>>>>  ____________________________________________________________________________
> >>>>>>>>>  From: Herbert J. Bernstein <[email protected]>
> >>>>>>>>>  To: Group finalising DDLm and associated dictionaries <[email protected]>
> >>>>>>>>>  Sent: Tuesday, 22 February, 2011 12:01:23
> >>>>>>>>>  Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>>
> >>>>>>>>>  Dear Colleagues,
> >>>>>>>>>
> >>>>>>>>>    Working under the assumption of Ralf's proposal, rather
> >>>>>>>>>  than Simon's, we have several very distinct string presentations
> >>>>>>>>>  to consider:  a (non-raw) treble-quoted string, a raw treble-quoted
> >>>>>>>>>  string, a unicode treble-quoted string and a raw unicode
> >>>>>>>>>  treble-quoted string.  As for Python 3, under CIF2, because
> >>>>>>>>>  the "native" character encoding is UTF-8, under reasonable coding
> >>>>>>>>>  constraints, this collapses to just two cases the application
> >>>>>>>>>  needs to deal with:  non-raw (i.e. cooked) versus raw.  The intent
> >>>>>>>>>  of the cooked form is for the lexer to process the elides, so the
> >>>>>>>>>  response I gave is, I believe, correct: just push the string through
> >>>>>>>>>  IDLE.  The intent of the raw form is precisely to push through the
> >>>>>>>>>  string with the backslashes still in place, e.g. for TeX text in which
> >>>>>>>>>  you don't want to double-up your backslashes.  While I personally
> >>>>>>>>>  would recommend against such a use of raw, it is not ambiguous.
> >>>>>>>>>  It gives the application a very well-defined string of characters
> >>>>>>>>>  to deal with.  Yes, there are applications that are intended to
> >>>>>>>>>  deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif,
> >>>>>>>>>  etc.), but I agree that the cleanest design is for an application to
> >>>>>>>>>  only make use of the string content, not the representation.
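This Python 3 sketch illustrates the collapse Herbert describes. The `u` prefix no longer changes anything, so only the cooked/raw distinction remains, and the combined `ur` prefix is rejected outright. The value `caf\u00e9` is just an example.

```python
import ast

cooked = ast.literal_eval('"""caf\\u00e9"""')
unicode_cooked = ast.literal_eval('u"""caf\\u00e9"""')
raw = ast.literal_eval('r"""caf\\u00e9"""')

print(cooked == unicode_cooked == "caf\u00e9")  # True: the u prefix is a no-op
print(raw)                                      # caf\u00e9 (escape left untouched)

try:
    ast.literal_eval('ur"""caf\\u00e9"""')
except SyntaxError:
    print("ur prefix rejected")
```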
> >>>>>>>>>
> >>>>>>>>>    Thus, for most applications, I would recommend that they treat
> >>>>>>>>>
> >>>>>>>>>    """\\\"""" and r"""\""""
> >>>>>>>>>
> >>>>>>>>>  as equivalent, but that applications intended, for example,
> >>>>>>>>>  to make faithful copies of the representations treat them
> >>>>>>>>>  as different.
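Herbert's distinction between content and representation can be illustrated with Python's own tools. This is a sketch, not a CIF API. `ast.literal_eval` gives the content, which is identical for the two literals. `tokenize` gives the token text, which is what a representation-preserving copier would need to keep.

```python
import ast
import io
import tokenize

# Python source text of the two literals from the message.
sources = [r'"""\\\""""', r'r"""\""""']

for src in sources:
    token = next(tokenize.generate_tokens(io.StringIO(src).readline))
    content = ast.literal_eval(src)
    print(f"{token.string:12} -> {content!r}")
```

Both lines print the same content, a backslash followed by a double quote, next to two different token spellings.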
> >>>>>>>>>
> >>>>>>>>>    We have had, and will continue to have this subtle problem
> >>>>>>>>>  with all versions of CIF in the handling of things such as
> >>>>>>>>>  magic numbers, comments, white space, line folding, and choices
> >>>>>>>>>  of quoting characters.  I don't see how the introduction of
> >>>>>>>>>  the Python treble quote makes the situation any worse or
> >>>>>>>>>  any more or less ambiguous.
> >>>>>>>>>
> >>>>>>>>>    Regards,
> >>>>>>>>>      Herbert
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>  On Tue, 22 Feb 2011, James Hester wrote:
> >>>>>>>>>
> >>>>>>>>>>  I will focus this email on the technical issues and try to return
> >>>>>>>>>>  to the other issues at a later date (I've changed the subject
> >>>>>>>>>>  accordingly).
> >>>>>>>>>>
> >>>>>>>>>>  [edit]
> >>>>>>>>>>
> >>>>>>>>>>  My apologies for not being clear: my examples of embedded elides
> >>>>>>>>>>  already give the internal representation of the strings,
> >>>>>>>>>>  deliberately leaving out the particular delimiters that might have
> >>>>>>>>>>  been used to produce those strings.  Herbert mistakenly thought I
> >>>>>>>>>>  was giving triple-double-quote delimited strings and asking what the
> >>>>>>>>>>  internal representation was. So, unfortunately, IDLE cannot help
> >>>>>>>>>>  here, as the internal representation is not in question.
> >>>>>>>>>>
> >>>>>>>>>>  My question therefore remains: how does the CIF application
> >>>>>>>>>>  interpret these strings? Is the <backslash><delimiter> in my
> >>>>>>>>>>  examples simply an elide that could not be removed from a raw
> >>>>>>>>>>  string and therefore should be ignored, or is it a character
> >>>>>>>>>>  sequence intended for the application (eg a LaTeX accent on the o
> >>>>>>>>>>  or e)?
> >>>>>>>>>>
> >>>>>>>>>>  In your answer you may assume that the CIF application knows that
> >>>>>>>>>>  the string was a raw string delimited by triple double quotes (even
> >>>>>>>>>>  though requiring communication of such information would be a very
> >>>>>>>>>>  unfortunate loss of clean design).
> >>>>>>>>>>
> >>>>>>>>>>  Those strings again:
> >>>>>>>>>>
> >>>>>>>>>>  <start> I have no idea what the last characters of this string are\"<finish>
> >>>>>>>>>>
> >>>>>>>>>>  <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  Herbert writes:
> >>>>>>>>>>>
> >>>>>>>>>>>  Now for your two examples of embedded elides of quotes:
> >>>>>>>>>>>
> >>>>>>>>>>> <start> I have no idea what the last characters of this string are\"<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally, as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> I have no idea what the last characters of this string are"\0
> >>>>>>>>>>>
> >>>>>>>>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> Does this string have two""" or three internal quotes?\0
> >>>>>>>>>>>
> >>>>>>>>>>> I settled that by simply cranking up IDLE and doing:
> >>>>>>>>>>>
> >>>>>>>>>>> >>> print """I have no idea what the last characters of this string are\""""
> >>>>>>>>>>> I have no idea what the last characters of this string are"
> >>>>>>>>>>> >>> print """Does this string have two\""" or three internal quotes?"""
> >>>>>>>>>>> Does this string have two""" or three internal quotes?
> >>>>>>>>>>>
> >>>>>>>>>>> As you well know, having IDLE around is a big help.
> >>>>>>>>>>>
> >>>>>>>>>>>  Thank you again for taking the time to clarify your position
> >>>>>>>>>>> on Ralf's proposal.  I think I now understand why you prefer
> >>>>>>>>>>> Simon's
> >>>>>>>>>>> proposal.
> >>>>>>>>>>>
> >>>>>>>>>>>  Regards,
> >>>>>>>>>>>    Herbert
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> One technical issue with Proposal P that has not been resolved is
> >>>>>>>>>>>> how a CIF application is supposed to interpret the sequence
> >>>>>>>>>>>> <backslash><double quote> when encountered in a string returned
> >>>>>>>>>>>> from the parser.  Is this sequence:
> >>>>>>>>>>>> (a) a terminator elide sequence that was left in a raw string, so
> >>>>>>>>>>>> corresponds to <double quote>?
> >>>>>>>>>>>> (b) something with meaning for the application so should be
> >>>>>>>>>>>> <backslash><double quote>?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Please therefore advise how a CIF application will disambiguate
> >>>>>>>>>>>> the following string content from a Proposal P parser:
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> I have no idea what the last characters of this string are\"<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> James
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>  --
> >>>>>>>>>>  T +61 (02) 9717 9907
> >>>>>>>>>>  F +61 (02) 9717 3145
> >>>>>>>>>>  M +61 (04) 0249 4148
> >>>>>>>>>>  _______________________________________________
> >>>>>>>>>>  ddlm-group mailing list
> >>>>>>>>>>  [email protected]
> >>>>>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
> 
> 
> 
>

