Re: [ddlm-group] Technical issues with Proposal P
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Technical issues with Proposal P
- From: SIMON WESTRIP <simonwestrip@btinternet.com>
- Date: Thu, 24 Feb 2011 13:21:11 +0000 (GMT)
- In-Reply-To: <alpine.BSF.2.00.1102240758270.26665@epsilon.pair.com>
- References: <AANLkTi=kadbHikjabDyioDOw=L_pthGORgi6w2b45yX6@mail.gmail.com><alpine.BSF.2.00.1102220644270.84613@epsilon.pair.com><417719.45449.qm@web87006.mail.ird.yahoo.com><alpine.BSF.2.00.1102220741480.84613@epsilon.pair.com><301639.7573.qm@web87001.mail.ird.yahoo.com><alpine.BSF.2.00.1102220845481.84613@epsilon.pair.com><228483.70348.qm@web87004.mail.ird.yahoo.com><710426.91151.qm@web87002.mail.ird.yahoo.com><alpine.BSF.2.00.1102221522040.36016@epsilon.pair.com><639597.72610.qm@web87009.mail.ird.yahoo.com><AANLkTikReSDV23THOOXBA2mhnq8O1-jF79_zQRbDp4Q8@mail.gmail.com><alpine.BSF.2.00.1102221758260.23065@epsilon.pair.com><AANLkTi=4wXyFTquP88JDjUc+w480Y5M1uzOimmayTaRY@mail.gmail.com><alpine.BSF.2.00.1102221915580.23065@epsilon.pair.com><AANLkTikgabBbmimHEWm9gD_kEnFdCSVV4-G-i+Z2nsYw@mail.gmail.com><a06240801c98a3e532621@[192.168.2.102]><219810.84158.qm@web87007.mail.ird.yahoo.com><635800.54481.qm@web87007.mail.ird.yahoo.com><alpine.BSF.2.00.1102240758270.26665@epsilon.pair.com>
The way I see it, by adopting Proposal P we will not be providing
anything new in terms of raw strings (i.e. all other delimiters
delimit raw strings) - rather we are giving people the opportunity to
use 'cooked' strings. If this boils down to a matter of taste, I'm not
convinced it justifies the potential confusion for users or the extra
burden on developers.
Cheers
Simon
From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Thursday, 24 February, 2011 13:02:08
Subject: Re: [ddlm-group] Technical issues with Proposal P
Dear Simon,
Yes, the closest approximation to the current line folding
would be a cooked python style treble-quoted string.
The main use for the raw strings is for people who don't
like having to double-up backslashes to present things like
TeX. Not my taste, but some people like it, and there
is no downside that I can see in giving them the capability.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya@dowling.edu
=====================================================
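The two points above (line folding needs a cooked string; raw strings spare TeX authors from doubling backslashes) can be checked directly in Python 3. This is a minimal sketch that uses Python's own literal rules as a stand-in for what Proposal P would adopt; the variable names are illustrative only.

```python
# Cooked triple-quoted string: backslash-newline is a line continuation,
# so both characters disappear from the value.
cooked = """a long value \
folded over two lines"""

# Raw triple-quoted string: the backslash and the newline both stay in the value.
raw = r"""a long value \
folded over two lines"""

# TeX-like markup: the raw form needs no doubling, the cooked form does.
# (Undoubled in a cooked string, \a, \r and \b would become control characters.)
tex_raw = r"""\alpha \rightarrow \beta"""
tex_cooked = """\\alpha \\rightarrow \\beta"""

print(repr(cooked))
print(repr(raw))
print(tex_raw == tex_cooked)
```

The first value is a single line; the second still contains the backslash followed by a newline, which is why a raw string cannot imitate the CIF line-folding protocol.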
On Thu, 24 Feb 2011, SIMON WESTRIP wrote:
> A possible minor inconvenience of proposal P:
>
> Given that CIF strings are essentially 'raw' as CIF2 now stands (and CIF1 strings too can
> be reconciled with the raw variant), and as I understand it python raw strings do not
> support line continuation,
> under proposal P a string will have to be 'cooked' in order to employ line folding?
>
> Please forgive me if this seems a little trivial, but I am really struggling to see any
> benefit
> in adopting proposal P, especially for the end user. Maybe someone can help by providing
> an example where the use of cooked strings will make life easier for the end user?
>
> Cheers
>
> Simon
>
> __________________________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 23 February, 2011 13:06:15
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> I don't know if this will help much in making a choice about proposal P, but
> it might be worth looking at current practice in the case of one CIF user group -
> namely authors submitting CIFs to journals.
>
> In many respects an F-type scheme has been used for preparing the text sections of
> CIFs for publication for many years. The backslash is used to escape accents and Greek
> letters as well as itself, e.g. \"u for u-umlaut, \a for alpha, \\a for \a...
> In addition, we have the line-folding protocol, although that is rarely necessary and
> very rarely applied manually. Although these 'common semantic features' will not be
> a part of CIF2, they may well remain in use at the application level. Under scheme F,
> a field containing this markup delimited by semicolons could readily be dropped
> into a field delimited by triple quotes (though double-backslash control sequences
> would be returned as single-backslash control sequences - but fortunately there are few of
> these in use, e.g. \\rightarrow...). Under a P-type proposal, extra care would be required
> when choosing between the 'raw' and 'cooked' variants before dumping the contents of
> a semi-colon delimited field into a triple-quoted field.
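The hazard described above can be sketched in Python 3. The decoder below is hypothetical and handles only an illustrative subset of the backslash codes mentioned in the message (\"u, \a, \\); the name `decode_markup` and the Greek table are assumptions for this example, and Python's cooked and raw literals stand in for the two triple-quoted variants under discussion.

```python
import re
import unicodedata

# Illustrative subset only; this table is an assumption made for the sketch.
GREEK = {"a": "\u03b1", "b": "\u03b2", "g": "\u03b3"}

# A backslash followed by: another backslash, a double quote plus a letter, or a letter.
TOKEN = re.compile(r'\\(\\|"[A-Za-z]|[A-Za-z])')


def decode_markup(text):
    """Hypothetical helper: expand a few backslash codes into Unicode."""

    def expand(match):
        code = match.group(1)
        if code == "\\":
            return "\\"  # doubled backslash stands for one literal backslash
        if code.startswith('"'):
            # letter plus combining diaeresis, composed into a single character
            return unicodedata.normalize("NFC", code[1] + "\u0308")
        return GREEK.get(code, match.group(0))  # unknown codes pass through

    return TOKEN.sub(expand, text)


# The same file text, \\a, read through a raw and a cooked literal.
from_raw = decode_markup(r"""\\a""")    # application sees \\a and yields a literal \a
from_cooked = decode_markup("""\\a""")  # lexer has already reduced it to \a, so alpha

print(decode_markup(r'M\"uller'), repr(from_raw), repr(from_cooked))
```

The two results differ, which is the extra care the message refers to: the choice of raw or cooked delimiter changes what the application-level markup decoder receives.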
>
> I've mentioned before that handling the transition from CIF1's 'common semantic features'
> to CIF2 unicode will require care in any case; I've yet to be convinced that complex
> python semantics will help here, nor offer any real benefit in general, given that by
> only adopting them for one means of delimiting a data value, great care has to be taken
> when switching delimiters (and for no obvious reason or benefit if your only concern when
> working with a raw CIF is to complete it for publication purposes).
>
> Cheers
>
> Simon
>
> __________________________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 23 February, 2011 4:40:39
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> Dear James,
>
> I don't see any reason to disagree with your summary of limitations
> on the use of raw strings. That is why I find it easier to use cooked
> strings. But there are people who like them, so, since, once you
> have Python-style triple quotes at all, it is easy to support them,
> I would be inclined to do so, rather than have a one-size fits all
> solution.
>
> The real question is whether to support the cooked Python strings
> with all Python elides or the more limited set of elides in proposal
> F. I repeat my suggestion that people try working with both as
> I have for years, and we can see which proves more or less confusing
> in what situations. I have always found the mixing of things
> like TeX with its backslashes with the line folding protocol without
> doubling up the TeX backslashes very confusing. Maybe it is just the
> way my head works. Maybe other people will have a different view.
> We won't know until people other than me try it.
>
> Regards,
> Herbert
>
> At 1:35 PM +1100 2/23/11, James Hester wrote:
> >I absolutely agree that it is up to the application to decide on the
> >meanings of strings given to it, with reference to the dictionary. I
> >am very happy that we agree on this iron separation between syntax and
> >content.
> >
> >In a situation that <backslash><delimiter> has meaning to a CIF
> >application, it follows that you cannot in general use raw strings to
> >express a data value that
> >(i) contains both triple double quotes and triple single quotes
> >(ii) contains triple double quotes and terminates with a single quote
> >(iii) contains triple single quotes and terminates with a double quote
> >
> >Furthermore, you cannot use triple-double-quote delimited raw strings
> >to delimit a string terminating in a double quote and/or containing
> >triple double quotes, likewise for single quote.
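A short Python 3 check of the restrictions listed above, assuming Python's raw-string rules are what Proposal P would inherit. `ast.literal_eval` is used only to evaluate literal source text; the helper name `parse` is illustrative.

```python
import ast


def parse(literal_source):
    """Evaluate string-literal source text; return None if it is not a valid literal."""
    try:
        return ast.literal_eval(literal_source)
    except SyntaxError:
        return None


# A raw string delimited by triple double quotes cannot end in a bare double quote:
# the literal closes after abc and the fourth quote is left dangling.
ends_in_quote = parse('r"""abc""""')

# Escaping the final quote makes it parse, but the backslash stays in the value.
escaped = parse('r"""abc\\""""')

# Switching to the other delimiter works, but only while the value has no ''' in it.
other_delimiter = parse("r'''abc\"'''")

print(ends_in_quote, repr(escaped), repr(other_delimiter))
```

Only the third form yields the intended value, and it is unavailable once the value also contains triple single quotes, which is case (i) in the list above.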
> >
> >These idiosyncrasies would need to be documented if we were to adopt
> >Proposal P, and we would need to be confident that CIF2 implementers
> >and users would not make inadvertent errors in their selection of
> >quotes and string types. As the manifestation of an error would
> >typically be nothing more than a stray accent, there is no way, beyond
> >careful proof-reading, that such mistakes would be caught.
> >
> >Proposals F and F' give less opportunity for error and are simpler to use.
> >
> >On Wed, Feb 23, 2011 at 11:27 AM, Herbert J. Bernstein
> ><yaya@bernstein-plus-sons.com> wrote:
> >> Dear James,
> >>
> >> I am really lost here. I believe it is up to the application
> >> to decide on the meaning of strings given as tag value, hopefully
> >> using a dictionary to inform that decision. I really don't
> >> see what the use of r""" versus """ versus ; versus ' versus "
> >> to get the string into its internal form has to do with its
> >> meaning to the application, unless the application is one
> >> of these CIF copy/transform applications that violate some
> >> of the CIF rules to see through to the original representation
> >> rather than stopping with the data, and then we are moving outside the
> >> rules of CIF itself to more general text processing.
> >>
> >> As I said, one of the nicer uses for the raw treble quote strings
> >> is to bring TeX into an application without having to double-up
> >> backslashes. That is a very clear case in which the application
> >> has very different backslash processing than Python and you
> >> want to suppress most of the Python processing. If what you
> >> are trying to do is to elide the quote marks, then you will have
> >> an easier time using the regular treble quotes.
> >>
> >> Regards,
> >> Herbert
> >> =====================================================
> >> Herbert J. Bernstein, Professor of Computer Science
> >> Dowling College, Kramer Science Center, KSC 121
> >> Idle Hour Blvd, Oakdale, NY, 11769
> >>
> >> +1-631-244-3035
> >> yaya@dowling.edu
> >> =====================================================
> >>
> >> On Wed, 23 Feb 2011, James Hester wrote:
> >>
> >>> Dear Herbert,
> >>>
> >>> Because raw strings must retain any eliding backslashes in the string
> >>> (unlike cooked strings), a backslash in the internal string
> >>> representation may indeed be an artefact of the syntax proposed in
> >>> Proposal P. Or might not. The application can't always tell. See my
> >>> other email for a way to resolve this.
> >>>
> >>> If everything is so clear, could you please just answer the following
> >>> rephrased questions? "The CIF application" refers to an application
> >>> for which <backslash><delimiter> means "accent the letter preceding
> >>> the backslash".
> >>>
> >>> Should the CIF application interpret the first string as finishing
> >>> with a double quote, or with an accented e?
> >>> Should the CIF application interpret the second string as containing
> >>> an accented o, followed by two double quotes, or a letter o followed
> >>> by three quotes?
> >>>
> >>> On Wed, Feb 23, 2011 at 10:16 AM, Herbert J. Bernstein
> >>> <yaya@bernstein-plus-sons.com> wrote:
> >>>>
> >>>> Dear James,
> >>>>
> >>>> I still don't understand. Neither python nor I think \"
> >>>> from a raw string is an artifact of anything. It is
> >>>> just a backslash followed by a double quotemark. The
> >>>> point of the raw string is to provide a quick and
> >>>> convenient way to input something like TeX without
> >>>> having to double-up the backslashes. Personally, I am
> >>>> happy to double up the backslashes, but I can see the
> >>>> value to people who have to deal with lots of TeX in
> >>>> not needing to do so.
> >>>>
> >>>>> Does the first string finish with a double quote, or with an accented e?
> >>>>> Does the second string contain an accented o, followed by two double
> >>>>> quotes, or a letter o followed by three quotes?
> >>>>
> >>>> are not questions related to the quoting mechanism used, but
> >>>> purely to the application. Working purely in CIF1.1 all
> >>>> of the following are equivalent, external representations:
> >>>>
> >>>> Set 1
> >>>> ;\
> >>>> I have no idea what the last characters of this string are\"\
> >>>> ;
> >>>> 'I have no idea what the last characters of this string are\"'
> >>>> "I have no idea what the last characters of this string are\""
> >>>>
> >>>> and in all cases the last 2 characters are backslash followed by
> >>>> double quote
> >>>>
> >>>> Set 2
> >>>> ;\
> >>>> Does this string have two\""" or three internal quotes?\
> >>>> ;
> >>>> 'Does this string have two\""" or three internal quotes?'
> >>>>
> >>>> and in both cases there are three internal quotes
> >>>>
> >>>> I don't see how this differs in any way from
> >>>>
> >>>> r'''I have no idea what the last characters of this string are\"'''
> >>>> or
> >>>> '''I have no idea what the last characters of this string are\\"'''
> >>>> or
> >>>> """I have no idea what the last characters of this string are\\\""""
> >>>>
> >>>> and
> >>>>
> >>>> r'''Does this string have two\""" or three internal quotes?'''
> >>>> or
> >>>> '''Does this string have two\\""" or three internal quotes?'''
> >>>> or
> >>>> """Does this string have two\\\"\"\" or three internal quotes?"""
> >>>>
> >>>> There are very real problems with the raw string that are noted
> >>>> in the Python documentation, but they do have their uses. This
> >>>> ambiguity is not one of the problems.
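The claimed equivalence of the three Python-style spellings can be confirmed in Python 3. The sketch below shortens the example text but keeps the delimiters and backslash sequences exactly as written in the message; the variable names are illustrative.

```python
# Set 1: the value ends in backslash followed by a double quote.
s1_raw = r'''string are\"'''
s1_cooked_single = '''string are\\"'''
s1_cooked_double = """string are\\\""""

# Set 2: the value contains backslash followed by three double quotes.
s2_raw = r'''two\""" or three'''
s2_cooked_single = '''two\\""" or three'''
s2_cooked_double = """two\\\"\"\" or three"""

print(s1_raw == s1_cooked_single == s1_cooked_double)
print(s2_raw == s2_cooked_single == s2_cooked_double)
print(repr(s1_raw[-2:]))
```

All three spellings in each set give the same character sequence, so the parse itself is unambiguous. What the application then does with the surviving backslash is a separate question, and it is the one being asked elsewhere in the thread.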
> >>>>
> >>>> Regards,
> >>>> Herbert
> >>>>
> >>>> =====================================================
> >>>> Herbert J. Bernstein, Professor of Computer Science
> >>>> Dowling College, Kramer Science Center, KSC 121
> >>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>
> >>>> +1-631-244-3035
> >>>> yaya@dowling.edu
> >>>> =====================================================
> >>>>
> >>>> On Wed, 23 Feb 2011, James Hester wrote:
> >>>>
> >>>>> I am trying to focus relentlessly on a particular and very real
> >>>>> technical issue. I repeat that I am not concerned about the
> >>>>> transformation from surface syntax to a sequence of characters. I
> >>>>> accept that that is well-defined and unambiguous for all proposals on
> >>>>> the table. If you think that IDLE can resolve this problem, you
> >>>>> haven't understood my question.
> >>>>>
> >>>>> My question relates to the next step: how does the CIF application
> >>>>> downstream from the parser interpret this sequence of characters?
> >>>>> Under all previous incarnations of CIF, it was safe to assume that no
> >>>>> artefacts of syntactical representation were left in the string, so
> >>>>> the string had purely domain-specific meaning. However, with the
> >>>>> introduction of raw strings, <backslash><delimiter> will escape the
> >>>>> delimiter, but the <backslash> is required to remain in the string.
> >>>>> So the downstream application must decide between artefacts of the
> >>>>> syntactical representation (<backslash><delimiter>) that have remained
> >>>>> in raw strings, and domain-specific character sequences
> >>>>> (<backslash><delimiter>). Here those examples are again (remember
> >>>>> this is the character sequence after syntactic processing):
> >>>>>
> >>>>> <start> I have no idea what the last characters of this string
> >>>>> are\"<finish>
> >>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>
> >>>>> Assume the domain-specific meaning of <backslash><quote> when found in
> >>>>> a datavalue is to accent the letter preceding the <backslash>.
> >>>>>
> >>>>> Does the first string finish with a double quote, or with an accented e?
> >>>>> Does the second string contain an accented o, followed by two double
> >>>>> quotes, or a letter o followed by three quotes?
> >>>>>
> >>>>>
> >>>>> On Wed, Feb 23, 2011 at 8:01 AM, SIMON WESTRIP
> >>>>> <simonwestrip@btinternet.com> wrote:
> >>>>>>
> >>>>>> Dear all
> >>>>>>
> >>>>>> Reviewing the exchanges in this thread ("Technical issues with Proposal
> >>>>>> P"), it seems that the 'technical issues' might better be described as
> >>>>>> 'potentially confusing issues' :-)
> >>>>>> That is, under proposal P, there is no ambiguity about how the string
> >>>>>> should be read, but there is potential for misinterpretation by the user
> >>>>>> (e.g. an erroneous assumption that by using a backslash to escape a
> >>>>>> quotation mark, the backslash will not be included as part of the parsed
> >>>>>> data value (in the raw variant)).
> >>>>>> So, as John says, perhaps this simply demonstrates that "the complexity
> >>>>>> of the syntax and semantics provided by proposal P would be likely to be
> >>>>>> a source of confusion for developers and users both", and maybe therein
> >>>>>> lies the merit of this particular thread? It reinforces those arguments
> >>>>>> against proposal P that suggest that the introduction of a more complex
> >>>>>> syntax for one of the delimiter types is a potential source of confusion
> >>>>>> for many existing CIF users.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> Simon
> >>>>>> ________________________________
> >>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>> <ddlm-group@iucr.org>
> >>>>>> Sent: Tuesday, 22 February, 2011 20:22:57
> >>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>
> >>>>>> Dear Simon,
> >>>>>>
> >>>>>> I make mistakes on this, too. That is why I like having IDLE
> >>>>>> handy and sticking to Python syntax.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Herbert
> >>>>>>
> >>>>>> =====================================================
> >>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>
> >>>>>> +1-631-244-3035
> >>>>>> yaya@dowling.edu
> >>>>>> =====================================================
> >>>>>>
> >>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>
> >>>>>>> Dear Herbert - I've just realized I confused myself by misreading your
> >>>>>>> example
> >>>>>>> and treating it as equivalent to my own example! Sorry about that.
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Simon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> >>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>> <ddlm-group@iucr.org>
> >>>>>>> Sent: Tuesday, 22 February, 2011 14:51:03
> >>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>> Dear Herbert
> >>>>>>>
> >>>>>>> I'm still a bit confused. Following python semantics,
> >>>>>>> a CIF application reading the following items
> >>>>>>>
> >>>>>>> _item_a """C\""""
> >>>>>>> _item_b r"""C\""""
> >>>>>>>
> >>>>>>> should return values of
> >>>>>>>
> >>>>>>> C" for _item_a
> >>>>>>> C\" for _item_b
> >>>>>>>
> >>>>>>> Are you suggesting that the application should then *assume* that in
> >>>>>>> the case of _item_b the use of the backslash was purely to escape the
> >>>>>>> final quote and should discard the backslash from the value, thus
> >>>>>>> assuming a value of C" ?
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Simon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>> <ddlm-group@iucr.org>
> >>>>>>> Sent: Tuesday, 22 February, 2011 13:51:02
> >>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>> Dear Simon,
> >>>>>>>
> >>>>>>> From the point of view of writing a pure "CIF2" application
> >>>>>>> that is not aware of the whitespace, particular quote marks,
> >>>>>>> comments, etc, those two strings are identical.
> >>>>>>>
> >>>>>>> From the point of view of a more general CIF API, in which
> >>>>>>> comments, magic numbers, and particular quote marks are exposed,
> >>>>>>> those two strings are different in precisely the same way that
> >>>>>>> the strings 'ABC' and "ABC" are different, and 13.4 and
> >>>>>>> 1.34e1 are different.
> >>>>>>>
> >>>>>>> This is _not_ an ambiguity. It is a matter of whether
> >>>>>>> we are looking for the information in a file or looking
> >>>>>>> for the representations of the data in the file.
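The distinction drawn above between the information in a file and its representation can be sketched as follows. The `Token` class and `same_value` helper are hypothetical, and Python's literal rules stand in for a CIF2 lexer; neither is taken from any CIF library.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical parsed item that remembers the text as written in the file."""

    source: str

    @property
    def value(self):
        # Python literal semantics used here as a stand-in for a CIF2 lexer.
        return ast.literal_eval(self.source)


def same_value(a, b):
    """Compare what the tokens mean, ignoring how they were written."""
    return a.value == b.value


single, double = Token("'ABC'"), Token('"ABC"')
plain, exponent = Token("13.4"), Token("1.34e1")

print(same_value(single, double), single == double)
print(same_value(plain, exponent), plain == exponent)
```

An application that only consumes data would compare with `same_value`; a faithful-copy tool would compare the tokens themselves, which differ whenever the spelling differs.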
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Herbert
> >>>>>>>
> >>>>>>>
> >>>>>>> =====================================================
> >>>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>>
> >>>>>>> +1-631-244-3035
> >>>>>>> yaya@dowling.edu
> >>>>>>> =====================================================
> >>>>>>>
> >>>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>
> >>>>>>>> So
> >>>>>>>> """\\\"""" and r"""\""""
> >>>>>>>> should strictly be treated as different, despite any recommendations
> >>>>>>>> you may have made to the contrary?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>____________________________________________________________________________
> >>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>>> <ddlm-group@iucr.org>
> >>>>>>>> Sent: Tuesday, 22 February, 2011 12:46:57
> >>>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>
> >>>>>>>>> So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>> Is it C\" or is it C" ?
> >>>>>>>>
> >>>>>>>> """C\"""" is C"
> >>>>>>>>
> >>>>>>>> r"""C\"""" is C\"
> >>>>>>>>
> >>>>>>>> You can test this with IDLE. It is very clearly defined and
> >>>>>>>> reproducible Python string behavior, and I believe helps to make
> >>>>>>>> the case for sticking to the Python approach. It is very easy
> >>>>>>>> for any software developer or user to work out how the boundary
> >>>>>>>> cases are being handled.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Herbert
> >>>>>>>>
> >>>>>>>> =====================================================
> >>>>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>>>
> >>>>>>>> +1-631-244-3035
> >>>>>>>> yaya@dowling.edu
> >>>>>>>> =====================================================
> >>>>>>>>
> >>>>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>>
> >>>>>>>>> I am a little confused:
> >>>>>>>>>
> >>>>>>>>> So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>> Is it C\" or is it C" ?
> >>>>>>>>>
> >>>>>>>>> Python says it should be C\", so CIF2 should say it's C\" if CIF2 is
> >>>>>>>>> adopting Python?
> >>>>>>>>>
> >>>>>>>>> Or are you suggesting that we should adopt a fuzzy interpretation of
> >>>>>>>>> Python?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> Simon
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> ________________________________
> >>>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>>>> <ddlm-group@iucr.org>
> >>>>>>>>> Sent: Tuesday, 22 February, 2011 12:01:23
> >>>>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>>
> >>>>>>>>> Dear Colleagues,
> >>>>>>>>>
> >>>>>>>>> Working under the assumption of Ralf's proposal, rather
> >>>>>>>>> than Simon's, we have several very distinct string presentations
> >>>>>>>>> to consider: a (non-raw) treble quoted string, a raw treble
> >>>>>>>>> quoted string, a unicode treble quoted string and a raw unicode
> >>>>>>>>> treble quoted string. As for Python 3, under CIF2, because
> >>>>>>>>> the "native" character encoding is UTF-8, under reasonable coding
> >>>>>>>>> constraints, this collapses to just two cases the application
> >>>>>>>>> needs to deal with: non-raw (i.e. cooked) versus raw. The intent
> >>>>>>>>> of the cooked is for the lexer to process the elides, so the response
> >>>>>>>>> I gave is, I believe, correct -- just push the string through IDLE.
> >>>>>>>>> The intent of the raw is precisely to push through the string
> >>>>>>>>> with the backslashes still in place, e.g. for TeX text in which
> >>>>>>>>> you don't want to double-up your backslashes. While I personally
> >>>>>>>>> would recommend against such a use of raw, it is not ambiguous.
> >>>>>>>>> It gives the application a very well-defined string of characters
> >>>>>>>>> to deal with. Yes, there are applications that are intended to
> >>>>>>>>> deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif,
> >>>>>>>>> etc.) but, I agree that the cleanest design is for an application to
> >>>>>>>>> only make use of the string content, not the representation.
> >>>>>>>>> Thus, for most applications, I would recommend that they treat
> >>>>>>>>>
> >>>>>>>>> """\\\"""" and r"""\""""
> >>>>>>>>>
> >>>>>>>>> as equivalent, but for applications that are, for example,
> >>>>>>>>> intended to do faithful copies of the representations that
> >>>>>>>>> they treat them as different.
> >>>>>>>>>
> >>>>>>>>> We have had, and will continue to have this subtle problem
> >>>>>>>>> with all versions of CIF in the handling of things such as
> >>>>>>>>> magic numbers, comments, white space, line folding, and choices
> >>>>>>>>> of quoting characters. I don't see how the introduction of
> >>>>>>>>> the Python treble quote makes the situation any worse or
> >>>>>>>>> any more or less ambiguous.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Herbert
> >>>>>>>>>
> >>>>>>>>> =====================================================
> >>>>>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>>>>
> >>>>>>>>> +1-631-244-3035
> >>>>>>>>> yaya@dowling.edu
> >>>>>>>>> =====================================================
> >>>>>>>>>
> >>>>>>>>> On Tue, 22 Feb 2011, James Hester wrote:
> >>>>>>>>>
> >>>>>>>>>> I will focus this email on the technical issues and try to return
> >>>>>>>>>> to the other issues at a later date (I've changed the subject
> >>>>>>>>>> accordingly)
> >>>>>>>>>>
> >>>>>>>>>> [edit]
> >>>>>>>>>>
> >>>>>>>>>> My apologies for not being clear: my examples of embedded elides
> >>>>>>>>>> already give the internal representation of the strings,
> >>>>>>>>>> deliberately leaving out the particular delimiters that might have
> >>>>>>>>>> been used to produce those strings. Herbert mistakenly thought I was
> >>>>>>>>>> giving triple-double-quote delimited strings and asking what the
> >>>>>>>>>> internal representation was. So, unfortunately, IDLE cannot help
> >>>>>>>>>> here, as the internal representation is not in question.
> >>>>>>>>>>
> >>>>>>>>>> My question therefore remains: how does the CIF application
> >>>>>>>>>> interpret these strings? Is the <backslash><delimiter> in my examples
> >>>>>>>>>> simply an elide that could not be removed from a raw string and
> >>>>>>>>>> therefore should be ignored, or is it a character sequence intended
> >>>>>>>>>> for the application (eg a LaTeX accent on the o or e)?
> >>>>>>>>>>
> >>>>>>>>>> In your answer you may assume that the CIF application knows that
> >>>>>>>>>> the string was a raw string delimited by triple double quotes (even
> >>>>>>>>>> though requiring communication of such information would be a very
> >>>>>>>>>> unfortunate loss of clean design).
> >>>>>>>>>>
> >>>>>>>>>> Those strings again:
> >>>>>>>>>>
> >>>>>>>>>> <start> I have no idea what the last characters of this string
> >>>>>>>>>> are\"<finish>
> >>>>>>>>>>
> >>>>>>>>>> <start> Does this string have two\""" or three internal
> >>>>>>>>>> quotes?<finish>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Herbert writes:
> >>>>>>>>>>>
> >>>>>>>>>>> Now for your two examples of embedded elides of quotes:
> >>>>>>>>>>>
> >>>>>>>>>>> <start> I have no idea what the last characters of this string
> >>>>>>>>>>> are\"<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally, as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> I have no idea what the last characters of this string are"\0
> >>>>>>>>>>>
> >>>>>>>>>>> <start> Does this string have two\""" or three internal
> >>>>>>>>>>> quotes?<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> Does this string have two""" or three internal quotes?\0
> >>>>>>>>>>>
> >>>>>>>>>>> I settled that by simply cranking up IDLE and doing:
> >>>>>>>>>>>
> >>>>>>>>>>> >>> print """I have no idea what the last characters of this string are\""""
> >>>>>>>>>>> I have no idea what the last characters of this string are"
> >>>>>>>>>>> >>> print """Does this string have two\""" or three internal quotes?"""
> >>>>>>>>>>> Does this string have two""" or three internal quotes?
> >>>>>>>>>>>
> >>>>>>>>>>> As you well know, having IDLE around is a big help.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you again for taking the time to clarify your position
> >>>>>>>>>>> on Ralf's proposal. I think I now understand why you prefer
> >>>>>>>>>>> Simon's proposal.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Herbert
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> One technical issue with Proposal P that has not been resolved is
> >>>>>>>>>>>> how a CIF application is supposed to interpret the sequence
> >>>>>>>>>>>> <backslash><double quote> when encountered in a string returned
> >>>>>>>>>>>> from the parser. Is this sequence:
> >>>>>>>>>>>> (a) a terminator elide sequence that was left in a raw string, so
> >>>>>>>>>>>> corresponds to <double quote>?
> >>>>>>>>>>>> (b) something with meaning for the application so should be
> >>>>>>>>>>>> <backslash><double quote>?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Please therefore advise how a CIF application will disambiguate
> >>>>>>>>>>>> the following string content from a Proposal P parser:
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> I have no idea what the last characters of this string
> >>>>>>>>>>>> are\"<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> Does this string have two\""" or three internal
> >>>>>>>>>>>> quotes?<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> James
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> T +61 (02) 9717 9907
> >>>>>>>>>> F +61 (02) 9717 3145
> >>>>>>>>>> M +61 (04) 0249 4148
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> ddlm-group mailing list
> >>>>>>>>>> ddlm-group@iucr.org
> >>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> _______________________________________________
> >>>>>> ddlm-group mailing list
> >>>>>> ddlm-group@iucr.org
> >>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> T +61 (02) 9717 9907
> >>>>> F +61 (02) 9717 3145
> >>>>> M +61 (04) 0249 4148
> >>>>> _______________________________________________
> >>>>> ddlm-group mailing list
> >>>>> ddlm-group@iucr.org
> >>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>>>
> >>>> _______________________________________________
> >>>> ddlm-group mailing list
> >>>> ddlm-group@iucr.org
> >>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> T +61 (02) 9717 9907
> >>> F +61 (02) 9717 3145
> >>> M +61 (04) 0249 4148
> >>> _______________________________________________
> >>> ddlm-group mailing list
> >>> ddlm-group@iucr.org
> >>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>
> >> _______________________________________________
> >> ddlm-group mailing list
> >> ddlm-group@iucr.org
> >> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >>
> >>
> >
> >
> >
> >--
> >T +61 (02) 9717 9907
> >F +61 (02) 9717 3145
> >M +61 (04) 0249 4148
> >_______________________________________________
> >ddlm-group mailing list
> >ddlm-group@iucr.org
> >http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>
> --
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>
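For readers following the thread, the parser-level behaviour that the quoted messages go back and forth about can be checked with a short Python sketch. It assumes, as Proposal P does, that CIF2 triple-quoted strings would follow Python's rules. The helper `parse_literal` and the variable names are illustrative only and come from neither the proposal nor any CIF library.

```python
import ast

def parse_literal(source: str) -> str:
    """Evaluate the source text of a Python string literal (hypothetical helper)."""
    return ast.literal_eval(source)

# Build the literal source text character by character, so that no
# escaping in this script hides which characters the parser sees.
BS, DQ = chr(92), chr(34)                     # backslash, double quote

cooked_src = DQ * 3 + "C" + BS + DQ + DQ * 3            # """C\""""
raw_src = "r" + cooked_src                              # r"""C\""""
doubled_src = DQ * 3 + "C" + BS * 3 + DQ + DQ * 3       # """C\\\""""

cooked = parse_literal(cooked_src)      # C"   (backslash consumed by the parser)
raw = parse_literal(raw_src)            # C\"  (backslash kept in the value)
doubled = parse_literal(doubled_src)    # C\"  (backslash is intended content)

print(repr(cooked), repr(raw), repr(doubled))
print("raw == doubled:", raw == doubled)
```

The last comparison is the crux of James's question: the raw literal and the cooked literal with a doubled backslash hand the application the same three characters, so the value alone does not say whether the backslash was an elide left behind or intended content.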
anything new in terms of raw strings (i.e. all other delimiters
delimit raw strings) - rather we are giving people the opportunity to
use 'cooked' strings. If this boils down to a matter of taste, I'm not
convinced it justifies the potential confusion for users or the extra
burden on developers.
Cheers
Simon
From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Thursday, 24 February, 2011 13:02:08
Subject: Re: [ddlm-group] Technical issues with Proposal P
Dear Simon,
Yes, the closest approximation to the current line folding
would be a cooked python style treble-quoted string.
The main use for the raw strings is for people who don't
like having to double-up backslashes to present things like
TeX. Not my taste, but some people like it, and there
is no downside that I can see in giving them the capability.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya@dowling.edu
=====================================================
On Thu, 24 Feb 2011, SIMON WESTRIP wrote:
> A possible minor inconvenience of proposal P:
>
> Given that CIF strings are essentially 'raw' as CIF2 now stands (and CIF1 strings too can
> be reconciled with the raw variant), and as I understand it python raw strings do not
> support line continuation,
> under proposal P a string will have to be 'cooked' in order to employ line folding?
>
> Please forgive me if these seems a little trivial, but I am really struggling to see any
> benefit
> in adopting proposal P, especially for the end user. Maybe someone can help by providing
> an example where the use of cooked strings will make life easier for the end user?
>
> Cheers
>
> Simon
>
> __________________________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 23 February, 2011 13:06:15
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> I don't know if this will help much in making a choice about proposal P, but
> it might be worth looking at current practice in the case of one CIF user group -
> namely authors submitting CIFs to journals.
>
> In many respects an F-type scheme has been used for preparing the text sections of
> CIFs for publication for many years. The backslash is used to escape accents and greek
> letters
> as well as itself, e.g. \"u for uumul, \a for alpha, \\a for \a...
> In addition, we have the line-folding protocol, although that is rarely necessary and
> very rarely applied manually. Although these 'common semantic features' will not be
> a part of CIF2, they may well remain in use at the application level. Under scheme F,
> a field containing this markup delimited by semicolons could readily be dropped
> into a field delimited by tripple quotes (though double-backslash control sequences
> would be returned as single-backslash control sequences - but fortunately there are few of
> these in use, e.g. \\rightarrow...). Under a P-type proposal, extra care would be required
> when choosing between the 'raw' and 'cooked' variants before dumping the contents of
> a semi-colon delimited field into a tripple-quoted field.
>
> I've mentioned before that handling the transition from CIF1's 'common semantic features'
> to CIF2 unicode will require care in any case; I've yet to be convinced that complex
> python semantics will help here, nor offer any real benefit in general, given that by
> only adopting them for one means of delimiting a data value, great care has to be taken
> when
> switching delimiters (and for no obvious reason or benefit if you're only concern when
> working with a
> raw CIF is to complete it for publication purposes).
>
> Cheers
>
> Simon
>
> __________________________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 23 February, 2011 4:40:39
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> Dear James,
>
> I don't see any reason to disagree with your summary of limitations
> on the use of raw strings. That is why I find it easier to use cooked
> strings. But there are people who like them, so, since, once you
> have Python-style triple quotes at all, it is easy to support them,
> I would be inclined to do so, rather than have a one-size fits all
> solution.
>
> The real question is whether to support the cooked Python strings
> with all Python elides or the more limited set of elides in proposal
> F. I repeat my suggestion that people try working with both as
> I have for years, and we can see which proves more or less confusing
> in what situations. I have always found the mixing of things
> like TeX with its backslashes with the line folding protocol without
> doubling up the TeX backslashes very confusing. Maybe it is just the
> way my head works. Maybe other people will have a different view.
> We won't know until people other than me try it.
>
> Regards,
> Herbert
>
> At 1:35 PM +1100 2/23/11, James Hester wrote:
> >I absolutely agree that it is up to the application to decide on the
> >meanings of strings given to it, with reference to the dictionary. I
> >am very happy that we agree on this iron separation between syntax and
> >content.
> >
> >In a situation that <backslash><delimiter> has meaning to a CIF
> >application, it follows that you cannot in general use raw strings to
> >express a data value that
> >(i) contains both triple double quotes and triple single quotes
> >(ii) contains triple double quotes and terminates with a single quote
> >(iii) contains triple single quotes and terminates with a double quote
> >
> >Furthermore, you cannot use triple-double-quote delimited raw strings
> >to delimit a string terminating in a double quote and/or containing
> >triple double quotes, likewise for single quote.
> >
> >These idiosyncracies would need to be documented if we were to adopt
> >Proposal P, and we would need to be confident that CIF2 implementers
> >and users would not make inadvertent errors in their selection of
> >quotes and string types. As the manifestation of an error would
> >typically be nothing more than a stray accent, there is no way, beyond
> >careful proof-reading, that such mistakes would be caught.
> >
> >Proposals F and F' give less opportunity for error and are simpler to use.
> >
> >On Wed, Feb 23, 2011 at 11:27 AM, Herbert J. Bernstein
> ><yaya@bernstein-plus-sons.com> wrote:
> >> Dear James,
> >>
> >> I am really lost here. I believe it is up to the application
> >> to decide on the meaning of strings given as tag value, hopefully
> >> using a dictionary to inform that decision. I really don't
> >> see what the use of r""" versus """ versus ; versus ' versus "
> >> to get the string into its internal form has to do with its
> >> meaning to the application, unless the application is one
> >> of these CIF copy/transform applications that violate some
> >> of the CIF rules to see through to the original represenatation
> >> rather than stopping with the data, and then we are moving outside the
> >> rules of CIF itself to more general text processing.
> >>
> >> As I said, one of the nicer uses for the raw treble quote strings
> >> is to bring TeX into an application without having to double-up
> >> backslashes. That is a very clear case in which the application
> >> have very different backslash processing than Python and you
> >> want to suppress most of the Python processing. If what you
> >> are trying to do is to elide the quote marks, then you will have
> >> an easier time using the regular treble quotes.
> >>
> >> Regards,
> >> Herbert
> >> =====================================================
> >> Herbert J. Bernstein, Professor of Computer Science
> >> Dowling College, Kramer Science Center, KSC 121
> >> Idle Hour Blvd, Oakdale, NY, 11769
> >>
> >> +1-631-244-3035
> > > yaya@dowling.edu
> > > =====================================================
> >>
> >> On Wed, 23 Feb 2011, James Hester wrote:
> >>
> >>> Dear Herbert,
> >>>
> >>> Because raw strings must retain any eliding backslashes in the string
> >>> (unlike cooked strings), a backslash in the internal string
> >>> representation may indeed be an artefact of the syntax proposed in
> >>> Proposal P. Or might not. The application can't always tell. See my
> >>> other email for a way to resolve this.
> >>>
> >>> If everything is so clear, could you please just answer the following
> >>> rephrased questions? "The CIF application" refers to an application
> >>> for which <backslash><delimiter> means "accent the letter preceding
> >>> the backslash".
> >>>
> >>> Should the CIF application interpret the first string as finishing
> >>> with a double quote, or with an accented e?
> >>> Should the CIF application interpret the second string as containing
> >>> an accented o, followed by two double quotes, or a letter o followed
> >>> by three quotes?
> >>>
> >>> On Wed, Feb 23, 2011 at 10:16 AM, Herbert J. Bernstein
> >>> <yaya@bernstein-plus-sons.com> wrote:
> >>>>
> >>>> Dear James,
> >>>>
> >>>> I still don't understand. Neither python nor I think \"
> >>>> from a raw string is an artifact of anything. It is
> >>>> just a backslash followed by a double quotemark. The
> >>>> point of the raw string is to provide a quick and
> >>>> convenient way to input something like TeX without
> >>>> having to double-up the backsashes. Personally, I am
> >>>> happy to double up the backslashes, but I can see the
> >>>> value to people who have to deal with lots of TeX in
> >>>> not needing to do so.
> >>>>
> >>>>> Does the first string finish with a double quote, or with an accented e?
> >>>>> Does the second string contain an accented o, followed by two double
> >>>>> quotes, or a letter o followed by three quotes?
> >>>>
> >>>> are not questions related to the quoting mechanism used, but
> >>>> purely to the application. Working purely in CIF1.1 all
> >>>> of the following are equivalent, external representations:
> >>>>
> >>>> Set 1
> >>>> ;\
> >>>> I have no idea what the last characters of this string are\"\
> >>>> ;
> >>>> 'I have no idea what the last characters of this string are\"'
> >>>> "I have no idea what the last characters of this string are\""
> >>>>
> >>>> and in all cases the last 2 characters are backslash followed by
> >>>> double quote
> >>>>
> >>>> Set 2
> >>>> ;\
> >>>> Does this string have two\""" or three internal quotes?\
> >>>> ;
> >>>> 'Does this string have two\""" or three internal quotes?'
> >>>>
> >>>> and in both cases there are three internal quotes
> >>>>
> >>>> I don't see how this differs in any way from
> >>>>
> >>>> r'''I have no idea what the last characters of this string are\"'''
> >>>> or
> >>>> '''I have no idea what the last characters of this string are\\"'''
> >>>> or
> >>>> """I have no idea what the last characters of this string are\\\""""
> >>>>
> >>>> and
> >>>>
> >>>> r'''Does this string have two\""" or three internal quotes?'''
> >>>> or
> >>>> '''Does this string have two\\""" or three internal quotes?'''
> >>>> or
> >>>> """Does this string have two\\\"\"\" or three internal quotes?"""
> >>>>
> >>>> There are very real problems with the raw string that are noted
> >>>> in the Pyhton documentation, but they do have their uses. This
> >>>> ambiguity is not one of the problems.
> >>>>
> >>>> Regards,
> >>>> Herbert
> >>>>
> >>>> =====================================================
> >>>> Herbert J. Bernstein, Professor of Computer Science
> >>>> Dowling College, Kramer Science Center, KSC 121
> >>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>
> >>>> +1-631-244-3035
> >>>> yaya@dowling.edu
> >>>> =====================================================
> >>>>
> >>>> On Wed, 23 Feb 2011, James Hester wrote:
> >>>>
> >>>>> I am trying to focus relentlessly on a particular and very real
> >>>>> technical issue. I repeat that I am not concerned about the
> >>>>> transformation from surface syntax to a sequence of characters. I
> >>>>> accept that that is well-defined and unambiguous for all proposals on
> >>>>> the table. If you think that IDLE can resolve this problem, you
> >>>>> haven't understood my question.
> >>>>>
> >>>>> My question relates to the next step: how does the CIF application
> > >>>> downstream from the parser interpret this sequence of characters?
> >>>>> Under all previous incarnations of CIF, it was safe to assume that no
> >>>>> artefacts of syntactical representation were left in the string, so
> >>>>> the string had purely domain-specific meaning. However, with the
> >>>>> introduction of raw strings, <backslash><delimiter> will escape the
> >>>>> delimiter, but the <backslash> is required to remain in the string.
> >>>>> So the downstream application must decide between artefacts of the
> >>>>> syntactical representation (<backslash><delimiter>) that have remained
> >>>>> in raw strings, and domain-specific character sequences
> >>>>> (<backslash><delimiter>). Here those examples are again (remember
> >>>>> this is the character sequence after syntactic processing):
> >>>>>
> >>>>> <start> I have no idea what the last characters of this string
> >>>>> are\"<finish>
> >>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>
> >>>>> Assume the domain-specific meaning of <backslash><quote> when found in
> >>>>> a datavalue is to accent the letter preceding the <backslash>.
> >>>>>
> >>>>> Does the first string finish with a double quote, or with an accented e?
> >>>>> Does the second string contain an accented o, followed by two double
> >>>>> quotes, or a letter o followed by three quotes?
> >>>>>
> >>>>>
> >>>>> On Wed, Feb 23, 2011 at 8:01 AM, SIMON WESTRIP
> >>>>> <simonwestrip@btinternet.com> wrote:
> >>>>>>
> >>>>>> Dear all
> >>>>>>
> >>>>>> Reviewing the exchanges in this thread ("Technical issues with Proposal
> >>>>>> P"),
> >>>>>> it seems that
> >>>>>> the 'technical issues' might better be described as 'potentially
> >>>>>> confusing
> >>>>>> issues' :-)
> >>>>>> That is, under proposal P, there is no ambiguity about how the string
> >>>>>> should
> >>>>>> be read, but
> >>>>>> there is potential for misinterpretation by the user (e.g. an erroneous
> >>>>>> assumption that by using a backslash
> >>>>>> to escape a quotation mark, the backslash will not be included as part
> >>>>>> of
> >>>>>> the parsed data value (in the raw variant)).
> >>>>>> So, as John says, perhaps this simply demonstrates that "the complexity
> >>>>>> of
> >>>>>> the syntax and semantics
> >>>>>> provided by proposal P would be likely to be a source of confusion for
> >>>>>> developers and users both", and maybe
> >>>>>> therein lies the merit of this particular thread? It reinforces those
> >>>>>> arguements against proposal P that suggest
> >>>>>> that the introduction of a more complex syntax for one of the delimiter
> >>>>>> types is a potential source of
> >>>>>> confusion for many existing CIF users.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> Simon
> >>>>>> ________________________________
> >>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>> <ddlm-group@iucr.org>
> >>>>>> Sent: Tuesday, 22 February, 2011 20:22:57
> >>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>
> >>>>>> Dear Simon,
> >>>>>>
> >>>>>> I make mistakes on this, too. That is why I like having IDLE
> >>>>>> handy and sticking to Python syntax.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Herbert
> >>>>>>
> >>>>>> =====================================================
> >>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>
> >>>>>> +1-631-244-3035
> >>>>>> yaya@dowling.edu
> >>>>>> =====================================================
> >>>>>>
> >>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>
> >>>>>>> Dear Herbert - I've just realized I confused myself by misreading your
> >>>>>>> example
> >>>>>>> and treating it as equivalent to my own example! Sorry about that.
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Simon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>__________________________________________________________________________________
> _____________________________________
> >>>>>>> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> >>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>> <ddlm-group@iucr.org>
> >>>>>>> Sent: Tuesday, 22 February, 2011 14:51:03
> >>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >>>>>>
> >>>>>>> Dear Herbert
> >>>>>>>
> >>>>>>> I'm still a bit confused. Following python semantics,
> >>>>>>> a CIF application reading the following items
> >>>>>>>
> >>>>>>> _item_a """C\""""
> >>>>>>> _item_b r"""C\""""
> >>>>>>>
> >>>>>>> should return values of
> >>>>>>>
> >>>>>>> C" for _item_a
> >>>>>>> C\" for _item_b
> >>>>>>>
> >>>>>>> Are you suggesting that the application should then *assume* that in
> >>>>>>> the
> >>>>>>> case of
> >>>>>>> _item_b the use of the backslash was purely to escape the final quote
> >>>>>>> and
> >>>>>>> should
> >>>>>>> discard the backslash from the value, thus assuming a value of C" ?
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Simon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>__________________________________________________________________________________
> _____________________________________
> >>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>> <ddlm-group@iucr.org>
> >>>>>>> Sent: Tuesday, 22 February, 2011 13:51:02
> >>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>> Dear Simon,
> >>>>>>>
> >>>>>>> From the point of view of writing a pure "CIF2" application
> >>>>>>> that is not aware of the whitespace, particular quote marks,
> >>>>>>> comments, etc, those two string are identical.
> >>>>>>>
> >>>>>>> From the point of view of a more general CIF API, in which
> >>>>>>> comments, magic numbers, and partiular quote marks, those
> >>>>>>> two string are different in precisely the same way that
> >>>>>>> the string 'ABC' and "ABC" are different, and 13.4 and
> >>>>>>> 1.34e1 are different.
> >>>>>>>
> >>>>>>> This is _not_ an ambiguity. It is a matter of whether
> >>>>>>> we are looking for the information in a file or looking
> >>>>>>> for the representations of the data in the file.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Herbert
> >>>>>>>
> >>>>>>>
> >>>>>>> =====================================================
> >>>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>>
> >>>>>>> +1-631-244-3035
> >>>>>>> yaya@dowling.edu
> >>>>>>> =====================================================
> >>>>>>>
> >>>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>
> >>>>>>>> So
> >>>>>>>> """\\\"""" and r"""\""""
> >>>>>>>> should strictly be treated as different, despite any recommendations
> >>>>>>>> you
> >>>>>>>> may
> >>>>>>>> have made to the contrary?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>____________________________________________________________________________
> >>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>>> <ddlm-group@iucr.org>
> >>>>>>>> Sent: Tuesday, 22 February, 2011 12:46:57
> >>>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>
> >>>>>>>>> So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>> Is it C\" or is it C" ?
> >>>>>>>>
> >>>>>>>> """C\"""" is C"
> >>>>>>>>
> >>>>>>>> r"""C\"""" is C\"
> >>>>>>>>
> >>>>>>>> You can test this with IDLE. It is very clearly defined and
> >>>>>>>> reproducible Python string behavior, and I believe helps to make
> >>>>>>>> the case for sticking to the Python approach. It is very easy
> >>>>>>>> for any software developer or user to work out how the boundary
> >>>>>>>> cases are being handled.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Herbert
> >>>>>>>>
> >>>>>>>> =====================================================
> >>>>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>>>
> >>>>>>>> +1-631-244-3035
> >>>>>>>> yaya@dowling.edu
> >>>>>>>> =====================================================
> >>>>>>>>
> >>>>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>>
> >>>>>>>>> I am a little confused:
> >>>>>>>>>
> >>>>>>>>> So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>> Is it C\" or is it C" ?
> >>>>>>>>>
> >>>>>>>>> Python says it should be C\", so CIF2 should say its C\" if CIF2 is
> >>>>>>>>
> >>>>>>>> adopting
> >>>>>>>>>
> >>>>>>>>> Python?
> >>>>>>>>>
> >>>>>>>>> Or are you suggesting that we should adopt a fuzzy interpretation of
> > >>>>>>>
> >>>>>>>> Python?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> Simon
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>___________________________________________________________________________
> >>>>>>>>
> >>>>>>>> _
> >>>>>>>>>
> >>>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>>>> To: Group finalising DDLm and associated dictionaries
> >>>>>>>>
> >>>>>>>> <ddlm-group@iucr.org>
> >>>>>>>>>
> >>>>>>>>> Sent: Tuesday, 22 February, 2011 12:01:23
> >>>>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>>
> >>>>>>>>> Dear Colleagues,
> >>>>>>>>>
> >>>>>>>>> Working under the assumption of Ralf's proposal, rather
> >>>>>>>>> than Simon's, we have several very distinct string presentaions
> >>>>>>>>> to consider: a (non-raw) treble quoted string, a raw treble
> >>>>>>>>> quoted string a unicode treble quoted string and a raw unicode
> >>>>>>>>> treble quoted string. As for Python 3, under CIF2, because
> >>>>>>>>> the "native" character encoding is UTF-8, under reasonable coding
> >>>>>>>>> constraints, this collapses to just two cases the application
> >>>>>>>>> needs to deal with: non-raw (i.e. cooked) versus raw. The intent
> >>>>>>>>> of
> >>>>>>>>> the cooked is for the lexer to process the elides, so the response
> >>>>>>>>> I gave is, I believe correct -- just push the string through IDLE.
> >>>>>>>>> The intent of the raw is precisely to push through the string
> >>>>>>>>> with the backslahes still in place, e.g. for TeX text in which
> >>>>>>>>> you don't want to double-up your backslashes. While I personally
> >>>>>>>>> would recommend against such a use of raw, it is not ambiguous.
> >>>>>>>>> It gives the application a very well-defined string of characters
> >>>>>>>>> to deal with. Yes, there are applications that are intended to
> >>>>>>>>> deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif,
> >>>>>>>>> etc.)
> >>>>>>>>> bit, I agree that the cleanest design is for an application to
> >>>>>>>>> only make use of the string content, not the representation.
> >>>>>>>>>
> >>>>>>>>> Thus, for most applications, I would recommend that they treat
> >>>>>>>>>
> >>>>>>>>> """\\\"""" and r"""\""""
> >>>>>>>>>
> >>>>>>>>> as equivalent, but for applications that are, for example,
> >>>>>>>>> intended to do faithful copies of the representations that
> >>>>>>>>> they treat them as different.
> >>>>>>>>>
> >>>>>>>>> We have had, and will continue to have this subtle problem
> >>>>>>>>> with all versions of CIF in the handling of things such as
> >>>>>>>>> magic number, comments, white space, line folding, and choices
> >>>>>>>>> of quoting characters. I don't see how the introduction of
> >>>>>>>>> the Python treble quote makes the situation any worse or
> >>>>>>>>> any more or less ambiguous.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Herbert
> >>>>>>>>>
> >>>>>>>>> =====================================================
> >>>>>>>>> Herbert J. Bernstein, Professor of Computer Science
> >>>>>>>>> Dowling College, Kramer Science Center, KSC 121
> >>>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
> >>>>>>>>>
> >>>>>>>>> +1-631-244-3035
> >>>>>>>>> yaya@dowling.edu
> >>>>>>>>> =====================================================
> >>>>>>>>>
> >>>>>>>>> On Tue, 22 Feb 2011, James Hester wrote:
> >>>>>>>>>
> >>>>>>>>>> I will focus this email on the technical issues and try to return
> >>>>>>>>>> to
> >>>>>>>>>> the other issues at a later date (I've changed the subject
> >>>>>>>>>> accordingly)
> >>>>>>>>>>
> >>>>>>>>>> [edit]
> >>>>>>>>>>
> >>>>>>>>>> My apologies for not being clear: my examples of embedded elides
> >>>>>>>>>> already give the internal representation of the strings,
> >>>>>>>>>> deliberately
> >>>>>>>>>> leaving out the particular delimiters that might have been used to
> >>>>>>>>>> produce those strings. Herbert mistakenly thought I was giving
> >>>>>>>>>> triple-double-quote delimited strings and asking what the internal
> >>>>>>>>>> representation was. So, unfortunately, IDLE cannot help here, as
> >>>>>>>>>> the
> >>>>>>>>>> internal representation is not in question.
> >>>>>>>>>>
> >>>>>>>>>> My question therefore remains: how does the CIF application
> >>>>>>>>>> interpret
> >>>>>>>>>> these strings? Is the <backslash><delimiter> in my examples simply
> > >>>>>>>>> an
> >>>>>>>>>> elide that could not be removed from a raw string and therefore
> >>>>>>>>>> should
> >>>>>>>>>> be ignored, or is it a character sequence intended for the
> >>>>>>>>>> application
> >>>>>>>>>> (eg a LaTeX accent on the o or e)?
> >>>>>>>>>>
> >>>>>>>>>> In your answer you may assume that the CIF application knows that
> >>>>>>>>>> the
> >>>>>>>>>> string was a raw string delimited by triple double quotes (even
> >>>>>>>>>> though
> >>>>>>>>>> requiring communication of such information would be a very
> >>>>>>>>>> unfortunate loss of clean design).
> >>>>>>>>>>
> >>>>>>>>>> Those strings again:
> >>>>>>>>>>
> >>>>>>>>>> <start> I have no idea what the last characters of this string
> >>>>>>>>>
> >>>>>>>>> are\"<finish>
> >>>>>>>>>>
> >>>>>>>>>> <start> Does this string have two\""" or three internal
> >>>>>>>>>> quotes?<finish>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Herbert writes:
> >>>>>>>>>>>
> >>>>>>>>>>> Now for your two examples of embedded elides of quotes:
> >>>>>>>>>>>
> >>>>>>>>>>> <start> I have no idea what the last characters of this string
> >>>>>>>>>
> >>>>>>>>> are\"<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally, as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> I have no idea what the last characters of this string are"\0
> >>>>>>>>>>>
> >>>>>>>>>>> <start> Does this string have two\""" or three internal
> >>>>>>>>>>> quotes?<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> Does this string have two""" or three internal quotes?\0
> >>>>>>>>>>>
> >>>>>>>>>>> I settled that by simply cranking up IDLE and doing:
> >>>>>>>>>>>
> >>>>>>>>>>>>>> print """I have no idea what the last characters of this
> >>>>>>>>>>>>>> string
> >>>>>>>>>>>>>> are\"""" I have no idea what the last characters of this string
> >>>>>>>>>>>>>> are" >>> print """Does this string have two\""" or three
> >>>>>>>>>>>>>> internal
> >>>>>>>>>>>>>> quotes?""" Does this string have two""" or three internal
> >>>>>>>>>>>>>> quotes?
> >>>>>>>>>>>
> >>>>>>>>>>> As you well know, having IDLE around is a big help.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you again for taking the time to clarify your position
> >>>>>>>>>>> on Ralf's proposal. I think I now understand why you prefer
> >>>>>>>>>>> Simon's
> >>>>>>>>>>> proposal.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Herbert
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>> One technical issue with Proposal P that has not been resolved is
> >>>>>>>>>>>> how
> >>>>>>>>>>>> a CIF application is supposed to interpret the sequence
> >>>>>>>>>>>> <backslash><double quote> when encountered in a string returned
> >>>>>>>>>>>> from
> >>>>>>>>>>>> the parser. Is this sequence:
> >>>>>>>>>>>> (a) a terminator elide sequence that was left in a raw string, so
> >>>>>>>>>>>> corresponds to <double quote>?
> >>>>>>>>>>>> (b) something with meaning for the application so should be
> >>>>>>>>>>>> <backslash><double quote>?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Please therefore advise how a CIF application will disambiguate
> >>>>>>>>>>>> the
> >>>>>>>>>>>> following string content from a Proposal P parser:
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> I have no idea what the last characters of this string
> >>>>>>>>>
> >>>>>>>>> are\"<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> Does this string have two\""" or three internal
> >>>>>>>>
> >>>>>>>> quotes?<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> James
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> T +61 (02) 9717 9907
> >>>>>>>>>> F +61 (02) 9717 3145
> >>>>>>>>>> M +61 (04) 0249 4148
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> ddlm-group mailing list
> >>>>>>>>>> ddlm-group@iucr.org
> >>>>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> --
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- Follow-Ups:
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- References:
- [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)