Re: [ddlm-group] Technical issues with Proposal P
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Technical issues with Proposal P
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Tue, 22 Feb 2011 15:22:57 -0500 (EST)
- In-Reply-To: <710426.91151.qm@web87002.mail.ird.yahoo.com>
- References: <AANLkTi=kadbHikjabDyioDOw=L_pthGORgi6w2b45yX6@mail.gmail.com><alpine.BSF.2.00.1102220644270.84613@epsilon.pair.com><417719.45449.qm@web87006.mail.ird.yahoo.com><alpine.BSF.2.00.1102220741480.84613@epsilon.pair.com><301639.7573.qm@web87001.mail.ird.yahoo.com><alpine.BSF.2.00.1102220845481.84613@epsilon.pair.com><228483.70348.qm@web87004.mail.ird.yahoo.com><710426.91151.qm@web87002.mail.ird.yahoo.com>
Dear Simon,

I make mistakes on this, too. That is why I like having IDLE handy
and sticking to Python syntax.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Tue, 22 Feb 2011, SIMON WESTRIP wrote:

> Dear Herbert - I've just realized I confused myself by misreading your example
> and treating it as equivalent to my own example! Sorry about that.
>
> Cheers
>
> Simon
>
> ______________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Tuesday, 22 February, 2011 14:51:03
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> Dear Herbert
>
> I'm still a bit confused. Following Python semantics,
> a CIF application reading the following items
>
> _item_a """C\""""
> _item_b r"""C\""""
>
> should return values of
>
> C" for _item_a
> C\" for _item_b
>
> Are you suggesting that the application should then *assume* that in the case of
> _item_b the use of the backslash was purely to escape the final quote and should
> discard the backslash from the value, thus assuming a value of C" ?
>
> Cheers
>
> Simon
>
> ______________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Tuesday, 22 February, 2011 13:51:02
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> Dear Simon,
>
> From the point of view of writing a pure "CIF2" application
> that is not aware of the whitespace, particular quote marks,
> comments, etc., those two strings are identical.
>
> From the point of view of a more general CIF API, in which
> comments, magic numbers, and particular quote marks are exposed,
> those two strings are different in precisely the same way that
> the strings 'ABC' and "ABC" are different, and 13.4 and
> 1.34e1 are different.
>
> This is _not_ an ambiguity. It is a matter of whether
> we are looking for the information in a file or looking
> for the representations of the data in the file.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
>
> > So
> > """\\\"""" and r"""\""""
> > should strictly be treated as different, despite any recommendations you may
> > have made to the contrary?
> >
> > ______________________________________________________________________
> > From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> > To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> > Sent: Tuesday, 22 February, 2011 12:46:57
> > Subject: Re: [ddlm-group] Technical issues with Proposal P
> >
> > > So what is r"""C\"""" ?
> > >
> > > Is it C\" or is it C" ?
> >
> > """C\"""" is C"
> >
> > r"""C\"""" is C\"
> >
> > You can test this with IDLE. It is very clearly defined and
> > reproducible Python string behavior, and I believe helps to make
> > the case for sticking to the Python approach. It is very easy
> > for any software developer or user to work out how the boundary
> > cases are being handled.
> >
> > Regards,
> > Herbert
> >
> > =====================================================
> > Herbert J. Bernstein, Professor of Computer Science
> > Dowling College, Kramer Science Center, KSC 121
> > Idle Hour Blvd, Oakdale, NY, 11769
> >
> > +1-631-244-3035
> > yaya@dowling.edu
> > =====================================================
> >
> > On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >
> > > I am a little confused:
> > >
> > > So what is r"""C\"""" ?
> > >
> > > Is it C\" or is it C" ?
> > >
> > > Python says it should be C\", so CIF2 should say it's C\" if CIF2 is
> > > adopting Python?
> > >
> > > Or are you suggesting that we should adopt a fuzzy interpretation of
> > > Python?
> > >
> > > Cheers
> > >
> > > Simon
> > >
> > > ______________________________________________________________________
> > > From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> > > To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> > > Sent: Tuesday, 22 February, 2011 12:01:23
> > > Subject: Re: [ddlm-group] Technical issues with Proposal P
> > >
> > > Dear Colleagues,
> > >
> > > Working under the assumption of Ralf's proposal, rather
> > > than Simon's, we have several very distinct string presentations
> > > to consider: a (non-raw) treble quoted string, a raw treble
> > > quoted string, a unicode treble quoted string, and a raw unicode
> > > treble quoted string. As for Python 3, under CIF2, because
> > > the "native" character encoding is UTF-8, under reasonable coding
> > > constraints, this collapses to just two cases the application
> > > needs to deal with: non-raw (i.e. cooked) versus raw. The intent of
> > > the cooked is for the lexer to process the elides, so the response
> > > I gave is, I believe, correct -- just push the string through IDLE.
> > > The intent of the raw is precisely to push through the string
> > > with the backslashes still in place, e.g. for TeX text in which
> > > you don't want to double-up your backslashes. While I personally
> > > would recommend against such a use of raw, it is not ambiguous.
> > > It gives the application a very well-defined string of characters
> > > to deal with. Yes, there are applications that are intended to
> > > deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif, etc.)
> > > but I agree that the cleanest design is for an application to
> > > only make use of the string content, not the representation.
> > >
> > > Thus, for most applications, I would recommend that they treat
> > >
> > > """\\\"""" and r"""\""""
> > >
> > > as equivalent, but for applications that are, for example,
> > > intended to do faithful copies of the representations that
> > > they treat them as different.
> > >
> > > We have had, and will continue to have, this subtle problem
> > > with all versions of CIF in the handling of things such as
> > > magic numbers, comments, white space, line folding, and choices
> > > of quoting characters. I don't see how the introduction of
> > > the Python treble quote makes the situation any worse or
> > > any more or less ambiguous.
> > >
> > > Regards,
> > > Herbert
> > >
> > > =====================================================
> > > Herbert J. Bernstein, Professor of Computer Science
> > > Dowling College, Kramer Science Center, KSC 121
> > > Idle Hour Blvd, Oakdale, NY, 11769
> > >
> > > +1-631-244-3035
> > > yaya@dowling.edu
> > > =====================================================
> > >
> > > On Tue, 22 Feb 2011, James Hester wrote:
> > >
> > > > I will focus this email on the technical issues and try to return to
> > > > the other issues at a later date (I've changed the subject
> > > > accordingly)
> > > >
> > > > [edit]
> > > >
> > > > My apologies for not being clear: my examples of embedded elides
> > > > already give the internal representation of the strings, deliberately
> > > > leaving out the particular delimiters that might have been used to
> > > > produce those strings. Herbert mistakenly thought I was giving
> > > > triple-double-quote delimited strings and asking what the internal
> > > > representation was. So, unfortunately, IDLE cannot help here, as the
> > > > internal representation is not in question.
> > > >
> > > > My question therefore remains: how does the CIF application interpret
> > > > these strings? Is the <backslash><delimiter> in my examples simply an
> > > > elide that could not be removed from a raw string and therefore should
> > > > be ignored, or is it a character sequence intended for the application
> > > > (e.g. a LaTeX accent on the o or e)?
> > > >
> > > > In your answer you may assume that the CIF application knows that the
> > > > string was a raw string delimited by triple double quotes (even though
> > > > requiring communication of such information would be a very
> > > > unfortunate loss of clean design).
> > > >
> > > > Those strings again:
> > > >
> > > > <start> I have no idea what the last characters of this string are\"<finish>
> > > > <start> Does this string have two\""" or three internal quotes?<finish>
> > > >
> > > >
> > > > Herbert writes:
> > > >> Now for your two examples of embedded elides of quotes:
> > > >>
> > > >> <start> I have no idea what the last characters of this string are\"<finish>
> > > >>
> > > >> is, internally, as a C-string
> > > >>
> > > >> I have no idea what the last characters of this string are"\0
> > > >>
> > > >> <start> Does this string have two\""" or three internal quotes?<finish>
> > > >>
> > > >> is, internally, as a C-string
> > > >>
> > > >> Does this string have two""" or three internal quotes?\0
> > > >>
> > > >> I settled that by simply cranking up IDLE and doing:
> > > >>
> > > >> >>> print """I have no idea what the last characters of this string are\""""
> > > >> I have no idea what the last characters of this string are"
> > > >> >>> print """Does this string have two\""" or three internal quotes?"""
> > > >> Does this string have two""" or three internal quotes?
> > > >>
> > > >> As you well know, having IDLE around is a big help.
> > > >>
> > > >> Thank you again for taking the time to clarify your position
> > > >> on Ralf's proposal. I think I now understand why you prefer Simon's
> > > >> proposal.
> > > >>
> > > >> Regards,
> > > >> Herbert
> > > >>
> > > >>> One technical issue with Proposal P that has not been resolved is how
> > > >>> a CIF application is supposed to interpret the sequence
> > > >>> <backslash><double quote> when encountered in a string returned from
> > > >>> the parser. Is this sequence:
> > > >>> (a) a terminator elide sequence that was left in a raw string, so
> > > >>> corresponds to <double quote>?
> > > >>> (b) something with meaning for the application so should be
> > > >>> <backslash><double quote>?
> > > >>>
> > > >>> Please therefore advise how a CIF application will disambiguate the
> > > >>> following string content from a Proposal P parser:
> > > >>>
> > > >>> <start> I have no idea what the last characters of this string are\"<finish>
> > > >>> <start> Does this string have two\""" or three internal quotes?<finish>
> > > >>>
> > > >>> James
> > > >>>
> > > >
> > > > --
> > > > T +61 (02) 9717 9907
> > > > F +61 (02) 9717 3145
> > > > M +61 (04) 0249 4148
> > > > _______________________________________________
> > > > ddlm-group mailing list
> > > > ddlm-group@iucr.org
> > > > http://scripts.iucr.org/mailman/listinfo/ddlm-group
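
The cooked-versus-raw behaviour discussed in this thread can be checked directly in Python. What follows is a minimal sketch and not part of the original messages: it assumes Python 3 (the IDLE session quoted above used the Python 2 print statement), and the variable names are illustrative only. It shows Python's own string semantics, not the behaviour of any CIF2 parser.

```python
# Cooked (non-raw) triple-quoted string: the lexer processes the elide,
# so backslash + double quote collapses to a single double quote.
cooked = """C\""""

# Raw triple-quoted string: the backslash still stops the quote from
# terminating the string, but the backslash is kept in the value.
raw = r"""C\""""

print(cooked)  # C"
print(raw)     # C\"

# The pair the thread suggests most applications treat as equivalent:
# two different representations that yield the same string content.
doubled = """\\\""""
raw_single = r"""\""""

print(doubled == raw_single)                # True
print(len(cooked), len(raw), len(doubled))  # 2 3 2
```

An application that only consumes string content sees no difference between the last two literals; only a tool that copies the representation faithfully would need to tell them apart.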
_______________________________________________ ddlm-group mailing list ddlm-group@iucr.org http://scripts.iucr.org/mailman/listinfo/ddlm-group