Re: [ddlm-group] Technical issues with Proposal P
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Technical issues with Proposal P
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Thu, 24 Feb 2011 08:02:08 -0500 (EST)
- In-Reply-To: <635800.54481.qm@web87007.mail.ird.yahoo.com>
- References: <AANLkTi=kadbHikjabDyioDOw=L_pthGORgi6w2b45yX6@mail.gmail.com><alpine.BSF.2.00.1102220644270.84613@epsilon.pair.com><417719.45449.qm@web87006.mail.ird.yahoo.com><alpine.BSF.2.00.1102220741480.84613@epsilon.pair.com><301639.7573.qm@web87001.mail.ird.yahoo.com><alpine.BSF.2.00.1102220845481.84613@epsilon.pair.com><228483.70348.qm@web87004.mail.ird.yahoo.com><710426.91151.qm@web87002.mail.ird.yahoo.com><alpine.BSF.2.00.1102221522040.36016@epsilon.pair.com><639597.72610.qm@web87009.mail.ird.yahoo.com><AANLkTikReSDV23THOOXBA2mhnq8O1-jF79_zQRbDp4Q8@mail.gmail.com><alpine.BSF.2.00.1102221758260.23065@epsilon.pair.com><AANLkTi=4wXyFTquP88JDjUc+w480Y5M1uzOimmayTaRY@mail.gmail.com><alpine.BSF.2.00.1102221915580.23065@epsilon.pair.com><AANLkTikgabBbmimHEWm9gD_kEnFdCSVV4-G-i+Z2nsYw@mail.gmail.com><a06240801c98a3e532621@[192.168.2.102]><219810.84158.qm@web87007.mail.ird.yahoo.com><635800.54481.qm@web87007.mail.ird.yahoo.com>
Dear Simon,

Yes, the closest approximation to the current line folding would be a cooked Python-style treble-quoted string. The main use for the raw strings is for people who don't like having to double up backslashes to present things like TeX. Not my taste, but some people like it, and I can see no downside in giving them the capability.

Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 24 Feb 2011, SIMON WESTRIP wrote:

> A possible minor inconvenience of proposal P:
>
> Given that CIF strings are essentially 'raw' as CIF2 now stands (and CIF1 strings too can
> be reconciled with the raw variant), and as I understand it Python raw strings do not
> support line continuation, under proposal P a string will have to be 'cooked' in order
> to employ line folding?
>
> Please forgive me if this seems a little trivial, but I am really struggling to see any
> benefit in adopting proposal P, especially for the end user. Maybe someone can help by
> providing an example where the use of cooked strings will make life easier for the end user?
>
> Cheers
>
> Simon
>
> __________________________________________________________________________________________
> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 23 February, 2011 13:06:15
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> I don't know if this will help much in making a choice about proposal P, but
> it might be worth looking at current practice in the case of one CIF user group -
> namely authors submitting CIFs to journals.
>
> In many respects an F-type scheme has been used for preparing the text sections of
> CIFs for publication for many years. The backslash is used to escape accents and Greek
> letters as well as itself, e.g. \"u for u-umlaut, \a for alpha, \\a for \a...
> In addition, we have the line-folding protocol, although that is rarely necessary and
> very rarely applied manually. Although these 'common semantic features' will not be
> a part of CIF2, they may well remain in use at the application level. Under scheme F,
> a field containing this markup delimited by semicolons could readily be dropped
> into a field delimited by triple quotes (though double-backslash control sequences
> would be returned as single-backslash control sequences - but fortunately there are few of
> these in use, e.g. \\rightarrow...). Under a P-type proposal, extra care would be required
> when choosing between the 'raw' and 'cooked' variants before dumping the contents of
> a semicolon-delimited field into a triple-quoted field.
>
> I've mentioned before that handling the transition from CIF1's 'common semantic features'
> to CIF2 Unicode will require care in any case; I've yet to be convinced that complex
> Python semantics will help here, or offer any real benefit in general, given that by
> only adopting them for one means of delimiting a data value, great care has to be taken
> when switching delimiters (and for no obvious reason or benefit if your only concern when
> working with a raw CIF is to complete it for publication purposes).
>
> Cheers
>
> Simon
>
> __________________________________________________________________________________________
> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 23 February, 2011 4:40:39
> Subject: Re: [ddlm-group] Technical issues with Proposal P
>
> Dear James,
>
> I don't see any reason to disagree with your summary of limitations
> on the use of raw strings. That is why I find it easier to use cooked
> strings. But there are people who like them, so, since, once you
> have Python-style triple quotes at all, it is easy to support them,
> I would be inclined to do so, rather than have a one-size-fits-all
> solution.
>
> The real question is whether to support the cooked Python strings
> with all Python elides or the more limited set of elides in proposal
> F. I repeat my suggestion that people try working with both as
> I have for years, and we can see which proves more or less confusing
> in what situations. I have always found the mixing of things
> like TeX with its backslashes with the line folding protocol without
> doubling up the TeX backslashes very confusing. Maybe it is just the
> way my head works. Maybe other people will have a different view.
> We won't know until people other than me try it.
>
> Regards,
> Herbert
>
> At 1:35 PM +1100 2/23/11, James Hester wrote:
> >I absolutely agree that it is up to the application to decide on the
> >meanings of strings given to it, with reference to the dictionary. I
> >am very happy that we agree on this iron separation between syntax and
> >content.
> >
> >In a situation that <backslash><delimiter> has meaning to a CIF
> >application, it follows that you cannot in general use raw strings to
> >express a data value that
> >(i) contains both triple double quotes and triple single quotes
> >(ii) contains triple double quotes and terminates with a single quote
> >(iii) contains triple single quotes and terminates with a double quote
> >
> >Furthermore, you cannot use triple-double-quote delimited raw strings
> >to delimit a string terminating in a double quote and/or containing
> >triple double quotes, likewise for single quote.
> >
> >These idiosyncrasies would need to be documented if we were to adopt
> >Proposal P, and we would need to be confident that CIF2 implementers
> >and users would not make inadvertent errors in their selection of
> >quotes and string types. As the manifestation of an error would
> >typically be nothing more than a stray accent, there is no way, beyond
> >careful proof-reading, that such mistakes would be caught.
> >
> >Proposals F and F' give less opportunity for error and are simpler to use.
> >
> >On Wed, Feb 23, 2011 at 11:27 AM, Herbert J. Bernstein
> ><yaya@bernstein-plus-sons.com> wrote:
> >> Dear James,
> >>
> >> I am really lost here. I believe it is up to the application
> >> to decide on the meaning of strings given as tag values, hopefully
> >> using a dictionary to inform that decision. I really don't
> >> see what the use of r""" versus """ versus ; versus ' versus "
> >> to get the string into its internal form has to do with its
> >> meaning to the application, unless the application is one
> >> of these CIF copy/transform applications that violate some
> >> of the CIF rules to see through to the original representation
> >> rather than stopping with the data, and then we are moving outside the
> >> rules of CIF itself to more general text processing.
> >>
> >> As I said, one of the nicer uses for the raw treble quote strings
> >> is to bring TeX into an application without having to double up
> >> backslashes. That is a very clear case in which the application
> >> has very different backslash processing than Python and you
> >> want to suppress most of the Python processing. If what you
> >> are trying to do is to elide the quote marks, then you will have
> >> an easier time using the regular treble quotes.
> >>
> >> Regards,
> >> Herbert
> >>
> >> On Wed, 23 Feb 2011, James Hester wrote:
> >>
> >>> Dear Herbert,
> >>>
> >>> Because raw strings must retain any eliding backslashes in the string
> >>> (unlike cooked strings), a backslash in the internal string
> >>> representation may indeed be an artefact of the syntax proposed in
> >>> Proposal P. Or might not. The application can't always tell. See my
> >>> other email for a way to resolve this.
> >>>
> >>> If everything is so clear, could you please just answer the following
> >>> rephrased questions? "The CIF application" refers to an application
> >>> for which <backslash><delimiter> means "accent the letter preceding
> >>> the backslash".
> >>>
> >>> Should the CIF application interpret the first string as finishing
> >>> with a double quote, or with an accented e?
> >>> Should the CIF application interpret the second string as containing
> >>> an accented o, followed by two double quotes, or a letter o followed
> >>> by three quotes?
> >>>
> >>> On Wed, Feb 23, 2011 at 10:16 AM, Herbert J. Bernstein
> >>> <yaya@bernstein-plus-sons.com> wrote:
> >>>>
> >>>> Dear James,
> >>>>
> >>>> I still don't understand.
> >>>> Neither Python nor I think \"
> >>>> from a raw string is an artifact of anything. It is
> >>>> just a backslash followed by a double quote mark. The
> >>>> point of the raw string is to provide a quick and
> >>>> convenient way to input something like TeX without
> >>>> having to double up the backslashes. Personally, I am
> >>>> happy to double up the backslashes, but I can see the
> >>>> value to people who have to deal with lots of TeX in
> >>>> not needing to do so.
> >>>>
> >>>>> Does the first string finish with a double quote, or with an accented e?
> >>>>> Does the second string contain an accented o, followed by two double
> >>>>> quotes, or a letter o followed by three quotes?
> >>>>
> >>>> are not questions related to the quoting mechanism used, but
> >>>> purely to the application. Working purely in CIF1.1 all
> >>>> of the following are equivalent external representations:
> >>>>
> >>>> Set 1
> >>>> ;\
> >>>> I have no idea what the last characters of this string are\"\
> >>>> ;
> >>>> 'I have no idea what the last characters of this string are\"'
> >>>> "I have no idea what the last characters of this string are\""
> >>>>
> >>>> and in all cases the last 2 characters are backslash followed by
> >>>> double quote
> >>>>
> >>>> Set 2
> >>>> ;\
> >>>> Does this string have two\""" or three internal quotes?\
> >>>> ;
> >>>> 'Does this string have two\""" or three internal quotes?'
> >>>>
> >>>> and in both cases there are three internal quotes
> >>>>
> >>>> I don't see how this differs in any way from
> >>>>
> >>>> r'''I have no idea what the last characters of this string are\"'''
> >>>> or
> >>>> '''I have no idea what the last characters of this string are\\"'''
> >>>> or
> >>>> """I have no idea what the last characters of this string are\\\""""
> >>>>
> >>>> and
> >>>>
> >>>> r'''Does this string have two\""" or three internal quotes?'''
> >>>> or
> >>>> '''Does this string have two\\""" or three internal quotes?'''
> >>>> or
> >>>> """Does this string have two\\\"\"\" or three internal quotes?"""
> >>>>
> >>>> There are very real problems with the raw string that are noted
> >>>> in the Python documentation, but they do have their uses. This
> >>>> ambiguity is not one of the problems.
> >>>>
> >>>> Regards,
> >>>> Herbert
> >>>>
> >>>> On Wed, 23 Feb 2011, James Hester wrote:
> >>>>
> >>>>> I am trying to focus relentlessly on a particular and very real
> >>>>> technical issue. I repeat that I am not concerned about the
> >>>>> transformation from surface syntax to a sequence of characters. I
> >>>>> accept that that is well-defined and unambiguous for all proposals on
> >>>>> the table. If you think that IDLE can resolve this problem, you
> >>>>> haven't understood my question.
> >>>>>
> >>>>> My question relates to the next step: how does the CIF application
> >>>>> downstream from the parser interpret this sequence of characters?
> >>>>> Under all previous incarnations of CIF, it was safe to assume that no
> >>>>> artefacts of syntactical representation were left in the string, so
> >>>>> the string had purely domain-specific meaning. However, with the
> >>>>> introduction of raw strings, <backslash><delimiter> will escape the
> >>>>> delimiter, but the <backslash> is required to remain in the string.
> >>>>> So the downstream application must distinguish between artefacts of the
> >>>>> syntactical representation (<backslash><delimiter>) that have remained
> >>>>> in raw strings, and domain-specific character sequences
> >>>>> (<backslash><delimiter>). Here those examples are again (remember
> >>>>> this is the character sequence after syntactic processing):
> >>>>>
> >>>>> <start> I have no idea what the last characters of this string are\"<finish>
> >>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>
> >>>>> Assume the domain-specific meaning of <backslash><quote> when found in
> >>>>> a data value is to accent the letter preceding the <backslash>.
> >>>>>
> >>>>> Does the first string finish with a double quote, or with an accented e?
> >>>>> Does the second string contain an accented o, followed by two double
> >>>>> quotes, or a letter o followed by three quotes?
> >>>>>
> >>>>> On Wed, Feb 23, 2011 at 8:01 AM, SIMON WESTRIP
> >>>>> <simonwestrip@btinternet.com> wrote:
> >>>>>>
> >>>>>> Dear all
> >>>>>>
> >>>>>> Reviewing the exchanges in this thread ("Technical issues with Proposal P"),
> >>>>>> it seems that the 'technical issues' might better be described as
> >>>>>> 'potentially confusing issues' :-)
> >>>>>> That is, under proposal P, there is no ambiguity about how the string
> >>>>>> should be read, but there is potential for misinterpretation by the user (e.g.
> >>>>>> an erroneous assumption that by using a backslash
> >>>>>> to escape a quotation mark, the backslash will not be included as part
> >>>>>> of the parsed data value (in the raw variant)).
> >>>>>> So, as John says, perhaps this simply demonstrates that "the complexity
> >>>>>> of the syntax and semantics provided by proposal P would be likely to be
> >>>>>> a source of confusion for developers and users both", and maybe
> >>>>>> therein lies the merit of this particular thread? It reinforces those
> >>>>>> arguments against proposal P that suggest that the introduction of a
> >>>>>> more complex syntax for one of the delimiter types is a potential source
> >>>>>> of confusion for many existing CIF users.
> >>>>>>
> >>>>>> Cheers
> >>>>>>
> >>>>>> Simon
> >>>>>> ________________________________
> >>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> >>>>>> Sent: Tuesday, 22 February, 2011 20:22:57
> >>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>
> >>>>>> Dear Simon,
> >>>>>>
> >>>>>> I make mistakes on this, too. That is why I like having IDLE
> >>>>>> handy and sticking to Python syntax.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Herbert
> >>>>>>
> >>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>
> >>>>>>> Dear Herbert - I've just realized I confused myself by misreading your
> >>>>>>> example and treating it as equivalent to my own example! Sorry about that.
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Simon
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: SIMON WESTRIP <simonwestrip@btinternet.com>
> >>>>>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> >>>>>>> Sent: Tuesday, 22 February, 2011 14:51:03
> >>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>> Dear Herbert
> >>>>>>>
> >>>>>>> I'm still a bit confused. Following Python semantics,
> >>>>>>> a CIF application reading the following items
> >>>>>>>
> >>>>>>> _item_a """C\""""
> >>>>>>> _item_b r"""C\""""
> >>>>>>>
> >>>>>>> should return values of
> >>>>>>>
> >>>>>>> C" for _item_a
> >>>>>>> C\" for _item_b
> >>>>>>>
> >>>>>>> Are you suggesting that the application should then *assume* that in
> >>>>>>> the case of _item_b the use of the backslash was purely to escape the
> >>>>>>> final quote and should discard the backslash from the value, thus
> >>>>>>> assuming a value of C" ?
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>> Simon
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> >>>>>>> Sent: Tuesday, 22 February, 2011 13:51:02
> >>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>
> >>>>>>> Dear Simon,
> >>>>>>>
> >>>>>>> From the point of view of writing a pure "CIF2" application
> >>>>>>> that is not aware of the whitespace, particular quote marks,
> >>>>>>> comments, etc., those two strings are identical.
> >>>>>>>
> >>>>>>> From the point of view of a more general CIF API, in which
> >>>>>>> comments, magic numbers, and particular quote marks are exposed, those
> >>>>>>> two strings are different in precisely the same way that
> >>>>>>> the strings 'ABC' and "ABC" are different, and 13.4 and
> >>>>>>> 1.34e1 are different.
> >>>>>>>
> >>>>>>> This is _not_ an ambiguity. It is a matter of whether
> >>>>>>> we are looking for the information in a file or looking
> >>>>>>> for the representations of the data in the file.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Herbert
> >>>>>>>
> >>>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>
> >>>>>>>> So
> >>>>>>>> """\\\"""" and r"""\""""
> >>>>>>>> should strictly be treated as different, despite any recommendations
> >>>>>>>> you may have made to the contrary?
> >>>>>>>>
> >>>>>>>> ________________________________
> >>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> >>>>>>>> Sent: Tuesday, 22 February, 2011 12:46:57
> >>>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>
> >>>>>>>>> So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>> Is it C\" or is it C" ?
> >>>>>>>>
> >>>>>>>> """C\"""" is C"
> >>>>>>>>
> >>>>>>>> r"""C\"""" is C\"
> >>>>>>>>
> >>>>>>>> You can test this with IDLE.
> >>>>>>>> It is very clearly defined and
> >>>>>>>> reproducible Python string behavior, and I believe helps to make
> >>>>>>>> the case for sticking to the Python approach. It is very easy
> >>>>>>>> for any software developer or user to work out how the boundary
> >>>>>>>> cases are being handled.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Herbert
> >>>>>>>>
> >>>>>>>> On Tue, 22 Feb 2011, SIMON WESTRIP wrote:
> >>>>>>>>
> >>>>>>>>> I am a little confused:
> >>>>>>>>>
> >>>>>>>>> So what is r"""C\"""" ?
> >>>>>>>>>
> >>>>>>>>> Is it C\" or is it C" ?
> >>>>>>>>>
> >>>>>>>>> Python says it should be C\", so CIF2 should say it's C\" if CIF2 is
> >>>>>>>>> adopting Python?
> >>>>>>>>>
> >>>>>>>>> Or are you suggesting that we should adopt a fuzzy interpretation of
> >>>>>>>>> Python?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> Simon
> >>>>>>>>>
> >>>>>>>>> ________________________________
> >>>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
> >>>>>>>>> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> >>>>>>>>> Sent: Tuesday, 22 February, 2011 12:01:23
> >>>>>>>>> Subject: Re: [ddlm-group] Technical issues with Proposal P
> >>>>>>>>>
> >>>>>>>>> Dear Colleagues,
> >>>>>>>>>
> >>>>>>>>> Working under the assumption of Ralf's proposal, rather
> >>>>>>>>> than Simon's, we have several very distinct string presentations
> >>>>>>>>> to consider: a (non-raw) treble quoted string, a raw treble
> >>>>>>>>> quoted string, a Unicode treble quoted string and a raw Unicode
> >>>>>>>>> treble quoted string. As for Python 3, under CIF2, because
> >>>>>>>>> the "native" character encoding is UTF-8, under reasonable coding
> >>>>>>>>> constraints, this collapses to just two cases the application
> >>>>>>>>> needs to deal with: non-raw (i.e. cooked) versus raw. The intent
> >>>>>>>>> of the cooked form is for the lexer to process the elides, so the
> >>>>>>>>> response I gave is, I believe, correct -- just push the string through IDLE.
> >>>>>>>>> The intent of the raw form is precisely to push through the string
> >>>>>>>>> with the backslashes still in place, e.g. for TeX text in which
> >>>>>>>>> you don't want to double up your backslashes. While I personally
> >>>>>>>>> would recommend against such a use of raw, it is not ambiguous.
> >>>>>>>>> It gives the application a very well-defined string of characters
> >>>>>>>>> to deal with. Yes, there are applications that are intended to
> >>>>>>>>> deal with CIF with the encoding exposed (e.g. cif2cbf, cif2cif,
> >>>>>>>>> etc.), but I agree that the cleanest design is for an application to
> >>>>>>>>> only make use of the string content, not the representation.
> >>>>>>>>>
> >>>>>>>>> Thus, for most applications, I would recommend that they treat
> >>>>>>>>>
> >>>>>>>>> """\\\"""" and r"""\""""
> >>>>>>>>>
> >>>>>>>>> as equivalent, but that applications intended, for example,
> >>>>>>>>> to make faithful copies of the representations treat them
> >>>>>>>>> as different.
> >>>>>>>>>
> >>>>>>>>> We have had, and will continue to have, this subtle problem
> >>>>>>>>> with all versions of CIF in the handling of things such as
> >>>>>>>>> magic numbers, comments, white space, line folding, and choices
> >>>>>>>>> of quoting characters. I don't see how the introduction of
> >>>>>>>>> the Python treble quote makes the situation any worse or
> >>>>>>>>> any more or less ambiguous.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Herbert
> >>>>>>>>>
> >>>>>>>>> On Tue, 22 Feb 2011, James Hester wrote:
> >>>>>>>>>
> >>>>>>>>>> I will focus this email on the technical issues and try to return
> >>>>>>>>>> to the other issues at a later date (I've changed the subject
> >>>>>>>>>> accordingly)
> >>>>>>>>>>
> >>>>>>>>>> [edit]
> >>>>>>>>>>
> >>>>>>>>>> My apologies for not being clear: my examples of embedded elides
> >>>>>>>>>> already give the internal representation of the strings,
> >>>>>>>>>> deliberately leaving out the particular delimiters that might have
> >>>>>>>>>> been used to produce those strings. Herbert mistakenly thought I was
> >>>>>>>>>> giving triple-double-quote delimited strings and asking what the
> >>>>>>>>>> internal representation was.
> >>>>>>>>>> So, unfortunately, IDLE cannot help here, as the
> >>>>>>>>>> internal representation is not in question.
> >>>>>>>>>>
> >>>>>>>>>> My question therefore remains: how does the CIF application
> >>>>>>>>>> interpret these strings? Is the <backslash><delimiter> in my
> >>>>>>>>>> examples simply an elide that could not be removed from a raw string
> >>>>>>>>>> and therefore should be ignored, or is it a character sequence
> >>>>>>>>>> intended for the application (e.g. a LaTeX accent on the o or e)?
> >>>>>>>>>>
> >>>>>>>>>> In your answer you may assume that the CIF application knows that
> >>>>>>>>>> the string was a raw string delimited by triple double quotes (even
> >>>>>>>>>> though requiring communication of such information would be a very
> >>>>>>>>>> unfortunate loss of clean design).
> >>>>>>>>>>
> >>>>>>>>>> Those strings again:
> >>>>>>>>>>
> >>>>>>>>>> <start> I have no idea what the last characters of this string are\"<finish>
> >>>>>>>>>>
> >>>>>>>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>>>>>>
> >>>>>>>>>> Herbert writes:
> >>>>>>>>>>>
> >>>>>>>>>>> Now for your two examples of embedded elides of quotes:
> >>>>>>>>>>>
> >>>>>>>>>>> <start> I have no idea what the last characters of this string are\"<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally, as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> I have no idea what the last characters of this string are"\0
> >>>>>>>>>>>
> >>>>>>>>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>>>>>>>
> >>>>>>>>>>> is, internally, as a C-string
> >>>>>>>>>>>
> >>>>>>>>>>> Does this string have two""" or three internal quotes?\0
> >>>>>>>>>>>
> >>>>>>>>>>> I settled that by simply cranking up IDLE and doing:
> >>>>>>>>>>>
> >>>>>>>>>>> >>> print """I have no idea what the last characters of this string are\""""
> >>>>>>>>>>> I have no idea what the last characters of this string are"
> >>>>>>>>>>> >>> print """Does this string have two\""" or three internal quotes?"""
> >>>>>>>>>>> Does this string have two""" or three internal quotes?
> >>>>>>>>>>>
> >>>>>>>>>>> As you well know, having IDLE around is a big help.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you again for taking the time to clarify your position
> >>>>>>>>>>> on Ralf's proposal. I think I now understand why you prefer
> >>>>>>>>>>> Simon's proposal.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Herbert
> >>>>>>>>>>
> >>>>>>>>>>>> One technical issue with Proposal P that has not been resolved is
> >>>>>>>>>>>> how a CIF application is supposed to interpret the sequence
> >>>>>>>>>>>> <backslash><double quote> when encountered in a string returned
> >>>>>>>>>>>> from the parser. Is this sequence:
> >>>>>>>>>>>> (a) a terminator elide sequence that was left in a raw string, so
> >>>>>>>>>>>> corresponds to <double quote>?
> >>>>>>>>>>>> (b) something with meaning for the application so should be
> >>>>>>>>>>>> <backslash><double quote>?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Please therefore advise how a CIF application will disambiguate
> >>>>>>>>>>>> the following string content from a Proposal P parser:
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> I have no idea what the last characters of this string are\"<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> <start> Does this string have two\""" or three internal quotes?<finish>
> >>>>>>>>>>>>
> >>>>>>>>>>>> James
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> T +61 (02) 9717 9907
> >>>>>>>>>> F +61 (02) 9717 3145
> >>>>>>>>>> M +61 (04) 0249 4148
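
The literal values argued over in this thread can be checked the way Herbert suggests, by running them through Python. This is a minimal sketch, assuming Python 3 (the IDLE session quoted above uses Python 2 `print` statements). It exercises only Python's own literal rules, not a CIF2 parser, and the last group (CIF1 text markup pasted unchanged into a cooked string) is an assumed scenario illustrating Simon's point about choosing between the raw and cooked variants.

```python
# Checks of the Python 3 string literals quoted in this thread.
# Only Python's own literal rules are exercised here, not a CIF2 parser.

# Simon's _item_a / _item_b: cooked vs raw triple-double-quoted strings.
item_a = """C\""""       # cooked: \" elides to "
item_b = r"""C\""""      # raw: the backslash stays in the value
assert item_a == 'C"'
assert item_b == 'C\\"'  # C, backslash, double quote

# Herbert: """\\\"""" and r"""\"""" yield the same two characters.
assert """\\\"""" == r"""\"""" == '\\"'

# Herbert's Set 1: three spellings of a value ending in backslash + double quote.
prefix = "I have no idea what the last characters of this string are"
set1 = [
    r'''I have no idea what the last characters of this string are\"''',
    '''I have no idea what the last characters of this string are\\"''',
    """I have no idea what the last characters of this string are\\\"""",
]
assert all(s == prefix + '\\"' for s in set1)

# Herbert's Set 2: a backslash followed by three double quotes.
target = 'Does this string have two\\""" or three internal quotes?'
set2 = [
    r'''Does this string have two\""" or three internal quotes?''',
    '''Does this string have two\\""" or three internal quotes?''',
    """Does this string have two\\\"\"\" or three internal quotes?""",
]
assert all(s == target for s in set2)

# Line folding (Simon's 24 Feb question): a cooked string drops
# backslash-newline, a raw string keeps both characters.
cooked = """abc\
def"""
raw = r"""abc\
def"""
assert cooked == "abcdef"
assert raw == "abc\\\ndef"

# Assumed scenario: CIF1 text markup pasted unchanged into a cooked string.
assert '''\"u''' == '"u'                      # umlaut markup loses its backslash
assert '''\a''' == "\x07"                     # alpha markup becomes BEL
assert '''\\rightarrow''' == "\\rightarrow"   # \\ collapses to one backslash
# A raw string keeps the markup exactly as typed.
assert r'''\"u \a \\rightarrow''' == '\\"u \\a \\\\rightarrow'

print("all checks passed")
```

All assertions hold under Python 3: the raw and cooked spellings agree on the characters they produce, which is Herbert's point. Whether an application should then read a surviving `\"` as an escape artefact or as accent markup is the separate, application-level question James raises, and these checks do not settle it.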
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
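
James's limitations (i) to (iii) on raw strings can also be checked mechanically. This is a minimal sketch, assuming that a CIF2 raw triple-quoted string would follow Python 3's tokenizer rules exactly (the thread proposes Python-style semantics, but exact equivalence is an assumption). The helper `raw_round_trips` is hypothetical: it wraps a value in `r` plus a tripled delimiter and asks the standard-library `ast.literal_eval` whether the result reads back unchanged.

```python
import ast


def raw_round_trips(value: str, quote: str) -> bool:
    """Hypothetical helper: True if r<delim>value<delim> reads back as `value`.

    `quote` is '"' or "'"; the delimiter is that character tripled.
    Python 3 tokenizer rules stand in for the proposed CIF2 raw strings.
    """
    delim = quote * 3
    source = "r" + delim + value + delim
    try:
        result = ast.literal_eval(source)
    except (SyntaxError, ValueError, TypeError):
        # Unterminated literal, stray tokens after an early close,
        # or something that is not a plain literal at all.
        return False
    # Also rejects cases where the text splits into several literals.
    return result == value


DQ, SQ = '"', "'"
# Trailing comments: expected result with """ and with ''' as delimiter.
cases = [
    ("plain text", "abc"),                                # True  True
    ("(i) both triple-quote kinds", "x \"\"\" y ''' z"),  # False False
    ("(ii) triple double, ends with '", "x \"\"\" y'"),   # False False
    ("(iii) triple single, ends with \"", "x ''' y\""),   # False False
    ("ends with a double quote", 'C"'),                   # False True
    ("ends with a single backslash", "TeX \\"),           # False False
]
for label, value in cases:
    print(label, raw_round_trips(value, DQ), raw_round_trips(value, SQ))
```

Under that assumption, a value containing both kinds of triple quote, or one kind plus a trailing quote of the other kind, cannot be written as a raw string without leaving a backslash in the value, and neither can a value ending in a single backslash. Delegating to the real tokenizer, rather than re-implementing the rules, keeps the sketch short but ties it to Python's behaviour.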
- Follow-Ups:
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- References:
- [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (James Hester)
- Re: [ddlm-group] Technical issues with Proposal P (Herbert J. Bernstein)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)
- Re: [ddlm-group] Technical issues with Proposal P (SIMON WESTRIP)