Re: [ddlm-group] Use of elides in strings
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: James Hester <jamesrhester@gmail.com>
- Date: Wed, 25 Nov 2009 11:44:40 +1100
- In-Reply-To: <alpine.BSF.2.00.0911241916470.78685@epsilon.pair.com>
- References: <279aad2a0911231800g6c26bdaancdd4a38fecebbb7a@mail.gmail.com><C731AC95.125CB%nick@csse.uwa.edu.au><279aad2a0911241414j1d89b6b3mfec464fdc401fbfd@mail.gmail.com><alpine.BSF.2.00.0911241717100.78685@epsilon.pair.com><279aad2a0911241454h12811f4eqfc47dd5eafa22c84@mail.gmail.com><alpine.BSF.2.00.0911241807480.78685@epsilon.pair.com><279aad2a0911241602u63486a1es2e98c940526af7c4@mail.gmail.com><alpine.BSF.2.00.0911241916470.78685@epsilon.pair.com>
Would John and Brian and/or Simon please comment on this?

On Wed, Nov 25, 2009 at 11:21 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> My major concern about anything we do is to be able to preserve
> the functionality of the practices that the IUCr is following in
> journal publications and the PDB is following. Inasmuch as they seem
> able to cope with no elide in CIF 1.1, the remaining question is whether
> they will be negatively impacted by the change in string termination
> without any elide. If they can use CIF 2 with these changes, my
> objections are purely academic and irrelevant. -- Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Wed, 25 Nov 2009, James Hester wrote:
>
>> Herbert: I have the dubious advantage of not having participated in
>> all those CIF1.0/1.1 discussions, so only have the spec as written
>> down to rely on.
>>
>> Anyway, how do you feel about abandoning any specification of elides
>> in CIF2 syntax, as suggested by Nick?
>>
>> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>>>
>>> Dear James,
>>>
>>> I started to write:
>>> "No, in CIF 1.1, none of the terminal quote marks, including the \n;, are
>>> effective unless followed by whitespace (\n, space, tab, or end of file).
>>> This is a well-established, and very tricky part of the CIF spec going
>>> back to 1990. That is why Nick had to explicitly specify that a terminal
>>> quote mark would be effective no matter what it was followed by."
>>>
>>> But the grammar currently on the IUCr web site is _not_ the one that I
>>> recall COMCIFs discussing and approving. It now explicitly removes
>>> the requirement for terminal white space in the special case of
>>> the \n; text field terminator. I don't recall when that change was
>>> adopted, but it appears that you are right under the current spec
>>> about the example I chose. Inasmuch as there is a lot of working code
>>> that enforces and uses the original whitespace handling and uses it
>>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>>> something to adapt to this change for CIFtbx 4.
>>>
>>> I guess we are just going to have yet another few dialects of CIF.
>>>
>>> Regards,
>>> Herbert
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>>
>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>
>>>> To be precise, we are not 'referring all elides to the application'
>>>> because no elides are recognised by the lexer under Nick's latest
>>>> suggestion, so there are no elides to refer to the application.
>>>>
>>>> My understanding of CIF1.1 syntax suggests that the string you provide
>>>> would produce a syntax error in CIF1.1, as the semicolon at the start
>>>> of the second line would terminate the string, and so whitespace
>>>> should then appear as the second character on the second line, rather
>>>> than reverse solidus.
>>>>
>>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>
>>>>> The only problem with referring all elides to the application is that
>>>>> with the removal of the requirement of a blank after a \n; for it to be
>>>>> effective, the line folding protocol develops a slight gap. The
>>>>> case is as follows
>>>>>
>>>>> ;\
>>>>> ;\
>>>>> ;
>>>>>
>>>>> This is a valid single text field in CIF 1.1, which when handled with
>>>>> the line folding protocol translates to the equivalent of ';' because
>>>>> the embedded ;\ is not a valid text terminator. If we require that
>>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>>> or "\n;\n" or "\n;\t", that problem would be fixed.
>>>>>
>>>>> =====================================================
>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>> +1-631-244-3035
>>>>> yaya@dowling.edu
>>>>> =====================================================
>>>>>
>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>
>>>>>> I wholeheartedly agree with Nick's suggestion.
>>>>>>
>>>>>> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini
>>>>>> <nick@csse.uwa.edu.au> wrote:
>>>>>>>
>>>>>>> It appears to me that we have spent far too long on a syntactic issue
>>>>>>> which can be avoided 99.9999% of the time. Quite simply, given the 5
>>>>>>> ways to delimit strings, it is next to impossible to get a situation
>>>>>>> where you cannot choose one of those to make the problem go away.
>>>>>>>
>>>>>>> I think the RCSB systematically avoid it by choosing
>>>>>>>
>>>>>>> "ab'cd"
>>>>>>> 'ab"cd'
>>>>>>> ;ab'"cd
>>>>>>> ;
>>>>>>>
>>>>>>> But now we additionally have """ and ''' to choose from, making it
>>>>>>> even easier.
>>>>>>>
>>>>>>> So I propose, in line with James' position, that there is NO eliding
>>>>>>> of terminator characters at the CIF2 syntax level. ALL elides in the
>>>>>>> string are assumed to be user specific encoding (say TeX, IUCr \greek)
>>>>>>> which can be resolved at the dictionary level.
>>>>>>>
>>>>>>> This necessarily means NO terminator character can appear in a string
>>>>>>> delimited by the same terminator character. You will need to choose a
>>>>>>> different terminator character. That is
>>>>>>>
>>>>>>> No " in "strings"
>>>>>>> No ' in 'strings'
>>>>>>> No """ in """strings""" (but separable individual and doublet " are
>>>>>>> allowed)
>>>>>>> No ''' in '''strings''' (but separable individual and doublet ' are
>>>>>>> allowed)
>>>>>>>
>>>>>>> EVERYTHING in the string is returned as raw (except the initiating
>>>>>>> and terminating character).
>>>>>>>
>>>>>>> The only time you will not be able to encode anything in a delimited
>>>>>>> string is when you want to include ' " """ ''' and \n; in the one
>>>>>>> string. The likelihood of that is almost zero, unless you may want to
>>>>>>> include a CIF within a CIF (a silly thing to do IMHO). In that case
>>>>>>> the contents can be encoded in a dictionary driven way. I suggest it
>>>>>>> be declared as a BASE64 type and then all the syntactic ambiguity
>>>>>>> disappears.
>>>>>>>
>>>>>>> Problem solved! No need to elide because of CIF2 syntax rules; all
>>>>>>> elides are user driven, contents are returned raw.
>>>>>>>
>>>>>>> As for Herb's comment in a recent email about line-folding, the
>>>>>>> same holds. That is NOT a lexer issue and it has nothing to do with
>>>>>>> the parser; everything is read literally and returned raw and what to
>>>>>>> do with it is promulgated to the downstream application.
>>>>>>>
>>>>>>> Straw vote - No elides of terminator strings as described above - Nick
>>>>>>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
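
For illustration only, the delimiter choice implied by Nick's straw-vote proposal could be sketched roughly as below. This is not taken from any CIF library: the function name delimit is hypothetical, and two details are assumptions rather than anything stated in the thread, namely that ' and " strings are single-line, and that a value ending in a quote character is kept out of the matching triple-quoted form so the closing delimiter stays unambiguous.

```python
def delimit(value: str) -> str:
    """Wrap value in the first delimiter whose terminator it does not contain.

    Nothing is elided: the value is written raw, as in the proposal.
    """
    single_line = "\n" not in value  # assumption: ' and " strings are single-line

    # Plain quotes: the quote character itself must not occur in the value.
    for quote in ('"', "'"):
        if single_line and quote not in value:
            return f"{quote}{value}{quote}"

    # Triple quotes: lone and doubled quote characters are fine inside, but
    # the triple is not. Assumption: also reject a value that ends with the
    # quote character, since it would merge with the closing delimiter.
    for quote in ('"""', "'''"):
        if quote not in value and not value.endswith(quote[0]):
            return f"{quote}{value}{quote}"

    # Semicolon text field: opened by ";" in column one, closed by "\n;".
    if "\n;" not in value:
        return f"\n;{value}\n;"

    # The "CIF within a CIF" case: every terminator occurs in the value.
    raise ValueError("no usable delimiter; encode at the dictionary level, e.g. BASE64")


print(delimit("ab'cd"))    # wrapped in double quotes
print(delimit('ab"cd'))    # wrapped in single quotes
print(delimit("ab'\"cd"))  # wrapped in triple double quotes
```

The order of preference (short quotes first, text field last) is arbitrary here; the RCSB practice quoted above goes straight to the semicolon text field when both quote characters are present.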