Re: [ddlm-group] Use of elides in strings
- To: James Hester <jamesrhester@gmail.com>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: John Westbrook <jwest@pdb-mail.rutgers.edu>
- Date: Tue, 24 Nov 2009 21:33:28 -0500
- Cc: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- In-Reply-To: <279aad2a0911241806s3129bd4bvc6f315ca0764d3e3@mail.gmail.com>
- Organization: RCSB Protein Data Bank
- References: <279aad2a0911231800g6c26bdaancdd4a38fecebbb7a@mail.gmail.com> <C731AC95.125CB%nick@csse.uwa.edu.au> <279aad2a0911241414j1d89b6b3mfec464fdc401fbfd@mail.gmail.com> <alpine.BSF.2.00.0911241717100.78685@epsilon.pair.com> <279aad2a0911241454h12811f4eqfc47dd5eafa22c84@mail.gmail.com> <alpine.BSF.2.00.0911241807480.78685@epsilon.pair.com> <279aad2a0911241602u63486a1es2e98c940526af7c4@mail.gmail.com> <alpine.BSF.2.00.0911241916470.78685@epsilon.pair.com> <4B0C825E.5020102@pdb-mail.rutgers.edu><279aad2a0911241806s3129bd4bvc6f315ca0764d3e3@mail.gmail.com>
Hi James,

My preference is to avoid elides in the syntax for the purpose of escaping
terminators in strings, deferring interpretation to the application.

I do not understand all of the issues related to line folding, which I
believe is an issue for Brian and Simon.

John

James Hester wrote:
> Thanks for the quick reply over Thanksgiving, John. I take from your
> message that the PDB does not need any elide mechanism to be defined
> in the CIF2 syntax. Would you therefore be prepared to vote in favour
> of not defining any elides, or would you prefer to abstain?
>
> Votes so far:
>
> No elides: James, Nick, Herbert if the IUCr + PDB say it is OK
> Elides:?
>
> Unknown: John, Joe, David B., Brian, Simon
>
> On Wed, Nov 25, 2009 at 12:03 PM, John Westbrook
> <jwest@pdb-mail.rutgers.edu> wrote:
>> I confess that I am having difficulty keeping up with all aspects
>> of this discussion. Following Herb's suggestion I will try to
>> summarize the quoting issues from the PDB perspective.
>>
>> 1. As there are multiple ways of quoting a string, our tools and files
>> surround embedded quotes with quotes of the opposite sense or with
>> semicolons in the mixed case. I think that this point has been
>> covered a number of times now and I believe that Nick has suggested
>> that all reasonable cases can be handled by using this approach.
>>
>> 2. I too was not aware that the original definition of terminators
>> had changed and did not include either a leading or trailing
>> whitespace. Certainly this must still be the case for single
>> and double quotes. I cannot recall ever seeing an example
>> where the terminator \n; was followed by a whitespace character,
>> but about half of the codes that I am familiar with would
>> fall over on \n;next_token.
>>
>> 3. Line folding has never been an issue for PDB nor has line length.
>>
>> Regards,
>>
>> John
>>
>>
>> Herbert J. Bernstein wrote:
>>> My major concern about anything we do is to be able to preserve
>>> the functionality of the practices that the IUCr is following in
>>> journal publications and the PDB is following. Inasmuch as they seem
>>> able to cope with no elide in CIF 1.1, the remaining question is whether
>>> they will be negatively impacted by the change in string termination
>>> without any elide. If they can use CIF 2 with these changes, my
>>> objections are purely academic and irrelevant. -- Herbert
>>>
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>>
>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>
>>>> Herbert: I have the dubious advantage of not having participated in
>>>> all those CIF1.0/1.1 discussions, so only have the spec as written
>>>> down to rely on.
>>>>
>>>> Anyway, how do you feel about abandoning any specification of elides
>>>> in CIF2 syntax, as suggested by Nick?
>>>>
>>>> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>> Dear James,
>>>>>
>>>>> I started to write:
>>>>> "No, in CIF 1.1, none of the terminal quote marks, including the \n;
>>>>> are effective unless followed by whitespace (\n, space, tab, or end of
>>>>> file). This is a well-established, and very tricky part of the CIF spec
>>>>> going back to 1990. That is why Nick had to explicitly specify that a
>>>>> terminal quote mark would be effective no matter what it was followed by."
>>>>>
>>>>> But the grammar currently on the IUCr web site is _not_ the one that I
>>>>> recall COMCIFs discussing and approving. It now explicitly removes
>>>>> the requirement for terminal white space in the special case of
>>>>> the \n; text field terminator. I don't recall when that change was
>>>>> adopted, but it appears that you are right under the current spec
>>>>> about the example I chose. Inasmuch as there is a lot of working code
>>>>> that enforces and uses the original whitespace handling and uses it
>>>>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>>>>> something to adapt to this change for CIFtbx 4.
>>>>>
>>>>> I guess we are just going to have yet another few dialects of CIF.
>>>>>
>>>>> Regards,
>>>>> Herbert
>>>>>
>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>
>>>>>> To be precise, we are not 'referring all elides to the application'
>>>>>> because no elides are recognised by the lexer under Nick's latest
>>>>>> suggestion, so there are no elides to refer to the application.
>>>>>>
>>>>>> My understanding of CIF1.1 syntax suggests that the string you provide
>>>>>> would produce a syntax error in CIF1.1, as the semicolon at the start
>>>>>> of the second line would terminate the string, and so whitespace
>>>>>> should then appear as the second character on the second line, rather
>>>>>> than reverse solidus.
>>>>>>
>>>>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>>> The only problem with referring all elides to the application is that
>>>>>>> with the removal of the requirement of a blank after a \n; for it to be
>>>>>>> effective, the line folding protocol develops a slight gap. The
>>>>>>> case is as follows
>>>>>>>
>>>>>>> ;\
>>>>>>> ;\
>>>>>>> ;
>>>>>>>
>>>>>>> Is a valid single text field in CIF 1.1, which when handled with the
>>>>>>> line folding protocol translates to the equivalent of ';' because the
>>>>>>> embedded ;\ is not a valid text terminator. If we require that
>>>>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>>>>> or "\n;\n" or "\n;\t" that problem would be fixed.
>>>>>>>
>>>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>>>
>>>>>>>> I wholeheartedly agree with Nick's suggestion.
>>>>>>>>
>>>>>>>> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini
>>>>>>>> <nick@csse.uwa.edu.au> wrote:
>>>>>>>>> It appears to me that we have spent far too long on a syntactic
>>>>>>>>> issue which can be avoided 99.9999% of the time. Quite simply given
>>>>>>>>> the 5 ways to delimit strings, it is next to impossible to get a
>>>>>>>>> situation where you cannot choose one of those to make the problem
>>>>>>>>> go away.
>>>>>>>>>
>>>>>>>>> I think the RCSB systematically avoid it by choosing
>>>>>>>>>
>>>>>>>>> "ab'cd"
>>>>>>>>> 'ab"cd'
>>>>>>>>> ;ab'"cd
>>>>>>>>> ;
>>>>>>>>>
>>>>>>>>> But now we additionally have """ and ''' to choose from, making
>>>>>>>>> it even easier.
>>>>>>>>>
>>>>>>>>> So I propose in line with James' position there is NO eliding of
>>>>>>>>> terminator character at the CIF2 syntax level. ALL elides in the
>>>>>>>>> string are assumed to be user specific encoding (say TeX, IUCr
>>>>>>>>> \greek) which can be resolved at the dictionary level.
>>>>>>>>>
>>>>>>>>> This necessarily means NO terminator character can appear in a
>>>>>>>>> string delimited by the same terminator character. You will need to
>>>>>>>>> choose a different terminator character. That is
>>>>>>>>>
>>>>>>>>> No " in "strings"
>>>>>>>>> No ' in 'strings'
>>>>>>>>> No """ in """strings""" (but separable individual and doublet " are
>>>>>>>>> allowed)
>>>>>>>>> No ''' in '''strings''' (but separable individual and doublet ' are
>>>>>>>>> allowed)
>>>>>>>>>
>>>>>>>>> EVERYTHING in the string is returned as raw (except the
>>>>>>>>> initiating and terminating character).
>>>>>>>>>
>>>>>>>>> The only time you will not be able to encode anything in a delimited
>>>>>>>>> string is when you want to include ' " """ ''' and \n; in the one
>>>>>>>>> string. The likelihood of that is almost zero, unless you may want
>>>>>>>>> to include a CIF within a CIF (a silly thing to do IMHO). In that
>>>>>>>>> case the contents can be encoded in a dictionary driven way. I
>>>>>>>>> suggest it be declared as a BASE64 type and then all the syntactic
>>>>>>>>> ambiguity disappears.
>>>>>>>>>
>>>>>>>>> Problem solved! No need to elide because of CIF2 syntax rules; all
>>>>>>>>> elides are user driven, contents are returned raw.
>>>>>>>>>
>>>>>>>>> As for Herb's comment in a recent email, what about line-folding,
>>>>>>>>> then the same holds. That is NOT a lexer issue and it has nothing to
>>>>>>>>> do with the parser, everything is read literally and returned raw
>>>>>>>>> and what to do with it is promulgated to the downstream application.
>>>>>>>>>
>>>>>>>>> Straw vote - No elides of terminator strings as described above -
>>>>>>>>> Nick
>>>>>>>>>
>>>>
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
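
The "no elides" rule proposed in the thread (pick a delimiter whose terminator does not occur in the value, and return the contents raw) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from any CIF library: the function name `delimit`, the order in which delimiters are tried, the restriction of ' and " strings to a single line, and the refusal of values that end in the triple-quote character are all choices made for this example.

```python
def delimit(value: str) -> str:
    """Wrap value in a delimiter whose terminator it does not contain.

    Hypothetical helper illustrating the proposal in this thread: nothing
    inside the value is elided, so the value must not contain the chosen
    terminator at all.
    """
    # Assumption: ' and " delimited strings stay on one line.
    single_line = "\n" not in value
    if single_line and '"' not in value:
        return f'"{value}"'
    if single_line and "'" not in value:
        return f"'{value}'"
    # Assumption: a value ending in the quote character is rejected for the
    # triple-quoted forms, because it would merge with the terminator.
    if '"""' not in value and not value.endswith('"'):
        return f'"""{value}"""'
    if "'''" not in value and not value.endswith("'"):
        return f"'''{value}'''"
    # Semicolon text field: opens with \n; and is closed by the next \n;
    if "\n;" not in value:
        return f"\n;{value}\n;"
    # The rare case Nick mentions: every terminator occurs in the value, so
    # it has to be encoded at the dictionary level (he suggests BASE64).
    raise ValueError("no delimiter fits; encode the value at the dictionary level")
```

With these assumptions, `delimit("ab'cd")` gives `"ab'cd"` and `delimit('ab"cd')` gives `'ab"cd'`, matching the first two RCSB-style choices quoted above; a value holding both quote characters falls through to a triple-quoted form rather than the semicolon field only because of the order chosen here.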