Re: [ddlm-group] Use of elides in strings
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Thu, 26 Nov 2009 13:35:28 +0800
- In-Reply-To: <466696.60300.qm@web87013.mail.ird.yahoo.com>
On 25/11/09 10:24 PM, "SIMON WESTRIP" <simonwestrip@btinternet.com> wrote:

> What Brian has said here - specifically
>
> "if this were dropped as part of the CIF2 specification,
> we would need to think carefully about how else to retain this
> functionality"
>
> is also relevant to how we handle the CIF1.1 markup conventions.
> As I understand it in CIF1.1 these are the default conventions for
> text fields unless the dictionary prohibits them, but in CIF2 all such
> conventions will _not_ be part of the spec, and can only be interpreted at
> the dictionary level.
>
> Is this correct?

Yes, this is my understanding. There will be many different conventions, I
presume; some will be widely accepted and standard, and they will be part of
the underlying systems that interpret the files. For instance, if something is
declared as a TeX encoding, we know what to do.

>
> I'm only asking because we (at the IUCr at least) will have to address this
> issue sooner rather than later when adopting CIF2, so I just want to make sure
> I understand base CIF2 correctly
>
> Cheers
>
> Simon
>
>
> From: Brian McMahon <bm@iucr.org>
> To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
> Sent: Wednesday, 25 November, 2009 13:34:05
> Subject: Re: [ddlm-group] Use of elides in strings
>
> (I've switched the thread title to deal separately with line folding.)
>
> As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> of the ITG bible). Currently, it invokes a special meaning for the
> backslash (reverse solidus) character, but only when it is the first
> non-blank after an opening semicolon or comment hash delimiter. We have
> yet to discuss whether to extend it to other string types (specifically
> the triple-quoted strings).
>
> It's quite easy these days to generate single strings that are longer
> than 2048 characters (or any other arbitrary line limit) - e.g. a
> protein or nucleic acid sequence.
> Many, many chemical names broke the old
> 80-character line length limit.
>
> We're very happy with CIF applications that do not interpret the
> line-folding protocol, so long as they preserve the existing backslashes.
> However, a fully-compliant CIF 1.1 parser should be able to return an
> unfolded string to an application that requests it.
>
> As Herbert says, if this were dropped as part of the CIF2 specification,
> we would need to think carefully about how else to retain this
> functionality.
>
> Regards
> Brian
>
> On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
>> The line folding protocol was discussed and adopted by COMCIFS and is
>> posted, along with other "Common Semantic Features" at
>>
>> http://www.iucr.org/resources/cif/spec/version1.1/semantics
>>
>> but that is neither here nor there. The point is that the IUCr uses CIF
>> to get work done. If we disable something they are using, we should offer
>> some equivalent functionality so they can use CIF 2 to do their work.
>> Otherwise, they will have to do the sensible thing, and continue to use
>> CIF 1, or, worse, create their own dialect of CIF 2.
>>
>> Now, I broke my nose yesterday morning and find myself a bit punchy today,
>> so I will drop out of this discussion for a while. Hopefully, when I
>> return to it, this whole matter will be settled in some way that will
>> allow people to actually use CIF 2, instead of it becoming what it seems
>> on its way to becoming -- something elegant but not terribly useful, a bit
>> like PL/I.
>>
>> Cheers,
>> Herbert
>>
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
>>
>> On Wed, 25 Nov 2009, Nick Spadaccini wrote:
>>
>>> I am with John. STAR has no line-folding protocol.
>>> As far as I can recall
>>> neither did CIF. Somewhere along the way line folding was discussed (or
>>> introduced?), but I am not sure it is formally part of any spec.
>>>
>>> None of my software handles anything about line folding. I can see no reason
>>> for it, since with a 2048 maximum record length, and a free format structure
>>> there is plenty of room to output your data. The only time it would be
>>> necessary is when (dataname + space + datavalue) > 2048 and when is that
>>> ever going to happen?
>>>
>>> Maybe the desire for it comes from making the data "pretty" and read well
>>> in a text editor. Well that is the task of an application to read the CIF
>>> and present it appropriately. The CIF is strictly about CONTENT and not
>>> FORM.
>>>
>>> Since we have given up on elided characters being part of CIF syntax, and
>>> the belief by others that this not be a lexer issue, I think we should be
>>> absolutely consistent. The lexer knows how to identify tokens and reads
>>> everything within them as a raw string.
>>>
>>> If your "encoding" for \n; strings includes characters that break the lexer,
>>> then protect it in some way so that when you pass that string back as raw in
>>> your software, somebody knows how to unprotect it back to the original (as
>>> with ALL string encoding).
>>>
>>> One concession I think we can consider is to change the delimiter from \n;
>>> to \n;\n. I don't see this as causing me any problems, since I handle
>>>
>>> ; stuff
>>> More stuff
>>> ; _newname
>>>
>>> routinely, but others don't. I believe most people do use (and probably
>>> think) the delimiter is \n;\n anyway.
>>>
>>> Two questions
>>>
>>> (1) Do you agree that line folding is just another encoding and therefore
>>> not a STAR/CIF issue? Consequently it is the responsibility of the encoding
>>> not to break the lexer.
>>> (2) Do we think \n;\n is a better delimiter?
>>>
>>> On 25/11/09 10:33 AM, "John Westbrook" <jwest@pdb-mail.rutgers.edu> wrote:
>>>
>>>> Hi James,
>>>>
>>>> My preference is to avoid the elides in the syntax for the purpose of
>>>> escaping terminators in strings, deferring interpretation to the
>>>> application.
>>>>
>>>> I do not understand all of the issues related to line folding, which I
>>>> believe is an issue for Brian and Simon.
>>>>
>>>> John
>>>>
>>>>
>>>> James Hester wrote:
>>>>> Thanks for the quick reply over Thanksgiving, John. I take from your
>>>>> message that the PDB does not need any elide mechanism to be defined
>>>>> in the CIF2 syntax. Would you therefore be prepared to vote in favour
>>>>> of not defining any elides, or would you prefer to abstain?
>>>>>
>>>>> Votes so far:
>>>>>
>>>>> No elides: James, Nick, Herbert if the IUCr + PDB say it is OK
>>>>> Elides: ?
>>>>>
>>>>> Unknown: John, Joe, David B., Brian, Simon
>>>>>
>>>>> On Wed, Nov 25, 2009 at 12:03 PM, John Westbrook
>>>>> <jwest@pdb-mail.rutgers.edu> wrote:
>>>>>> I confess that I am having difficulty keeping up with all aspects
>>>>>> of this discussion. Following Herb's suggestion I will try to
>>>>>> summarize the quoting issues from the PDB perspective.
>>>>>>
>>>>>> 1. As there are multiple ways of quoting a string our tools and files
>>>>>> surround embedded quotes with quotes of the opposite sense or with
>>>>>> semicolons in the mixed case. I think that this point has been
>>>>>> covered a number of times now and I believe that Nick has suggested
>>>>>> that all reasonable cases can be handled by using this approach.
>>>>>>
>>>>>> 2. I too was not aware that the original definition of terminators
>>>>>> had changed and did not include either a leading or trailing
>>>>>> whitespace. Certainly this must still be the case for single
>>>>>> and double quotes.
>>>>>> I cannot recall ever seeing an example
>>>>>> where the terminator \n; was followed by a whitespace character,
>>>>>> but about half of the codes that I am familiar with would
>>>>>> fall over on \n;next_token.
>>>>>>
>>>>>> 3. Line folding has never been an issue for PDB nor has line length.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> Herbert J. Bernstein wrote:
>>>>>>> My major concern about anything we do is to be able to preserve
>>>>>>> the functionality of the practices that the IUCr is following in
>>>>>>> journal publications and the PDB is following. Inasmuch as they seem
>>>>>>> able to cope with no elide in CIF 1.1, the remaining question is whether
>>>>>>> they will be negatively impacted by the change in string termination
>>>>>>> without any elide. If they can use CIF 2 with these changes, my
>>>>>>> objections are purely academic and irrelevant. -- Herbert
>>>>>>>
>>>>>>> =====================================================
>>>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>>
>>>>>>> +1-631-244-3035
>>>>>>> yaya@dowling.edu
>>>>>>> =====================================================
>>>>>>>
>>>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>>>
>>>>>>>> Herbert: I have the dubious advantage of not having participated in
>>>>>>>> all those CIF1.0/1.1 discussions, so only have the spec as written
>>>>>>>> down to rely on.
>>>>>>>>
>>>>>>>> Anyway, how do you feel about abandoning any specification of elides
>>>>>>>> in CIF2 syntax, as suggested by Nick?
>>>>>>>>
>>>>>>>> On Wed, Nov 25, 2009 at 10:53 AM, Herbert J. Bernstein
>>>>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>>>>> Dear James,
>>>>>>>>>
>>>>>>>>> I started to write:
>>>>>>>>> "No, in CIF 1.1, none of the terminal quote marks, including the \n;
>>>>>>>>> are effective unless followed by whitespace (\n, space, tab, or end
>>>>>>>>> of file).
>>>>>>>>> This is a well-established, and very tricky part of the CIF spec
>>>>>>>>> going back to 1990. That is why Nick had to explicitly specify that
>>>>>>>>> a terminal quote mark would be effective no matter what it was
>>>>>>>>> followed by."
>>>>>>>>>
>>>>>>>>> But the grammar currently on the IUCr web site is _not_ the one that
>>>>>>>>> I recall COMCIFS discussing and approving. It now explicitly removes
>>>>>>>>> the requirement for terminal white space in the special case of
>>>>>>>>> the \n; text field terminator. I don't recall when that change was
>>>>>>>>> adopted, but it appears that you are right under the current spec
>>>>>>>>> about the example I chose. Inasmuch as there is a lot of working code
>>>>>>>>> that enforces and uses the original whitespace handling and uses it
>>>>>>>>> in line-folding, I will not revise CIFtbx 3, but I will try to do
>>>>>>>>> something to adapt to this change for CIFtbx 4.
>>>>>>>>>
>>>>>>>>> I guess we are just going to have yet another few dialects of CIF.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Herbert
>>>>>>>>> =====================================================
>>>>>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>>>>
>>>>>>>>> +1-631-244-3035
>>>>>>>>> yaya@dowling.edu
>>>>>>>>> =====================================================
>>>>>>>>>
>>>>>>>>> On Wed, 25 Nov 2009, James Hester wrote:
>>>>>>>>>
>>>>>>>>>> To be precise, we are not 'referring all elides to the application'
>>>>>>>>>> because no elides are recognised by the lexer under Nick's latest
>>>>>>>>>> suggestion, so there are no elides to refer to the application.
>>>>>>>>>>
>>>>>>>>>> My understanding of CIF1.1 syntax suggests that the string you
>>>>>>>>>> provide would produce a syntax error in CIF1.1, as the semicolon at
>>>>>>>>>> the start of the second line would terminate the string, and so
>>>>>>>>>> whitespace should then appear as the second character on the second
>>>>>>>>>> line, rather than reverse solidus.
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 25, 2009 at 9:23 AM, Herbert J. Bernstein
>>>>>>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>>>>>>> The only problem with referring all elides to the application is
>>>>>>>>>>> that with the removal of the requirement of a blank after a \n; for
>>>>>>>>>>> it to be effective, the line folding protocol develops a slight
>>>>>>>>>>> gap. The case is as follows
>>>>>>>>>>>
>>>>>>>>>>> ;\
>>>>>>>>>>> ;\
>>>>>>>>>>> ;
>>>>>>>>>>>
>>>>>>>>>>> is a valid single text field in CIF 1.1, which when handled with the
>>>>>>>>>>> line folding protocol translates to the equivalent of ';' because
>>>>>>>>>>> the embedded ;\ is not a valid text terminator. If we require that
>>>>>>>>>>> a text field that begins with "\n;\\" must be terminated by "\n; "
>>>>>>>>>>> or "\n;\n" or "\n;\t" that problem would be fixed.
>>>>>>>>>>>
>>>>>>>>>>> =====================================================
>>>>>>>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>>>>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>>>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>>>>>>
>>>>>>>>>>> +1-631-244-3035
>>>>>>>>>>> yaya@dowling.edu
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth, WA 6009 AUSTRALIA     w3: www.csse.uwa.edu.au/~nick
MBDP M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
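
The gap Herbert describes in the quoted thread (the three-line ;\ example)
can be made concrete with a small sketch. This is a simplified illustration
in Python, not code from CIFtbx or any other CIF library: the names
split_text_field, unfold and strict_terminator are hypothetical, and the
folding rules are reduced to the two cases discussed in the thread (a
backslash alone on the opening line switches folding on, and a trailing
backslash joins a line to the next).

```python
def split_text_field(text, strict_terminator):
    """Split one semicolon text field off the front of `text`.

    Returns (body, rest_of_terminating_line). Hypothetical helper:
    strict_terminator=True models the older rule that a ';' in column one
    only closes the field when followed by whitespace or end of input;
    strict_terminator=False models the rule that any ';' in column one
    closes it.
    """
    lines = text.split("\n")
    if not lines[0].startswith(";"):
        raise ValueError("text field must open with ';' in column one")
    body = [lines[0][1:]]
    for line in lines[1:]:
        if line.startswith(";"):
            rest = line[1:]
            if not strict_terminator or rest == "" or rest[0] in " \t":
                return "\n".join(body), rest
        body.append(line)
    raise ValueError("unterminated text field")


def unfold(body):
    """Undo line folding on a text field body (simplified assumption)."""
    lines = body.split("\n")
    if lines[0].rstrip() != "\\":
        return body  # no fold marker on the opening line: leave untouched
    out, current, pending = [], "", False
    for line in lines[1:]:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Trailing backslash: drop it and join with the following line.
            current += stripped[:-1]
            pending = True
        else:
            out.append(current + line)
            current, pending = "", False
    if pending:
        out.append(current)  # fold at the very end of the field
    return "\n".join(out)


# Herbert's example: ';\' / ';\' / ';' on three consecutive lines.
EXAMPLE = ";\\\n;\\\n;"

strict_body, strict_rest = split_text_field(EXAMPLE, strict_terminator=True)
relaxed_body, relaxed_rest = split_text_field(EXAMPLE, strict_terminator=False)
```

Under the strict rule the whole example is read as one text field, and
unfold(strict_body) gives ';'. Under the relaxed rule the field closes at
the second line, leaving a stray backslash in relaxed_rest and a third line
that opens a new, unterminated field, which matches James's reading that
the example becomes a syntax error.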
- Follow-Ups:
- Re: [ddlm-group] Use of elides in strings (Herbert J. Bernstein)
- References:
- Re: [ddlm-group] Use of elides in strings (SIMON WESTRIP)