Re: [ddlm-group] Handling single string values longer than maximum line length
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Handling single string values longer than maximum line length
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Wed, 25 Nov 2009 22:18:30 -0500 (EST)
- In-Reply-To: <279aad2a0911251506s68029e45ge8a2e35567e6b71f@mail.gmail.com>
- References: <20091125151844.GA14826@emerald.iucr.org><4B0D9CB3.7090302@niehs.nih.gov><279aad2a0911251506s68029e45ge8a2e35567e6b71f@mail.gmail.com>
The advantage of choosing option 2 is that it is very easy to implement:
"\n;\\\n" flags line-folding. Inasmuch as the line-folding protocol has been
in CIF for a long time, it is highly unlikely that we will encounter a large
number of other uses of text fields that start with "\n;\\\n". Any processor
that ignores this convention does no harm: it just has a text field that
begins with a backslash and a newline, and, if it puts that text field out
that way, nothing has been lost.

For that reason, when the treble quote is introduced, I would suggest exactly
the same convention. A treble-quoted string starting with "\'\'\'\\\n" or
"\"\"\"\\\n" would be reserved for flagging the line-folding protocol, so that
a text string presented as a semicolon-delimited string or as a treble-quoted
string would have the same meaning.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Thu, 26 Nov 2009, James Hester wrote:

> The line-folding protocol is completely compatible with CIF1.0/1.1 syntax
> (ITG 2.2.7.4.11 para 3). None of the syntax changes in CIF2 (so far) make
> any difference to that, so the line folding protocol could be adopted for
> CIF2. We have three options:
>
> (1) As for CIF1: the protocol exists but support for it is optional.
> (2) The protocol is compulsory: the <line termination><semicolon><reverse
> solidus> sequence must be recognised.
> (3) The availability of the protocol is selectable by dictionary writers via
> a predefined type. In this case use of the protocol would still be optional
> for CIF data file writers, but CIF reading applications would be required to
> recognise the initiation sequence given in (2) (and any others we define).
>
> Note on (3) - this does not mean you have to read in a dictionary in order
> to process a data file. It means that the programmer writing the CIF
> application has to read the dictionary when creating the software, which
> they do anyway.
>
> On Thu, Nov 26, 2009 at 8:08 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>
>       If we allow for line-folding to occur after parsing, and backslash
>       (aka reverse solidus) is not used as an elide/escape character, then
>       no special rules are needed to put line-folded strings within
>       triple-quoted strings.
>
> Agreed.
>
>       Semicolon-delimited strings are a special case because folding is
>       considered in preventing a subsequent semicolon to be interpreted as
>       beginning on a new line, and not accepted as the close-quote.
>
> Actually, this is not the case. ITG sec 2.2.7.4.11 notes "The final line
> termination-semicolon sequence of a text field takes priority over the
> reassembly process and ends it, but a trailing backslash on the last line of
> a text field very nicely conveys the information that no trailing line
> termination is intended to be included within the character string".
>
>       However, if the lexer does not unfold long strings, this actually is
>       an elide mechanism. The consensus seems to be heading towards no
>       elide mechanism, so maybe breaking long lines just before a semicolon
>       should simply be prohibited.
>
>       ALSO, SOMETHING I JUST REALIZED: Line folding makes it possible to
>       break an embedded triple-quote into two parts, so it actually
>       provides a way to elide triple quotes indirectly within a large
>       triple-quote string. Therefore, as long as CIF2 keeps line-folding,
>       it is trivial to put CIF-within-CIF, or any other unrestricted
>       string, without any of the reverse-solidus elide rules.
>
> Good point. The line-folding protocol would need to be extended for triple
> quoted strings, as there is currently no defined way to signal that it is in
> operation.
> Presumably the characters <quote><quote><quote><reverse-solidus>
> could signal that the line-folding protocol was operational.
>
>       Joe
>
>       Brian McMahon wrote:
> >> (I've switched the thread title to deal separately with line folding.)
> >
> > Well, I didn't because I was distracted when about to hit the
> > 'Send' button! So this is just a repeat of the previous posting but
> > under a new thread in case we wish to take up this general discussion
> > later.
> >
> > Regards
> > Brian
> >
> > As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> > of the ITG bible). Currently, it invokes a special meaning for the
> > backslash (reverse solidus) character, but only when it is the first
> > non-blank after an opening semicolon or comment hash delimiter. We have
> > yet to discuss whether to extend it to other string types (specifically
> > the triple-quoted strings).
> >
> > It's quite easy these days to generate single strings that are longer
> > than 2048 characters (or any other arbitrary line limit) - e.g. a
> > protein or nucleic acid sequence. Many, many chemical names broke the old
> > 80-character line length limit.
> >
> > We're very happy with CIF applications that do not interpret the
> > line-folding protocol, so long as they preserve the existing backslashes.
> > However, a fully-compliant CIF 1.1 parser should be able to return an
> > unfolded string to an application that requests it.
> >
> > As Herbert says, if this were dropped as part of the CIF2 specification,
> > we would need to think carefully about how else to retain this
> > functionality.
> >
> > Regards
> > Brian
> >
> > On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
> >> The line folding protocol was discussed and adopted by COMCIFS and is
> >> posted, along with other "Common Semantic Features" at
> >>
> >> http://www.iucr.org/resources/cif/spec/version1.1/semantics
> >>
> >> but that is neither here nor there.
> >> The point is that the IUCr uses CIF to get work done. If we disable
> >> something they are using, we should offer some equivalent functionality
> >> so they can use CIF 2 to do their work. Otherwise, they will have to do
> >> the sensible thing, and continue to use CIF 1, or, worse, create their
> >> own dialect of CIF 2.
> >>
> >> Now, I broke my nose yesterday morning and find myself a bit punchy
> >> today, so I will drop out of this discussion for a while. Hopefully,
> >> when I return to it, this whole matter will be settled in some way that
> >> will allow people to actually use CIF 2, instead of it becoming what it
> >> seems on its way to becoming -- something elegant but not terribly
> >> useful, a bit like PL/I.
> >>
> >> Cheers,
> >> Herbert
> >>
> >> =====================================================
> >> Herbert J. Bernstein, Professor of Computer Science
> >> Dowling College, Kramer Science Center, KSC 121
> >> Idle Hour Blvd, Oakdale, NY, 11769
> >>
> >> +1-631-244-3035
> >> yaya@dowling.edu
> >> =====================================================
> > _______________________________________________
> > ddlm-group mailing list
> > ddlm-group@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
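
For readers following the thread, here is a minimal sketch of the reassembly step being discussed. It is an illustration only, not the reference implementation of the CIF 1.1 line-folding protocol. The function name `unfold` and the pattern `_FOLD` are hypothetical. The sketch assumes the string passed in is the content immediately after the opening delimiter (the semicolon, or the treble quote under the proposal above), that line terminators are already normalised to "\n", and that a folding backslash may be followed by trailing spaces or tabs.

```python
import re

# Assumption: a folding backslash is the last non-blank character on a line.
_FOLD = re.compile(r"\\[ \t]*$")


def unfold(content: str) -> str:
    """Reassemble a line-folded value (hypothetical helper).

    `content` is everything after the opening delimiter. If its first line
    is not a lone backslash, folding is not signalled and the value is
    returned unchanged, which is the harmless behaviour described above.
    """
    lines = content.split("\n")
    if _FOLD.fullmatch(lines[0]) is None:
        return content

    body = lines[1:]
    out = []
    for i, line in enumerate(body):
        m = _FOLD.search(line)
        if m:
            # Drop the backslash, any trailing blanks, and the line break.
            out.append(line[:m.start()])
        else:
            out.append(line)
            if i < len(body) - 1:
                out.append("\n")  # a genuine line break in the value
    return "".join(out)


# A long sequence folded over three lines, then a real second line.
print(unfold("\\\nACGTACGT\\\nTTGACC\\\nGGA\nsecond line"))

# Joe's observation: an embedded treble quote can be split across a fold.
print(unfold("\\\nabc ''\\\n' def"))
```

A value that does not begin with the backslash line passes through untouched, so a reader that applies this step after parsing loses nothing on unfolded data.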