Re: [ddlm-group] Handling single string values longer than maximumline length
- To: Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] Handling single string values longer than maximumline length
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Wed, 25 Nov 2009 22:18:30 -0500 (EST)
- In-Reply-To: <[email protected]>
- References: <[email protected]><[email protected]><[email protected]>
The advantage of choosing option 2 is that it is very easy to implement:
"\n;\\\n" flags line-folding. Inasmuch as the line folding protocol has
been in CIF for a long time, it is highly unlikely that we will encounter
a large number of other uses of text fields that start with "\n;\\\n".
Any processor that ignores this convention does no harm -- it just
has a text field that begins with a backslash and a newline, and, if it
writes that text field out unchanged, nothing has been lost.
For that reason, when the treble quote is introduced, I would suggest
exactly the same convention. A treble quoted string starting with
"\'\'\'\\\n"
or
"\"\"\"\\\n"
would be reserved for flagging the line-folding protocol, so that a
text string presented as a semi-colon delimited string or as a treble
quoted string would have the same meaning.
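
To make the convention concrete, here is a rough Python sketch of how a reader might recognise the flag and unfold the text. It is an illustration under stated assumptions, not a reference implementation: the names FOLD_FLAGS, split_fold_flag and unfold are hypothetical, the token is assumed to start at its opening delimiter with the closing delimiter already removed, and the unfolding rule is simplified to "a backslash at the end of a line, optionally followed by blanks, joins that line to the next".

```python
# Hypothetical helper names; a sketch only, not taken from any CIF library.
# Each flag is an opening delimiter followed by a backslash and a newline.
FOLD_FLAGS = (";\\\n", "'''\\\n", '"""\\\n')


def split_fold_flag(token):
    """Return (is_folded, remainder) for a token starting at its opening delimiter.

    Assumption: the closing delimiter has already been stripped by the lexer.
    """
    for flag in FOLD_FLAGS:
        if token.startswith(flag):
            return True, token[len(flag):]
    return False, token


def unfold(text):
    """Join each line ending in a backslash with the line that follows it."""
    out, pending, open_fold = [], "", False
    for line in text.split("\n"):
        stripped = line.rstrip(" \t")  # blanks after the backslash are ignored
        if stripped.endswith("\\"):
            pending += stripped[:-1]   # drop the backslash, keep accumulating
            open_fold = True
        else:
            out.append(pending + line)
            pending, open_fold = "", False
    if open_fold:                      # trailing backslash on the last line
        out.append(pending)
    return "\n".join(out)


if __name__ == "__main__":
    for raw in (";\\\nabc\\\ndef", "'''\\\nabc\\\ndef", '"""\\\nabc\\\ndef'):
        folded, rest = split_fold_flag(raw)
        print(folded, repr(unfold(rest) if folded else rest))
```

A processor that does not implement the protocol can skip both functions and pass the token through untouched, which is the "does no harm" case described above.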
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Thu, 26 Nov 2009, James Hester wrote:
> The line-folding protocol is completely compatible with CIF1.0/1.1 syntax
> (ITG 2.2.7.4.11 para 3). None of the syntax changes in CIF2 (so far) make
> any difference to that, so the line folding protocol could be adopted for
> CIF2. We have three options:
>
> (1) As for CIF1: the protocol exists but support for it is optional.
> (2) The protocol is compulsory: the <line termination><semicolon><reverse
> solidus> sequence must be recognised
> (3) The availability of the protocol is selectable by dictionary writers via
> a predefined type. In this case use of the protocol would still be optional
> for CIF data file writers, but CIF reading applications would be required to
> recognise the initiation sequence given in (2) (and any others we define).
>
> Note on (3) - this does not mean you have to read in a dictionary in order
> to process a data file. It means that the programmer writing the CIF
> application has to read the dictionary when creating the software, which
> they do anyway.
>
> On Thu, Nov 26, 2009 at 8:08 AM, Joe Krahn <[email protected]> wrote:
> If we allow for line-folding to occur after parsing, and backslash (aka
> reverse solidus) is not used as an elide/escape character, then no
> special rules are needed to put line-folded strings within triple-quoted
> strings.
>
>
> Agreed.
>
> Semicolon-delimited strings are a special case because folding is
> considered in preventing a subsequent semicolon to be interpreted as
> beginning on a new line, and not accepted as the close-quote.
>
>
> Actually, this is not the case. ITG sec 2.2.7.4.11 notes "The final line
> termination-semicolon sequence of a text field takes priority over the
> reassembly process and ends it, but a trailing backslash on the last line of
> a text field very nicely conveys the information that no trailing line
> termination is intended to be included within the character string".
>
> However, if the lexer does not unfold long strings, this actually is an
> elide mechanism. The consensus seems to be heading towards no elide
> mechanism, so maybe breaking long lines just before a semicolon should
> simply be prohibited.
>
> ALSO, SOMETHING I JUST REALIZED: Line folding makes it possible to break
> an embedded triple-quote into two parts, so it actually provides a way
> to elide triple quotes indirectly within a large triple-quote string.
> Therefore, as long as CIF2 keeps line-folding, it is trivial to put
> CIF-within-CIF, or any other unrestricted string, without any of the
> reverse-solidus elide rules.
>
>
> Good point. The line-folding protocol would need to be extended for triple
> quoted strings, as there is currently no defined way to signal that it is in
> operation. Presumably the characters <quote><quote><quote><reverse-solidus>
> could signal that the line-folding protocol was operational.
>
> Joe
>
> Brian McMahon wrote:
> >> (I've switched the thread title to deal separately with line folding.)
> >
> > Well, I didn't because I was distracted when about to hit the
> > 'Send' button! So this is just a repeat of the previous posting but
> > under a new thread in case we wish to take up this general discussion
> > later.
> >
> > Regards
> > Brian
> >
> > As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> > of the ITG bible). Currently, it invokes a special meaning for the
> > backslash (reverse solidus) character, but only when it is the first
> > non-blank after an opening semicolon or comment hash delimiter. We have
> > yet to discuss whether to extend it to other string types (specifically
> > the triple-quoted strings).
> >
> > It's quite easy these days to generate single strings that are longer
> > than 2048 characters (or any other arbitrary line limit) - e.g. a
> > protein or nucleic acid sequence. Many, many chemical names broke the old
> > 80-character line length limit.
> >
> > We're very happy with CIF applications that do not interpret the
> > line-folding protocol, so long as they preserve the existing backslashes.
> > However, a fully-compliant CIF 1.1 parser should be able to return an
> > unfolded string to an application that requests it.
> >
> > As Herbert says, if this were dropped as part of the CIF2 specification,
> > we would need to think carefully about how else to retain this
> > functionality.
> >
> > Regards
> > Brian
> >
> > On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
> >> The line folding protocol was discussed and adopted by COMCIFS and is
> >> posted, along with other "Common Semantic Features" at
> >>
> >> http://www.iucr.org/resources/cif/spec/version1.1/semantics
> >>
> >> but that is neither here nor there. The point is that the IUCr uses CIF
> >> to get work done. If we disable something they are using, we should offer
> >> some equivalent functionality so they can use CIF 2 to do their work.
> >> Otherwise, they will have to do the sensible thing, and continue to use
> >> CIF 1, or, worse, create their own dialect of CIF 2.
> >>
> >> Now, I broke my nose yesterday morning and find myself a bit punchy today,
> >> so I will drop out of this discussion for a while. Hopefully, when I
> >> return to it, this whole matter will be settled in some way that will
> >> allow people to actually use CIF 2, instead of it becoming what it seems
> >> on its way to becoming -- something elegant but not terribly useful, a bit
> >> like PL/I.
> >>
> >> Cheers,
> >>    Herbert
> >>
> >> =====================================================
> >>   Herbert J. Bernstein, Professor of Computer Science
> >>     Dowling College, Kramer Science Center, KSC 121
> >>          Idle Hour Blvd, Oakdale, NY, 11769
> >>
> >>                   +1-631-244-3035
> >>                   [email protected]
> >> =====================================================
> > _______________________________________________
> > ddlm-group mailing list
> > [email protected]
> > http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>
>

