Discussion List Archives

Re: [ddlm-group] Handling single string values longer than maximum line length

The advantage of choosing option 2 is that it is very easy to implement:
"\n;\\\n" flags line-folding.  Inasmuch as the line-folding protocol has
been in CIF for a long time, it is highly unlikely that we will encounter
a large number of other uses of text fields that start with "\n;\\\n".
Any processor that ignores this convention does no harm -- it just
has a text field that begins with a backslash and a newline, and, if it
puts that text field out that way, nothing has been lost.
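
To illustrate how little is involved, here is a minimal sketch in Python,
not a definitive implementation.  The function name unfold_text_field is
hypothetical, and it assumes the lexer has already stripped the opening
"\n;" and the closing "\n;" delimiters, so that it sees only the content
of the text field.  It also assumes blanks are tolerated between a
backslash and the line termination.

```python
import re

# A backslash, optional trailing blanks, then a line termination.
# Assumption: only "\n" is treated as line termination in this sketch.
_FOLD = re.compile(r"\\[ \t]*\n")


def unfold_text_field(content: str) -> str:
    """Return the unfolded value of a semicolon-delimited text field.

    `content` is everything between the opening "\\n;" and the closing
    "\\n;".  If the content does not begin with a backslash followed by a
    line termination, the protocol is not in operation and the content is
    returned unchanged.
    """
    flag = _FOLD.match(content)
    if flag is None:
        return content
    # Drop the flag line, then join every line that ends in a backslash
    # with the line that follows it.
    return _FOLD.sub("", content[flag.end():])


folded = "\\\nABC\\\nDEF\nGHI"
print(repr(unfold_text_field(folded)))  # 'ABCDEF\nGHI'
```

A processor that skips this step simply hands back the content with its
backslashes intact, which is the harmless behaviour described above.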

For that reason, when the treble quote is introduced, I would suggest
exactly the same convention.  A treble quoted string starting with

   "\'\'\'\\\n"
or
   "\"\"\"\\\n"

would be reserved for flagging the line-folding protocol, so that a
text string presented as a semi-colon delimited string or as a treble
quoted string would have the same meaning.
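
As a rough sketch of what "the same meaning" could look like in code (in
Python, with the hypothetical name value_of), the function below takes a
raw token including its delimiters and applies one unfolding rule to all
three forms.  The treble-quote delimiters are assumptions based on the
proposal under discussion, not settled CIF2 syntax, and only "\n" is
treated as line termination.

```python
import re

# A backslash, optional trailing blanks, then a line termination.
_FOLD = re.compile(r"\\[ \t]*\n")

# (opener, closer) pairs; the treble-quote forms are assumed, not final.
_DELIMITERS = (("\n;", "\n;"), ("'''", "'''"), ('"""', '"""'))


def value_of(token: str) -> str:
    """Return the value of a delimited string, unfolding it if flagged."""
    for opener, closer in _DELIMITERS:
        long_enough = len(token) >= len(opener) + len(closer)
        if long_enough and token.startswith(opener) and token.endswith(closer):
            content = token[len(opener):len(token) - len(closer)]
            break
    else:
        raise ValueError("not a text field or treble-quoted string")
    flag = _FOLD.match(content)
    if flag is None:
        return content  # protocol not flagged; value is taken verbatim
    return _FOLD.sub("", content[flag.end():])


semicolon_form = "\n;\\\nABC\\\nDEF\n;"
treble_form = "'''\\\nABC\\\nDEF'''"
print(value_of(semicolon_form) == value_of(treble_form))  # True
```

Under the same assumptions, a fold can split an embedded run of three
quotes, which is the point Joe Krahn raises in the quoted message below:
value_of("'''\\\nab'\\\n''cd'''") yields the five-character prefix ab'''
followed by cd, even though no three consecutive quotes appear inside the
raw token.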

Regards,
   Herbert


=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Thu, 26 Nov 2009, James Hester wrote:

> The line-folding protocol is completely compatible with CIF1.0/1.1 syntax
> (ITG 2.2.7.4.11 para 3).  None of the syntax changes in CIF2 (so far) make
> any difference to that, so the line folding protocol could be adopted for
> CIF2.  We have three options:
> 
> (1) As for CIF1: the protocol exists but support for it is optional.
> (2) The protocol is compulsory: the <line termination><semicolon><reverse
> solidus> sequence must be recognised.
> (3) The availability of the protocol is selectable by dictionary writers via
> a predefined type.  In this case use of the protocol would still be optional
> for CIF data file writers, but CIF reading applications would be required to
> recognise the initiation sequence given in (2) (and any others we define).
> 
> Note on (3) - this does not mean you have to read in a dictionary in order
> to process a data file.  It means that the programmer writing the CIF
> application has to read the dictionary when creating the software, which
> they do anyway.
> 
> On Thu, Nov 26, 2009 at 8:08 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>       If we allow for line-folding to occur after parsing, and backslash
>       (aka reverse solidus) is not used as an elide/escape character,
>       then no special rules are needed to put line-folded strings within
>       triple-quoted strings.
> 
> 
> Agreed.
>
>       Semicolon-delimited strings are a special case because folding is
>       considered to prevent a subsequent semicolon from being interpreted
>       as beginning on a new line, so that it is not accepted as the
>       close-quote.
> 
> 
> Actually, this is not the case.  ITG sec 2.2.7.4.11 notes "The final line
> termination-semicolon sequence of a text field takes priority over the
> reassembly process and ends it, but a trailing backslash on the last line of
> a text field very nicely conveys the information that no trailing line
> termination is intended to be included within the character string".
>  
>       However, if the lexer does not unfold long strings, this actually
>       is an elide mechanism. The consensus seems to be heading towards no
>       elide mechanism, so maybe breaking long lines just before a
>       semicolon should simply be prohibited.
>
>       ALSO, SOMETHING I JUST REALIZED: Line folding makes it possible to
>       break an embedded triple-quote into two parts, so it actually
>       provides a way to elide triple quotes indirectly within a large
>       triple-quote string. Therefore, as long as CIF2 keeps line-folding,
>       it is trivial to put CIF-within-CIF, or any other unrestricted
>       string, without any of the reverse-solidus elide rules.
> 
> 
> Good point.  The line-folding protocol would need to be extended for triple
> quoted strings, as there is currently no defined way to signal that it is in
> operation.  Presumably the characters <quote><quote><quote><reverse-solidus>
> could signal that the line-folding protocol was operational.
>  
>       Joe
> 
> Brian McMahon wrote:
> >> (I've switched the thread title to deal separately with line folding.)
> >
> > Well, I didn't because I was distracted when about to hit the
> > 'Send' button!  So this is just a repeat of the previous posting but
> > under a new thread in case we wish to take up this general discussion
> > later.
> >
> > Regards
> > Brian
> >
> > As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> > of the ITG bible). Currently, it invokes a special meaning for the
> > backslash (reverse solidus) character, but only when it is the first
> > non-blank after an opening semicolon or comment hash delimiter. We have
> > yet to discuss whether to extend it to other string types (specifically
> > the triple-quoted strings).
> >
> > It's quite easy these days to generate single strings that are longer
> > than 2048 characters (or any other arbitrary line limit) - e.g. a
> > protein or nucleic acid sequence. Many, many chemical names broke the
> > old 80-character line length limit.
> >
> > We're very happy with CIF applications that do not interpret the
> > line-folding protocol, so long as they preserve the existing
> > backslashes. However, a fully-compliant CIF 1.1 parser should be able
> > to return an unfolded string to an application that requests it.
> >
> > As Herbert says, if this were dropped as part of the CIF2
> > specification, we would need to think carefully about how else to
> > retain this functionality.
> >
> > Regards
> > Brian
> >
> > On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
> >> The line folding protocol was discussed and adopted by COMCIFS and is
> >> posted, along with other "Common Semantic Features" at
> >>
> >> http://www.iucr.org/resources/cif/spec/version1.1/semantics
> >>
> >> but that is neither here nor there.  The point is that the IUCr uses CIF
> >> to get work done.  If we disable something they are using, we should offer
> >> some equivalent functionality so they can use CIF 2 to do their work.
> >> Otherwise, they will have to do the sensible thing, and continue to use
> >> CIF 1, or, worse, create their own dialect of CIF 2.
> >>
> >> Now, I broke my nose yesterday morning and find myself a bit punchy today,
> >> so I will drop out of this discussion for a while.  Hopefully, when I
> >> return to it, this whole matter will be settled in some way that will
> >> allow people to actually use CIF 2, instead of it becoming what it seems
> >> on its way to becoming -- something elegant but not terribly useful, a bit
> >> like PL/I.
> >>
> >> Cheers,
> >>    Herbert
> >>
> > _______________________________________________
> > ddlm-group mailing list
> > ddlm-group@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/ddlm-group
> >