Discussion List Archives

Re: [ddlm-group] Handling single string values longer than maximum line length

The line-folding protocol is completely compatible with CIF1.0/1.1 syntax (ITG 2.2.7.4.11 para 3).  None of the syntax changes in CIF2 (so far) make any difference to that, so the line folding protocol could be adopted for CIF2.  We have three options:

(1) As for CIF1: the protocol exists but support for it is optional.
(2) The protocol is compulsory: the <line termination><semicolon><reverse solidus> sequence must be recognised.
(3) The availability of the protocol is selectable by dictionary writers via a predefined type.  In this case use of the protocol would still be optional for CIF data file writers, but CIF reading applications would be required to recognise the initiation sequence given in (2) (and any others we define).

Note on (3) - this does not mean you have to read in a dictionary in order to process a data file.  It means that the programmer writing the CIF application has to read the dictionary when creating the software, which they do anyway.

On Thu, Nov 26, 2009 at 8:08 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
If we allow for line-folding to occur after parsing, and backslash (aka
reverse solidus) is not used as an elide/escape character, then no
special rules are needed to put line-folded strings within triple-quoted
strings.

Agreed.

Semicolon-delimited strings are a special case because folding is
taken into account to prevent a subsequent semicolon from being
interpreted as beginning a new line and so accepted as the close-quote.

Actually, this is not the case.  ITG sec 2.2.7.4.11 notes "The final line termination-semicolon sequence of a text field takes priority over the reassembly process and ends it, but a trailing backslash on the last line of a text field very nicely conveys the information that no trailing line termination is intended to be included within the character string".
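
The priority rule quoted above can be illustrated with a minimal Python sketch. It assumes that a tokenizer has already located the closing <line termination><semicolon> and passes only the text between the delimiters, that line terminations are "\n", and that whitespace after a backslash is ignored. The function name unfold_text_field is hypothetical and is not taken from any existing CIF library.

```python
def unfold_text_field(raw: str) -> str:
    """Unfold the content of a semicolon-delimited text field.

    `raw` is the text between the opening semicolon and the closing
    <line termination><semicolon>, both excluded (an assumption of this
    sketch: the closing delimiter has already been found, so it takes
    priority over reassembly).
    """
    lines = raw.split("\n")
    # The protocol is active only when the opening line is a lone backslash.
    if lines[0].rstrip() != "\\":
        return raw
    body = lines[1:]
    pieces = []
    for i, line in enumerate(body):
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the backslash and the line termination: join with the next line.
            pieces.append(stripped[:-1])
        elif i == len(body) - 1:
            # Last line: its line termination belongs to the closing delimiter.
            pieces.append(line)
        else:
            pieces.append(line + "\n")
    return "".join(pieces)


# A trailing backslash on the last line is simply removed; it cannot
# swallow the closing semicolon, which was consumed before unfolding.
print(unfold_text_field("\\\nACGTACGT\\\nTTGACCA\\"))  # ACGTACGTTTGACCA
```

Because unfolding runs only after the closing delimiter has been identified, a semicolon at the start of a line always ends the field, whatever the preceding line ends with.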
 
However, if the lexer does not unfold long strings, this actually is an
elide mechanism. The consensus seems to be heading towards no elide
mechanism, so maybe breaking long lines just before a semicolon should
simply be prohibited.

ALSO, SOMETHING I JUST REALIZED: Line folding makes it possible to break
an embedded triple-quote into two parts, so it actually provides a way
to elide triple quotes indirectly within a large triple-quote string.
Therefore, as long as CIF2 keeps line-folding, it is trivial to put
CIF-within-CIF, or any other unrestricted string, without any of the
reverse-solidus elide rules.

Good point.  The line-folding protocol would need to be extended for triple-quoted strings, as there is currently no defined way to signal that it is in operation.  Presumably the sequence <quote><quote><quote><reverse solidus> could signal that the line-folding protocol is in operation.
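
To make the idea concrete, here is a rough Python sketch of how folding could keep an embedded triple quote from ever appearing intact on one physical line. Everything in it is an assumption for illustration: the opening signal (a backslash immediately after the opening triple quote) is only the suggestion made above and is not part of any CIF specification, the names fold_body and unfold_body are hypothetical, line terminations are "\n", and no line of the value is assumed to end in a backslash.

```python
TRIPLE = '"""'


def fold_body(value: str, width: int = 40) -> str:
    """Fold `value` so that no physical line holds an intact triple quote."""
    out = ["\\\n"]  # hypothetical signal: backslash right after the opening quotes
    current = ""
    for ch in value:
        if ch == "\n":
            # A genuine line break in the value is kept as is.
            out.append(current + "\n")
            current = ""
            continue
        # Fold before the character that would complete `"""`, or at `width`.
        if (current + ch).endswith(TRIPLE) or len(current) >= width:
            out.append(current + "\\\n")
            current = ""
        current += ch
    # Fold once more so the closing delimiter sits on its own line.
    out.append(current + "\\\n")
    return "".join(out)


def unfold_body(body: str) -> str:
    """Reverse fold_body: join lines that end in a backslash."""
    lines = body.split("\n")
    if lines[0].rstrip() != "\\":
        return body
    rest = lines[1:]
    pieces = []
    for i, line in enumerate(rest):
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            pieces.append(stripped[:-1])
        elif i == len(rest) - 1:
            pieces.append(line)
        else:
            pieces.append(line + "\n")
    return "".join(pieces)


# A CIF-within-CIF style value that itself contains triple quotes.
inner = 'data_inner\n_name """nested"""\n_seq ' + "ACGT" * 30
body = fold_body(inner)
token = TRIPLE + body + TRIPLE  # what would be written to the outer file
```

In this sketch the fold points do double duty: they keep lines short and they split every embedded triple quote, so the outer delimiters remain unambiguous without any reverse-solidus elide rule.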
 
Joe

Brian McMahon wrote:
>> (I've switched the thread title to deal separately with line folding.)
>
> Well, I didn't because I was distracted when about to hit the
> 'Send' button!  So this is just a repeat of the previous posting but
> under a new thread in case we wish to take up this general discussion
> later.
>
> Regards
> Brian
>
> As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> of the ITG bible). Currently, it invokes a special meaning for the
> backslash (reverse solidus) character, but only when it is the first
> non-blank after an opening semicolon or comment hash delimiter. We have
> yet to discuss whether to extend it to other string types (specifically
> the triple-quoted strings).
>
> It's quite easy these days to generate single strings that are longer
> than 2048 characters (or any other arbitrary line limit) - e.g. a
> protein or nucleic acid sequence. Many, many chemical names broke the old
> 80-character line length limit.
>
> We're very happy with CIF applications that do not interpret the
> line-folding protocol, so long as they preserve the existing backslashes.
> However, a fully-compliant CIF 1.1 parser should be able to return an
> unfolded string to an application that requests it.
>
> As Herbert says, if this were dropped as part of the CIF2 specification,
> we would need to think carefully about how else to retain this
> functionality.
>
> Regards
> Brian
>
> On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
>> The line folding protocol was discussed and adopted by COMCIFS and is
>> posted, along with other "Common Semantic Features" at
>>
>> http://www.iucr.org/resources/cif/spec/version1.1/semantics
>>
>> but that is neither here nor there.  The point is that the IUCr uses CIF
>> to get work done.  If we disable something they are using, we should offer
>> some equivalent functionality so they can use CIF 2 to do their work.
>> Otherwise, they will have to do the sensible thing, and continue to use
>> CIF 1, or, worse, create their own dialect of CIF 2.
>>
>> Now, I broke my nose yesterday morning and find myself a bit punchy today,
>> so I will drop out of this discussion for a while.  Hopefully, when I
>> return to it, this whole matter will be settled in some way that will
>> allow people to actually use CIF 2, instead of it becoming what it seems
>> on its way to becoming -- something elegant but not terribly useful, a bit
>> like PL/I.
>>
>> Cheers,
>>    Herbert
>>
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>> =====================================================
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148