Re: [ddlm-group] Handling single string values longer than maximum line length

If we allow for line-folding to occur after parsing, and backslash (aka 
reverse solidus) is not used as an elide/escape character, then no 
special rules are needed to put line-folded strings within triple-quoted 
strings.
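
A minimal sketch of what unfolding after parsing could look like, in 
Python. It assumes the CIF 1.1 convention as I understand it: a folded 
value starts with a line holding only a backslash, and a backslash as the 
last non-blank character joins a line to the next. The name `unfold` is 
hypothetical; the function is applied to a string the parser has already 
delimited, whatever quoting was used.

```python
def unfold(value: str) -> str:
    """Unfold an already-parsed string value (illustrative sketch only).

    Assumptions: folding is signalled by a first line that is a lone
    backslash (optionally followed by blanks), and a line continues onto
    the next when its last non-blank character is a backslash.
    """
    lines = value.split("\n")
    if lines[0].rstrip(" \t") != "\\":
        return value  # no folding marker: return the value unchanged

    result = []
    buffer = ""
    pending = False
    for line in lines[1:]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            # Drop the backslash and trailing blanks, join with the next line.
            buffer += stripped[:-1]
            pending = True
        else:
            result.append(buffer + line)
            buffer = ""
            pending = False
    if pending:
        # The last line ended with a backslash: keep what was collected.
        result.append(buffer)
    return "\n".join(result)


folded = "\\\nACGTACGT\\\nTTGACCAA\\\nGGCC"
print(unfold(folded))
```

Because the function only sees the parsed value, the lexer itself needs 
no knowledge of folding.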

Semicolon-delimited strings are a special case because folding affects 
whether a subsequent semicolon is interpreted as beginning a new line, 
and hence whether it is accepted as the close-quote.

However, if the lexer does not unfold long strings, this actually is an 
elide mechanism. The consensus seems to be heading towards no elide 
mechanism, so maybe breaking long lines just before a semicolon should 
simply be prohibited.

ALSO, SOMETHING I JUST REALIZED: Line folding makes it possible to break 
an embedded triple-quote into two parts, so it actually provides a way 
to elide triple quotes indirectly within a large triple-quote string. 
Therefore, as long as CIF2 keeps line-folding, it is trivial to embed 
CIF-within-CIF, or any other unrestricted string, without any of the 
reverse-solidus elide rules.
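
To illustrate the point, here is a rough Python sketch that folds a value 
so that no embedded triple-quote survives intact on any line, then unfolds 
it again. The function names, the `width` parameter and the folding 
convention (a leading lone backslash, a trailing backslash for 
continuation) are assumptions for the example. It also assumes the value 
contains no backslashes or newlines, and it does not handle a value that 
ends with a quote character.

```python
def fold(value: str, width: int = 20, delim: str = '"""') -> str:
    """Fold a one-line value so `delim` never appears whole on a line."""
    lines = ["\\"]  # assumed marker: first line is a lone backslash
    pos = 0
    while pos < len(value):
        chunk = value[pos:pos + width]
        idx = chunk.find(delim)
        if idx != -1:
            # Cut just after the first character of the delimiter, so the
            # delimiter is split across two physical lines.
            chunk = chunk[:idx + 1]
        pos += len(chunk)
        # Every line except the last gets a continuation backslash.
        lines.append(chunk + "\\" if pos < len(value) else chunk)
    return "\n".join(lines)


def unfold(value: str) -> str:
    """Reverse `fold` under the same assumed convention."""
    lines = value.split("\n")
    if lines[0].rstrip(" \t") != "\\":
        return value
    result, buffer, pending = [], "", False
    for line in lines[1:]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            buffer += stripped[:-1]
            pending = True
        else:
            result.append(buffer + line)
            buffer, pending = "", False
    if pending:
        result.append(buffer)
    return "\n".join(result)


inner = 'data_inner _x """abc""" end'
folded = fold(inner)
print(folded)
print('"""' in folded)          # False: no intact triple quote remains
print(unfold(folded) == inner)  # True: the original value is recovered
```

The folded text can then sit inside an outer triple-quoted string, and 
an application that unfolds after parsing gets the inner text back 
unchanged.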

Joe

Brian McMahon wrote:
>> (I've switched the thread title to deal separately with line folding.)
> 
> Well, I didn't because I was distracted when about to hit the
> 'Send' button!  So this is just a repeat of the previous posting but
> under a new thread in case we wish to take up this general discussion
> later.
> 
> Regards
> Brian
> 
> As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> of the ITG bible). Currently, it invokes a special meaning for the
> backslash (reverse solidus) character, but only when it is the first
> non-blank after an opening semicolon or comment hash delimiter. We have
> yet to discuss whether to extend it to other string types (specifically
> the triple-quoted strings).
> 
> It's quite easy these days to generate single strings that are longer
> than 2048 characters (or any other arbitrary line limit) - e.g. a
> protein or nucleic acid sequence. Many, many chemical names broke the old
> 80-character line length limit.
> 
> We're very happy with CIF applications that do not interpret the
> line-folding protocol, so long as they preserve the existing backslashes.
> However, a fully-compliant CIF 1.1 parser should be able to return an
> unfolded string to an application that requests it.
> 
> As Herbert says, if this were dropped as part of the CIF2 specification,
> we would need to think carefully about how else to retain this
> functionality.
> 
> Regards
> Brian
> 
> On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
>> The line folding protocol was discussed and adopted by COMCIFS and is
>> posted, along with other "Common Semantic Features" at
>>
>> http://www.iucr.org/resources/cif/spec/version1.1/semantics
>>
>> but that is neither here nor there.  The point is that the IUCr uses CIF
>> to get work done.  If we disable something they are using, we should offer
>> some equivalent functionality so they can use CIF 2 to do their work.
>> Otherwise, they will have to do the sensible thing, and continue to use
>> CIF 1, or, worse, create their own dialect of CIF 2.
>>
>> Now, I broke my nose yesterday morning and find myself a bit punchy today,
>> so I will drop out of this discussion for a while.  Hopefully, when I
>> return to it, this whole matter will be settled in some way that will
>> allow people to actually use CIF 2, instead of it becoming what it seems
>> on its way to becoming -- something elegant but not terribly useful, a bit
>> like PL/I.
>>
>> Cheers,
>>    Herbert
>>
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>> =====================================================
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
