Re: [ddlm-group] Handling single string values longer than maximum line length

I don't find the necessity for line folding a convincing argument, but so
long as I don't have to worry about it when parsing a file, I am not fussed.

Line folding has to exist under an 80-byte restriction, because that
restriction is ludicrous. STAR has no restriction; CIFx has 2048 bytes
(still silly, but imposed by outside factors). One may have data values
longer than 2048 bytes (I have yet to see any), and sequencing data will
perhaps fall into this category. But if John (seemingly) doesn't understand
the line folding issues, I am guessing the PDB doesn't employ it. If the
custodians of macromolecular data and (presumably) sequencing data have a
solution that does not require the convoluted line folding operations
specified on the IUCr website, then who does need them?

I see Joe has already made the mistake of thinking that
Xxxx\
;

means the trailing ; is not a token delimiter. Well, every other line-folding
convention would conclude that, but the IUCr interpretation is that the
trailing ; DOES terminate the string, and that the last \ is actually
stripping off the final \n (which isn't there anyway, because it was already
stripped off during lexing: the string terminators are supposed to be
removed).

Or I have completely misunderstood the line folding protocol, or the example
on the IUCr webpage is wrong. I am not sure which.

Either way, do we all agree that line folding is not a lexer issue?
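
To make that concrete, here is a minimal sketch (in Python) of unfolding done
as a separate pass over a value the lexer has already returned. It assumes my
reading of the IUCr interpretation above: the lexer has removed the opening ;
and the newline plus closing ;, folding is switched on only when the first
line of the value is a lone backslash, and a backslash at the end of a line
(optionally followed by blanks) joins that line to the next. The name unfold
and the regular expression are illustrative only and are not taken from any
CIF library or from the specification text.

```python
import re

# Assumed fold marker: a backslash followed only by optional blanks
# at the end of a line.
_FOLD = re.compile(r"\\[ \t]*$")


def unfold(field: str) -> str:
    """Unfold a semicolon text field value after lexing.

    `field` is assumed to be everything after the opening ';' up to,
    but not including, the newline that precedes the closing ';'.
    """
    lines = field.split("\n")
    # Folding is only active when the first line is a lone backslash.
    if not _FOLD.fullmatch(lines[0]):
        return field

    out = []
    joined = ""
    pending = False  # True while a folded line awaits its continuation
    for line in lines[1:]:
        m = _FOLD.search(line)
        if m:
            # Drop the backslash (and trailing blanks) and keep joining.
            joined += line[:m.start()]
            pending = True
        else:
            out.append(joined + line)
            joined = ""
            pending = False
    if pending:
        # A trailing backslash on the last line: nothing left to join,
        # so only the backslash itself is dropped.
        out.append(joined)
    return "\n".join(out)
```

Under these assumptions the lexer never looks at the backslashes: it returns
the raw value, and an application that wants the unfolded string calls
something like unfold() on it afterwards. For the example above, the value the
lexer hands over is a backslash line followed by Xxxx\ and the closing ; has
already terminated the string before unfolding starts.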

On 25/11/09 11:18 PM, "Brian McMahon" <bm@iucr.org> wrote:

>> (I've switched the thread title to deal separately with line folding.)
> 
> Well, I didn't because I was distracted when about to hit the
> 'Send' button!  So this is just a repeat of the previous posting but
> under a new thread in case we wish to take up this general discussion
> later.
> 
> Regards
> Brian
> 
> As Herbert says, line folding is part of the CIF 1.1 spec (pages 34-35
> of the ITG bible). Currently, it invokes a special meaning for the
> backslash (reverse solidus) character, but only when it is the first
> non-blank after an opening semicolon or comment hash delimiter. We have
> yet to discuss whether to extend it to other string types (specifically
> the triple-quoted strings).
> 
> It's quite easy these days to generate single strings that are longer
> than 2048 characters (or any other arbitrary line limit) - e.g. a
> protein or nucleic acid sequence. Many, many chemical names broke the old
> 80-character line length limit.
> 
> We're very happy with CIF applications that do not interpret the
> line-folding protocol, so long as they preserve the existing backslashes.
> However, a fully-compliant CIF 1.1 parser should be able to return an
> unfolded string to an application that requests it.
> 
> As Herbert says, if this were dropped as part of the CIF2 specification,
> we would need to think carefully about how else to retain this
> functionality.
> 
> Regards
> Brian
> 
> On Wed, Nov 25, 2009 at 07:54:51AM -0500, Herbert J. Bernstein wrote:
>> The line folding protocol was discussed and adopted by COMCIFS and is
>> posted, along with other "Common Semantic Features" at
>> 
>> http://www.iucr.org/resources/cif/spec/version1.1/semantics
>> 
>> but that is neither here nor there.  The point is that the IUCr uses CIF
>> to get work done.  If we disable something they are using, we should offer
>> some equivalent functionality so they can use CIF 2 to do their work.
>> Otherwise, they will have to do the sensible thing, and continue to use
>> CIF 1, or, worse, create their own dialect of CIF 2.
>> 
>> Now, I broke my nose yesterday morning and find myself a bit punchy today,
>> so I will drop out of this discussion for a while.  Hopefully, when I
>> return to it, this whole matter will be settled in some way that will
>> allow people to actually use CIF 2, instead of it becoming what it seems
>> on its way to becoming -- something elegant but not terribly useful, a bit
>> like PL/I.
>> 
>> Cheers,
>>    Herbert
>> 
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>> 
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>> =====================================================
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au