Discussion List Archives


Re: [ddlm-group] Handling single string values longer than maximum line length

Nick Spadaccini wrote:
> I don't find the necessity for line folding a convincing argument, but so
> long as I don't have to worry about it when parsing a file, I am not fussed.
> Line-folding has to exist for an 80 byte restriction, because the
> restriction is ludicrous. STAR has no restriction, CIFx has 2048 bytes
> (still silly but imposed by outside factors). One may have data values
> longer than 2048 (I have yet to see any), and sequencing data perhaps will
> fall in to this category. But if John doesn't (seemingly) understand the
> line folding issues, I am guessing the PDB doesn't employ it. If the
> custodians of macromolecular data and (presumably) sequencing data have a
> solution that does not require the convoluted line folding operations
> specified on the IUCr website, then who does?
I agree that folding just to avoid long lines is not that important. It 
is mostly a line-oriented I/O work-around, which some current Fortran 
software still needs for the near future. However, some people might 
want folded lines just to make it easier to view CIF files in an editor. 
I am interested mainly because folding can be used to elide triple-quotes.
> I see Joe has already made the mistake of thinking that
> Xxxx\
> ;
> Means the trailing ; is not a token delimiter. Well every other line-folding
> convention would conclude that, but the IUCr interpretation is that the
> trailing ; DOES terminate the string, and that last \ is actually stripping
> off the final \n (which isn't there anyway because that got stripped off as
> part of the lexing process -  the string terminators are supposed to be
> removed).
> OR I have completely misunderstood the line folding protocol and the example
> on the IUCr webpage is wrong? I am not sure which.
I think you are right. I was confused by Herb's example


which is the same as ";" in CIF 1.1. The middle semicolon is not a 
terminator due to the subsequent '\', because close-quotes are valid 
only if followed by whitespace. I didn't know that applied to semicolon 
delimited strings.

The current rule above does not make a lot of sense. How can \ strip off 
the \n when it is really part of the "\n;" close-quote characters? Maybe 
it was done to simplify a software issue?

> Either way do we all agree that the line folding is not a lexer issue?
I agree, but an implementation should be able to unfold/fold lines at 
the low-level I/O. The important point is to make sure the syntax is 
defined such that the lexer does not need to know about folding.
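
To make the separation concrete, here is a minimal sketch of unfolding done 
as a post-processing step on a text field the lexer has already delimited. 
It assumes my reading of the CIF 1.1 folding protocol: folding is switched 
on when the first line after the opening semicolon holds only a backslash, 
and a backslash that is the last non-blank character of a line joins that 
line to the next. The function name unfold_text_field is hypothetical, and 
the input is assumed to be the field content with the opening ";" and the 
closing "\n;" already removed.

```python
def unfold_text_field(content: str) -> str:
    """Unfold the content of a semicolon-delimited text field.

    `content` is assumed to be what the lexer returns: everything between
    the opening ';' and the closing '\\n;', delimiters excluded.
    """
    lines = content.split("\n")

    # Assumption: folding is active only when the first line is a lone
    # backslash, optionally followed by trailing blanks.
    if lines[0].rstrip(" \t") != "\\":
        return content

    unfolded = []
    buffer = ""        # text carried over from folded lines
    open_fold = False  # True while the previous line ended with a fold mark

    for line in lines[1:]:
        stripped = line.rstrip(" \t")
        if stripped.endswith("\\"):
            # Drop the backslash and any blanks after it, then keep joining.
            buffer += stripped[:-1]
            open_fold = True
        else:
            unfolded.append(buffer + line)
            buffer = ""
            open_fold = False

    # A fold mark on the last line has no following line to join with;
    # the text gathered so far is simply kept.
    if open_fold:
        unfolded.append(buffer)

    return "\n".join(unfolded)
```

With this arrangement Nick's case, a field whose last line is "Xxxx\" 
followed by the closing ";", comes out as plain "Xxxx": the lexer has 
already ended the string at the semicolon, and the backslash has nothing 
left to join.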


ddlm-group mailing list

International Union of Crystallography