On Monday, June 21, 2010 1:13 AM, James Hester wrote:

>I prefer the XML treatment of newline (i.e. translated to 0x000A for
>processing purposes).  I would be in favour of restricting newline to
><0x000A>, <0x000D> or <0x000D 0x000A>, which means that only these
>combinations have the syntactic significance of a newline.

I would be satisfied with that approach.
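
For concreteness, here is a minimal Python sketch of the XML-style translation as I understand the proposal. The function name normalize_newlines is my own, and it assumes the input has already been decoded to a string of Unicode characters:

```python
import re


def normalize_newlines(text: str) -> str:
    """Translate <0x000D 0x000A> and lone <0x000D> to <0x000A>.

    Hypothetical helper for illustration only. The pattern matches a
    carriage return optionally followed by a line feed, so a CR LF pair
    becomes one newline rather than two.
    """
    return re.sub(r"\r\n?", "\n", text)
```

After this step, the parser would only ever need to treat <0x000A> as end-of-line.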

>From memory, this significance is restricted to:
>1. end of comment
>2. whitespace
>3. use in <eol><semicolon> digraph

The significance also extends to 'single'- and "double"-quote delimited data values, in that these cannot contain end-of-line.

>I would also restrict the appearance of the remaining Unicode newline
>characters to delimited datavalues, to maintain consistent display of
>data files.

I'm seeing more and more upside to restricting *all* non-ASCII characters to delimited data values.  I don't have any objection to restricting U+0085, U+2028, and U+2029 (did I miss any?) to such contexts.
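
As a rough sketch of how a checker might locate those three characters, assuming decoded text: the names below are hypothetical, and the code does not attempt to decide whether a hit falls inside a delimited data value, which would need a real parser.

```python
# The three non-ASCII newline characters named above (illustrative table).
OTHER_UNICODE_NEWLINES = {
    "\u0085": "U+0085",
    "\u2028": "U+2028",
    "\u2029": "U+2029",
}


def find_other_newlines(text: str) -> list[tuple[int, str]]:
    """Return (index, code point label) for each such character in text."""
    return [
        (i, OTHER_UNICODE_NEWLINES[ch])
        for i, ch in enumerate(text)
        if ch in OTHER_UNICODE_NEWLINES
    ]
```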

John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

