Re: [ddlm-group] Vote on BOM
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Vote on BOM
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Wed, 16 Jun 2010 07:19:13 -0400 (EDT)
- In-Reply-To: <20100616095620.GB14815@emerald.iucr.org>
- References: <AANLkTikPRP0zLmeWCde-UjR599qJBDP4ps8mpT2FB07E@mail.gmail.com> <20100616095620.GB14815@emerald.iucr.org>
Dear Colleagues,

I vote for none of the false trichotomy presented. I vote for a CIF2 to
be a text file containing its information as a sequence of valid
printable Unicode code points, however encoded, and that a BOM be treated
as part of the encoding/decoding process, not as part of the information
that has been encoded.

This is similar to the original handling of nulls before C and the stdio
got us all to become unclear about the distinction between text and
binary, but even in the world of UTF-8 streams, a null cannot be part of
the text of a text file because it is the C-string terminator. I propose
to treat the BOM with the same sort of caution.

Regards,
Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Wed, 16 Jun 2010, Brian McMahon wrote:

> My vote, in line with my "keep it simple/blunt" approach:
>
> 1(a)
> 2(a)
> 3(a)
>
> I understand many of the counter-arguments, and think that most other
> outcomes are also acceptable if properly documented. 2(c)(ii) and perhaps
> 2(d) might give rise in many naive rendering programs (e.g. older versions
> of "vi") to the appearance of whitespace in datanames, which would confuse
> many users, so I would be least happy with these outcomes.
>
> One can see from examples such as the W3C Working Group Note of
> Unicode in XML and other Markup Languages (section 3.5 of
> http://www.w3.org/TR/unicode-xml/ ) that we are not the only group
> struggling to express a clean formulation of this topic. The solution
> in that document is suggestive, but not necessarily applicable to CIF,
> which is not exactly a "markup" language.
>
> Regards
> brian
>
> On Wed, Jun 16, 2010 at 11:31:59AM +1000, James Hester wrote:
>> For clarity, by 'UTF8 BOM' I mean the byte sequence 0xEF,0xBB,0xBF,
>> which corresponds to Unicode code point 0xFEFF. A UCS2 BOM is the
>> byte sequence 0xFE, 0xFF or the reverse.
>>
>> Please indicate your preferred behaviour below. I have inserted mine already:
>>
>> 1. Treatment of UTF8 BOM as first three bytes of a CIF2 file
>>    (a) Syntax error/Non CIF2 file
>>    (b) UTF8-BOM followed by #\#CIF2.0 is a valid CIF2 magic number
>>        James
>> 2. Treatment of UTF8 BOM in a CIF file, other than as the first three bytes:
>>    (a) Always a syntax error
>>    (b) Syntactic whitespace
>>    (c) An ordinary character:
>>        (i) May appear only in delimited data values and comments
>>            James
>>        (ii) May appear anywhere other ordinary characters can
>>             appear (i.e. including datanames, datablock names etc.)
>>    (d) Silently ignored
>>
>> 3. Treatment of UCS BOM in a CIF file
>>    (a) Syntax error   James
>>    (b) Encoding switch
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
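To make the proposal at the top of this message concrete (a BOM belongs to the encoding/decoding step, not to the information that has been encoded), here is a minimal Python sketch. The function name decode_cif2 is hypothetical and does not come from any CIF library. The sketch assumes only UTF-8 and UTF-16 inputs, does not attempt UTF-32, and deliberately takes no position on U+FEFF appearing later in the file, which is the subject of question 2 in the vote.

```python
import codecs


def decode_cif2(raw: bytes) -> str:
    """Decode raw bytes to a string of code points (hypothetical helper).

    A leading BOM is consumed by the decoder as encoding metadata, so it
    never reaches the text handed to a CIF2 parser.
    """
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return raw.decode("utf-16")
    # "utf-8-sig" drops a leading EF BB BF if present and is otherwise
    # identical to plain "utf-8".
    return raw.decode("utf-8-sig")


# The same magic number, with and without a BOM, in two encodings.
magic = "#\\#CIF2.0\n"
with_utf8_bom = codecs.BOM_UTF8 + magic.encode("utf-8")
without_bom = magic.encode("utf-8")
with_utf16_bom = codecs.BOM_UTF16_LE + magic.encode("utf-16-le")
```

Under this reading, all three byte strings decode to the same sequence of code points, so a parser working on the decoded text never sees a BOM in front of the magic number.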
- Follow-Ups:
- Re: [ddlm-group] Vote on BOM (James Hester)
- References:
- [ddlm-group] Vote on BOM (James Hester)
- Re: [ddlm-group] Vote on BOM (Brian McMahon)