Re: [ddlm-group] Vote on BOM
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Vote on BOM
- From: Brian McMahon <bm@iucr.org>
- Date: Wed, 16 Jun 2010 10:56:20 +0100
- In-Reply-To: <AANLkTikPRP0zLmeWCde-UjR599qJBDP4ps8mpT2FB07E@mail.gmail.com>
- References: <AANLkTikPRP0zLmeWCde-UjR599qJBDP4ps8mpT2FB07E@mail.gmail.com>
My vote, in line with my "keep it simple/blunt" approach:

1(a) 2(a) 3(a)

I understand many of the counter-arguments, and think that most other outcomes are also acceptable if properly documented. In many naive rendering programs (e.g. older versions of "vi"), 2(c)(ii) and perhaps 2(d) could make whitespace appear inside datanames, which would confuse many users, so I would be least happy with these outcomes.

Examples such as the W3C Working Group Note "Unicode in XML and other Markup Languages" (section 3.5 of http://www.w3.org/TR/unicode-xml/ ) show that we are not the only group struggling to formulate this topic cleanly. The solution in that document is suggestive, but not necessarily applicable to CIF, which is not exactly a "markup" language.

Regards
brian

On Wed, Jun 16, 2010 at 11:31:59AM +1000, James Hester wrote:
> For clarity, by 'UTF8 BOM' I mean the byte sequence 0xEF,0xBB,0xBF,
> which corresponds to Unicode code point 0xFEFF. A UCS2 BOM is the
> byte sequence 0xFE, 0xFF or the reverse.
>
> Please indicate your preferred behaviour below. I have inserted mine already:
>
> 1. Treatment of UTF8 BOM as first three bytes of a CIF2 file
>    (a) Syntax error/Non CIF2 file
>    (b) UTF8-BOM followed by #\#CIF2.0 is a valid CIF2 magic number
>        James
> 2. Treatment of UTF8 BOM in a CIF file, other than as the first three bytes:
>    (a) Always a syntax error
>    (b) Syntactic whitespace
>    (c) An ordinary character:
>        (i) May appear only in delimited data values and comments
>            James
>        (ii) May appear anywhere other ordinary characters can
>             appear (i.e. including datanames, datablock names etc.)
>    (d) Silently ignored
>
> 3. Treatment of UCS BOM in a CIF file
>    (a) Syntax error James
>    (b) Encoding switch
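For illustration only, here is a minimal sketch of what the 1(a) 2(a) 3(a) outcome could mean for a reader of raw CIF2 bytes. The function name `bom_errors` and its messages are hypothetical, it checks only BOM handling and the magic number (using the magic number exactly as written in the quoted message, not necessarily the final one), and it is not a CIF2 parser.

```python
UTF8_BOM_CHAR = "\ufeff"                 # U+FEFF, encoded in UTF-8 as EF BB BF
UCS2_BOMS = (b"\xfe\xff", b"\xff\xfe")   # UCS-2 BOM, big- and little-endian
MAGIC = "#\\#CIF2.0"                     # magic number as written in the message


def bom_errors(data: bytes) -> list[str]:
    """Hypothetical checker applying options 1(a), 2(a) and 3(a) to raw bytes."""
    errors: list[str] = []
    # Option 3(a): a UCS-2 BOM is a syntax error, never an encoding switch.
    # (Bytes 0xFE and 0xFF never occur in valid UTF-8, so a UCS-2 BOM
    # elsewhere in the file is also caught by the decode step below.)
    if data[:2] in UCS2_BOMS:
        errors.append("UCS-2 BOM at start of file")
        return errors
    try:
        # Plain "utf-8" keeps a leading U+FEFF; "utf-8-sig" would strip it.
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        errors.append(f"not valid UTF-8 at byte {exc.start}")
        return errors
    # Option 1(a): a UTF-8 BOM in the first three bytes means "not CIF2".
    if text.startswith(UTF8_BOM_CHAR):
        errors.append("UTF-8 BOM in first three bytes: not a CIF2 file")
    elif not text.startswith(MAGIC):
        errors.append("missing CIF2 magic number")
    # Option 2(a): U+FEFF anywhere else is always a syntax error.
    for index, char in enumerate(text):
        if char == UTF8_BOM_CHAR and index > 0:
            errors.append(f"U+FEFF at character offset {index}")
    return errors
```

Under 2(c)(i) the final loop would instead need to know whether each offset falls inside a delimited data value or comment, and under 2(d) it would drop U+FEFF before tokenising, which is where the dataname confusion described above could arise.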