Re: [ddlm-group] Vote on BOM
- To: Group finalising DDLm and associated dictionaries <[email protected]>
- Subject: Re: [ddlm-group] Vote on BOM
- From: "Herbert J. Bernstein" <[email protected]>
- Date: Wed, 16 Jun 2010 07:19:13 -0400 (EDT)
- In-Reply-To: <[email protected]>
- References: <[email protected]><[email protected]>
Dear Colleagues,
I vote for none of the options in the false trichotomy presented. I vote
for a CIF2 file to be a text file containing its information as a sequence
of valid printable Unicode code points, however encoded, and for a BOM to be
treated as part of the encoding/decoding process, not as part of the
information that has been encoded.
This is similar to the original handling of nulls before C and
stdio left us all unclear about the distinction between
text and binary, but even in the world of UTF-8 streams, a null cannot
be part of the text of a text file because it is the C-string terminator.
I propose to treat the BOM with the same sort of caution.
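To illustrate what treating the BOM as part of decoding could look like, here is a minimal Python sketch. It is not taken from any CIF2 software; the function name decode_cif_text is hypothetical, and the choice to handle only UTF-8 and UTF-16 input is an assumption made for brevity.

```python
import codecs


def decode_cif_text(raw: bytes) -> str:
    """Decode raw bytes to text, consuming a leading BOM during decoding.

    Hypothetical helper: only UTF-8 and UTF-16 input is handled
    (an assumption for this sketch, not a CIF2 requirement).
    """
    # A leading FE FF or FF FE selects the UTF-16 byte order; the generic
    # "utf-16" codec reads that BOM and does not return it as text.
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return raw.decode("utf-16")
    # "utf-8-sig" drops a leading EF BB BF if present and otherwise
    # behaves like plain UTF-8.
    return raw.decode("utf-8-sig")


# With and without a UTF-8 BOM, the decoded text is the same.
with_bom = codecs.BOM_UTF8 + b"#\\#CIF2.0\n"
without_bom = b"#\\#CIF2.0\n"
print(decode_cif_text(with_bom) == decode_cif_text(without_bom))
```

Under this approach a parser working on the decoded text never sees a leading BOM. Note that the codecs used here strip only a leading BOM; a U+FEFF occurring later in the stream is still returned as a character, so its treatment remains a separate decision.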
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
[email protected]
=====================================================
On Wed, 16 Jun 2010, Brian McMahon wrote:
> My vote, in line with my "keep it simple/blunt" approach:
>
> 1(a)
> 2(a)
> 3(a)
>
> I understand many of the counter-arguments, and think that most other
> outcomes are also acceptable if properly documented. 2(c)(ii) and perhaps
> 2(d) might give rise in many naive rendering programs (e.g. older versions
> of "vi") to the appearance of whitespace in datanames, which would confuse
> many users, so I would be least happy with these outcomes.
>
> One can see from examples such as the W3C Working Group Note of
> Unicode in XML and other Markup Languages (section 3.5 of
> http://www.w3.org/TR/unicode-xml/ ) that we are not the only group
> struggling to express a clean formulation of this topic. The solution
> in that document is suggestive, but not necessarily applicable to CIF,
> which is not exactly a "markup" language.
>
> Regards
> brian
>
> On Wed, Jun 16, 2010 at 11:31:59AM +1000, James Hester wrote:
>> For clarity, by 'UTF8 BOM' I mean the byte sequence 0xEF,0xBB,0xBF,
>> which corresponds to Unicode code point 0xFEFF. A UCS2 BOM is the
>> byte sequence 0xFE, 0xFF or the reverse.
>>
>> Please indicate your preferred behaviour below. I have inserted mine already:
>>
>> 1. Treatment of UTF8 BOM as first three bytes of a CIF2 file
>> (a) Syntax error/Non CIF2 file
>> (b) UTF8-BOM followed by #\#CIF2.0 is a valid CIF2 magic number James
>> 2. Treatment of UTF8 BOM in a CIF file, other than as the first three bytes:
>> (a) Always a syntax error
>> (b) Syntactic whitespace
>> (c) An ordinary character:
>> (i) May appear only in delimited data values and comments James
>> (ii) May appear anywhere other ordinary characters can
>> appear (i.e. including datanames, datablock names etc.)
>> (d) Silently ignored
>>
>> 3. Treatment of UCS BOM in a CIF file
>> (a) Syntax error James
>> (b) Encoding switch
> _______________________________________________
> ddlm-group mailing list
> [email protected]
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>

