
Re: [ddlm-group] Vote on BOM

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Vote on BOM
From: SIMON WESTRIP <[email protected]>
Date: Fri, 18 Jun 2010 10:39:13 -0700 (PDT)
In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local>
References: <[email protected]> <8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local>

It seems that not everyone agrees that the CIF2 encoding is UTF-8. Allowing multiple encodings would influence my vote on the UTF-8 BOM.

Simon

From: "Bollinger, John C" <[email protected]>
To: ddlm-group <[email protected]>
Sent: Friday, 18 June, 2010 16:33:01
Subject: Re: [ddlm-group] Vote on BOM

As far as I am aware, I do not have voting rights here, as I am not formally a member of the DDLm working group. If I did, these would be my votes (and feel free to count them anyway ;-) ):

>1. Treatment of UTF8 BOM as first three bytes of a CIF2 file
>   (a) Syntax error/Non CIF2 file
>   (b) UTF8-BOM followed by #\#CIF2.0 is a valid CIF2 magic number

I favor Herb's position that CIF2 should be defined as a Unicode text format, in which context encoding would be out of scope. Thus an initial BOM should be allowed and handled by the decoder (or simply allowed by a parser that attempts to defer decoding). This assumes that the processor supports UTF-8, which I would be satisfied to make a non-exclusive requirement on CIF2 processors.

(b), more or less.
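
To make option (b) concrete, here is a minimal sketch of a magic-number check that tolerates a leading UTF-8 BOM. The function names and the allow_bom switch are hypothetical and exist only for illustration; nothing here is taken from a CIF2 specification or an existing parser.

```python
import codecs
from typing import Tuple

# Magic number as quoted in the ballot: #\#CIF2.0
CIF2_MAGIC = b"#\\#CIF2.0"


def split_bom(data: bytes) -> Tuple[bool, bytes]:
    """Return (had_bom, rest), removing one leading UTF-8 BOM if present."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


def is_cif2_start(data: bytes, allow_bom: bool = True) -> bool:
    """Check whether a byte stream begins with the CIF2 magic number.

    allow_bom=True models option (b): a BOM before the magic number is fine.
    allow_bom=False models option (a): a leading BOM means "not a CIF2 file".
    """
    had_bom, rest = split_bom(data)
    if had_bom and not allow_bom:
        return False
    return rest.startswith(CIF2_MAGIC)
```

Under option (b) the BOM is consumed before the parser proper sees any input, which is what "handled by the decoder" amounts to in practice.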

>2. Treatment of UTF8 BOM in a CIF file, other than as the first three bytes:
>   (a) Always a syntax error
>   (b) Syntactic whitespace
>   (c) An ordinary character:
>       (i) May appear only in delimited data values and comments
>       (ii) May appear anywhere other ordinary characters can appear (i.e. including datanames, datablock names etc.)
>   (d) Silently ignored

(c)(i)
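
The reason question 2 needs an answer at all is that a typical decoder removes only a leading BOM. The sketch below uses Python's standard "utf-8-sig" codec to show this; the helper names are hypothetical, and what a parser then does with each embedded U+FEFF is exactly the policy being voted on.

```python
BOM_CHAR = "\ufeff"  # U+FEFF, the code point that encodes to EF BB BF in UTF-8


def decode_cif_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a BOM only if it is the very first thing."""
    return data.decode("utf-8-sig")


def embedded_bom_offsets(text: str) -> list:
    """Return the character offsets of every U+FEFF left in the decoded text."""
    return [i for i, ch in enumerate(text) if ch == BOM_CHAR]


# A leading BOM, then a second BOM inside a quoted data value.
sample = b"\xef\xbb\xbf#\\#CIF2.0\n_tag 'a\xef\xbb\xbfb'\n"
text = decode_cif_text(sample)
offsets = embedded_bom_offsets(text)  # the embedded BOM survives decoding
```

Under (c)(i) a parser would accept an offset that falls inside a delimited value or a comment and reject one that falls in a data name or block code.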

>3. Treatment of UCS BOM in a CIF file
>   (a) Syntax error
>   (b) Encoding switch

Inasmuch as I favor defining CIF as a text format, these alternatives do not make sense, as they relate to encoding details. I am against CIF requiring processors to support encoding schemes that provide for embedded encoding switches, but I am perfectly satisfied for CIF to *allow* processors to support such schemes. That amounts to

(c) Encoding scheme dependent
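
As an illustration of what "encoding scheme dependent" could look like for a processor that chooses to support more than UTF-8, here is a sketch that selects a decoder from the leading BOM and falls back to UTF-8. The table and function names are hypothetical, and the set of encodings is an assumption made for the example, not a proposal for what CIF2 should require.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the UTF-32 entries must be tested first.
_BOM_TABLE = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Pick a codec name from a leading BOM, or return the default."""
    for bom, name in _BOM_TABLE:
        if data.startswith(bom):
            return name
    return default


def decode_any(data: bytes) -> str:
    """Decode with the sniffed codec; each codec consumes its own BOM."""
    return data.decode(sniff_encoding(data))
```

The parser downstream of decode_any sees only Unicode text, so the CIF grammar itself never has to mention byte order marks at the start of the file.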

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group

