Re: [ddlm-group] Vote on BOM
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Vote on BOM
- From: SIMON WESTRIP <simonwestrip@btinternet.com>
- Date: Fri, 18 Jun 2010 10:39:13 -0700 (PDT)
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local>
- References: <AANLkTikPRP0zLmeWCde-UjR599qJBDP4ps8mpT2FB07E@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local>
It seems that not everyone agrees that the CIF2 encoding is UTF-8. Allowing multiple encodings would influence my vote on the UTF-8 BOM.
Simon
From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
To: ddlm-group <ddlm-group@iucr.org>
Sent: Friday, 18 June, 2010 16:33:01
Subject: Re: [ddlm-group] Vote on BOM
As far as I am aware, I do not have voting rights here, as I am not formally a member of the DDLm working group. If I did, these would be my votes (and feel free to count them anyway ;-) ):
>1. Treatment of UTF8 BOM as first three bytes of a CIF2 file
> (a) Syntax error/Non CIF2 file
> (b) UTF8-BOM followed by #\#CIF2.0 is a valid CIF2 magic number
I favor Herb's position that CIF2 should be defined as a Unicode text format, in which context encoding would be out of scope. Thus an initial BOM should be allowed and handled by the decoder (or simply allowed by a parser that attempts to defer decoding). This assumes that the processor supports UTF-8, which I would be satisfied to make a non-exclusive requirement on CIF2 processors.
(b), more or less.
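For the case of a parser that defers decoding, option 1(b) amounts to a byte-level check. The sketch below is illustrative only: it assumes the magic string is exactly `#\#CIF2.0` as written in the vote, permits at most one leading UTF-8 BOM (bytes EF BB BF), and uses a hypothetical function name.

```python
UTF8_BOM = b"\xef\xbb\xbf"
# Magic string as written in the vote; assumed here, not normative.
CIF2_MAGIC = b"#\\#CIF2.0"


def starts_with_cif2_magic(data: bytes) -> bool:
    """Hypothetical byte-level check for option 1(b).

    Skips at most one leading UTF-8 BOM, then requires the CIF2 magic
    number. The rest of the input is not decoded.
    """
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.startswith(CIF2_MAGIC)
```

A second BOM after the first is not skipped, so input beginning with two BOMs is rejected.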
>2. Treatment of UTF8 BOM in a CIF file, other than as the first three bytes:
> (a) Always a syntax error
> (b) Syntactic whitespace
> (c) An ordinary character:
> (i) May appear only in delimited data values and comments
> (ii) May appear anywhere other ordinary characters can appear (i.e. including datanames, datablock names etc.)
> (d) Silently ignored
(c)(i)
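Option (c)(i) could be enforced after tokenizing. In this sketch the tokenizer, its `(kind, text)` output, and the kind names `"delimited_value"` and `"comment"` are all hypothetical. The code flags any token that contains U+FEFF and is neither a delimited data value nor a comment.

```python
BOM_CHAR = "\ufeff"
# Token kinds in which U+FEFF is treated as an ordinary character under (c)(i).
# These kind names are assumptions, not taken from any CIF2 grammar.
ALLOWED_KINDS = {"delimited_value", "comment"}


def misplaced_bom_indices(tokens):
    """Return indices of tokens containing U+FEFF outside allowed contexts.

    `tokens` is an iterable of (kind, text) pairs from a hypothetical
    CIF2 tokenizer.
    """
    return [
        i
        for i, (kind, text) in enumerate(tokens)
        if BOM_CHAR in text and kind not in ALLOWED_KINDS
    ]
```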
>3. Treatment of UCS BOM in a CIF file
> (a) Syntax error
> (b) Encoding switch
Inasmuch as I favor defining CIF as a text format, these alternatives do not make sense, as they relate to encoding details. I am against CIF requiring processors to support encoding schemes that provide for embedded encoding switches, but I am perfectly satisfied for CIF to *allow* processors to support such schemes. That amounts to
(c) Encoding scheme dependent
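Option (c) can be illustrated with Python's standard codecs. How a BOM is treated is decided by the encoding scheme chosen for decoding, not by CIF. The wrapper function and its default are assumptions made for illustration. The codec behaviour noted in the docstring is that of Python's built-in `utf-8-sig`, `utf-16` and `utf-8` codecs.

```python
def decode_cif_text(data: bytes, scheme: str = "utf-8-sig") -> str:
    """Hypothetical helper: decode CIF bytes, leaving BOM handling to the codec.

    - "utf-8-sig": one leading EF BB BF is dropped.
    - "utf-16":    a leading FF FE or FE FF selects the byte order and is dropped.
    - "utf-8":     a leading EF BB BF is kept as the character U+FEFF.
    """
    return data.decode(scheme)
```

With this approach the parser operates only on decoded text. Whether an embedded BOM can ever act as an encoding switch then depends entirely on the decoder being used.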
John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- Follow-Ups:
- [ddlm-group] Character set for data block and save frame codes (Bollinger, John C)
- References:
- [ddlm-group] Vote on BOM (James Hester)
- Re: [ddlm-group] Vote on BOM (Bollinger, John C)