
Re: [ddlm-group] Vote on BOM

As far as I am aware, I do not have voting rights here, as I am not formally a member of the DDLm working group.  If I did, these would be my votes (and feel free to count them anyway ;-) ):

>1. Treatment of UTF8 BOM as first three bytes of a CIF2 file
>    (a) Syntax error/Non CIF2 file
>    (b) UTF8-BOM followed by #\#CIF2.0 is a valid CIF2 magic number

I favor Herb's position that CIF2 should be defined as a Unicode text format, in which context encoding would be out of scope.  Thus an initial BOM should be allowed and handled by the decoder (or simply allowed by a parser that attempts to defer decoding).  This assumes that the processor supports UTF-8, which I would be satisfied to make a non-exclusive requirement on CIF2 processors.

(b), more or less.
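
As a rough illustration of a decoder absorbing an initial BOM before the parser ever sees it, here is a minimal Python sketch. It assumes the processor decodes the input as UTF-8 and takes the magic string exactly as written in the ballot above; the names CIF2_MAGIC, decode_cif_text and has_cif2_magic are hypothetical and not part of any CIF specification or library.

```python
# Sketch only: the magic string is copied from the ballot text, and the
# function names are illustrative, not from any CIF library.
CIF2_MAGIC = "#\\#CIF2.0"


def decode_cif_text(raw: bytes) -> str:
    """Decode bytes as UTF-8, dropping a single leading BOM if one is present.

    Python's "utf-8-sig" codec removes only an initial BOM; a U+FEFF that
    appears later in the stream is left in the decoded text.
    """
    return raw.decode("utf-8-sig")


def has_cif2_magic(raw: bytes) -> bool:
    """Check for the magic string after decoding, so the BOM is not the parser's concern."""
    return decode_cif_text(raw).startswith(CIF2_MAGIC)
```

With this arrangement the parser works on Unicode text only, so the same magic-number check applies whether or not the file began with a BOM.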

>2. Treatment of UTF8 BOM in a CIF file, other than as the first three bytes:
>    (a) Always a syntax error
>    (b) Syntactic whitespace
>    (c) An ordinary character:
>          (i) May appear only in delimited data values and comments
>          (ii) May appear anywhere other ordinary characters can appear (i.e. including datanames, datablock names etc.)
>    (d) Silently ignored


>3. Treatment of UCS BOM in a CIF file
>   (a) Syntax error
>   (b) Encoding switch

Inasmuch as I favor defining CIF as a text format, these alternatives do not make sense, as they relate to encoding details.  I am against CIF requiring processors to support encoding schemes that provide for embedded encoding switches, but I am perfectly satisfied for CIF to *allow* processors to support such schemes.  That amounts to

(c) Encoding scheme dependent

John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer:  www.stjude.org/emaildisclaimer

