[ddlm-group] UTF-8 BOM
- To: "'ddlm-group@iucr.org'" <ddlm-group@iucr.org>
- Subject: [ddlm-group] UTF-8 BOM
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Mon, 10 May 2010 10:31:19 -0500
- Accept-Language: en-US
I realize that earlier there was an extended discussion on this group about identification and / or declaration of character encodings, including the topic of using a byte-order mark to identify some encodings. Rest assured that I do not wish to reopen that discussion. I do, however, want to raise a related question: whether it is acceptable for a CIF2 processor to accept and ignore a UTF-8 BOM sequence (bytes 0xEF 0xBB 0xBF, the UTF-8 encoding of character U+FEFF) at the beginning of a CIF.

Some text editors that support UTF-8 are known to ensure that UTF-8-encoded files they write start with this sequence. Inasmuch as it seems a goal of this group to continue to support users editing CIFs with general-purpose text editors, it therefore seems wise to me that an initial BOM sequence be considered ignorable metadata in CIF2. The alternative is for it to be an error, with the confusing result that editing some CIF2-compliant CIFs with some programs will corrupt the resulting file, whereas *either* using a different text editor or editing a different CIF (for example, one that contains no non-ASCII characters) works fine.

This suggested behavior would not require a CIF2 lexical scanner to decode the BOM byte sequence to the corresponding character. A scanner operating directly on the raw byte stream can recognize and handle the literal byte sequence almost as easily as one operating on the corresponding decoded character stream could recognize and handle the decoded character.

Best Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
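The byte-level handling proposed in the message can be illustrated with a minimal sketch. It assumes Python, and the names `UTF8_BOM`, `strip_utf8_bom`, and `skip_utf8_bom` are hypothetical helpers introduced here for illustration; they are not part of any CIF2 specification or existing CIF library. The sketch compares the first three raw bytes against 0xEF 0xBB 0xBF without decoding anything, and treats only a single BOM at the very start of the input as ignorable.

```python
import io

# The three raw bytes named in the message: the UTF-8 encoding of U+FEFF.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without one leading UTF-8 BOM, if present.

    Hypothetical helper: only the first occurrence at offset 0 is dropped;
    any later U+FEFF bytes are left for the scanner to judge.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def skip_utf8_bom(stream) -> bool:
    """Advance a seekable binary stream past a leading UTF-8 BOM.

    Returns True if a BOM was consumed. Otherwise the stream position is
    restored, so the scanner sees the input unchanged.
    """
    start = stream.tell()
    if stream.read(len(UTF8_BOM)) == UTF8_BOM:
        return True
    stream.seek(start)
    return False


# Usage: the same content with and without a BOM scans identically.
with_bom = io.BytesIO(UTF8_BOM + b"data_example\n")
without_bom = io.BytesIO(b"data_example\n")
skip_utf8_bom(with_bom)
skip_utf8_bom(without_bom)
same_content = with_bom.read() == without_bom.read()
```

The stream variant assumes a seekable input; a scanner reading from a pipe would need to buffer the first three bytes itself instead of seeking back.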
- Follow-Ups:
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)