Discussion List Archives


[ddlm-group] UTF-8 BOM

I realize that earlier there was an extended discussion on this group
about identification and/or declaration of character encodings,
including the topic of using a byte-order mark to identify some
encodings.  Rest assured that I do not wish to reopen that discussion.
I do, however, want to raise a related question: whether it is
acceptable for a CIF2 processor to accept and ignore a UTF-8 BOM
sequence (bytes 0xEF 0xBB 0xBF, the UTF-8 encoding of character
U+FEFF) at the beginning of a CIF.

Some text editors that support UTF-8 are known to ensure that
UTF-8-encoded files they write start with this sequence.  Inasmuch as
it seems a goal of this group to continue to support users editing
CIFs with general-purpose text editors, it therefore seems wise to me
that an initial BOM sequence be considered ignorable metadata in CIF2.
The alternative is to treat it as an error, with the confusing result
that editing some CIF2-compliant CIFs with some programs will corrupt
the resulting file, whereas *either* using a different text editor or
editing a different CIF (for example, one that contains no non-ASCII
characters) works fine.

This suggested behavior would not require a CIF2 lexical scanner to
decode the BOM byte sequence to the corresponding character.  A
scanner operating directly on the raw byte stream can recognize and
handle the literal byte sequence almost as easily as one operating on
the corresponding decoded character stream could recognize and handle
the decoded character.
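
As a rough illustration of the point above, here is a minimal Python sketch of both approaches: skipping the literal three-byte sequence on a raw byte stream, and skipping the decoded U+FEFF character on a character stream. The function names and the sample CIF content are hypothetical, chosen only for this example; this is not taken from any existing CIF2 processor.

```python
# Hypothetical sketch: treat a leading UTF-8 BOM as ignorable metadata.
# The names below are illustrative, not part of any CIF2 library.

UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def skip_utf8_bom(data: bytes) -> bytes:
    """Byte-level handling: drop one leading BOM sequence, if present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def skip_decoded_bom(text: str) -> str:
    """Character-level handling: drop one leading U+FEFF, if present."""
    if text.startswith("\ufeff"):
        return text[1:]
    return text


if __name__ == "__main__":
    # Illustrative sample input only.
    raw = UTF8_BOM + b"#\\#CIF_2.0\ndata_example\n"
    print(skip_utf8_bom(raw))
    print(repr(skip_decoded_bom(raw.decode("utf-8"))))
```

Either function touches only the very start of the input, so a U+FEFF appearing anywhere else in the file is left for the scanner to handle under whatever rule applies there.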

Best Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital



