Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .. .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .. .
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Wed, 23 Jun 2010 13:01:50 -0500
- Accept-Language: en-US
- In-Reply-To: <alpine.BSF.2.00.1006231033360.56372@epsilon.pair.com>
- References: <AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><alpine.BSF.2.00.1006212018430.91069@epsilon.pair.com><AANLkTilolZk4SzLF8mzqOz4EagFJcEHDKOAblGMnoqpW@mail.gmail.com><alpine.BSF.2.00.1006212120510.91069@epsilon.pair.com><AANLkTiklvzlKquqlRQIrpPGZjJfuRzLqiv2E6Stcq6wd@mail.gmail.com><alpine.BSF.2.00.1006212241210.4105@epsilon.pair.com><AANLkTilACXxnPRtJXEjGD39eleDl9dxlAcwar8j9MBPr@mail.gmail.com><alpine.BSF.2.00.1006220753471.87930@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166122951E@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikih0j6-vyLDPMOqcTkoiK545yE28y4fU9JTUa2@mail.gmail.com><20100623103310.GD15883@emerald.iucr.org><8F77913624F7524AACD2A92EAF3BFA541661229521@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1006231033360.56372@epsilon.pair.com>
On Wednesday, June 23, 2010 9:47 AM, Herbert J. Bernstein wrote:

>If we impose a non-text canonical UTF-8 encoding that does not contain an
>internal encoding signature, and that file is transmitted as text and
>not binary from a machine for which, say, ASCII with code pages for, say,
>western europe, is the native encoding, and the transmission converts
>the UTF-8 characters as if they were accented characters in Latin-1,
>then what is received may appear plausible at the receiving end, just
>wrong.

Surely that is a general issue with exchanging encoded text. It is not caused by designating a canonical encoding, and it would not be solved either by declining to designate a canonical encoding or by mandating UTF-8 as the only allowed encoding.

>Therefore, I would suggest that we be very careful to make such a
>canonical UTF-8 cif self identifying, by including not only a BOM,
>but by adding some text in the range of #x128-#x254 to the magic
>number to help in detecting such unintended transmission conversions.

It would definitely ease encoding detection / correction if the magic number contained non-ASCII characters. Doing so, however, either will require CIF2 to be a hybrid binary/text format, or will effectively restrict CIF to be used only with encodings that support the chosen characters. (Or am I missing something?) I disfavor the former, and I think the latter is a serious restriction indeed.

>In addition, I would suggest that, just as the first line of an XML
>document specifies its encoding in plain text, that we add the same
>information to our magic number.

I have been giving some consideration to exactly that possibility. It works for all encodings that are supersets of ASCII. Other encodings would need to be detected some other way (e.g. byte-order mark, analysis of the encoded magic number), but they are not at such risk of encoding confusion.

The signature of a CIF2 might then be something like these:

    #\#CIF_2.0
    #\#CIF_2.0:UTF-8
    #\#CIF_2.0:KOI8-R
    #\#CIF_2.0:ISO-8859-1

where the first two mean the same thing. If we do choose to not require UTF-8 then I favor this approach.

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
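
For illustration only, here is a minimal Python sketch of how a reader might act on a signature of the kind proposed above. It is not part of any CIF2 specification. The names `SIGNATURE` and `declared_encoding` are hypothetical, the exact grammar of the encoding suffix is an assumption drawn from the four examples, and the approach only applies to encodings that are supersets of ASCII, as the message notes.

```python
import codecs
import re

# Assumed signature grammar, inferred from the examples in the message:
# "#\#CIF_2.0" optionally followed by ":<encoding-name>".
SIGNATURE = re.compile(rb"^#\\#CIF_2\.0(?::([A-Za-z0-9._-]+))?[ \t]*$")


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named on the first line of `data`.

    A bare signature is treated as UTF-8 (the assumed default).
    Raises ValueError if the first line is not a recognised signature
    or names an encoding that Python does not know.
    """
    # Skip a UTF-8 byte-order mark if one is present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # The first line ends at the first CR or LF byte.
    first_line = re.split(rb"[\r\n]", data, maxsplit=1)[0]
    match = SIGNATURE.match(first_line)
    if match is None:
        raise ValueError("no CIF 2.0 signature found")
    name = match.group(1)
    if name is None:
        return default
    try:
        # Normalise the declared name to Python's canonical codec name.
        return codecs.lookup(name.decode("ascii")).name
    except LookupError as exc:
        raise ValueError(f"unknown encoding: {name!r}") from exc


# Usage: the signature line is pure ASCII, so it can be read before
# the encoding of the rest of the file is known.
sample = "#\\#CIF_2.0:KOI8-R\n_name 'Иванов'\n".encode("koi8-r")
print(sample.decode(declared_encoding(sample)))
```

Because the signature line itself contains only ASCII bytes, it reads the same in every ASCII-superset encoding, which is why it can be parsed before the rest of the file is decoded. Encodings that are not supersets of ASCII would need separate detection and are not handled here.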