Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .... .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .... .
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Wed, 23 Jun 2010 14:31:55 -0500
- Accept-Language: en-US
- In-Reply-To: <alpine.BSF.2.00.1006231406010.30894@epsilon.pair.com>
- References: <AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><alpine.BSF.2.00.1006212018430.91069@epsilon.pair.com><AANLkTilolZk4SzLF8mzqOz4EagFJcEHDKOAblGMnoqpW@mail.gmail.com><alpine.BSF.2.00.1006212120510.91069@epsilon.pair.com><AANLkTiklvzlKquqlRQIrpPGZjJfuRzLqiv2E6Stcq6wd@mail.gmail.com><alpine.BSF.2.00.1006212241210.4105@epsilon.pair.com><AANLkTilACXxnPRtJXEjGD39eleDl9dxlAcwar8j9MBPr@mail.gmail.com><alpine.BSF.2.00.1006220753471.87930@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166122951E@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikih0j6-vyLDPMOqcTkoiK545yE28y4fU9JTUa2@mail.gmail.com><20100623103310.GD15883@emerald.iucr.org><8F77913624F7524AACD2A92EAF3BFA541661229521@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1006231033360.56372@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA541661229523@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1006231406010.30894@epsilon.pair.com>
On Wednesday, June 23, 2010 1:36 PM, Herbert J. Bernstein wrote:

>All that is required to avoid the trap of unintended text transformations
>from UTF-8 as if it were, say, Latin 1, is to add any string from the
>Latin 1 supplement of the Unicode BMP. I would suggest
>
>  :#x00F2#x00F3#x00F4#x00F5#x00F6:
>
>which as utf8 would be
>
>  :#x00c3#x00b2#x00c3#x00b3#x00c3#x00b4#x00c3#x00b5#x00c3#x00b6
>
>which would come out as 5 accented lower case o's running through the
>full set of accents if transmitted correctly, but as
>capital A-tildes alternating with SUPERSCRIPT TWO, SUPERSCRIPT THREE,
>ACUTE ACCENT, MICRO SIGN, PILCROW SIGN in the most likely mis-transmission
>of a UTF8 file as a Latin-1 file.

And similarly, it would come out as a different sequence of characters if the stream were misinterpreted according to a different wrong encoding. So far so good.

That's fine when the true encoding is UTF-8, UTF-16, or any other in which characters U+00F2 - U+00F6 are representable. It is inapplicable, however, when the true encoding is any of the many in which those characters are not representable, such as KOI8-R, many of the ISO-8859-x series, and, as I understand it, most or all of the encodings specific to East Asian text (which generally do, whether formally or informally, incorporate ASCII as a subset, and are thus potentially suitable for CIF).

>Let us call the code-point sequence #x00F2#x00F3#x00F4#x00F5#x00F6
>the transmission check <tc>. Then the proposed magic number would be
>
>#\#CIF_2.0:<encoding>:<tc>:
>
>Both the encoding and the tc would be optional, but highly recommended.
>This might not allow fully automated decoding, but it would at least
>provide a decent error check for many of the most common cases that
>cause trouble, and would actually give us an edge over the XML
>convention (which only gives the encoding) in terms of reliability.
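
Both points above (what the check looks like after a UTF-8-as-Latin-1 mis-transmission, and where the check cannot be written at all) can be tried out with a minimal Python sketch. It assumes Python's standard codec names (`utf-8`, `utf-16`, `latin-1`, `koi8_r`, `iso8859_5`); the helper name `representable` is hypothetical and not part of any CIF proposal.

```python
# The proposed transmission check: U+00F2 through U+00F6.
TC = "\u00f2\u00f3\u00f4\u00f5\u00f6"

# Correct transmission: the UTF-8 bytes are c3 b2 c3 b3 c3 b4 c3 b5 c3 b6.
utf8_bytes = TC.encode("utf-8")

# Most likely mis-transmission: the same bytes read back as Latin-1.
# Every byte maps to one character, so the result is ten characters long.
misread = utf8_bytes.decode("latin-1")


def representable(text, encoding):
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(utf8_bytes.hex())
print(misread == TC)  # the check fails when the bytes are misread
for name in ("utf-8", "utf-16", "latin-1", "koi8_r", "iso8859_5"):
    print(name, representable(TC, name))
```

The loop reports which of the listed encodings can carry the check string; for those that cannot, a different check string would be needed.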
I am not opposed to the transmission check idea, but if something more generally applicable could be found then I would prefer it. Nevertheless, in conjunction with UTF-8 as a canonical CIF2 representation, the transmission check would have wide applicability, especially in those areas where encoding mismatches are most likely to occur.

Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group