Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .
- To: "'Group finalising DDLm and associated dictionaries'" <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Tue, 22 Jun 2010 12:25:10 -0500
- Accept-Language: en-US
- In-Reply-To: <268424.95799.qm@web87015.mail.ird.yahoo.com>
- References: <alpine.BSF.2.00.1005111250250.60002@epsilon.pair.com><alpine.BSF.2.00.1006172025070.91418@epsilon.pair.com><AANLkTimEn-5bOcLNsa1DSOjDS7XqFmqVKA-W-6Z4NxFO@mail.gmail.com><alpine.BSF.2.00.1006172107430.91418@epsilon.pair.com><AANLkTilJUtXpw5UFQv0Y04Knrv9wCPLr5eertWPCcTzz@mail.gmail.com><alpine.BSF.2.00.1006180703230.91255@epsilon.pair.com><alpine.BSF.2.00.1006180837330.91255@epsilon.pair.com><AANLkTildS0DVEj76rffd8sgXgno2INL8zkXI_qsBjSLP@mail.gmail.com><a06240803c845518a843e@192.168.2.104><AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><alpine.BSF.2.00.1006212018430.91069@epsilon.pair.com><AANLkTilolZk4SzLF8mzqOz4EagFJcEHDKOAblGMnoqpW@mail.gmail.com><alpine.BSF.2.00.1006212120510.91069@epsilon.pair.com><AANLkTiklvzlKquqlRQIrpPGZjJfuRzLqiv2E6Stcq6wd@mail.gmail.com><alpine.BSF.2.00.1006212241210.4105@epsilon.pair.com><AANLkTilACXxnPRtJXEjGD39eleDl9dxlAcwar8j9MBPr@mail.gmail.com><4C20C4C5.1040800@mcmaster.ca> <alpine.BSF.2.00.1006221021330.3911@epsilon.pair.com><268424.95799.qm@web87015.mail.ird.yahoo.com>
On Tuesday, June 22, 2010 11:06 AM, SIMON WESTRIP wrote:

>CIF may currently be handled with multiple encodings, but as it's restricted to ASCII, the
>encoding issue hasn't really been relevant - most code pages include the ASCII code points?

It is a common feature of many encodings to be congruent with 7-bit ASCII over its range, but that is not universal. UTF-16 and UTF-32, for example, are not congruent with ASCII anywhere. Neither is EBCDIC. Shift-JIS is mostly congruent with ASCII, but varies at two code points.

>If CIF2 is also to allow multiple encodings, it is quite possible that a basic text editor will not render the content
>appropriately for anything outside the ASCII range if it is unable to determine the encoding (it may not even attempt to
>determine the encoding - my Linux text editors aren't very good at autodetection - I don't know about Windows Notepad,
>but last time I looked it couldn't even interpret Linux line endings appropriately).

Indeed. I believe some basic text editors will assume that any file presented to them uses the host's default encoding. In many cases that is not UTF-8, so selecting UTF-8 as the only CIF encoding does not promote CIF interoperability with those particular programs.

>In the absence of a BOM, the only solution is to use a heuristic approach to determine the encoding?

Not necessarily. If the data are delivered via a web form or another HTTP-based method, for example, then the HTTP protocol provides support for specifying the encoding. Similarly, if the file is delivered as part of a MIME multipart message, then the content type specified by its MIME headers can express the encoding.

>Such heuristics would also have to be applied in order to process the CIF (which I'd already decided I will have to do
>because of the likelihood of receiving non-UTF8 CIF2s)

Were I in your shoes, I would plan to transcode non-UTF-8 CIFs to UTF-8 upon receipt, as part of the verification process. I would store only the UTF-8 version; thereafter, no worries. One of the advantages of defining CIF2 as an encoding-independent text format would be that doing as I describe would preserve the original *CIF* data (i.e. the text) with 100% fidelity, even though it might not preserve the exact byte stream.

>So I still believe that as a *standard* we should specify UTF8.
>
>However, that does not mean that we cannot be tolerant of other encodings?
>If a system exists that processes all its CIFs in a different encoding, I see no reason for it to change -
>only when the CIF is to be made publicly available should it be converted to UTF-8.
>Likewise, if such a system is capable of handling current CIFs, surely it will manage UTF-8 CIFs with
>little overhead? After all, CIF2 is going to be different from CIF1.

This nicely captures my point about the distinction between the CIF data format and CIF storage and interchange. UTF-8 can very easily be a standard for CIF interchange -- perhaps the only standard -- without conflating that with the CIF data format.

Cheers,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
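The transcode-on-receipt approach described in the message, together with its points about ASCII congruence and out-of-band charset declarations, can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the thread: the function names `is_ascii_congruent`, `charset_from_content_type` and `transcode_to_utf8` are hypothetical, and the precedence rule (a byte-order mark first, then a charset declared in an HTTP or MIME `Content-Type` header, then a UTF-8 default) is an assumption. Only the standard `codecs` and `email.message` modules are used.

```python
import codecs
from email.message import Message
from typing import Optional

# BOMs checked longest-first: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so UTF-32 must be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def is_ascii_congruent(encoding: str) -> bool:
    """True if every 7-bit ASCII character encodes to its own single byte."""
    for cp in range(128):
        try:
            if chr(cp).encode(encoding) != bytes([cp]):
                return False
        except UnicodeEncodeError:
            return False
    return True


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the (lowercased) charset parameter of an HTTP/MIME
    Content-Type header value, or None if it has no charset."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def transcode_to_utf8(data: bytes, declared: Optional[str] = None,
                      default: str = "utf-8") -> bytes:
    """Decode received bytes and re-encode the text as UTF-8 without a BOM.

    ASSUMED precedence (hypothetical policy): a BOM wins, then an
    out-of-band declared charset, then `default`. Raises UnicodeDecodeError
    if the bytes are not valid in the chosen encoding.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            text = data.decode(codec)  # these codecs strip the BOM
            break
    else:
        text = data.decode(declared or default)
    return text.encode("utf-8")
```

For example, a CIF received in UTF-16 with a BOM, or in EBCDIC (`cp500`) with the charset declared in a `Content-Type` header, comes out as the same UTF-8 bytes. What is stored is the decoded text re-encoded, so the UTF-8 copy carries the identical CIF text even though the byte stream differs from what was received.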
- References:
- Re: [ddlm-group] [SPAM] ASSP UTF-8 BOM (Herbert J. Bernstein)
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)
- Re: [ddlm-group] UTF-8 BOM (James Hester)
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)
- Re: [ddlm-group] UTF-8 BOM (James Hester)
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)
- [ddlm-group] options/text vs binary/end-of-line (Herbert J. Bernstein)
- Re: [ddlm-group] options/text vs binary/end-of-line. . (James Hester)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (James Hester)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (Herbert J. Bernstein)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (James Hester)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (Herbert J. Bernstein)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (James Hester)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (Herbert J. Bernstein)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (David Brown)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (Herbert J. Bernstein)
- Re: [ddlm-group] options/text vs binary/end-of-line. .. . (SIMON WESTRIP)