Re: [ddlm-group] options/text vs binary/end-of-line. .. .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .
- From: Brian McMahon <bm@iucr.org>
- Date: Mon, 28 Jun 2010 09:59:21 +0100
- In-Reply-To: <alpine.BSF.2.00.1006230633170.56615@epsilon.pair.com>
- References: <AANLkTilolZk4SzLF8mzqOz4EagFJcEHDKOAblGMnoqpW@mail.gmail.com><alpine.BSF.2.00.1006212120510.91069@epsilon.pair.com><AANLkTiklvzlKquqlRQIrpPGZjJfuRzLqiv2E6Stcq6wd@mail.gmail.com><alpine.BSF.2.00.1006212241210.4105@epsilon.pair.com><4C20C4C5.1040800@mcmaster.ca><alpine.BSF.2.00.1006221021330.3911@epsilon.pair.com><AANLkTimSWR8Hb7h2Gm6n9jFR9z0_hNvcLd4_zWAbdusq@mail.gmail.com><a06240800c847057c2f66@149.72.6.57><AANLkTinPRpPQqy2uHGdnMr9gwPYes-SFsrVeodnE91Vm@mail.gmail.com><alpine.BSF.2.00.1006230633170.56615@epsilon.pair.com>
With the usual apologies for my tardiness in keeping up with this correspondence...

> You will have to ask Brian what encodings the IUCr saw in Chester.

To the best of my knowledge, the only encodings we have had to deal with have been ASCII and "Quoted-Printable" (and possibly other base64 encodings), the latter having come either from broken mail transmissions or by naive extraction of message bodies from CIFs sent as (encoded) email messages.

I don't draw too profound a conclusion from this: some proportion may in fact have originated on EBCDIC systems but been correctly translated by ftp text-mode or other transmission protocols. (I am aware that there is not a unique EBCDIC->ASCII translation, so "correctly" implies the use somewhere along the way of heuristic transformations.) Such things did exist, and worked OK in many cases.

As a historical note, in the very early days of CIF some of the files we got will have reached us from BITNET hosts via JANET's Coloured Book protocol converters - our very first connection to the Internet was via an X.29 gateway.

To guard against character set conversions, we appended a 4-line comment to our distributed template file:

# The following lines are used to test the character set of files sent by
# network email or other means. They are not part of the CIF data set.
# abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
# !@#$%^&*()_+{}:"~<>?|\-=[];'`,./

(I wonder how that looks in your email reader!) I think the curly braces were the most likely characters to change in EBCDIC/ASCII translation.

This signature is still found on the templates on our ftp site. I don't know whether we still do a character check on this signature, if it is found. In the early days at least, the comparison would have been done by simple OS text-mode tools - grep, sed, string equality tests in Bourne shell...
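[Editorial illustration, not part of the original message.] The kind of signature comparison described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function name check_signature and the "intact" / "altered" / "absent" labels are hypothetical, the four signature lines are taken from the message as quoted, and nothing here is claimed to reproduce the checks actually run at the IUCr.

```python
# First line of the 4-line signature; it contains only letters, spaces and
# '#', so it is assumed here to survive translation and serve as a marker.
MARKER = "# The following lines are used to test the character set of files sent by"

# Remaining three signature lines, as quoted in the message.
EXPECTED = [
    "# network email or other means. They are not part of the CIF data set.",
    "# abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "# !@#$%^&*()_+{}:\"~<>?|\\-=[];'`,./",
]


def check_signature(text):
    """Return (status, changed), where status is 'intact', 'altered' or
    'absent' and changed lists (expected_char, found_char) pairs."""
    lines = [line.rstrip() for line in text.splitlines()]
    if MARKER not in lines:
        return "absent", []
    start = lines.index(MARKER) + 1
    found = lines[start:start + len(EXPECTED)]
    if found == EXPECTED:
        return "intact", []
    # Position-by-position comparison; inserted or deleted characters
    # would shift later positions, which this sketch does not try to realign.
    changed = [
        (want_ch, got_ch)
        for want, got in zip(EXPECTED, found)
        for want_ch, got_ch in zip(want, got)
        if want_ch != got_ch
    ]
    return "altered", changed
```

A plain string comparison like this only reports deviation from the canonical characters; it makes no attempt to guess which encoding produced the deviation.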
One imagines that something more sophisticated would be needed if one wanted not only to test for deviation from a canonical encoding, but to suggest the most likely intended encoding.

> Brian should also be able to provide the history behind the note
> I cited on encodings in CIF1.

I can't readily locate the correspondence, but will try to do so if anyone is very interested. To my recollection, there was a lengthy correspondence touching on many of the same requirements for accommodating authors whose working environment was beyond their control, understanding or interest. Then, as now, there was no real sense of conflicting interests, only a protracted exploration of how best to express the desired outcome without over-complicating the standard.

As has been stated many times, the ideal outcome (perfect, guaranteed uncorrupted transmission of information) is never going to be attainable because of the many layers of transmission protocols that are implemented in the real world by different vendors, programmers etc. We're still tussling with the optimal tradeoff between complexity and functionality, heuristics and algorithms, respect and authoritarianism.

James's summary, just received, lays out the dialectics quite nicely, and I'll respond when I've properly digested it. My *inclination* at this stage is towards establishing as compact a standard as possible that is yet amenable to extension in the light of need, dictated by real-world experience.

Best wishes
Brian

On Wed, Jun 23, 2010 at 06:48:27AM -0400, Herbert J. Bernstein wrote:
> Dear James,
>
> You seem to be asking for the specific encoding used for specific
> CIFs in the time window from 1991 to 2010. Precisely because of
> the automatic shifts in encodings in the transfer of text files,
> I don't know the encodings that the CIFs I worked with from 1995
> on used.
> All I know are the encodings I used in working with them,
> which were 7- and 8- bit ASCII, CDC display code and several different
> code-page based encodings. Personally, I tried to avoid EBCDIC. You will
> have to ask Brian what encodings the IUCr saw in Chester. I do know that
> I had great difficulty nailing down the representation of anything in
> Unicode until just a few years ago, and that just 2 years ago I had
> serious trouble under Windows-XP with confusion between q and a when
> working with an English keyboard on a French-localized system.
>
> Brian should also be able to provide the history behind the note
> I cited on encodings in CIF1.
>
> Regards,
> Herbert
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group