
Re: [ddlm-group] options/text vs binary/end-of-line. .. .

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .
From: Brian McMahon <[email protected]>
Date: Mon, 28 Jun 2010 09:59:21 +0100

With the usual apologies for my tardiness in keeping up with this
correspondence...

> You will have to ask Brian what encodings the IUCr saw in Chester.

To the best of my knowledge, the only encodings we have had to deal
with have been ASCII and "Quoted-Printable" (and possibly other MIME
transfer encodings such as base64), the latter having come either from
broken mail transmissions or by naive extraction of message bodies
from CIFs sent as (encoded) email messages.
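
As an illustration of the Quoted-Printable case, here is a minimal
Python sketch of how such residue might be spotted and undone. The
function names and the detection pattern are assumptions made for this
example, not a description of any tool used at the IUCr; the only
library call relied on is the standard `quopri.decodestring`.

```python
import quopri
import re

# Heuristic only: a "=XX" hex escape or a soft line break ("=" at the
# end of a line) suggests Quoted-Printable residue. A genuine CIF could
# in principle contain such sequences, so treat a hit as a hint.
QP_HINT = re.compile(rb"=[0-9A-F]{2}|=\r?\n")


def looks_quoted_printable(raw: bytes) -> bool:
    """Return True if the byte string shows signs of QP encoding."""
    return bool(QP_HINT.search(raw))


def undo_quoted_printable(raw: bytes) -> bytes:
    """Decode QP residue if any is suspected; otherwise return as-is."""
    if looks_quoted_printable(raw):
        return quopri.decodestring(raw)
    return raw
```

A file that passes through unchanged has no obvious residue; one that
changes should be inspected by eye before the decoded form is trusted.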

I don't draw too profound a conclusion from this: some proportion may
in fact have originated on EBCDIC systems but been correctly
translated by ftp text-mode or other transmission protocols. (I am
aware that there is not a unique EBCDIC->ASCII translation, so
"correctly" implies the use somewhere along the way of heuristic
transformations.) Such things did exist, and worked OK in many
cases. As a historical note, in the very early days of CIF some of the
files we got will have reached us from BITNET hosts via JANET's
Coloured Book protocol converters - our very first connection to the
Internet was via an X.29 gateway.
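
The lack of a unique EBCDIC->ASCII translation can be shown with a
short Python sketch. It assumes two of the EBCDIC code pages shipped
with Python (`cp037` and `cp500`), chosen purely for illustration; the
systems that actually sent us files may have used other tables, and
the function name is hypothetical.

```python
# Punctuation of the kind most at risk in translation.
SAMPLE = "!@#$%^&*()_+{}:\"~<>?|\\-=[];'`,./"


def mistranslated(text, written_as, read_as):
    """List the (original, received) character pairs that change when
    text encoded with one code page is decoded with another."""
    received = text.encode(written_as).decode(read_as)
    return [(a, b) for a, b in zip(text, received) if a != b]


# Written on a cp037 system, read back as if it were cp500.
changed = mistranslated(SAMPLE, "cp037", "cp500")
for original, received in changed:
    print(repr(original), "->", repr(received))
```

Letters and digits survive this particular pairing; only some
punctuation is affected, which is why a converter has to guess which
table the sender used.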

To guard against character set conversions, we appended a 4-line
comment to our distributed template file:

# The following lines are used to test the character set of files sent by
# network email or other means. They are not part of the CIF data set.
# abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
# !@#$%^&*()_+{}:"~<>?|\-=[];'`,./

(I wonder how that looks in your email reader!) I think the curly
braces were the most likely characters to change in EBCDIC/ASCII
translation. This signature is still found on the templates on our ftp
site. I don't know whether we still do a character check on this
signature, if it is found. In the early days at least, the comparison
would have been done by simple OS text-mode tools - grep, sed, string
equality tests in Bourne shell... One imagines that something more
sophisticated would be needed if one wanted not only to test for
deviation from a canonical encoding, but to suggest the most likely
intended encoding.
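
A check of this kind might look like the following Python sketch. It
is not the comparison we actually ran (that was done with shell
tools); the names `SIGNATURE`, `MARKER` and `check_signature` are
invented for the example, and it assumes the four comment lines appear
together and in the order shown above.

```python
# The two character-test lines as they should read in intact ASCII.
SIGNATURE = (
    "# abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
    "# !@#$%^&*()_+{}:\"~<>?|\\-=[];'`,./",
)
# Letters and spaces only, so the marker itself is unlikely to be
# damaged by a character set conversion.
MARKER = "used to test the character set"


def check_signature(text):
    """Return None if no signature block is found, otherwise a list of
    (expected, found) pairs that differ (empty if intact)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if MARKER in line:
            # The test lines are the 3rd and 4th lines of the block.
            found = [l.rstrip() for l in lines[i + 2:i + 4]]
            break
    else:
        return None
    found += [""] * (2 - len(found))  # pad if the file was truncated
    diffs = []
    for want, got in zip(SIGNATURE, found):
        if len(want) != len(got):
            diffs.append((want, got))  # whole line lost or resized
        else:
            diffs.extend((w, g) for w, g in zip(want, got) if w != g)
    return diffs
```

An empty list means the signature survived; a list of pairs shows
which characters were altered, which is the raw material one would
need for guessing the intended encoding.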

> Brian should also be able to provide the history behind the note
> I cited on encodings in CIF1.

I can't readily locate the correspondence, but will try to do so if
anyone is very interested. To my recollection, there was a lengthy
correspondence touching on many of the same requirements for
accommodating authors whose working environment was beyond their
control, understanding or interest. Then, as now, there was no real
sense of conflicting interests, only a protracted exploration of how
best to express the desired outcome without over-complicating the
standard.

As has been stated many times, the ideal outcome (perfect, guaranteed
uncorrupted transmission of information) is never going to be
attainable because of the many layers of transmission protocols that
are implemented in the real world by different vendors, programmers
etc. We're still tussling with the optimal tradeoff between complexity
and functionality, heuristics and algorithms, respect and
authoritarianism.

James's summary, just received, lays out the dialectics quite nicely,
and I'll respond when I've properly digested it. My *inclination* at
this stage is towards establishing as compact a standard as possible
that is yet amenable to extension in the light of need, dictated by
real-world experience.

Best wishes
Brian

On Wed, Jun 23, 2010 at 06:48:27AM -0400, Herbert J. Bernstein wrote:
> Dear James,
> 
>   You seem to be asking for the specific encoding used for specific
> CIFs in the time window from 1991 to 2010.  Precisely because of
> the automatic shifts in encodings in the transfer of text files,
> I don't know the encodings that the CIFs I worked with from 1995
> on used.  All I know are the encodings I used in working with them,
> which were 7- and 8- bit ASCII, CDC display code and several different
> code-page based encodings.  Personally, I tried to avoid EBCDIC.  You will 
> have to ask Brian what encodings the IUCr saw in Chester.  I do know that 
> I had great difficulty nailing down the representation of anything in 
> Unicode until just a few years ago, and that just 2 years ago I had 
> serious trouble  under Windows-XP with confusion between q and a when 
> working with an English keyboard on a French-localized system.
> 
>   Brian should also be able to provide the history behind the note
> I cited on encodings in CIF1.
> 
>   Regards,
>     Herbert
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
> 
>                  +1-631-244-3035
>                  [email protected]
> =====================================================
