Discussion List Archives

Re: [ddlm-group] options/text vs binary/end-of-line

On Friday, June 25, 2010 6:17 PM, SIMON WESTRIP wrote:

[Herb Bernstein wrote:]
>>  We don't need everybody to be doing the same thing.  We need everybody
>>to be able to send everybody else their information in a form in which
>>other people can correctly understand what they have been sent.
>I totally agree with this - which is why I have advocated that the standard should be totally unambiguous and
>at the same time be as accessible as possible. I believe that I have expressed before an acceptance that we
>may have to adopt a certain degree of heuristic encoding determination in order to accommodate user practice;
>I do not shy away from this. I am, however, seeking a way to avoid, if possible, the ambiguity that code-page based
>encodings present.

As Herb has steadfastly maintained, there is no such way.  In particular, only by controlling the encoding process can you avoid having to deal with every conceivable text encoding scheme.  Given the specification and history of CIF1, and the goal of CIF being compatible with general-purpose text tools, it is unreasonable to believe that standardizing one or a few official text encoding schemes for CIF2 will provide an effective control on CIF2 encoding.  (That's important, so let's discuss it further if there is disagreement.)  That leaves everyone, in practice, having to deal with every conceivable encoding.

That does not mean that anyone must deal *equally* with every encoding, however.  That would be impossible, unless you count manipulations that are insensitive to the encoding.  The current spec draft provides little guidance here: its requirement for UTF-8 could be taken to mean that CIF2 processors must reject otherwise-conformant CIFs encoded via some other scheme, but few here seem to anticipate that their software will actually be that strict.

So (everyone), within your domain, do you then favor addressing the encoding question administratively?  For example, you might as a matter of policy reject CIFs encoded via schemes outside some chosen set you can reliably recognize.  That would be entirely reasonable, but it does not rely on or benefit from any particular encoding requirement in the standard.
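
To make the administrative option concrete, here is a minimal sketch of such a policy in Python.  Everything in it is an assumption chosen for illustration: the accepted set (UTF-8 with or without a byte-order mark, and UTF-16 with one), the function name decode_or_reject, and the choice to signal rejection with ValueError.  None of it comes from the CIF2 draft.  UTF-32 is deliberately left out of the accepted set and is not detected.

```python
import codecs

# Hypothetical local policy: byte-order marks we choose to recognize,
# paired with the Python codec used to decode what follows them.
ACCEPTED_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_or_reject(data: bytes) -> str:
    """Decode CIF bytes under the local policy, or raise ValueError."""
    # A recognized byte-order mark selects the codec outright.
    for bom, codec in ACCEPTED_BOMS:
        if data.startswith(bom):
            try:
                return data.decode(codec)
            except UnicodeDecodeError as exc:
                raise ValueError(f"rejected: invalid {codec} data") from exc
    # No mark: accept only input that is strictly valid UTF-8
    # (plain ASCII qualifies, since it is a subset of UTF-8).
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("rejected: not in an accepted encoding") from exc
```

Under this sketch, a file saved in a code-page encoding such as Latin-1 is rejected as soon as it contains a byte sequence that is not valid UTF-8, while a pure-ASCII file passes regardless of which encoding its author had in mind.  Note that the policy lives entirely in the receiving software.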

Or do you instead favor adapting as best you can to whatever you receive?  That might benefit from having a standardized mechanism for communicating encoding information along with a CIF, and at worst it would be no worse off for there being such a standard mechanism.

Or do you have another alternative?  If so, how does it benefit in practice from the standard designating one or a few allowed text encodings, or how is it harmed by a standardized mechanism for communicating encoding information?


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
