Discussion List Archives

Re: [ddlm-group] options/text vs binary/end-of-line. .. .. .. .


On Tuesday, June 22, 2010 7:39 PM, James Hester wrote:
>On Wed, Jun 23, 2010 at 8:48 AM, Herbert J. Bernstein
><yaya@bernstein-plus-sons.com> wrote:
>> No, you are not obliged to accept any 'text' encoding.  It is perfectly
>> reasonable for you to insist that the user provide you with some encoding
>> that you are prepared to read reliably.
>
>No, it is not perfectly reasonable, because you and the "user" may not
>have the opportunity to negotiate encodings.  Think a collection of
>archived CIF files on CD - maybe they all have different encodings,
>and you just have to figure it out, for every one of 32,000 files.

With respect, I think that's a straw man.  A CIF compilation that did not provide either explicit or implicit encoding information would indeed be of limited value, but that is a failing of the curator, not of a CIF2 allowance for multiple encodings.

It is part of the implied agreement between the curator of such a CIF collection and the consumers of it that the curator specifies the encodings in some way.  As Herb might point out, that is already a requirement under CIF 1.1.  Were I curating a CIF collection, I would transcode everything into UTF-8 (and document it), but that's not the only option.  The user does have the opportunity to negotiate encodings in the sense that he can use explicit or implicit metadata provided by the curator to determine the encodings.  "Negotiation" in this sense does not mean that either party must offer multiple options, nor that a mutually agreeable encoding necessarily will be found, but transcoding is always a viable option as long as the encodings are known.
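
To make the transcoding point concrete, here is a minimal Python sketch of what a consumer (or curator) might do once the source encoding is known from the curator's metadata.  The function name transcode_to_utf8, the file names, and the choice of Latin-1 as the source encoding are hypothetical, picked only for illustration; nothing here is prescribed by CIF 1.1 or any CIF2 draft.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding):
    """Rewrite a text file as UTF-8, given its known source encoding.

    Hypothetical helper: assumes the encoding has already been determined
    from explicit or implicit metadata supplied with the collection.
    """
    # Decoding is strict by default, so a wrong encoding declaration that
    # produces invalid byte sequences raises UnicodeDecodeError instead of
    # silently corrupting the text.  newline="" leaves line endings alone.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "example_latin1.cif")  # hypothetical file
    dst = os.path.join(workdir, "example_utf8.cif")    # hypothetical file
    # Write a tiny Latin-1 file containing a non-ASCII character (A-ring).
    with open(src, "wb") as f:
        f.write("data_example\n# unit: \u00c5\n".encode("latin-1"))
    transcode_to_utf8(src, dst, "latin-1")
    with open(dst, "rb") as f:
        print(f.read())
```

Note that this only works because the encoding is supplied: some single-byte encodings, Latin-1 among them, will decode any byte sequence without error, so a wrong declaration there yields wrong characters rather than an exception.  That is why the metadata matters more than the tooling.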

[...]

>And wouldn't insisting on an encoding that you are
>prepared to accept contradict your principle about 'respect' for the
>way other people do things?

Whether you characterize it as 'respect', 'freedom', 'convenience', or something else, I see no inconsistency between the international standard allowing user choice of encoding on one hand, and individual users requiring or negotiating specific encodings for particular storage or interchange purposes on the other hand.

[...]

>My own point of view is that this step of 'insist that the user
>provide you with some encoding that you are prepared to read reliably'
>is best done right here, when the standard is drafted.  That way,
>there is no need for negotiation of encoding.

Much of my objection is that I don't think that specifying UTF-8 in the standard actually does remove all need for encoding negotiation.  I fully expect that any such encoding requirement will go largely ignored by the community at large unless it is actively enforced by a significant majority of the interested major players (Chester, PDB, CCDC, etc.).  Such active enforcement would constitute de facto encoding negotiation, even though the standard would not in principle require such.  Among other users, I expect either implicit or explicit encoding negotiation to occur anyway.

I think the standard would be better positioned by explicitly acknowledging that there is an encoding issue, and taking what steps it can to mitigate that issue in practice.  I also subscribe to Herb's point of view -- well-grounded in computing history -- that binary formats are a poor choice in general.

>Frankly, I am amazed that I am the only one who thinks mandating a
>single encoding is the obvious way forward.

[...]

>Why
>anybody would want to pass up the opportunity to settle on one is
>beyond me.

I don't think you're alone, James, but for my part, I want to pass up the opportunity largely because I don't think it's real.


John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital



