Re: [Cif2-encoding] Let's all take a deep breath....

On Sunday, September 26, 2010 6:47 PM, James Hester wrote:

>I am however
>unhappy that both Brian and Simon introduced new concerns and nobody
>has had a chance to comment on how the various proposals under
>consideration might affect those concerns.  I would therefore like to
>suggest that the voting period continues until the end of this week,
>and that we all endeavour to express any concerns or comments that we
>need to make in a timely fashion.

I have responded to Simon's new concerns, I think, but not to Brian's.  Supplemental to James's well-reasoned comments, then:

On Friday, September 24, 2010 4:24 AM, Brian McMahon wrote:

>I still feel this argument is at heart a "binary/text"
>dichotomy, where "binary" implies that one can prescribe specific
>byte-level representations of every distinct character; "text"
>implies that you're at the mercy of external libraries and mappings
>between encoding conventions - and those mappings are not always
>explicit or easy to identify.

That characterization of "text" sounds suspiciously similar to the "local" part of option 5 -- as it should, because the two attempt to describe the same (I think) concept.  I am open to alternative definitions, but I do not comprehend the apparent aversion to defining these terms.  If they are so obvious as to not require definition, then providing definitions anyway will be simple and harmless.  If not, then how else do we expect consumers of the spec to come to the same conclusion about what it means?

>I sympathise greatly with James's desire for a prescriptive, "binary"
>approach, but its corollary is that a CIF application must take full
>responsibility for expressing any supported extended character set (I
>mean accented Latin letters, Greek characters, Cyrillic or Chinese
>alphabets).

I do not follow this logic, inasmuch as it seems to be about the CIF2 character repertoire, rather than about the encodings in which characters from that repertoire may be expressed.  The character repertoire is not the subject of this debate.

Relying on "text" to define allowed characters would mean that some reasonable content expressed in conformant CIF form on one system cannot be expressed in any conformant CIF form on another.  For example, a CIF-format, Chinese-language journal article encoded in EUC-CN might be perfectly valid CIF in the journal office, but there would be no CIF-conformant way to represent it at all on a system whose definition of "text" does not accommodate Chinese characters.
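To make the EUC-CN example concrete, here is a minimal Python sketch. It is only an illustration: the sample string, the helper name `representable`, and the choice of Latin-1 as the second system's notion of "text" are all assumptions of mine, not anything taken from a proposal. Python exposes EUC-CN (the usual byte encoding of GB 2312) under the codec name "gb2312".

```python
# Hypothetical sample content: the two characters for "Chinese (language)".
text = "中文"

def representable(s, encoding):
    """Return True if every character of s can be encoded in `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# On a system whose "text" is EUC-CN, the content has a byte representation.
euc_cn_bytes = text.encode("gb2312")

# On a system whose "text" is Latin-1 (an assumed stand-in for a locale
# without Chinese support), no byte representation exists at all.
print(representable(text, "gb2312"))
print(representable(text, "latin-1"))
```

Under a "text"-based definition, the second system has no conformant way to hold this content, however the bytes are labeled.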


>I put option 5 at the bottom because of the non-portability of a "local" encoding.

This is the part I understand least.  "Text" is at least roughly equivalent to "local", and every bit as non-portable.  Merely tagging CIFs with encoding information does little to fix that, as we covered in the course of our discussion, particularly when the tagging is optional.  Moreover, even optional tagging is a feature only of choice 2, not choice 1.
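The non-portability I have in mind can be sketched in a few lines of Python. This is an illustration under assumed conditions: the sample character, the helper name `decode_as`, and the three encodings are arbitrary choices standing in for the "text" conventions of three different systems.

```python
# Bytes written by a system whose convention is UTF-8 (assumed sample).
raw = "é".encode("utf-8")

def decode_as(data, encodings):
    """Decode the same bytes under each encoding; undecodable bytes
    are replaced rather than raising, so every result is a string."""
    return {enc: data.decode(enc, errors="replace") for enc in encodings}

# Three hypothetical receiving systems, each with its own notion of "text".
results = decode_as(raw, ["utf-8", "latin-1", "gb2312"])
for enc, value in results.items():
    print(enc, repr(value))
```

Only the system that happens to share the writer's convention recovers the original character; the others silently read something else, which is the same hazard whether we call the convention "text" or "local".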


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


cif2-encoding mailing list
