Discussion List Archives


Re: [Cif2-encoding] How we wrap this up

Dear Herb,

On Monday, September 27, 2010 8:45 AM, Herbert J. Bernstein wrote:

>The problem is that options 3, 4 and 5 specifically prescribe the use of Unicode characters (that is the entire point of those options -- and that is the point in dispute -- whether we should be prescribing UTF8 or using it as we now use ASCII, as a way to be clear what we are talking about as in CIF1) and we simply are not ready to deal with such a requirement yet.

I think I have reached my own epiphany regarding your position.  Do correct me if I am wrong, but I now think you're saying that you don't want to distinguish any particular encoding(s) as universally acceptable (much less universally required), correct?  If so, would it be fair to describe that as just the "local" part of option 5?


On Monday, September 27, 2010 12:07 PM, Herbert J. Bernstein wrote:

>Ah, now I begin to understand the difference in our view.  I view CIF for
>journal use and PDB deposition as having a controlled vocabulary, via
>combinations of dictionaries, advice to authors, deposition standards,
>etc.  You seem to view CIF as allowing completely arbitrary, uncontrolled
>text.  [...]

Yes, I intentionally take an unilluminated view of the problem, but that is both purposeful and useful.  Text is the foundation on which CIF is built.  The bulk of the spec is devoted to defining which text conforms and which does not.  The "F" in "CIF" stands for "file," however, and if the spec is to answer the question of which *files* conform, or the related question of what a particular file means, then it needs to address the mapping between "text" and "file".  Options (1) and (2) seem crafted specifically to avoid doing so.
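
To illustrate the gap between "text" and "file" described above, here is a minimal Python sketch. It is not part of any of the proposed options; the helper names `encodings_of` and `can_decode` and the sample string are hypothetical, chosen only for illustration. It shows that one piece of text maps to different byte sequences under different encodings, and that bytes written under one encoding may be unreadable, or silently misread, under another.

```python
def encodings_of(text, names=("utf-8", "utf-16-le", "latin-1")):
    """Return the byte sequence that `text` maps to under each named encoding."""
    return {name: text.encode(name) for name in names}


def can_decode(data, name):
    """Report whether `data` is a valid byte sequence in the named encoding."""
    try:
        data.decode(name)
    except UnicodeDecodeError:
        return False
    return True


sample = "90\u00b0"  # the text "90" followed by a degree sign
as_bytes = encodings_of(sample)

# The same text yields three different files.
for name, data in as_bytes.items():
    print(name, data)

# Bytes written as Latin-1 are not valid UTF-8 ...
print(can_decode(as_bytes["latin-1"], "utf-8"))

# ... while bytes written as UTF-8 decode as Latin-1 without error,
# but to different text than was intended.
print(as_bytes["utf-8"].decode("latin-1") == sample)
```

Without a stated mapping, a reader of the file cannot tell which of these interpretations the writer intended.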

I understand using local convention to fill the gap (à la "local"), but I fail to see how any amount of author instructions, deposition standards, etc. can adequately do the same.  At best that moves a burden that rightfully should be borne by the format spec onto application-dependent external documents, some outside IUCr's control.  I have shown by my advocacy for option (5) that I am willing to make the definition of a conformant CIF system-dependent.  I acknowledge that various applications place different demands on the data content of the CIFs they consume.  I am not, however, willing to make the basic definition of CIF conformance application-dependent.

>Please note that proposals 1 and 2 do _not_ affect "which
>byte-sequence representations of those characters will conform to
>CIF2, under which circumstances" because they are not rigidly
>prescriptive about any
>particular byte sequences.

Options (1) and (2) certainly DO affect that question, if only by leaving it open to later, possibly conflicting, interpretation by COMCIFS, individual developers, and others.  Option 5 is about as permissive as it reasonably can be regarding the binary form a CIF may take, while still being definitive enough that general-purpose software can be written to read conformant CIFs.  If my new understanding of your viewpoint is correct, however, then your objection may be that option 5 is *too* permissive on account of its explicit allowance for UTF-8 and UTF-16.  I would be willing to drop the explicit UTF-16 support (though UTF-16 might nevertheless squeeze in as "local" in some environments).  I will under no circumstances, however, support an alternative that allows any file to be found non-conformant on account of its being encoded in UTF-8.


>This is really getting out of hand.  We need a meeting.  If
>everyone will send me their Skype id's, I will volunteer to
>set up a Skype conference call at some time that works for
>everybody (which I suspect will be 4 am EDT).  My guess is that
>1-2 hours of polite discussion will resolve this.  What
>do we have to lose?

Is there anything to gain?  The last few days have been more illuminating than the last several weeks, but it still seems evident to me that there is a fundamental difference of opinion.  I will not support an alternative that fails to make UTF-8 a universally supported character encoding for CIF, and it seems clear that James will not, either.  You seem adamant that there be no such universal requirement.  I think I understand your position better than I used to, but I don't see where there is any scope for a consensus.  My best offer is already on the table in option (5), with or without UTF-16.


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital


cif2-encoding mailing list
