Discussion List Archives

Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics. .... .

On Tuesday, September 14, 2010 9:47 AM, Herbert J. Bernstein wrote:
>   To avoid any misunderstandings, rather than worrying about how we got to where we are, let us each just state a clear position.

Very well:

I favor restricting the scope of the CIF2 specification to the file format, excluding any explicit requirements of programs, users, or other entities.  *Non-normative* commentary on the meaning, impact, and use of the normative format definition is welcome, however.  In that light,

I favor CIF2 defining binary "CIFs" formed by encoding the underlying Unicode text according to local text conventions, as in CIF1, and those formed by encoding the underlying Unicode text according to UTF-8.  CIFs of the former type are "text files" in their context; those of the latter type might also be text files under some circumstances.  If the Unicode text consists exclusively of ASCII characters then these two options are indistinguishable in many contexts.
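To make the last point concrete, here is a minimal Python sketch, not part of any CIF2 draft.  It assumes that "local text conventions" means an ASCII-compatible 8-bit encoding; Latin-1 is used purely as a stand-in, and the sample lines and the helper name same_bytes are illustrative only.

```python
# Sketch only: "latin-1" stands in for a hypothetical local text convention.
# The sample CIF-like lines below are made up for illustration.

def same_bytes(text, local_encoding="latin-1"):
    """Return True if text encodes to identical bytes under UTF-8 and local_encoding."""
    return text.encode("utf-8") == text.encode(local_encoding)

# ASCII-only content: both encodings produce exactly the same byte sequence.
ascii_text = "_cell_length_a 5.4307\n"
ascii_same = same_bytes(ascii_text)

# Content with a non-ASCII character (U+00C5, A with ring above):
# one byte in Latin-1, two bytes in UTF-8, so the binary files differ.
non_ascii_text = "_chemical_name_common '\u00c5ngstr\u00f6m sample'\n"
non_ascii_same = same_bytes(non_ascii_text)

print(ascii_same, non_ascii_same)
```

The equivalence holds only for local encodings that agree with ASCII on the ASCII range; EBCDIC-based conventions, for example, would not.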

I am open to CIF2 additionally defining binary CIFs formed by encoding the underlying Unicode text according to specific alternative schemes.  In particular, I would agree to UTF-16.  My support for other specific alternatives would be granted or withheld on a case-by-case basis.

I disfavor CIF2 defining binary CIFs formed in other ways, or leaving the definition of a "CIF" open-ended, but I favor express recognition of the possibility of alternative serializations of CIF-conformant Unicode text.  In that vein, I favor creating a supplementary specification for CIF storage and exchange that addresses the multitude of possible encodings that CIF2 support for local defaults would permit in various environments.

My use of the term "Unicode text" is meant to emphasize that the vast majority of the CIF2 spec is independent of any encoding.  I think the latest (May) draft of the spec for the most part uses similar terminology, and I favor that form of description over one based on UTF-8 or some other specific encoding as a placeholder or reference.

It is my expectation that a result of the above provisions would be establishment of UTF-8 as a de facto default encoding for CIF2 CIFs.


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital
