Discussion List Archives


Re: [Cif2-encoding] How we wrap this up


On Wednesday, September 29, 2010 2:17 AM, SIMON WESTRIP wrote:

>John, I do not think a specification that suggests that a CIF can be invalidated simply by being moved to
>another environment is helpful to anyone.

In that case, you must be operating under a different definition of "text" than Herb provided yesterday:

On Tuesday, September 28, 2010 2:41 PM, Herbert J. Bernstein wrote:
>However, the real answer (not a joke) is that a text encoding is whatever
>the formatted I/O system in a fortran compiler on the system under
>discussion reads and writes or the format of a COBOL EBCDIC-sequential
>file or a COBOL ASCII line-sequential file, or what a text editor on the
>system handles.  That is the point -- text is something very, very system
>and language dependent. [...]

The potential for confusion over the meaning of "text" was by far my greatest cause for concern about the "As for CIF1..." alternatives, so I am very grateful to Herb for providing a definition.  I am furthermore very pleased that his definition matches so well the one that I have advanced under the label "local", which I think is also the best interpretation of the requirements of CIF1.  Even disregarding the definition of "text", however, CIF1 clearly holds that a CIF can indeed be invalidated simply by being moved to another environment.  In particular, CIF1 expressly specifies that CIF processors are not required to understand non-native line termination sequences, and I have used CIF1 processors on several platforms that do not understand them.  As has been observed several times, CIF1 has nevertheless served well for years.  We would not be having this discussion now if CIF1 had not been helpful to many people.
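The line-termination point can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: `split_native_lines` is a hypothetical stand-in for a strict reader that recognizes nothing but its own platform's terminator, and the CIF fragment is invented for the example. It is not taken from any actual CIF1 processor.

```python
def split_native_lines(data: bytes, native_eol: bytes) -> list[bytes]:
    """Split raw bytes on one terminator only, as a strict reader that
    understands just its platform's native line ending might do.
    (Hypothetical helper, not part of any CIF software.)"""
    return data.split(native_eol)


# An invented two-line CIF fragment written with LF line endings.
cif_lf = b"data_example\n_cell_length_a 5.0\n"
# The same content as written on a CR-LF platform.
cif_crlf = cif_lf.replace(b"\n", b"\r\n")

# Read in the environment that wrote it: two lines plus an empty tail.
same_env = split_native_lines(cif_lf, b"\n")

# LF file read by a reader whose native terminator is a bare CR:
# no terminator is found, so the whole file looks like one line.
moved_to_cr = split_native_lines(cif_lf, b"\r")

# CR-LF file read by an LF-only reader: every line keeps a stray CR.
moved_to_lf = split_native_lines(cif_crlf, b"\n")
```

In this sketch the bytes never change; only the reader's notion of a line does, which is enough for the moved file to be parsed differently or rejected.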

I submit that among the options on the table, only (3) and (4) do not leave CIF2 CIFs susceptible to invalidation upon being moved to a different environment.  These are not my overall preference, but I favor them over "text"-only because they permit use of UTF-8.  Under the above definition of "text" and the "As for CIF1..." proposals, any recommendation that the spec might make to use UTF-8 and/or UTF-16 would be futile.  Depending on the environment, UTF-8 (or UTF-16) would either be required for conformance with the local definition of "text", or be forbidden as non-conforming.  (I disregard the case of ASCII-only CIFs, for which the encoding could be construed as any ASCII-compatible encoding, including UTF-8.)  In most current environments, UTF-8 would be forbidden.
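A short Python sketch of the encoding point follows. It assumes, purely for illustration, an environment whose local "text" encoding is Latin-1 or plain ASCII; the helper names `reinterpret` and `is_valid_in` and the sample data lines are hypothetical and do not come from the proposals under discussion.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one encoding and decode the bytes in another,
    mimicking a file moved between environments with different
    local definitions of "text". (Hypothetical helper.)"""
    return text.encode(written_as).decode(read_as)


def is_valid_in(data: bytes, encoding: str) -> bool:
    """Report whether the bytes decode at all in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Invented sample lines: one with a non-ASCII character, one ASCII-only.
non_ascii_line = "_chemical_name_common 'β-alanine'"
ascii_line = "_cell_length_a 5.0"

# UTF-8 bytes read as Latin-1 decode without error but yield different text.
misread = reinterpret(non_ascii_line, "utf-8", "latin-1")

# ASCII-only content is byte-identical in any ASCII-compatible encoding.
ascii_roundtrip = reinterpret(ascii_line, "utf-8", "latin-1")

# Where local "text" is strictly ASCII, the UTF-8 bytes are simply invalid.
utf8_ok_as_ascii = is_valid_in(non_ascii_line.encode("utf-8"), "ascii")
```

Here `misread` differs from the original line, `ascii_roundtrip` does not, and `utf8_ok_as_ascii` is false, which mirrors the two cases above: the ASCII-only CIF that can be set aside, and the UTF-8 CIF that is non-conforming wherever the local encoding is something else.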

As much as I join Herb in favoring support for "text" CIFs as he defines them, I remain convinced that UTF-8 must be a conformant option for CIF2 to move ahead.  I think UTF-8 would be sufficient to cover most (but not all) of the cases for which "text" ensures support, thus my preference for options (3) and (4) over options (1) and (2).  This is, again, the genesis of option (5), which I think now could be relabeled "text + UTF-8 (+- UTF-16)".


John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

cif2-encoding mailing list
