Discussion List Archives


Re: [Cif2-encoding] How we wrap this up

Dear Herb,

On Tuesday, September 28, 2010 2:41 PM, Herbert J. Bernstein wrote:

>   The norm in standards work is to deprecate features for a while
>(at least months and preferably years) before you remove them.

I acknowledge that principle, and I see no incompatibility between it and option 5.  More below.  Do not overlook my final comments.

>> Recommending UTF-8 and / or UTF-16 without mandating support for one or
>> both does not get us where I insist we need to be.
>The problem is coming to agreement on "support" and that pesky word

By "mandating support" I mean that a file containing a sequence of characters conforming to the CIF syntax and encoded via UTF-8 is defined to be a conformant CIF everywhere.  By itself, that would not obligate anyone to encode their CIFs in UTF-8.  It would, however, mean that fully-conformant CIF2 readers must be prepared to accept CIFs encoded in that manner.  Even that is no barrier to adoption, though, for CIF users must be prepared to deal with the encoding question under any alternative on the table, and if they can read only their local encoding then they would need to be able to transcode in any event.

>  Up until now in order for a CIF application developer or
>user to produce compliant CIFs, all they had to do was to produce a text
>file in whatever encoding was provided on their system. Now you wish to
>mandate that they be able to produce UTF8 or UTF16, even if they are
>running on some code-page based system.

Not at all.  It is the single objective of the "+ local" provision of my preferred alternative to enable application developers, authors, and anyone else to continue to do exactly what you describe them already doing.


>We have already made that mistake with other CIF2 features, e.g. the
>drastic change in string quoting.

I agree with you that such changes are mistaken.  That was my motivation in questioning UTF-8 only to begin with.


>The motion I have proposed does not make anything worse for anybody
>currently using CIF and allows them to start moving into CIF2 now.

Neither your motion nor my preferred one makes anything worse for anybody using CIF1, and both allow them to start moving into CIF2 now.

>  Your
>approach imposes conditions it will take months or years to meet with no
>prospect that satisfying your demands will solve any problem for anybody.

My approach imposes no special conditions, but it offers the advantages of UTF-8 as an available standard feature.  As long as we are relying on "The norm in standards work," wouldn't you agree that it is normal to introduce new features to a standard no later than the time the features they supersede are deprecated?

>Please rethink your position.

I have considered my position carefully, and rethought it several times over the course of our discussion.  I firmly believe that I am advocating a solid and eminently workable compromise between support of the existing CIF1 base and the future needs of CIF2 users.

>If we recommend UTF8/UTF16 support we have a decent chance that somebody
>will simply provide it.  If we mandate UTF8/UTF16 support we force
>pointless delays in the adoption of the rest of CIF2 and gain what in

Even if CIF2 ended up UTF-8 only, people could write software exactly as they would do under your proposal, then wrap it in a transcoder.  Or in that event I think it likely that some would implement my preferred alternative (5) as an extension.  Perhaps you would agree, as that's the same end result that you think likely somebody will simply provide, coming from the opposite direction.
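To illustrate the "wrap it in a transcoder" idea, here is a minimal sketch in Python. It assumes the local encoding is known or can be taken from the platform default. The helper name transcode_to_utf8 is hypothetical and not part of any CIF library. It only re-encodes the characters and does not check CIF syntax.

```python
import locale
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding=None):
    """Re-encode a text file from a local encoding to UTF-8 (hypothetical helper)."""
    # Assumption: when no encoding is given, use the platform's preferred one.
    if src_encoding is None:
        src_encoding = locale.getpreferredencoding(False)
    # newline="" leaves the original line terminators untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return dst_path


# Usage: a Latin-1 file containing the byte 0xC5 (A-ring) is rewritten
# with that character as the two-byte UTF-8 sequence 0xC3 0x85.
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "local.cif")
    utf8_path = os.path.join(tmp, "utf8.cif")
    with open(local_path, "wb") as f:
        f.write(b"data_demo\n_cell_length_a 5.43 # \xc5\n")
    transcode_to_utf8(local_path, utf8_path, src_encoding="latin-1")
    with open(utf8_path, "rb") as f:
        result = f.read()
```

Software written against the local encoding stays unchanged under this arrangement. Only the file passed on to a UTF-8 reader goes through the wrapper.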

I see no reason to fear any significant delays in CIF2 adoption arising from any particular result this discussion may ultimately reach.


>However, the real answer (not a joke) is that a text encoding is whatever
>the formatted I/O system in a fortran compiler on the system under
>discussion reads and writes or the format of a COBOL EBCDIC-sequential
>file or a COBOL ASCII line-sequential file, or what a text editor on the
>system handles.  That is the point -- text is something very, very system
>and language dependent. The strange thing is that text files have a much
>longer practical survival time than binary files, as backwards as that may
>seem, because there is a much larger investment in ensuring the continued
>readability of text files than of binary files.

I am laughing, but not because I think you're joking.  As far as I can tell, that answer is functionally identical to what I have been advocating as "local".  It's even worded similarly.  My desire to include it (but not to be limited to it) is the primary difference between James's most preferred position and mine.

Best Regards,

John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer:  www.stjude.org/emaildisclaimer

cif2-encoding mailing list
