[Cif2-encoding] Let's all take a deep breath...

  • To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
  • Subject: [Cif2-encoding] Let's all take a deep breath...
  • From: James Hester <jamesrhester@xxxxxxxxx>
  • Date: Mon, 27 Sep 2010 09:46:49 +1000

Well, I didn't even manage to properly call a vote and everybody has
piled in; Simon even managed to vote twice (and that's quite OK Simon,
we are trying to determine what the will of the group is and so I
think it only reasonable that if somebody's assessment of the
situation changes that they can 'update' their vote).  I am however
unhappy that both Brian and Simon introduced new concerns and nobody
has had a chance to comment on how the various proposals under
consideration might affect those concerns.  I would therefore like to
suggest that the voting period continues until the end of this week,
and that we all endeavour to express any concerns or comments that we
need to make in a timely fashion.  I will be commenting on Brian and
Simon's concerns presently, and also on Herbert's proposal, which I
have not subjected to my hopefully not too long-winded scrutiny.

None of us should feel steamrolled by a certain artificial urgency that
has appeared in the dialogue - while we do need to wrap things up in a
timely fashion, it has only been 4 days since I even started
discussing the vote.

Some initial general comments (I will comment separately on Brian and
Simon's issues).

(i) We are *not* in an infinite loop.  The last few months have seen
several proposals analysed and explored, and it is my perception that
these discussions have led at least some participants (including
myself) to a better understanding of the consequences of what they are
proposing.  So nobody should feel that throwing out a new criticism of
an old or new proposal is somehow hindering progress by looping over
old ground.  Quite the reverse, it is making progress.  What *is*
important is to get your comments into the mix in a timely fashion,
because time is indeed short.

(ii) It is not correct to assume that we can figure out the encoding
issues later.  Maybe we can, but maybe we can't. Once CIF2 files are
produced and software is distributed, you can't put the genie back in
the bottle, by which I mean you can't easily change the way that
distributed software behaves, and how files are interpreted.  We have
to therefore be confident that the standard we promulgate does not
close off an avenue we need for solving encoding issues.

(iii) It is extremely misleading to think that simply substituting
UTF-8 in CIF2 for ASCII in CIF1 will lead to even approximately the
same results as we had for CIF1.  The 'any encoding' clause in the
CIF1 standard was essentially irrelevant - encodings used in the
overwhelming majority of systems producing CIF1 files coincided with
ASCII for CIF text, as I have said many times before, so software had
no trouble in turning a stream of CIF bytes from any unknown source
into the same text that the CIF writer was working from.  If I repeat
this point endlessly, it is only because the CIF1 approach continues
to be invoked like magic fairy dust that will make everything OK, when
in fact the magic fairy dust was the dominance of ASCII encoding for
ASCII codepoints.  There is *no such uniformity* in encoding of
Unicode codepoints.  We have a new problem for CIF, and whatever we do
will have *new* consequences, and that very much includes the 'as for
CIF1' proposal.  So please, enough with the 'CIF1 has served us well
for 15 years' line.
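
To make point (iii) concrete, here is a minimal Python sketch. The sample
data line, the choice of codecs, and the helper name byte_forms are all
assumptions made up for this illustration; none of them come from the CIF1
or CIF2 drafts. It counts how many different byte sequences the same text
produces under a handful of common encodings.

```python
# Illustrative only: codecs and sample text are assumptions for this sketch.
CODECS = ["latin-1", "cp1252", "utf-8", "mac_roman"]


def byte_forms(text, codecs=CODECS):
    """Return the set of distinct byte strings obtained by encoding text."""
    return {text.encode(codec) for codec in codecs}


# Pure ASCII text, as found in practically all CIF1 files.
ascii_only = "_cell_length_a 5.431"
# The same line followed by U+00C5 (capital A with ring above).
with_non_ascii = "_cell_length_a 5.431 \u00c5"

n_ascii = len(byte_forms(ascii_only))          # one byte form for all codecs
n_non_ascii = len(byte_forms(with_non_ascii))  # more than one byte form

print(n_ascii, n_non_ascii)
```

For the ASCII-only line every codec in the list yields identical bytes, so a
reader never needs to know which one the writer used. As soon as a single
non-ASCII codepoint appears, the byte forms diverge.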

(iv) The majority are currently in favour of the 'as for CIF1'
approach, which, if nobody changes their vote by the end of the week,
is what we will be taking to the DDLm group and COMCIFS.  This means
we will have a pure text standard, and I mean really pure, because
there is no predictable link between this beautiful textual castle in
the sky and the solid ground of bytes on disk.

I am a cross-platform CIF programmer. Looking forward to the halcyon
'as for CIF1' days that await us, a small question occupies my mind.
As my program does not operate in that glorious abstract space
occupied by pure text standards that are most certainly not anybody's
laughing stock, my program will be forced to (as briefly as possible)
deal with humble plebeian bytes according to some encoding to obtain
the exalted CIF text.   Under the 'as for CIF1' proposal, how does my
program turn these bytes into text in the way that the writer of the
bytes intended?  If that is not yet resolved, how can anybody even
write a CIF2 program?
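
The reader's side of that question can be sketched in a few lines of Python.
This is only an illustration under assumed conditions: the single-character
payload and the list of guessed encodings are hypothetical, and nothing here
is a proposed CIF2 reading algorithm. It shows that the same bytes decode
without any error under several encodings, yet only one guess recovers the
text the writer intended.

```python
# Illustrative only: payload and guessed codecs are assumptions.
original = "\u00c5"                 # what the writer meant (U+00C5)
on_disk = original.encode("utf-8")  # the writer happened to use UTF-8

# Encodings a reader might plausibly assume when told nothing.
GUESSES = ["utf-8", "latin-1", "cp1252", "mac_roman"]

# Every guess decodes successfully; none of them raises an exception.
readings = {codec: on_disk.decode(codec) for codec in GUESSES}

# Guesses that silently produce text different from the writer's text.
wrong = [codec for codec, text in readings.items() if text != original]

for codec, text in readings.items():
    print(codec, repr(text))
```

Because the wrong guesses fail silently rather than loudly, a program has no
way to tell from the decode step alone that it has recovered the wrong text.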
