Re: [Cif2-encoding] Drafting issues

On Friday, October 01, 2010 8:00 AM, James Hester wrote:

>Herbert, you have proposed an entirely reasonable rewriting of what I
>proposed with an entirely reasonable justification.  I'm happy to
>accept your new wording.

I, too, think it an improvement.  I believe that gives us the following text in Change 2, preceding the character set enumeration (my formatting):


CIF2 files are standard variable length plain text files, which for
compatibility with older processing systems will have a maximum line length
of 2048 characters. As discussed above and below, however, there are some
restrictions on the character set for token delimiters, separators and data

For compatibility with CIF1 behaviour, there is no formal
restriction on the encoding of CIF2 files providing they contain only code
points from the ASCII range.  If a CIF2 file contains characters equivalent
to Unicode code points greater than U+007F (127 decimal), then the
particular encoding used must either be UTF8 or algorithmically
identifiable from the CIF2 file itself. Note that UTF16 with a BOM conforms
to this requirement.  The use of a BOM for Unicode encodings, including
UTF8, is recommended.

Acceptable identification algorithms will be
published as necessary as annexes to this standard (see description of
magic code and encoding disambiguation in Change 1).  A CIF2 file
containing characters outside the ASCII range with no BOM and no
disambiguation signature will be a UTF8 file.  A CIF2 file containing
characters outside the ASCII range with a valid UTF8 or UTF16 BOM and no
disambiguation signature will be a Unicode file written in the indicated encoding.

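For concreteness, the default rule in the last two sentences above might be sketched as follows. This is a hypothetical illustration only: the function name and the returned labels are not part of the draft. It ignores the magic-code disambiguation signature from Change 1 entirely, and it treats "ASCII-only" at the byte level, which assumes an ASCII-compatible byte encoding for such files.

```python
import codecs

# BOMs named in the draft text: UTF8, and UTF16 in either byte order.
# Their byte prefixes are distinct, so checking order does not matter.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),        # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def guess_cif2_encoding(data: bytes) -> str:
    """Hypothetical sketch of the default CIF2 encoding rule.

    Does NOT look for a disambiguation signature (magic code); a file
    carrying one would need a separate annex algorithm.
    """
    # A valid UTF8 or UTF16 BOM names the Unicode encoding directly.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Only bytes in the ASCII range (0x00..0x7F): no formal restriction
    # on encoding applies, so report it simply as ASCII.
    if all(b <= 0x7F for b in data):
        return "ascii"
    # Non-ASCII content with no BOM (and, by assumption, no signature):
    # the draft says such a file is UTF8.
    return "utf-8"
```

Under this sketch, `"data_é".encode("utf-8")` with no BOM is reported as `"utf-8"`, while the same text preceded by `codecs.BOM_UTF16_LE` and encoded as UTF-16LE is reported as `"utf-16-le"`.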

I think with that we have reached an acceptable position.  I do propose three editorial changes, however, which are intended to clarify the wording without changing its meaning in any way:

1) I suggest that Herb's new text (the last two sentences above) be made the first annex, as it in fact constitutes the first acceptable identification algorithm that is defined.  Alternatively, let us slightly reword the preceding text to clarify that the last sentences describe one acceptable algorithm among potentially several.

2) I furthermore suggest that the sentence "Note that UTF16 with a BOM conforms to this requirement" be deleted, for that is redundant as a consequence of Herb's wording.

3) Finally, I recommend moving the sentence "The use of a BOM for Unicode encodings, including UTF8, is recommended" to the end of that passage, so as to place the comments about acceptable identification algorithms immediately after the requirement that some encodings be "algorithmically identifiable".  This will form a clearer logical progression.

I hope these changes will be adopted, but my acceptance of the proposal is not conditioned on that.

>The worst is behind us, and we are currently mopping up.  After making
>it through the mountain pass, surely you didn't expect to just fall
>off a cliff to the meadows below?  Perhaps that should be a haiku:
>Crunching through a snowlit pass
>A distant eagle floats above the sunny meadows
>Ah! The roads of the air.

The goal before us,
our travail yields a bounty.
My spirit aloft.

John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

