Discussion List Archives


Request for a vote on a motion

  • To: comcifs@iucr.org, cif2-encoding@iucr.org
  • Subject: Request for a vote on a motion
  • From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
  • Date: Fri, 17 Sep 2010 21:39:13 -0400
  • In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA5416659DEDD6@SJMEMXMBS11.stjude.sjcrh.local>
Point well taken.  To ensure that the voting members of COMCIFS see
this message in their role as voting members of COMCIFS, I am copying
it to that list as well.

Dear Colleagues on COMCIFS.  There has been a long and interesting
discussion of various alternatives for handling or not handling
various encodings of text for CIF2 and/or recasting CIF as a UTF-8-based
binary format.  These discussions have not achieved anything approximating
agreement on these issues, but we do seem to agree on the
importance of extending CIF to handle Unicode, and on the desirability
of UTF-8 serving the role that ASCII has long held as the preferred
default encoding for CIF.  In my opinion, it is a mistake to hold
release of CIF2 hostage to what is, frankly, a minor side issue.
CIF has long-established practices with respect to the role of
ASCII and text in CIF that have been formally adopted by COMCIFS
in the form of the current CIF 1.1 syntax specification.  The
appended proposed resolution combines the currently adopted CIF
practice with respect to ASCII and text with the minimal changes
necessary to follow those same practices with respect to Unicode
and UTF-8, accepting what has been proposed to the community as
a whole as the changes that will make up CIF2.  In the 2+ months
since that proposal, nobody has objected to anything in that
document other than the way in which UTF-8 was proposed.  The
motion below simply combines what has not caused objection with
the practices previously adopted by COMCIFS and de facto in use for
many years.  Until we have agreement on something else -- which
seems likely to take years more of debate -- I urge all concerned
to accept this imperfect motion as the best we can do for now,
so that CIF2 can be used.

Please consult sections 22 and 23 of the CIF 1.1 syntax specification
and the "CIF Changes to the specification 05 July 2010", and then
please consider the appended motion submitted for formal vote by
COMCIFS.  If anybody likes the basic idea of a minimally disruptive
change to what has already been agreed for CIF2, but wants
some minimal wording changes, the right way to do that is simply to
propose the specific wording changes you want as an amendment
to this motion, to be voted on first; but please, let us get something
agreed to promptly.  Elephants get born in less time than CIF2 is


Proposed position on CIF2 character encodings submitted to
COMCIFS for a vote as an interim agreement on what can be
agreed thus far, subject to extension and refinement in
the future.


Reference to character(s) means abstract characters assigned code
points by Unicode.  Specific characters are referenced according to
Unicode convention, U+xxxx[x[x]], where xxxx[x[x]] is the four- to
six-digit hexadecimal representation of the assigned code point.
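
As an illustration of the U+xxxx[x[x]] convention, the minimal Python sketch below formats a code point with at least four upper-case hex digits (five or six only when needed) and parses such a reference back. The helper names format_code_point and parse_code_point are hypothetical, not part of any CIF tool.

```python
import string


def format_code_point(cp: int) -> str:
    """Render a code point as U+xxxx[x[x]]: four hex digits, more only if needed."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    # Zero-pad to four digits; code points above U+FFFF naturally use 5 or 6.
    return f"U+{cp:04X}"


def parse_code_point(ref: str) -> int:
    """Inverse of format_code_point, e.g. 'U+00C5' -> 0xC5."""
    digits = ref[2:]
    if (not ref.startswith("U+") or not 4 <= len(digits) <= 6
            or not all(c in string.hexdigits for c in digits)):
        raise ValueError(f"not a U+xxxx[x[x]] reference: {ref!r}")
    return int(digits, 16)
```

For example, format_code_point(0x9) gives "U+0009" and format_code_point(0x1D11E) gives "U+1D11E".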

The designated character encoding for CIF2 is UTF-8, the preferred
concrete representation of the information in a CIF2 document.
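
To make the distinction between an abstract character and its concrete representation tangible, this short Python sketch prints the UTF-8 bytes for a few characters: ASCII characters take one byte, others take two to four. The helper utf8_bytes is hypothetical and simply wraps str.encode.

```python
def utf8_bytes(ch: str) -> bytes:
    """Concrete UTF-8 representation of a single abstract character."""
    return ch.encode("utf-8")


# 'A' (ASCII), Latin capital A with ring, euro sign, musical G clef (outside the BMP).
for ch in ["A", "\u00c5", "\u20ac", "\U0001d11e"]:
    b = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {b.hex(' ')} ({len(b)} byte(s))")
```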

Reference to ASCII characters means characters U+0000 through U+007F, or,
equivalently, the first 128 characters of the ISO-8859-1 (LATIN-1)
character set.
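
The stated equivalence can be checked directly: the first 128 byte values decode to the same code points whether read as ASCII or as ISO-8859-1. A minimal Python sketch, with is_ascii_char as a hypothetical helper name:

```python
def is_ascii_char(ch: str) -> bool:
    """True for U+0000 through U+007F, the meaning of 'ASCII character' here."""
    return ord(ch) <= 0x7F


# The first 128 ISO-8859-1 (Latin-1) byte values decode to exactly U+0000..U+007F.
first_128 = bytes(range(128))
assert first_128.decode("latin-1") == first_128.decode("ascii")
assert all(is_ascii_char(c) for c in first_128.decode("latin-1"))
assert not is_ascii_char("\u00e9")  # e-acute is in Latin-1, but not ASCII
```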

Reference to newline or \n means the sequence that conventionally
terminates a line record (which is environment dependent).
Reference to whitespace means the characters ASCII space (U+0020),
ASCII horizontal tab (U+0009) and the newline characters. Regardless
of local convention, the various other characters that Unicode
classifies as whitespace (for example, those in the separator
categories Zs, Zl and Zp) do not constitute whitespace for the
purposes of CIF2.
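
A practical consequence is that general-purpose "is this whitespace?" tests cannot be reused for CIF2. The Python sketch below contrasts str.isspace(), which follows Unicode, with a CIF2-specific test. It assumes the newline characters are U+000A and U+000D (the characters a CIF2 reader normalises to newline); CIF2_WHITESPACE and is_cif2_whitespace are hypothetical names.

```python
import unicodedata

# Space, horizontal tab, and the newline characters U+000A and U+000D.
CIF2_WHITESPACE = frozenset("\u0020\u0009\u000a\u000d")


def is_cif2_whitespace(ch: str) -> bool:
    return ch in CIF2_WHITESPACE


# No-break space, em space, line separator, paragraph separator.
for ch in ["\u00a0", "\u2003", "\u2028", "\u2029"]:
    # Python's str.isspace() follows Unicode and returns True for all four,
    # but none of them is whitespace for the purposes of CIF2.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"isspace={ch.isspace()} cif2={is_cif2_whitespace(ch)}")
```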

CIF2 files are standard variable-length text files, which, for
compatibility with older processing systems, have a maximum line
length of 2048 characters. As discussed above and below, however,
there are some restrictions on the character set for token
delimiters, separators and data names.
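
A line-length check might look like the following Python sketch. It assumes, as an interpretation rather than something the text settles, that "characters" means Unicode code points and not bytes, and that newlines have already been normalised to U+000A. The function name overlong_lines is hypothetical.

```python
MAX_LINE_LENGTH = 2048  # counted in characters (code points), not bytes


def overlong_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines longer than MAX_LINE_LENGTH characters.

    Assumes newlines have already been normalised to U+000A.
    """
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if len(line) > MAX_LINE_LENGTH
    ]


# 2048 copies of a two-byte character: 4096 bytes in UTF-8, yet still 2048 characters.
line = "\u00c5" * MAX_LINE_LENGTH
assert len(line.encode("utf-8")) == 4096
assert overlong_lines(line + "\nshort\n" + "x" * 2049) == [3]
```

Counting code points keeps the limit independent of the encoding; a byte-based limit would make the same line legal in ISO-8859-1 but not in UTF-8.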

References to Unicode and UTF-8 are specifically to identify characters
and a concrete representation of those characters in an established and
widely available standard.  It is understood that CIF2 documents may
be constructed and maintained on computers that implement other character
encodings.  However, for maximum portability, only the clearly
identified equivalents of the Unicode characters identified above and
below should be used, and use of UTF-8 for a concrete representation is
highly recommended.

A CIF2 file is uniquely identified by a required magic code at the
beginning of its first line. The code is #\#CIF_2.0 followed
immediately by whitespace.  The addition of further information
to assist in disambiguation among multiple character sets is
under discussion.  Encodings such as UTF-16, which prefix a file
with a BOM (byte-order mark) or other encoding-disambiguation
prefix, are not precluded.  In such a case, the magic code should
follow the encoding-disambiguation prefix.
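
A reader could recognise the magic code roughly as in the Python sketch below: skip an optional byte-order mark, decode a short prefix, and require #\#CIF_2.0 followed by a whitespace character. The particular BOMs recognised, the fallback to UTF-8 when no BOM is present, and the name has_cif2_magic are all assumptions made for illustration.

```python
import codecs

MAGIC = "#\\#CIF_2.0"  # the ten characters #\#CIF_2.0

# Assumed set of BOMs. UTF-32 BOMs are tested first because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def has_cif2_magic(data: bytes) -> bool:
    """True if data starts (after an optional BOM) with the magic code plus whitespace."""
    encoding, body = "utf-8", data  # no prefix: assume UTF-8
    for bom, name in _BOMS:
        if data.startswith(bom):
            encoding, body = name, data[len(bom):]
            break
    # Decode only a short prefix; the incremental decoder buffers any
    # multi-byte sequence that is cut off at the end of the slice.
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        text = decoder.decode(body[:64], final=False)
    except UnicodeDecodeError:
        return False
    after = text[len(MAGIC):len(MAGIC) + 1]
    return text.startswith(MAGIC) and after in {" ", "\t", "\n", "\r"}
```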

In keeping with XML restrictions, we allow the characters:

U+0009 U+000A U+000D
U+0020 -- U+007E
U+00A0 -- U+D7FF
U+E000 -- U+FDCF
U+10000 -- U+10FFFD

In addition, character U+FEFF and characters U+xFFFE or U+xFFFF, where
x is any hexadecimal digit, are disallowed. Unicode reserves the code
points U+E000 -- U+F8FF for private use. The IUCr and only the IUCr may
specify what characters are assigned to these code points in the context of
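
The permitted-character rules can be expressed as a range table plus explicit exclusions. The Python sketch below copies the ranges exactly as listed in the motion; the names ALLOWED_RANGES, is_allowed_cif2_char and first_disallowed are hypothetical.

```python
from typing import Optional

# Inclusive ranges, copied from the list in the motion.
ALLOWED_RANGES = [
    (0x0009, 0x0009),
    (0x000A, 0x000A),
    (0x000D, 0x000D),
    (0x0020, 0x007E),
    (0x00A0, 0xD7FF),
    (0xE000, 0xFDCF),
    (0x10000, 0x10FFFD),
]


def is_allowed_cif2_char(ch: str) -> bool:
    cp = ord(ch)
    # Explicit exclusions: U+FEFF, and any code point ending in FFFE or FFFF.
    # (U+FEFF already lies outside the listed ranges; the check is kept for fidelity.)
    if cp == 0xFEFF or (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
        return False
    return any(lo <= cp <= hi for lo, hi in ALLOWED_RANGES)


def first_disallowed(text: str) -> Optional[tuple[int, str]]:
    """Return (index, 'U+xxxx') of the first disallowed character, or None."""
    for i, ch in enumerate(text):
        if not is_allowed_cif2_char(ch):
            return i, f"U+{ord(ch):04X}"
    return None
```

Note that the ranges also exclude DEL (U+007F), the C1 controls (U+0080 -- U+009F) and the surrogate block, which a Python string can still contain as lone code points.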

CIF2 processors are required to treat <U+000A>, <U+000D> and
<U+000D><U+000A> as newlines, normalising them to
<U+000A> on read. No other characters or character sequences may
represent newline. In particular, CIF2 processors should not
interpret the Unicode characters U+2028 (line separator) or U+2029
(paragraph separator) as newline.
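
A minimal Python sketch of this normalisation follows. The order of replacement matters (CR LF must be collapsed before lone CR), and Python's str.splitlines() is unsuitable here because it also breaks lines at U+2028, U+2029, U+0085 and several control characters. The function names are hypothetical.

```python
def normalise_newlines(text: str) -> str:
    # Replace CR LF first so that it becomes one newline rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def cif2_lines(text: str) -> list[str]:
    # Split on U+000A only; str.splitlines() would also split on
    # U+2028, U+2029, U+0085 and others, which CIF2 does not treat as newline.
    return normalise_newlines(text).split("\n")
```

For example, cif2_lines("a\r\nb\rc\nd\u2028e") yields four lines, the last of which still contains U+2028.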

At 5:42 PM -0500 9/17/10, Bollinger, John C wrote:
>On Friday, September 17, 2010 5:06 PM, Herbert J. Bernstein wrote:
>>I am formally asking for a vote of all COMCIFS voting members on the proposed
>>wording as it stands.
>It is your privilege to do so, but if you genuinely want a formal 
>COMCIFS vote -- as opposed to a vote of the participants in this 
>discussion -- then I do not believe the motion is in order in this 
>If you would be satisfied with a vote of the discussion 
>participants, then I vote NO.
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

