
RE: Draft CIF2 standard available

Subject: RE: Draft CIF2 standard available
From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
Date: Wed, 7 Apr 2010 17:22:24 -0500

Hmmm. I sent a less Microsoft-mangled version of my previous comments, but I'm not seeing it yet. I hope it didn't get lost, but for the moment I'll spare you (another) repost. Instead, here are a few additional comments on the CIF2 draft:

(*) The character set limits in change 2 are a bit strange. Specifically, the upper bound (character U+10FFF) looks like a possible typo. The upper limit of the code point values that UTF-8 can encode is specified by RFC 3629 as U+10FFFF, which bears a suspicious typographic similarity to the much lower limit specified for CIF2. On the other hand, the limit specified for CIF2 is only slightly larger than the greatest code point assigned by Unicode 5.2, so maybe that limit is intentional.
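To make the comparison concrete, here is a minimal Python sketch of the two bounds. The constant and function names are mine, chosen for illustration; nothing here comes from the draft beyond the two numeric limits quoted above.

```python
# Illustrative only: the two upper bounds compared as plain integers.
DRAFT_UPPER_BOUND = 0x10FFF    # upper bound as written in the CIF2 draft (change 2)
UTF8_UPPER_BOUND = 0x10FFFF    # highest code point UTF-8 may encode, per RFC 3629


def within_bound(ch: str, upper: int) -> bool:
    """Return True if the single character's code point does not exceed `upper`."""
    return ord(ch) <= upper


# The two values differ by exactly one trailing hex digit.
assert hex(DRAFT_UPPER_BOUND) + "f" == hex(UTF8_UPPER_BOUND)

top = chr(UTF8_UPPER_BOUND)
print(within_bound(top, DRAFT_UPPER_BOUND))  # excluded by the draft limit
print(within_bound(top, UTF8_UPPER_BOUND))   # allowed by the RFC 3629 limit
print(len(top.encode("utf-8")))              # number of UTF-8 bytes for U+10FFFF
```

If the draft limit is intentional, a check like `within_bound(ch, DRAFT_UPPER_BOUND)` would reject every code point from U+11000 upward, even though UTF-8 can encode them.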

(*) The wording of the comment about characters in the private use area strikes me as odd. Part of the problem probably arises from that section's failure, throughout, to distinguish characters from code points and Unicode from UTF-8. I would suggest restating the private use comment thus: "Unicode reserves code points E000 through F8FF (hex) for private use. IUCr and only IUCr can specify which characters are mapped to these code points in the context of CIF."
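The code point versus encoding distinction can be illustrated with a short Python sketch. The function name is hypothetical; the range is the one quoted in the suggested wording above.

```python
# Illustrative only: a code point is a number; UTF-8 is one way of writing it as bytes.
PRIVATE_USE_FIRST = 0xE000
PRIVATE_USE_LAST = 0xF8FF


def is_private_use(code_point: int) -> bool:
    """Return True if `code_point` lies in the range E000 through F8FF (hex)."""
    return PRIVATE_USE_FIRST <= code_point <= PRIVATE_USE_LAST


ch = "\ue000"
print(hex(ord(ch)))               # the code point
print(ch.encode("utf-8"))         # its UTF-8 byte sequence, a different thing
print(is_private_use(ord(ch)))    # the range test applies to the code point
print(is_private_use(ord("A")))
```

The range test is applied to the code point, not to the encoded bytes, which is why the specification text should say "code points" rather than "characters" when it gives numeric limits.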

(*) With regard to the "Magic code" identifying CIF2-compliant files, the specification text is somewhat imprecise and slightly out of sync with the analogous item in the CIF 1.1 spec. CIF 1.1 specifies that the version ID code must be followed by CIF whitespace and must be the first 11 bytes of the file (which count appears to include the required whitespace character, thus in fact giving four distinct codes). The CIF 2 statement does not explicitly require the version ID to appear at the very beginning of the file (other content could precede it on the first line) and does not require it to be followed by whitespace.
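As a rough sketch of what a CIF 1.1-style requirement would mean in practice, the following Python checks that a file's bytes begin with the version code immediately followed by one whitespace byte. Several things here are my assumptions, not text from either specification: the function name, the exact magic strings `#\#CIF_1.1` and `#\#CIF_2.0`, and the choice of space, tab, CR and LF as the whitespace set.

```python
# Illustrative only. The magic strings and whitespace set below are assumptions.
CIF11_MAGIC = b"#\\#CIF_1.1"    # 10 bytes; 11 with the trailing whitespace byte
CIF2_MAGIC = b"#\\#CIF_2.0"     # assumed form of the CIF2 code
CIF_WHITESPACE = b" \t\r\n"     # assumed set: SP, HT, CR, LF (four possible codes)


def has_strict_magic_code(data: bytes, magic: bytes) -> bool:
    """True if `data` starts with `magic` followed directly by a whitespace byte."""
    n = len(magic)
    if len(data) <= n:
        return False            # no room for the required whitespace byte
    return data[:n] == magic and data[n] in CIF_WHITESPACE


print(has_strict_magic_code(b"#\\#CIF_2.0\ndata_example\n", CIF2_MAGIC))
print(has_strict_magic_code(b"  #\\#CIF_2.0\n", CIF2_MAGIC))   # other content precedes it
print(has_strict_magic_code(b"#\\#CIF_2.0extra\n", CIF2_MAGIC))  # no whitespace follows
```

Under the looser CIF 2 wording, the second and third inputs would arguably be acceptable, which is the imprecision I am pointing at.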

(*) I spotted a couple more places where the new spec seems to describe the behavior of a parser / lexer for CIF instead of describing CIF itself. Specifically, in the Change 5 section, the descriptions for the three different types of delimited strings all contain comments about the behavior of a lexer for CIF. It is much preferable to describe the language itself, directly, than to characterize it in terms of how it is processed. I guess the point here is to expressly relieve general-purpose CIF parsers of responsibility to process the semantic markup codes used in conjunction with (but not formally a part of) CIF 1.1. Perhaps something along the lines of: "CIF assigns no particular meaning to characters or character patterns within the quoted string. Any such interpretations are an application responsibility."

That's it for now.

Cheers,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
[email protected]
www.stjude.org

Email Disclaimer: www.stjude.org/emaildisclaimer

_______________________________________________
cif-developers mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/cif-developers
