

RE: Draft CIF2 standard available

Hmmm.  I sent a less Microsoft-mangled version of my previous comments, but I'm not seeing it yet.  I hope it didn't get lost, but for the moment I'll spare you (another) repost.  Instead, here are a few additional comments on the CIF2 draft:

(*) The character set limits in change 2 are a bit strange.  Specifically, the upper bound (character U+10FFF) looks like a possible typo.  The upper limit of the code point values that UTF-8 can encode is specified by RFC 3629 as U+10FFFF, which bears a suspicious typographic similarity to the much lower limit specified for CIF2.  On the other hand, the limit specified for CIF2 is only slightly larger than the greatest code point assigned by Unicode 5.2, so maybe that limit is intentional.
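To make the size of that gap concrete, here is a minimal Python sketch comparing the two bounds. The constant and function names are illustrative only and do not come from the draft; the draft's bound is taken as quoted above.

```python
import sys

# Upper bound as quoted from the CIF2 draft (possibly a typo).
DRAFT_LIMIT = 0x10FFF
# Upper bound of the code points UTF-8 can encode, per RFC 3629.
UTF8_LIMIT = 0x10FFFF


def within_limit(text, limit):
    """Return True if every code point in text is <= limit."""
    return all(ord(ch) <= limit for ch in text)


# Python's own code point ceiling is the RFC 3629 value.
print(sys.maxunicode == UTF8_LIMIT)

# A code point above the draft bound but well inside the UTF-8 range.
sample = chr(0x1F600)
print(within_limit(sample, DRAFT_LIMIT), within_limit(sample, UTF8_LIMIT))
```

Under the bound as written, any character above U+10FFF would be excluded, even though UTF-8 can encode it without difficulty.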

(*) The wording of the comment about characters in the private use area strikes me as odd.  Part of that probably arises from the failure throughout that section to distinguish characters from code points and Unicode from UTF-8.  I would suggest restating the private use comment so: "Unicode reserves code points U+E000 through U+F8FF for private use.  IUCr and only IUCr can specify which characters are mapped to these code points in the context of CIF."
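The code point versus encoding distinction can be illustrated with a short Python sketch. The function name is hypothetical; the range is the one given in the suggested wording above, and the cross-check uses the standard library's Unicode general category ("Co" is private use).

```python
import unicodedata

# Private use range in the Basic Multilingual Plane, as code points.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def is_bmp_private_use(code_point):
    """Return True if code_point lies in U+E000 through U+F8FF."""
    return PUA_FIRST <= code_point <= PUA_LAST


cp = 0xE000
# The code point is an abstract number; Unicode classifies it as private use.
print(is_bmp_private_use(cp), unicodedata.category(chr(cp)))
# Its UTF-8 form is a separate matter: a sequence of bytes encoding that number.
print(chr(cp).encode("utf-8"))
```

The range check operates on code points and is the same whatever encoding the file uses; only the last line has anything to do with UTF-8.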

(*) With regard to the "Magic code" identifying CIF2-compliant files, the specification text is somewhat imprecise and slightly out of sync with the analogous item in the CIF 1.1 spec.  CIF 1.1 specifies that the version ID code must be followed by CIF whitespace and must be the first 11 bytes of the file (a count that appears to include the required whitespace character, thus in fact giving four distinct codes, one for each permitted whitespace character).  The CIF 2 statement does not explicitly require the version ID to appear at the very beginning of the file (other content could precede it on the first line) and does not require it to be followed by whitespace.
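A rough Python sketch of the stricter, CIF 1.1-style check follows. It rests on assumptions not stated in this message: that the version ID has the form "#\#CIF_" followed by the version number (10 bytes for "1.1"), and that CIF whitespace means space, tab, carriage return or line feed. The function names are hypothetical.

```python
# Assumed set of CIF whitespace bytes: space, tab, CR, LF.
CIF_WHITESPACE = b" \t\r\n"


def magic_code(version):
    """Build the assumed version ID bytes, e.g. #\\#CIF_1.1 for '1.1'."""
    return b"#\\#CIF_" + version.encode("ascii")


def starts_with_magic(data, version):
    """True only if data begins with the version ID followed by whitespace."""
    code = magic_code(version)
    return (
        data.startswith(code)
        and len(data) > len(code)
        and data[len(code)] in CIF_WHITESPACE
    )


# The four distinct 11-byte prefixes implied by the CIF 1.1 wording.
prefixes = [magic_code("1.1") + bytes([w]) for w in CIF_WHITESPACE]
print(len(prefixes), {len(p) for p in prefixes})

print(starts_with_magic(b"#\\#CIF_2.0\ndata_example\n", "2.0"))
print(starts_with_magic(b"  #\\#CIF_2.0\n", "2.0"))
```

The second call fails because other content precedes the code on the first line, which is the case the CIF 2 wording does not explicitly rule out.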

(*) I spotted a couple more places where the new spec seems to describe the behavior of a parser / lexer for CIF instead of describing CIF itself.  Specifically, in the Change 5 section, the descriptions for the three different types of delimited strings all contain comments about the behavior of a lexer for CIF.  It is much preferable to describe the language itself directly rather than to characterize it in terms of how it is processed.  I guess the point here is to expressly relieve general-purpose CIF parsers of responsibility to process the semantic markup codes used in conjunction with (but not formally a part of) CIF 1.1.  Perhaps something along the lines of: "CIF assigns no particular meaning to characters or character patterns within the quoted string.  Any such interpretations are an application responsibility."

That's it for now.



John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer:  www.stjude.org/emaildisclaimer

cif-developers mailing list
