RE: Draft CIF2 standard available
- Subject: RE: Draft CIF2 standard available
- From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
- Date: Thu, 8 Apr 2010 10:35:16 -0500
- Accept-Language: en-US
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA54165844C2AA@SJMEMXMBS11.stjude.sjcrh.local>
- References: <8F77913624F7524AACD2A92EAF3BFA54165844C2A7@SJMEMXMBS11.stjude.sjcrh.local><8F77913624F7524AACD2A92EAF3BFA54165844C2A9@SJMEMXMBS11.stjude.sjcrh.local><8F77913624F7524AACD2A92EAF3BFA54165844C2AA@SJMEMXMBS11.stjude.sjcrh.local>
Hi All,

A few more comments about CIF2 (I hope you're not tiring of them!):

(*) Change 1 / Change 2: I think it would be wise to specify that if a Unicode byte order mark (U+FEFF) appears (UTF-8 encoded) at the beginning of a CIF2 file then it is not considered part of the CIF content. Some text editors will insert these automatically (even though they are not required if the text is UTF-8 encoded), and that practice is permitted by Unicode even though it is not recommended. This particularly impacts parsers that attempt to defer UTF-8 decoding until after lexical analysis, or that make it an application responsibility. It also may affect recognition of a CIF2 file by its initial magic comment.

(*) Paragraph 42 of the CIF 1.1 syntax spec permits CIF processors to normalize line break sequences, including within data values, in the same way that XML 1.0 processors are required to do. XML 1.1 extends the list of line termination sequences that an XML 1.1 processor must normalize (http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-line-ends). The draft CIF2 spec expressly forbids the additional normalizations of XML 1.1 from being used in CIF2 in any syntactically significant way. I find that CIF2 limitation unfortunate, but I am not much interested in debating the point. However, may CIF2 processors at least be permitted to perform the expanded set of normalizations on data values?

(*) In a previous comment, I claimed that the greatest currently-assigned Unicode code point was less than 10FFF(hex). This is incorrect, hence I now assert that U+10FFF as the upper limit of accepted CIF2 characters is either a typo or a mistake.

And I think that's it. I'm not planning to perform any further analysis of the current CIF2 draft, unless in conjunction with discussion of one of the points I have raised.

Best Regards,

John

--
John C. Bollinger, Ph.D.
Computing and X-Ray Scientist
Department of Structural Biology
St. Jude Children's Research Hospital
John.Bollinger@StJude.org
www.stjude.org

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers
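For illustration only, the first two points in the message (ignoring a leading UTF-8 byte order mark, and the XML 1.1 line-end normalizations) might be sketched in Python as below. This is a minimal sketch, not part of any CIF2 draft: the function names are hypothetical, and the magic comment `#\#CIF_2.0` is an assumption about what a CIF2 file begins with. The line-end list follows the XML 1.1 section cited in the message.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"        # U+FEFF encoded as UTF-8
ASSUMED_MAGIC = b"#\\#CIF_2.0"    # assumption: the CIF2 magic comment


def strip_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark so it is not treated as content."""
    return raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw


def looks_like_cif2(raw: bytes, magic: bytes = ASSUMED_MAGIC) -> bool:
    """Check for the magic comment, tolerating a BOM in front of it."""
    return strip_bom(raw).startswith(magic)


# XML 1.1 line ends: CR LF, CR NEL, a lone NEL, LINE SEPARATOR (U+2028),
# and any CR not followed by LF or NEL. Two-character forms are tried first.
_XML11_LINE_ENDS = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def normalize_line_ends(text: str) -> str:
    """Map every XML 1.1 line-end sequence in decoded text to a single LF."""
    return _XML11_LINE_ENDS.sub("\n", text)
```

A parser that defers UTF-8 decoding could call `strip_bom` on the raw bytes before lexical analysis, and apply `normalize_line_ends` only to decoded data values if the specification were to allow it.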