Re: [Cif2-encoding] Drafting issues
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Drafting issues
- From: "Bollinger, John C" <John.Bollinger@xxxxxxxxxx>
- Date: Fri, 1 Oct 2010 09:10:08 -0500
- Accept-Language: en-US
- In-Reply-To: <AANLkTi=3wpjym1rbxDWWx88ZTbWejVum8HVM5_p_x3eC@mail.gmail.com>
- References: <AANLkTi=tMbDbk8s9hvGBKY_NU8tCpmfH53QFLgwxH-=G@mail.gmail.com><AANLkTinRpwzqrytSFB8Xj++xohnr6Je19a5VgRscd5TD@mail.gmail.com><AANLkTimSRURYNSv+Eu+_CsPUq4kVdbDX9LO6+1mk3fA=@mail.gmail.com><alpine.BSF.2.00.1010010733310.70666@epsilon.pair.com><AANLkTi=3wpjym1rbxDWWx88ZTbWejVum8HVM5_p_x3eC@mail.gmail.com>
On Friday, October 01, 2010 8:00 AM, James Hester wrote:

>Herbert, you have proposed an entirely reasonable rewriting of what I
>proposed with an entirely reasonable justification. I'm happy to
>accept your new wording.

I, too, think it an improvement. I believe that gives us the following text in Change 2, preceding the character set enumeration (my formatting):

====
CIF2 files are standard variable length plain text files, which for compatibility with older processing systems will have a maximum line length of 2048 characters. As discussed above and below, however, there are some restrictions on the character set for token delimiters, separators and data names.

For compatibility with CIF1 behaviour, there is no formal restriction on the encoding of CIF2 files providing they contain only code points from the ASCII range. If a CIF2 file contains characters equivalent to Unicode code points greater than U+007F (127 decimal), then the particular encoding used must either be UTF8 or algorithmically identifiable from the CIF2 file itself. Note that UTF16 with a BOM conforms to this requirement. The use of a BOM for Unicode encodings, including UTF8, is recommended. Acceptable identification algorithms will be published as necessary as annexes to this standard (see description of magic code and encoding disambiguation in Change 1).

A CIF2 file containing characters outside the ASCII range with no BOM and no disambiguation signature will be a UTF8 file. A CIF2 file containing characters outside the ASCII range with a valid UTF8 or UTF16 BOM and no disambiguation signature, will be a Unicode file written in the indicated encoding.
====

I think with that we have reached an acceptable position.
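For concreteness, the identification rules in the last paragraph of the draft text might be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the proposal: the function name identify_cif2_encoding is hypothetical, the disambiguation signature from Change 1 is not handled because it is not defined in this message, reporting pure-ASCII input as "ascii" is a choice made here (the draft places no restriction on such files), and only the UTF8 and UTF16 BOMs named in the draft are checked.

```python
import codecs


def identify_cif2_encoding(data: bytes) -> str:
    """Return a Python codec name for raw CIF2 bytes (hypothetical helper).

    Assumes no disambiguation signature is present; that mechanism is
    described in Change 1 and is not modelled here.
    """
    # A valid BOM indicates the Unicode encoding directly.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the BOM on decode
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads byte order from the BOM
    # Only ASCII bytes: the draft imposes no formal restriction, so
    # labelling the file "ascii" is an assumption of this sketch.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Bytes outside the ASCII range, no BOM, no signature: UTF8 by default.
    return "utf-8"


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + "data_example\n_note 'Å'\n".encode("utf-8")
    codec = identify_cif2_encoding(sample)
    print(codec, repr(sample.decode(codec)))
```

A file that falls through to the final case but is not valid UTF8 would fail to decode; under the draft wording such a file would simply not be a conforming CIF2 file.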
I do propose three editorial changes, however, that I intend to clarify the wording without changing its meaning in any way:

1) I suggest that Herb's new text (the last two sentences above) be made the first annex, as it in fact constitutes the first acceptable identification algorithm that is defined. Alternatively, let us slightly reword the preceding text to clarify that the last sentences describe one acceptable algorithm among potentially several.

2) I furthermore suggest that the sentence "Note that UTF16 with a BOM conforms to this requirement" be deleted, for that is redundant as a consequence of Herb's wording.

3) Finally, I recommend moving the sentence "The use of a BOM for Unicode encodings, including UTF8, is recommended" to the end of that passage, so as to place the comments about acceptable identification algorithms immediately after the requirement that some encodings be "algorithmically identifiable". This will form a clearer logical progression.

I hope these changes will be adopted, but my acceptance of the proposal is not conditioned on that.

>The worst is behind us, and we are currently mopping up. After making
>it through the mountain pass, surely you didn't expect to just fall
>off a cliff to the meadows below? Perhaps that should be a haiku:
>
>Crunching through a snowlit pass
>A distant eagle floats above the sunny meadows
>Ah! The roads of the air.

The goal before us,
our travail yields a bounty.
My spirit aloft.

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding