Re: [ddlm-group] [SPAM] ASSP UTF-8 BOM
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] UTF-8 BOM
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Mon, 14 Jun 2010 16:58:25 -0500
- Accept-Language: en-US
- In-Reply-To: <AANLkTimOLbOkIqCwqgsKJ36eVctlZccsAN4XAjYDr4Qd@mail.gmail.com>
- References: <8F77913624F7524AACD2A92EAF3BFA54165DF337D5@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1005111250250.60002@epsilon.pair.com><4BEB2CE6.3060900@niehs.nih.gov><8F77913624F7524AACD2A92EAF3BFA54165DF337DB@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1005131228500.12350@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54165DF337DD@SJMEMXMBS11.stjude.sjcrh.local><AANLkTimlen0jl2p5SsvvizSNN37HZmMs2XOCc0KW7RMG@mail.gmail.com><alpine.BSF.2.00.1005180700530.27091@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54165DF337E1@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1005181330210.38662@epsilon.pair.com><AANLkTimOLbOkIqCwqgsKJ36eVctlZccsAN4XAjYDr4Qd@mail.gmail.com>
Dear Colleagues,

Brian got me thinking about this again:

On Monday, May 24, 2010 1:27 AM, James Hester wrote:
>To run through the alternatives and some of the arguments so far:
>
>(i) treating an embedded BOM as an ordinary character runs against the
>Unicode recommendations. If we wish our standard to be respected, I think
>we should at least respect other standards and the thinking that has gone
>into them
>
>(ii) treating an embedded BOM as whitespace is OK with the Unicode
>standard, but means that a non-ASCII character now has syntactic meaning
>in the CIF. I think this would be completely inconsistent on our part,
>as an invisible character (when displayed) can actually be used to
>delimit strings. This is my least preferred solution, as it goes
>against the human-readability expected of CIFs.
>
>(iii) ignoring embedded BOMs is bad because they can be a 'tip off to a serious problem'.
>
>(iv) treating embedded BOMs as syntax errors will cause issues when CIF2 files are naively concatenated
>
>I think the only viable alternatives are to choose (iii) or (iv).

I initially passed over it, but I now think the argument against (i) is flawed. Unicode recommends that embedded U+FEFF, if allowed, be treated as a zero-width non-breaking space (which is its original documented function). One might equivalently say that it should be treated the same as U+2060, its designated replacement for that role. But as far as CIF is concerned, U+2060 has no special significance whatever; it is as ordinary as ordinary can be. Treating U+FEFF as an ordinary character (i.e. one having no special significance to CIF) is therefore perfectly consistent with Unicode recommendations.

As I have already written, I am strongly opposed to both (iii) and (iv) if they apply to U+FEFF appearing in data values. Inasmuch as it could be ambiguous whether some appearances of U+FEFF are in data values, I don't think either of these options is a good choice.
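To make the four alternatives concrete, here is a minimal Python sketch of how a tokenizer might treat embedded U+FEFF under each one. It is illustrative only and not part of any CIF2 specification: the names BomPolicy and split_tokens are hypothetical, the tokenizer is simplified to whitespace splitting (no quoted strings, text fields, or comments), and it assumes a single BOM at the very start of the decoded text is an encoding signature that is simply discarded.

```python
from __future__ import annotations

import re
from enum import Enum

BOM = "\ufeff"  # U+FEFF: byte-order mark / zero-width no-break space
# Simplified assumption: only space, tab, CR and LF count as CIF whitespace.
CIF_WHITESPACE = re.compile(r"[ \t\r\n]+")


class BomPolicy(Enum):
    ORDINARY = "i"     # (i)   embedded U+FEFF is an ordinary character, like U+2060
    WHITESPACE = "ii"  # (ii)  embedded U+FEFF delimits tokens like a space
    IGNORE = "iii"     # (iii) embedded U+FEFF is silently dropped
    ERROR = "iv"       # (iv)  embedded U+FEFF is a syntax error


def split_tokens(text: str, policy: BomPolicy) -> list[str]:
    """Split decoded CIF-like text into whitespace-delimited tokens.

    Hypothetical and simplified: no quoting, text fields, or comments.
    """
    # Assumption: a leading BOM is only an encoding signature, so drop it.
    if text.startswith(BOM):
        text = text[1:]
    if BOM in text:
        if policy is BomPolicy.ERROR:
            # Offset is counted after any leading BOM has been removed.
            raise ValueError(f"embedded U+FEFF at offset {text.index(BOM)}")
        if policy is BomPolicy.IGNORE:
            text = text.replace(BOM, "")
        elif policy is BomPolicy.WHITESPACE:
            text = text.replace(BOM, " ")
        # BomPolicy.ORDINARY: leave U+FEFF inside whatever token contains it.
    return [tok for tok in CIF_WHITESPACE.split(text) if tok]
```

For a bare value "ab\ufeffcd", policy (i) yields the single token "ab\ufeffcd", (ii) splits it into "ab" and "cd", and (iii) yields "abcd", silently altering the value; "ab\u2060cd" stays one token under every policy. For two naively concatenated files that each begin with a BOM, (ii) and (iii) recover the expected tokens, (i) leaves U+FEFF attached to the start of the second data name, and (iv) rejects the input.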
Furthermore, the argument I just rejected against (i) is in fact valid against (iii): if embedded U+FEFF is allowed, then it should be treated as a ZWNBSP (with or without any special significance to CIF), not ignored. I rather like (ii), but I would be satisfied with (i).

------

Also, is human readability, which James cites against option (ii), really a significant concern to this group? I have at least two issues in that area, but I had not planned to raise them because of the apparent hope and perception that CIF2 is largely done.

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group