Re: [ddlm-group] UTF-8 BOM
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] UTF-8 BOM
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Thu, 13 May 2010 14:03:27 -0500
- Accept-Language: en-US
- In-Reply-To: <alpine.BSF.2.00.1005131228500.12350@epsilon.pair.com>
- References: <8F77913624F7524AACD2A92EAF3BFA54165DF337D5@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1005101301340.99142@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54165DF337D9@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1005111250250.60002@epsilon.pair.com><4BEB2CE6.3060900@niehs.nih.gov><8F77913624F7524AACD2A92EAF3BFA54165DF337DB@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1005131228500.12350@epsilon.pair.com>
> People make CIF out of pieces joined by cat or editors all the
> time. We cannot tell them that they can only make CIF2s using
> a short list of applications, nor can we tell them that they
> cannot pick up material from old CIF1s.

I think we will be able to tell people that the limitations on combining CIF2 fragments are about the same as those on combining CIF1 fragments. Whatever decision is made about embedded BOMs, however, there will be additional BOM-related considerations for CIF2, because Unicode-aware text tools do not all treat BOMs the same way. On the other hand, whether we tell people or not, there is no escaping the fact that there are more limitations on combining CIF2 fragments with CIF1 fragments than on combining only CIF1 fragments, quite apart from any question of BOM handling. That was one of the costs of abandoning 100% backwards compatibility.

> In most cases, if we
> treat the BOMs reasonably, the concatenated CIFs will make sense,
> and probably the sense that the user intended.

It is true that most users, for most purposes, will be able to ignore CIF syntax versions and proceed largely as they are accustomed to. Others will be able to adjust by making one-time changes to a few boilerplate CIF fragments. But even with no BOMs, blind concatenation of a well-formed CIF1 file with a well-formed CIF2 file is not certain to produce a CIF compliant with either specification, and whether it does can depend on the order in which the component files are concatenated. Similarly, with CIF1 in use alongside CIF2, there will be more cases where cutting and pasting fragments from one well-formed CIF into another results in an ill-formed CIF, again without any consideration of BOMs. Indeed, we face the worst possible case: the same kinds of things that users have done before are likely to continue to work most of the time, but they will fail some of the time.
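A minimal Python sketch of how an embedded BOM arises when CIF2 files are joined with `cat`. It assumes two hypothetical CIF2 fragments, each saved by an editor that writes a leading UTF-8 BOM, joined byte for byte. Python's `utf-8-sig` codec strips a BOM only at the very start of its input, so the second fragment's BOM survives as an embedded U+FEFF. The helper functions are illustrative only and do not come from any CIF library. They report where the BOM lands instead of silently discarding it.

```python
# Hypothetical fragments: each starts with a BOM, as some editors write it.
BOM = "\ufeff"

frag_a = (BOM + "#\\#CIF_2.0\ndata_a\n_cell.length_a 5.0\n").encode("utf-8")
frag_b = (BOM + "#\\#CIF_2.0\ndata_b\n_cell.length_a 6.0\n").encode("utf-8")

# Byte-level join, equivalent to `cat a.cif b.cif > joined.cif`.
joined = frag_a + frag_b

# "utf-8-sig" removes only a leading BOM; the second one stays in the text.
text = joined.decode("utf-8-sig")


def embedded_bom_offsets(s: str) -> list[int]:
    """Return the character offset of every U+FEFF left in the decoded text."""
    return [i for i, ch in enumerate(s) if ch == BOM]


def line_and_column(s: str, offset: int) -> tuple[int, int]:
    """Convert a character offset into a 1-based (line, column) pair."""
    line = s.count("\n", 0, offset) + 1
    column = offset - (s.rfind("\n", 0, offset) + 1) + 1
    return line, column


for off in embedded_bom_offsets(text):
    line, col = line_and_column(text, off)
    print(f"embedded BOM at line {line}, column {col}")
```

Run as is, it prints `embedded BOM at line 4, column 1`. That is exactly the position where a CIF2 parser must decide whether the character is whitespace, a printing character, or an error.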
That means that errors are more likely to creep into CIFs, and bugs are more likely to appear in software, than if CIF2 made a clean break with CIF1 or if CIF2 maintained full backwards compatibility. I daresay neither of those alternatives is attractive to this group, especially at this point, so we have what we have: some things people are used to doing with CIFs are no longer reliably safe, whether anybody likes it or not.

> I see no immediate harm in treating an embedded BOM as
> whitespace, but also no specific need to do so. The main thing
> is not to treat it as a printing character and not to completely
> ignore it; it can be a tip-off to a serious problem.

In other words, almost anything other than what's currently in the spec. I'm OK with treating it as a printing character (à la the current spec), though that is my least preferred alternative. Doing so is probably the worst choice for compatibility with the kinds of manipulations we're discussing, however.

If you don't treat an embedded BOM as a printing character or as whitespace, and you don't ignore it (which I agree we should not do), then does that leave any alternative other than treating it as an error?

Cheers,

John

> Regards,
> Herbert
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>  Dowling College, Kramer Science Center, KSC 121
>  Idle Hour Blvd, Oakdale, NY, 11769
>
>  +1-631-244-3035
>  yaya@dowling.edu
> =====================================================

Email Disclaimer: www.stjude.org/emaildisclaimer

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- Follow-Ups:
- Re: [ddlm-group] UTF-8 BOM (James Hester)
- References:
- [ddlm-group] UTF-8 BOM (Bollinger, John C)
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)
- Re: [ddlm-group] UTF-8 BOM (Bollinger, John C)
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)
- Re: [ddlm-group] UTF-8 BOM (Joe Krahn)
- Re: [ddlm-group] UTF-8 BOM (Bollinger, John C)
- Re: [ddlm-group] UTF-8 BOM (Herbert J. Bernstein)