[ddlm-group] Recommended character set and use restrictions
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: [ddlm-group] Recommended character set and use restrictions
- From: "Bollinger, John C" <John.Bollinger@STJUDE.ORG>
- Date: Fri, 18 Jun 2010 16:05:50 -0500
- Accept-Language: en-US
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA541661229518@SJMEMXMBS11.stjude.sjcrh.local>
- References: <AANLkTikPRP0zLmeWCde-UjR599qJBDP4ps8mpT2FB07E@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local><84803.69690.qm@web87001.mail.ird.yahoo.com><8F77913624F7524AACD2A92EAF3BFA541661229518@SJMEMXMBS11.stjude.sjcrh.local>
Hello All,

The current spec excludes most ASCII control characters as well as code points U+FFFE and U+FFFF from the CIF character set, apparently following XML. I think it would be wise to exclude also the "C1 Controls" and all the other permanent non-characters (which is also XML's recommendation, if that gives it extra authority). Excluding the C1 control characters should be justified by the same logic that justifies excluding most of the ASCII controls. Excluding the non-characters is appropriate because Unicode formally specifies that they have no meaning and never will have. The additional excluded characters would be U+007F – U+009F (except possibly allowing U+0085 "next line"), U+FDD0 – U+FDEF, and all code points of the form U+xFFFE or U+xFFFF for x = any hex digit or 10.

Furthermore, I suggest that all non-printing Unicode characters be forbidden from use except in quoted data values, where "non-printing Unicode characters" includes all in Unicode general categories Cc, Cf, Zl, Zp, and Zs. Some of these are intended to affect the formatting of characters near them, and some are spaces of various lengths and characteristics, but many of them have no visual representation at all. They do not present a problem from an automated processing perspective, but they could cause a great deal of confusion for humans. (For what it's worth, U+FEFF is in category Cf.)

Regards,

John

--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
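The two rules proposed in the message above could be checked roughly as follows. This is only a sketch of one reading of the proposal, not anything taken from the CIF specification: the function names `is_additionally_excluded` and `is_non_printing` and the `allow_nel` flag are hypothetical. Note also that the ordinary space U+0020 is in category Zs and tab is in Cc, so a real rule would presumably have to treat the whitespace the spec already permits separately; that handling is not attempted here.

```python
import unicodedata

# General categories the message calls "non-printing".
NON_PRINTING_CATEGORIES = {"Cc", "Cf", "Zl", "Zp", "Zs"}


def is_additionally_excluded(ch: str, allow_nel: bool = True) -> bool:
    """True if ch falls in the ranges the message proposes to add to the
    excluded set. U+FFFE and U+FFFF are not reported here because the
    message says the current spec already excludes them.

    allow_nel is a hypothetical switch for the "possibly allowing U+0085"
    option mentioned in the message.
    """
    cp = ord(ch)
    # DEL plus the C1 controls: U+007F through U+009F.
    if 0x7F <= cp <= 0x9F:
        return not (allow_nel and cp == 0x85)
    # Non-characters in the Arabic Presentation Forms-A block.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # Last two code points of planes 1 through 16 (U+1FFFE ... U+10FFFF).
    if cp > 0xFFFF and (cp & 0xFFFE) == 0xFFFE:
        return True
    return False


def is_non_printing(ch: str) -> bool:
    """True if ch is in one of the categories Cc, Cf, Zl, Zp, Zs."""
    return unicodedata.category(ch) in NON_PRINTING_CATEGORIES
```

Under this reading, `is_non_printing("\ufeff")` is true, which matches the remark that U+FEFF is in category Cf.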
- Follow-Ups:
- Re: [ddlm-group] Recommended character set and use restrictions (Herbert J. Bernstein)
- References:
- [ddlm-group] Vote on BOM (James Hester)
- Re: [ddlm-group] Vote on BOM (Bollinger, John C)
- Re: [ddlm-group] Vote on BOM (SIMON WESTRIP)
- [ddlm-group] Character set for data block and save frame codes (Bollinger, John C)