Re: [ddlm-group] Recommended character set and use restrictions
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Recommended character set and use restrictions
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Fri, 18 Jun 2010 18:07:44 -0400
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA541661229519@SJMEMXMBS11.stjude.sjcrh.local>
- References: <AANLkTikPRP0zLmeWCde-UjR599qJBDP4ps8mpT2FB07E@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local> <84803.69690.qm@web87001.mail.ird.yahoo.com><8F77913624F7524AACD2A92EAF3BFA541661229518@SJMEMXMBS11.stjude.sjcrh.local><8F77913624F7524AACD2A92EAF3BFA541661229519@SJMEMXMBS11.stjude.sjcrh.local>
The current specification is stated in terms of what is included, rather than what is excluded:

   #x9  #xA  #xD  #x20 - #xD7FF  #xE000 - #xFFFD  #x10000 - #x10FFFF

The characters #xE000 - #xF8FF are reserved for private use, and the IUCr can specify what these characters must be.

I think John is proposing that the included set become:

   #x9  #xA  #xD  #x20 - #x7E  #xA0 - #xD7FF  #xE000 - #xFDCF  #xFDF0 - #xFFFD  #x10000 - #x10FFFD

where the last range also omits #xnFFFE and #xnFFFF in each supplementary plane.

In addition, he is proposing to exclude the general class of "non-printing Unicode characters" from tag names. He is right that many of these would not be missed, but some of them are essential to the correct rendering of certain languages (e.g. Arabic, in which the joining or non-joining of adjacent characters, controlled by U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-JOINER, is essential to proper rendering). I would suggest a careful review of the non-printing characters, one by one, before making a firm decision on which ones to exclude from tag names. Still, the general idea that a tag should be made of characters that either print or make a clear change in the way the characters print seems sensible to me.

-- Herbert

At 4:05 PM -0500 6/18/10, Bollinger, John C wrote:
>Hello All,
>
>The current spec excludes most ASCII control characters as well as
>code points U+FFFE and U+FFFF from the CIF character set, apparently
>following XML. I think it would be wise to exclude also the "C1
>Controls" and all the other permanent non-characters (which is also
>XML's recommendation, if that gives it extra authority). Excluding
>the C1 control characters should be justified by the same logic that
>justifies excluding most of the ASCII controls. Excluding the
>non-characters is appropriate because Unicode formally specifies
>that they have no meaning and never will have. The additional
>excluded characters would be U+007F - U+009F (except possibly
>allowing U+0085 "next line"), U+FDD0 - U+FDEF, and all code points
>of the form U+xFFFE or U+xFFFF for x = any hex digit or 10.
>
>Furthermore, I suggest that all non-printing Unicode characters be
>forbidden from use except in quoted data values, where "non-printing
>Unicode characters" includes all in Unicode general categories Cc,
>Cf, Zl, Zp, and Zs. Some of these are intended to affect the
>formatting of characters near them, and some are spaces of various
>lengths and characteristics, but many of them have no visual
>representation at all. They do not present a problem from an
>automated processing perspective, but they could cause a great deal
>of confusion for humans. (For what it's worth, U+FEFF is in
>category Cf.)
>
>
>Regards,
>
>John
>--
>John C. Bollinger, Ph.D.
>Department of Structural Biology
>St. Jude Children's Research Hospital
>
>
>
>Email Disclaimer: www.stjude.org/emaildisclaimer
>_______________________________________________
>ddlm-group mailing list
>ddlm-group@iucr.org
>http://scripts.iucr.org/mailman/listinfo/ddlm-group

-- 
=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================
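The two character sets discussed in this thread can be checked mechanically. The following Python sketch is an illustration only: the function names (`in_current_set`, `is_noncharacter`, `in_proposed_set`, `first_disallowed`) are hypothetical and not taken from any CIF or DDLm tool. It encodes the current set as the XML 1.0 `Char` ranges and the proposed set as that set minus the C1 controls (U+007F - U+009F) and the Unicode noncharacters. Excluding U+0085 is an assumption, since John's proposal leaves it open.

```python
# Hypothetical helpers illustrating the current and proposed CIF character
# sets from this thread; names are illustrative, not an official API.

# Current set: identical to the XML 1.0 "Char" production.
CURRENT_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def in_current_set(cp: int) -> bool:
    """True if code point cp is in the current CIF character set."""
    return any(lo <= cp <= hi for lo, hi in CURRENT_RANGES)


def is_noncharacter(cp: int) -> bool:
    """True for Unicode's permanent noncharacters: U+FDD0..U+FDEF plus the
    last two code points (xxFFFE, xxFFFF) of each of the 17 planes."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def in_proposed_set(cp: int) -> bool:
    """Current set minus C1 controls and noncharacters.
    Assumption: U+0085 (NEXT LINE) is excluded along with the other C1 controls."""
    if 0x7F <= cp <= 0x9F:
        return False
    return in_current_set(cp) and not is_noncharacter(cp)


def first_disallowed(text: str, allowed=in_proposed_set):
    """Return (index, code point) of the first disallowed character, or None."""
    for i, ch in enumerate(text):
        if not allowed(ord(ch)):
            return i, ord(ch)
    return None
```

For example, `first_disallowed("abc\u0085")` reports the NEXT LINE character at index 3, while `first_disallowed("abc\u0085", in_current_set)` returns `None`. The bitmask test in `is_noncharacter` catches the final two code points of every plane, which is why the proposed set cannot be written as one contiguous range above #x10000.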
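John's tag-name rule, which forbids characters in general categories Cc, Cf, Zl, Zp, and Zs outside quoted data values, can likewise be sketched with Python's standard `unicodedata.category`. This is a hedged illustration: `nonprinting_in_tag` is a hypothetical name, and the `KEEP_IN_TAGS` exemption for the two joiners is only a placeholder for the one-by-one review Herbert suggests, not an agreed list.

```python
import unicodedata

# General categories John proposes to forbid outside quoted data values.
NONPRINTING = {"Cc", "Cf", "Zl", "Zp", "Zs"}

# Hypothetical exemption reflecting Herbert's caveat about scripts such as
# Arabic; NOT an agreed list. U+200C ZWNJ and U+200D ZWJ are category Cf.
KEEP_IN_TAGS = {"\u200c", "\u200d"}


def nonprinting_in_tag(tag: str, keep=KEEP_IN_TAGS):
    """List (index, code point, category) for each non-printing character
    in a tag name that is not explicitly exempted by `keep`."""
    found = []
    for i, ch in enumerate(tag):
        cat = unicodedata.category(ch)
        if cat in NONPRINTING and ch not in keep:
            found.append((i, f"U+{ord(ch):04X}", cat))
    return found
```

With this sketch, a stray byte order mark inside a tag, as in `nonprinting_in_tag("_cell\ufeff.length_a")`, is reported as `(5, "U+FEFF", "Cf")`, while a tag containing U+200D passes unless `keep` is emptied. Keeping the exemption list separate from the category set makes it easy to adjust after the character-by-character review.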