
[ddlm-group] Recommended character set and use restrictions

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: [ddlm-group] Recommended character set and use restrictions
From: "Bollinger, John C" <[email protected]>
Date: Fri, 18 Jun 2010 16:05:50 -0500
Accept-Language: en-US
In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA541661229518@SJMEMXMBS11.stjude.sjcrh.local>
References: <[email protected]> <8F77913624F7524AACD2A92EAF3BFA541661229515@SJMEMXMBS11.stjude.sjcrh.local> <[email protected]> <8F77913624F7524AACD2A92EAF3BFA541661229518@SJMEMXMBS11.stjude.sjcrh.local>

Hello All,

The current spec excludes most ASCII control characters, as well as code points U+FFFE and U+FFFF, from the CIF character set, apparently following XML. I think it would be wise also to exclude the "C1 Controls" and all the other permanent non-characters (which is also XML's recommendation, if that gives it extra authority). Excluding the C1 control characters can be justified by the same logic that justifies excluding most of the ASCII controls. Excluding the non-characters is appropriate because Unicode formally specifies that they have no meaning and never will. The additional excluded characters would be U+007F (DEL) through U+009F (except possibly U+0085 "next line"), U+FDD0 through U+FDEF, and all code points of the form U+xFFFE or U+xFFFF, where x is any hex digit or 10 (that is, the last two code points of each of the 17 Unicode planes).
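For concreteness, the proposed additional exclusions can be expressed as a single predicate on code points. The following is a minimal Python sketch, not part of any CIF specification: the function name `is_proposed_exclusion` and the `allow_nel` flag (which models the "possibly allowing U+0085" option) are hypothetical. It also matches U+FFFE and U+FFFF, which the current spec already excludes, since those are simply the plane-0 case of U+xFFFE/U+xFFFF.

```python
def is_proposed_exclusion(cp: int, allow_nel: bool = True) -> bool:
    """Return True if code point cp is in the proposed additional exclusions.

    Hypothetical helper for illustration only; allow_nel controls whether
    U+0085 (NEXT LINE) is exempted from the DEL/C1 range.
    """
    # DEL plus the C1 controls: U+007F through U+009F, optionally keeping NEL.
    if 0x7F <= cp <= 0x9F:
        return not (allow_nel and cp == 0x85)
    # The contiguous block of noncharacters U+FDD0 through U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane 0x0..0x10: U+xFFFE and U+xFFFF.
    if cp <= 0x10FFFF and (cp & 0xFFFF) >= 0xFFFE:
        return True
    return False
```

Counting over the whole code space with NEL allowed gives 32 code points from the DEL/C1 range, 32 from U+FDD0 through U+FDEF, and 34 plane-final code points, 98 in all.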

Furthermore, I suggest that all non-printing Unicode characters be forbidden except in quoted data values, where "non-printing Unicode characters" means all characters in Unicode general categories Cc, Cf, Zl, Zp, and Zs other than the ASCII space, tab, and line-terminator characters that CIF already uses as whitespace. Some of these characters are intended to affect the formatting of nearby characters, and some are spaces of various widths and properties, but many have no visual representation at all. They pose no problem for automated processing, but they could cause a great deal of confusion for human readers. (For what it's worth, U+FEFF is in category Cf.)
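A checker for this second proposal needs only the Unicode general category of each character, which Python's standard `unicodedata.category` reports. The sketch below is illustrative only: `non_printing_positions` is a hypothetical name, it scans a whole string without trying to recognize quoted data values (that would need a real CIF tokenizer), and its exemption of the ASCII space, tab, CR, and LF is an assumption made so that ordinary CIF whitespace is not flagged.

```python
import unicodedata

# General categories treated as "non-printing" in the proposal above.
NON_PRINTING_CATEGORIES = {"Cc", "Cf", "Zl", "Zp", "Zs"}

# Assumption of this sketch: ordinary CIF whitespace is never flagged.
CIF_WHITESPACE = {" ", "\t", "\r", "\n"}


def non_printing_positions(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX' label, category) for each non-printing character.

    Hypothetical helper; it does not exempt quoted data values.
    """
    hits = []
    for i, ch in enumerate(text):
        if ch in CIF_WHITESPACE:
            continue
        category = unicodedata.category(ch)
        if category in NON_PRINTING_CATEGORIES:
            hits.append((i, f"U+{ord(ch):04X}", category))
    return hits
```

For example, `non_printing_positions("_cell.length_a\u00a010.5\ufeff")` reports the no-break space (Zs) at index 14 and U+FEFF (Cf) at index 19, neither of which a human reader would see.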

Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

Email Disclaimer: www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group


Follow-Ups:

Re: [ddlm-group] Recommended character set and use restrictions (Herbert J. Bernstein)

References:

[ddlm-group] Vote on BOM (James Hester)

Re: [ddlm-group] Vote on BOM (Bollinger, John C)

Re: [ddlm-group] Vote on BOM (SIMON WESTRIP)

[ddlm-group] Character set for data block and save frame codes (Bollinger, John C)
