Discussion List Archives


[ddlm-group] Recommended character set and use restrictions

Hello All,

The current spec excludes most ASCII control characters as well as code points U+FFFE and U+FFFF from the CIF character set, apparently following XML.  I think it would be wise to exclude also the "C1 Controls" and all the other permanent noncharacters (which is also XML's recommendation, if that gives it extra authority).  Excluding the C1 control characters is justified by the same logic that justifies excluding most of the ASCII controls.  Excluding the noncharacters is appropriate because Unicode formally specifies that they have no meaning and never will have.  The additional excluded characters would be U+007F – U+009F (DEL plus the C1 controls, except possibly allowing U+0085 "next line"), U+FDD0 – U+FDEF, and all code points of the form U+xFFFE or U+xFFFF for x = any hex digit or 10 (that is, the last two code points of every plane).
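
To make the proposed ranges concrete, here is a minimal Python sketch of a membership test.  The function name is_proposed_excluded and the allow_nel switch are hypothetical, introduced only for illustration; nothing like this appears in the spec.  Note that the last branch also matches U+FFFE and U+FFFF, which the current spec already excludes.

```python
NEL = 0x85  # U+0085 "next line", which the proposal might still allow


def is_proposed_excluded(cp: int, allow_nel: bool = True) -> bool:
    """Return True if code point cp falls in one of the ranges proposed above.

    Hypothetical helper for illustration only, not part of any CIF spec.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp!r}")
    # U+007F (DEL) and the C1 controls U+0080..U+009F
    if 0x7F <= cp <= 0x9F:
        return not (allow_nel and cp == NEL)
    # The contiguous block of 32 noncharacters in the BMP
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+xFFFE and U+xFFFF: the last two code points of every plane
    # (plane 0 gives U+FFFE and U+FFFF, already excluded by the current spec)
    return (cp & 0xFFFF) >= 0xFFFE
```

For example, is_proposed_excluded(0x1FFFE) is true, while is_proposed_excluded(0x85) is false unless allow_nel=False is passed.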

Furthermore, I suggest that all non-printing Unicode characters be forbidden from use except in quoted data values, where "non-printing Unicode characters" means every character in Unicode general categories Cc, Cf, Zl, Zp, and Zs.  Some of these are intended to affect the formatting of characters near them, and some are spaces of various lengths and characteristics, but many of them have no visual representation at all.  They do not present a problem from an automated processing perspective, but they could cause a great deal of confusion for humans.  (For what it's worth, U+FEFF is in category Cf.)
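
As a rough illustration of how such a restriction might be checked, the following Python sketch uses the standard unicodedata module to report characters in those categories.  The names find_non_printing and ALLOWED_WHITESPACE are hypothetical.  Exempting ordinary space, tab, LF, and CR is my own assumption here, since they fall in categories Zs and Cc but are obviously needed as CIF whitespace; the sketch also makes no attempt to recognise quoted data values.

```python
import unicodedata

# General categories named in the proposal above
NON_PRINTING_CATEGORIES = frozenset({"Cc", "Cf", "Zl", "Zp", "Zs"})

# Assumption: ordinary ASCII whitespace stays legal everywhere
ALLOWED_WHITESPACE = frozenset({" ", "\t", "\n", "\r"})


def find_non_printing(text):
    """Return (index, code point label, category) for each flagged character.

    Hypothetical helper: scans the whole string and does not parse CIF
    quoting, so a caller would apply it only to text outside quoted values.
    """
    hits = []
    for index, ch in enumerate(text):
        if ch in ALLOWED_WHITESPACE:
            continue
        category = unicodedata.category(ch)
        if category in NON_PRINTING_CATEGORIES:
            hits.append((index, f"U+{ord(ch):04X}", category))
    return hits
```

A no-break space hiding between a data name and its value, as in find_non_printing("_cell.length_a\u00a05.0\n"), would be reported with its position, code point, and category.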


Regards,

John
--
John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital



Email Disclaimer:  www.stjude.org/emaildisclaimer
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

International Union of Crystallography