Re: [ddlm-group] Recommended character set and use restrictions

I agree with John's suggestions, and Herbert makes a good point that
we may be unduly restricting the languages in which tags can be
written by excluding non-printing characters.  May I suggest that the
precise list of excluded non-printing characters be given as an
addendum to the main CIF2 syntax standard, allowing us to proceed with
ratification of the main standard?

So who is volunteering to come up with a list of non-printing
characters to be allowed/excluded from tags?

On Sat, Jun 19, 2010 at 8:07 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> The current specification is in terms of what is included, rather than
> what is excluded:
>
> #x9 #xA #xD
> #x20 - #xD7FF
> #xE000 - #xFFFD
> #x10000 - #x10FFFF
> The characters #xE000-#xF8FF are reserved for private use, and the
> IUCr can specify what these characters must be.
>
> I think John is proposing that the included set become:
>
>
> #x9 #xA #xD
> #x20 - #x7E
> #xA0 - #xD7FF
> #xE000 - #xFDCF
> #xFDF0 - #xFFFD
> #x10000 - #x10FFFD
>
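The two included sets quoted above can be compared mechanically. The following is only a minimal Python sketch: the names `CURRENT_SET`, `PROPOSED_SET`, `in_set` and `newly_excluded` are hypothetical, and the upper bounds of the last range in each list (`0x10FFFF` and `0x10FFFD`) are assumptions about the intended values rather than anything confirmed in this thread.

```python
# Inclusive code point ranges transcribed from the two lists quoted above.
CURRENT_SET = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),  # assumed upper bound (matches XML's Char production)
]

PROPOSED_SET = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0x7E),
    (0xA0, 0xD7FF),
    (0xE000, 0xFDCF),
    (0xFDF0, 0xFFFD),
    (0x10000, 0x10FFFD),  # assumed upper bound
]


def in_set(code_point, ranges):
    """Return True if code_point lies in any of the inclusive ranges."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def newly_excluded(before, after, limit=0x10000):
    """List code points below `limit` that `before` allows and `after` does not."""
    return [cp for cp in range(limit)
            if in_set(cp, before) and not in_set(cp, after)]
```

Under these assumptions, `newly_excluded(CURRENT_SET, PROPOSED_SET)` lists the code points in the Basic Multilingual Plane that the proposal would remove, namely U+007F - U+009F and U+FDD0 - U+FDEF.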
> In addition, he is proposing to exclude the general class of "non-printing
> Unicode characters" from tag names. For many of these he is right
> that they would not be missed, but some of them are essential to
> correct rendering of some languages (e.g. Arabic, in which the joining
> or non-joining of characters is essential to proper rendering).  I
> would suggest
> a careful review of the non-printing characters one by one before
> making a firm decision on which ones to exclude from tag names, but
> the general idea that a tag should be made up of characters that
> either print or which make a clear change in the way in which the
> characters print seems sensible to me.
>
>   -- Herbert
>
>
>
> At 4:05 PM -0500 6/18/10, Bollinger, John C wrote:
>>Hello All,
>>
>>The current spec excludes most ASCII control characters as well as
>>code points U+FFFE and U+FFFF from the CIF character set, apparently
>>following XML.  I think it would be wise to exclude also the "C1
>>Controls" and all the other permanent non-characters (which is also
>>XML's recommendation, if that gives it extra authority).  Excluding
>>the C1 control characters should be justified by the same logic that
>>justifies excluding most of the ASCII controls.  Excluding the
>>non-characters is appropriate because Unicode formally specifies
>>that they have no meaning and never will have.  The additional
>>excluded characters would be U+007F - U+009F (except possibly
>>allowing U+0085 "next line"), U+FDD0 - U+FDEF, and all code points
>>of the form U+xFFFE or U+xFFFF for x = any hex digit or 10.
>>
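The additional exclusions described above are simple to test for. This is a rough Python sketch with hypothetical function names, not part of any CIF specification; it treats U+0085 as excluded, although the message leaves open whether it might be allowed.

```python
def is_del_or_c1_control(cp):
    """True for U+007F (DEL) and the C1 controls U+0080 - U+009F."""
    return 0x7F <= cp <= 0x9F


def is_noncharacter(cp):
    """True for the permanent Unicode noncharacters.

    These are U+FDD0 - U+FDEF plus the last two code points of every
    plane (U+xFFFE and U+xFFFF for planes 0 through 0x10).
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_additionally_excluded(cp):
    """True if the proposal would exclude this code point."""
    return is_del_or_c1_control(cp) or is_noncharacter(cp)
```

The mask test works because U+xFFFE and U+xFFFF differ only in the lowest bit, so clearing that bit and comparing the low 16 bits with `0xFFFE` catches both in every plane.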
>>Furthermore, I suggest that all non-printing Unicode characters be
>>forbidden from use except in quoted data values, where "non-printing
>>Unicode characters" includes all in Unicode general categories Cc,
>>Cf, Zl, Zp, and Zs.  Some of these are intended to affect the
>>formatting of characters near them, and some are spaces of various
>>lengths and characteristics, but many of them have no visual
>>representation at all.  They do not present a problem from an
>>automated processing perspective, but they could cause a great deal
>>of confusion for humans.  (For what it's worth, U+FEFF is in
>>category Cf.)
>>
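To see which characters of a candidate tag would fall under the suggested restriction, one could look up each character's general category. The sketch below uses Python's standard `unicodedata` module; `NON_PRINTING` and `non_printing_in` are hypothetical names, and the result depends on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

# General categories named in the suggestion above.
NON_PRINTING = {"Cc", "Cf", "Zl", "Zp", "Zs"}


def non_printing_in(tag):
    """Return (index, code point, category) for each non-printing character."""
    hits = []
    for index, ch in enumerate(tag):
        category = unicodedata.category(ch)
        if category in NON_PRINTING:
            hits.append((index, f"U+{ord(ch):04X}", category))
    return hits
```

For example, `non_printing_in("_a\u200cb")` reports the zero width non-joiner (U+200C, category Cf) at index 2. That is one of the joining controls that some scripts rely on for correct rendering, so a blanket exclusion of category Cf would affect it.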
>>
>>Regards,
>>
>>John
>>--
>>John C. Bollinger, Ph.D.
>>Department of Structural Biology
>>St. Jude Children's Research Hospital
>>
>>
>>
>>Email Disclaimer:  www.stjude.org/emaildisclaimer
>>_______________________________________________
>>ddlm-group mailing list
>>ddlm-group@iucr.org
>>http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>
> --
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148