Discussion List Archives

Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

Regarding the adoption of the Unicode character set, I agree that
this would make it easier to accommodate accented and non-Latin
characters and symbols, and I see no reason to oppose implementing
it as a UTF-8 encoding, and so I vote 3.2.

(It's not a panacea, especially for maths, where new symbols can
always be invented, and one must be able to specify a two-dimensional
layout as well as just the glyphs, so we shall still need other
approaches for various types of "rich" text.)

However, this is a binary encoding, is it not, and so the underlying
STAR specification must be modified to accommodate this. (I'm afraid
I haven't got Nick's draft paper for the revised STAR specification
to hand, so I apologise if that's already been addressed.)

Does it raise issues of endian-ness? If we are introducing binary
encodings, are there any reasons to restrict the character set
encoding to UTF-8 or should one also allow UTF-16 etc. (i) in STAR
and (ii) in CIF? And, ultimately, is there a prospect of extending
the STAR spec in a way that properly accommodates at least the CBF
implementation, and possibly other binary data incorporation?
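
On the endian-ness and "binary" questions, a minimal Python sketch may help. The sample strings are arbitrary choices for illustration, not taken from any STAR or CIF draft. It suggests that pure-ASCII text is byte-for-byte unchanged under UTF-8, that UTF-8 has a single byte order, and that UTF-16 does depend on byte order:

```python
# Pure ASCII text has the same bytes in ASCII and in UTF-8.
ascii_text = "CIF"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# A non-ASCII sample character: U+00E9 (e with acute accent).
text = "\u00e9"

# UTF-8 is defined as a byte sequence, so there is only one byte order.
utf8 = text.encode("utf-8")

# UTF-16 uses 16-bit code units, so the byte order must be known.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

print(same_bytes)      # True
print(utf8.hex())      # c3a9
print(utf16_le.hex())  # e900
print(utf16_be.hex())  # 00e9
```

If UTF-16 were allowed, a reader would need a byte-order mark or an external declaration to choose between the last two forms; UTF-8 alone avoids that question.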

I am happy in this case that handling by "old" CIF software can
be done by adopting a protocol that allows UTF-8 Unicode characters
to be represented by ASCII encodings such as \u27. (I don't think
that we need specify a protocol at this point, just be sure that
one can be defined if needed.)
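
To illustrate that such a protocol can be defined, here is one possible sketch in Python. Everything in it is an assumption: the function names `to_ascii_escapes` and `from_ascii_escapes` are hypothetical, and the fixed four-hex-digit `\uXXXX` form (limited here to code points up to U+FFFF) is just one choice among many; nothing of the kind has been specified.

```python
import re


def to_ascii_escapes(text: str) -> str:
    """Replace every non-ASCII character with a \\uXXXX escape (hypothetical scheme)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif code <= 0xFFFF:
            out.append("\\u%04X" % code)
        else:
            # This sketch deliberately does not handle code points above U+FFFF.
            raise ValueError("code point above U+FFFF not supported in this sketch")
    return "".join(out)


def from_ascii_escapes(text: str) -> str:
    """Turn \\uXXXX escapes back into the characters they stand for."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


escaped = to_ascii_escapes("Z\u00fcrich")
print(escaped)                      # Z\u00FCrich
print(from_ascii_escapes(escaped))  # Zürich
```

A real protocol would also have to say how a literal backslash followed by "u" in the original text is protected; this sketch ignores that case.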

I again draw attention to the amusing fact that with an ASCII
Unicode encoding, "O\u27Neill" is a valid data value under the
current proposals, whereas the UTF-8 equivalent would not be,
because the UTF-8 encoding of ' is just ' !
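
The point can be checked with a short Python sketch. The variable names are illustrative only, and the `\u27` spelling is taken from the example above rather than from any agreed protocol:

```python
# U+0027 is the apostrophe; its UTF-8 encoding is the single byte 0x27,
# which is exactly the ASCII apostrophe.
apostrophe = "\u0027"
utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes.hex())  # 27

# The escaped spelling contains no quote character in the file itself...
escaped = "O\\u27Neill"
# ...whereas the UTF-8 spelling contains a literal apostrophe.
raw = "O'Neill"

print("'" in escaped)  # False
print("'" in raw)      # True
```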

ddlm-group mailing list
