
Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
From: Brian McMahon <[email protected]>
Date: Sat, 10 Oct 2009 15:58:30 +0100
In-Reply-To: <[email protected]>
References: <C6F5BF24.1200E%[email protected]><[email protected]><[email protected]><[email protected]><[email protected]>

Regarding the adoption of the Unicode character set, I agree that
this would make it easier to accommodate accented and non-Latin
characters and symbols, and I see no reason to oppose implementing
it as a UTF-8 encoding, and so I vote 3.2.

(It's not a panacea, especially for maths, where new symbols can
always be invented, and one must be able to specify a two-dimensional
layout as well as just the glyphs, so we shall still need other
approaches for various types of "rich" text.)

However, this is a binary encoding, is it not, and so the underlying
STAR specification must be modified to accommodate this. (I'm afraid
I haven't got Nick's draft paper for the revised STAR specification
to hand, so I apologise if that's already been addressed.)

Does it raise issues of endian-ness? If we are introducing binary
encodings, are there any reasons to restrict the character set
encoding to UTF-8 or should one also allow UTF-16 etc. (i) in STAR
and (ii) in CIF? And, ultimately, is there a prospect of extending
the STAR spec in a way that properly accommodates at least the CBF
implementation, and possibly other binary data incorporation?
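
On the endian-ness question, a short Python check may help. It is only a sketch: the sample character and the variable names are chosen for illustration. UTF-8 is defined as a sequence of single bytes, so no byte order arises, whereas UTF-16 has distinct big- and little-endian serialisations.

```python
# One sample non-ASCII character: U+00E9 (e with acute accent).
text = "\u00e9"

# UTF-8 is a byte sequence; there is no byte order to choose.
utf8 = text.encode("utf-8")

# UTF-16 uses 16-bit code units, so the two byte orders differ.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_same = "data_block".encode("utf-8") == "data_block".encode("ascii")

print(utf8.hex(), utf16_le.hex(), utf16_be.hex(), ascii_same)
```

If UTF-16 were also allowed, a reader would need a byte-order mark or an explicit declaration to tell the two serialisations apart; with UTF-8 alone that question does not come up.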

I am happy in this case that handling by "old" CIF software can
be done by adopting a protocol that allows UTF-8 Unicode characters
to be represented by ASCII encodings such as \u27. (I don't think
that we need specify a protocol at this point, just be sure that
one can be defined if needed.)
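
To show that such a protocol can be defined, here is a minimal Python sketch. The `\u{HEX}` escape form and the function names `to_ascii` and `from_ascii` are assumptions made for this example only; no particular syntax has been proposed or agreed.

```python
import re

def to_ascii(text: str) -> str:
    """Replace each non-ASCII character with a hypothetical \\u{HEX} escape."""
    return "".join(ch if ord(ch) < 128 else "\\u{%X}" % ord(ch) for ch in text)

def from_ascii(text: str) -> str:
    """Expand every \\u{HEX} escape back into the character it names."""
    return re.sub(
        r"\\u\{([0-9A-Fa-f]+)\}",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

original = "\u00c5ngstr\u00f6m"   # a word containing two accented letters
escaped = to_ascii(original)      # pure ASCII, readable by "old" software
restored = from_ascii(escaped)    # back to the Unicode form

print(escaped)
```

The braces are there only so that the end of the hexadecimal number is unambiguous. A real protocol would also have to say how a literal `\u{` in the data is protected, which this sketch does not attempt.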

I again draw attention to the amusing fact that with an ASCII
Unicode encoding, "O\u27Neill" is a valid data value under the
current proposals, whereas the UTF-8 equivalent would not be,
because the UTF-8 encoding of ' is just ' !
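
The point can be checked in a few lines of Python. This is a sketch only: `has_literal_quote` is a hypothetical stand-in for whatever lexical test a parser applies, not a rule taken from any draft.

```python
APOSTROPHE = "\u0027"

# U+0027 lies in the ASCII range, so its UTF-8 encoding is the single byte 0x27.
utf8_bytes = APOSTROPHE.encode("utf-8")

# The ASCII escape notation as written in the message: six ordinary characters
# (backslash, u, 2, 7 and so on), none of which is a quote.
escaped_value = r"O\u27Neill"

# The same name with the apostrophe present as a literal character.
literal_value = "O" + APOSTROPHE + "Neill"

def has_literal_quote(raw: str) -> bool:
    """Hypothetical lexical check: does the raw text contain a quote character?"""
    return "'" in raw or '"' in raw

print(utf8_bytes, has_literal_quote(escaped_value), has_literal_quote(literal_value))
```

A test on the raw characters therefore accepts the escaped form and rejects the literal one, although both denote the same name once the escape is expanded.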

Brian
