Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: Brian McMahon <bm@iucr.org>
- Date: Sat, 10 Oct 2009 15:58:30 +0100
- In-Reply-To: <20091010125924.GA7536@emerald.iucr.org>
- References: <C6F5BF24.1200E%nick@csse.uwa.edu.au> <645410.77656.qm@web87015.mail.ird.yahoo.com> <279aad2a0910100249o2c09897anb767ab28b06cbdcf@mail.gmail.com> <279aad2a0910100513u1e9ef18dua5f984cc20ac9a9b@mail.gmail.com> <20091010125924.GA7536@emerald.iucr.org>
Regarding the adoption of the Unicode character set, I agree that this would make it easier to accommodate accented and non-Latin characters and symbols. I see no reason to oppose implementing it as a UTF-8 encoding, and so I vote 3.2. (It's not a panacea, especially for maths, where new symbols can always be invented and one must be able to specify a two-dimensional layout as well as the glyphs themselves, so we shall still need other approaches for various types of "rich" text.)

However, this is a binary encoding, is it not? If so, the underlying STAR specification must be modified to accommodate it. (I'm afraid I haven't got Nick's draft paper for the revised STAR specification to hand, so I apologise if that has already been addressed.) Does it raise issues of endianness? If we are introducing binary encodings, are there any reasons to restrict the character-set encoding to UTF-8, or should one also allow UTF-16 etc. (i) in STAR and (ii) in CIF? And, ultimately, is there a prospect of extending the STAR spec in a way that properly accommodates at least the CBF implementation, and possibly other binary data incorporation?

I am happy in this case that handling by "old" CIF software can be done by adopting a protocol that allows UTF-8 Unicode characters to be represented by ASCII encodings such as \u27. (I don't think that we need to specify a protocol at this point, just be sure that one can be defined if needed.)

I again draw attention to the amusing fact that with an ASCII Unicode encoding, "O\u27Neill" is a valid data value under the current proposals, whereas the UTF-8 equivalent would not be, because the UTF-8 encoding of ' is just '!

Brian

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
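The endianness question can be made concrete with a short Python sketch. It relies only on the standard codec names `utf-8`, `utf-16-le`, `utf-16-be` and `utf-16`; the sample string and the helper name `encodings_of` are illustrative choices, not anything proposed in the thread. UTF-8 is byte-oriented, so a given string has one byte sequence on every platform, whereas UTF-16 is built from 16-bit code units whose byte order must be signalled by a byte-order mark (BOM) or fixed by convention.

```python
from typing import Dict


def encodings_of(text: str) -> Dict[str, bytes]:
    """Serialise text as UTF-8 and as UTF-16 in both byte orders (illustrative helper)."""
    return {
        "utf-8": text.encode("utf-8"),          # byte-oriented: a single canonical byte sequence
        "utf-16-le": text.encode("utf-16-le"),  # 16-bit code units, low byte first, no BOM
        "utf-16-be": text.encode("utf-16-be"),  # 16-bit code units, high byte first, no BOM
    }


if __name__ == "__main__":
    sample = "Å 2θ"  # arbitrary sample: an accented Latin letter and a Greek letter
    for name, data in encodings_of(sample).items():
        print(f"{name:10} {data.hex(' ')}")
    # utf-8      c3 85 20 32 ce b8
    # utf-16-le  c5 00 20 00 32 00 b8 03
    # utf-16-be  00 c5 00 20 00 32 03 b8

    # Reading little-endian UTF-16 as big-endian silently yields different characters,
    # which is why UTF-16 needs a BOM or an agreed byte order.
    le_bytes = encodings_of(sample)["utf-16-le"]
    print(repr(le_bytes.decode("utf-16-be")))

    # Python's plain "utf-16" codec writes a BOM on encode and honours it on decode.
    assert sample.encode("utf-16").decode("utf-16") == sample
```

If the character-set encoding were restricted to UTF-8, a STAR or CIF parser would never need to detect byte order; allowing UTF-16 as well would add that step.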
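The message deliberately leaves the ASCII representation for "old" CIF software unspecified, so the following is only a hypothetical sketch of what such a protocol might look like, not a proposal from the thread. It assumes fixed-width `\uXXXX` and `\UXXXXXXXX` escapes (rather than the variable-width `\u27` of the example above) so that decoding is unambiguous; the function names and the `also_escape` parameter are invented for illustration. It also reproduces the O'Neill observation: the escaped form contains no apostrophe, while the UTF-8 bytes still contain 0x27. A real protocol would also have to decide what happens to text that already contains a literal backslash, "u" and four hex digits.

```python
import re

# Hypothetical escape scheme (NOT part of any STAR or CIF specification):
# \uXXXX for code points up to U+FFFF, \UXXXXXXXX above that, always fixed width.
_ESCAPE_RE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def to_ascii(text: str, also_escape: str = "") -> str:
    """Replace non-ASCII characters (and any listed in also_escape) with escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0x7F or ch in also_escape:
            out.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
        else:
            out.append(ch)
    return "".join(out)


def from_ascii(text: str) -> str:
    r"""Expand \uXXXX and \UXXXXXXXX escapes back into characters."""
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


if __name__ == "__main__":
    name = "O'Neill"
    escaped = to_ascii(name, also_escape="'")
    print(escaped)               # O\u0027Neill  (no apostrophe left in the ASCII form)
    print(name.encode("utf-8"))  # b"O'Neill"    (UTF-8 keeps the apostrophe as byte 0x27)
    assert from_ascii(escaped) == name
    print(to_ascii("Å"))         # \u00C5
```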