[ddlm-group] [THREAD 4] UTF8

To: [email protected]
Subject: [ddlm-group] [THREAD 4] UTF8
From: James Hester <[email protected]>
Date: Mon, 12 Oct 2009 18:38:13 +0300

I've started a separate thread for the UTF8 discussion.

John has floated the option of delinking the file encoding from the
syntax specification, so that CIF1.2 files could have either ASCII or
UTF8 encodings.  I believe that this is unnecessary, for the following
reasons:

1. Encoding can be automatically determined: if a given CIF1.2 file
contains any bytes with values >127 then it can only be UTF8; a file
with no such bytes is plain ASCII, which is also valid UTF8.

2. The fact that CIF1.2 syntax allows UTF8 encoding does not mean that
any given string-valued data item may contain non-ASCII characters:
dictionary writers are free to restrict the character set of data
values. Would such dictionary-based regulation give the PDB and IUCr
sufficient control over UTF8 introduction (John/Brian/Simon)?

3. An additional UTF8 encoding magic number could complicate the
simple magic number scheme we currently have in place.
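
To illustrate point 1, here is a minimal sketch in Python of how a
reader might determine the encoding from the bytes alone. The function
name guess_cif_encoding is hypothetical and not part of any existing
CIF software; the sketch assumes the whole file is available as bytes.

```python
def guess_cif_encoding(data: bytes) -> str:
    """Return 'ascii' or 'utf-8' for the raw bytes of a CIF file.

    Hypothetical helper: assumes ASCII and UTF8 are the only encodings
    a CIF1.2 file may use.
    """
    # No byte above 127: the file is plain ASCII (and also valid UTF8).
    if all(b <= 127 for b in data):
        return "ascii"
    # Bytes above 127 are only acceptable as part of UTF8 multi-byte
    # sequences, so a strict decode doubles as a validity check.
    data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF8
    return "utf-8"
```

A file that contains bytes >127 but fails the strict decode would be
rejected rather than guessed at, which keeps the rule in point 1
unambiguous.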

James.

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group