[ddlm-group] [THREAD 4] UTF8
- To: ddlm-group@iucr.org
- Subject: [ddlm-group] [THREAD 4] UTF8
- From: James Hester <jamesrhester@gmail.com>
- Date: Mon, 12 Oct 2009 18:38:13 +0300
I've started a separate thread for the UTF8 discussion. John has floated the option of delinking the file encoding from the syntax specification, so that CIF1.2 files could have either ASCII or UTF8 encodings. I believe that this is unnecessary for the following reasons:

1. Encoding can be determined automatically: if a given CIF1.2 file contains any bytes with values >127, then it can/should only be UTF8.

2. The fact that CIF1.2 syntax allows UTF8 encoding does not mean that any given string-valued data item could be presented in UTF8: dictionary writers are free to restrict the character set of data values. Would such dictionary-based regulation give the PDB and IUCr sufficient control over UTF8 introduction (John/Brian/Simon)?

3. An additional UTF8 encoding magic number could complicate the simple magic number scheme we currently have in place.

James.
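As an illustration of point 1, the byte-value check might be sketched as follows. This is only a minimal sketch and not part of any CIF specification: the function name guess_cif_encoding and the sample data names are hypothetical, and the "invalid" result for high bytes that do not form valid UTF8 is an added assumption rather than something proposed in the message above.

```python
def guess_cif_encoding(data: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', or 'invalid'.

    Hypothetical helper: a file with no byte above 127 is plain ASCII
    (and therefore also valid UTF8); any byte above 127 means the file
    can only be UTF8, so it must decode strictly as such.
    """
    # No high bytes at all: the file is pure ASCII.
    if all(byte <= 127 for byte in data):
        return "ascii"
    # High bytes present: accept only well-formed UTF8 sequences.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "invalid"  # assumption: reject rather than guess a legacy encoding
    return "utf-8"


if __name__ == "__main__":
    # Sample content is made up for illustration only.
    print(guess_cif_encoding(b"data_test\n_cell.length_a 5.0\n"))
    print(guess_cif_encoding("_name 'Å'\n".encode("utf-8")))
```

Because ASCII is a strict subset of UTF8, a reader that always decodes as UTF8 would handle both cases; the explicit check only matters if the distinction has to be reported.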
- Follow-Ups:
- Re: [ddlm-group] [THREAD 4] UTF8 (Nick Spadaccini)