
Re: [ddlm-group] [THREAD 4] UTF8

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] [THREAD 4] UTF8
From: Nick Spadaccini <[email protected]>
Date: Tue, 13 Oct 2009 00:14:41 +0800
In-Reply-To: <[email protected]>




On 12/10/09 11:38 PM, "James Hester" <[email protected]> wrote:

> I've started a separate thread for the UTF8 discussion.
> 
> John has floated the option of delinking the file encoding from the
> syntax specification, so CIF1.2 files could have either ASCII or UTF8
> encodings.  I believe that this is unnecessary for the following reasons
> 
> 1. Encoding can be automatically determined: If a given CIF1.2 file
> contains any bytes with values >127 then it can/should only be UTF8.

Can it? Doesn't CBF/imgCIF (or whatever it is called) contain binary data
encoded with the "Hammersley" coding algorithm?
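
To make the objection concrete, here is a minimal sketch (Python) of the
check proposed in point 1. The function name guess_encoding is made up for
illustration and is not part of any CIF library. The point it tries to show
is that a byte above 127 only suggests UTF-8 if the whole stream also decodes
as valid UTF-8; embedded binary data will usually fail that test.

```python
def guess_encoding(data: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper: a sketch of the "any byte > 127 means UTF-8"
    heuristic, with an extra validity check for the non-ASCII case.
    """
    # Pure 7-bit content is plain ASCII (and therefore also valid UTF-8).
    if all(b < 0x80 for b in data):
        return "ascii"
    # Bytes above 127 are present: accept as UTF-8 only if they decode.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"  # e.g. an embedded binary section
    return "utf-8"
```

With this sketch, an ASCII-only file is reported as 'ascii', text containing
a character such as Å is reported as 'utf-8', and a stream containing a
byte such as 0xFF (never valid in UTF-8) is reported as 'other'.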
 
> 2. The fact that CIF1.2 syntax allows UTF8 encoding does not mean that
> any given string-valued data item could be presented in UTF8:
> dictionary writers are free to restrict the character set of data
> values. Would such dictionary-based regulation give the PDB and IUCr
> sufficient control over UTF8 introduction (John/Brian/Simon?).

OK, only one-byte UTF-8 is allowed. Voilà. Problem solved.
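
As a small illustration (Python; is_one_byte_utf8 is a hypothetical name, not
an existing API), restricting a value to one-byte UTF-8 amounts to
restricting it to ASCII, because only code points 0 to 127 encode to a single
byte in UTF-8.

```python
def is_one_byte_utf8(text: str) -> bool:
    """Return True if every character encodes to exactly one UTF-8 byte.

    Hypothetical helper for illustration only. For ordinary text this is
    the same condition as the string being pure ASCII.
    """
    return all(len(ch.encode("utf-8")) == 1 for ch in text)
```

A value such as "5.43" passes this check, while a value containing Å does
not, since Å takes two bytes in UTF-8.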

> 3. An additional UTF8 encoding magic number could complicate the
> simple magic number scheme we currently have in place.

I don't think it does. It would simplify matters for parsers that do not
yet support UTF-8: it would tell them to stop processing.
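
A rough sketch of what I mean (Python). The CIF 1.1 magic code is the
existing one; the UTF-8 magic string below is purely an assumption for
illustration, since no such code has been agreed, and check_magic is a
made-up name. A parser without UTF-8 support only needs to look at the first
line to know it should stop.

```python
# Magic codes this (hypothetical, ASCII-only) parser understands.
SUPPORTED_MAGIC = (b"#\\#CIF_1.1",)

# ASSUMPTION: an invented magic code for UTF-8 encoded files.
# Nothing like this has been agreed; it is here only for illustration.
HYPOTHETICAL_UTF8_MAGIC = b"#\\#CIF_1.2_UTF8"


def check_magic(first_line: bytes) -> str:
    """Decide what an ASCII-only parser should do based on the first line.

    Returns 'accept', 'stop' (UTF-8 declared but not supported),
    or 'unknown' (no recognised magic code).
    """
    line = first_line.rstrip(b"\r\n")
    if line.startswith(HYPOTHETICAL_UTF8_MAGIC):
        return "stop"
    if line.startswith(SUPPORTED_MAGIC):
        return "accept"
    return "unknown"
```

The design choice is that the decision is made before any data is parsed, so
a parser never has to guess the encoding from the file contents.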

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: [email protected]





