Re: [ddlm-group] [THREAD 4] UTF8

"OK Only one-byte UTF-8 is allowed. Voila. Problem solved."

Please forgive me, but for the first time in my life I think I might have to type 'lol'  :-)

(Sorry if this is inappropriate - I'll try to add something constructive tomorrow.)

Cheers

Simon




From: Nick Spadaccini <nick@csse.uwa.edu.au>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Monday, 12 October, 2009 17:14:41
Subject: Re: [ddlm-group] [THREAD 4] UTF8




On 12/10/09 11:38 PM, "James Hester" <jamesrhester@gmail.com> wrote:

> I've started a separate thread for the UTF8 discussion.
>
> John has floated the option of delinking the file encoding from the
> syntax specification, so CIF1.2 files could have either ASCII or UTF8
> encodings.  I believe that this is unnecessary for the following reasons:
>
> 1. Encoding can be automatically determined: If a given CIF1.2 file
> contains any bytes with values >127 then it can/should only be UTF8.

Can it? Doesn't CBF/imgCIF (or whatever it is called) contain binary data
produced by the "Hammersley" coding algorithm?
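
To make the two positions above concrete, here is a minimal sketch of byte-level encoding detection. It is only an illustration, not anything agreed by the group: the function name `classify_encoding` and the three labels it returns are assumptions. It applies the ">127 means UTF8" rule from point 1, but also checks that the high bytes really form valid UTF-8, since a file with embedded binary sections would otherwise be misclassified.

```python
def classify_encoding(data: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', or 'unknown'.

    Hypothetical helper: the name and return labels are illustrative only.
    """
    # No byte above 127: the content is plain ASCII.
    if all(b < 128 for b in data):
        return "ascii"
    # High bytes are present; accept the data as UTF-8 only if it decodes cleanly.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # High bytes that are not valid UTF-8, e.g. raw binary data.
        return "unknown"
    return "utf-8"
```

One-byte UTF-8 sequences are exactly the ASCII range, so the first branch covers both descriptions of such a file.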

> 2. The fact that CIF1.2 syntax allows UTF8 encoding does not mean that
> any given string-valued data item could be presented in UTF8:
> dictionary writers are free to restrict the character set of data
> values. Would such dictionary-based regulation give the PDB and IUCr
> sufficient control over UTF8 introduction (John/Brian/Simon?).

OK Only one-byte UTF-8 is allowed. Voila. Problem solved.

> 3. An additional UTF8 encoding magic number could complicate the
> simple magic number scheme we currently have in place.

I don't think it does. It would simplify the case for parsers that do not
yet support UTF-8: it would tell them to terminate the process.
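
A rough sketch of that behaviour, under stated assumptions: `#\#CIF_1.1` is the existing CIF 1.1 magic code, while the `#\#CIF_1.2_UTF8` string used in the usage note, the `check_magic` function, and the `UnsupportedMagicError` exception are all hypothetical names invented for illustration. No UTF-8 magic number has been defined in this thread.

```python
# Magic codes this (hypothetical) parser understands.
SUPPORTED_MAGIC = {b"#\\#CIF_1.1"}


class UnsupportedMagicError(Exception):
    """Raised when the first line carries a magic code the parser does not support."""


def check_magic(first_line: bytes) -> bytes:
    """Return the magic code from the first line, or raise if it is unsupported."""
    parts = first_line.split()
    token = parts[0] if parts else b""
    if token not in SUPPORTED_MAGIC:
        # An ASCII-only parser stops here rather than misreading the file.
        raise UnsupportedMagicError(f"unsupported magic code: {token!r}")
    return token
```

With this check, a parser given a first line such as `#\#CIF_1.2_UTF8` (hypothetical) would stop before reading any data, whereas a `#\#CIF_1.1` file would proceed as usual.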

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA  w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au





_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
