Discussion List Archives


[ddlm-group] [THREAD 4] UTF8

I've started a separate thread for the UTF8 discussion.

John has floated the option of delinking the file encoding from the
syntax specification, so that CIF1.2 files could be encoded in either
ASCII or UTF8.  I believe that this is unnecessary, for the following
reasons:

1. Encoding can be automatically determined: if a given CIF1.2 file
contains any bytes with values >127, then it can (and should) only be
UTF8. A file with no such bytes is plain ASCII, which is also valid UTF8.

2. The fact that CIF1.2 syntax allows UTF8 encoding does not mean that
any given string-valued data item may be presented in UTF8:
dictionary writers are free to restrict the character set of data
values. Would such dictionary-based regulation give the PDB and IUCr
sufficient control over the introduction of UTF8 (John/Brian/Simon)?

3. An additional UTF8 encoding magic number could complicate the
simple magic number scheme we currently have in place.
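
To illustrate point 1, here is a minimal sketch of how a reader might
determine the encoding from the bytes alone. The function name
detect_cif_encoding is hypothetical, and the choice to reject files that
contain bytes >127 but are not valid UTF8 is an assumption of this
sketch, not something the CIF1.2 draft specifies.

```python
def detect_cif_encoding(data: bytes) -> str:
    """Classify raw file bytes as 'ascii' or 'utf-8' (hypothetical helper).

    Assumption: a file with any byte >127 must decode as UTF-8,
    otherwise it is rejected with ValueError.
    """
    # No bytes above 127: the file is plain ASCII (and also valid UTF-8).
    if data.isascii():
        return "ascii"
    # At least one byte >127: the only permitted interpretation is UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("bytes >127 present but not valid UTF-8") from exc
    return "utf-8"
```

A reader could call this on the raw bytes before parsing, so no separate
encoding marker is needed to tell the two cases apart.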

