
Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains

To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <[email protected]>
Subject: Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
From: Peter Murray-Rust <[email protected]>
Date: Fri, 4 Mar 2011 14:07:29 +0000
In-Reply-To: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>

On Fri, Mar 4, 2011 at 11:47 AM, James Hester <[email protected]> wrote:

> Thanks Peter for your comments. While you may not be a voting member
> of COMCIFS, you and other COMCIFS members fulfill an important
> advisory role and I would encourage everybody to take the opportunity
> to provide their perspectives.
>
> I assume you have no particular disagreement with the principles that
> you haven't commented on explicitly?

None at all - it's just that I haven't been as heavily engaged in CIF recently and so wouldn't have meaningful comments.

> I've added some comments in response to your comments, inserted below:
>
>>
>> I found the original ASCII escapes difficult/tedious for some code points
>> and would urge full Unicode support (with numeric values).
>
> I perhaps wasn't clear that we have already taken this step. The
> current CIF2 draft envisions full Unicode support using UTF-8
> encoding. Some provision has been made for allowing other encodings
> in the future. The point of the example was to show how this decision
> to adopt Unicode was justifiable in terms of these principles.

It's really important to manage encoding. I am completely supportive of UTF-8, but we don't mandate it in CML, as XML can manage different encodings. The problem comes when non-conformant tools are used, and this is particularly common with Microsoft tools, which often default to CP-1252 (Windows-1252). This means that for any code point above 127, a cut-and-paste is likely to corrupt characters.
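
One common form of this corruption is a tool reading UTF-8 bytes as if they were CP-1252. A minimal Python sketch of that failure, assuming Python's built-in `cp1252` codec stands in for the non-conformant tool; the function name `misread_as_cp1252` is illustrative only:

```python
def misread_as_cp1252(text: str) -> str:
    """Simulate a tool that receives UTF-8 bytes but decodes them as CP-1252."""
    utf8_bytes = text.encode("utf-8")
    # errors="replace" maps the few byte values CP-1252 leaves undefined to U+FFFD
    return utf8_bytes.decode("cp1252", errors="replace")


print(misread_as_cp1252("café"))          # cafÃ©
print(misread_as_cp1252("Ångström"))      # each non-ASCII letter becomes two wrong characters
print(misread_as_cp1252("plain ASCII"))   # code points up to 127 survive unchanged
```

Only characters above code point 127 are damaged, because UTF-8 and CP-1252 agree on the ASCII range.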

So, if I have understood correctly, all CIF documents MUST use UTF-8, and I'd strongly support this. It might be useful to announce this in the document (similarly to XML's <?xml version="1.0" encoding="UTF-8"?>) so that non-CIF tools can recognise the encoding.
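
If every CIF must be UTF-8, a tool can reject non-conformant files before parsing them. A minimal Python sketch, assuming the whole file has been read into memory as bytes; the function name `utf8_problem` and the sample data item are illustrative, not part of any CIF tool:

```python
from typing import Optional


def utf8_problem(data: bytes) -> Optional[str]:
    """Return None if `data` is well-formed UTF-8, otherwise a short description."""
    try:
        data.decode("utf-8")  # strict by default: any malformed sequence raises
    except UnicodeDecodeError as err:
        bad_bytes = data[err.start:err.end]
        return f"invalid UTF-8 at byte offset {err.start}: {bad_bytes!r}"
    return None


line = "_publ_author_name 'Müller, A.'\n"
good = line.encode("utf-8")
bad = line.encode("cp1252")  # what a CP-1252 editor might write back to disk
print(utf8_problem(good))  # None
print(utf8_problem(bad))
```

Reporting the byte offset, rather than just failing, lets an author find the damaged character in a large file.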

It does put requirements on the toolchain. If an author receives a CIF with high code points, pastes parts of it into (say) a Windows application and re-saves it, there is a good chance that characters will be corrupted. Anglophones often do not realise this because they rarely use diacritics or other high code points. (I applaud the removal of the separate escape sequences for diacritics that CIF originally had.)
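
Re-saving in CP-1252 can also lose characters outright, since that encoding has no slot for most of Unicode. A minimal Python sketch, assuming the editor substitutes "?" for anything it cannot encode (real applications may behave differently); `resave_as_cp1252` is a hypothetical name:

```python
def resave_as_cp1252(text: str) -> str:
    """Simulate pasting into a CP-1252 editor and saving: unencodable characters become '?'."""
    saved_bytes = text.encode("cp1252", errors="replace")  # lossy step
    return saved_bytes.decode("cp1252")


label = "2θ range 5-60°, d > 0.8 Å"
print(resave_as_cp1252(label))  # 2? range 5-60°, d > 0.8 Å
```

Note that ° and Å survive because CP-1252 happens to include them, while the Greek θ is silently replaced; an author checking only Western European text might never notice the problem.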

--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

