Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
- To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <[email protected]>
- Subject: Re: Advice on COMCIFS policy regarding compatibility of CIF syntax with other domains
- From: Peter Murray-Rust <[email protected]>
- Date: Fri, 4 Mar 2011 14:07:29 +0000
- In-Reply-To: <[email protected]>
- References: <[email protected]><[email protected]><[email protected]><[email protected]>
On Fri, Mar 4, 2011 at 11:47 AM, James Hester <[email protected]> wrote:
Thanks Peter for your comments. While you may not be a voting member
of COMCIFS, you and other COMCIFS members fulfill an important
advisory role and I would encourage everybody to take the opportunity
to provide their perspectives.
I assume you have no particular disagreement with the principles that
you haven't commented on explicitly?
None at all - it's just that I haven't been as heavily engaged in CIF recently and so wouldn't have meaningful comments.
I've added some comments in response to your comments, inserted below:
> I found the original ASCII escapes difficult/tedious for some code points
> and would urge full Unicode support (with numeric values).
I perhaps wasn't clear that we have already taken this step. The
current CIF2 draft envisions full Unicode support using UTF-8
encoding. Some provision has been made for allowing other encodings
in the future. The point of the example was to show how this decision
to adopt Unicode was justifiable in terms of these principles.
It's really important to manage encoding. I am completely supportive of UTF-8, but we don't mandate it in CML as XML can manage different encodings. The problem comes when non-conformant tools are used, and this is particularly common with Microsoft tools which use CP-1252. This means that for any code points above 127 a cut-and-paste is likely to corrupt characters.
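To illustrate the failure mode, here is a minimal Python sketch. It assumes the non-conformant tool reads UTF-8 bytes as if they were CP-1252; the function name and sample strings are hypothetical and chosen only for illustration.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as CP-1252.

    This imitates a tool that ignores the real encoding of its input.
    Note: a few byte values are undefined in CP-1252 and would raise
    UnicodeDecodeError; the samples below avoid them.
    """
    return text.encode("utf-8").decode("cp1252")


ascii_only = "unit cell"       # every code point is below 128
with_diacritics = "Ångström"   # contains code points above 127

# ASCII text survives the round trip unchanged.
print(misread_as_cp1252(ascii_only))

# Each non-ASCII character becomes two unrelated characters.
print(misread_as_cp1252(with_diacritics))
```

Pure ASCII text passes through untouched, which is one reason the problem can go unnoticed until a diacritic appears.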
So if I have understood correctly, all CIF documents MUST use UTF-8, and I'd strongly support this. It might be useful to announce this in the document (similarly to XML's <?xml version="1.0" encoding="UTF-8"?>), so that non-CIF tools can recognise the encoding.
It does put requirements on the toolchain. If an author receives a CIF with high code points, pastes bits of it into (say) Windows and re-saves, there is a good chance that characters will become corrupted. Anglophones often do not realise this as they do not have diacritics and high code points. (I applaud the removal of the separate escaped diacritic that CIF originally had.)
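One way a toolchain could guard against this is a strict UTF-8 check before a file is accepted. The following is a rough sketch only, assuming the file content is available as bytes; the function name is hypothetical and nothing here is taken from the CIF2 draft.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


saved_as_utf8 = "Å 1.54".encode("utf-8")
saved_as_cp1252 = "Å 1.54".encode("cp1252")  # e.g. re-saved by a CP-1252 tool

print(first_invalid_utf8_offset(saved_as_utf8))
print(first_invalid_utf8_offset(saved_as_cp1252))
```

A check like this catches a file re-saved in CP-1252, since most such byte sequences are not valid UTF-8. It cannot catch text that was misdecoded first and then written back out as valid UTF-8.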
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069