Re: [Cif2-encoding] How we wrap this up

My vote:

Preference  Option
   1        2. Herbert's 'as for CIF1 proposal with UTF8 in place of
               ASCII', together with Brian's *recommendations*
   2        1. Herbert's 'as for CIF1 proposal with UTF8 in place of
               ASCII' recently posted here and to COMCIFS.
   3        4. UTF8 + UTF16
   4        3. UTF8-only as in the original draft
   5        5. UTF8, UTF16 + "local"

Rationale: I still feel this argument is at heart a "binary/text"
dichotomy, where "binary" implies that one can prescribe specific
byte-level representations of every distinct character; "text"
implies that you're at the mercy of external libraries and mappings
between encoding conventions - and those mappings are not always
explicit or easy to identify.

I sympathise greatly with James's desire for a prescriptive, "binary"
approach, but its corollary is that a CIF application must take full
responsibility for expressing any supported extended character set (I
mean accented Latin letters, Greek characters, Cyrillic or Chinese
alphabets).

First off, I don't know how difficult that is technically. I would
guess that rather than trying to handle arbitrary keyboard mappings,
the natural approach would be to pick from a graphical character
grid. (What are the implications for this of glyph rendering - does
a CIF editor have to be compiled with its own large font library?)

But that's a laborious method of authoring if relatively large amounts
of "non-standard" text are involved, and the way that authors would
prefer to work, surely, is by copying and pasting text from Word or
some other tool of choice. Permitting that necessarily pollutes the
"binary" approach with byte streams delivered by text-oriented
applications.

If I could be sure that publCIF, say, can be compiled with libraries
that reliably transcode byte streams imported from the clipboard or
from files (across the mess of SMB/NFS mounts etc. that exist in
the real world), and that equally reliably transcode its UTF8-encoded
text to the author's locale-based clipboard, then I'd be more willing
to promote option 3 to the top, at least as the starting point for
CIF 2.0 (though its "enforcement" does depend on the availability of
such a robust CIF-editing tool).
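
For illustration only (this is not part of any proposal, and not how
publCIF works): a minimal Python sketch of what "reliable" transcoding
could mean, namely strict conversion that fails loudly instead of
silently corrupting text. The function names are hypothetical, and the
sketch assumes the source encoding is already known, which is exactly
the part that is hard to guarantee in practice.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode bytes from a known source encoding to UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in that encoding.
    """
    text = raw.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


def from_utf8(raw: bytes, target_encoding: str) -> bytes:
    """Transcode UTF-8 bytes to a target (e.g. locale) encoding.

    Raises UnicodeEncodeError if a character cannot be represented
    in the target encoding.
    """
    text = raw.decode("utf-8", errors="strict")
    return text.encode(target_encoding, errors="strict")


def round_trips(raw: bytes, encoding: str) -> bool:
    """Return True if the bytes survive encoding -> UTF-8 -> encoding unchanged."""
    try:
        return from_utf8(to_utf8(raw, encoding), encoding) == raw
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
```

The second function shows where the export direction can fail: UTF-8
text containing, say, Greek letters has no representation in a Latin-1
locale, so a strict transcoder has to refuse rather than substitute.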

I prefer the UTF8 + UTF16 option over UTF8-only because of the
real-world use case that Herbert has described before; and in
existing imgCIF applications the UTF16 encoding is being done
rather carefully and for a specific purpose.
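
As a rough sketch of why UTF8 + UTF16 remains tractable for a reader
(an assumption for illustration; neither the CIF2 draft nor imgCIF is
known to prescribe this): if UTF16 files are required to begin with a
byte-order mark, the two encodings can be told apart from the first
two bytes. The function name is hypothetical.

```python
import codecs


def decode_utf8_or_utf16(raw: bytes) -> str:
    """Decode bytes as UTF-16 if they start with a UTF-16 BOM, else as UTF-8.

    Assumes UTF-16 input always carries a byte-order mark; UTF-16
    without a BOM is not detected and would be treated as UTF-8.
    """
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec consumes the BOM and selects the byte order.
        return raw.decode("utf-16")
    # "utf-8-sig" strips an optional UTF-8 BOM and is strict about bad bytes.
    return raw.decode("utf-8-sig")
```

A "local" encoding offers no such self-identifying marker, which is
the portability problem with option 5.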

I put option 5 at the bottom because of the non-portability of a
"local" encoding.

Note, though, that whatever the outcome, I would still favour
presenting the discussion of character set encodings as a Part 3
of the complete CIF2 spec.

Best wishes
Brian
_________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography            e-mail:  bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England

On Thu, Sep 23, 2010 at 10:37:48AM +1000, James Hester wrote:
> Dear CIF2 encoding participants,
> 
> As Herbert has indicated, we are starting to run out of time for
> resolution of the encoding issue.  I believe that we have now explored
> the various proposals sufficiently to all have a good understanding of
> the consequences and advantages of each approach.  So, after a round
> of final comments, I propose that we vote on the general scheme that
> we recommend.  We can then flesh out the details of the particular
> scheme that we have settled on, and take this completed proposal to
> the DDLm group for their approval, following which we will present the
> entire CIF2 syntax document to COMCIFS for a formal vote.
> 
> The proposals that I believe are still on the table are:
> 
> 1. Herbert's 'as for CIF1 proposal' recently posted here and to COMCIFS.
> 2. Herbert's 'as for CIF1 proposal', together with Brian's proposal
> (if you agree that they are compatible)
> 3. UTF8-only as in the original draft
> 4. UTF8 + UTF16
> 5. UTF8, UTF16 + "local"
> 
> I have not included the hashcode proposal as I believe it no longer
> has any supporters.
> 
> We would need to conduct a preferential vote.  I stress that this is
> purely to determine the recommendation of this working group, and is
> not in any way binding on COMCIFS.
> 
> James.
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148