Re: [Cif2-encoding] How we wrap this up
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] How we wrap this up
- From: Brian McMahon <bm@xxxxxxxx>
- Date: Fri, 24 Sep 2010 10:23:59 +0100
- In-Reply-To: <AANLkTi=hmKNFMgaeMqt69=sG6dOmxZRUrffB1khjF+mZ@mail.gmail.com>
- References: <AANLkTi=hmKNFMgaeMqt69=sG6dOmxZRUrffB1khjF+mZ@mail.gmail.com>
My vote:

Preference  Option
1           2. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII', together with Brian's *recommendations*
2           1. Herbert's 'as for CIF1 proposal with UTF8 in place of ASCII' recently posted here and to COMCIFS.
3           4. UTF8 + UTF16
4           3. UTF8-only as in the original draft
5           5. UTF8, UTF16 + "local"

Rationale:

I still feel this argument is at heart a "binary/text" dichotomy. "Binary" implies that one can prescribe a specific byte-level representation for every distinct character; "text" implies that you're at the mercy of external libraries and of mappings between encoding conventions, and those mappings are not always explicit or easy to identify.

I sympathise greatly with James's desire for a prescriptive, "binary" approach, but its corollary is that a CIF application must take full responsibility for expressing any supported extended character set (by which I mean accented Latin letters, Greek characters, Cyrillic letters or Chinese characters).

First off, I don't know how difficult that is technically. I would guess that, rather than trying to handle arbitrary keyboard mappings, the natural approach would be to pick characters from a graphical character grid. (What are the implications of glyph rendering for this? Does a CIF editor have to be compiled with its own large font library?) But that's a laborious way of authoring if relatively large amounts of "non-standard" text are involved, and the way authors would prefer to work, surely, is by copying and pasting text from Word or some other tool of choice. Permitting that necessarily pollutes the "binary" approach with byte streams delivered by text-oriented applications.

If I could be sure that publCIF, say, can be compiled with libraries that reliably transcode byte streams imported from clipboards and file imports (across the mess of SMB/NFS mounts etc. that exist in the real world), and that equally reliably transcode its UTF8-encoded text to the author's locale-based clipboard, then I'd be more willing to promote option 3 (UTF8-only) to the top, at least as the starting point for CIF 2.0. Its "enforcement", though, does depend on the availability of such a robust CIF-editing tool.

I prefer the UTF8 + UTF16 option over UTF8-only because of the real-world use case that Herbert has described before, and because in existing imgCIF applications the UTF16 encoding is being done rather carefully and for a specific purpose.

I put option 5 at the bottom because of the non-portability of a "local" encoding.

Note, though, that whatever the outcome, I would still favour presenting the discussion of character-set encodings as Part 3 of the complete CIF2 spec.

Best wishes
Brian
_________________________________________________________________________
Brian McMahon                                       tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography              e-mail: bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England

On Thu, Sep 23, 2010 at 10:37:48AM +1000, James Hester wrote:
> Dear CIF2 encoding participants,
>
> As Herbert has indicated, we are starting to run out of time for
> resolution of the encoding issue. I believe that we have now explored
> the various proposals sufficiently to all have a good understanding of
> the consequences and advantages of each approach. So, after a round
> of final comments, I propose that we vote on the general scheme that
> we recommend. We can then flesh out the details of the particular
> scheme that we have settled on, and take this completed proposal to
> the DDLm group for their approval, following which we will present the
> entire CIF2 syntax document to COMCIFS for a formal vote.
>
> The proposals that I believe are still on the table are:
>
> 1. Herbert's 'as for CIF1 proposal' recently posted here and to COMCIFS.
> 2. Herbert's 'as for CIF1 proposal', together with Brian's proposal
> (if you agree that they are compatible)
> 2. UTF8-only as in the original draft
> 3. UTF8 + UTF16
> 4. UTF8, UTF16 + "local"
>
> I have not included the hashcode proposal as I believe it no longer
> has any supporters.
>
> We would need to conduct a preferential vote. I stress that this is
> purely to determine the recommendation of this working group, and is
> not in any way binding on COMCIFS.
>
> James.
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148

_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding