Re: [Cif2-encoding] How we wrap this up

Hi Brian

Just for info (not argument), regarding your concerns over fonts and the clipboard:

Fonts: this can be an issue. The libraries I use access the system fonts to locate
the appropriate glyphs. publCIF is also distributed with an open-source font that
contains a few modified glyphs to represent e.g. delocalized double bonds.

Clipboard: publCIF reads the clipboard looking for HTML or plain text and, when
pasting into the CIF, attempts to convert the data to CIF format (i.e. converting
symbols to their ASCII CIF codes and discarding unwanted HTML tags). publBio
employs a similar mechanism to intercept pasting into some of its HTML input boxes.
So although this is an issue that we should be aware of, and should review, I think
communication with the clipboard should not be too much cause for concern. Writing
to the clipboard is less of a problem: the writer controls what is written.
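
As a rough illustration of the kind of paste filter described above (not publCIF's
actual code), the following Python sketch strips HTML tags from a clipboard fragment
and replaces a few symbols with CIF-style ASCII codes. The function name
clipboard_to_cif and the CIF_CODES table are hypothetical, and the table is only a
small illustrative subset that should be checked against the CIF specification.

```python
from html.parser import HTMLParser

# Illustrative subset only (assumption): check each code against the CIF spec.
CIF_CODES = {
    "\u03b1": "\\a",   # Greek alpha
    "\u03b2": "\\b",   # Greek beta
    "\u03b3": "\\g",   # Greek gamma
    "\u00e9": "\\'e",  # e with acute accent
    "\u00fc": '\\"u',  # u with umlaut
    "\u00b0": "\\%",   # degree sign
    "\u00b1": "+-",    # plus-minus sign
}


class _TextOnly(HTMLParser):
    """Collect character data and silently drop every tag."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &alpha; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clipboard_to_cif(fragment: str) -> str:
    """Hypothetical helper: HTML or plain text in, ASCII-only CIF text out."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    out = []
    for ch in "".join(parser.parts):
        if ch in CIF_CODES:
            out.append(CIF_CODES[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Refuse rather than let non-CIF text slip into the file.
            raise ValueError(f"no CIF code known for {ch!r}")
    return "".join(out)
```

Raising on an unmapped character is one possible design choice; an editor might
instead prompt the author to pick a replacement.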

File import across mounts etc: this would require research.

So import (and maybe clipboard communication) might indeed require further work beyond what has already been done in publCIF
to prevent inclusion of non-CIF text.

Cheers

Simon

From: Brian McMahon <bm@iucr.org>
To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@iucr.org>
Sent: Friday, 24 September, 2010 10:23:59
Subject: Re: [Cif2-encoding] How we wrap this up

My vote:

Preference  Option
  1        2. Herbert's 'as for CIF1 proposal with UTF8 in place of
              ASCII', together with Brian's *recommendations*
  2        1. Herbert's 'as for CIF1 proposal with UTF8 in place of
              ASCII' recently posted here and to COMCIFS.
  3        4. UTF8 + UTF16
  4        3. UTF8-only as in the original draft
  5        5. UTF8, UTF16 + "local"

Rationale: I still feel this argument is at heart a "binary/text"
dichotomy, where "binary" implies that one can prescribe specific
byte-level representations of every distinct character; "text"
implies that you're at the mercy of external libraries and mappings
between encoding conventions - and those mappings are not always
explicit or easy to identify.
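
To make the dichotomy concrete, here is a small Python sketch showing that one
character has different byte-level representations under different encodings, and
that bytes read under the wrong mapping silently become different text. The helper
name byte_forms is hypothetical and the three codecs are just examples.

```python
def byte_forms(text: str) -> dict:
    """Return the hex bytes of `text` under a few example encodings."""
    return {
        codec: text.encode(codec).hex()
        for codec in ("utf-8", "utf-16-le", "latin-1")
    }


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
forms = byte_forms(e_acute)

# The UTF-8 bytes, misread through a Latin-1 mapping, yield two characters.
misread = e_acute.encode("utf-8").decode("latin-1")
```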

I sympathise greatly with James's desire for a prescriptive, "binary"
approach, but its corollary is that a CIF application must take full
responsibility for expressing any supported extended character set (I
mean accented Latin letters, Greek characters, Cyrillic or Chinese
alphabets).

First off, I don't know how difficult that is technically. I would
guess that rather than trying to handle arbitrary keyboard mappings,
the natural approach would be to pick from a graphical character
grid. (What are the implications for this of glyph rendering - does
a CIF editor have to be compiled with its own large font library?)

But that's a laborious method of authoring if relatively large amounts
of "non-standard" text are involved, and the way that authors would
prefer to work, surely, is by copying and pasting text from Word or
some other tool of choice. Permitting that necessarily pollutes the
"binary" approach with byte streams delivered by text-oriented
applications.

If I could be sure that publCIF, say, can be compiled with libraries
that reliably transcode byte streams imported from clipboards and
file import (across the mess of SMB/NFS mounts etc. that exist in
the real world) - and equally reliably transcode its UTF8 encoded text
to the author's locale-based clipboard, then I'd be more willing to
promote option 3 to the top as the starting point at least for CIF
2.0 (but its "enforcement" does depend on the availability of such a
robust CIF-editing tool).
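
A minimal sketch of what such an import step might look like in Python, assuming
the editor accepts UTF-8 (with or without a byte-order mark) and BOM-marked UTF-16,
and falls back to a caller-supplied locale encoding only when asked to. The
function name import_as_utf8 and its fallback parameter are hypothetical; this
does not describe how publCIF works.

```python
import codecs
from typing import Optional


def import_as_utf8(raw: bytes, fallback: Optional[str] = None) -> bytes:
    """Re-encode imported bytes as UTF-8, or raise if they cannot be decoded."""
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the UTF-8 byte-order mark before decoding.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to choose the byte order.
        text = raw.decode("utf-16")
    else:
        try:
            text = raw.decode("utf-8")  # strict: invalid bytes raise
        except UnicodeDecodeError:
            if fallback is None:
                raise
            # Assumption: the caller knows the author's locale encoding.
            text = raw.decode(fallback)
    return text.encode("utf-8")
```

Note that the fallback branch cannot detect a wrong guess: most byte sequences
decode without error under a single-byte encoding such as Latin-1, which is one
reason the mappings are hard to make reliable.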

I prefer the UTF8 + UTF16 option over UTF8-only because of the
real-world use case that Herbert has described before; and in
existing imgCIF applications the UTF16 encoding is being done
rather carefully and for a specific purpose.

I put option 5 at the bottom because of the non-portability of a
"local" encoding.

Note, though, that whatever the outcome I would still favour the
discussion of character set encodings to be presented as a Part 3
to the complete CIF2 spec.

Best wishes
Brian
_________________________________________________________________________
Brian McMahon                                      tel: +44 1244 342878
Research and Development Officer                    fax: +44 1244 314888
International Union of Crystallography            e-mail:  bm@iucr.org
5 Abbey Square, Chester CH1 2HU, England

On Thu, Sep 23, 2010 at 10:37:48AM +1000, James Hester wrote:
> Dear CIF2 encoding participants,
>
> As Herbert has indicated, we are starting to run out of time for
> resolution of the encoding issue.  I believe that we have now explored
> the various proposals sufficiently to all have a good understanding of
> the consequences and advantages of each approach.  So, after a round
> of final comments, I propose that we vote on the general scheme that
> we recommend.  We can then flesh out the details of the particular
> scheme that we have settled on, and take this completed proposal to
> the DDLm group for their approval, following which we will present the
> entire CIF2 syntax document to COMCIFS for a formal vote.
>
> The proposals that I believe are still on the table are:
>
> 1. Herbert's 'as for CIF1 proposal' recently posted here and to COMCIFS.
> 2. Herbert's 'as for CIF1 proposal', together with Brian's proposal
> (if you agree that they are compatible)
> 3. UTF8-only as in the original draft
> 4. UTF8 + UTF16
> 5. UTF8, UTF16 + "local"
>
> I have not included the hashcode proposal as I believe it no longer
> has any supporters.
>
> We would need to conduct a preferential vote.  I stress that this is
> purely to determine the recommendation of this working group, and is
> not in any way binding on COMCIFS.
>
> James.
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding