Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics. .. ...
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Splitting of imgCIF and other sub-topics. .. ...
- From: James Hester <jamesrhester@xxxxxxxxx>
- Date: Fri, 24 Sep 2010 00:07:52 +1000
- In-Reply-To: <8F77913624F7524AACD2A92EAF3BFA5416659DEDCF@SJMEMXMBS11.stjude.sjcrh.local>
- References: <AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229542@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikTee4PicHKjnnbAdipegyELQ6UWLXz9Zm08aVL@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229552@SJMEMXMBS11.stjude.sjcrh.local><AANLkTinZ4KNsnREOOU6sVFdGYR_aQHcjdWr_ko648NGm@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA5416659DED8C@SJMEMXMBS11.stjude.sjcrh.local><AANLkTintziXhwVCEFD0yUtTDo9KG8ut=oL4OgmkjmEBe@mail.gmail.com><alpine.BSF.2.00.1008240629120.23114@epsilon.pair.com><AANLkTi=+qZQrWJ3duOzWyPq5H=w1GOVbeKRfFLTR8u5a@mail.gmail.com><alpine.BSF.2.00.1008240920580.23114@epsilon.pair.com><AANLkTikRLKp6oREvD4KcgUd-H-Cu6xoOrGWgQE1zUyx7@mail.gmail.com><alpine.BSF.2.00.1009022333190.52468@epsilon.pair.com><AANLkTimLUnUjNuS9EmMbtTurxB3MGtGvM6gWxZw6aRLE@mail.gmail.com><alpine.BSF.2.00.1009030735110.95035@epsilon.pair.com><AANLkTinxkquC5cY0m23yzBVgm7afmYYfh6+2yMz=Hr_w@mail.gmail.com><alpine.BSF.2.00.1009100711070.59446@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA5416659DEDBD@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikuoQEU-rv9GkTqqc0u0qgd1ugf+cGTfqF77j-E@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA5416659DEDC0@SJMEMXMBS11.stjude.sjcrh.local><AANLkTiks-tEAU9T_ygwvNhs_YpzE1+ZVb=K_=0DT8UuK@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA5416659DEDCF@SJMEMXMBS11.stjude.sjcrh.local>
In this email I try to pin down what supporting local encoding might imply. I think it is fair to say that John is advocating including "local" encoding in the list of CIF2 encodings because:

(i) this will be the default encoding assumed by text editors
(ii) there will be a significant tendency among programmers not to specify encoding when reading/writing CIF files

I was surprised to read that we were worried about programmers not getting the message, having assumed up until now that we were concerned only about ordinary users not coming to grips with non-local encoding.

Anyway, let's put ourselves in the programmer's shoes on a system for which local encoding is not UTF8/16:

Programmer A wants to support UTF8, UTF16 and local. When reading a CIF file, she *must* first try UTF8, then UTF16, and only then local, because a UTF8 file will most probably read in without error as a file in local encoding. However, this programmer is not one of those identified in (ii), because she is actively setting UTF8 and 16 as input encodings.

Programmer B wants no business with setting encodings, and so supports only reading/writing local encoding. His program will unfortunately also read in UTF8 files assuming local encoding. The program thus behaves correctly only if the user always remembers to either produce or transcode CIF files to local encoding, assuming that the user has read the documentation for the program sufficiently to know that this is even an issue. As an added bonus, this user has to know what the local encoding is, as the programmer is presumably not making any effort to find out and communicate it (as this would actually be more work than just specifying the damn encoding already).

I believe that this is an unworkable situation, and not one that we should facilitate. My point being that reason (ii) (lazy programmers) is not a good justification for keeping local encoding on the list of acceptable encodings.
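Programmer A's read order can be sketched roughly as follows. This is only an illustration in Python, not code from any existing CIF library: the function name decode_cif_bytes is hypothetical, and the sketch assumes that UTF16 files carry a byte-order mark, since without one many local-encoded byte strings would also decode as UTF16 without error.

```python
import codecs
import locale

# Assumption: UTF-16 input is only recognised when it starts with a BOM.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def decode_cif_bytes(raw, local_encoding=None):
    """Decode raw file bytes, trying UTF-8, then UTF-16, then local.

    Returns a (text, encoding_name) tuple. Hypothetical helper for
    illustration only.
    """
    if local_encoding is None:
        # Ask the platform for its default text encoding.
        local_encoding = locale.getpreferredencoding(False)

    # 1. UTF-8 is strict: most non-ASCII text in a single-byte local
    #    encoding is not valid UTF-8, so a failure here is informative.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    # 2. UTF-16, accepted only when a byte-order mark is present.
    if raw.startswith(UTF16_BOMS):
        try:
            return raw.decode("utf-16"), "utf-16"
        except UnicodeDecodeError:
            pass

    # 3. Local encoding last. Single-byte encodings such as Latin-1
    #    accept any byte sequence, which is why this cannot come first.
    return raw.decode(local_encoding), local_encoding
```

Programmer B's problem shows up if the first two steps are skipped: with Latin-1 standing in for the local encoding, the UTF8 bytes for "café" decode without any error to "cafÃ©".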
I have never seen reason (i) as sufficient justification.

In any case, I do not think that there will be many Programmer Bs. Note the following points:

(a) Dealing with encoding in most common languages is simple. Note that even modern Fortran can handle UTF8 - see the code snippets at http://coding.derkeiler.com/Archive/Fortran/comp.lang.fortran/2008-08/msg00395.html and http://gcc.gnu.org/onlinedocs/gfortran/SELECTED_005fCHAR_005fKIND.html
(b) If UTF8 is to be supported for reading, files have to be opened explicitly in UTF8, so the programmer is already explicitly specifying encoding
(c) The audience of programmers for CIF is (unfortunately) rather small. They can all be reached very easily for active education on how and why UTF8 encoding is specified.

And the lack of local encoding can be managed simply: UTF8 encoding and "local" will almost always coincide in the ASCII space, so the absence of local encoding in the acceptable encoding list is invisible on day one of CIF2. Introduction of non-ASCII characters into CIFs can then be managed from Chester through gradual introduction of non-ASCII dataname values, first in non-critical places. Chester can monitor the proportion of incorrectly encoded files received and calibrate a response.

However you all assess the above points, I think it is clear that John and I will have to agree to disagree on the value of local encodings. The root cause, I think, is differing perceptions of programmer responsiveness to the standard. I appreciate John's efforts to find a compromise, but I believe we have exhausted our avenues in this direction.

Well, I'm ready to vote. Would anybody else like to make any final points before we call for a vote?

James.

On Sat, Sep 18, 2010 at 1:33 AM, Bollinger, John C <John.Bollinger@stjude.org> wrote:
[...]
> Unfortunately, that train has long since left us behind on the platform.
> New standard notwithstanding, I don't see an opportunity to effect an abrupt shift in program and user behavior -- specifically, the behavior of using default text conventions implicitly and routinely. If we formally require UTF-8/16, it can only be with the understanding that many users and programs will ignore that requirement altogether. I don't find that at all appealing or useful, and I do not support it.
>
> I think we will achieve more consistent CIF2 software, and we will better influence programmers and users, by standardizing the use of default text conventions with CIF2. I would be content to deprecate such use. I would favor non-normative commentary in the spec that explains the issue and discourages reliance on default text encoding. I would also favor publicizing resources describing how to convert local text to UTF-8 (or -16), and creating such resources if necessary. I want to see people using UTF-8/16 for their CIFs, but I don't want to cut them off, standards-wise, when they don't.
>
> [...]
>>In fact, it is rather difficult to find any instructions as to how to determine the platform's "local" encoding.
>
> The point of default conventions is that you don't have to determine what they are, you just use them. In fact, in some programming environments, there is no easy way to do otherwise. For example, to the best of my knowledge, there is no way to write a standard-conformant Fortran 95 program that portably reads text from a file in anything but the default encoding.

If you don't know what your input encoding is, how do you transcode to UTF8?

[...]
> The mechanism for reliable transmission is to transcode, if necessary, to UTF-8/16, and transmit the result. This is exactly the same mechanism that would be available for reliable transmission if UTF-8 were the only standardized encoding (under which case I include transmission of non-UTF-8 almost-CIFs).
> The mechanism is the same for reliably sharing CIFs among environments where compatibility of default conventions is uncertain. I see no reason to believe that users' decisions whether to employ that mechanism will be driven by anything other than practical considerations, the standard's position notwithstanding. I would expect some programmers to be more influenced by the standard, but in the end they are faced with the same practical considerations.
>
>> And so on. Frankly, I still see no merit in including local encodings in CIF2 at all.
>
> I value standardizing behavior that we all (I think) expect will be common, even though that behavior isn't ideal. In that way I expect to support well-defined and consistent responses to that behavior (mainly in software). Given that I have said so before without persuading you, we will have to agree to disagree here.
>
>> but instead will attempt to mitigate the damage by supporting the following moves:
>>
>>(i) compliant CIF processors are *not* required to accept files in local encoding;
>
> It is inconsistent to allow local text conventions in the file format definition, but to permit conformant processors to reject them. Additionally, I oppose inclusion of any explicit requirements on CIF processors, preferring instead to rely on the format specification to define what conformant processors must do. I could, however, accept defining separate flavors of CIF distinguished by these encoding distinctions, so that programs could conform to one, the other, or both. I'm not sure I like that, but I think I could agree to it if it helps us wrap this up.
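The transcoding step discussed in the quoted text might look something like the sketch below. It is an illustration only: transcode_to_utf8 is a hypothetical name, and falling back to the value reported by Python's locale.getpreferredencoding is an assumption about what "local" means, which is correct only if the file really was written with the platform default.

```python
import locale


def transcode_to_utf8(src_path, dst_path, source_encoding=None):
    """Rewrite a text file as UTF-8 and return the encoding assumed for it.

    Hypothetical helper for illustration only. If source_encoding is not
    given, the platform's preferred encoding is assumed; nothing here can
    verify that this guess matches how the file was actually written.
    """
    if source_encoding is None:
        source_encoding = locale.getpreferredencoding(False)

    # newline="" disables newline translation, so line endings are
    # carried over unchanged in both directions.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return source_encoding
```

Returning the encoding that was assumed gives the caller something to report to the user, which is the piece of information the exchange above suggests is usually missing.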