Re: [Cif2-encoding] [ddlm-group] options/text vs binary/end-of-line
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] [ddlm-group] options/text vs binary/end-of-line
- From: "Herbert J. Bernstein" <yaya@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 26 Aug 2010 07:24:49 -0400 (EDT)
- In-Reply-To: <AANLkTi=tzw3gqS1Hn199QMXPd1eY7Jf1Zxf0tnaz3ggF@mail.gmail.com>
- References: <AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><520427.68014.qm@web87001.mail.ird.yahoo.com><a06240800c84ac1b696bf@192.168.2.104><614241.93385.qm@web87016.mail.ird.yahoo.com><alpine.BSF.2.00.1006251827270.70846@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166122952D@SJMEMXMBS11.stjude.sjcrh.local><33483.93964.qm@web87012.mail.ird.yahoo.com><AANLkTilqKa_vZJEmfjEtd_MzKhH1CijEIglJzWpFQrrC@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229542@SJMEMXMBS11.stjude.sjcrh.local><AANLkTikTee4PicHKjnnbAdipegyELQ6UWLXz9Zm08aVL@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA541661229552@SJMEMXMBS11.stjude.sjcrh.local><AANLkTinZ4KNsnREOOU6sVFdGYR_aQHcjdWr_ko648NGm@mail.gmail.com><8F77913624F7524AACD2A92EAF3BFA5416659DED8C@SJMEMXMBS11.stjude.sjcrh.local><AANLkTintziXhwVCEFD0yUtTDo9KG8ut=oL4OgmkjmEBe@mail.gmail.com><639601.73559.qm@web87008.mail.ird.yahoo.com><AANLkTi=tzw3gqS1Hn199QMXPd1eY7Jf1Zxf0tnaz3ggF@mail.gmail.com>
Um, but CIF1 is _not_ ascii-only.  It is text in any acceptable local
encoding.

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Thu, 26 Aug 2010, James Hester wrote:

> Hi Simon and others,
>
> What Simon describes accords closely with my perception of the
> situation, except that your final point regarding CIF2 requiring users
> to abandon text editors will depend on how we resolve the encoding
> issue.  For me the logical conclusion from the points you make is to
> stick to UTF8-only encoding, which will keep the large majority of
> users and developers happy.  Unfortunately others have the perception
> that UTF8-only will be overly restrictive, and lacking hard data we
> are having trouble deciding which of these two perceptions is
> correct.  Clearly UTF8-only is not overly restrictive *now* because it
> is *less* restrictive than the (de-facto) CIF1 situation of ASCII-only
> which has served us well.  UTF8 may be restrictive in the future when
> users of non Latin-1 code points find that they don't know how or
> can't use their favourite text editors for putting those code points
> into a CIF, but I'm not sure even the users themselves could answer
> the question now as to how likely that is going to be.
>
> What I would suggest as a cautious compromise is to leave the door
> open for adding non UTF8 encodings in the future, but not describing
> any scheme for doing this at present.  One way to leave the door open
> like this would be to declare that the first line of a CIF2 file is
> 'special', and is reserved for future expansion.  Our discussions on
> Scheme B are sufficiently far advanced to indicate that conventions
> relating to encoding schemes could be managed in the first line.  The
> question of how strictly something like Scheme B should be applied
> remains open, and could be addressed once more in-field experience has
> been gained.
>
>
> On Thu, Aug 26, 2010 at 9:08 AM, SIMON WESTRIP
> <simonwestrip@btinternet.com> wrote:
>> Dear all
>>
>> Recent contributions have stimulated me to revisit some of the
>> fundamental issues of the possible changes in CIF2 with respect to
>> CIF1, in particular, the impact on current practice (as I perceive
>> it, based on my experience).  The following is a summary of my
>> thoughts, trying to look at this from two perspectives (forgive me
>> if I repeat earlier opinions):
>>
>> 1) User perspective
>>
>> To date, in the 'core' CIF world (i.e. single-crystal and its
>> extensions), users treat CIFs as text files, and expect to be able
>> to read them as such using plain-text editors, and indeed edit them
>> if necessary for e.g. publication purposes.  Furthermore, they
>> expect them to be readable by applications that claim that ability
>> (e.g. graphics software).
>>
>> The situation is slightly different with mmCIF (and the pdb
>> variants), where users tend to treat these CIFs as data sources that
>> can be read by applications without any need to examine the raw CIF
>> themselves, let alone edit them.
>>
>> Although the above statements only encompass two user groups and are
>> based on my personal experience, I believe these groups are the
>> largest when talking about CIF users?
>>
>> So what is the impact on such users of introducing the use of
>> non-ASCII text and thus raising the text encoding issue?
>>
>> In the latter case, probably minimal, inasmuch as the users don't
>> interact directly with the raw CIF and rely on CIF processing
>> software to manage the data.
>>
>> In the former case, it is quite possible that a user will no longer
>> be able to edit the raw CIF using the same plain-text editor they
>> have always used for such purposes.  For example, if a user receives
>> a CIF that has been encoded in UTF16 by some remote CIF processing
>> system, and opens it in a non-UTF16-aware plain-text editor, they
>> will not be presented with what they would expect, even if the
>> character set in that particular CIF doesn't extend beyond ASCII;
>> furthermore, even 'advanced' text editors would struggle if the
>> encoding were e.g. UTF16BE (i.e. has no BOM).  Granted, this example
>> is equally applicable to CIF1, but by 'opening up' multiple
>> encodings, the probability of their usage increases?
>>
>> So as soon as we move beyond ASCII, we have to accept that a large
>> group of CIF users will, at the very least, have to be aware that
>> CIF is no longer the 'text' format that they once understood it to
>> be?
>>
>> 2) Developer perspective
>>
>> I believe that developers presented with a documented standard will
>> follow that standard and prefer to work with no uncertainties,
>> especially if they are unfamiliar with the format (perhaps just need
>> to be able to read a CIF to extract data relevant to their
>> application/database...?)
>>
>> Taking the example of XML, in my experience developers seem to
>> follow the standard quite strictly.  Most everyday applications that
>> process XML are intolerant of violations of the standard.
>> Fortunately, it is largely only developers that work with raw XML,
>> so the standard works well.
>>
>> In contrast to XML, with HTML/javascript the approach to the
>> 'standard' is far more tolerant.  Though these languages are
>> standardized, in order to compete, the leading application
>> developers have had to adopt flexibility (e.g. browsers accept
>> 'dirty' HTML, are remarkably forgiving of syntax violations in
>> javascript, and alter the standard to achieve their own ends or
>> facilitate user requirements).  I suspect this results largely from
>> the evolution of the languages: just as in the early days of CIF,
>> encouragement of use and the end results were more important than
>> adherence to the documented standard?
>>
>> Note that these same applications that are so tolerant of
>> HTML/javascript violations are far less forgiving of malformed XML.
>> So is the lesson here that developers expect new standards to be
>> unambiguous and will code accordingly (especially if the new
>> standard was partly designed to address the shortcomings of its
>> ancestors)?
>>
>>
>> Again, forgive me if this all sounds familiar - however, before
>> arguing one way or the other with regard to specifics, perhaps the
>> wider group would like to confirm or otherwise the main points I'm
>> trying to assert, in particular, with respect to *user* practice:
>>
>> 1) CIF2 will require users to change the way they view CIF - i.e.
>> they may be forced to use CIF2-compliant text editors/application
>> software, and abandon their current practice.
>>
>> With respect to developers, recent coverage has been very
>> insightful, but just out of interest, would I be wrong in stating
>> that:
>>
>> 2) Developers, especially those that don't specialize in CIF, are
>> likely to want a clear-cut universal standard that does not require
>> any heuristic interpretation.
>>
>> Cheers
>>
>> Simon
>>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> cif2-encoding mailing list
> cif2-encoding@iucr.org
> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
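
The UTF16 example in Simon's message can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the CIF fragment is made up, `sniff_bom` is a hypothetical helper (not part of any CIF library or of Scheme B), and Latin-1 stands in for whatever single-byte encoding a non-UTF16-aware editor might assume.

```python
import codecs
from typing import Optional

# Hypothetical ASCII-only CIF fragment (not taken from the thread).
cif_text = "data_example\n_cell_length_a 5.431\n"

utf8_bytes = cif_text.encode("utf-8")         # same bytes as plain ASCII
utf16be_bytes = cif_text.encode("utf-16-be")  # explicit byte order, no BOM
utf16_bytes = cif_text.encode("utf-16")       # BOM, then native byte order


def sniff_bom(data: bytes) -> Optional[str]:
    """Return an encoding name if data starts with a UTF-8/UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None  # no BOM: the encoding must be guessed or declared elsewhere


# What a single-byte (here Latin-1) editor would make of the UTF-16BE bytes:
# every ASCII character is preceded by a NUL character.
as_seen_by_legacy_editor = utf16be_bytes.decode("latin-1")

print(utf8_bytes == cif_text.encode("ascii"))
print(sniff_bom(utf16_bytes))
print(sniff_bom(utf16be_bytes))
print(repr(as_seen_by_legacy_editor[:8]))
```

The ASCII-only content is byte-for-byte unchanged under UTF-8, which is why UTF8-only is no more restrictive than ASCII-only for existing files, whereas the BOM-less UTF-16BE bytes give a reader nothing to sniff and look like NUL-interleaved text to a legacy editor.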