Re: [ddlm-group] options/text vs binary/end-of-line. .. .
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .
- From: James Hester <jamesrhester@gmail.com>
- Date: Tue, 22 Jun 2010 10:43:48 +1000
- In-Reply-To: <alpine.BSF.2.00.1006212018430.91069@epsilon.pair.com>
- References: <alpine.BSF.2.00.1005111250250.60002@epsilon.pair.com><20100614142541.GA356@emerald.iucr.org><AANLkTikeIbft9SKfvpgTpGZVpo47Vg_acYBbXi-eUvU-@mail.gmail.com><alpine.BSF.2.00.1006152223480.59900@epsilon.pair.com><AANLkTimmOPFkQhY1KY24Dg5kz3MUB4mO2sjoM848bqjV@mail.gmail.com><alpine.BSF.2.00.1006160719520.58405@epsilon.pair.com><881462.27872.qm@web87009.mail.ird.yahoo.com><AANLkTin51hXra-cIPzH3VMcUxJHMaUPWL71Kf1zM8SNt@mail.gmail.com><alpine.BSF.2.00.1006172025070.91418@epsilon.pair.com><AANLkTimEn-5bOcLNsa1DSOjDS7XqFmqVKA-W-6Z4NxFO@mail.gmail.com><alpine.BSF.2.00.1006172107430.91418@epsilon.pair.com><AANLkTilJUtXpw5UFQv0Y04Knrv9wCPLr5eertWPCcTzz@mail.gmail.com><alpine.BSF.2.00.1006180703230.91255@epsilon.pair.com><alpine.BSF.2.00.1006180837330.91255@epsilon.pair.com><AANLkTildS0DVEj76rffd8sgXgno2INL8zkXI_qsBjSLP@mail.gmail.com><a06240803c845518a843e@192.168.2.104><AANLkTilyJE2mCxprlBYaSkysu1OBjY7otWrXDWm3oOT9@mail.gmail.com><alpine.BSF.2.00.1006212018430.91069@epsilon.pair.com>
I agree with your paragraph. I'm ready for your next step...

On Tue, Jun 22, 2010 at 10:23 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> OK, so we are at least in agreement with the concept of a text file.
> Now let's deal with what that means to users:
>
> It means that they can edit a file on some reasonable range of
> machines with a text editor, read it with the text-reading
> libraries for some reasonable range of programming languages
> on some reasonable range of machines, and write it with
> text editors and the text-writing libraries of programming
> languages on some reasonable range of machines, and they
> have some reasonable way to print the file on a piece of paper
> and read it, seeing the essential content of the file.
>
> Do we all agree to those implications of saying we are dealing
> with a text file?
>
> (Yes, this is a trick question -- to find out if we have a
> text interchange format or if we are just dealing with
> a binary file under false colors).
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Tue, 22 Jun 2010, James Hester wrote:
>
>> As Simon says, to agree to this wording requires agreeing to multiple
>> encodings. We have not agreed to that yet. I would however agree to
>> the following wording, which has removed any reference to encoding,
>> and inserted John's suggestion for EOL treatment.
>>
>> "CIF2 is a specification for the interchange of text files. This
>> document is therefore written in terms of a sequence of Unicode
>> code points. Particular care must be taken with treatment of
>> newline in text files. This document will only refer to <0x000A>
>> as a line terminator, as CIF2 processors are required to map
>> <0x000D>, <0x000A> and <0x000D><0x000A> to this character.
>>
>> To ensure compatibility with older Fortran text processing software,
>> lines in CIF2 files should be restricted to no more than 2048
>> code points in length, not including the line terminator itself."
>>
>> On Tue, Jun 22, 2010 at 3:44 AM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>>>
>>> Dear Colleagues,
>>>
>>> The IUCr is an international organization. Is it really politically
>>> wise to insist that CIF2 tags be restricted to unaccented roman letters?
>>>
>>> Before we go much further, may we please have a vote on explicitly
>>> changing CIF2 from the current draft wording that it is a binary
>>> format to the wording I suggested making it a text format. Most of the
>>> rest of the issues we are dealing with hinge on that basic decision.
>>>
>>> The wording I proposed was:
>>>
>>> "CIF2 is a specification for the interchange of text files. Text files
>>> have many possible system-dependent representations and encodings. To
>>> ensure clarity in the specification of CIF2, this document is written
>>> in terms of a sequence of Unicode code points, and all fully compliant
>>> CIF2 processing systems should, at a minimum, be able to process
>>> text files as Unicode code points represented in UTF-8, subject to the
>>> XML-based restrictions below. This approach is not meant to prevent
>>> people from preparing valid CIF2 files with non-UTF-8-based text
>>> editors, but, if a non-UTF-8 file format is produced, it is important
>>> to clearly specify the intended mapping to UTF-8. This is particularly
>>> important in dealing with end-of-line indicators (see
>>> http://en.wikipedia.org/wiki/Newline). When handling CIF2 files
>>> produced under MS Windows, CR-LF sequences should be accepted as
>>> an alternative to LF, and when handling CIF2 files produced under
>>> Mac OS, CR should be accepted as an alternative to LF. This document
>>> will only refer to LF as a line terminator and will assume that some
>>> appropriate system-dependent text processing system will handle
>>> the necessary conversion.
>>>
>>> To ensure compatibility with older Fortran text processing software,
>>> lines in CIF2 files should be restricted to no more than 2048
>>> code points in length, not including the line terminator itself.
>>> Note that the UTF-8 encoding of such a line may well be much longer."
>>>
>>> If anybody objects to some specific wording in this text, let us
>>> settle on revised wording. We need to get this basic issue
>>> clarified in writing or we will be going in circles forever.
>>>
>>>
>>> Regards,
>>> Herbert
>>>
>>>
>>>
>>> At 11:30 AM -0500 6/21/10, Bollinger, John C wrote:
>>>>
>>>> On Monday, June 21, 2010 1:13 AM, James Hester wrote:
>>>>
>>>>> I prefer the XML treatment of newline (i.e. translated to 0x000A for
>>>>> processing purposes). I would be in favour of restricting newline to
>>>>> <0x000A>, <0x000D> or <0x000D 0x000A>, which means that only these
>>>>> combinations have the syntactic significance of a newline.
>>>>
>>>> I would be satisfied with that approach.
>>>>
>>>>> From memory, this significance is restricted to:
>>>>>
>>>>> 1. end of comment
>>>>> 2. whitespace
>>>>> 3. use in <eol><semicolon> digraph
>>>>
>>>> The significance also extends to 'single'- and "double"-quote
>>>> delimited data values, in that these cannot contain end-of-line.
>>>>
>>>>> I would also restrict the appearance of the remaining Unicode newline
>>>>> characters to delimited datavalues, to maintain consistent display of
>>>>> data files.
>>>>
>>>> I'm seeing more and more upside to restricting *all* non-ASCII
>>>> characters to delimited data values. I don't have any objection to
>>>> restricting U+0085, U+2028, and U+2029 (did I miss any?) to such
>>>> contexts.
>>>>
>>>>
>>>> John
>>>> --
>>>> John C. Bollinger, Ph.D.
>>>> Department of Structural Biology
>>>> St. Jude Children's Research Hospital
>>>>
>>>>
>>>>
>>>>
>>>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>>
>>> --
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
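
Editorial illustration (not part of the original message): the end-of-line mapping and the 2048-code-point line limit discussed in this thread could be sketched roughly as follows. This is a minimal sketch, assuming the input has already been decoded to a Python string of Unicode code points. The function names `normalize_newlines`, `overlong_lines` and `utf8_length` are hypothetical and do not come from any CIF2 library or draft.

```python
import re
from typing import List

# Assumption: only the three sequences named in the thread count as line
# terminators. CR LF is listed first so it is not matched as CR then LF.
_EOL = re.compile(r"\r\n|\r|\n")

# Line-length limit proposed in the thread, counted in code points.
MAX_LINE_CODE_POINTS = 2048


def normalize_newlines(text: str) -> str:
    """Map <0x000D><0x000A>, <0x000D> and <0x000A> to a single <0x000A>."""
    return _EOL.sub("\n", text)


def overlong_lines(text: str, limit: int = MAX_LINE_CODE_POINTS) -> List[int]:
    """Return 1-based numbers of lines longer than `limit` code points.

    The terminator itself is not counted, matching the proposed wording.
    """
    lines = normalize_newlines(text).split("\n")
    return [n for n, line in enumerate(lines, start=1) if len(line) > limit]


def utf8_length(line: str) -> int:
    """Length in bytes of the UTF-8 encoding of `line`."""
    return len(line.encode("utf-8"))
```

The sketch splits on "\n" after normalization rather than using `str.splitlines()`, because `splitlines()` also breaks on U+0085, U+2028 and U+2029, which the thread proposes to treat differently from the three ASCII-range terminators. A line of 2048 two-byte characters passes the code-point check while occupying 4096 bytes in UTF-8, which is the situation the remark about UTF-8 lines being "much longer" refers to.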