
Re: [ddlm-group] options/text vs binary/end-of-line. .. .

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] options/text vs binary/end-of-line. .. .
From: James Hester <[email protected]>
Date: Tue, 22 Jun 2010 10:43:48 +1000
In-Reply-To: <[email protected]>

I agree with your paragraph.  I'm ready for your next step...

On Tue, Jun 22, 2010 at 10:23 AM, Herbert J. Bernstein
<[email protected]> wrote:
> OK, so we are at least in agreement with the concept of a text file.
> Now let's deal with what that means to users:
>
> It means that they can edit a file on some reasonable range of
> machines with a text editor, read it with the text-reading
> libraries for some reasonable range of programming languages
> on some reasonable range of machines, and write it with
> text editors and the text-writing libraries of programming
> languages on some reasonable range of machines, and that they
> have some reasonable way to print the file on a piece of paper
> and read it, seeing the essential content of the file.
>
> Do we all agree to those implications of saying we are dealing
> with a text file?
>
> (Yes, this is a trick question -- to find out if we have a
> text interchange format or if we are just dealing with
> a binary file under false colors).
>
> Regards,
>   Herbert
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  [email protected]
> =====================================================
>
> On Tue, 22 Jun 2010, James Hester wrote:
>
>> As Simon says, to agree to this wording requires agreeing to multiple
>> encodings.  We have not agreed to that yet.  I would however agree to
>> the following wording, which has removed any reference to encoding,
>> and inserted John's suggestion for EOL treatment.
>>
>> "CIF2 is a specification for the interchange of text files. This
>> document is therefore written in terms of a sequence of Unicode
>> code points.  Particular care must be taken with treatment of
>> newline in text files. This document will only refer to <0x000A>
>> as a line terminator, as CIF2 processors are required to map
>> <0x000D>, <0x000A> and <0x000D><0x000A> to this character.
>>
>> To ensure compatibility with older Fortran text processing software,
>> lines in CIF2 files should be restricted to no more than 2048
>> code points in length, not including the line terminator itself."
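A minimal sketch of the newline mapping and line-length limit in the wording above, assuming Python (a language not discussed in this thread). The names `normalize_newlines` and `overlong_lines` are hypothetical. Python's `len` on a `str` counts code points, which matches how the limit is stated.

```python
import re
from typing import List

MAX_LINE_CODE_POINTS = 2048  # limit taken from the proposed wording


def normalize_newlines(text: str) -> str:
    """Map <0x000D><0x000A> and lone <0x000D> to <0x000A>.

    The two-character sequence is listed first in the pattern so that
    CR LF becomes one line terminator rather than two.
    """
    return re.sub(r"\r\n|\r", "\n", text)


def overlong_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines exceeding the limit.

    Lengths are counted in code points and exclude the line terminator.
    """
    lines = normalize_newlines(text).split("\n")
    return [n for n, line in enumerate(lines, start=1)
            if len(line) > MAX_LINE_CODE_POINTS]
```

Splitting on "\n" alone, rather than with `str.splitlines`, is deliberate here: `splitlines` also breaks on several other characters, which would not match the three sequences named in the wording.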
>>
>> On Tue, Jun 22, 2010 at 3:44 AM, Herbert J. Bernstein
>> <[email protected]> wrote:
>>>
>>> Dear Colleagues,
>>>
>>>   The IUCr is an international organization.  Is it really politically
>>> wise to insist that CIF2 tags be restricted to unaccented roman letters?
>>>
>>>   Before we go much further, may we please have a vote on explicitly
>>> changing CIF2 from the current draft wording that it is a binary
>>> format to the wording I suggested making it a text format.  Most of the
>>> rest of the issues we are dealing with hinge on that basic decision.
>>>
>>>   The wording I proposed was:
>>>
>>> "CIF2 is a specification for the interchange of text files.  Text files
>>> have many possible system-dependent representations and encodings.  To
>>> ensure clarity in the specification of CIF2, this document is written
>>> in terms of a sequence of Unicode code points, and all fully compliant
>>> CIF2 processing systems should, at a minimum, be able to process
>>> text files as Unicode code points represented in UTF-8, subject to the
>>> XML-based restrictions below.  This approach is not meant to prevent
>>> people from preparing valid CIF2 files with non-UTF-8-based text
>>> editors, but, if a non-UTF-8 file format is produced, it is important
>>> to clearly specify the intended mapping to UTF-8.  This is particularly
>>> important in dealing with end-of-line indicators (see
>>> http://en.wikipedia.org/wiki/Newline).  When handling CIF2 files
>>> produced under MS Windows, CR-LF sequences should be accepted as
>>> an alternative to LF, and when handling CIF2 files produced under
>>> Mac OS, CR should be accepted as an alternative to LF.  This document
>>> will only refer to LF as a line terminator and will assume that some
>>> appropriate system-dependent text processing system will handle
>>> the necessary conversion.
>>>
>>> To ensure compatibility with older Fortran text processing software,
>>> lines in CIF2 files should be restricted to no more than 2048
>>> code points in length, not including the line terminator itself.
>>> Note that the UTF-8 encoding of such a line may well be much longer."
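The closing remark of the proposed wording, that the UTF-8 encoding of a 2048-code-point line can be much longer, can be illustrated with a short sketch. Python is assumed here, and `line_lengths` is a hypothetical helper name. UTF-8 uses between one and four bytes per code point, so a line at the limit can occupy up to 8192 bytes.

```python
from typing import Tuple


def line_lengths(line: str) -> Tuple[int, int]:
    """Return (code point count, UTF-8 byte count) for one line."""
    return len(line), len(line.encode("utf-8"))


# U+0041 (A) takes one byte in UTF-8, U+00C5 (A with ring) takes two,
# and code points above U+FFFF, such as U+1D400, take four.
ascii_line = "\u0041" * 2048
accented_line = "\u00c5" * 2048
astral_line = "\U0001D400" * 2048
```

Software that sizes buffers in bytes therefore needs to allow for the encoded length, not the code-point length, if it reads UTF-8 directly.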
>>>
>>> If anybody objects to some specific wording in this text, let us
>>> settle on revised wording.  We need to get this basic issue
>>> clarified in writing or we will be going in circles forever.
>>>
>>>
>>>   Regards,
>>>     Herbert
>>>
>>>
>>>
>>> At 11:30 AM -0500 6/21/10, Bollinger, John C wrote:
>>>>
>>>> On Monday, June 21, 2010 1:13 AM, James Hester wrote:
>>>>
>>>>> I prefer the XML treatment of newline (i.e. translated to 0x000A for
>>>>> processing purposes).  I would be in favour of restricting newline to
>>>>> <0x000A>, <0x000D> or <0x000D 0x000A>, which means that only these
>>>>> combinations have the syntactic significance of a newline.
>>>>
>>>> I would be satisfied with that approach.
>>>>
>>>>> From memory, this significance is restricted to:
>>>>>
>>>>> 1. end of comment
>>>>> 2. whitespace
>>>>> 3. use in <eol><semicolon> digraph
>>>>
>>>> The significance also extends to 'single'- and "double"-quote
>>>> delimited data values, in that these cannot contain end-of-line.
>>>>
>>>>> I would also restrict the appearance of the remaining Unicode newline
>>>>> characters to delimited datavalues, to maintain consistent display of
>>>>> data files.
>>>>
>>>> I'm seeing more and more upside to restricting *all* non-ASCII
>>>> characters to delimited data values.  I don't have any objection to
>>>> restricting U+0085, U+2028, and U+2029 (did I miss any?) to such
>>>> contexts.
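As a rough illustration of the restriction discussed above, the following Python sketch only locates the three code points named in the message. It does not decide whether an occurrence lies inside a delimited data value, since that would need a real CIF2 parser. The names `EXTRA_NEWLINES` and `find_extra_newlines` are hypothetical.

```python
from typing import List, Tuple

# The three code points named in the message above, with short labels.
EXTRA_NEWLINES = {
    "\u0085": "NEXT LINE",
    "\u2028": "LINE SEPARATOR",
    "\u2029": "PARAGRAPH SEPARATOR",
}


def find_extra_newlines(text: str) -> List[Tuple[int, str]]:
    """Return (offset, label) for each U+0085, U+2028 or U+2029 in text.

    Offsets are 0-based and counted in code points.
    """
    return [(i, EXTRA_NEWLINES[ch])
            for i, ch in enumerate(text)
            if ch in EXTRA_NEWLINES]
```

A parser could feed the reported offsets into its own tokenizer state to check whether each one falls inside a delimited value.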
>>>>
>>>>
>>>> John
>>>> --
>>>> John C. Bollinger, Ph.D.
>>>> Department of Structural Biology
>>>> St. Jude Children's Research Hospital
>>>>
>>>>
>>>>
>>>>
>>>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> [email protected]
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>
>
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148