--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
Re: [Cif2-encoding] Revised Motion
- To: Group for discussing encoding and content validation schemes for CIF2 <cif2-encoding@xxxxxxxx>
- Subject: Re: [Cif2-encoding] Revised Motion
- From: James Hester <jamesrhester@xxxxxxxxx>
- Date: Fri, 1 Oct 2010 11:35:15 +1000
- In-Reply-To: <alpine.BSF.2.00.1009301325040.76563@epsilon.pair.com>
- References: <alpine.BSF.2.00.1009271801070.86201@epsilon.pair.com><alpine.BSF.2.00.1009271900080.86201@epsilon.pair.com><AANLkTikudiXBk7orHSAH=JonoeQHeNXVrzvAZmH3Wt94@mail.gmail.com><646265.82162.qm@web87004.mail.ird.yahoo.com><alpine.BSF.2.00.1009281501030.93180@epsilon.pair.com><a06240801c8c840b90dc7@192.168.2.104><20100929102536.GB24670@emerald.iucr.org><alpine.BSF.2.00.1009291001300.12237@epsilon.pair.com><20100930084028.GC9485@emerald.iucr.org><alpine.BSF.2.00.1009300540110.389@epsilon.pair.com><629785.55688.qm@web87004.mail.ird.yahoo.com><a06240802c8ca32fa3108@192.168.2.104><a06240803c8ca416a932e@192.168.2.104><8F77913624F7524AACD2A92EAF3BFA5416659DEDF0@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1009301325040.76563@epsilon.pair.com>
On Fri, Oct 1, 2010 at 4:04 AM, Herbert J. Bernstein <yaya@bernstein-plus-sons.com> wrote:
Dear John,
It appears you are proposing to add the words
"Reference to text files means binary representations of sequences of
characters, either in a system-dependent form, provided that the
characters are all drawn from the ASCII set, or alternatively as the
sequence of bytes resulting from encoding the character sequence according
to UTF-8."
That wording is, unfortunately, inaccurate and confusing and gets us back into the
looping discussion of binary versus text. It opens up exactly the
issue we just tried to get away from, that of making it appear that
CIF2 is going to invalidate encodings that happen to be neither
ASCII nor UTF-8. I realize that is not what you intend, but that
is what your paragraph seems to imply.
This is not an easy concept to define. I just went through a large
number of text file definitions on the web, and it is amazing how
flawed they are in one way or another. For example, wordiq
says, "Text files (plain text files) are files with generally a one-to-one
correspondence between the bytes and ordinary readable characters such as
letters and digits," but that definition fails to treat a UTF-8 file as a text
file, because UTF-8 maps multiple bytes to readable characters
and multiple, very different byte sequences all map to the same
readable character. The W3C definition is even more vague than the
CIF non-definition:
"The text Content-Type is intended for sending material which is
principally textual in form. It is the default Content-Type. A "charset"
parameter may be used to indicate the character set of the body text. The
primary subtype of text is "plain". This indicates plain (unformatted)
text. The default Content-Type for Internet mail is "text/plain;
charset=us-ascii".
Beyond plain text, there are many formats for representing what might be
known as "extended text" -- text with embedded formatting and presentation
information. An interesting characteristic of many such representations is
that they are to some extent readable even without the software that
interprets them. It is useful, then, to distinguish them, at the highest
level, from such unreadable data as images, audio, or text represented in
an unreadable form. In the absence of appropriate interpretation software,
it is reasonable to show subtypes of text to the user, while it is not
reasonable to do so with most nontextual data.
Such formatted textual data should be represented using subtypes of text.
Plausible subtypes of text are typically given by the common name of the
representation format, e.g., "text/richtext"."
Coming to an acceptable formal resolution on the meaning of "text" would
seem likely to take a very, very long time. We need to move on.
Please recall that what we are discussing is a revision to the existing,
larger CIF 1.1 syntax definition to create the CIF2 syntax definition,
and we are just trying to get a clear enough definition of what users and
software developers need to do to cope with the extension of the
number of code points past 126.
I would suggest that we go forward with the motion as it stands now
and that we all carefully read the CIF 1.1 syntax definition to see if
and where it might make sense to insert some clear, agreed definition of
a text file at some future time. I really don't think most users or
software developers will have a serious problem in getting started with
CIF2 if, with this motion added, any ambiguity about the concept of a
text file is left at the same level it has been under CIF1.
Once we have a clear, agreed understanding of the more metaphysical
aspects of what text is, we can then share that with the
community. Meanwhile, they hopefully will already be using CIF2.
Regards,
Herbert
=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769
+1-631-244-3035
yaya@dowling.edu
=====================================================
On Thu, 30 Sep 2010, Bollinger, John C wrote:
>
> On Thursday, September 30, 2010 8:40 AM, Herbert J. Bernstein wrote:
>> James and I had a good e-meeting and came up with the following
>> revised wording. If anybody objects to this motion, please speak
>> up now.
>
> With apologies, I object. This proposal has exactly the same problem
> that options (1) and (2) did: it does not define "text file". It is
> worse in this case, however, because the problem cannot be fixed merely
> by adding Herbert's definition (or mine). In most environments that
> definition does not encompass UTF-8 encoded text containing non-ASCII
> characters, so the recommendation to use UTF-8 implies some other,
> ill-defined definition.
>
> I am quite surprised that the result presented is so different from
> James's recent compromise proposal, which seemed poised to serve as the
> basis for a consensus result. Perhaps a viable solution would be to
> include a definition of "text file" derived from that proposal.
>
>
> Regards,
>
> John
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
>
>
> Email Disclaimer: www.stjude.org/emaildisclaimer
>
> _______________________________________________
> cif2-encoding mailing list
> cif2-encoding@iucr.org
> http://scripts.iucr.org/mailman/listinfo/cif2-encoding
>
_______________________________________________
cif2-encoding mailing list
cif2-encoding@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif2-encoding