Re: [Cif2-encoding] Revised Motion

Dear James,

   I know what John means, but it is not what his amendment says.
That is not because he is bad at wording this.  I have checked,
and a lot of people have tried for many years to come up
with something with similar intent and have failed.  That does
not mean we should stop trying, but it does mean that it _has_
to be decoupled from getting CIF2 moving, or we will be at
this forever, looping while trying to find words that express an
idea that does _not_ need to be settled to allow code to
be written and files to be created.

   Please, the current revised motion, as it stands, expresses
what is probably the only compromise we can reach in finite
time.  A significant majority of this group favors it.  You
were mistaken in thinking John would favor it.  So be it.
There is _nothing_ bad in the current motion.  Let us not
screw it up with John's poorly phrased paragraph.  If he
can come up with decent wording for what he is trying to say
before this summer, and there is broad support, we can always
include it at Madrid, but holding up progress on CIF2 any
further now is very unwise.

   Please let us have an end to this and go with what you and I
negotiated.

   Regards,
     Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================

On Fri, 1 Oct 2010, James Hester wrote:

> Hi Herbert - I think you are misinterpreting John.  He is not trying to
> impose ASCII encoding, he is simply using the term ASCII to refer to Unicode
> code points less than 127.  The only disagreement that I think you and I
> could have with his wording is that it does not leave open the possibility
> of COMCIFS approving encodings other than UTF8 and UTF16 (John is not
> against adding UTF16).
> 
> On Fri, Oct 1, 2010 at 4:04 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>       Dear John,
>
>       It appears you are proposing to add the words
>
>       "Reference to text files means binary representations of sequences of
>       characters, either in a system-dependent form, provided that the
>       characters are all drawn from the ASCII set, or alternatively as the
>       sequence of bytes resulting from encoding the character sequence
>       according to UTF-8."
> 
> That wording is, unfortunately, inaccurate and confusing, and it gets us
> back into the looping discussion of binary versus text.  It reopens
> exactly the issue we just tried to get away from: making it appear that
> CIF2 is going to invalidate encodings that happen to be neither
> ASCII nor UTF8.  I realize that is not what you intend, but that
> is what your paragraph seems to imply.
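A minimal Python sketch of one possible reading of the proposed wording may make the concern concrete. It assumes, for simplicity, that the "system-dependent form" branch means plain ASCII bytes, and the function name `fits_proposed_wording` is hypothetical. Under that reading, a Latin-1 file containing "é" is neither ASCII nor valid UTF-8, so it falls outside the definition.

```python
def fits_proposed_wording(data: bytes) -> bool:
    """Hypothetical check: True if data is pure ASCII or valid UTF-8.

    Simplification (assumption): the "system-dependent form" branch is
    modelled as ASCII bytes only; other platform encodings are ignored.
    """
    if data.isascii():  # branch 1: all characters drawn from the ASCII set
        return True
    try:
        data.decode("utf-8")  # branch 2: bytes produced by UTF-8 encoding
    except UnicodeDecodeError:
        return False
    return True


latin1_text = "caf\u00e9".encode("latin-1")  # b'caf\xe9'
utf8_text = "caf\u00e9".encode("utf-8")      # b'caf\xc3\xa9'

print(fits_proposed_wording(b"data_test"))  # True
print(fits_proposed_wording(utf8_text))     # True
print(fits_proposed_wording(latin1_text))   # False
```

Because every pure-ASCII byte sequence is also valid UTF-8, the first test is logically redundant; it is kept only to mirror the two branches of the quoted wording.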
> 
> This is not an easy concept to define.  I just went through a large
> number of text file definitions on the web, and it is amazing how
> flawed they are in one way or another.  For example, wordiq
> says, "Text files (plain text files) are files with generally a
> one-to-one correspondence between the bytes and ordinary readable
> characters such as letters and digits," but that definition fails to
> count UTF8 as text, because UTF8 maps multiple bytes to one readable
> character, and multiple, very different byte sequences can all map to
> the same readable character.  The W3C definition is even more vague
> than the CIF non-definition:
> 
> "The text Content-Type is intended for sending material which is
> principally textual in form. It is the default Content-Type. A "charset"
> parameter may be used to indicate the character set of the body text.
> The primary subtype of text is "plain". This indicates plain
> (unformatted) text. The default Content-Type for Internet mail is
> "text/plain; charset=us-ascii".
> 
> Beyond plain text, there are many formats for representing what might
> be known as "extended text" -- text with embedded formatting and
> presentation information. An interesting characteristic of many such
> representations is that they are to some extent readable even without
> the software that interprets them. It is useful, then, to distinguish
> them, at the highest level, from such unreadable data as images, audio,
> or text represented in an unreadable form. In the absence of appropriate
> interpretation software, it is reasonable to show subtypes of text to
> the user, while it is not reasonable to do so with most nontextual data.
> 
> Such formatted textual data should be represented using subtypes of
> text. Plausible subtypes of text are typically given by the common name
> of the representation format, e.g., "text/richtext"."
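The wordiq objection can be sketched with Python's standard `unicodedata` module. The sketch assumes that "very different byte sequences" mapping to one readable character refers to Unicode canonical equivalence (a precomposed "é" versus "e" followed by a combining acute accent); UTF-8 itself does not allow two different encodings of the same code point.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9) and as "e" plus a
# combining acute accent (U+0065 U+0301).
composed = unicodedata.normalize("NFC", "\u00e9")
decomposed = unicodedata.normalize("NFD", "\u00e9")

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# One readable character, but two bytes: no one-to-one byte/character mapping.
print(len(composed), len(composed_bytes))  # 1 2
print(composed_bytes.hex())                # c3a9
print(decomposed_bytes.hex())              # 65cc81

# Different byte sequences that a reader sees as the same character.
print(composed_bytes != decomposed_bytes)                     # True
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

# Pure ASCII keeps the one-byte-per-character property under UTF-8.
print(len("CIF".encode("utf-8")))  # 3
```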
> 
> Coming to an acceptable formal resolution on the meaning of "text"
> would seem likely to take a very, very long time.  We need to move on.
> 
> Please recall that what we are discussing is a revision to the
> existing, larger CIF 1.1 syntax definition to create the CIF2 syntax
> definition, and that we are just trying to get a clear enough definition
> of what users and software developers need to do to cope with the
> extension of the set of allowed code points beyond 126.
> 
> I would suggest that we go forward with the motion as it stands now
> and that we all carefully read the CIF 1.1 syntax definition to see if
> and where it might make sense to insert some clear, agreed definition
> of a text file at some future time, but I really don't think most users
> or software developers will have a serious problem getting started
> with CIF2 if, with this motion added, any ambiguity about the concept
> of a text file is left at the same level it has been under CIF1.
> 
> Once we have a clear, agreed understanding of the more metaphysical
> aspects of what text is, we can then share that with the
> community.  Meanwhile, they hopefully will already be using CIF2.
> 
> Regards,
>     Herbert
> 
> 
> On Thu, 30 Sep 2010, Bollinger, John C wrote:
> 
> >
> > On Thursday, September 30, 2010 8:40 AM, Herbert J. Bernstein wrote:
> >>   James and I had a good e-meeting and came up with the following
> >> revised wording.  If anybody objects to this motion, please speak
> >> up now.
> >
> > With apologies, I object.  This proposal has exactly the same problem
> > that options (1) and (2) did: it does not define "text file".  It is
> > worse in this case, however, because the problem cannot be fixed merely
> > by adding Herbert's definition (or mine).  In most environments that
> > definition does not encompass UTF-8 encoded text containing non-ASCII
> > characters, so the recommendation to use UTF-8 implies some other,
> > ill-defined definition.
> >
> > I am quite surprised that the result presented is so different from
> > James's recent compromise proposal, which seemed poised to serve as the
> > basis for a consensus result.  Perhaps a viable solution would be to
> > include a definition of "text file" derived from that proposal.
> >
> >
> > Regards,
> >
> > John
> > --
> > John C. Bollinger, Ph.D.
> > Department of Structural Biology
> > St. Jude Children's Research Hospital
> >
> >
> > Email Disclaimer:  www.stjude.org/emaildisclaimer
> >
> > _______________________________________________
> > cif2-encoding mailing list
> > cif2-encoding@iucr.org
> > http://scripts.iucr.org/mailman/listinfo/cif2-encoding
> >
> 
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> 
>
International Union of Crystallography
