
Re: [ddlm-group] Searching for a compromise on eliding. .. .

Dear John B.,

   I am not saying that 2.7 cannot handle non-ASCII encodings.  What I
am saying is that the recommended way to handle UTF-8 in 2.7 is to use
the u""" treble quote, rather than the """ treble quote, to take care
of any confusion caused by use of the octal and hex escapes.  In 3 it
works the other way: the """ treble quote does what one would expect
with the unicode code points, while you use b""" to get the ASCII-oriented
behavior.  People have learned to cope with """ under 2.7 in a UTF-8 
environment, and with """ under 3 in an ASCII environment, but 
personally, I find it is easier to cope if I use the full feature set, 
rather than just half of it.
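
   As a minimal sketch of the Python 3 side of this (assuming a Python 3
interpreter and a UTF-8 source file; the variable names are only
illustrative), the two kinds of treble-quoted literal behave like this:

```python
# Python 3: a plain treble-quoted literal is a sequence of Unicode code points.
text = """1 Å"""

# A b""" literal is ASCII-oriented: a non-ASCII character has to be spelled
# out as hex (or octal) escapes for its individual bytes.
raw = b"""1 \xc3\x85"""

# The same hex escape means different things in the two kinds of literal:
# one code point (U+00C5) in a str, one byte (0xC5) in a bytes object.
as_text = """\xc5"""
as_bytes = b"""\xc5"""

print(len(text), len(raw))                 # code points vs. bytes
print(text.encode("utf-8") == raw)         # the bytes are the UTF-8 encoding
print(as_text.encode("utf-8"), as_bytes)   # two bytes vs. one byte
```

In 2.7 the roles are reversed: the u""" form gives the code-point
behavior and the plain """ form gives the byte-oriented behavior.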

   To some extent, every implementation of any system is likely to have
some idiosyncrasies, and I suspect option P-prime will work for most
people, whether we use the 2.7 model or the 3 model, but as John W.
has suggested, we are likely to get more ready acceptance of CIF 2 if
we try as hard as we can to avoid the "not-invented-here" syndrome
and try to have as much common ground as we can tolerate with
something well-established, like Python.

   To be very, very clear, my personal preference is P, then P-prime,
and, as a last resort, F.  The further we get from imitating Python,
the more difficulty I think we are likely to have.

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769


On Fri, 25 Feb 2011, Bollinger, John C wrote:

> Dear Herbert,
> On Friday, February 25, 2011 12:25 PM, you wrote:
>> Allow me to clarify the choices involved among the 2.7 and 3 treble
>> quoted strings.  The Python 3 bytes strings (b""" or B""" or b''' or B''')
>> are essentially the Python 2.7 treble quoted strings, and are ASCII
>> oriented.  The Python 2.7 unicode strings (u""" or U""" or u''' or U''')
>> are essentially the Python 3 treble quoted strings, and are unicode
>> oriented.  The difference arises because 2.7 is based on 7-bit ASCII
>> and 3 is UTF8 oriented.  Thus 2.7 is a better design fit for compatibility
>> with CIF1.1 data, while 3 is a better design for compatibility
>> with CIF2 data.  The transition between CIF1.1 and CIF2 will go more
>> smoothly with both ASCII and Unicode supported in the string context,
>> i.e. with proposal P using both the ASCII-oriented treble quoted strings
>> _and_ the Unicode oriented unicode treble-quoted strings from 2.7 or
>> the functionally equivalent ASCII-oriented bytes treble quoted strings
>> _and_ Unicode oriented treble-quoted string from 3.
> As you have pointed out before, Python 2.7 is not limited to 7-bit ASCII.  I would therefore be surprised to find that Python 2.7 rejects a plain string in a UTF-8-encoded source file on account of it containing a literal character not belonging to the ASCII set.  If it in fact does so, then I agree that Python 2's is not a model that we can adopt for CIF 2.
> I would expect, however, that when Python encounters such a situation, it records the encoded byte sequence read from the source file in the resulting string object.  If adoption of Python 2 syntax for CIF 2 is supposed to imply such a thing about the interpretation or internal representation of non-ASCII character literals appearing in triple-quoted strings, then for that reason I agree that Python 2's is not a model that we can adopt for CIF 2.  We are in control of that, however, and I see no reason why we should adopt that approach.
> If neither of the above contraindications is present, then there is no reason why CIF 2 could not rely on Python 2 syntax to express triple-quoted strings containing literal Unicode characters, and doing so would be entirely compatible with Python 2.
> Python does provide different object types for Unicode strings and byte strings, thus it provides different, albeit related, syntax for expressing literals of each type.  Neither CIF 1 nor CIF 2 makes a comparable distinction, therefore CIF does not have the same need to distinguish string types that Python does.
> I agree, and I have said so before, that Python 3's Unicode-based model for default strings is a better match for CIF 2's data model than is Python 2's byte-oriented model.  My comments about the unsuitability for CIF's purposes of Python 2 octal and hex escapes reflect that.  It is the full syntax that I am unwilling to support, and especially the \N{name} escape.  Even the \uxxxx and \Uxxxxxxxx are unsatisfactory to me, however, on account of the fact that Python accounts it a syntax error if \u is not followed by at least four hex digits, or if \U is not followed by eight.
> I do appreciate the usefulness of escape sequences for representing general Unicode characters in CIF 2.  You will recall that I proposed a syntax that offered Python-inspired \u and \U escapes, though it received only lukewarm support.  It was always my intention, however, to follow the letter of the Python spec that unrecognized escape sequences are accepted without processing.  Had I realized at the time that Python implementations behaved differently, then I might not have made that proposal at all.
>> I don't see what is bad about using LGPL'd libraries.  That allows you
>> to kick start your application fast using other people's code and, when
>> you have a version with better performance or features, to swap in your own
>> improved version.  It is a powerful and effective software development
>> model.  As long as you stick to operating-system level libraries (and mine
>> are) there are absolutely no negative license implications for proprietary
>> software development, witness the use of CBFlib in proprietary
>> applications.
> I don't personally have a problem with LGPL'd libraries in general, but there certainly are people who do.  My objections in particular to CIF 2 relying on them include these:
> [General]
> 1) If we have to promise to provide libraries in order to make CIF 2 adequately accessible to developers, then CIF 2 is too complex.
> 2) I want to avoid a library implementation being mistaken for a normative specification, which evidently is a risk.
> 3) I am unsatisfied with having to include a copy of the UCD with every CIF 2 application, whether in library form or some other.
> 4) I do not favor CIF2 obligating COMCIFS, or through it IUCr, to maintain a software project, but
> 5) I also do not favor COMCIFS delegating responsibility for essential code to an outside party.
> [LGPL]
> 6) There indeed are people who cannot or will not use LGPL code for a variety of reasons.  I disfavor making CIF 2 unavailable to them.
> 7) Because CIF is patented, there is a non-trivial legal question regarding patent licensing.  In particular, the (L)GPL appears to have different implications for patent licensing if IUCr licenses or sponsors CIF software than it does if a third party who has no CIF patent license is the software licensor.  Whether IUCr would ultimately agree or not, I prefer a plan that does not require a legal review.
> Regards,
> John
> --
> John C. Bollinger, Ph.D.
> Department of Structural Biology
> St. Jude Children's Research Hospital
> Email Disclaimer:  www.stjude.org/emaildisclaimer
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
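
The escape-handling behavior discussed in the quoted message (a \u that
is not followed by four hex digits, and escape sequences that Python does
not recognize) can be checked directly.  The following is only a sketch,
assuming a Python 3 interpreter; try_literal is a hypothetical helper
written for this illustration, not part of any library.

```python
import warnings

def try_literal(source):
    """Evaluate a string literal given as source text.

    Returns the literal's value, or the SyntaxError raised while compiling it.
    """
    try:
        with warnings.catch_warnings():
            # Recent Python 3 releases warn about unrecognized escapes;
            # silence that so only hard errors are reported.
            warnings.simplefilter("ignore")
            return eval(compile(source, "<literal>", "eval"))
    except SyntaxError as exc:
        return exc

complete = try_literal(r'"""\u00c5"""')    # \u followed by four hex digits
truncated = try_literal(r'"""\u00c"""')    # \u followed by only three
unknown = try_literal(r'"""\q"""')         # not a recognized escape
in_bytes = try_literal(r'b"""\u00c5"""')   # \u is not an escape in bytes

for name, result in [("complete", complete), ("truncated", truncated),
                     ("unknown", unknown), ("in_bytes", in_bytes)]:
    print(name, type(result).__name__, repr(result))
```

Under these assumptions the truncated \u literal is rejected at compile
time, while the unrecognized \q escape and the \u inside a bytes literal
are kept as written, backslash included.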