Discussion List Archives

Re: [ddlm-group] Simon's elide proposal


1.  I do think there is value in having CIF capable
of functioning as a programming language, and see
nothing to be gained by crippling its ability to
function in that role.  As noted, there is now a move
towards the creation of executable papers to allow
journal articles to be better reviewed.  Opening
the dREL features to more general use than just in
dictionaries would allow the IUCr to explore this
very important direction and be more competitive
with Elsevier, so I strongly disagree with John's
value judgement that a rich feature set is somehow

2.  I find the ability to have escapes to handle the
"illegal" characters useful for imgCIF, which needs
to be able to handle at least the range of 15 out
of 16 bits without breaks.

3.  I find the \N{}, \a, ... constructs useful for the reasons
in 1, above.  In point of fact, I think we would be best
off following Brian's original approach of being "maximally
disruptive" and requiring a uniform translation of all
the IUCr glyphs that conflict with current programming
practice to an escaped form \\a
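
To make the clash in point 3 concrete, here is a small Python 3 sketch. The variable names are only illustrative, and the reading of \a as Greek alpha is the IUCr markup convention referred to in this thread. It shows what Python itself does with \N{}, \a and \\a in an ordinary string literal.

```python
import unicodedata

# \N{...} selects a character by its Unicode name.
alpha_by_name = "\N{GREEK SMALL LETTER ALPHA}"

# In a Python literal, \a is the ASCII BEL control character (U+0007),
# whereas the IUCr markup convention reads \a as Greek alpha.
python_bell = "\a"

# Doubling the backslash keeps the two characters '\' and 'a' intact, so the
# IUCr code survives Python-style processing for later interpretation.
iucr_alpha_code = "\\a"

print(repr(alpha_by_name), unicodedata.name(alpha_by_name))
print(repr(python_bell), ord(python_bell))
print(repr(iucr_alpha_code), len(iucr_alpha_code))
```

Under this reading, a file that relies on the IUCr codes would need them written as \\a and so on if Python-style elides were processed first.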

In any case, I think you should get the point -- this
really is a matter of taste, not a technical issue.
I find Python compatibility a strong plus to help
move us into the executable paper realm, indeed
to help move CIF into being a scripting language.

If anyone wants even more detail, I will be happy
to send it in an off-list message, but I think it
really might be wise to address the broader issues
of what we want CIF to be, first, before we get
too far into those technical details.  Right now
I have to get packed to be able to catch a plane,
so the longer answer will have to wait until tomorrow.

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769


On Fri, 14 Jan 2011, James Hester wrote:

> Dear Herbert,
> Au contraire, I would not be bored, I'd be fascinated by a
> point-by-point rebuttal.  I find John's assessment spot-on and do not
> think dismissing his points as a matter of taste shows much respect
> for the amount of time he has put in to formulate these comments.
> Please go ahead and rebut his points.
> James.
> On Fri, Jan 14, 2011 at 5:42 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Dear Colleagues,
>>   I will not bore you all with a point-by-point rebuttal
>> to John B.'s negative assessment of Python triple-quote
>> use in a CIF context.  Most of what he sees as defects,
>> I see as virtues.  Such are differences in taste, and
>> more importantly in the uses to which we put CIF.  Especially
>> with the introduction of dREL and DDLm, I do see CIF as
>> a programming language, and as one with strong similarities
>> to Python.  That does not mean everybody has to use it that
>> way, just that it would be nice if those who use it one way
>> and those who use it another could find some common ground.
>> There is now a move towards executable papers, and I suspect
>> a more powerful and flexible Python-compatible CIF could be
>> a strong competitor in that area.  Indeed, if current trends
>> continue, the IUCr is likely to need programming support
>> in papers if it is to keep up.
>> One point that does need a rebuttal...
>>> It should also be noted that Python source code, including its string
>>> literals, is restricted to being expressed in the characters of the
>>> 7-bit ASCII character set (though they need not necessarily be encoded
>>> according to US-ASCII).  Unconditional, bidirectional CIF/Python string
>>> compatibility would require that we apply the same restriction to CIF2
>>> triple-quoted strings.  I would oppose that.
>> That started to change in Python 2.5 which allowed explicit encoding
>> declarations, and by Python 3 has vanished even without an
>> encoding declaration.  The Python 3 spec is:
>> "Python reads program text as Unicode code points; the encoding
>> ... defaults to UTF8"
>> For more on how Python dealt with this issue at the same time
>> we were considering it, see:
>> http://www.python.org/dev/peps/pep-3120/
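
As a quick check of the Python 3 default described above, the following sketch compiles source text supplied as raw UTF-8 bytes with no encoding declaration. The names source_bytes, namespace and symbol are illustrative only.

```python
# Source text as raw bytes: the string literal holds the two UTF-8 bytes
# for U+00C5 (the angstrom-like letter), and there is no coding declaration.
source_bytes = b'symbol = "\xc3\x85"\n'

namespace = {}
# Python 3 decodes undeclared source bytes as UTF-8 before parsing.
exec(compile(source_bytes, "<example>", "exec"), namespace)

print(repr(namespace["symbol"]), len(namespace["symbol"]))
```

The literal comes back as a single non-ASCII character, so Python 3 string literals are not limited to 7-bit ASCII.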
>> =====================================================
>>  Herbert J. Bernstein, Professor of Computer Science
>>    Dowling College, Kramer Science Center, KSC 121
>>         Idle Hour Blvd, Oakdale, NY, 11769
>>                  +1-631-244-3035
>>                  yaya@dowling.edu
>> =====================================================
>> On Thu, 13 Jan 2011, Bollinger, John C wrote:
>>> On Thursday, January 13, 2011 7:10 AM, SIMON WESTRIP wrote:
>>>> Let's assume we were starting with CIF2 that included a minimal 
>>>> scheme like F'.
>>> What then would be gained by adopting the full python specification of 
>>> string literals?
>>>> 1) "Cleaner" presentation in the very rare cases that the eliding 
>>>> system would be needed in order to accommodate delimiters within the 
>>>> value. This is purely a matter of taste.
>>>> 2) Ability to include raw strings using the 'r' prefix. But in CIF2 
>>>> as it stands, all strings are 'raw'.
>>> Yes, but that will no longer be true if any of the proposals we're 
>>> discussing is adopted.
>>>> Perhaps others can add to this list?
>>>> From the perspective of technical features only:
>>> 3) Three distinct forms for expressing Unicode characters via ASCII 
>>> characters; one is restricted to characters from the BMP, but the 
>>> others are general
>>> 4) Two forms for expressing 8-bit characters (from some undocumented 
>>> character set, probably the source character set) via ASCII characters
>>> 5) Several elides for specific whitespace and non-printing ASCII 
>>> characters, some of which are not among the allowed CIF characters, 
>>> and all of which clash with the IUCr application-level elides
>>> 6) A mechanism for indicating whether the three forms of Unicode 
>>> elides of item (3) should in fact be processed, or not.
>>> 7) A mechanism for representing a byte-string data object, or possibly 
>>> a stub for such a feature, depending on which Python version serves as 
>>> a reference
>>> Commentary:
>>> I think that makes a complete list of the new technical features that 
>>> full Python string literals would bring to CIF, beyond those of 
>>> proposal F.  I ignore a few semantic details that are mostly 
>>> consistent with the current CIF specifications.
>>> Python's is indeed a rich feature set, but that is one of my 
>>> objections to its use for CIF.  CIF is a data representation language, 
>>> not a programming language, so once the language can represent 
>>> everything in its present and future domain, alternative 
>>> representation mechanisms add little.  People can and do write CIF by 
>>> hand, but I don't think that use case is of sufficient import to 
>>> justify convenience features solely for its support, particularly when 
>>> such features present problems in other respects.
>>> Furthermore, Python admits essentially one implementation (changing 
>>> slowly over time), so a rich feature set does not present 
>>> compatibility problems.  CIF, however, anticipates many 
>>> implementations, so the number and complexity of its features 
>>> contribute to the likelihood of incompatibility between 
>>> implementations.
>>> Most importantly, however, I think several of the Python features are 
>>> inappropriate for CIF, and I specifically want them excluded:
>>> a) The \N{name} syntax for designating Unicode characters by UCD name. 
>>>  I view this as the single greatest locus for bugs and 
>>> incompatibility, both among CIF implementations and between CIF and 
>>> Python.  Large among the questions here is *which version of the UCD 
>>> is referenced*?  That can evolve over time in Python, but it must be 
>>> fixed in CIF, at least for each CIF version.  Shall we plan to issue a 
>>> new version of CIF every time Python moves up to a new Unicode 
>>> version, and to deal with the multiple resulting versions?  Must every 
>>> CIF2 implementation lug along a name=>character table just for this? 
>>>  It is redundant with the other two Unicode elides.
>>> b) The [uU] prefix.  In Python, Unicode strings are a different type 
>>> of object than ordinary strings, which is the main reason for the [uU] 
>>> syntax.  All CIF2 strings are Unicode strings, however (so there's an 
>>> unavoidable semantic difference regardless).  In CIF the [uU] prefix 
>>> could still turn on and off processing of Unicode elides, but to what 
>>> end?  In rare cases, to yield a slightly simpler representation of 
>>> strings that would otherwise clash with one of the Unicode elide 
>>> sequences.  Should we really require all conforming CIF processors to 
>>> implement rules to support that obscure case, even though it can 
>>> reasonably be handled by the \\ elide instead?
>>> c) The [bB] prefix.  I'm not clear on what it will mean in Python 3, 
>>> but it is ignored in Python 2.  The only Python 3 meanings I can 
>>> imagine are incompatible with CIF, and there is no technical advantage 
>>> for CIF in including [bB] just to ignore it.
>>> d) The [rR] prefix.  In Python, this turns off elide processing for 
>>> the string, except that if the [uU] prefix is also present then 
>>> Unicode elides are still handled.  Also, the \\ elide is handled, but 
>>> differently than for other string literals.  I would be happier with 
>>> this for CIF, though  still not in favor, if it were a universal 
>>> on/off for all elides.  Furthermore, as Simon pointed out, raw strings 
>>> are what we have now.  Supposing that we use the Python rule that 
>>> unrecognized elides are treated as literals, the value of [rR] raw 
>>> strings for CIF depends on how many and which elides we adopt. 
>>>  Inasmuch as I favor restriction to only a few elides, I don't see 
>>> [rR] adding much of value.
>>> e) The \a, \b, \f, \n, \r, \t, and \v elides.  These needlessly clash 
>>> with the IUCr elides, they are redundant with Unicode elides, and they 
>>> express characters that either can appear as literals in
>>> triple-quoted strings or are not allowed CIF characters (more on that 
>>> in a separate message).  Including these would complicate CIF 
>>> implementations for little or no technical advantage.
>>> f) The \ooo and \xhh elides.  These are redundant with the Unicode 
>>> elides.  Moreover, they are byte-oriented in standard strings (so that 
>>> their actual meaning depends on the source or runtime character set), 
>>> but character-oriented in Unicode strings (there *thoroughly* 
>>> redundant with the \uxxxx and \Uxxxxxxxx forms).
>>> That leaves very few Python string features that I could support being 
>>> added to CIF (triple-quoted strings only), to wit:

>>> \<newline>
>>> \uxxxx
>>> \Uxxxxxxxx
>>> \'
>>> \"
>>> \\
>>> Among those, \' and \" serve only the purpose of delimiter elision; 
>>> the others have larger scopes.  Given that the need to elide 
>>> delimiters is likely to be quite rare, and that these two clash with 
>>> the IUCr elides, I would prefer to omit them.
>>> As for the two Unicode escapes, it turns out that when the \[uU] is 
>>> not followed by the expected number of hex digits, the Python 2.4 
>>> behavior differs from what the documentation led me to believe.
>>>  Python throws a UnicodeDecodeError in such cases, rather than 
>>> applying "all unrecognized escape sequences are left in the string 
>>> unchanged" to the whole construct.  With respect to those forms, if 
>>> they are included then I would prefer that constructs such as 
>>> '''\u065q''' be treated as literals rather than error cases.  (And 
>>> thus, to be subject to further interpretation at the application 
>>> level.)
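
A minimal Python 3 sketch of the behavior described above: it recognizes only the six elides listed and leaves everything else, including a malformed \u or \U, in the string unchanged. The function name decode_elides and the regular expression are assumptions made for illustration; they are not part of any CIF2 proposal or specification.

```python
import re

# Hypothetical recognizer for the restricted elide set only:
# \<newline>, \uxxxx, \Uxxxxxxxx, \', \" and \\.
_ELIDE = re.compile(
    r"\\(?:"
    r"(?P<nl>\n)"                   # line continuation
    r"|u(?P<u4>[0-9A-Fa-f]{4})"     # exactly four hex digits
    r"|U(?P<u8>[0-9A-Fa-f]{8})"     # exactly eight hex digits
    r"|(?P<ch>['\"\\])"             # elided delimiter or backslash
    r")"
)


def decode_elides(text):
    """Expand recognized elides; leave anything unrecognized as a literal."""

    def expand(match):
        if match.group("nl") is not None:
            return ""  # backslash-newline joins the two lines
        hex_digits = match.group("u4") or match.group("u8")
        if hex_digits is not None:
            code_point = int(hex_digits, 16)
            if code_point > 0x10FFFF:
                # Assumption: out-of-range values are also kept as literals.
                return match.group(0)
            return chr(code_point)
        return match.group("ch")

    return _ELIDE.sub(expand, text)


print(decode_elides(r"caf\u00e9"))  # well-formed Unicode elide is expanded
print(decode_elides(r"\u065q"))     # malformed: passed through unchanged
print(decode_elides(r"\a"))         # unrecognized: left for the application
```

Because the scan runs left to right, \\u0041 yields a literal backslash followed by u0041 rather than the letter A, and sequences such as \a pass through untouched for application-level interpretation.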
>>> Regards,
>>> John
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
