Re: [ddlm-group] Simon's elide proposal
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Simon's elide proposal
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Tue, 18 Jan 2011 05:45:42 -0500 (EST)
- In-Reply-To: <AANLkTimALVr6Ny6yE-7pswC+EaGwDKBziCvBmou7cf9M@mail.gmail.com>
- References: <AANLkTimdAavg2KCjPZTj1xDYXDQ1JLiQCkQb4snyBErZ@mail.gmail.com><alpine.BSF.2.00.1101120536370.71134@epsilon.pair.com><AANLkTimA8+YXbJ8yS0AtKgFjq9221oMFjR6habn6DsXR@mail.gmail.com><alpine.BSF.2.00.1101120834010.42232@epsilon.pair.com><8F77913624F7524AACD2A92EAF3BFA54166D7D1EA8@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1101121400400.85750@epsilon.pair.com><alpine.BSF.2.00.1101121556380.31518@epsilon.pair.com><698308.91583.qm@web87015.mail.ird.yahoo.com><alpine.BSF.2.00.1101121845060.90622@epsilon.pair.com><722757.13635.qm@web87012.mail.ird.yahoo.com><8F77913624F7524AACD2A92EAF3BFA54166D7D1EB2@SJMEMXMBS11.stjude.sjcrh.local><alpine.BSF.2.00.1101131319580.27153@epsilon.pair.com><AANLkTinriGRksUvs+irA7NS6tJePaVa7s7dYGWQ9ktd9@mail.gmail.com><alpine.BSF.2.00.1101140506420.94749@epsilon.pair.com><AANLkTimALVr6Ny6yE-7pswC+EaGwDKBziCvBmou7cf9M@mail.gmail.com>
I respectfully disagree, but will refrain from detailed comment until we
settle on our goals, at which point I suspect much of this discussion
will become moot.

  -- Herbert

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Tue, 18 Jan 2011, James Hester wrote:

> John has made some good points in reply, to which I'll add a few others:
>
> On Fri, Jan 14, 2011 at 9:25 PM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> OK:
>>
>> 1. I do think there is value in having CIF capable
>> of functioning as a programming language, and see
>> nothing to be gained by crippling its ability to
>> function in that role. As noted there is a move now
>> towards the creation of executable papers to allow
>> journal articles to be better reviewed. Opening
>> the dREL features to more general use than just in
>> dictionaries would allow the IUCr to explore this
>> very important direction and be more competitive
>> with Elsevier, so I strongly disagree with John's
>> value judgement that a rich feature set is somehow
>> negative.
>
> This whole paragraph makes no sense without a more concrete
> explanation of how you plan to turn a data container into a
> programming language, and why the current CIF2+DDLm+dREL framework is
> not adequate to the task you envisage. Regardless of the
> well-advisedness or otherwise of the quest to turn CIF into a
> programming language, I would note that simply adopting the string
> literal syntax of a programming language does not in any way make a
> data format somehow more like a programming language - string literal
> syntax is simply syntactic sugar for specifying a sequence of bytes.
>
>> 2. I find the ability to have escapes to handle the
>> "illegal" characters useful for imgCIF which needs
>> to be able to handle at least the range of 15 out
>> of 16 bits without breaks.
>
> You can have all the escapes you want by either creating a new string
> type in DDLm, or describing a syntax right in the item definition.
> There is therefore no need to impose a heap of syntactic sugar on the
> entire CIF community simply to satisfy a domain-specific application.
> I will clarify for John B that I consider that disallowed Unicode code
> points should not appear in any CIF datavalue. CIF datavalues can
> obviously be transformed to include those code points in application
> specific contexts, so, for example, Herbert can define \b to mean
> ASCII BEL in a particular imgCIF item definition if he thinks that
> useful, or a LaTeX processor can take LaTeX text inside a CIF
> datavalue and turn it into DVI.
>
>> 3. I find the \N{}, \a, ... constructs useful for the reasons
>> in 1, above. In point of fact, I think we would be best
>> off following Brian's original approach of being "maximally
>> disruptive" and requiring a uniform translation of all
>> the IUCr glyphs that conflict with current programming
>> practice to an escaped form \\a
>
> In addition to John B's entirely reasonable comments, note that
> supporting the \N construct would create a dependency on the whole
> Unicode database in every CIF2 parser. Am I really the only one who
> finds this ridiculous? Asking the IUCr to wholesale redefine their
> glyphs would require your application to be considerably more
> important, which is far from demonstrated.
>
>> In any case, I think you should get the point -- this
>> really is a matter of taste, not a technical issue.
>> I find python compatibility a strong plus to help
>> move us into the executable paper realm, indeed
>> to help move CIF into being a scripting language.
>
> No, as John says, it is an important design issue. Unlike a
> programming language, we have several layers at which meaning can be
> created: syntactic, DDL, and domain dictionary. To ignore the latter
> two is to misunderstand the entire CIF project.
>
>> If anyone wants even more detail, I will be happy
>> to send it in an off-list message, but I think it
>> really might be wise to address the broader issues
>> of what we want CIF to be, first, before we get
>> too far into those technical details. Right now
>> I have to get packed to be able to catch a plane,
>> so the longer answer will have to wait until tomorrow.
>>
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
>>
>> On Fri, 14 Jan 2011, James Hester wrote:
>>
>>> Dear Herbert,
>>>
>>> Au contraire, I would not be bored, I'd be fascinated by a
>>> point-by-point rebuttal. I find John's assessment spot-on and do not
>>> think dismissing his points as a matter of taste shows much respect
>>> for the amount of time he has put in to formulate these comments.
>>> Please go ahead and rebut his points.
>>>
>>> James.
>>>
>>> On Fri, Jan 14, 2011 at 5:42 AM, Herbert J. Bernstein
>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>
>>>> Dear Colleagues,
>>>>
>>>> I will not bore you all with a point-by-point rebuttal
>>>> to John B.'s negative assessment of Python treble quote
>>>> use in a CIF context. Most of what he sees as defects,
>>>> I see as virtues. Such are differences in taste, and
>>>> more importantly in the uses to which we put CIF. Especially
>>>> with the introduction of dREL and DDLm, I do see CIF as
>>>> a programming language, and as one with strong similarities
>>>> to Python. That does not mean everybody has to use it that
>>>> way, just that it would be nice if those who use it one way
>>>> and those who use it another could find some common ground.
>>>> There is now a move towards executable papers, and I suspect
>>>> a more powerful and flexible python compatible CIF could be
>>>> a strong competitor in that area. Indeed, if current trends
>>>> continue, the IUCr is likely to need programming support
>>>> in papers if it is to keep up.
>>>>
>>>> One point that does need a rebuttal...
>>>>
>>>>> It should also be noted that Python source code, including its string
>>>>> literals, is restricted to being expressed in the characters of the
>>>>> 7-bit ASCII character set (though they need not necessarily be encoded
>>>>> according to US-ASCII). Unconditional, bidirectional CIF/Python string
>>>>> compatibility would require that we apply the same restriction to CIF2
>>>>> triple-quoted strings. I would oppose that.
>>>>
>>>> That started to change in Python 2.5 which allowed explicit encoding
>>>> declarations, and by Python 3 has vanished even without an
>>>> encoding declaration. The Python 3 spec is:
>>>>
>>>> "Python reads program text as Unicode code points; the encoding
>>>> ... defaults to UTF8"
>>>>
>>>> For more on how Python dealt with this issue at the same time
>>>> we were considering it, see:
>>>>
>>>> http://www.python.org/dev/peps/pep-3120/
>>>>
>>>> =====================================================
>>>> Herbert J. Bernstein, Professor of Computer Science
>>>> Dowling College, Kramer Science Center, KSC 121
>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>
>>>> +1-631-244-3035
>>>> yaya@dowling.edu
>>>> =====================================================
>>>>
>>>> On Thu, 13 Jan 2011, Bollinger, John C wrote:
>>>>
>>>>>
>>>>> On Thursday, January 13, 2011 7:10 AM, SIMON WESTRIP wrote:
>>>>>>
>>>>>> Let's assume we were starting with CIF2 that included a minimal scheme
>>>>>> like F'.
>>>>>
>>>>> What then would be gained by adopting the full python specification of
>>>>> string literals?
>>>>>>
>>>>>> 1) "Cleaner" presentation in the very rare cases that the eliding
>>>>>> system would be needed in order to accommodate delimiters within the value.
>>>>>> This is purely a matter of taste.
>>>>>>
>>>>>> 2) Ability to include raw strings using the 'r' prefix. But in CIF2 as
>>>>>> it stands, all strings are 'raw'.
>>>>>
>>>>> Yes, but that will no longer be true if any of the proposals we're
>>>>> discussing is adopted.
>>>>>
>>>>>> Perhaps others can add to this list?
>>>>>
>>>>> From the perspective of technical features only:
>>>>>
>>>>> 3) Three distinct forms for expressing Unicode characters via ASCII
>>>>> characters; one is restricted to characters from the BMP, but the others are
>>>>> general
>>>>>
>>>>> 4) Two forms for expressing 8-bit characters (from some undocumented
>>>>> character set, probably the source character set) via ASCII characters
>>>>>
>>>>> 5) Several elides for specific whitespace and non-printing ASCII
>>>>> characters, some of which are not among the allowed CIF characters, and all
>>>>> of which clash with the IUCr application-level elides
>>>>>
>>>>> 6) A mechanism for indicating whether the three forms of Unicode elides
>>>>> of item (3) should in fact be processed, or not.
>>>>>
>>>>> 7) A mechanism for representing a byte-string data object, or possibly a
>>>>> stub for such a feature, depending on which Python version serves as a
>>>>> reference
>>>>>
>>>>>
>>>>> Commentary:
>>>>>
>>>>> I think that makes a complete list of the new technical features that
>>>>> full Python string literals would bring to CIF, beyond those of proposal F.
>>>>> I ignore a few semantic details that are mostly consistent with the current
>>>>> CIF specifications.
>>>>>
>>>>> Python's is indeed a rich feature set, but that is one of my objections
>>>>> to its use for CIF. CIF is a data representation language, not a
>>>>> programming language, so once the language can represent everything in its
>>>>> present and future domain, alternative representation mechanisms add little.
>>>>> People can and do write CIF by hand, but I don't think that use case is of
>>>>> sufficient import to justify convenience features solely for its support,
>>>>> particularly when such features present problems in other respects.
>>>>>
>>>>> Furthermore, Python admits essentially one implementation (changing
>>>>> slowly over time), so a rich feature set does not present compatibility
>>>>> problems. CIF, however, anticipates many implementations, so the number and
>>>>> complexity of its features contribute to the likelihood of incompatibility
>>>>> between implementations.
>>>>>
>>>>> Most importantly, however, I think several of the Python features are
>>>>> inappropriate for CIF, and I specifically want them excluded:
>>>>>
>>>>> a) The \N{name} syntax for designating Unicode characters by UCD name.
>>>>> I view this as the single greatest locus for bugs and incompatibility, both
>>>>> among CIF implementations and between CIF and Python. Large among the
>>>>> questions here is *which version of the UCD is referenced*? That can evolve
>>>>> over time in Python, but it must be fixed in CIF, at least for each CIF
>>>>> version. Shall we plan to issue a new version of CIF every time Python
>>>>> moves up to a new Unicode version, and to deal with the multiple resulting
>>>>> versions? Must every CIF2 implementation lug along a name=>character table
>>>>> just for this? It is redundant with the other two Unicode elides.
>>>>>
>>>>> b) The [uU] prefix. In Python, Unicode strings are a different type of
>>>>> object than ordinary strings, which is the main reason for the [uU] syntax.
>>>>> All CIF2 strings are Unicode strings, however (so there's an unavoidable
>>>>> semantic difference regardless). In CIF the [uU] prefix could still turn on
>>>>> and off processing of Unicode elides, but to what end? In rare cases, to
>>>>> yield a slightly simpler representation of strings that would otherwise
>>>>> clash with one of the Unicode elide sequences. Should we really require all
>>>>> conforming CIF processors to implement rules to support that obscure case,
>>>>> even though it can reasonably be handled by the \\ elide instead?
>>>>>
>>>>> c) The [bB] prefix. I'm not clear on what it will mean in Python 3, but
>>>>> it is ignored in Python 2. The only Python 3 meanings I can imagine are
>>>>> incompatible with CIF, and there is no technical advantage for CIF in
>>>>> including [bB] just to ignore it.
>>>>>
>>>>> d) The [rR] prefix. In Python, this turns off elide processing for the
>>>>> string, except that if the [uU] prefix is also present then Unicode elides
>>>>> are still handled. Also, the \\ elide is handled, but differently than for
>>>>> other string literals. I would be happier with this for CIF, though still
>>>>> not in favor, if it were a universal on/off for all elides. Furthermore, as
>>>>> Simon pointed out, raw strings are what we have now. Supposing that we use
>>>>> the Python rule that unrecognized elides are treated as literals, the value
>>>>> of [rR] raw strings for CIF depends on how many and which elides we adopt.
>>>>> Inasmuch as I favor restriction to only a few elides, I don't see [rR]
>>>>> adding much of value.
>>>>>
>>>>> e) The \a, \b, \f, \n, \r, \t, and \v elides. These needlessly clash
>>>>> with the IUCr elides, they are redundant with Unicode elides, and they
>>>>> express characters that either can appear as literals in triple-quoted
>>>>> strings or are not allowed CIF characters (more on that in a separate
>>>>> message). Including these would complicate CIF implementations for little
>>>>> or no technical advantage.
>>>>>
>>>>> f) The \ooo and \xhh elides. These are redundant with the Unicode
>>>>> elides. Moreover, they are byte-oriented in standard strings (so that their
>>>>> actual meaning depends on the source or runtime character set), but
>>>>> character-oriented in Unicode strings (there *thoroughly* redundant with the
>>>>> \uxxxx and \Uxxxxxxxx forms).
>>>>>
>>>>>
>>>>> That leaves very few Python string features that I could support being
>>>>> added to CIF (triple-quoted strings only), to wit:
>>>>>
>>>>> \<newline>
>>>>> \uxxxx
>>>>> \Uxxxxxxxx
>>>>> \'
>>>>> \"
>>>>> \\
>>>>>
>>>>> Among those, \' and \" serve only the purpose of delimiter elision; the
>>>>> others have larger scopes. Given that the need to elide delimiters is
>>>>> likely to be quite rare, and that these two clash with the IUCr elides, I
>>>>> would prefer to omit them.
>>>>>
>>>>> As for the two Unicode escapes, it turns out that when the \[uU] is not
>>>>> followed by the expected number of hex digits, the Python 2.4 behavior
>>>>> differs from what the documentation led me to believe. Python throws a
>>>>> UnicodeDecodeError in such cases, rather than applying "all unrecognized
>>>>> escape sequences are left in the string unchanged" to the whole construct.
>>>>> With respect to those forms, if they are included then I would prefer that
>>>>> constructs such as '''\u065q''' be treated as literals rather than error
>>>>> cases. (And thus, to be subject to further interpretation at the
>>>>> application level.)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> John
>>>>> --
>>>>> John C. Bollinger, Ph.D.
>>>>> Department of Structural Biology
>>>>> St. Jude Children's Research Hospital
>>>>>
>>>>>
>>>>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>> _______________________________________________
>>>>> ddlm-group mailing list
>>>>> ddlm-group@iucr.org
>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>>
>>>
>>>
>>> --
>>> T +61 (02) 9717 9907
>>> F +61 (02) 9717 3145
>>> M +61 (04) 0249 4148
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
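
John's shortlist of elides in the quoted message (\<newline>, \uxxxx, \Uxxxxxxxx and \\, with \' and \" omitted as he would prefer), together with his wish that malformed constructs such as '''\u065q''' stay literal, can be sketched roughly as follows. This is only an illustration under stated assumptions and not anything proposed on the list: the function name `decode_elides`, the regular expression, and the choice to leave out-of-range \U values untouched are all hypothetical.

```python
import re

# Hypothetical elide set, taken from the shortlist quoted above:
#   \<newline>   line continuation (removed from the value)
#   \uxxxx       Unicode code point, exactly 4 hex digits
#   \Uxxxxxxxx   Unicode code point, exactly 8 hex digits
#   \\           a single literal backslash
# Anything else that starts with a backslash does not match and is kept.
_ELIDE = re.compile(r"\\(?:(\n)|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(\\))")


def decode_elides(text: str) -> str:
    """Decode the assumed minimal elide set; leave malformed forms literal."""

    def replace(match: re.Match) -> str:
        newline, short_hex, long_hex, backslash = match.groups()
        if newline is not None:
            return ""  # backslash-newline joins two physical lines
        if backslash is not None:
            return "\\"
        code = int(short_hex or long_hex, 16)
        if code > 0x10FFFF:
            # Assumption: values beyond the Unicode range stay literal
            # instead of raising an error.
            return match.group(0)
        return chr(code)

    return _ELIDE.sub(replace, text)


if __name__ == "__main__":
    print(decode_elides(r"\u03b1"))   # well-formed: decoded
    print(decode_elides(r"\u065q"))   # malformed: left as written
    print(decode_elides(r"\\u0041"))  # escaped backslash, then plain text
```

For comparison, a Python 3 interpreter rejects the literal '''\u065q''' at compile time with a SyntaxError rather than leaving it unchanged, so the literal-fallback rule sketched here is a deliberate departure from Python.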
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- References:
- [ddlm-group] Simon's elide proposal (James Hester)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (James Hester)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (Bollinger, John C)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (SIMON WESTRIP)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (SIMON WESTRIP)
- Re: [ddlm-group] Simon's elide proposal (Bollinger, John C)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (James Hester)
- Re: [ddlm-group] Simon's elide proposal (Herbert J. Bernstein)
- Re: [ddlm-group] Simon's elide proposal (James Hester)