Re: [ddlm-group] Simon's elide proposal
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Simon's elide proposal
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Wed, 12 Jan 2011 05:39:31 -0500 (EST)
- In-Reply-To: <AANLkTimdAavg2KCjPZTj1xDYXDQ1JLiQCkQb4snyBErZ@mail.gmail.com>
- References: <AANLkTimdAavg2KCjPZTj1xDYXDQ1JLiQCkQb4snyBErZ@mail.gmail.com>
Actually, Simon's proposal, while useful, is not complete, inasmuch as
\" and \' are not handled yet. I urge adoption of my compromise
suggestion as written. Without it, we are going down the same slippery
slope we crashed on the last time we tried to resolve this issue.

-- Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Wed, 12 Jan 2011, James Hester wrote:

> Note that Simon's proposal *does* completely answer Ralf's concern
> about the lack of elide mechanism in triple quoted strings. It
> provides line folding as well. I for one would consider our job
> finished if we were to adopt Simon's proposal, and see no need for the
> further steps proposed by Herbert. Herbert is of course welcome to
> propose including the various Python behaviours as a separate
> amendment to the CIF2 standard.
>
> I would propose a slight tweak to Simon's proposal, so that it works as follows:
>
> The datavalue is obtained from the triple-quoted string in two steps:
> (1) All instances of <backslash><eol> are removed from the string
> where the <backslash> is not preceded by another <backslash>
> (2) All other instances of <backslash><eol> are replaced with <eol>
>
> This means that a sequence of n backslashes followed by newline is
> replaced by a sequence of n-1 backslashes followed by newline, except
> if there is one backslash before the newline, in which case both
> newline and backslash are removed. Triple quote sequences are elided
> by inserting a <backslash><eol> sequence between <delimiter>
> characters to break up the triple delimiter sequence. Note also that
> backslash has no special meaning if not in a sequence finishing with
> <eol>.
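
The two-step rule described above can be sketched in Python roughly as follows. This is only an illustration and not part of any proposal text: the function name `unelide_triple_quoted` is hypothetical, and the sketch assumes that line endings have already been normalised so that <eol> is a single "\n" and that the enclosing triple delimiters have already been stripped by the lexer.

```python
import re


def unelide_triple_quoted(raw: str) -> str:
    """Hypothetical helper: apply the two-step <backslash><eol> rule.

    Assumes `raw` is the text between the triple delimiters and that
    every <eol> is a single "\n" character.
    """
    # Step 1: remove <backslash><eol> when the backslash is not itself
    # preceded by another backslash (this is the line-folding case).
    folded = re.sub(r"(?<!\\)\\\n", "", raw)
    # Step 2: every remaining <backslash><eol> is preceded by a backslash;
    # replace it with a bare <eol>, which drops exactly one backslash.
    return re.sub(r"\\\n", "\n", folded)


# One backslash before the newline: both are removed (line folding).
folded_example = unelide_triple_quoted("abc\\\ndef")      # "abcdef"

# Two backslashes before the newline: one backslash and the newline remain.
kept_example = unelide_triple_quoted("abc\\\\\ndef")      # "abc\\\ndef"

# Eliding a triple delimiter by breaking it up with <backslash><eol>.
delimiter_example = unelide_triple_quoted('""\\\n"')      # '"""'

# A backslash that is not part of a run ending in <eol> is left alone.
plain_example = unelide_triple_quoted("a\\b")             # "a\\b"
```

Because the substitution runs after lexing, a lexer that simply scans for the closing triple delimiter would not need to change under this scheme, which is the point made elsewhere in the thread about "a search and replace following lexing".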
>
> I will be posting a separate email, hopefully tonight, where I will
> list the current elide proposals and request that we all indicate
> which ones are potentially acceptable to us, with a ranking if
> possible. This may help us to restrict discussion to something that
> is mutually acceptable.
>
> On Sun, Jan 9, 2011 at 2:45 AM, Herbert J. Bernstein
> <yaya@bernstein-plus-sons.com> wrote:
>> Here is a possible compromise. This thread began with
>> Ralf's concern about the lack of an elide mechanism
>> in treble quoted strings. Simon's suggestion does
>> not really answer that question, but it is a reasonable
>> step in that direction. So, how about ...
>>
>> 1. Immediately adopt Simon's suggestion to allow the
>> \\n and \\ elides in treble quoted strings. Except for
>> the confusion in the meaning of \"""" if a more general
>> elide is eventually adopted that should cause very little
>> stress for anybody.
>>
>> 2. Add Ralf's proposed new section 7 to the CIF2
>> document as a proposal under discussion, with the
>> advice that people may wish to avoid creating treble-quoted
>> strings that conflict with the full python elide
>> conventions.
>>
>> 3. Provide a coherent discussion document for COMCIFS
>> and the community at large on the alternatives in
>> handling the treble-quoted string, asking for comments
>> to the list prior to the Madrid meeting. I would suggest
>> that Ralf be asked to contribute a page or 2 on the
>> merits of his proposal and that either John B. or James
>> contribute a page or 2 on their objections and alternatives.
>>
>> 4. Discuss it face to face at the Madrid meeting and
>> try to come to a resolution.
>>
>> 5. Move forward with the rest of CIF2 as proposed in
>> the meantime so we will be ready to discuss all of CIF2
>> at the Madrid meeting, with an effort to have sample parsers
>> and data sets available on the web prior to the meeting.
>>
>> Regards,
>> Herbert
>>
>> =====================================================
>> Herbert J. Bernstein, Professor of Computer Science
>> Dowling College, Kramer Science Center, KSC 121
>> Idle Hour Blvd, Oakdale, NY, 11769
>>
>> +1-631-244-3035
>> yaya@dowling.edu
>> =====================================================
>>
>> On Sat, 8 Jan 2011, Herbert J. Bernstein wrote:
>>
>>> Dear James,
>>>
>>> You are clearly a much better programmer than I am. When I got down into
>>> the interactions among the treble quote, single quotes, text fields, elides,
>>> the bracketed constructs and comments in the lexical scan, I found the going
>>> tough. If you have it done neatly, I would greatly appreciate seeing it.
>>>
>>> I think we need a face to face meeting or Skype meeting to resolve not
>>> just this one issue, but the process of getting a workable CIF2. Perhaps we
>>> can finally get to do that in Madrid.
>>>
>>> Regards,
>>> Herbert
>>>
>>> =====================================================
>>> Herbert J. Bernstein, Professor of Computer Science
>>> Dowling College, Kramer Science Center, KSC 121
>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>> +1-631-244-3035
>>> yaya@dowling.edu
>>> =====================================================
>>>
>>> On Sat, 8 Jan 2011, James Hester wrote:
>>>
>>>> I can't let these assertions go unchallenged:
>>>>
>>>> On Sat, Jan 8, 2011 at 12:04 PM, Herbert J. Bernstein
>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>
>>>>> Dear Simon,
>>>>>
>>>>> Adoption of Ralf's proposal will ...
>>>>>
>>>>> 1. Make it much easier to create a CIF2 parser, because for one of
>>>>> the messiest parts of the code we will have a clear specification,
>>>>> sample code and a way to validate the tough cases.
>>>>
>>>> If we adopt a simpler spec than the Python in toto spec:
>>>> - there will be many fewer tough cases
>>>> - there will be a simpler and therefore clearer specification
>>>> - for many alternative schemes the lexer will be unchanged from the
>>>>   current version, with the elide behaviour simply requiring a
>>>>   search and replace following lexing
>>>> Triple-quoted string handling is not currently a messy part of the
>>>> code; I don't understand why you think this. It will become
>>>> significantly more complex under Ralf's proposal.
>>>>
>>>>> 2. Make it easier for users to conform to the quoting rules, because
>>>>> at least that one part of CIF2 will be thoroughly documented with lots
>>>>> of examples.
>>>>
>>>> Quoting rules are not rocket science. About 3 examples will be
>>>> enough, if we adopt a simple specification rather than the
>>>> unicode+raw+lots of escapes that the Python proposal entails.
>>>> Doing things the Python way would imply more chance for user
>>>> misunderstanding, especially bearing in mind that CIF2 users are
>>>> not necessarily Python programmers or even programmers at all.
>>>> For these users, there is absolutely no benefit in adopting Python
>>>> or any other language's approach - they are unfamiliar with them all.
>>>>
>>>>> 3. Make it easier for the journals and archives to deal with "odd"
>>>>> CIF2 files containing complex treble quoted strings because at
>>>>> least that one part of CIF2 will be thoroughly documented with lots
>>>>> of examples, and, with a utility (IDLE) all ready to allow them
>>>>> to pull out a troublesome treble-quoted string and figure out what
>>>>> it means or what it might mean if some intuitive change were made.
>>>>
>>>> The simpler the spec, the less likely mistakes will be made and the
>>>> less chance of ambiguity.
>>>>
>>>>> Yes, if Ralf's proposal happens to be rejected, we will still have
>>>>> a problem in the lack of elide handling, and yes we will have to
>>>>> put in the time and effort to consider those alternatives, but, first,
>>>>> in order to have some chance of finishing the specification of CIF2
>>>>> before the summer meeting deadlines (at least one of which is in
>>>>> just a little more than 3 weeks), might it not be a good idea
>>>>> to discuss and consider what was actually proposed instead of
>>>>> chasing after lots of plausible alternatives that we already discussed
>>>>> and rejected, and so are not very likely to agree upon rapidly now.
>>>>
>>>> I have some hope that, by restricting our discussion to treble-quoted
>>>> strings, we can make progress compared to previous attempts. I have
>>>> considered and discussed at length Ralf's proposal, and would be
>>>> interested in your responses to my particular objections.
>>>>
>>>>> So, before I will delve into the many subtle variations of elide
>>>>> mechanisms, I would appreciate our finishing consideration of Ralf's
>>>>> actual proposal:
>>>>>
>>>>> =======================
>>>>>
>>>>> His revised wording (with one correction) is:
>>>>>
>>>>> ========================
>>>>>
>>>>> CHANGE 7 NEW
>>>>>
>>>>>
>>>>> Triple-quote delimited strings.
>>>>>
>>>>> The following ASCII sequences delimit the beginning of a string:
>>>>>
>>>>> """
>>>>> '''
>>>>> r"""
>>>>> r'''
>>>>> u"""
>>>>> u'''
>>>>>
>>>>> The characters following the delimiter sequence are interpreted
>>>>> with exactly the same algorithm as implemented for triple-quoted
>>>>> strings in the Python programming language version 2 series.
>>>>> In this algorithm, triple-quoted strings are terminated by matching
>>>>> """ or ''' delimiters.
>>>>>
>>>>> For example
>>>>>
>>>>> """He said "His name is O'Hearly"."""
>>>>> r'''In {\bf \TeX} the accents are \' and \".'''
>>>>>
>>>>> Triple-quoted strings provide a reliable mechanism for storing any
>>>>> arbitrary string in a CIF2 file.
>>>>>
>>>>> =========================
>>>>>
>>>>> This is cleaner and simpler than the original change 7 wording.
>>>>> It probably does not conflict with existing CIF1 documents and the
>>>>> _only_ CIF2 documents it can conflict with are the very few
>>>>> that happen to end in \""" or \''''. The new leading delimiters
>>>>> r""", r''', u""" and u''' will have to be added to the list of forbidden
>>>>> starts to white-space delimited data values in change 5. In exchange
>>>>> for these minor adjustments to valid CIF2 syntax we gain a fully
>>>>> documented, software supported way to include arbitrary strings in a
>>>>> CIF2 document that people are already used to working with.
>>>>>
>>>>> I have reviewed the discussion of the "use of elides in strings"
>>>>> thread in the ddlm-group discussion list, and, while I did not
>>>>> then and do not now understand the objections to the general use
>>>>> of elides in quoted strings, I particularly do not understand
>>>>> the logic of objecting to the use of elides in treble-quoted strings,
>>>>> which are a construct completely new to CIF and therefore in
>>>>> conflict with no existing data files.
>>>>>
>>>>> Would those who have an objection to Ralf's proposal please
>>>>> state their objections. An objection that says we object because
>>>>> in past discussions another body could not manage to come to an
>>>>> agreement and just gave up does not speak to the merits of this
>>>>> specific proposal.
>>>>>
>>>>> I have no idea why we are considering other proposals before
>>>>> settling the status of Ralf's proposal.
>>>>
>>>> It is also useful to know what the alternatives might be when
>>>> considering a proposal.
>>>>
>>>>> I agree with Ralf's proposal.
>>>>>
>>>>> Regards,
>>>>> Herbert
>>>>>
>>>>> At 12:37 AM +0000 1/8/11, SIMON WESTRIP wrote:
>>>>>>
>>>>>> Dear Herbert
>>>>>>
>>>>>> I fail to see how the adoption of python string quoting rules is
>>>>>> going to make life easier for anyone other than a python programmer?
>>>>>> Even then, the mechanism is restricted to treble-quoted strings,
>>>>>> which are only one part of CIF. Maybe I've missed something, but
>>>>>> just because CIF might share common syntax with a programming
>>>>>> language in one respect, does not necessarily mean that the tools
>>>>>> of that medium are available to CIF?
>>>>>>
>>>>>> If you're looking to base CIF extensions on established mechanisms,
>>>>>> why not adopt the minimal \(newline) and \\ escape sequences, which
>>>>>> in essence are the same as the established CIF line-folding protocol
>>>>>> (just dropping the initial \ following the opening delimiter and
>>>>>> formalising the protocol as an inherent part of the spec). After all,
>>>>>> I believe you have already been using it, or at least interpreted it,
>>>>>> as a means to escape 'semicolon delimiters' within
>>>>>> semicolon-delimited values (I seem to recall discussions that
>>>>>> identified an issue with the published 'trip tests' relating to
>>>>>> line folding).
>>>>>>
>>>>>> Forgive me if I have missed something regarding the usefulness of
>>>>>> python in CIF; please enlighten me as to its benefits if I were to
>>>>>> write a CIF reader using anything but python. As far as I can see,
>>>>>> the only advantages lie in the fact that the logic is established
>>>>>> and thus unquestionable; but that does not mean it is necessarily
>>>>>> entirely appropriate for CIF (which after all isn't a programming
>>>>>> language).
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Simon
>>>>>>
>>>>>>
>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>>>>>> To: Group finalising DDLm and associated dictionaries
>>>>>> <ddlm-group@iucr.org>
>>>>>> Sent: Friday, 7 January, 2011 23:07:40
>>>>>> Subject: Re: [ddlm-group] Eliding in triple-quoted strings:
>>>>>> Proposals C and D. .. .. .
>>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>> Ralf's proposal is what it is. Before we go haring off in other
>>>>>> directions, we should respond constructively to what he has proposed.
>>>>>> I support it. Ralf and John W. support it. John B. and James H.
>>>>>> oppose it. I think they are mistaken because ...
>>>>>>
>>>>>> It is well and good to adopt a "Real Programmers Don't Eat
>>>>>> Quiche" let's-start-from-scratch-and-roll-our-own approach when
>>>>>> you have the resources to accomplish our goals that way. It
>>>>>> is a lot of fun, and has the potential to truly advance the
>>>>>> field, but it is also, in the current funding climate, unrealistic.
>>>>>>
>>>>>> In the U.S., there is a serious prospect to science funding being
>>>>>> cut back so severely that the hit rates on grants next year may
>>>>>> be as low as 1 in 10. I suspect an honest review of funding prospects
>>>>>> in other countries will uncover similarly dire warnings.
>>>>>>
>>>>>> This does not mean we are all going out of business, but we do have
>>>>>> to be careful to conserve resources and focus our do-it-from-scratch
>>>>>> efforts on those areas that have the highest priority, and I fear,
>>>>>> for most of our community, CIF2, while important, is not likely to
>>>>>> be seen as worth that approach, and certainly filing the edges of
>>>>>> a brand-new treble quote spec is likely to be very far down
>>>>>> on anybody's priority list.
>>>>>>
>>>>>> Ralf has made a proposal that will save all of us a lot of effort
>>>>>> and allow us to devote more resources to higher priority problems.
>>>>>>
>>>>>> Not only is he right on this one point, but I urge us to look for
>>>>>> other areas where we can get to CIF2 by building on work that is
>>>>>> already done.
>>>>>>
>>>>>> This is not a good time for wheel-reinvention.
>>>>>>
>>>>>> I would appreciate knowing from those who wish to reinvent this
>>>>>> particular wheel, why they wish to do that and from where they
>>>>>> expect to get the resources to do it?
>>>>>>
>>>>>> Regards,
>>>>>> Herbert
>>>>>>
>>>>>> =====================================================
>>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>
>>>>>> +1-631-244-3035
>>>>>> yaya@dowling.edu
>>>>>> =====================================================
>>>>>>
>>>>>> On Fri, 7 Jan 2011, Bollinger, John C wrote:
>>>>>>
>>>>>>>
>>>>>>> On Friday, January 07, 2011 3:14 PM, Herbert J. Bernstein wrote:
>>>>>>>
>>>>>>>> We seem not to be communicating effectively.
>>>>>>>>
>>>>>>>> What I am asking for is an _existing_, supported treble quote
>>>>>>>> specification from an _existing_ language with _existing_
>>>>>>>> documentation and _existing_ software as an alternative to the
>>>>>>>> Python specification, documentation and software to which we all
>>>>>>>> have access, that is being proposed as an alternative
>>>>>>>> to what Ralf has proposed.
>>>>>>>
>>>>>>> Thank you for that clarification. You are right, I didn't understand
>>>>>>> what you were asking for.
>>>>>>>
>>>>>>> I hope this will likewise clarify my position: I reject the premise
>>>>>>> that the system we choose must meet those criteria, and I oppose
>>>>>>> adopting the full Python syntax and semantics.
>>>>>>>
>>>>>>>> The Python specification is available at
>>>>>>>>
>>>>>>>> http://docs.python.org/reference/index.html
>>>>>>>>
>>>>>>>> with the lexical analysis at
>>>>>>>>
>>>>>>>> http://docs.python.org/reference/lexical_analysis.html
>>>>>>>
>>>>>>> Thanks, though that is exactly what I was looking at already. It
>>>>>>> leaves several details unclear, some of which I discussed in previous
>>>>>>> messages. Hence, I consider it slightly short of a *full*
>>>>>>> specification. It does, however, provide my grounds for opposing
>>>>>>> adoption of that scheme for CIF.
>>>>>>>
>>>>>>>> The complete source code and binaries are available at:
>>>>>>>
>>>>>>> Unless you propose to append a particular set of Python sources to
>>>>>>> the CIF specification as a reference, I have no interest in perusing
>>>>>>> the source code to seek answers to such questions of detail as I have.
>>>>>>> Furthermore, I would oppose adding such an appendix on the grounds
>>>>>>> that it would be exceedingly difficult to use to resolve questions
>>>>>>> such as mine.
>>>>>>>
>>>>>>> I am likewise unwilling to rely on the behavior of the python binary
>>>>>>> that happens to be installed on my computer to answer them. If the
>>>>>>> correct behavior is not documented independent of the program then
>>>>>>> there is no particular reason to trust that it won't change in future
>>>>>>> versions, or that any particular implementation is correct or
>>>>>>> bug-free.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> --
>>>>>>> John C. Bollinger, Ph.D.
>>>>>>> Department of Structural Biology
>>>>>>> St. Jude Children's Research Hospital
>>>>>>>
>>>>>>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ddlm-group mailing list
>>>>>>> ddlm-group@iucr.org
>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>
>>>>> --
>>>>> =====================================================
>>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>> Dowling College, Kramer Science Center, KSC 121
>>>>> Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>> +1-631-244-3035
>>>>> yaya@dowling.edu
>>>>> =====================================================
>>>>
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
_______________________________________________ ddlm-group mailing list ddlm-group@iucr.org http://scripts.iucr.org/mailman/listinfo/ddlm-group