Re: [ddlm-group] Simon's elide proposal

In triple-quoted strings there is no need to create \" or \' elides.
It is sufficient to simply break up any embedded triple quotes.  This
is the insight behind proposals C and D and Simon's proposal.  Simon's
proposal is therefore complete.  If you don't believe me, please
present me with a string that you think is not handled by this
proposal and I'll undertake to present you with the elided version.
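
For concreteness, here is a minimal Python sketch of the tweaked two-step
rule quoted below, together with one way of producing the elided version of
an arbitrary string.  The names decode_body and encode_body are made up for
illustration, and <eol> is assumed to be a bare "\n".  It is a sketch under
those assumptions, not a reference implementation.

```python
import re

# One or more backslashes immediately followed by an end-of-line.
# Assumption: <eol> is a bare "\n"; other line endings are not handled here.
_RUN_BEFORE_EOL = re.compile(r'(\\+)\n')


def decode_body(body):
    """Apply the two-step rule to the text between the triple-quote delimiters."""
    def fix(match):
        run = match.group(1)
        if len(run) == 1:
            return ''              # lone backslash: drop backslash and eol
        return run[:-1] + '\n'     # n backslashes: keep n-1 and the eol
    return _RUN_BEFORE_EOL.sub(fix, body)


def encode_body(value, quote='"'):
    """Return a body that decodes to `value` and cannot close the string early."""
    # Protect literal backslash runs before an eol by adding one backslash.
    body = _RUN_BEFORE_EOL.sub(lambda m: m.group(1) + '\\\n', value)
    # Put <backslash><eol> after any delimiter character that is followed by
    # another one, or that would otherwise touch the closing delimiter.
    q = re.escape(quote)
    return re.sub(q + '(?=' + q + r'|\Z)', lambda m: quote + '\\\n', body)


# Round trip for a value with embedded triple quotes and a trailing quote.
value = 'He said """hi""" and left a trailing quote"'
body = encode_body(value)
assert '"""' not in body and decode_body(body) == value
```

The encoder elides more often than strictly necessary (every pair of
adjacent delimiter characters rather than only every third one), which
keeps the sketch short at the cost of a few extra line breaks in the file.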

On Wed, Jan 12, 2011 at 9:39 PM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> Actually, Simon's proposal, while useful, is not complete,
> inasmuch as \" and \' are not handled yet.  I urge adoption
> of my compromise suggestion as written.   Without it,
> we are going down the same slippery slope we crashed on
> the last time we tried to resolve this issue. -- Herbert
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Wed, 12 Jan 2011, James Hester wrote:
>
>> Note that Simon's proposal *does* completely answer Ralf's concern
>> about the lack of elide mechanism in triple quoted strings.  It
>> provides line folding as well.  I for one would consider our job
>> finished if we were to adopt Simon's proposal, and see no need for the
>> further steps proposed by Herbert.  Herbert is of course welcome to
>> propose including the various Python behaviours as a separate
>> amendment to the CIF2 standard.
>>
>> I would propose a slight tweak to Simon's proposal, so that it works as
>> follows:
>>
>> The datavalue is obtained from the triple-quoted string in two steps:
>> (1) All instances of <backslash><eol> are removed from the string
>> where the <backslash> is not preceded by another <backslash>
>> (2) All other instances of <backslash><eol> are replaced with <eol>
>>
>> This means that a sequence of n backslashes followed by newline is
>> replaced by a sequence of n-1 backslashes followed by newline, except
>> if there is one backslash before the newline, in which case both
>> newline and backslash are removed.  Triple quote sequences are elided
>> by inserting a <backslash><eol> sequence between <delimiter>
>> characters to break up the triple delimiter sequence.  Note also that
>> backslash has no special meaning if not in a sequence finishing with
>> <eol>.
>>
>> I will be posting a separate email, hopefully tonight, where I will
>> list the current elide proposals and request that we all indicate
>> which ones are potentially acceptable to us, with a ranking if
>> possible.  This may help us to restrict discussion to something that
>> is mutually acceptable.
>>
>> On Sun, Jan 9, 2011 at 2:45 AM, Herbert J. Bernstein
>> <yaya@bernstein-plus-sons.com> wrote:
>>>
>>> Here is a possible compromise.  This thread began with
>>> Ralf's concern about the lack of an elide mechanism
>>> in treble quoted strings.  Simon's suggestion does
>>> not really answer that question, but it is a reasonable
>>> step in that direction.  So, how about ...
>>>
>>> 1.  Immediately adopt Simon's suggestion to allow the
>>> \\n and \\ elides in treble quoted strings.  Except for
>>> the confusion in the meaning of \"""" if a more general
>>> elide is eventually adopted that should cause very little
>>> stress for anybody.
>>>
>>> 2.  Add Ralf's proposed new section 7 to the CIF2
>>> document as a proposal under discussion, with the
>>> advice that people may wish to avoid creating treble-quoted
>>> strings that conflict with the full Python elide
>>> conventions.
>>>
>>> 3.  Provide a coherent discussion document for COMCIFS
>>> and the community at large on the alternatives in
>>> handling the treble-quoted string, asking for comments
>>> to the list prior to the Madrid meeting.  I would suggest
>>> that Ralf be asked to contribute a page or 2 on the
>>> merits of his proposal and that either John B. or James
>>> contribute a page or 2 on their objections and alternatives.
>>>
>>> 4.  Discuss it face to face at the Madrid meeting and
>>> try to come to a resolution.
>>>
>>> 5.  Move forward with the rest of CIF2 as proposed in
>>> the meantime so we will be ready to discuss all of CIF2
>>> at the Madrid meeting, with an effort to have sample parsers
>>> and data sets available on the web prior to the meeting.
>>>
>>> Regards,
>>>  Herbert
>>>
>>> =====================================================
>>>  Herbert J. Bernstein, Professor of Computer Science
>>>   Dowling College, Kramer Science Center, KSC 121
>>>        Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>>                 +1-631-244-3035
>>>                 yaya@dowling.edu
>>> =====================================================
>>>
>>> On Sat, 8 Jan 2011, Herbert J. Bernstein wrote:
>>>
>>>> Dear James,
>>>>
>>>>  You are clearly a much better programmer than I am. When I got down
>>>> into the interactions among the treble quote, single quotes, text
>>>> fields, elides, the bracketed constructs and comments in the lexical
>>>> scan, I found the going tough.  If you have it done neatly, I would
>>>> greatly appreciate seeing it.
>>>>
>>>>  I think we need a face to face meeting or Skype meeting to resolve not
>>>> just this one issue, but the process of getting a workable CIF2.
>>>> Perhaps we can finally get to do that in Madrid.
>>>>
>>>>  Regards,
>>>>   Herbert
>>>>
>>>>
>>>> =====================================================
>>>> Herbert J. Bernstein, Professor of Computer Science
>>>>  Dowling College, Kramer Science Center, KSC 121
>>>>       Idle Hour Blvd, Oakdale, NY, 11769
>>>>
>>>>                +1-631-244-3035
>>>>                yaya@dowling.edu
>>>> =====================================================
>>>>
>>>> On Sat, 8 Jan 2011, James Hester wrote:
>>>>
>>>>> I can't let these assertions go unchallenged:
>>>>>
>>>>> On Sat, Jan 8, 2011 at 12:04 PM, Herbert J. Bernstein
>>>>> <yaya@bernstein-plus-sons.com> wrote:
>>>>>>
>>>>>> Dear Simon,
>>>>>>
>>>>>>   Adoption of Ralf's proposal will ...
>>>>>>
>>>>>>   1.  Make it much easier to create a CIF2 parser, because for one of
>>>>>> the messiest parts of the code we will have a clear specification,
>>>>>> sample code and a way to validate the tough cases.
>>>>>
>>>>> If we adopt a simpler spec than the Python in toto spec:
>>>>> - there will be many fewer tough cases
>>>>> - there will be a simpler and therefore clearer specification
>>>>> - for many alternative schemes the lexer will be unchanged from the
>>>>> current version, with the elide behaviour simply requiring a search
>>>>> and replace following lexing
>>>>> Triple-quoted string handling is not currently a messy part of the
>>>>> code; I don't understand why you think this.  It will become
>>>>> significantly more complex under Ralf's proposal.
>>>>>
>>>>>>   2.  Make it easier for users to conform to the quoting rules,
>>>>>> because at least that one part of CIF2 will be thoroughly documented
>>>>>> with lots of examples.
>>>>>
>>>>> Quoting rules are not rocket science.  About 3 examples will be
>>>>> enough, if we adopt a simple specification rather
>>>>> than the unicode+raw+lots of escapes that the Python proposal entails.
>>>>> Doing things the Python way would
>>>>> imply more chance for user misunderstanding, especially bearing in
>>>>> mind that CIF2 users are not necessarily
>>>>> Python programmers or even programmers at all.  For these users, there
>>>>> is absolutely no benefit in adopting Python or any other language's
>>>>> approach - they are unfamiliar with them all.
>>>>>
>>>>>>   3.  Make it easier for the journals and archives to deal with "odd"
>>>>>> CIF2 files containing complex treble quoted strings because at
>>>>>> least that one part of CIF2 will be thoroughly documented with lots
>>>>>> of examples, and, with a utility (IDLE) all ready to allow them
>>>>>> to pull out a troublesome treble-quoted string and figure out what
>>>>>> it means or what it might mean if some intuitive change were made.
>>>>>
>>>>> The simpler the spec, the less likely mistakes will be made and the
>>>>> less chance of ambiguity.
>>>>>
>>>>>>   Yes, if Ralf's proposal happens to be rejected, we will still have
>>>>>> a problem in the lack of elide handling, and yes we will have to
>>>>>> put in the time and effort to consider those alternatives, but, first,
>>>>>> in order to have some chance of finishing the specification of CIF2
>>>>>> before the summer meeting deadlines (at least one of which is in
>>>>>> just a little more than 3 weeks), might it not be a good idea
>>>>>> to discuss and consider what was actually proposed instead of
>>>>>> chasing after lots of plausible alternatives that we already discussed
>>>>>> and rejected, and so are not very likely to agree upon rapidly now.
>>>>>
>>>>> I have some hope that, by restricting our discussion to treble-quoted
>>>>> strings, we can make progress compared to previous attempts.  I have
>>>>> considered and discussed at length Ralf's proposal, and would be
>>>>> interested in your responses to my particular objections.
>>>>>
>>>>>>   So, before I will delve into the many subtle variations of elide
>>>>>> mechanisms, I would appreciate our finishing consideration of Ralf's
>>>>>> actual proposal:
>>>>>>
>>>>>> =======================
>>>>>>
>>>>>> His revised wording (with one correction) is:
>>>>>>
>>>>>> ========================
>>>>>>
>>>>>> CHANGE 7 NEW
>>>>>>
>>>>>>
>>>>>> Triple-quote delimited strings.
>>>>>>
>>>>>> The following ASCII sequences delimit the beginning of a string:
>>>>>>
>>>>>>     """
>>>>>>     '''
>>>>>>     r"""
>>>>>>     r'''
>>>>>>     u"""
>>>>>>     u'''
>>>>>>
>>>>>> The characters following the delimiter sequence are interpreted
>>>>>> with exactly the same algorithm as implemented for triple-quoted
>>>>>> strings in the Python programming language version 2 series.
>>>>>> In this algorithm, triple-quoted strings are terminated by matching
>>>>>> """ or ''' delimiters.
>>>>>>
>>>>>> For example
>>>>>>
>>>>>>     """He said "His name is O'Hearly"."""
>>>>>>     r'''In {\bf \TeX} the accents are \' and \".'''
>>>>>>
>>>>>> Triple-quoted strings provide a reliable mechanism for storing any
>>>>>> arbitrary string in a CIF2 file.
>>>>>>
>>>>>> =========================
>>>>>>
>>>>>> This is cleaner and simpler than the original change 7 wording.
>>>>>> It probably does not conflict with existing CIF1 documents and the
>>>>>> _only_ CIF2 documents it can conflict with are the very few
>>>>>> that happen to end in \""" or \''''.  The new leading delimiters
>>>>>> r""", r''', u""" and u''' will have to be added to the list of
>>>>>> forbidden starts to white-space delimited data values in change 5.
>>>>>> In exchange for these minor adjustments to valid CIF2 syntax we gain
>>>>>> a fully documented, software supported way to include arbitrary
>>>>>> strings in a CIF2 document that people are already used to working
>>>>>> with.
>>>>>>
>>>>>> I have reviewed the discussion of the "use of elides in strings"
>>>>>> thread in the ddlm-group discussion list, and, while I did not
>>>>>> then and do not now understand the objections to the general use
>>>>>> of elides in quoted strings, I particularly do not understand
>>>>>> the logic of objecting to the use of elides in treble-quoted strings,
>>>>>> which are a construct completely new to CIF and therefore in
>>>>>> conflict with no existing data files.
>>>>>>
>>>>>> Would those who have an objection to Ralf's proposal please
>>>>>> state their objections.  An objection that says we object because
>>>>>> in past discussions another body could not manage to come to an
>>>>>> agreement and just gave up does not speak to the merits of this
>>>>>> specific proposal.
>>>>>>
>>>>>> I have no idea why we are considering other proposals before
>>>>>> settling the status of Ralf's proposal.
>>>>>
>>>>> It is also useful to know what the alternatives might be when
>>>>> considering a proposal.
>>>>>
>>>>>> I agree with Ralf's proposal.
>>>>>>
>>>>>> Regards,
>>>>>>   Herbert
>>>>>>
>>>>>> At 12:37 AM +0000 1/8/11, SIMON WESTRIP wrote:
>>>>>>>
>>>>>>> Dear Herbert
>>>>>>>
>>>>>>> I fail to see how the adoption of python string quoting rules is
>>>>>>> going to make life easier for anyone other than a python programmer?
>>>>>>> Even then, the mechanism is restricted to treble-quoted strings,
>>>>>>> which are only one part of CIF. Maybe I've missed something, but just
>>>>>>> because CIF might share common syntax with a programming language in
>>>>>>> one respect, does not necessarily mean that the tools of that medium
>>>>>>> are available to CIF?
>>>>>>>
>>>>>>> If you're looking to base CIF extensions on established mechanisms,
>>>>>>> why not adopt the minimal \(newline) and \\ escape sequences, which in
>>>>>>> essence are the same as the established CIF line-folding protocol
>>>>>>> (just dropping the initial \ following the opening delimiter and
>>>>>>> formalising the protocol as an inherent part of the spec). After all,
>>>>>>> I believe you have already been using it, or at least interpreted it,
>>>>>>> as a means to escape 'semicolon delimiters' within
>>>>>>> semicolon-delimited values (I seem to recall discussions that
>>>>>>> identified an issue with the published 'trip tests' relating to line
>>>>>>> folding).
>>>>>>>
>>>>>>> Forgive me if I have missed something regarding the usefulness of
>>>>>>> python in CIF; please enlighten me as to its benefits if I were to
>>>>>>> write a CIF reader using anything but python. As far as I can see, the
>>>>>>> only advantages lie in the fact that the logic is established and thus
>>>>>>> unquestionable; but that does not mean it is necessarily entirely
>>>>>>> appropriate for CIF (which after all isn't a programming language).
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> Simon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>>>>>>> To: Group finalising DDLm and associated dictionaries
>>>>>>> <ddlm-group@iucr.org>
>>>>>>> Sent: Friday, 7 January, 2011 23:07:40
>>>>>>> Subject: Re: [ddlm-group] Eliding in triple-quoted strings:
>>>>>>> Proposals C and D. .. .. .
>>>>>>>
>>>>>>> Dear Colleagues,
>>>>>>>
>>>>>>>   Ralf's proposal is what it is.  Before we go haring off in other
>>>>>>> directions, we should respond constructively to what he has proposed.
>>>>>>> I support it.  Ralf and John W. support it.  John B. and James H.
>>>>>>> oppose it.  I think they are mistaken because ...
>>>>>>>
>>>>>>>   It is well and good to adopt a "Real Programmers Don't Eat
>>>>>>> Quiche" let's-start-from-scratch-and-roll-our-own approach when
>>>>>>> you have the resources to accomplish our goals that way.  It
>>>>>>> is a lot of fun, and has the potential to truly advance the
>>>>>>> field, but it is also, in the current funding climate, unrealistic.
>>>>>>>
>>>>>>>   In the U.S., there is a serious prospect of science funding being
>>>>>>> cut back so severely that the hit rates on grants next year may
>>>>>>> be as low as 1 in 10.  I suspect an honest review of funding
>>>>>>> prospects in other countries will uncover similarly dire warnings.
>>>>>>>
>>>>>>>   This does not mean we are all going out of business, but we do have
>>>>>>> to be careful to conserve resources and focus our do-it-from-scratch
>>>>>>> efforts on those areas that have the highest priority, and I fear,
>>>>>>> for most of our community, CIF2, while important, is not likely to
>>>>>>> be seen as worth that approach, and certainly filing the edges of
>>>>>>> a brand-new treble quote spec is likely to be very far down
>>>>>>> on anybody's priority list.
>>>>>>>
>>>>>>> Ralf has made a proposal that will save all of us a lot of effort
>>>>>>> and allow us to devote more resources to higher priority problems.
>>>>>>>
>>>>>>> Not only is he right on this one point, but I urge us to look for
>>>>>>> other areas where we can get to CIF2 by building on work that is
>>>>>>> already done.
>>>>>>>
>>>>>>> This is not a good time for wheel-reinvention.
>>>>>>>
>>>>>>> I would appreciate knowing from those who wish to reinvent this
>>>>>>> particular wheel, why they wish to do that and from where they
>>>>>>> expect to get the resources to do it?
>>>>>>>
>>>>>>> Regards,
>>>>>>>   Herbert
>>>>>>>
>>>>>>> =====================================================
>>>>>>>   Herbert J. Bernstein, Professor of Computer Science
>>>>>>>     Dowling College, Kramer Science Center, KSC 121
>>>>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>>
>>>>>>>                   +1-631-244-3035
>>>>>>>                   yaya@dowling.edu
>>>>>>> =====================================================
>>>>>>>
>>>>>>> On Fri, 7 Jan 2011, Bollinger, John C wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>  On Friday, January 07, 2011 3:14 PM, Herbert J. Bernstein wrote:
>>>>>>>>
>>>>>>>>>  We seem not to be communicating effectively.
>>>>>>>>>
>>>>>>>>>  What I am asking for is an _existing_, supported treble quote
>>>>>>>>>  specification from an _existing_ language with _existing_
>>>>>>>>>  documentation and _existing_ software as an alternative to the
>>>>>>>>>  Python specification, documentation and software to which we all
>>>>>>>>>  have access, that is being proposed as an alternative
>>>>>>>>>  to what Ralf has proposed.
>>>>>>>>
>>>>>>>>  Thank you for that clarification.  You are right, I didn't
>>>>>>>>  understand what you were asking for.
>>>>>>>>
>>>>>>>>  I hope this will likewise clarify my position: I reject the premise
>>>>>>>>  that the system we choose must meet those criteria, and I oppose
>>>>>>>>  adopting the full Python syntax and semantics.
>>>>>>>>
>>>>>>>>>  The Python specification is available at
>>>>>>>>>
>>>>>>>>>  http://docs.python.org/reference/index.html
>>>>>>>>>
>>>>>>>>>  with the lexical analysis at
>>>>>>>>>
>>>>>>>>>  http://docs.python.org/reference/lexical_analysis.html
>>>>>>>>
>>>>>>>>  Thanks, though that is exactly what I was looking at already.  It
>>>>>>>>  leaves several details unclear, some of which I discussed in
>>>>>>>>  previous messages.  Hence, I consider it slightly short of a *full*
>>>>>>>>  specification.  It does, however, provide my grounds for opposing
>>>>>>>>  adoption of that scheme for CIF.
>>>>>>>>
>>>>>>>>>  The complete source code and binaries are available at:
>>>>>>>>
>>>>>>>>  Unless you propose to append a particular set of Python sources to
>>>>>>>>  the CIF specification as a reference, I have no interest in perusing
>>>>>>>>  the source code to seek answers to such questions of detail as I
>>>>>>>>  have.  Furthermore, I would oppose adding such an appendix on the
>>>>>>>>  grounds that it would be exceedingly difficult to use to resolve
>>>>>>>>  questions such as mine.
>>>>>>>>
>>>>>>>>  I am likewise unwilling to rely on the behavior of the python binary
>>>>>>>>  that happens to be installed on my computer to answer them.  If the
>>>>>>>>  correct behavior is not documented independent of the program then
>>>>>>>>  there is no particular reason to trust that it won't change in
>>>>>>>>  future versions, or that any particular implementation is correct or
>>>>>>>>  bug-free.
>>>>>>>>
>>>>>>>>
>>>>>>>>  Regards,
>>>>>>>>
>>>>>>>>  John
>>>>>>>>
>>>>>>>>  --
>>>>>>>>  John C. Bollinger, Ph.D.
>>>>>>>>  Department of Structural Biology
>>>>>>>>  St. Jude Children's Research Hospital
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>>>>>
>>>>>>>>  _______________________________________________
>>>>>>>>  ddlm-group mailing list
>>>>>>>>  ddlm-group@iucr.org
>>>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> =====================================================
>>>>>>  Herbert J. Bernstein, Professor of Computer Science
>>>>>>    Dowling College, Kramer Science Center, KSC 121
>>>>>>         Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>
>>>>>>                  +1-631-244-3035
>>>>>>                  yaya@dowling.edu
>>>>>> =====================================================
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> T +61 (02) 9717 9907
>>>>> F +61 (02) 9717 3145
>>>>> M +61 (04) 0249 4148
>>>
>>>
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>
>
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148