
Re: [ddlm-group] Simon's elide proposal

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Simon's elide proposal
From: "Herbert J. Bernstein" <[email protected]>
Date: Wed, 12 Jan 2011 05:39:31 -0500 (EST)
In-Reply-To: <[email protected]>
References: <[email protected]>

Actually, Simon's proposal, while useful, is not complete,
inasmuch as \" and \' are not handled yet.  I urge adoption
of my compromise suggestion as written.   Without it,
we are going down the same slippery slope we crashed on
the last time we tried to resolve this issue. -- Herbert

=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  [email protected]
=====================================================

On Wed, 12 Jan 2011, James Hester wrote:

> Note that Simon's proposal *does* completely answer Ralf's concern
> about the lack of elide mechanism in triple quoted strings.  It
> provides line folding as well.  I for one would consider our job
> finished if we were to adopt Simon's proposal, and see no need for the
> further steps proposed by Herbert.  Herbert is of course welcome to
> propose including the various Python behaviours as a separate
> amendment to the CIF2 standard.
>
> I would propose a slight tweak to Simon's proposal, so that it works as follows:
>
> The datavalue is obtained from the triple-quoted string in two steps:
> (1) All instances of <backslash><eol> are removed from the string
> where the <backslash> is not preceded by another <backslash>
> (2) All other instances of <backslash><eol> are replaced with <eol>
>
> This means that a sequence of n backslashes followed by newline is
> replaced by a sequence of n-1 backslashes followed by newline, except
> if there is one backslash before the newline, in which case both
> newline and backslash are removed.  Triple quote sequences are elided
> by inserting a <backslash><eol> sequence between <delimiter>
> characters to break up the triple delimiter sequence.  Note also that
> backslash has no special meaning if not in a sequence finishing with
> <eol>.
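
The two-step rule above can be sketched in Python as follows. This is an illustrative sketch, not code from the thread: the function name `unfold` and the constant `EOL` are hypothetical, and `<eol>` is assumed to be a single line feed.

```python
import re

EOL = "\n"  # assumption: <eol> is a single line feed character


def unfold(raw: str) -> str:
    """Apply the two-step rule to the text between the triple-quote delimiters."""
    # Step 1: remove <backslash><eol> where the backslash is not
    # preceded by another backslash.
    lone_backslash_eol = r"(?<!\\)\\" + re.escape(EOL)
    step1 = re.sub(lone_backslash_eol, "", raw)
    # Step 2: every remaining <backslash><eol> loses that one backslash,
    # so n backslashes before <eol> become n-1 backslashes before <eol>.
    return step1.replace("\\" + EOL, EOL)
```

Under these assumptions, `unfold('""\\\n"')` yields three consecutive double-quote characters, which is how a triple-quote sequence would be elided inside a value. A backslash that is not part of a run ending in `<eol>` passes through unchanged.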
>
> I will be posting a separate email, hopefully tonight, where I will
> list the current elide proposals and request that we all indicate
> which ones are potentially acceptable to us, with a ranking if
> possible.  This may help us to restrict discussion to something that
> is mutually acceptable.
>
> On Sun, Jan 9, 2011 at 2:45 AM, Herbert J. Bernstein
> <[email protected]> wrote:
>> Here is a possible compromise.  This thread began with
>> Ralf's concern about the lack of an elide mechanism
>> in treble quoted strings.  Simon's suggestion does
>> not really answer that question, but it is a reasonable
>> step in that direction.  So, how about ...
>>
>> 1.  Immediately adopt Simon's suggestion to allow the
>> \\n and \\ elides in treble quoted strings.  Except for
>> the confusion in the meaning of \"""" if a more general
>> elide is eventually adopted, that should cause very little
>> stress for anybody.
>>
>> 2.  Add Ralf's proposed new section 7 to the CIF2
>> document as a proposal under discussion, with the
>> advice that people may wish to avoid creating treble-quoted
>> strings that conflict with the full Python elide
>> conventions.
>>
>> 3.  Provide a coherent discussion document for COMCIFS
>> and the community at large on the alternatives in
>> handling the treble-quoted string, asking for comments
>> to the list prior to the Madrid meeting.  I would suggest
>> that Ralf be asked to contribute a page or 2 on the
>> merits of his proposal and that either John B. or James
>> contribute a page or 2 on their objections and alternatives.
>>
>> 4.  Discuss it face to face at the Madrid meeting and
>> try to come to a resolution.
>>
>> 5.  Move forward with the rest of CIF2 as proposed in
>> the meantime so we will be ready to discuss all of CIF2
>> at the Madrid meeting, with an effort to have sample parsers
>> and data sets available on the web prior to the meeting.
>>
>> Regards,
>>   Herbert
>>
>> =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   [email protected]
>> =====================================================
>>
>> On Sat, 8 Jan 2011, Herbert J. Bernstein wrote:
>>
>>> Dear James,
>>>
>>>   You are clearly a much better programmer than I am. When I got down into
>>> the interactions among the treble quote, single quotes, text fields, elides,
>>> the bracketed constructs and comments in the lexical scan, I found the going
>>> tough.  If you have it done neatly, I would greatly appreciate seeing it.
>>>
>>>   I think we need a face to face meeting or Skype meeting to resolve not
>>> just this one issue, but the process of getting a workable CIF2.  Perhaps we
>>> can finally get to do that in Madrid.
>>>
>>>   Regards,
>>>     Herbert
>>>
>>>
>>> =====================================================
>>>   Herbert J. Bernstein, Professor of Computer Science
>>>     Dowling College, Kramer Science Center, KSC 121
>>>          Idle Hour Blvd, Oakdale, NY, 11769
>>>
>>>                   +1-631-244-3035
>>>                   [email protected]
>>> =====================================================
>>>
>>> On Sat, 8 Jan 2011, James Hester wrote:
>>>
>>>> I can't let these assertions go unchallenged:
>>>>
>>>> On Sat, Jan 8, 2011 at 12:04 PM, Herbert J. Bernstein
>>>> <[email protected]> wrote:
>>>>>
>>>>> Dear Simon,
>>>>>
>>>>>   Adoption of Ralf's proposal will ...
>>>>>
>>>>>   1.  Make it much easier to create a CIF2 parser, because for one of
>>>>> the messiest parts of the code we will have a clear specification,
>>>>> sample code and a way to validate the tough cases.
>>>>
>>>> If we adopt a simpler spec than the Python in toto spec:
>>>> - there will be many fewer tough cases
>>>> - there will be a simpler and therefore clearer specification
>>>> - for many alternative schemes the lexer will be unchanged from the
>>>>   current version, with the elide behaviour simply requiring a search
>>>>   and replace following lexing
>>>> Triple-quoted string handling is not currently a messy part of the
>>>> code; I don't understand why you think this.  It will become
>>>> significantly more complex under Ralf's proposal.
>>>>
>>>>>   2.  Make it easier for users to conform to the quoting rules, because
>>>>> at least that one part of CIF2 will be thoroughly documented with lots
>>>>> of examples.
>>>>
>>>> Quoting rules are not rocket science.  About 3 examples will be
>>>> enough, if we adopt a simple specification rather than the
>>>> unicode+raw+lots of escapes that the Python proposal entails.
>>>> Doing things the Python way would imply more chance for user
>>>> misunderstanding, especially bearing in mind that CIF2 users are not
>>>> necessarily Python programmers or even programmers at all.  For these
>>>> users, there is absolutely no benefit in adopting Python or any other
>>>> language's approach - they are unfamiliar with them all.
>>>>
>>>>>   3.  Make it easier for the journals and archives to deal with "odd"
>>>>> CIF2 files containing complex treble quoted strings because at
>>>>> least that one part of CIF2 will be thoroughly documented with lots
>>>>> of examples, and, with a utility (IDLE) all ready to allow them
>>>>> to pull out a troublesome treble-quoted string and figure out what
>>>>> it means or what it might mean if some intuitive change were made.
>>>>
>>>> The simpler the spec, the less likely mistakes will be made and the
>>>> less chance of ambiguity.
>>>>
>>>>>   Yes, if Ralf's proposal happens to be rejected, we will still have
>>>>> a problem in the lack of elide handling, and yes we will have to
>>>>> put in the time and effort to consider those alternatives, but, first,
>>>>> in order to have some chance of finishing the specification of CIF2
>>>>> before the summer meeting deadlines (at least one of which is in
>>>>> just a little more than 3 weeks), might it not be a good idea
>>>>> to discuss and consider what was actually proposed instead of
>>>>> chasing after lots of plausible alternatives that we already discussed
>>>>> and rejected, and so are not very likely to agree upon rapidly now?
>>>>
>>>> I have some hope that, by restricting our discussion to treble-quoted
>>>> strings, we can make progress compared to previous attempts.  I have
>>>> considered and discussed at length Ralf's proposal, and would be
>>>> interested in your responses to my particular objections.
>>>>
>>>>>   So, before I will delve into the many subtle variations of elide
>>>>> mechanisms, I would appreciate our finishing consideration of Ralf's
>>>>> actual proposal:
>>>>>
>>>>> =======================
>>>>>
>>>>> His revised wording (with one correction) is:
>>>>>
>>>>> ========================
>>>>>
>>>>> CHANGE 7 NEW
>>>>>
>>>>>
>>>>> Triple-quote delimited strings.
>>>>>
>>>>> The following ASCII sequences delimit the beginning of a string:
>>>>>
>>>>>     """
>>>>>     '''
>>>>>     r"""
>>>>>     r'''
>>>>>     u"""
>>>>>     u'''
>>>>>
>>>>> The characters following the delimiter sequence are interpreted
>>>>> with exactly the same algorithm as implemented for triple-quoted
>>>>> strings in the Python programming language version 2 series.
>>>>> In this algorithm, triple-quoted strings are terminated by matching
>>>>> """ or ''' delimiters.
>>>>>
>>>>> For example
>>>>>
>>>>>     """He said "His name is O'Hearly"."""
>>>>>     r'''In {\bf \TeX} the accents are \' and \".'''
>>>>>
>>>>> Triple-quoted strings provide a reliable mechanism for storing any
>>>>> arbitrary string in a CIF2 file.
>>>>>
>>>>> =========================
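
As an illustration of the two example strings in the proposed wording, the sketch below asks Python itself what value each literal denotes. It is not part of the thread: the variable names are hypothetical, and it runs under Python 3 rather than the Python 2 series named in the proposal. For these two particular literals the result is expected to be the same in both.

```python
import ast

# The two example literals from the proposed wording, held as source text.
example_1 = '''"""He said "His name is O'Hearly"."""'''
example_2 = r"""r'''In {\bf \TeX} the accents are \' and \".'''"""

# ast.literal_eval parses a Python literal and returns its value
# without executing arbitrary code.
value_1 = ast.literal_eval(example_1)
value_2 = ast.literal_eval(example_2)

print(value_1)  # embedded single and double quotes survive unchanged
print(value_2)  # raw string: every backslash is kept literally
```

In the raw example the backslashes before the quote characters stay in the value, so the TeX accents come through exactly as typed.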
>>>>>
>>>>> This is cleaner and simpler than the original change 7 wording.
>>>>> It probably does not conflict with existing CIF1 documents and the
>>>>> _only_ CIF2 documents it can conflict with are the very few
>>>>> that happen to end in \""" or \''''.  The new leading delimiters
>>>>> r""", r''', u""" and u''' will have to be added to the list of forbidden
>>>>> starts to white-space delimited data values in change 5.  In exchange
>>>>> for these minor adjustments to valid CIF2 syntax we gain a fully
>>>>> documented, software supported way to include arbitrary strings in a
>>>>> CIF2 document that people are already used to working with.
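
The conflict mentioned above, a string whose content ends in a backslash just before the closing delimiter, can be illustrated with a short sketch. It is not from the thread: `read_without_escapes` is a hypothetical helper that models a reader with no elide mechanism, and the Python reading is obtained under Python 3 rather than the Python 2 series named in the proposal.

```python
import ast

# Source text consisting of: three double quotes, abc, a backslash,
# then four double quotes.
source = '"""abc\\""""'


def read_without_escapes(text: str):
    """Hypothetical reader with no elides: the value ends at the first
    closing triple quote. Returns (value, leftover source text)."""
    body = text[3:]           # skip the opening delimiter
    end = body.index('"""')   # first closing delimiter
    return body[:end], body[end + 3:]


plain_value, leftover = read_without_escapes(source)
python_value = ast.literal_eval(source)

print(repr(plain_value), repr(leftover))  # value ends in a backslash, one quote left over
print(repr(python_value))                 # backslash-quote read as an escaped quote
```

Under these assumptions the same characters give two different values, which is why such strings are the ones affected by the choice of rule.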
>>>>>
>>>>> I have reviewed the discussion of the "use of elides in strings"
>>>>> thread in the ddlm-group discussion list, and, while I did not
>>>>> then and do not now understand the objections to the general use
>>>>> of elides in quoted strings, I particularly do not understand
>>>>> the logic of objecting to the use of elides in treble-quoted strings,
>>>>> which are a construct completely new to CIF and therefore in
>>>>> conflict with no existing data files.
>>>>>
>>>>> Would those who have an objection to Ralf's proposal please
>>>>> state their objections.  An objection that says we object because
>>>>> in past discussions another body could not manage to come to an
>>>>> agreement and just gave up does not speak to the merits of this
>>>>> specific proposal.
>>>>>
>>>>> I have no idea why we are considering other proposals before
>>>>> settling the status of Ralf's proposal.
>>>>
>>>> It is also useful to know what the alternatives might be when
>>>> considering a proposal.
>>>>
>>>>> I agree with Ralf's proposal.
>>>>>
>>>>> Regards,
>>>>>   Herbert
>>>>>
>>>>> At 12:37 AM +0000 1/8/11, SIMON WESTRIP wrote:
>>>>>>
>>>>>> Dear Herbert
>>>>>>
>>>>>> I fail to see how the adoption of Python string quoting rules is going
>>>>>> to make life easier for anyone other than a Python programmer?
>>>>>> Even then, the mechanism is restricted to treble-quoted strings,
>>>>>> which are only one part of CIF. Maybe I've missed something, but just
>>>>>> because CIF might share common syntax with a programming language in
>>>>>> one respect, does not necessarily mean that the tools of that medium
>>>>>> are available to CIF?
>>>>>>
>>>>>> If you're looking to base CIF extensions on established mechanisms,
>>>>>> why not adopt the minimal \(newline) and \\ escape sequences, which in
>>>>>> essence are the same as the established CIF line-folding protocol
>>>>>> (just dropping the initial \ following the opening delimiter and
>>>>>> formalising the protocol as an inherent part of the spec)? After all,
>>>>>> I believe you have already been using it, or at least interpreted it,
>>>>>> as a means to escape 'semicolon delimiters' within
>>>>>> semicolon-delimited values (I seem to recall discussions that
>>>>>> identified an issue with the published 'trip tests' relating to line
>>>>>> folding).
>>>>>>
>>>>>> Forgive me if I have missed something regarding the usefulness of
>>>>>> Python in CIF; please enlighten me as to its benefits if I were to
>>>>>> write a CIF reader using anything but Python. As far as I can see, the
>>>>>> only advantages lie in the fact that the logic is established and thus
>>>>>> unquestionable; but that does not mean it is necessarily entirely
>>>>>> appropriate for CIF (which after all isn't a programming language).
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Simon
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Herbert J. Bernstein <[email protected]>
>>>>>> To: Group finalising DDLm and associated dictionaries
>>>>>> <[email protected]>
>>>>>> Sent: Friday, 7 January, 2011 23:07:40
>>>>>> Subject: Re: [ddlm-group] Eliding in triple-quoted strings:
>>>>>> Proposals C and D. .. .. .
>>>>>>
>>>>>> Dear Colleagues,
>>>>>>
>>>>>>   Ralf's proposal is what it is.  Before we go haring off in other
>>>>>> directions, we should respond constructively to what he has proposed.
>>>>>> I support it.  Ralf and John W. support it.  John B. and James H.
>>>>>> oppose it.  I think they are mistaken because ...
>>>>>>
>>>>>>   It is well and good to adopt a "Real Programmers Don't Eat
>>>>>> Quiche" let's-start-from-scratch-and-roll-our-own approach when
>>>>>> you have the resources to accomplish our goals that way.  It
>>>>>> is a lot of fun, and has the potential to truly advance the
>>>>>> field, but it is also, in the current funding climate, unrealistic.
>>>>>>
>>>>>>   In the U.S., there is a serious prospect of science funding being
>>>>>> cut back so severely that the hit rates on grants next year may
>>>>>> be as low as 1 in 10.  I suspect an honest review of funding prospects
>>>>>> in other countries will uncover similarly dire warnings.
>>>>>>
>>>>>>   This does not mean we are all going out of business, but we do have
>>>>>> to be careful to conserve resources and focus our do-it-from-scratch
>>>>>> efforts on those areas that have the highest priority, and I fear,
>>>>>> for most of our community, CIF2, while important, is not likely to
>>>>>> be seen as worth that approach, and certainly filing the edges of
>>>>>> a brand-new treble quote spec is likely to be very far down
>>>>>> on anybody's priority list.
>>>>>>
>>>>>> Ralf has made a proposal that will save all of us a lot of effort
>>>>>> and allow us to devote more resources to higher priority problems.
>>>>>>
>>>>>> Not only is he right on this one point, but I urge us to look for
>>>>>> other areas where we can get to CIF2 by building on work that is
>>>>>> already done.
>>>>>>
>>>>>> This is not a good time for wheel-reinvention.
>>>>>>
>>>>>> I would appreciate knowing from those who wish to reinvent this
>>>>>> particular wheel, why they wish to do that and from where they
>>>>>> expect to get the resources to do it?
>>>>>>
>>>>>> Regards,
>>>>>>   Herbert
>>>>>>
>>>>>> =====================================================
>>>>>>   Herbert J. Bernstein, Professor of Computer Science
>>>>>>     Dowling College, Kramer Science Center, KSC 121
>>>>>>          Idle Hour Blvd, Oakdale, NY, 11769
>>>>>>
>>>>>>                   +1-631-244-3035
>>>>>>                   [email protected]
>>>>>> =====================================================
>>>>>>
>>>>>> On Fri, 7 Jan 2011, Bollinger, John C wrote:
>>>>>>
>>>>>>>
>>>>>>> On Friday, January 07, 2011 3:14 PM, Herbert J. Bernstein wrote:
>>>>>>>
>>>>>>>> We seem not to be communicating effectively.
>>>>>>>>
>>>>>>>> What I am asking for is an _existing_, supported treble quote
>>>>>>>> specification from an _existing_ language with _existing_
>>>>>>>> documentation and _existing_ software as an alternative to the
>>>>>>>> Python specification, documentation and software to which we all
>>>>>>>> have access, that is being proposed as an alternative
>>>>>>>> to what Ralf has proposed.
>>>>>>>
>>>>>>> Thank you for that clarification.  You are right, I didn't understand
>>>>>>> what you were asking for.
>>>>>>>
>>>>>>> I hope this will likewise clarify my position: I reject the premise
>>>>>>> that the system we choose must meet those criteria, and I oppose
>>>>>>> adopting the full Python syntax and semantics.
>>>>>>>
>>>>>>>> The Python specification is available at
>>>>>>>>
>>>>>>>> http://docs.python.org/reference/index.html
>>>>>>>>
>>>>>>>> with the lexical analysis at
>>>>>>>>
>>>>>>>> http://docs.python.org/reference/lexical_analysis.html
>>>>>>>
>>>>>>> Thanks, though that is exactly what I was looking at already.  It
>>>>>>> leaves several details unclear, some of which I discussed in previous
>>>>>>> messages.  Hence, I consider it slightly short of a *full*
>>>>>>> specification.  It does, however, provide my grounds for opposing
>>>>>>> adoption of that scheme for CIF.
>>>>>>>
>>>>>>>> The complete source code and binaries are available at:
>>>>>>>
>>>>>>> Unless you propose to append a particular set of Python sources to
>>>>>>> the CIF specification as a reference, I have no interest in perusing
>>>>>>> the source code to seek answers to such questions of detail as I have.
>>>>>>> Furthermore, I would oppose adding such an appendix on the grounds
>>>>>>> that it would be exceedingly difficult to use to resolve questions
>>>>>>> such as mine.
>>>>>>>
>>>>>>> I am likewise unwilling to rely on the behavior of the Python binary
>>>>>>> that happens to be installed on my computer to answer them.  If the
>>>>>>> correct behavior is not documented independent of the program then
>>>>>>> there is no particular reason to trust that it won't change in future
>>>>>>> versions, or that any particular implementation is correct or bug-free.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>> --
>>>>>>> John C. Bollinger, Ph.D.
>>>>>>> Department of Structural Biology
>>>>>>> St. Jude Children's Research Hospital
>>>>>>>
>>>>>>> Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ddlm-group mailing list
>>>>>>> [email protected]
>>>>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> =====================================================
>>>>>   Herbert J. Bernstein, Professor of Computer Science
>>>>>     Dowling College, Kramer Science Center, KSC 121
>>>>>          Idle Hour Blvd, Oakdale, NY, 11769
>>>>>
>>>>>                   +1-631-244-3035
>>>>>                   [email protected]
>>>>> =====================================================
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> T +61 (02) 9717 9907
>>>> F +61 (02) 9717 3145
>>>> M +61 (04) 0249 4148
>>
>>
>>
>
>
>
> -- 
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
>

_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group
