[ddlm-group] Simon's elide proposal

Note that Simon's proposal *does* completely answer Ralf's concern
about the lack of elide mechanism in triple quoted strings.  It
provides line folding as well.  I for one would consider our job
finished if we were to adopt Simon's proposal, and see no need for the
further steps proposed by Herbert.  Herbert is of course welcome to
propose including the various Python behaviours as a separate
amendment to the CIF2 standard.

I would propose a slight tweak to Simon's proposal, so that it works as follows:

The datavalue is obtained from the triple-quoted string in two steps:
(1) All instances of <backslash><eol> are removed from the string
where the <backslash> is not preceded by another <backslash>
(2) All other instances of <backslash><eol> are replaced with <eol>

This means that a sequence of n backslashes followed by newline is
replaced by a sequence of n-1 backslashes followed by newline, except
if there is one backslash before the newline, in which case both
newline and backslash are removed.  Triple quote sequences are elided
by inserting a <backslash><eol> sequence between <delimiter>
characters to break up the triple delimiter sequence.  Note also that
backslash has no special meaning if not in a sequence finishing with
<eol>.
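
As a rough illustration of the two steps above, here is a minimal Python
sketch.  The function name decode_triple_quoted is hypothetical, and the
sketch assumes that <eol> has already been normalised to a single newline
character; neither is part of Simon's proposal or of the CIF2 draft.

```python
import re


def decode_triple_quoted(raw):
    # Assumption of this sketch: <eol> is the single character "\n".
    # Step 1: remove <backslash><eol> where the backslash is not
    # preceded by another backslash (the line-folding case).
    folded = re.sub(r"(?<!\\)\\\n", "", raw)
    # Step 2: replace every remaining <backslash><eol> with <eol>,
    # so n backslashes before a newline become n-1 backslashes.
    return folded.replace("\\\n", "\n")


if __name__ == "__main__":
    # One backslash before the newline: the line is folded.
    print(repr(decode_triple_quoted("one \\\ntwo")))
    # Two backslashes before the newline: one backslash survives.
    print(repr(decode_triple_quoted("one \\\\\ntwo")))
    # A triple delimiter elided by breaking it up with <backslash><eol>.
    print(repr(decode_triple_quoted('"\\\n"\\\n"')))
```

Because both steps only look at <backslash><eol>, this can be run as a
search and replace after lexing, and a backslash anywhere else in the
string is passed through untouched.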

I will be posting a separate email, hopefully tonight, where I will
list the current elide proposals and request that we all indicate
which ones are potentially acceptable to us, with a ranking if
possible.  This may help us to restrict discussion to something that
is mutually acceptable.

On Sun, Jan 9, 2011 at 2:45 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> Here is a possible compromise.  This thread began with
> Ralf's concern about the lack of an elide mechanism
> in treble quoted strings.  Simon's suggestion does
> not really answer that question, but it is a reasonable
> step in that direction.  So, how about ...
> 1.  Immediately adopt Simon's suggestion to allow the
> \\n and \\ elides in treble quoted strings.  Except for
> the confusion in the meaning of \"""" if a more general
> elide is eventually adopted that should cause very little
> stress for anybody.
> 2.  Add Ralf's proposed new section 7 to the CIF2
> document as a proposal under discussion, with the
> advice that people may wish to avoid creating treble-quoted
> strings that conflict with the full python elide
> conventions.
> 3.  Provide a coherent discussion document for COMCIFS
> and the community at large on the alternatives in
> handling the treble-quoted string, asking for comments
> to the list prior to the Madrid meeting.  I would suggest
> that Ralf be asked to contribute a page or 2 on the
> merits of his proposal and that either John B. or James
> contribute a page or 2 on their objections and alternatives.
> 4.  Discuss it face to face at the Madrid meeting and
> try to come to a resolution.
> 5.  Move forward with the rest of CIF2 as proposed in
> the meantime so we will be ready to discuss all of CIF2
> at the Madrid meeting, with an effort to have sample parsers
> and data sets available on the web prior to the meeting.
> Regards,
>  Herbert
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
> On Sat, 8 Jan 2011, Herbert J. Bernstein wrote:
>> Dear James,
>>  You are clearly a much better programmer than I am. When I got down into
>> the interactions among the treble quote, single quotes, text fields, elides,
>> the bracketed constructs and comments in the lexical scan, I found the going
>> tough.  If you have it done neatly, I would greatly appreciate seeing it.
>>  I think we need a face to face meeting or Skype meeting to resolve not
>> just this one issue, but the process of getting a workable CIF2.  Perhaps we
>> can finally get to do that in Madrid.
>>  Regards,
>>   Herbert
>> On Sat, 8 Jan 2011, James Hester wrote:
>>> I can't let these assertions go unchallenged:
>>> On Sat, Jan 8, 2011 at 12:04 PM, Herbert J. Bernstein
>>> <yaya@bernstein-plus-sons.com> wrote:
>>>> Dear Simon,
>>>>   Adoption of Ralf's proposal will ...
>>>>   1.  Make it much easier to create a CIF2 parser, because for one of
>>>> the messiest parts of the code we will have a clear specification,
>>>> sample code and a way to validate the tough cases.
>>> If we adopt a simpler spec than the Python in toto spec:
>>> - there will be many fewer tough cases
>>> - there will be a simpler and therefore clearer specification
>>> - for many alternative schemes the lexer will be unchanged from the
>>> current version, with the elide behaviour
>>>  simply requiring a search and replace following lexing
>>> Triple-quoted string handling is not currently a messy part of the
>>> code, I don't understand why you think this.  It will become
>>> significantly more complex under Ralf's proposal.
>>>>   2.  Make it easier for users to conform to the quoting rules, because
>>>> at least that one part of CIF2 will be thoroughly documented with lots
>>>> of examples.
>>> Quoting rules are not rocket science.  About 3 examples will be
>>> enough, if we adopt a simple specification rather
>>> than the unicode+raw+lots of escapes that the Python proposal entails.
>>> Doing things the Python way would
>>> imply more chance for user misunderstanding, especially bearing in
>>> mind that CIF2 users are not necessarily
>>> Python programmers or even programmers at all.  For these users, there
>>> is absolutely no benefit in adopting Python or any other language's
>>> approach - they are unfamiliar with them all.
>>>>   3.  Make it easier for the journals and archives to deal with "odd"
>>>> CIF2 files containing complex treble quoted strings because at
>>>> least that one part of CIF2 will be thoroughly documented with lots
>>>> of examples, and, with a utility (IDLE) all ready to allow them
>>>> to pull out a troublesome treble-quoted string and figure out what
>>>> it means or what it might mean if some intuitive change were made.
>>> The simpler the spec, the less likely mistakes will be made and the
>>> less chance of ambiguity.
>>>>   Yes, if Ralf's proposal happens to be rejected, we will still have
>>>> a problem in the lack of elide handling, and yes we will have to
>>>> put in the time and effort to consider those alternatives, but, first,
>>>> in order to have some chance of finishing the specification of CIF2
>>>> before the summer meeting deadlines (at least one of which is in
>>>> just a little more than 3 weeks), might it not be a good idea
>>>> to discuss and consider what was actually proposed instead of
>>>> chasing after lots of plausible alternatives that we already discussed
>>>> and rejected, and so are not very likely to agree upon rapidly now.
>>> I have some hope that, by restricting our discussion to treble-quoted
>>> strings, we can make progress compared to previous attempts.  I have
>>> considered and discussed at length Ralf's proposal, and would be
>>> interested in your responses to my particular objections.
>>>>   So, before I will delve into the many subtle variations of elide
>>>> mechanisms, I would appreciate our finishing consideration of Ralf's
>>>> actual proposal:
>>>> =======================
>>>> His revised wording (with one correction) is:
>>>> ========================
>>>> Triple-quote delimited strings.
>>>> The following ASCII sequences delimit the beginning of a string:
>>>>     """
>>>>     '''
>>>>     r"""
>>>>     r'''
>>>>     u"""
>>>>     u'''
>>>> The characters following the delimiter sequence are interpreted
>>>> with exactly the same algorithm as implemented for triple-quoted
>>>> strings in the Python programming language version 2 series.
>>>> In this algorithm, triple-quoted strings are terminated by matching
>>>> """ or ''' delimiters.
>>>> For example
>>>>     """He said "His name is O'Hearly"."""
>>>>     r'''In {\bf \TeX} the accents are \' and \".'''
>>>> Triple-quoted strings provide a reliable mechanism for storing any
>>>> arbitrary string in a CIF2 file.
>>>> =========================
>>>> This is cleaner and simpler than the original change 7 wording.
>>>> It probably does not conflict with existing CIF1 documents and the
>>>> _only_ CIF2 documents it can conflict with are the very few
>>>> that happen to end in \""" or \''''.  The new leading delimiters
>>>> r""", r''', u""" and u''' will have to be added to the list of forbidden
>>>> starts to white-space delimited data values in change 5.  In exchange
>>>> for these minor adjustments to valid CIF2 syntax we gain a fully documented,
>>>> software supported way to include arbitrary strings in a CIF2 document
>>>> that people are already used to working with.
>>>> I have reviewed the discussion of the "use of elides in strings"
>>>> thread in the ddlm-group discussion list, and, while I did not
>>>> then and do not now understand the objections to the general use
>>>> of elides in quoted strings, I particularly do not understand
>>>> the logic of objecting to the use of elides in treble-quoted strings,
>>>> which are a construct completely new to CIF and therefore in
>>>> conflict with no existing data files.
>>>> Would those who have an objection to Ralf's proposal please
>>>> state their objections.  An objection that says we object because
>>>> in past discussions another body could not manage to come to an
>>>> agreement and just gave up does not speak to the merits of this
>>>> specific proposal.
>>>> I have no idea why we are considering other proposals before
>>>> settling the status of Ralf's proposal.
>>> It is also useful to know what the alternatives might be when
>>> considering a proposal.
>>>> I agree with Ralf's proposal.
>>>> Regards,
>>>>   Herbert
>>>> At 12:37 AM +0000 1/8/11, SIMON WESTRIP wrote:
>>>>> Dear Herbert
>>>>> I fail to see how the adoption of python string quoting rules is going
>>>>> to make life easier for anyone other than a python programmer?
>>>>> Even then, the mechanism is restricted to treble-quoted strings,
>>>>> which are only one part of CIF. Maybe I've missed something, but just
>>>>> because CIF might share common syntax with a programming language in
>>>>> one respect, does not necessarily mean that the tools of that medium
>>>>> are available to CIF?
>>>>> If you're looking to base CIF extensions on established mechanisms,
>>>>> why not adopt the minimal \(newline) and \\ escape sequences, which in
>>>>> essence are the same as the established CIF line-folding protocol
>>>>> (just dropping the initial \ following the opening delimiter and
>>>>> formalising the protocol as an inherent part of the spec). After all,
>>>>> I believe you have already been using it, or at least interpreted it,
>>>>> as a means to escape 'semicolon delimiters' within
>>>>> semicolon-delimited values (I seem to recall discussions that
>>>>> identified an issue with the published 'trip tests' relating to line
>>>>> folding).
>>>>> Forgive me if I have missed something regarding the usefulness of
>>>>> python in CIF; please enlighten me as to its benefits if I were to
>>>>> write a CIF reader using anything but python. As far as I can see, the
>>>>> only advantages lie in the fact that the logic is established and thus
>>>>> unquestionable; but that does not mean it is necessarily entirely
>>>>> appropriate for CIF (which after all isn't a programming language).
>>>>> Cheers
>>>>> Simon
>>>>> From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>>>>> To: Group finalising DDLm and associated dictionaries
>>>>> <ddlm-group@iucr.org>
>>>>> Sent: Friday, 7 January, 2011 23:07:40
>>>>> Subject: Re: [ddlm-group] Eliding in triple-quoted strings:
>>>>> Proposals C and D. .. .. .
>>>>> Dear Colleagues,
>>>>>   Ralf's proposal is what it is.  Before we go haring off in other
>>>>> directions, we should respond constructively to what he has proposed.
>>>>> I support it.  Ralf and John W. support it.  John B. and James H.
>>>>> oppose it.  I think they are mistaken because ...
>>>>>   It is well and good to adopt a "Real Programmers Don't Eat
>>>>> Quiche" let's-start-from-scratch-and-roll-our-own approach when
>>>>> you have the resources to accomplish our goals that way.  It
>>>>> is a lot of fun, and has the potential to truly advance the
>>>>> field, but it is also, in the current funding climate, unrealistic.
>>>>>   In the U.S., there is a serious prospect of science funding being
>>>>> cut back so severely that the hit rates on grants next year may
>>>>> be as low as 1 in 10.  I suspect an honest review of funding prospects
>>>>> in other countries will uncover similarly dire warnings.
>>>>>   This does not mean we are all going out of business, but we do have
>>>>> to be careful to conserve resources and focus our do-it-from-scratch
>>>>> efforts on those areas that have the highest priority, and I fear,
>>>>> for most of our community, CIF2, while important, is not likely to
>>>>> be seen as worth that approach, and certainly filing the edges of
>>>>> a brand-new treble quote spec is likely to be very far down
>>>>> on anybody's priority list.
>>>>> Ralf has made a proposal that will save all of us a lot of effort
>>>>> and allow us to devote more resources to higher priority problems.
>>>>> Not only is he right on this one point, but I urge us to look for
>>>>> other areas where we can get to CIF2 by building on work that is
>>>>> already done.
>>>>> This is not a good time for wheel-reinvention.
>>>>> I would appreciate knowing from those who wish to reinvent this
>>>>> particular wheel, why they wish to do that and from where they
>>>>> expect to get the resources to do it?
>>>>> Regards,
>>>>>   Herbert
>>>>> On Fri, 7 Jan 2011, Bollinger, John C wrote:
>>>>>>  On Friday, January 07, 2011 3:14 PM, Herbert J. Bernstein wrote:
>>>>>>>  We seem not to be communicating effectively.
>>>>>>>
>>>>>>>  What I am asking for is an _existing_, supported treble quote
>>>>>>>  specification from an _existing_ language with _existing_
>>>>>>>  documentation and _existing_ software as an alternative to the
>>>>>>>  Python specification, documentation and software to which we all
>>>>>>>  have access, that is being proposed as an alternative
>>>>>>>  to what Ralf has proposed.
>>>>>>  Thank you for that clarification.  You are right, I didn't understand
>>>>>>  what you were asking for.
>>>>>>  I hope this will likewise clarify my position: I reject the premise
>>>>>>  that the system we choose must meet those criteria, and I oppose
>>>>>>  adopting the full Python syntax and semantics.
>>>>>>>  The Python specification is available at
>>>>>>>  http://docs.python.org/reference/index.html
>>>>>>>  with the lexical analysis at
>>>>>>>  http://docs.python.org/reference/lexical_analysis.html
>>>>>>  Thanks, though that is exactly what I was looking at already.  It
>>>>>>  leaves several details unclear, some of which I discussed in previous
>>>>>>  messages.  Hence, I consider it slightly short of a *full*
>>>>>>  specification.  It does, however, provide my grounds for opposing
>>>>>>  adoption of that scheme for CIF.
>>>>>>>  The complete source code and binaries are available at:
>>>>>>  Unless you propose to append a particular set of Python sources to
>>>>>>  the CIF specification as a reference, I have no interest in perusing
>>>>>>  the source code to seek answers to such questions of detail as I have.
>>>>>>  Furthermore, I would oppose adding such an appendix on the grounds
>>>>>>  that it would be exceedingly difficult to use to resolve questions
>>>>>>  such as mine.
>>>>>>  I am likewise unwilling to rely on the behavior of the python binary
>>>>>>  that happens to be installed on my computer to answer them.  If the
>>>>>>  correct behavior is not documented independent of the program then
>>>>>>  there is no particular reason to trust that it won't change in future
>>>>>>  versions, or that any particular implementation is correct or bug-free.
>>>>>>  Regards,
>>>>>>  John
>>>>>>  --
>>>>>>  John C. Bollinger, Ph.D.
>>>>>>  Department of Structural Biology
>>>>>>  St. Jude Children's Research Hospital
>>>>>>  Email Disclaimer: www.stjude.org/emaildisclaimer
>>>>>>  _______________________________________________
>>>>>>  ddlm-group mailing list
>>>>>>  ddlm-group@iucr.org
>>>>>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>> --
>>> T +61 (02) 9717 9907
>>> F +61 (02) 9717 3145
>>> M +61 (04) 0249 4148

T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148