Re: [ddlm-group] Eliding in triple-quoted strings: Proposals C and D

The "15 minutes" refers to the time taken to implement Simon's
proposal, given software that already processes triple-quote delimited
strings as per the current CIF2 standard.  You can peruse the code
that I have for lexing CIF2 files at:

http://hg.berlios.de/repos/pycifrw/annotate/3e098b54c97d/pycifrw/YappsStarParser.nw

Line 206 is the regular expression for triple-quoted strings.  Matches
to this regex have their triple quotes stripped and are then processed
as any other datavalue, as you can see from line 347.  Implementing
Simon's proposal therefore only requires expanding the function
"striptriple" to also remove <backslash><newline> sequences and to
substitute <backslash> for <backslash><backslash>.  In Python this
would be two search-replace regexps, or two lines of code, thus the 15
minute estimate.  Note that the exact mechanism of eliding backslashes
requires a little fine-tuning, in my opinion, to restrict such elision
to only those backslashes that might be relevant for eliding
<newline>.  There may be some small changes to the approach, but my
point remains that whatever you do, implementing the complete Python
behaviour is an order of magnitude more time and complexity.
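
As a rough illustration of the size of that change, here is a minimal
sketch.  It is not the PyCIFRW code: the body of "striptriple" below is
an assumption, and it folds the two substitutions into a single
left-to-right pass so that <backslash><backslash><newline> comes out as
a literal backslash followed by a newline rather than being elided.

```python
import re

# Hypothetical stand-in for "striptriple"; the body is an assumption,
# not the function from YappsStarParser.nw.
# Matches <backslash><backslash> or <backslash><newline>.
_ELIDE = re.compile(r"\\(\\|\n)")


def striptriple(raw):
    """Strip the triple-quote delimiters, then apply the two elisions."""
    body = raw[3:-3]  # assumes raw starts and ends with ''' or """
    # <backslash><newline> is removed; <backslash><backslash> becomes one
    # backslash.  Any other backslash sequence is passed through untouched.
    return _ELIDE.sub(lambda m: "\\" if m.group(1) == "\\" else "", body)
```

Sequences such as \b are left exactly as written, so their
interpretation stays with whatever consumes the string.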

I'm not sure that this is particularly useful to those not using
yacc/lex type tools, but I think my 15 minute estimate is reasonable
in my case.  I emphasise that the reason it is so quick to do is that
Simon's proposal, and several others, do not require a change to the
lexing logic; the eliding transformations can be carried out after
lexing.  Such magic is only possible for digraph or trigraph delimited
strings.
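
The separation between lexing and eliding can be sketched as follows.
The token pattern here is an assumption for illustration (it is not the
regex at line 206), and "lex_triple" and "elide" are hypothetical names.
The point is only that the lexer finds the closing delimiter without
looking at backslashes, and the elision runs on the token afterwards.

```python
import re

# Assumed token pattern: shortest run between matching triple delimiters.
TRIPLE = re.compile(r'"""(.*?)"""|\'\'\'(.*?)\'\'\'', re.DOTALL)


def lex_triple(text, pos=0):
    """Return (raw_body, end_pos) for a triple-quoted token at pos, or None."""
    m = TRIPLE.match(text, pos)
    if m is None:
        return None
    body = m.group(1) if m.group(1) is not None else m.group(2)
    return body, m.end()


def elide(body):
    """Post-lexing step: drop <backslash><newline>, halve <backslash><backslash>."""
    return re.sub(r"\\(\\|\n)", lambda m: "\\" if m.group(1) == "\\" else "", body)


# The lexer never inspects the backslash; elision happens on the result.
raw_body, end = lex_triple('"""one \\\ntwo""" rest')
value = elide(raw_body)
```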

I do not understand Herbert's thinking about "flagging illegal data
values" below.  Neither I nor anybody else is suggesting that \b or
any of the other Python escape sequences be banned from CIF2 strings.
However, their interpretation as ASCII BEL or LaTeX boldface or poor
man's Unicode character is left to the dictionary to decide.

And finally, enough already with the dire predictions of doom.  I have
suggested that we continue until the end of this month searching for a
solution, and if none is forthcoming, that we leave the currently
approved standard as is.  Ralf has made clear that he would be happy
with a minimal solution, and Herbert has offered a constructive
compromise.  There are no signs of an impasse as yet in our
discussions.  Let us continue for a few weeks to see what we come up
with.

On Sun, Jan 9, 2011 at 12:16 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> First to the important part -- if James can do the complete
> implementation of the treble quote parsing for CIF2 in 15 minutes,
> then I respectfully request that he do so, and make the
> code available to the rest of us as a template for all
> to follow and understand.
>
> Second, what James sees as negative issues in 1 and 2, I see as positive
> ones, especially for support of imgCIF, but in no case do I understand
> what harm is done to any user or software developer by allowing the
> greater generality of the python treble quote with the raw string
> and the unicode string.  The question seems to come down to how
> early in the parse logic we will be required by the CIF2 standard
> to issue warnings or error messages about "illegal" data values.
> Are we to have a mandatory requirement for flagging this at the
> lexical level?  Why?  Some of us are going to have to allow for
> the suppression of those errors and warnings to be able to process
> our data (again imgCIF), but even for those who do not have such
> a need, if the treble quote logic returns a string that contains
> something "improper" (e.g. some of the disallowed unicode values)
> they can still report it.  How is it different from someone who
> is working with a unicode-aware editor who produces a single-quoted
> version of one of those characters?
>
> It is unfortunate that we are having this discussion without Ralf.
> As I feared, we now seem headed towards a decision by this body
> that will require a full re-opened discussion at the COMCIFS level
> and will further delay CIF2, probably for another 3 years.
> At least we should be able to take a shot at what we really need,
> a face-to-face meeting, in Madrid.
>
> It is a shame.  Ralf really is right.
>
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>   Dowling College, Kramer Science Center, KSC 121
>        Idle Hour Blvd, Oakdale, NY, 11769
>
>                 +1-631-244-3035
>                 yaya@dowling.edu
> =====================================================
>
> On Sat, 8 Jan 2011, James Hester wrote:
>
>> Perhaps I was unclear as to why I am not satisfied with Ralf's
>> proposal.  I object because:
>>
>> (1) It defines a large number of unnecessary escapes (I listed 10),
>> some of which are not allowed CIF characters;
>> (2) It defines both raw and unicode strings, which is excessive for
>> our requirements
>> (3) The sequences <backslash><quote> and <backslash><apostrophe> are
>> ambiguous in raw strings: are they elide sequences, or are they
>> intended for the string consumer?
>>
>> Perhaps the supporters of the Python approach would like to explain
>> why these objections are immaterial, especially given that there are
>> already about 6 significantly simpler proposals on the table to which
>> these objections do not apply.
>>
>> I do not perceive any advantage in adopting the Python approach
>> wholesale.  For example, Simon's minimalist suggestion would be much
>> easier to implement, interpret and document than the complete Python
>> scheme - I estimate about 15 minutes of coding time.
>>
>> On Sat, Jan 8, 2011 at 9:32 AM, Bollinger, John C
>> <John.Bollinger@stjude.org> wrote:
>>>
>>> On Friday, January 07, 2011 3:14 PM, Herbert J. Bernstein wrote:
>>>
>>>> We seem not to be communicating effectively.
>>>>
>>>> What I am asking for is an _existing_, supported treble quote
>>>> specification
>>>> from an _existing_ language with _existing_ documentation and
>>>> _existing_ software as an alternative to the Python specification,
>>>> documentation and software to which we all have access, that is being
>>>> proposed as an alternative
>>>> to what Ralf has proposed.
>>>
>>> Thank you for that clarification.  You are right, I didn't understand
>>> what you were asking for.
>>>
>>> I hope this will likewise clarify my position: I reject the premise that
>>> the system we choose must meet those criteria, and I oppose adopting the
>>> full Python syntax and semantics.
>>>
>>>> The Python specification is available at
>>>>
>>>> http://docs.python.org/reference/index.html
>>>>
>>>> with the lexical analysis at
>>>>
>>>> http://docs.python.org/reference/lexical_analysis.html
>>>
>>> Thanks, though that is exactly what I was looking at already.  It leaves
>>> several details unclear, some of which I discussed in previous messages.
>>>  Hence, I consider it slightly short of a *full* specification.  It does,
>>> however, provide my grounds for opposing adoption of that scheme for CIF.
>>>
>>>> The complete source code and binaries are available at:
>>>
>>> Unless you propose to append a particular set of Python sources to the
>>> CIF specification as a reference, I have no interest in perusing the source
>>> code to seek answers to such questions of detail as I have.  Furthermore, I
>>> would oppose adding such an appendix on the grounds that it would be
>>> exceedingly difficult to use to resolve questions such as mine.
>>>
>>> I am likewise unwilling to rely on the behavior of the python binary that
>>> happens to be installed on my computer to answer them.  If the correct
>>> behavior is not documented independent of the program then there is no
>>> particular reason to trust that it won't change in future versions, or that
>>> any particular implementation is correct or bug-free.
>>>
>>>
>>> Regards,
>>>
>>> John
>>>
>>> --
>>> John C. Bollinger, Ph.D.
>>> Department of Structural Biology
>>> St. Jude Children's Research Hospital
>>>
>>>
>>>
>>>
>>> Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>
>>
>>
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>
>
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
