Discussion List Archives

Re: [ddlm-group] Vote on moving elide discussion to COMCIFS. .. .

Hi Herbert: if you are concerned about resources, then Proposal F or
F' represents the least drain on those resources (if not 15 minutes,
then considerably less time than has been devoted to this discussion).
Proposal P requires considerably more implementation time for everybody
except those who already use the Python parser to parse CIF files (nobody
does this, of course, because the grammars are different).  So: please
justify your statement below that Ralf's proposal will require "less
effort in software development".  To help your effort estimate, I can
see the following five necessary steps, each of which would have to take
on the order of 5 minutes on average to be competitive with Proposals F
or F':

(1) Adjustments to lexer to recognise the r""", ur""" and u""" string delimiters
(2) Adjustments to lexer to recognise <backslash><delimiter> as an
escape (can't use Python code for this)
(3) Installation and/or preparation of Unicode code point name
database for use in (4), including managing distribution if necessary
(4) Replacement of defined elide sequences for each string type in
triple-quoted unicode and plain strings with their Unicode character
equivalents, running left to right
(5) Adjustment of any application code that fails on character values
less than 0x20 (e.g. ASCII BEL).
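
As a rough illustration of what steps (1), (2), (4) and (5) involve, here
is a minimal Python sketch.  It is not taken from any of the proposals:
the names scan_triple_quoted, expand_elides and control_characters are
hypothetical, the table of elide sequences is a small illustrative subset
rather than Python's full set, and the handling of each string type is an
assumption about Python 2-style semantics.  Step (3) appears only as the
call to unicodedata.lookup, which relies on the interpreter already
shipping the Unicode name database.

```python
import re
import unicodedata

# Step (1): optional u/r/ur prefix followed by a triple-quote delimiter.
OPEN = re.compile(r'(?P<prefix>ur|u|r)?(?P<delim>"""|\'\'\')', re.IGNORECASE)

# Step (4): an illustrative subset of elide sequences, not Python's full table.
SIMPLE = {"n": "\n", "t": "\t", "a": "\a", "\\": "\\", '"': '"', "'": "'"}
ELIDE = re.compile(
    r"\\(?:N\{(?P<name>[^}]+)\}|u(?P<hex>[0-9A-Fa-f]{4})|(?P<simple>.))",
    re.DOTALL,
)


def scan_triple_quoted(text, pos=0):
    """Return (prefix, body, end_pos) for a triple-quoted string at pos."""
    m = OPEN.match(text, pos)
    if m is None:
        raise ValueError("no triple-quoted string at position %d" % pos)
    prefix = (m.group("prefix") or "").lower()
    delim = m.group("delim")
    i = m.end()
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            i += 2  # step (2): a backslash protects the next character
            continue
        if text.startswith(delim, i):
            return prefix, text[m.end():i], i + len(delim)
        i += 1
    raise ValueError("unterminated triple-quoted string")


def expand_elides(body, prefix=""):
    """Replace elide sequences left to right according to the string type."""
    prefix = prefix.lower()
    if "r" in prefix:
        return body  # raw strings: sequences are left untouched in this sketch
    is_unicode = "u" in prefix

    def repl(match):
        if match.group("name") or match.group("hex"):
            if not is_unicode:
                return match.group(0)  # assumed to apply to unicode strings only
            if match.group("name"):
                return unicodedata.lookup(match.group("name"))  # step (3)
            return chr(int(match.group("hex"), 16))
        ch = match.group("simple")
        return SIMPLE.get(ch, "\\" + ch)  # unknown sequences are kept as written

    return ELIDE.sub(repl, body)


def control_characters(value):
    """Step (5): list characters below 0x20 other than tab, LF and CR."""
    return [c for c in value if ord(c) < 0x20 and c not in "\t\n\r"]
```

Even this reduced version needs a decision for every string type and
every sequence, which is the point of the estimate above.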

You may prefer the following sequence:
(1) and (2) as for (1),(2) above - you can't leverage Python for this
as you have a CIF, not Python, grammar
(3) prepare a wrapper for your particular development environment
around the CPython code for triple quoted strings, keeping track of
the type of string (u,r etc.) and if necessary adjusting the string
storage mechanism and internal encoding.  Note that you will need to
*find* the Python code for triple-quoted string processing first.
(4) insert calls to the relevant CPython wrapper code at appropriate
points in your parser
(5) make sure that you include all dependencies of the CPython code
when distributing your code (e.g. Unicode database)

Note also that the internal CPython representation of the string may
need further manipulation in the wrapper, given the split between
Unicode/non-Unicode strings in Python and various encodings that may
be in use.
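
For the special case where the development environment is itself Python,
the wrapper in steps (3) and (4) might look something like the sketch
below.  The function names and the UTF-8 default are assumptions for
illustration only, and the code assumes Python 3, where the 'ur' prefix
is a syntax error and must be mapped onto something else.  In any other
environment the equivalent wrapper has to be built around the CPython
source itself.

```python
import ast


def to_text(value, encoding="utf-8"):
    """Normalise bytes or str to str; the default encoding is an assumption."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def evaluate_with_python(prefix, delim, body, encoding="utf-8"):
    """Hypothetical wrapper: let the running Python evaluate the literal."""
    body = to_text(body, encoding)
    # Assumption: Python 3, where every str is Unicode, 'u' is redundant and
    # 'ur' is rejected, so only 'r' is passed on.  Limitations: Python 2's
    # ur"" still interpreted \uXXXX, and its plain strings did not interpret
    # \N{...}; neither behaviour is reproduced by this mapping.
    py_prefix = "r" if "r" in prefix.lower() else ""
    return ast.literal_eval(py_prefix + delim + body + delim)
```

The string type tracked by the lexer still has to be translated by hand,
and the result may differ from what the original Python 2 rules would
give.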

On Tue, Feb 22, 2011 at 11:01 AM, Herbert J. Bernstein
<yaya@bernstein-plus-sons.com> wrote:
> Dear Simon,
>
>   Thank you for your substantive response.  I grant that there
> is no way to know how many crystallographers use any particular
> programming language directly, but those who have to implement
> the software needed to make CIF2 and DDLm a reality are very
> likely to be reasonably familiar with Python, and the Python
> GUI, IDLE, provides a very convenient test environment to
> see what some particular quoted string unwinds to.
>
>   I think John W.'s point deserves particularly careful
> consideration.  With what is going on right now in the U.S.
> Congress those of us in the US at least are likely to be
> very short of funds and resources that require funds for the
> next few years.  Anything we do to leverage existing technologies
> helps in surviving a very difficult period for doing any science
> at all.  That applies both to CIF developers and to users.
> We need to make things as easy and inexpensive as possible
> consistent with getting done what we need to get done.
>
>   That is why I think your F-type proposal is a reasonable
> fall-back if it is otherwise impossible to get agreement
> on using Python for now.  It is not terribly inconsistent
> with Python and at least leverages a portion of the CIF1.1
> software effort.  It well may end up as the final compromise,
> but the current budget crunch in the US makes Ralf's proposal,
> which, I believe, will require less effort in software development,
> more attractive to me.
>
>   I, too, would prefer that we had opted for a lexical definition
> for CIF2 that did not require any reformatting of strings.
> Fortunately we now seem to have agreed to continue support
> for CIF1.1 DDL1 and DDL2 data files, so the "maximally
> disruptive" approach adopted for CIF2 is less problematic
> than it might otherwise have been.  Having chosen to require
> the reformatting of CIF1.1 strings for them to be acceptable
> as CIF2, the question becomes one of how to deal with the
> difficult cases.  Having some form of the triple quotes
> allows those difficult cases to be handled easily, and using the
> well-documented and software-supported Python approach Ralf
> suggested will, I believe, minimize the overall level of confusion
> in this process.
>
>   In any case, thank you very much for clarifying why you prefer
> proposal F to P.
>
>
>   Regards,
>     Herbert
>
> P.S.  If the current U.S. House budget resolution goes through,
> we are not talking about delays in new grants, we are talking
> about major rescissions in current grants, major layoffs at
> synchrotrons and labs, and an end to scientific careers for
> many young people.  (The proposed cuts for the U.S. Department
> of Energy Office of Science are about 20% from current spending
> levels for the entire year, coming 6 months into the year.  That
> means effective budget cuts of 30 - 40%). Let us hope that it
> does not happen, but anything we can do to conserve resources
> in infrastructure efforts would be the right thing to do.
>
> Yes, I am being alarmist.  There are times when being alarmist
> is the right thing to do.  I think that this is one of those
> times.
>
> At 10:54 PM +0000 2/21/11, SIMON WESTRIP wrote:
>>Attempting to address Herbert's issues:
>>
>>"such informal
>>descriptions are never as reliable as an actual implementation,
>>in particular one that's been around for many years and is used
>>by millions of people."  (Ralf)
>>
>>What proportion of those millions of people are regular CIF users?
>>(By 'user' I'm talking about end-users rather than programmers.)
>>
>>   "meaningful adoption of DDLm/CIF2 will require embracing
>>and leveraging existing technologies as much as possible." (John W.)
>>
>>True enough, but I'm not convinced that adoption of one programming
>>language's syntax for
>>just one means of representing a CIF data value is going to make
>>much difference
>>(a python programmer will still have to read values delimited by the
>>other tokens).
>>
>>"I find it [counter-intuitive] and unproductive to adopt something
>>that looks very much like the python treble quoted
>>string but which follows confusingly different rules." (HJB)
>>
>>As a CIF user familiar with CIF1, the F-type proposal is so close to the
>>existing line-folding semantics that I doubt it will be any more
>>confusing than
>>that protocol (which I suspect many users are unaware of).
>>More confusing (counter-intuitive) is the fact that by using
>>treble-quoted delimiters, the entire data value may have to be reformatted.
>>
>>Cheers
>>
>>Simon
>>
>>
>>
>>From: Herbert J. Bernstein <yaya@bernstein-plus-sons.com>
>>To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>>Sent: Monday, 21 February, 2011 22:11:20
>>Subject: Re: [ddlm-group] Vote on moving elide discussion to COMCIFS. .. .
>>
>>Dear John B.,
>>
>>   Thank you, that was very helpful.  To summarize those messages,
>>a majority on COMCIFS made a proposal to make the treble-quoted
>>strings agree with those of Python.  The reasons given were:
>>
>>"such informal
>>descriptions are never as reliable as an actual implementation,
>>in particular one that's been around for many years and is used
>>by millions of people."  (Ralf)
>>
>>   "meaningful adoption of DDLm/CIF2 will require embracing
>>and leveraging existing technologies as much as possible." (John W.)
>>
>>"I find it [counter-intuitive] and unproductive to adopt something
>>that looks very much like the python treble quoted
>>string but which follows confusingly different rules." (HJB)
>>
>>The responses you cite did not seem to address those issues.  Was
>>there a discussion on those issues that I missed?
>>
>>Regards,
>>     Herbert
>>
>>
>>
>>
>>At 3:32 PM -0600 2/21/11, Bollinger, John C wrote:
>>>Dear Herbert,
>>>
>>>On Monday, February 21, 2011 2:35 PM, you wrote:
>>>>     Other than my own messages, could you point me to where there
>>>>was a discussion of the actual proposal Ralf made, rather than
>>>>of variations and interpretations, but of the actual wording
>>>>change Ralf proposed for the CIF2 document?  I cannot seem
>>>>to find that.  That wording seemed/seems pretty sensible to
>>>>me.
>>>
>>>For reference, the message to the COMCIFS list in which Ralf
>>>proposed his wording change is archived here:
>>>http://www.iucr.org/__data/iucr/lists/comcifs-l/msg00500.html
>>>
>>>Some messages on the DDLm list, other than your own, in which Ralf's
>>>proposal is directly discussed include these:
>>>
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00899.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00901.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00904.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00906.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00921.html
>>>
>>>Some of those also discuss alternatives, but all of them discuss
>>>Ralf's proposal, a.k.a. proposal P.  I probably missed some, and of
>>>course your own comments in favor of proposal P are not represented.
>>>
>>>Moreover, it distorts the (meta-)discussion to ignore commentary
>>>about alternative proposals.  The existence and characteristics of
>>>alternatives to Ralf's proposal are relevant to any decision about
>>>it.  That the discussion shifted to focusing on alternatives is
>>>natural given that most participants in the discussion disfavored
>>>proposal P.
>>>
>>>I hope this helps.
>>>
>>>
>>>Regards,
>>>
>>>John
>>>
>>>--
>>>John C. Bollinger, Ph.D.
>>>Department of Structural Biology
>>>St. Jude Children's Research Hospital
>>>
>>>
>>>
>>>Email Disclaimer:
>>>http://www.stjude.org/emaildisclaimer
>>>
>>>_______________________________________________
>>>ddlm-group mailing list
>>>ddlm-group@iucr.org
>>>http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>>--
>>=====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>         Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>>=====================================================
>>
>>
>
>
> --
> =====================================================
>  Herbert J. Bernstein, Professor of Computer Science
>    Dowling College, Kramer Science Center, KSC 121
>         Idle Hour Blvd, Oakdale, NY, 11769
>
>                  +1-631-244-3035
>                  yaya@dowling.edu
> =====================================================
>



-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148


International Union of Crystallography
