Discussion List Archives


Re: [ddlm-group] Vote on moving elide discussion to COMCIFS. .. .

Dear James,

   Thank you.  That was very helpful.  I, for one, find the Python
treble quote clearer than the F proposal, and that comes from personal
experience with having done several implementations of the CIF
line folding protocol upon which proposal F is based.  It is not
impossibly difficult, but each time I have done it, whether in
C or Fortran, it has taken me a lot more than 15 minutes to get right,
and I am not certain I have the code for the interaction among
the bracketed constructs, strings and elides done right yet.
For me it would be a relief to lift the Python lexical code,
even if I have to do some C to C or C to Fortran conversion.  Clearly
your experiences with this type of code are different from mine.

   I find John W.'s position very clear.  He seems to be trying to
minimize the trauma of the DDLm and CIF2 conversions for the PDB, just as
I am trying to minimize the trauma for imgCIF.  Both minimizing changes
and leveraging existing technologies help in achieving those goals.
That there is some tension between those constraints is natural.
That does not mean they should not be weighed in making decisions.
As in most things, it is a matter of achieving the right balance.

   Now that Fortran has a well-defined C-interface, I am strongly
tempted to use C-code for some of the trickier parts of CIF2, but
I won't know if I will need to yield to that temptation until
we have a solid CIF2 spec and I see how far I get into the
next round of changes to CIFtbx sticking to pure Fortran.  That
being said, I don't see any special problems in Fortran in handling
the character elides and UTF-8 strings from the Python treble quote
definition that I have not already handled for the line-folding
protocol, and I would really appreciate being able to test my
code against IDLE's interpretation of the same strings.

   In any case, if Simon's proposal is adopted, I agree that it would
be helpful to clearly inform people of the differences between
this treble quote spec and the one in Python.

   Now for your two examples of embedded elides of quotes:

<start> I have no idea what the last characters of this string are\"<finish>

is, internally, as a C-string

I have no idea what the last characters of this string are"\0

<start> Does this string have two\""" or three internal quotes?<finish>

is, internally, as a C-string

Does this string have two""" or three internal quotes?\0

I settled that by simply cranking up IDLE and doing:

    >>> print """I have no idea what the last characters of this string are\""""
    I have no idea what the last characters of this string are"
    >>> print """Does this string have two\""" or three internal quotes?"""
    Does this string have two""" or three internal quotes?

As you well know, having IDLE around is a big help.
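
   For anyone repeating this check under Python 3, where print is a
function, the session above can be sketched as a small script.  The
variable names are illustrative assumptions, and the raw-string case
is added only to show the other reading James asks about, in which
the backslash is retained; it is not part of either proposal's text.

```python
# Python 3 re-check of the two example strings discussed above.
# Assumption: variable names are illustrative and appear in no proposal.

# Ordinary (non-raw) treble-quoted literals: backslash-quote becomes a
# bare double quote, so no backslash survives in the resulting value.
cooked_one = """I have no idea what the last characters of this string are\""""
cooked_two = """Does this string have two\""" or three internal quotes?"""

print(cooked_one)
print(cooked_two)

# Raw treble-quoted literal: the backslash still stops the quote from
# terminating the string, but the backslash itself is kept in the value.
raw_one = r"""I have no idea what the last characters of this string are\""""

print(raw_one)
```

   Comparing the last line of output with the first shows that the
presence or absence of the backslash depends on whether the literal
was raw, which is the distinction an application would need to know.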

   Thank you again for taking the time to clarify your position
on Ralf's proposal.  I think I now understand why you prefer Simon's
proposal.

   Regards,
     Herbert





At 12:11 PM +1100 2/22/11, James Hester wrote:
>Herbert: I've indicated below where I think you will find discussion
>of the points you raise.
>
>On Tue, Feb 22, 2011 at 9:11 AM, Herbert J. Bernstein
><yaya@bernstein-plus-sons.com> wrote:
>>  Dear John B.,
>>
>>    Thank you, that was very helpful.  To summarize those messages,
>>  a majority on COMCIFS made a proposal to make the treble-quoted
>>  strings agree with those of Python.  The reasons given were:
>>
>>  "such informal
>>  descriptions are never as reliable as an actual implementation,
>>  in particular one that's been around for many years and is used
>>  by millions of people."  (Ralf)
>
>The potential reliability or otherwise of Ralf's proposal compared to
>others on the table was addressed by myself at
>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00922.html
>
>In that and subsequent messages I pointed out that Proposal F is a
>whole lot simpler and therefore easier to understand and implement
>than the Python proposal, hence favouring the Python version on the
>basis of superior reliability and clearer specification is not
>reasonable.  Note my justification of 15 minutes programming time for
>Proposal F at http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00929.html
>
>>   "meaningful adoption of DDLm/CIF2 will require embracing
>>  and leveraging existing technologies as much as possible." (John W.)
>
>I too find John W's position difficult to decipher - on the one hand
>he favours minimal changes to CIF syntax, on the other leveraging
>existing technologies.  I would be cautious in counting on his support
>for or against the Python proposal.
>
>Note the following message from John B which touches on the
>meaningfulness or otherwise of "leveraging" existing technologies:
>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00906.html
>
>I also wrote on this issue, but can't find the message right now.  In
>a nutshell, "leveraging" seems to be shorthand for "improving
>compatibility and reducing specification and implementation time by
>reusing somebody else's work".  In this particular case (Proposal F),
>the specification has been done and the implementation will take 15
>minutes (a simple search and replace, no change to syntax as such).  I
>estimate that implementing the Python alternative would take at least
>5 times that long, in the favourable case that you have included the
>Python interpreter or parser in your code and consider that Ralf's
>original text is sufficiently precise.  If you can't plug your code
>into the Python interpreter or parser, you are forced to reimplement
>the Python semantics from scratch, including finding some way to
>access the Unicode name database on all distribution platforms, so I
>would estimate *at least* a day's work for every CIF parser that is
>written, not to mention the complexity is such that some mistakes may
>be made.  How long do you think it would take you to implement the
>Python proposal in Fortran, Herbert, assuming that you already have
>implemented triple-quoted strings?
>
>>  "I find it [counter-intuitive] and unproductive to adopt something
>>  that looks very much like the python treble quoted
>>  string but which follows confusingly different rules." (HJB)
>
>This hasn't been addressed in the discussion as far as I recall, and
>Ralf has said the same thing as you to me privately.  How much value
>you assign to this point really depends on how likely you think it is
>that confusion will arise in the real world of CIF programmers and
>users.  I personally think that the most important attribute of
>triple-quoted strings is that they go on until the next triple quote.
>What elides are defined is something I would tend to check in the
>relevant standard to find out.  In addition, the fact that CIF is a
>data file and Python is a programming language means that the contexts
>are sufficiently different to minimise confusion.  Did you know that
>"Unix" is also a tupperware-like product?
>
>We can add a big bold sentence to the Proposal F text to state "Unlike
>Python, no other elide sequences are defined" if that would allay your
>concerns on this point.
>
>>  The responses you cite did not seem to address those issues.  Was
>>  there a discussion on those issues that I missed?
>
>One technical issue with Proposal P that has not been resolved is how
>a CIF application is supposed to interpret the sequence
><backslash><double quote> when encountered in a string returned from
>the parser.  Is this sequence:
>(a) a terminator elide sequence that was left in a raw string, so
>corresponds to <double quote>?
>(b) something with meaning for the application so should be
><backslash><double quote>?
>
>Please therefore advise how a CIF application will disambiguate the
>following string content from a Proposal P parser:
>
><start> I have no idea what the last characters of this string are\"<finish>
><start> Does this string have two\""" or three internal quotes?<finish>
>
>James
>
>>  Regards,
>>     Herbert
>>
>>
>>
>>
>>  At 3:32 PM -0600 2/21/11, Bollinger, John C wrote:
>>>Dear Herbert,
>>>
>>>On Monday, February 21, 2011 2:35 PM, you wrote:
>>>>     Other than my own messages, could you point me to where there
>>>>was a discussion of the actual proposal Ralf made, rather than
>>>>of variations and interpretations, but of the actual wording
>>>>change Ralf proposed for the CIF2 document?  I cannot seem
>>>>to find that.  That wording seemed/seems pretty sensible to
>>>>me.
>>>
>>>For reference, the message to the COMCIFS list in which Ralf
>>>proposed his wording change is archived here:
>>>http://www.iucr.org/__data/iucr/lists/comcifs-l/msg00500.html
>>>
>>>Some messages on the DDLm list, other than your own, in which Ralf's
>>>proposal is directly discussed include these:
>>>
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00899.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00901.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00904.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00906.html
>>>http://www.iucr.org/__data/iucr/lists/ddlm-group/msg00921.html
>>>
>>>Some of those also discuss alternatives, but all of them discuss
>>>Ralf's proposal, a.k.a. proposal P.  I probably missed some, and of
>>>course your own comments in favor of proposal P are not represented.
>>>
>>>Moreover, it distorts the (meta-)discussion to ignore commentary
>>>about alternative proposals.  The existence and characteristics of
>>>alternatives to Ralf's proposal are relevant to any decision about
>>>it.  That the discussion shifted to focusing on alternatives is
>>>natural given that most participants in the discussion disfavored
>>>proposal P.
>>>
>>>I hope this helps.
>>>
>>>
>>>Regards,
>>>
>>>John
>>>
>>>--
>>>John C. Bollinger, Ph.D.
>>>Department of Structural Biology
>>>St. Jude Children's Research Hospital
>>>
>>>
>>>
>>>Email Disclaimer:  www.stjude.org/emaildisclaimer
>>>
>>>_______________________________________________
>>>ddlm-group mailing list
>>>ddlm-group@iucr.org
>>>http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>>
>>  --
>>  =====================================================
>>   Herbert J. Bernstein, Professor of Computer Science
>>     Dowling College, Kramer Science Center, KSC 121
>>          Idle Hour Blvd, Oakdale, NY, 11769
>>
>>                   +1-631-244-3035
>>                   yaya@dowling.edu
>>  =====================================================
>>  _______________________________________________
>>  ddlm-group mailing list
>>  ddlm-group@iucr.org
>>  http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>
>
>
>--
>T +61 (02) 9717 9907
>F +61 (02) 9717 3145
>M +61 (04) 0249 4148
>_______________________________________________
>ddlm-group mailing list
>ddlm-group@iucr.org
>http://scripts.iucr.org/mailman/listinfo/ddlm-group


-- 
=====================================================
  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769

                  +1-631-244-3035
                  yaya@dowling.edu
=====================================================
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group

International Union of Crystallography
