
Re: [ddlm-group] Technical issues with Proposal P

I have been following this discussion with interest and have learned much about things I never knew existed, such as cooked and raw strings. Since the relative merits of these are beyond my experience, I have stayed out of the discussion.

However, it seems that the ddlm-group consists primarily (or only?) of software developers, and a comment from a software non-developer might be in order. What gets lost in this discussion is the distinction between CIFs and CIF dictionaries. If triple-quoted strings are designed primarily for use in dictionaries, then the constituency of users is limited to those writing software and those writing dictionaries, the first being a small group and the second consisting of no more people than can be counted on the fingers of one hand. In this case the complexities of Proposal P would be manageable.

If the triple-quote delimiter is intended to be used in the CIFs themselves, the situation is very different. There are hundreds of users, most quite innocent of Python and its subtleties (and sometimes innocent of crystallography as well, but that is another matter). I assume that the additional functionality of triple quotes may be needed with TeX and Unicode text formats, though I have no experience with either. If CIFs containing triple quotes are to be written and read entirely by software, and these files are invisible to the user (as are, e.g., the files written and read by word processors), then P should present no problem. However, current practice often involves visually inspecting the CIF and editing it manually, e.g., to shorten lines longer than 80 characters, as required for submission to Acta Cryst. Since CIFs are currently easy to read, many people inspect the CIF directly to check the information in it, without the filtering that inevitably occurs when it is viewed with the aid of a CIF editor. I can see serious problems arising with Proposal P if the use of triple quotes becomes widely adopted in the CIFs themselves, unless we radically change the way in which CIFs are currently used. Such a change would definitely need to be discussed widely by COMCIFS.


Bollinger, John C wrote:
On Thursday, February 24, 2011 7:51 AM, Herbert J. Bernstein wrote:

  The Python cooked strings are something many people are familiar with.
That is indisputable, but not directly relevant. What matters here is how familiar CIF stakeholders are with Python cooked strings. I daresay that developers as a group are far more familiar with them than general users are, but I have no basis for judging what proportion of either subgroup has any familiarity whatever. However, over the past decade or so I have discovered that, personally, I tend to *over*estimate crystallographers' technical proficiency. Certainly it is not on average what it once was.

Any use of the treble quote is something new to CIF, with implications for both users and developers.
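For readers who, like me, are new to the cooked/raw distinction, a small illustration may help. It is a sketch only, and it assumes that Python's escape rules would be carried over unchanged into CIF triple-quoted values (an assumption about Proposal P, not a quotation of it). CIF 1.1 text already uses backslash codes such as \a for a Greek alpha; a cooked Python literal reinterprets that sequence, while a raw one keeps it verbatim:

```python
# CIF 1.1 markup writes a Greek alpha as the two characters "\a".
cif_text = "\\a-helix"  # backslash, 'a', then '-helix': 8 characters

# A cooked Python triple-quoted literal interprets backslash escapes,
# so "\a" becomes the single ASCII BEL control character (0x07).
cooked = """\a-helix"""

# A raw literal (r prefix) keeps the backslash and the 'a' as written.
raw = r"""\a-helix"""

print(len(cooked), cooked == "\x07-helix")  # 7 True
print(len(raw), raw == cif_text)            # 8 True
```

The same eight characters in the file thus mean two different things depending on a one-letter prefix, which is exactly the kind of subtlety a user reading the CIF by eye would not expect.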

 Use of the straight Python versions should reduce the learning curve for both communities and the cost of converting CIF 1.1 data to CIF 2.0.
I see little basis for that evaluation.  Use of the straight Python version would reduce the learning curve for those developers and users who are already proficient (not merely familiar) with Python, but it would increase the curve for everyone else.  As long as we're pulling estimates out of the air, I say that on average, proposal P will increase the learning curve significantly.

Proposal P does not change the difficulty of data conversion in the general sense.  ALL existing well-formed CIF 1.1 delimited data values can be converted to CIF 2.0 by expressing them in semicolon-delimited form.  Existing multi-line values must already be in that form, and require no changes.  To the extent that it is desired specifically to convert CIF 1.1 values to triple-quoted CIF 2.0 form, proposal P will require more changes to lexical values than any other proposal on the table.  It is thus extremely generous even to attribute to it parity with the other alternatives in this regard.
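The semicolon route is mechanical. Here is a minimal Python sketch of it; the helper names are hypothetical, and it assumes the input is a single lexical token exactly as a CIF 1.1 lexer would return it (bare, or enclosed in matching ' or " delimiters), so that it contains no line terminators:

```python
def unquote_cif11(token: str) -> str:
    """Hypothetical helper: strip CIF 1.1 quote delimiters from one token.

    In CIF 1.1 a quote character is a delimiter only when followed by
    whitespace, so for a complete quoted token the delimiters are simply
    the first and last characters (e.g. 'it's' has the value it's).
    """
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return token[1:-1]
    return token  # bare (unquoted) value: use as-is


def to_semicolon_field(value: str) -> str:
    """Hypothetical helper: express a single-line value as a text field.

    The field opens with ';' in column 1 and the value follows on the same
    line; the line break before the closing ';' belongs to the delimiter,
    not to the value, so the value is reproduced exactly.
    """
    if "\n" in value or "\r" in value:
        raise ValueError("expected a single-line (delimited) value")
    return ";" + value + "\n;"


print(to_semicolon_field(unquote_cif11("'it's a test'")))
```

Because a single-line value can never contain a line beginning with a semicolon, no value needs any further escaping, which is the point being made above.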

 I don't deny that there can be better ways to do the same thing. This reminds me of when IBM came up with a better keyboard for computers, shifting a few keys. It drove everybody nuts, not because there was anything wrong with it, but because it was just different enough to slow down typing and increase the error rate. Somebody totally new to typing on a computer keyboard had no problem, but it certainly was not worth the costs involved for people who had established habits.
I accept that adopting Python triple-quote syntax wholesale would be of some benefit to some stakeholders.  It would be an obstacle to many others, however, and a substantial one to developers in particular.  It is and always has been a judgment call, and my judgment remains that the benefits of proposal P would not come close to balancing its costs.

John C. Bollinger, Ph.D.
Department of Structural Biology
St. Jude Children's Research Hospital

ddlm-group mailing list

I. David Brown
Professor Emeritus
Brockhouse Institute for Materials Research, McMaster University
King St. W, Hamilton, Ontario L8S 4M1, Canada
Tel (work): +905 525 9140 x 24710
Fax: +905 521 2773

