Re: [ddlm-group] Searching for a compromise on eliding

If at all possible, I would like to try at least a little more to
achieve a consensus.  I am sorry that James' P-prime does not
seem to reach that goal.  I just suggested that COMCIFS have a
Skype meeting.  Perhaps this group should have one as well.  Maybe
we can find common ground verbally and visually -- the latest
version of Skype seems to be capable of video conference calls.


At 9:52 PM +0000 2/26/11, SIMON WESTRIP wrote:
>For what it's worth, I summarise my position in the hope that this
>issue will shortly return to COMCIFS.
>1) I see no reason to abandon use of triple-quotes:
>Python may be alone in its use of triple quotes (?), but it also uses
>single quotes, and in a markedly different way to many other
>programming languages - so why can't CIF use triple quotes in a
>different way to Python? Alternatives may be found, but I suspect it
>will prove difficult to agree on any of them (of the suggestions so
>far, John B's is 'visually' confusing and Brian's is questionable -
>a double quote could just delimit a null or empty string?).
>2) For various reasons, I think the P-type proposals are
>inappropriate for CIF:
>I see little point in repeating my arguments and on the whole I
>support John B's and James's arguments.
>3) I favour James's F' proposal over my initial F version:
>My version was intended to be minimalist but was formulated in terms
>that might fit the Python model (common escape sequences etc.),
>but, among other considerations, James's version is all that is necessary.
>In conclusion, I would like this group to return an F' proposal to
>COMCIFS for their consideration.
>Of the active participants in this group, Herbert seems to be in a
>minority in his wholehearted support for the
>adoption of Python, but as a voting member of COMCIFS, Herbert can
>obviously influence matters in that forum.
>
>From: Brian McMahon <bm@iucr.org>
>To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>Sent: Saturday, 26 February, 2011 12:55:48
>Subject: Re: [ddlm-group] Searching for a compromise on eliding
>Dear Colleagues
>I have been out of the office all week and largely away from email. I
>apologise for not saying so when I last posted, but I had at the
>time anticipated being able to keep in touch with this conversation.
>On technical grounds, I favour
>F'    - requires least handling of special escapes
>G    - formally equivalent to F'
>You will recognise the first line as a verbatim extract from my
>posting of 18 January in response to the first call for a vote in this
>matter. I have not seen any new technical considerations to change
>my preference for the economy of such a specification. Proposal G has
>two possible disadvantages: the need to construct novel (but ideally
>"natural") new delimiters - would paired double quotes "" suffice?;
>and a break with the existing implementation of """ in Nick's initial
>implementation. I know that we have ascertained that we are not bound
>to retain specific novel syntactic features of that implementation,
>but I see no technical advantage in moving away from it.
>I disfavour Option Q because it introduces what I consider an
>unnecessary domain-specific interpretation of character strings.
>The "domains" involved are not mutually exclusive: in IUCr journals
>we would anticipate handling both core CIFs and mmCIFs, while
>applications such as SHELX work with both small and
>macromolecules. However, that's not strictly a technical
>I found Ralf's intervention of 10 January very persuasive:
>>  In my observation any language that persisted long term has a feature to
>>  escape the closing quote token. Therefore I conjecture it is a small but
>>  vital feature.
>This prompted me to revisit the need for a delimiter escape mechanism
>that would then allow encapsulation of arbitrarily complex strings,
>and thus, for example, remove the need for a new string concatenation
>operator (a requested feature over which I was still rather unhappy).
>In reviewing the discussions, I did note that Ralf's point had already
>been made (by Nick, I think), but its importance was unfortunately
>not appreciated at that time.
>Proposals F'/G therefore address this technical "vital feature" to
>my satisfaction.
>The rest of the discussions seem to pivot around psychology more than
>technical requirements. In my experience, analysis of psychology
>provides useful insight into understanding a historical sequence of
>events, but is rarely successful at predicting the future.
>I prefer, as I stated before, to consider policy based on such
>psychological or social imperatives in the COMCIFS forum, but if it
>helps to indicate here my opinions on the concerns raised, I would say
>the following:
>>  (i) Behaviour of triple-quoted strings will be too confusing unless
>>  Python behaviour is followed (Ralf)
>There is perhaps some opportunity for confusion; but in other areas
>there are similar opportunities: shell file globbing shares some
>syntactic features with regular expression processing. People who
>really work with such systems manage to overcome the confusion - more
>easily when there is a real difference in purpose (filename globbing
>is indeed distinct from regexp processing, just as string delimiting
>in a data file is different from string processing within an
>interpreted program). As John B has pointed out, adoption of a
>proposal P or close variants also has scope for confusion if a user is
>not completely familiar with the version of Python chosen as the
>underlying paradigm.
>>  (ii) There is considerable criticism of CIF in the macromolecular
>>  community because of idiosyncratic behaviour, particularly concerning
>>  quoting.  We should therefore stick to accepted standards as much as
>>  possible (John W)
>John W (and Herbert) are undoubtedly correct in identifying a distaste
>within the macromolecular community for the idiosyncratic CIF formalism.
>But I believe the second sentence is a non sequitur: I am not convinced
>that adoption of a particular syntactic feature from Python is all
>that is needed to persuade that community to embrace CIF with open
>arms. As has been argued several times on this list, the technical
>requirements on a data input parser for CIF are not very great (and by
>opting for "economical" schemes such as James's proposal F' we would tend
>towards minimising them). If the programmers within the macromolecular
>community - many of whom I know to be extraordinarily competent and
>intelligent - do not build CIF applications, I am sure it is because
>they do not see sufficient scientific value in doing so, rather than
>that the complexity or awkwardness of the file format defeats them.
>Or at least I shall persist in believing that.
>Let us take this element of the discussion onto the COMCIFS list,
>preferably on the back of the revised proposal that I encourage James
>to present from the ddlm-group.
>Back to the technical considerations which I believe this group
>should focus on. I consider the most desirable outcome to be a
>clear and clean specification. Proposals F'/G will achieve that
>elegantly. Proposal P has the potential to achieve that (though one
>does need to specify the version of Python and perhaps reconsider the
>handling of Unicode characters), although I still feel that as
>a specification it carries too high a burden for compliance from
>applications developers working outside of a Python framework. I
>would strongly discourage attempts at a compromise that seeks to
>provide a technical solution based on some minimisation of the
>root mean square unhappiness of the members of this group, but that
>ends up with an unstructured mish-mash of features from different
>On Fri, Feb 25, 2011 at 01:50:59PM +1100, James Hester wrote:
>>  Dear DDLm-group,
>>  I think we have all had a decent chance to argue our case for
>>  Proposals P, F and F'.  I have also been in small side discussions
>>  with Ralf and John W.  Their points of view can be summarised as
>>  follows:
>>  (i) Behaviour of triple-quoted strings will be too confusing unless
>>  Python behaviour is followed (Ralf)
>>  (ii) There is considerable criticism of CIF in the macromolecular
>>  community because of idiosyncratic behaviour, particularly concerning
>>  quoting.  We should therefore stick to accepted standards as much as
>>  possible (John W)
>>  For John W and Ralf these points outweigh any of the disadvantages of
>>  Proposal P, and so Proposal P remains their first choice.  Proposal P
>>  is therefore the first choice of 3 out of 5 COMCIFS voters, and the
>>  last choice of the other two (I would rank it worse than doing
>>  nothing, actually).  I note that non-voting members are uniformly
>>  opposed to Proposal P.
>>  I therefore want to try to seek some common middle ground in the hope
>>  that I can find a proposal that could be at least as acceptable as
>>  Proposal P to Ralf and/or Herbert and/or John W.
>>  Consider the following four new proposals - P-prime, Q, G and null:
>>  * Proposal P-prime: triple-quoted strings are treated as for Python
>>  2.7.  No Unicode or raw strings are defined (i.e. no strings starting
>>  u""" or r""").
>>  I interpret John W and Ralf's position to be that they would be able
>>  to support this proposal as the preferred choice, as our syntax would
>>  still be entirely consistent with Python.  This proposal is a
>>  considerable improvement on Proposal P, because the dangers of raw
>>  strings are taken out of the equation, and the Unicode database is no
>>  longer a dependency.  We are still left with a whole bunch of (frankly
>>  pointless) elides, leading to Proposal Q:
>>  * Proposal Q: As for Proposal P-prime, with the following changes:
>>  (1) Only <backslash><delimiter> and <backslash><backslash> when it
>>  precedes <backslash><delimiter> are recognised escape sequences at the
>>  syntactical level
>>  (2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
>>  which the remaining escape sequences have the meaning assigned to them
>>  by the Python 2.7 standard.  mmCIF and related domains can standardise
>>  their definitions on this string type and derivatives, making the
>>  above division between syntax and dictionary invisible to users and
>>  programmers in their domain.
>>  * Proposal G: Proposal F', but with a different delimiter
>>  Ralf has indicated that he actually thinks Proposal F' is best, but
>>  only if the delimiters are not going to be confused with Python
>>  delimiters.  I interpret John W's position to be that he would not
>>  support such a change in delimiters as that would simply make CIF even
>>  more idiosyncratic.  Anyway, any such replacement delimiter would need
>>  to be multi-character, easy to type and unlikely to occur as the first
>>  characters in CIF1 datavalues.  We would also need to reduce the
>>  character set of non-delimited CIF2 strings to exclude any such
>>  delimiters.  Ideas?
>>  * Null proposal: do nothing as we can't agree
>>  I think I could support Proposal Q as an acceptable fallback from F',
>>  and if somebody can find sensible delimiters for Proposal G, that works
>>  for me as well.  The preferred treatment for backslash-rich text for
>>  Proposals P, P' and Q will necessarily be semicolon-delimited strings.
>>  James.
>>  --
>>  T +61 (02) 9717 9907
>>  F +61 (02) 9717 3145
>>  M +61 (04) 0249 4148
>>  _______________________________________________
>>  ddlm-group mailing list
>>  ddlm-group@iucr.org

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769
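
Illustrative note on rule (1) of Proposal Q as quoted above: the rule
recognises only two escape sequences at the syntactical level. The sketch
below is one possible literal reading of that rule, not a definitive
implementation and not something taken from the thread. The function name
unelide_q, the choice of """ as the delimiter, and the left-to-right
scanning order are all assumptions made for illustration. Every other
backslash sequence (for example backslash followed by n) is passed through
untouched, since under Proposal Q its meaning would be left to the
dictionary level.

```python
BACKSLASH = "\\"


def unelide_q(body, delimiter='"""'):
    """Apply only the two syntax-level escapes of Proposal Q, rule (1).

    Hypothetical helper: `body` is the text between the delimiters, and
    `delimiter` is assumed to be the triple double-quote.
    """
    escaped_delim = BACKSLASH + delimiter
    out = []
    i = 0
    while i < len(body):
        # <backslash><backslash> counts as an escape only when it
        # directly precedes <backslash><delimiter>.
        if body.startswith(BACKSLASH + BACKSLASH + escaped_delim, i):
            out.append(BACKSLASH)
            i += 2
        # <backslash><delimiter> yields a literal delimiter.
        elif body.startswith(escaped_delim, i):
            out.append(delimiter)
            i += len(escaped_delim)
        # Anything else, including other backslash sequences, is literal.
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Under this reading a parser needs no table of escape characters at all; it
only has to look ahead for the delimiter, which is the kind of economy
argued for in the quoted messages.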
