
Re: [ddlm-group] Searching for a compromise on eliding

If at all possible, I would like to try at least a little more to
achieve a consensus.  I am sorry that James' P-prime does not
seem to reach that goal.  I just suggested that COMCIFS have a
Skype meeting.  Perhaps this group should have one, as well.  Maybe
we can find common ground verbally and visually -- the latest
version of Skype seems to be capable of video conference calls.


At 9:52 PM +0000 2/26/11, SIMON WESTRIP wrote:
>For what it's worth, I summarise my position in the hope that this 
>issue will shortly return to COMCIFS.
>1) I see no reason to abandon use of triple-quotes:
>Python may be alone in its use of triple quotes (?), but it also 
>uses single quotes,
>and in a markedly different way to many other programming languages 
>- so why can't CIF use
>triple-quotes in a different way to Python? Alternatives may be 
>found, but I suspect it will prove
>difficult to agree on any of them (of suggestions so far, John B's 
>is 'visually' confusing and Brian's
>is questionable - a double quote could just delimit a null or empty string?).
>2) For various reasons, I think the P-type proposals are 
>inappropriate for CIF:
>I see little point in repeating my arguments and on the whole I 
>support John B's and James's arguments.
>3) I favour James's F' proposal to my initial F version:
>My version was intended to be minimalist but was formulated in terms 
>that might fit the Python model (common escape sequences etc.),
>but, among other considerations, James's version is all that is necessary.
>In conclusion, I would like this group to return an F' proposal to 
>COMCIFS for their consideration.
>Of the active participants in this group, Herbert seems to be in a 
>minority in his wholehearted support for the
>adoption of Python, but as a voting member of COMCIFS, Herbert can 
>obviously influence matters in that forum.
>From: Brian McMahon <bm@iucr.org>
>To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
>Sent: Saturday, 26 February, 2011 12:55:48
>Subject: Re: [ddlm-group] Searching for a compromise on eliding
>Dear Colleagues
>I have been out of the office all week and largely away from email. I
>apologise for not saying so when I last posted, but I had at the
>time anticipated being able to keep in touch with this conversation.
>On technical grounds, I favour
>F'    - requires least handling of special escapes
>G    - formally equivalent to F'
>You will recognise the first line as a verbatim extract from my
>posting of 18 January in response to the first call for a vote in this
>matter. I have not seen any new technical considerations to change
>my preference for the economy of such a specification. Proposal G has
>two possible disadvantages: the need to construct novel (but ideally
>"natural") new delimiters - would paired double quotes "" suffice?;
>and a break with the existing implementation of """ in Nick's initial
>implementation. I know that we have ascertained that we are not bound
>to retain specific novel syntactic features of that implementation,
>but I see no technical advantage in moving away from it.
>I disfavour Option Q because it introduces what I consider an
>unnecessary domain-specific interpretation of character strings.
>The "domains" involved are not mutually exclusive: in IUCr journals
>we would anticipate handling both core CIFs and mmCIFs, while
>applications such as SHELX work with both small and
>macromolecules. However, that's not strictly a technical
>I found Ralf's intervention of 10 January very persuasive:
>>  In my observation any language that persisted long term has a feature to
>>  escape the closing quote token. Therefore I conjecture it is a small but
>>  vital feature.
>This prompted me to revisit the need for a delimiter escape mechanism
>that would then allow encapsulation of arbitrarily complex strings,
>and thus, for example, remove the need for a new string concatenation
>operator (a requested feature over which I was still rather unhappy).
>In reviewing the discussions, I did note that Ralf's point had already
>been made (by Nick, I think), but its importance was unfortunately
>not appreciated at that time.
>Proposals F'/G therefore address this technical "vital feature" to
>my satisfaction.
>The rest of the discussions seem to pivot around psychology more than
>technical requirements. In my experience, analysis of psychology
>provides useful insight into understanding a historical sequence of
>events, but is rarely successful at predicting the future.
>I prefer, as I stated before, to consider policy based on such
>psychological or social imperatives in the COMCIFS forum, but if it
>helps to indicate here my opinions on the concerns raised, I would say
>the following:
>>  (i) Behaviour of triple-quoted strings will be too confusing unless
>>  Python behaviour is followed (Ralf)
>There is perhaps some opportunity for confusion; but in other areas
>there are similar opportunities: shell file globbing shares some
>syntactic features with regular expression processing. People who
>really work with such systems manage to overcome the confusion - more
>easily when there is a real difference in purpose (filename globbing
>is indeed distinct from regexp processing, just as string delimiting
>in a data file is different from string processing within an
>interpreted program). As John B has pointed out, adoption of a
>proposal P or close variants also has scope for confusion if a user is
>not completely familiar with the version of Python chosen as the
>underlying paradigm.
>>  (ii) There is considerable criticism of CIF in the macromolecular
>>  community because of idiosyncratic behaviour, particularly concerning
>>  quoting.  We should therefore stick to accepted standards as much as
>>  possible (John W)
>John W (and Herbert) are undoubtedly correct in identifying a distaste
>within the macromolecular community for the idiosyncratic CIF formalism.
>But I believe the second sentence is a non sequitur: I am not convinced
>that adoption of a particular syntactic feature from Python is all
>that is needed to persuade that community to embrace CIF with open
>arms. As has been argued several times on this list, the technical
>requirements on a data input parser for CIF are not very great (and by
>opting for "economical" schemes such as James's proposal F' we would tend
>towards minimising them). If the programmers within the macromolecular
>community - many of whom I know to be extraordinarily competent and
>intelligent - do not build CIF applications, I am sure it is because
>they do not see sufficient scientific value in doing so, rather than
>that the complexity or awkwardness of the file format defeats them.
>Or at least I shall persist in believing that.
>Let us take this element of the discussion onto the COMCIFS list,
>preferably on the back of the revised proposal that I encourage James
>to present from the ddlm-group.
>Back to the technical considerations which I believe this group
>should focus on. I consider the most desirable outcome to be a
>clear and clean specification. Proposals F'/G will achieve that
>elegantly. Proposal P has the potential to achieve that (though one
>does need to specify the version of Python and perhaps reconsider the
>handling of Unicode characters), although I still feel that as
>a specification it carries too high a burden for compliance from
>applications developers working outside of a Python framework. I
>would strongly discourage attempts at a compromise that seeks to
>provide a technical solution based on some minimisation of the
>root mean square unhappiness of the members of this group, but that
>ends up with an unstructured mish-mash of features from different
>On Fri, Feb 25, 2011 at 01:50:59PM +1100, James Hester wrote:
>>  Dear DDLm-group,
>>  I think we have all had a decent chance to argue our case for
>>  Proposals P, F and F'.  I have also been in small side discussions
>>  with Ralf and John W.  Their points of view can be summarised as
>>  follows:
>>  (i) Behaviour of triple-quoted strings will be too confusing unless
>>  Python behaviour is followed (Ralf)
>>  (ii) There is considerable criticism of CIF in the macromolecular
>>  community because of idiosyncratic behaviour, particularly concerning
>>  quoting.  We should therefore stick to accepted standards as much as
>>  possible (John W)
>>  For John W and Ralf these points outweigh any of the disadvantages of
>>  Proposal P, and so Proposal P remains their first choice.  Proposal P
>>  is therefore the first choice of 3 out of 5 COMCIFS voters, and the
>>  last choice of the other two (I would rank it worse than doing
>>  nothing, actually).  I note that non-voting members are uniformly
>>  opposed to Proposal P.
>>  I therefore want to try to seek some common middle ground in the hope
>>  that I can find a proposal that could be at least as acceptable as
>>  Proposal P to Ralf and/or Herbert and/or John W.
>>  Consider the following four new proposals - P-prime, Q, G and null:
>>  * Proposal P-prime: triple-quoted strings are treated as for Python
>>  2.7.  No Unicode or raw strings are defined (i.e. no strings starting
>>  u""" or r""").
>>  I interpret John W and Ralf's position to be that they would be able
>>  to support this proposal as the preferred choice, as our syntax would
>>  still be entirely consistent with Python.  This proposal is a
>>  considerable improvement on Proposal P, because the dangers of raw
>>  strings are taken out of the equation, and the Unicode database is no
>>  longer a dependency.  We are still left with a whole bunch of (frankly
>>  pointless) elides, leading to Proposal Q:
>>  * Proposal Q: As for Proposal P-prime, with the following changes:
>>  (1) Only <backslash><delimiter> and <backslash><backslash> when it
>>  precedes <backslash><delimiter> are recognised escape sequences at the
>>  syntactical level
>>  (2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
>>  which the remaining escape sequences have the meaning assigned to them
>>  by the Python 2.7 standard.  mmCIF and related domains can standardise
>>  their definitions on this string type and derivatives, making the
>>  above division between syntax and dictionary invisible to users and
>>  programmers in their domain.
>>  * Proposal G: Proposal F', but with a different delimiter
>>  Ralf has indicated that he actually thinks Proposal F' is best, but
>>  only if the delimiters are not going to be confused with Python
>>  delimiters.  I interpret John W's position to be that he would not
>>  support such a change in delimiters as that would simply make CIF even
>>  more idiosyncratic.  Anyway, any such replacement delimiter would need
>>  to be multi-character, easy to type and unlikely to occur as the first
>>  characters in CIF1 datavalues.  We would also need to reduce the
>>  character set of non-delimited CIF2 strings to exclude any such
>>  delimiters.  Ideas?
>>  * Null proposal: do nothing as we can't agree
>>  I think I could support Proposal Q as an acceptable fallback from F',
>>  and if somebody can find sensible delimiters for Proposal G that works
>>  for me as well.  The preferred treatment for backslash-rich text for
>>  Proposals P, P' and Q will necessarily be semicolon-delimited strings.
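
As an illustration only, rule (1) of Proposal Q above could be sketched
in Python roughly as follows.  This is one possible reading of the
wording, not an agreed specification.  The function name
unescape_syntactic and the default """ delimiter are assumptions made
for the example.  The function works on the text already found between
the delimiters and leaves every other backslash sequence (such as \n)
untouched, since under Proposal Q those would only acquire meaning at
the dictionary level.

```python
def unescape_syntactic(body: str, delimiter: str = '"""') -> str:
    """Apply only the two syntax-level escapes described in Proposal Q (1).

    Hypothetical helper: the name, signature and delimiter default are
    illustrative assumptions, not part of any CIF2 draft.
    """
    esc_delim = "\\" + delimiter        # <backslash><delimiter>
    esc_bs_delim = "\\\\" + esc_delim   # <backslash><backslash> before the above
    out = []
    i = 0
    while i < len(body):
        if body.startswith(esc_bs_delim, i):
            # An escaped backslash followed by an escaped delimiter.
            out.append("\\" + delimiter)
            i += len(esc_bs_delim)
        elif body.startswith(esc_delim, i):
            # An escaped delimiter on its own.
            out.append(delimiter)
            i += len(esc_delim)
        else:
            # Everything else, including \n or \t, passes through unchanged.
            out.append(body[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(unescape_syntactic(r'say \""" here'))   # delimiter becomes literal
    print(unescape_syntactic(r'C:\new\table'))    # no Python-style escapes applied
```

Under this reading a parser needs to know only about the backslash and
the delimiter; interpreting the remaining Python 2.7 escapes would be
left to software that understands a dictionary type such as "CText".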
>>  James.
>>  --
>>  T +61 (02) 9717 9907
>>  F +61 (02) 9717 3145
>>  M +61 (04) 0249 4148
>>  _______________________________________________
>>  ddlm-group mailing list
>>  ddlm-group@iucr.org

  Herbert J. Bernstein, Professor of Computer Science
    Dowling College, Kramer Science Center, KSC 121
         Idle Hour Blvd, Oakdale, NY, 11769
