Re: [ddlm-group] Searching for a compromise on eliding
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Searching for a compromise on eliding
- From: Brian McMahon <bm@iucr.org>
- Date: Sat, 26 Feb 2011 12:55:48 +0000
- In-Reply-To: <AANLkTi=bEDjCpJgyuB07q1FBFZjA_jbG=4jgLsXEvw4g@mail.gmail.com>
- References: <AANLkTi=bEDjCpJgyuB07q1FBFZjA_jbG=4jgLsXEvw4g@mail.gmail.com>
Dear Colleagues

I have been out of the office all week and largely away from email.
I apologise for not saying so when I last posted, but I had at the
time anticipated being able to keep in touch with this conversation.

On technical grounds, I favour

    F' - requires least handling of special escapes
    G  - formally equivalent to F'

You will recognise the first line as a verbatim extract from my
posting of 18 January in response to the first call for a vote in
this matter. I have not seen any new technical considerations to
change my preference for the economy of such a specification.

Proposal G has two possible disadvantages: the need to construct
novel (but ideally "natural") new delimiters - would paired double
quotes "" suffice?; and a break with the existing implementation of
""" in Nick's initial implementation. I know that we have ascertained
that we are not bound to retain specific novel syntactic features of
that implementation, but I see no technical advantage in moving away
from it.

I disfavour Option Q because it introduces what I consider an
unnecessary domain-specific interpretation of character strings. The
"domains" involved are not mutually exclusive: in IUCr journals we
would anticipate handling both core CIFs and mmCIFs, while
applications such as SHELX work with both small and macromolecules.
However, that's not strictly a technical consideration.

I found Ralf's intervention of 10 January very persuasive:

> In my observation any language that persisted long term has a feature to
> escape the closing quote token. Therefore I conjecture it is a small but
> vital feature.

This prompted me to revisit the need for a delimiter escape mechanism
that would then allow encapsulation of arbitrarily complex strings,
and thus, for example, remove the need for a new string concatenation
operator (a requested feature over which I was still rather unhappy).
In reviewing the discussions, I did note that Ralf's point had already
been made (by Nick, I think), but its importance was unfortunately not
appreciated at that time.

Proposals F'/G therefore address this technical "vital feature" to my
satisfaction.

===

The rest of the discussions seem to pivot around psychology more than
technical requirements. In my experience, analysis of psychology
provides useful insight into understanding a historical sequence of
events, but is rarely successful at predicting the future. I prefer,
as I stated before, to consider policy based on such psychological or
social imperatives in the COMCIFS forum, but if it helps to indicate
here my opinions on the concerns raised, I would say the following:

> (i) Behaviour of triple-quoted strings will be too confusing unless
> Python behaviour is followed (Ralf)

There is perhaps some opportunity for confusion; but in other areas
there are similar opportunities: shell file globbing shares some
syntactic features with regular expression processing. People who
really work with such systems manage to overcome the confusion - more
easily when there is a real difference in purpose (filename globbing
is indeed distinct from regexp processing, just as string delimiting
in a data file is different from string processing within an
interpreted program). As John B has pointed out, adoption of a
proposal P or close variants also has scope for confusion if a user is
not completely familiar with the version of Python chosen as the
underlying paradigm.

> (ii) There is considerable criticism of CIF in the macromolecular
> community because of idiosyncratic behaviour, particularly concerning
> quoting. We should therefore stick to accepted standards as much as
> possible (John W)

John W (and Herbert) are undoubtedly correct in identifying a distaste
within the macromolecular community for the idiosyncratic CIF formalism.
But I believe the second sentence is a non sequitur: I am not
convinced that adoption of a particular syntactic feature from Python
is all that is needed to persuade that community to embrace CIF with
open arms. As has been argued several times on this list, the
technical requirements on a data input parser for CIF are not very
great (and by opting for "economical" schemes such as James's proposal
F' we would tend towards minimising them). If the programmers within
the macromolecular community - many of whom I know to be
extraordinarily competent and intelligent - do not build CIF
applications, I am sure it is because they do not see sufficient
scientific value in doing so, rather than that the complexity or
awkwardness of the file format defeats them.

Or at least I shall persist in believing that. Let us take this
element of the discussion onto the COMCIFS list, preferably on the
back of the revised proposal that I encourage James to present from
the ddlm-group.

===

Back to the technical considerations which I believe this group should
focus on.

I consider the most desirable outcome to be a clear and clean
specification. Proposals F'/G will achieve that elegantly. Proposal P
has the potential to achieve that (though one does need to specify the
version of Python and perhaps reconsider the handling of Unicode
characters), although I still feel that as a specification it carries
too high a burden for compliance from applications developers working
outside of a Python framework.

I would strongly discourage attempts at a compromise that seeks to
provide a technical solution based on some minimisation of the root
mean square unhappiness of the members of this group, but that ends up
with an unstructured mish-mash of features from different proposals.

Regards
Brian

On Fri, Feb 25, 2011 at 01:50:59PM +1100, James Hester wrote:
> Dear DDLm-group,
>
> I think we have all had a decent chance to argue our case for
> Proposals P, F and F'. I have also been in small side discussions
> with Ralf and John W. Their points of view can be summarised as
> follows:
> (i) Behaviour of triple-quoted strings will be too confusing unless
> Python behaviour is followed (Ralf)
> (ii) There is considerable criticism of CIF in the macromolecular
> community because of idiosyncratic behaviour, particularly concerning
> quoting. We should therefore stick to accepted standards as much as
> possible (John W)
>
> For John W and Ralf these points outweigh any of the disadvantages of
> Proposal P, and so Proposal P remains their first choice. Proposal P
> is therefore the first choice of 3 out of 5 COMCIFS voters, and the
> last choice of the other two (I would rank it worse than doing
> nothing, actually). I note that non-voting members are uniformly
> opposed to Proposal P.
>
> I therefore want to try to seek some common middle ground in the hope
> that I can find a proposal that could be at least as acceptable as
> Proposal P to Ralf and/or Herbert and/or John W.
>
> Consider the following four new proposals - P-prime, Q, G and null:
>
> * Proposal P-prime: triple-quoted strings are treated as for Python
> 2.7. No Unicode or raw strings are defined (ie no strings starting
> u""" or r""").
>
> I interpret John W and Ralf's position to be that they would be able
> to support this proposal as the preferred choice, as our syntax would
> still be entirely consistent with Python. This proposal is a
> considerable improvement on Proposal P, because the dangers of raw
> strings are taken out of the equation, and the Unicode database is no
> longer a dependency. We are still left with a whole bunch of (frankly
> pointless) elides, leading to Proposal Q:
>
> * Proposal Q: As for Proposal P-prime, with the following changes:
> (1) Only <backslash><delimiter> and <backslash><backslash> when it
> precedes <backslash><delimiter> are recognised escape sequences at the
> syntactical level
> (2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
> which the remaining escape sequences have the meaning assigned to them
> by the Python 2.7 standard. mmCIF and related domains can standardise
> their definitions on this string type and derivatives, making the
> above division between syntax and dictionary invisible to users and
> programmers in their domain.
>
> * Proposal G: Proposal F', but with a different delimiter
>
> Ralf has indicated that he actually thinks Proposal F' is best, but
> only if the delimiters are not going to be confused with Python
> delimiters. I interpret John W's position to be that he would not
> support such a change in delimiters as that would simply make CIF even
> more idiosyncratic. Anyway, any such replacement delimiter would need
> to be multi-character, easy to type and unlikely to occur as the first
> characters in CIF1 datavalues. We would also need to reduce the
> characterset of non-delimited CIF2 strings to exclude any such
> delimiters. Ideas?
>
> * Null proposal: do nothing as we can't agree
>
> I think I could support Proposal Q as an acceptable fallback from F',
> and if somebody can find sensible delimiters for Proposal G that works
> for me as well. The preferred treatment for backslash rich text for
> Proposals P,P' and Q will necessarily be semicolon-delimited strings.
>
> James.
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- Follow-Ups:
- Re: [ddlm-group] Searching for a compromise on eliding (SIMON WESTRIP)
- References:
- [ddlm-group] Searching for a compromise on eliding (James Hester)
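
Rule (1) of Proposal Q, as quoted in the message above, can be read as a very small scanner. The sketch below is only one possible reading, not an agreed specification: it assumes that <delimiter> means the full triple-quote token """, that <backslash><delimiter> yields a literal delimiter, and that <backslash><backslash> yields a single backslash only when it sits directly before <backslash><delimiter>. The function name read_delimited is hypothetical, and Python is used purely for illustration.

```python
BACKSLASH = "\\"


def read_delimited(text, delim='"""'):
    """Read one delimited string from the start of `text`.

    Returns (value, index just past the closing delimiter).
    Assumed reading of Proposal Q rule (1): only \\<delim> and
    \\\\ directly before \\<delim> are treated as elides.
    """
    if not text.startswith(delim):
        raise ValueError("text does not start with the delimiter")

    escaped_delim = BACKSLASH + delim                   # \"""
    escaped_backslash = BACKSLASH * 2 + escaped_delim   # \\\"""

    out = []
    i = len(delim)
    while i < len(text):
        if text.startswith(escaped_backslash, i):
            # \\ counts as an elide only here: literal backslash + delimiter
            out.append(BACKSLASH + delim)
            i += len(escaped_backslash)
        elif text.startswith(escaped_delim, i):
            # elided delimiter: literal delimiter, string continues
            out.append(delim)
            i += len(escaped_delim)
        elif text.startswith(delim, i):
            # unescaped delimiter closes the string
            return "".join(out), i + len(delim)
        else:
            # every other character, including other backslashes, is literal
            out.append(text[i])
            i += 1
    raise ValueError("unterminated string")


raw = r'"""a\"""b""" trailing'
value, end = read_delimited(raw)
print(value)      # a"""b
print(raw[end:])  # " trailing" (leading space included)
```

Under this reading a sequence such as backslash-n passes through the syntactic level unchanged, which is where part (2) of Proposal Q would hand interpretation over to a dictionary-defined string type.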