Re: [ddlm-group] Searching for a compromise on eliding
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Searching for a compromise on eliding
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Fri, 25 Feb 2011 08:22:49 -0500 (EST)
- In-Reply-To: <324958.57741.qm@web87003.mail.ird.yahoo.com>
- References: <AANLkTi=bEDjCpJgyuB07q1FBFZjA_jbG=4jgLsXEvw4g@mail.gmail.com><324958.57741.qm@web87003.mail.ird.yahoo.com>
Dear Colleagues,

I support both of Simon's suggestions:

1. Add the elides of P-prime to the strings delimited by the single quote and the strings delimited by the double quote; and
2. Review all currently proposed changes to ensure things have not become "messy".

To help in understanding P-prime and Simon's first suggestion, and thereby to help in executing Simon's second suggestion, here is where to find the Python 2.7 lexical analysis and elides:

http://docs.python.org/reference/lexical_analysis.html

Please note a very important difference between the Python semantics and those of C:

  Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)

So under P-prime and Simon's proposal 1, the full list of recognized elides is:

  \newline    ignored
  \\          backslash
  \'          single quote
  \"          double quote
  \a          ASCII Bell (BEL)
  \b          ASCII Backspace (BS)
  \f          ASCII Formfeed (FF)
  \n          ASCII Linefeed (LF)
  \r          ASCII Carriage Return (CR)
  \t          ASCII Horizontal Tab (TAB)
  \v          ASCII Vertical Tab (VT)
  \ooo        Character with octal value ooo (1-3 octal digits)
  \xhh        Character with hex value hh (exactly 2 hex digits)

Note that hexadecimal and octal escapes denote the byte with the given value; it is not necessary that the byte encodes a character in the source character set.

In deference to Simon's second suggestion, please note that this differs from Python 3's handling of un-prefixed triple-quoted strings in two ways:

1. Python 3 adds \N{name}, referencing names in the Unicode database, as well as \uxxxx and \Uxxxxxxxx, giving hex values for Unicode code points.
2. The hexadecimal and octal escapes encode the Unicode character at the given code point rather than a byte.

I suggest we stay with the 2.7 version.

Regards,
  Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC
121 Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Fri, 25 Feb 2011, SIMON WESTRIP wrote:

> If there is acceptance of P', logic suggests that the same
> approach should be taken towards the single-quoted strings,
> e.g. a user might question:
>
> I can do this """C\"""", so why can't I do this: "C\"" ?
>
> This would then just leave the semicolon-delimited fields as
> the means to store 'raw' strings.
>
> This may be 'maximally disruptive', but CIF2 is already distinct
> from CIF1 and will require conversion of e.g. "C\"" in any case
> (the latter is valid CIF1).
>
> Basically, I worry that the compromise is starting to make CIF look
> a bit 'messy'; perhaps all the changes should be reviewed...
>
> Cheers
>
> Simon
>
>
> ____________________________________________________________________________
> From: James Hester <jamesrhester@gmail.com>
> To: ddlm-group <ddlm-group@iucr.org>
> Sent: Friday, 25 February, 2011 2:50:59
> Subject: [ddlm-group] Searching for a compromise on eliding
>
> Dear DDLm-group,
>
> I think we have all had a decent chance to argue our case for
> Proposals P, F and F'. I have also been in small side discussions
> with Ralf and John W. Their points of view can be summarised as
> follows:
> (i) Behaviour of triple-quoted strings will be too confusing unless
> Python behaviour is followed (Ralf)
> (ii) There is considerable criticism of CIF in the macromolecular
> community because of idiosyncratic behaviour, particularly concerning
> quoting. We should therefore stick to accepted standards as much as
> possible (John W)
>
> For John W and Ralf these points outweigh any of the disadvantages of
> Proposal P, and so Proposal P remains their first choice.
> Proposal P is therefore the first choice of 3 out of 5 COMCIFS voters, and the
> last choice of the other two (I would rank it worse than doing
> nothing, actually). I note that non-voting members are uniformly
> opposed to Proposal P.
>
> I therefore want to try to seek some common middle ground in the hope
> that I can find a proposal that could be at least as acceptable as
> Proposal P to Ralf and/or Herbert and/or John W.
>
> Consider the following four new proposals - P-prime, Q, G and null:
>
> * Proposal P-prime: triple-quoted strings are treated as for Python
> 2.7. No Unicode or raw strings are defined (ie no strings starting
> u""" or r""").
>
> I interpret John W and Ralf's position to be that they would be able
> to support this proposal as the preferred choice, as our syntax would
> still be entirely consistent with Python. This proposal is a
> considerable improvement on Proposal P, because the dangers of raw
> strings are taken out of the equation, and the Unicode database is no
> longer a dependency. We are still left with a whole bunch of (frankly
> pointless) elides, leading to Proposal Q:
>
> * Proposal Q: As for Proposal P-prime, with the following changes:
> (1) Only <backslash><delimiter> and <backslash><backslash> when it
> precedes <backslash><delimiter> are recognised escape sequences at the
> syntactical level
> (2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
> which the remaining escape sequences have the meaning assigned to them
> by the Python 2.7 standard. mmCIF and related domains can standardise
> their definitions on this string type and derivatives, making the
> above division between syntax and dictionary invisible to users and
> programmers in their domain.
>
> * Proposal G: Proposal F', but with a different delimiter
>
> Ralf has indicated that he actually thinks Proposal F' is best, but
> only if the delimiters are not going to be confused with Python
> delimiters.
> I interpret John W's position to be that he would not
> support such a change in delimiters as that would simply make CIF even
> more idiosyncratic. Anyway, any such replacement delimiter would need
> to be multi-character, easy to type and unlikely to occur as the first
> characters in CIF1 datavalues. We would also need to reduce the
> character set of non-delimited CIF2 strings to exclude any such
> delimiters. Ideas?
>
> * Null proposal: do nothing as we can't agree
>
> I think I could support Proposal Q as an acceptable fallback from F',
> and if somebody can find sensible delimiters for Proposal G that works
> for me as well. The preferred treatment for backslash-rich text for
> Proposals P, P' and Q will necessarily be semicolon-delimited strings.
>
> James.
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
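To make the P-prime elide table in Herbert's message concrete, here is a minimal Python 3 sketch of a decoder that applies it to the body of a quoted CIF2 string. It is hypothetical and not part of any CIF library: the names decode_elides, SIMPLE_ESCAPES, OCTAL_DIGITS and HEX_DIGITS are invented for illustration. It assumes the lexer has already located the closing delimiter (that step is not shown), that line endings are LF, and that octal values above \377 are masked to 8 bits. It works on bytes because the table says hexadecimal and octal escapes denote bytes.

```python
# Hypothetical sketch of P-prime elide processing: the Python 2.7 escape
# table for un-prefixed string literals, applied to the body of a CIF2
# quoted string. Not part of any CIF library.

# Single-character escapes from the Python 2.7 table.
SIMPLE_ESCAPES = {
    b"\\": b"\\",    # \\  backslash
    b"'": b"'",      # \'  single quote
    b'"': b'"',      # \"  double quote
    b"a": b"\x07",   # \a  BEL
    b"b": b"\x08",   # \b  BS
    b"f": b"\x0c",   # \f  FF
    b"n": b"\x0a",   # \n  LF
    b"r": b"\x0d",   # \r  CR
    b"t": b"\x09",   # \t  TAB
    b"v": b"\x0b",   # \v  VT
}
OCTAL_DIGITS = b"01234567"
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_elides(body: bytes) -> bytes:
    """Apply the P-prime elides to an already-delimited string body."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        ch = body[i:i + 1]
        if ch != b"\\" or i + 1 == n:
            # Ordinary byte, or a lone trailing backslash: copy unchanged.
            out += ch
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt == b"\n":
            # \<newline> is ignored (assumes LF line endings).
            i += 2
        elif nxt in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[nxt]
            i += 2
        elif nxt in OCTAL_DIGITS:
            # \ooo: one to three octal digits.
            j = i + 1
            while j < n and j < i + 4 and body[j:j + 1] in OCTAL_DIGITS:
                j += 1
            # Values above 0o377 are masked to 8 bits (an assumption here).
            out.append(int(body[i + 1:j], 8) & 0xFF)
            i = j
        elif nxt == b"x":
            # \xhh: exactly two hex digits are required.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError(f"invalid \\x escape at offset {i}")
            out.append(int(digits, 16))
            i += 4
        else:
            # Unrecognized escape: keep the backslash, as Python does.
            out += ch
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(decode_elides(b'C\\"'))   # b'C"'
    print(decode_elides(b"\\q"))    # b'\\q'
```

Returning bytes rather than str mirrors the 2.7 behaviour Herbert recommends. Under the Python 3 rules he contrasts, \xhh and \ooo would instead name Unicode code points, and \N{...}, \u and \U would also be recognized. Note how Simon's "C\"" example decodes to C" while the mistyped \q survives intact, which is the debugging property quoted from the Python documentation.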