Re: [ddlm-group] Searching for a compromise on eliding
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Searching for a compromise on eliding
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Fri, 25 Feb 2011 11:38:59 -0500 (EST)
- In-Reply-To: <alpine.BSF.2.00.1102250746270.85678@epsilon.pair.com>
- References: <AANLkTi=bEDjCpJgyuB07q1FBFZjA_jbG=4jgLsXEvw4g@mail.gmail.com><324958.57741.qm@web87003.mail.ird.yahoo.com><alpine.BSF.2.00.1102250746270.85678@epsilon.pair.com>
I just checked the Unicode 5.2.0 names "database" that Python 2.7 uses. It
has 21829 names. There is a well-documented Python reference
implementation of an API for translation at:

http://docs.python.org/library/unicodedata.html

If nobody has done it yet, at first glance it does not look too difficult
to make matching LGPL'd C/C++/Java APIs. I am not saying it is a trivial
task, but it does look doable as part of making a full UTF-8 support
package for CIF2. Would having that make a Python 3 version of proposal
P-prime acceptable?

-- Herbert

=====================================================
Herbert J. Bernstein, Professor of Computer Science
Dowling College, Kramer Science Center, KSC 121
Idle Hour Blvd, Oakdale, NY, 11769

+1-631-244-3035
yaya@dowling.edu
=====================================================

On Fri, 25 Feb 2011, Herbert J. Bernstein wrote:

> Dear Colleagues,
>
> I support both of Simon's suggestions:
>
> 1. Add the elides of P-prime to the strings delimited by
> the single quote and the strings delimited by the double quote;
> and
>
> 2. Review all currently proposed changes to ensure things have
> not become "messy"
>
> To help in understanding P-prime and Simon's first suggestion, and
> thereby to help in executing Simon's second suggestion, here
> is where to find the Python 2.7 lexical analysis and elides:
>
> http://docs.python.org/reference/lexical_analysis.html
>
> Please note a very important difference between the Python semantics
> and those of C:
>
> Unlike Standard C, all unrecognized escape sequences are left in the string
> unchanged, i.e., the backslash is left in the string.
> (This behavior is useful when debugging: if an escape sequence is mistyped,
> the resulting output is more easily recognized as broken), so under P-prime
> and Simon's proposal 1, the full list of recognized elides is:
>
>   \newline   ignored
>   \\         backslash
>   \'         single quote
>   \"         double quote
>   \a         ASCII Bell (BEL)
>   \b         ASCII Backspace (BS)
>   \f         ASCII Formfeed (FF)
>   \n         ASCII Linefeed (LF)
>   \r         ASCII Carriage Return (CR)
>   \t         ASCII Horizontal Tab (TAB)
>   \v         ASCII Vertical Tab (VT)
>   \ooo       Character with octal value ooo (1-3 octal digits)
>   \xhh       Character with hex value hh (2 hex digits)
>
> Note that hexadecimal and octal escapes denote the byte with the given value;
> it is not necessary that the byte encodes a character in the source character
> set.
>
> In deference to Simon's second suggestion, please note that this differs from
> Python 3 handling of un-prefixed treble quotes in 2 ways:
>
> 1. Python 3 adds \N{name} referencing names in the Unicode database, as well
> as adding \uxxxx and \Uxxxxxxxx giving hex values for Unicode code points.
> 2. The hexadecimal and octal escapes encode the Unicode character at
> the code point.
>
> I suggest we stay with the 2.7 version.
>
> Regards,
> Herbert
>
> =====================================================
> Herbert J. Bernstein, Professor of Computer Science
> Dowling College, Kramer Science Center, KSC 121
> Idle Hour Blvd, Oakdale, NY, 11769
>
> +1-631-244-3035
> yaya@dowling.edu
> =====================================================
>
> On Fri, 25 Feb 2011, SIMON WESTRIP wrote:
>
>> If there is acceptance of P', logic suggests that the same
>> approach should be taken towards the single-quoted strings,
>> e.g. a user might question:
>>
>> I can do this """C\"""", so why can't I do this: "C\"" ?
>>
>> This would then just leave the semicolon-delimited fields as
>> the means to store 'raw' strings.
>>
>> This may be 'maximally disruptive', but CIF2 is already distinct
>> from CIF1 and will require conversion of e.g.
>> "C\"" in any case
>> (the latter is valid CIF1).
>>
>> Basically, I worry that the compromise is starting to make CIF look
>> a bit 'messy'; perhaps all the changes should be reviewed...
>>
>> Cheers
>>
>> Simon
>>
>>
>> ____________________________________________________________________________
>> From: James Hester <jamesrhester@gmail.com>
>> To: ddlm-group <ddlm-group@iucr.org>
>> Sent: Friday, 25 February, 2011 2:50:59
>> Subject: [ddlm-group] Searching for a compromise on eliding
>>
>> Dear DDLm-group,
>>
>> I think we have all had a decent chance to argue our case for
>> Proposals P, F and F'. I have also been in small side discussions
>> with Ralf and John W. Their points of view can be summarised as
>> follows:
>> (i) Behaviour of triple-quoted strings will be too confusing unless
>> Python behaviour is followed (Ralf)
>> (ii) There is considerable criticism of CIF in the macromolecular
>> community because of idiosyncratic behaviour, particularly concerning
>> quoting. We should therefore stick to accepted standards as much as
>> possible (John W)
>>
>> For John W and Ralf these points outweigh any of the disadvantages of
>> Proposal P, and so Proposal P remains their first choice. Proposal P
>> is therefore the first choice of 3 out of 5 COMCIFS voters, and the
>> last choice of the other two (I would rank it worse than doing
>> nothing, actually). I note that non-voting members are uniformly
>> opposed to Proposal P.
>>
>> I therefore want to try to seek some common middle ground in the hope
>> that I can find a proposal that could be at least as acceptable as
>> Proposal P to Ralf and/or Herbert and/or John W.
>>
>> Consider the following four new proposals - P-prime, Q, G and null:
>>
>> * Proposal P-prime: triple-quoted strings are treated as for Python
>> 2.7. No Unicode or raw strings are defined (ie no strings starting
>> u""" or r""").
>>
>> I interpret John W and Ralf's position to be that they would be able
>> to support this proposal as the preferred choice, as our syntax would
>> still be entirely consistent with Python. This proposal is a
>> considerable improvement on Proposal P, because the dangers of raw
>> strings are taken out of the equation, and the Unicode database is no
>> longer a dependency. We are still left with a whole bunch of (frankly
>> pointless) elides, leading to Proposal Q:
>>
>> * Proposal Q: As for Proposal P-prime, with the following changes:
>> (1) Only <backslash><delimiter> and <backslash><backslash> when it
>> precedes <backslash><delimiter> are recognised escape sequences at the
>> syntactical level
>> (2) A DDLm string type, e.g. "CText", is defined in com_val.dic for
>> which the remaining escape sequences have the meaning assigned to them
>> by the Python 2.7 standard. mmCIF and related domains can standardise
>> their definitions on this string type and derivatives, making the
>> above division between syntax and dictionary invisible to users and
>> programmers in their domain.
>>
>> * Proposal G: Proposal F', but with a different delimiter
>>
>> Ralf has indicated that he actually thinks Proposal F' is best, but
>> only if the delimiters are not going to be confused with Python
>> delimiters. I interpret John W's position to be that he would not
>> support such a change in delimiters as that would simply make CIF even
>> more idiosyncratic. Anyway, any such replacement delimiter would need
>> to be multi-character, easy to type and unlikely to occur as the first
>> characters in CIF1 datavalues. We would also need to reduce the
>> character set of non-delimited CIF2 strings to exclude any such
>> delimiters. Ideas?
>>
>> * Null proposal: do nothing as we can't agree
>>
>> I think I could support Proposal Q as an acceptable fallback from F',
>> and if somebody can find sensible delimiters for Proposal G that works
>> for me as well.
>> The preferred treatment for backslash-rich text for
>> Proposals P, P' and Q will necessarily be semicolon-delimited strings.
>>
>> James.
>> --
>> T +61 (02) 9717 9907
>> F +61 (02) 9717 3145
>> M +61 (04) 0249 4148
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>
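To make the P-prime elide list in Herbert's earlier message concrete, here is a minimal Python 3 sketch that expands those escapes in the text between the opening and closing triple quotes. It is an illustration under stated assumptions, not part of any proposal: the function name decode_p_prime is hypothetical, only a bare LF is treated as \newline, and it works on bytes because, as noted above, \ooo and \xhh denote byte values under the 2.7 rules. Keeping only the low 8 bits of octal values above \377 is likewise an assumption, not something stated in the thread.

```python
# Sketch only: Python 2.7-style elide processing (as described for P-prime)
# applied to the text between the opening and closing triple quotes.
# Operates on bytes, because under the 2.7 rules \ooo and \xhh denote bytes.

OCTAL_DIGITS = b"01234567"
HEX_DIGITS = b"0123456789abcdefABCDEF"

# Single-character elides and the byte each one stands for.
SIMPLE_ELIDES = {
    ord("\\"): b"\\",   # \\  backslash
    ord("'"): b"'",     # \'  single quote
    ord('"'): b'"',     # \"  double quote
    ord("a"): b"\x07",  # \a  BEL
    ord("b"): b"\x08",  # \b  BS
    ord("f"): b"\x0c",  # \f  FF
    ord("n"): b"\x0a",  # \n  LF
    ord("r"): b"\x0d",  # \r  CR
    ord("t"): b"\x09",  # \t  TAB
    ord("v"): b"\x0b",  # \v  VT
}


def decode_p_prime(body: bytes) -> bytes:
    """Hypothetical helper: expand the 2.7 elides listed in the thread."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c != ord("\\") or i + 1 == n:
            # Ordinary byte (or a lone trailing backslash): copy it.
            out.append(c)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt == ord("\n"):
            # \<newline> is ignored; only a bare LF is handled here.
            i += 2
        elif nxt in SIMPLE_ELIDES:
            out += SIMPLE_ELIDES[nxt]
            i += 2
        elif nxt in OCTAL_DIGITS:
            # \ooo: take one to three octal digits; keep the low 8 bits
            # (assumed behaviour for values above \377).
            j = i + 1
            while j < n and j < i + 4 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(body[i + 1:j], 8) & 0xFF)
            i = j
        elif nxt == ord("x"):
            # \xhh: exactly two hex digits; anything else is an error.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("invalid \\x escape at offset %d" % i)
            out.append(int(digits, 16))
            i += 4
        else:
            # Unrecognized elide: unlike C, the backslash stays in the result.
            out.append(c)
            i += 1
    return bytes(out)


print(decode_p_prime(b'C\\"'))        # b'C"'
print(decode_p_prime(b"\\101\\x42"))  # b'AB'
print(decode_p_prime(b"\\q"))         # b'\\q'
```

The first call corresponds to the body of Simon's """C\"""" example, which decodes to C followed by a double quote; the last shows an unrecognized elide keeping its backslash, the Python (not C) behaviour Herbert highlights.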
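The Python 3 variant Herbert asks about depends on the Unicode name database. The short sketch below uses the standard unicodedata module to show the name lookup and how Python 3 treats \N{name}, \xhh and \uxxxx, all of which yield code points rather than bytes. The helper named_code_points is hypothetical. Its total depends on the Unicode version of the Python build and includes algorithmically named characters such as CJK ideographs, so it is not expected to reproduce the 21829 figure quoted for Python 2.7.

```python
import unicodedata


def named_code_points() -> int:
    """Hypothetical helper: count code points that have a Unicode name."""
    return sum(
        1
        for cp in range(0x110000)
        if unicodedata.name(chr(cp), None) is not None
    )


print("Unicode database version:", unicodedata.unidata_version)
print("Named code points:", named_code_points())

# Name -> character -> name, via the documented unicodedata API.
e_acute = unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE")
print(hex(ord(e_acute)), unicodedata.name(e_acute))

# Python 3 escape rules: \N{name}, \xhh and \uxxxx all give the code point
# U+00E9, whereas under the 2.7 byte-string rules \xe9 is the byte 0xE9.
decoded = b"\\N{LATIN SMALL LETTER E WITH ACUTE}\\xe9\\u00e9".decode(
    "unicode_escape"
)
print(decoded == "\u00e9" * 3)  # True
```

An LGPL'd C, C++ or Java counterpart of this API would need the same name table that unicodedata ships, which is the dependency James notes P-prime avoids.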