[ddlm-group] Use of elides in strings
- To: ddlm-group <ddlm-group@iucr.org>
- Subject: [ddlm-group] Use of elides in strings
- From: James Hester <jamesrhester@gmail.com>
- Date: Thu, 19 Nov 2009 15:58:11 +1100
We need to figure out the behaviour of elides. This was previously discussed in a thread entitled "The alphabet of non-delimited strings", especially in messages around Oct 16th. The behaviour advocated by Nick is for both the eliding and elided character to be returned from the parser. The behaviour I would prefer is for the eliding character to disappear; it should itself be elided if it is to remain in the string.

To summarize Nick's and Herbert's arguments from the emails dated Fri Oct 16, 2009 at 6:22AM and subsequently:

1. We don't interpret elides because we don't know what algorithm to use (i.e. it might be a Greek character sequence)
2. The elide simply signals that the lexer should not interpret the following character

My counter-proposal is similar to Simon's original expectation: if the elide character is really eliding a syntactically significant character (i.e. a terminator character or an elide character), the elide sequence is replaced by the single character. I counter the above arguments as follows:

(a) The profusion of algorithms for backslash processing is irrelevant. We can interpret the elides because the only algorithm that has any relevance at the parser level is the simple <backslash><character> -> <character>. All other potential uses belong to higher levels. If the higher levels require a <backslash><quote>, that is created by writing <backslash><backslash><backslash><quote> in the on-disk string.

(b) The profusion of algorithms for backslash processing means that we *must* remove ambiguity by removing the eliding character during processing; otherwise, an application can't tell if it is e.g. looking at an escaped prime or an acute accent without applying ugly heuristics. Note also that a caller of a CIF reading program doesn't currently need to know what the particular string delimiting character was for a given string value; in order to make a guess at what the backslash might mean, it would often need to know this.
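To make the two behaviours concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names `parse_raw` and `parse_cooked` are hypothetical, the input is taken to be the string body with its delimiters already stripped, the elide character is assumed to be a backslash, and the "cooked" version applies the simple <backslash><character> -> <character> rule from point (a) to every elide sequence.

```python
ELIDE = "\\"  # assumption: the elide character is a backslash


def parse_raw(body):
    """Behaviour attributed to Nick: eliding and elided characters are both returned."""
    return body


def parse_cooked(body):
    """Behaviour preferred here: <backslash><character> -> <character>."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == ELIDE and i + 1 < len(body):
            # Drop the eliding character, keep the elided one.
            out.append(body[i + 1])
            i += 2
        else:
            # Ordinary character (or a lone trailing elide, passed through as-is).
            out.append(body[i])
            i += 1
    return "".join(out)


# On disk: <backslash><backslash><backslash><quote>
on_disk = r'\\\"'
# The higher level receives <backslash><quote>, as in point (a).
print(parse_cooked(on_disk))   # \"

# On disk: <backslash><prime>e
print(parse_raw(r"\'e"))       # \'e  (escaped prime or acute accent? the caller must guess)
print(parse_cooked(r"\'e"))    # 'e   (unambiguously an escaped prime)
print(parse_cooked(r"\\'e"))   # \'e  (a backslash the higher level is meant to see)
```

With the cooked rule, any backslash that survives parsing was put there deliberately by the writer, so a higher-level convention can interpret it without knowing which delimiter surrounded the string.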
It appears that Nick is describing Python raw string behaviour, and I am describing Python 'cooked' string behaviour. Note the following paragraph from docs.python.org/reference/lexical_analysis.html#strings:

When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change, and all backslashes are left in the string. For example, the string literal r"\n" consists of two characters: a backslash and a lowercase 'n'. String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the string, not as a line continuation.

Note that raw strings cannot end in a backslash, so I would consider them slightly less expressive than cooked strings, which can express everything.

I would challenge Nick et al. to explain what the advantage of keeping the eliding character in the datavalue is, keeping in mind that programs like CIFtbx and PyCIFRW and several others aim to hide CIF syntax from their users (as a service), and this proposal appears to want to expose a confusing part of it to them.

Some questions we toolbox maintainers will need to ask if this goes through: Do you handle escaping any strings passed to you for output? How do you know if the caller has done the escaping already, or not? Do you really expect the calling software to work out whether it wants a single or double or triple quote delimited string? Isn't that the service provided by your software? What are they (not) paying you for, anyway?
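The raw/cooked distinction quoted above can be checked directly in Python. The snippet below is a small sketch; the helper `is_valid_literal` is a hypothetical name introduced here, and it simply asks the built-in `compile` whether a piece of source text is a valid expression.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


raw = r"\""     # raw: the backslash stays in the value
cooked = "\""   # cooked: the backslash is consumed by the lexer

print(len(raw))     # 2
print(len(cooked))  # 1

# Source text r"\" : a raw string cannot end in a single backslash.
print(is_valid_literal('r"\\"'))    # False
# Source text "\\" : a cooked string can express a value ending in a backslash.
print(is_valid_literal('"\\\\"'))   # True
```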
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
- Follow-Ups:
- Re: [ddlm-group] Use of elides in strings (Nick Spadaccini)