Re: [ddlm-group] Use of elides in strings
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: James Hester <jamesrhester@gmail.com>
- Date: Mon, 23 Nov 2009 16:04:14 +1100
- In-Reply-To: <alpine.BSF.2.00.0911210813280.11915@epsilon.pair.com>
- References: <C72C423A.12515%nick@csse.uwa.edu.au><4B06DEAF.4070109@niehs.nih.gov><alpine.BSF.2.00.0911201441550.25803@epsilon.pair.com><279aad2a0911201545m22547e50i39df8f165c1c340e@mail.gmail.com><4B0744F9.3040907@niehs.nih.gov><279aad2a0911210437v53726196p7ffd0fa9a3e1cee8@mail.gmail.com><alpine.BSF.2.00.0911210813280.11915@epsilon.pair.com>
Dear All,

As before, I maintain my position that we should abandon eliding completely. Here I examine the proposition that all elide processing is performed at a higher level, where one might expect that different behaviours can be logically separated. Before doing this analysis, note the following:

(1) If the meaning of <elide><terminator> in a string received from the lexer is ambiguous, something akin to at least the minimal approach suggested by Nick (mechanically adding/removing one <elide> character before every <terminator> character) is necessary in order to reliably remove the ambiguity. This may be done at the lexer level, as we originally proposed, or at the dictionary level. Wherever it is done, the raw string on disk will contain extra <elide> characters in those situations where <elide><terminator> does not mean <terminator>. As Nick said in a later email, cutting and pasting will still not work in this case. As a concrete example, <backslash><quote> in an IUCr 'legacy' string may mean <acute accent> or it may mean <quote>, but by inserting an extra backslash before those combinations that mean <acute accent>, we can remove the ambiguity.

(2) In the approach of (1), if the dictionary level doesn't know what the particular terminator character was, it has no way of knowing which character sequences to remove the <elide>s from: before all the <quote>s, or before all the <double quote>s? So the lexer will need to pass the particular string delimiter character to the dictionary level. Alternatively, we can specify that all potential terminator characters are always escaped, even if that particular string has different delimiters. In either case, we are adding significant complexity to our specification.

Now to Herbert's email:

> Let us consider James' example. He is actually making the case
> for _not_ removing the reverse-solidus from a string at the
> lexical level.
>
> xxxx<backslash><quote>elxxxx
>
> or to be more specific
>
> abcd\'efgh
>
> and we are presented with the question of how the
> dictionary should interpret that string.
>
> If we have a string intended to be part of the modern pythonesque
> world, then I would expect the data element to have been typed
> in a way that says we should read the string as
>
> abcd'efgh
>
> If we have a string that is a legacy from a CIF 1 file with
> IUCr type-setting codes, I would expect the data element to
> have been typed in a way that says we should read the string as
> abcd(e with an acute accent)fgh

My point was that *both* readings are possible in a *single* string because, as far as I know, the IUCr currently accepts a plain <quote> character as meaning <quote>. Thus the interpretation is ambiguous, and we need some scheme to disambiguate these uses.

> Anything the lexer does to remove the reverse-solidus is
> going to disfavor one interpretation or the other.

Not disfavour, simply separate lexical and semantic functions.

> By moving these two interpretations one level up to two
> different utility routines, we gain much more use from
> a common lexer and nobody loses any functionality.

To repeat: we cannot separate these interpretations into two different routines/dictionary types, because both interpretations are possible in a single string.

To take this further: what about strings for which only one meaning of <elide><terminator> is possible, that meaning is not <terminator> (because that reduces trivially to the minimalist proposal), and <terminator> cannot appear except in the sequence <elide><terminator>? Can any of you produce a string type from anywhere (computer language, legacy CIF, whatever) for which this is true? If not, I would suggest that leaving the handling of elides to the dictionary gains us nothing, at the cost of additional complexity and confusion among users, as Nick points out in a later email.
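For concreteness, here is a minimal Python sketch of the "add/remove one <elide> before every <terminator>" scheme from point (1), using Herbert's abcd\'efgh example. It assumes backslash is the <elide> character and treats \' as the IUCr acute-accent code; the names ELIDE, add_elides and remove_elides are hypothetical and not part of any DDLm or CIF specification. It is only meant to show that the on-disk text must differ from the string value, and that the reading side has to be told which terminator was elided.

```python
# Hypothetical sketch: ELIDE, add_elides and remove_elides are illustrative
# names only, not part of any DDLm or CIF specification.
ELIDE = "\\"  # assumption: backslash is the <elide> character


def add_elides(value: str, terminators: str) -> str:
    """Writer side: put one extra <elide> before every terminator character."""
    for t in terminators:
        value = value.replace(t, ELIDE + t)
    return value


def remove_elides(raw: str, terminators: str) -> str:
    """Reader side: drop exactly one <elide> before every terminator character.

    The caller has to say which terminator(s) were elided; the raw text alone
    does not tell a dictionary-level routine whether to look before ' or ".
    """
    for t in terminators:
        raw = raw.replace(ELIDE + t, t)
    return raw


# Herbert's example: \' before e is the IUCr code for e-acute (assumed),
# while a bare ' is a literal quote.
acute = "abcd\\'efgh"  # string value: abcd\'efgh
plain = "abcd'efgh"    # string value: abcd'efgh

if __name__ == "__main__":
    for value in (acute, plain):
        on_disk = add_elides(value, "'")
        # The on-disk text differs from the value, so cut-and-paste fails.
        print(on_disk, "->", remove_elides(on_disk, "'"))
    # Decoding with the wrong delimiter leaves the extra elide in place.
    print(remove_elides(add_elides(plain, "'"), '"'))
```

Run as a script, this prints abcd\\'efgh -> abcd\'efgh, then abcd\'efgh -> abcd'efgh, and finally abcd\'efgh for the wrong-delimiter case. Calling add_elides(value, "'\"") instead corresponds to the other option in (2): eliding every potential delimiter in every string, whatever its actual delimiters.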
Note that it is reasonable to suppose that if a language has a special meaning for <elide><terminator>, that meaning exists in order to escape the ordinary meaning of <terminator>, which must therefore also exist in that same language.

I rest my case that there is no advantage, now or ever, in leaving elide treatment to the dictionary level, because:
(a) all elide treatment will require differences between the on-disk and actual string values;
(b) complexity is added by the need either to pass information about string delimiters to the dictionary level, or to elide all potential delimiters in all strings.

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- References:
- Re: [ddlm-group] Use of elides in strings (Nick Spadaccini)
- Re: [ddlm-group] Use of elides in strings (Joe Krahn)
- Re: [ddlm-group] Use of elides in strings (Herbert J. Bernstein)
- Re: [ddlm-group] Use of elides in strings (James Hester)
- Re: [ddlm-group] Use of elides in strings (Joe Krahn)
- Re: [ddlm-group] Use of elides in strings (James Hester)
- Re: [ddlm-group] Use of elides in strings (Herbert J. Bernstein)