
Re: [ddlm-group] Use of elides in strings

To: Group finalising DDLm and associated dictionaries <[email protected]>
Subject: Re: [ddlm-group] Use of elides in strings
From: James Hester <[email protected]>
Date: Mon, 23 Nov 2009 16:04:14 +1100
In-Reply-To: <[email protected]>
References: <C72C423A.12515%[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]><[email protected]>

Dear All,

As before, I maintain my position that we should abandon eliding
completely. I examine here the proposition that all elide processing
is performed at a higher level, where one might expect that different
behaviours can be logically separated.

Before doing this analysis, note the following:

(1) If the meaning of <elide><terminator> in a string received from
the lexer is ambiguous, something akin to at least the minimal
approach suggested by Nick of mechanically adding/removing one <elide>
character from before every <terminator> character is necessary in
order to reliably lift the ambiguity.  This may be done at the lexer
level, as we originally proposed, or indeed at the dictionary level.
Regardless of where it is done, the raw string on disk will have extra
<elide> characters in those situations where <elide><terminator> does
not mean <terminator>.  As Nick said in a later email, simple cutting
and pasting will therefore still not work in this case.
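A minimal sketch of the mechanical rule in (1), assuming a single-character terminator and a backslash as the <elide>. The names ELIDE, add_elides and remove_elides are hypothetical and not taken from any DDLm draft. The sketch shows that the text stored on disk must differ from the string value whenever the value contains the terminator.

```python
# Hypothetical sketch of "add/remove one elide before every terminator".
# Assumes single-character terminators that differ from the elide character.

ELIDE = "\\"  # a single backslash character


def add_elides(value: str, terminator: str) -> str:
    """Encode a value for disk: insert one elide before each terminator."""
    return value.replace(terminator, ELIDE + terminator)


def remove_elides(raw: str, terminator: str) -> str:
    """Decode on-disk text: drop the one elide preceding each terminator."""
    return raw.replace(ELIDE + terminator, terminator)


value = "it's"                     # the value the user actually means
on_disk = add_elides(value, "'")   # what must be written between the quotes
print(on_disk)                     # it\'s  (not the same as the value)
assert remove_elides(on_disk, "'") == value
```

Because every match ends in the terminator, each terminator loses exactly one preceding elide, so remove_elides undoes add_elides for any value, including values that already contain <elide><terminator>.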

As a concrete example, <backslash><quote> in an IUCr 'legacy' string
may mean <acute accent> or it may mean <quote>, but by inserting an
extra backslash before those combinations that mean <acute accent>, we
can remove ambiguity.
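The same rule applied to the IUCr case, as a rough sketch: after the lexer-level step has removed one backslash before each <quote>, any surviving <backslash><quote> can only be an accent code, while a bare <quote> is just a quote. Only the acute-accent code is interpreted here (by appending a Unicode combining acute accent and normalising); the function names are hypothetical.

```python
# Hypothetical sketch: separating the two meanings of \' with one extra elide.
import unicodedata

ELIDE = "\\"
QUOTE = "'"
COMBINING_ACUTE = "\u0301"


def remove_elides(raw: str, terminator: str) -> str:
    """Lexer-level step: drop one elide before each terminator."""
    return raw.replace(ELIDE + terminator, terminator)


def render_iucr_acute(value: str) -> str:
    """Semantic step: treat \\' plus the next character as an acute accent."""
    out = []
    i = 0
    while i < len(value):
        if value.startswith(ELIDE + QUOTE, i) and i + 2 < len(value):
            # \'e -> e + combining acute, composed to a single character
            out.append(unicodedata.normalize("NFC", value[i + 2] + COMBINING_ACUTE))
            i += 3
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


# On disk (inside single quotes): \\' marks an accent, \' marks a plain quote.
raw = "caf\\\\'e isn\\'t"
value = remove_elides(raw, QUOTE)   # caf\'e isn't
print(render_iucr_acute(value))     # café isn't
```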

(2) In the approach of (1), if the dictionary level doesn't know what
the particular terminator character was, it has no way of knowing
which character sequences it has to remove the <elide>s from: before
all the <quote>s, or before all the <double quotes>?  So the lexer
will need to pass the particular string delimiter character used to
the dictionary level.  Alternatively, we can specify that all
potential terminator characters are always escaped, even if that
particular string has different delimiters.  In either case, we are
adding significant additional complexity to our specification.
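A sketch of the problem in (2), again with a hypothetical remove_elides helper: identical text found between the delimiters yields different values depending on which delimiter the lexer saw, unless every potential delimiter is always elided.

```python
# Hypothetical sketch: decoding depends on the delimiter actually used.

ELIDE = "\\"
POTENTIAL_TERMINATORS = ("'", '"')


def remove_elides(raw: str, terminator: str) -> str:
    """Drop one elide before each occurrence of the given terminator."""
    return raw.replace(ELIDE + terminator, terminator)


def remove_elides_all(raw: str) -> str:
    """Alternative rule: every potential terminator is always elided."""
    for t in POTENTIAL_TERMINATORS:
        raw = remove_elides(raw, t)
    return raw


raw = 'a\\"b'                     # the text a\"b found between the delimiters
print(remove_elides(raw, "'"))    # a\"b  (quote-delimited: elide kept)
print(remove_elides(raw, '"'))    # a"b   (double-quote-delimited: elide removed)
print(remove_elides_all(raw))     # a"b   (delimiter irrelevant, but all must be elided)
```

The first two calls show why the delimiter would have to be passed upward; the third shows the cost of the alternative, namely that writers must elide characters that cannot terminate their string.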

Now to Herbert's email:

> Let us consider James' example.  He is actually making the case
> for _not_ removing the reverse-solidus from a string at the
> lexical level.
>
> xxxx<backslash><quote>elxxxx
>
> or to be more specific
>
> abcd\'efgh
>
> and we are presented with the question of how the
> dictionary should interpret that string.
>
> If we have a string intended to be part of the modern pythonesque
> world, then I would expect the data element to have been typed
> in a way that says we should read the string as
>
> abcd'efgh
>
> If we have a string that is a legacy from a CIF 1 file with
> IUCr type-setting codes, I would expect the data element to
> have been typed in a way that says we should read the string as
> abcd(e with an acute accent)fgh

My point was that *both* readings are possible in a *single* string
because, as far as I know, the IUCr currently accepts a plain <quote>
character as meaning <quote>.  Thus there is ambiguity in the
interpretation, thus we need some scheme to disambiguate these uses.

> Anything the lexer does to remove the reverse-solidus is
> going to disfavor one interpretation or the other.

It does not disfavour either; it simply separates lexical and semantic functions.

> By moving these two interpretations one level up to two
> different utility routines, we gain much more use from
> a common lexer and nobody loses any functionality.

To repeat: we cannot separate these interpretations into two different
routines/dictionary types, because both interpretations are possible
in a single string.
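To illustrate why a per-string (per-type) choice cannot work, here is a sketch of the two whole-string readings applied to a single hypothetical string that contains both uses of <backslash><quote>, with no extra elide to separate them. Neither reading recovers the intended value "café isn't"; both functions are illustrative only, not proposed DDLm behaviour.

```python
# Hypothetical sketch: whole-string readings fail on a mixed string.
import unicodedata

ELIDE_QUOTE = "\\'"


def read_pythonesque(raw: str) -> str:
    """Whole-string rule A: every \\' is an escaped quote."""
    return raw.replace(ELIDE_QUOTE, "'")


def read_legacy_acute(raw: str) -> str:
    """Whole-string rule B: every \\' accents the following character."""
    out, i = [], 0
    while i < len(raw):
        if raw.startswith(ELIDE_QUOTE, i) and i + 2 < len(raw):
            out.append(unicodedata.normalize("NFC", raw[i + 2] + "\u0301"))
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


raw = "caf\\'e isn\\'t"         # intended: an accented e and a plain quote
print(read_pythonesque(raw))    # caf'e isn't  (accent lost)
print(read_legacy_acute(raw))   # accent right, but the quote becomes an accent on the t
```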

To take this further: what about strings for which only one meaning of
<elide><terminator> is possible, that meaning is not <terminator>
(because that reduces trivially to the minimalist proposal), and
<terminator> cannot appear apart from in the sequence
<elide><terminator>?  Can any of you produce a string type from
anywhere (computer language, legacy CIF, whatever) for which this is
true?  If not, I would suggest that leaving handling of elides to the
dictionary gains us nothing, at the cost of additional complexity and
confusion among users, as Nick points out in a later email.

Note that it is reasonable to suppose that if a language has a special
meaning for <elide><terminator>, that meaning exists in order to
escape the ordinary meaning of <terminator>, which must therefore also
exist in that same language.

I rest my case that there is no advantage now or ever in leaving elide
treatment to the dictionary level, because (a) all elide treatment will
require the on-disk string to differ from the actual string value, and
(b) complexity is added by the need either to pass information about
string delimiters to the dictionary level or to elide all potential
delimiters in all strings.

-- 
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/ddlm-group
