Re: [ddlm-group] Use of elides in strings
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: Nick Spadaccini <nick@csse.uwa.edu.au>
- Date: Tue, 24 Nov 2009 15:30:29 +0800
- In-Reply-To: <279aad2a0911231800g6c26bdaancdd4a38fecebbb7a@mail.gmail.com>
It appears to me that we have spent far too long on a syntactic issue that can be avoided 99.9999% of the time. Quite simply, given the 5 ways to delimit strings, it is next to impossible to reach a situation where you cannot choose one of them to make the problem go away. I think the RCSB systematically avoids it by choosing

"ab'cd"
'ab"cd'
;ab'"cd
;

But now we additionally have """ and ''' to choose from, making it even easier.

So I propose, in line with James' position, that there is NO eliding of terminator characters at the CIF2 syntax level. ALL elides in the string are assumed to be user-specific encoding (say TeX, IUCr \greek) which can be resolved at the dictionary level.

This necessarily means NO terminator character can appear in a string delimited by the same terminator character. You will need to choose a different terminator character. That is

No " in "strings"
No ' in 'strings'
No """ in """strings""" (but separable individual and doublet " are allowed)
No ''' in '''strings''' (but separable individual and doublet ' are allowed)

EVERYTHING in the string is returned as raw (except the initiating and terminating characters).

The only time you will not be able to encode something in a delimited string is when you want to include ' " """ ''' and \n; in the one string. The likelihood of that is almost zero, unless you want to include a CIF within a CIF (a silly thing to do IMHO). In that case the contents can be encoded in a dictionary-driven way. I suggest it be declared as a BASE64 type, and then all the syntactic ambiguity disappears.

Problem solved! No need to elide because of CIF2 syntax rules: all elides are user driven, and contents are returned raw.

As for Herbert's comment in a recent email (what about line-folding?), the same holds. That is NOT a lexer issue and it has nothing to do with the parser: everything is read literally and returned raw, and what to do with it is left to the downstream application.
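To make the "choose a different terminator" rule concrete, here is a minimal Python sketch of how a CIF writer might pick a delimiter under this proposal. It is only an illustration, not part of any CIF2 specification: the function name `delimit` is hypothetical, and the sketch assumes that '...' and "..." strings stay on one line and that a triple-quoted value may not end in its own quote character.

```python
import base64


def delimit(value: str) -> tuple[str, str]:
    """Wrap `value` in a delimiter it does not contain; fall back to Base64."""
    if "\n" not in value:
        # Assumption: '...' and "..." strings are confined to a single line.
        for quote in ('"', "'"):
            if quote not in value:
                return "raw", f"{quote}{value}{quote}"
    for triple in ('"""', "'''"):
        # Assumption: a value ending in the quote character is rejected too,
        # because it would run into the closing delimiter.
        if triple not in value and not value.endswith(triple[0]):
            return "raw", f"{triple}{value}{triple}"
    if "\n;" not in value:
        # Semicolon text field: opened and closed by ';' at the start of a line.
        return "raw", f"\n;{value}\n;"
    # Nothing fits: encode the contents, to be declared as a BASE64 type
    # at the dictionary level (the Base64 alphabet contains no quotes).
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return "base64", f"'{encoded}'"
```

For example, `delimit("ab'cd")` picks double quotes, while a value containing ' " """ ''' and \n; all at once ends up in the Base64 branch.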
Straw vote - No elides of terminator strings as described above - Nick

On 24/11/09 10:00 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> OK, my rewritten voting proposal appears to be an abject failure. Let
> me repeat 1 as clearly as possible
>
> 1. Should CIF2 allow elision of terminator characters? In other
> words, should we make it possible to include <quote> as a normal
> character in a <quote> delimited string?
>
> Herbert: It's difficult to understand how to rephrase things if it is
> not clear where exactly the problem lies.
>
> Joe: good point about double backslash. Consider this added to proposal (a).
>
> Before we discuss (2) precisely, can we agree to use the following
> abstract model and terminology for CIF2 file parsing and dictionary
> application? If not, please indicate your alternative.
>
> 1. A CIF lexer separates a CIF file into tokens according to the CIF2
> syntax specification only, that is, this process cannot be altered by
> DDL directives.
>
> 2. A CIF parser accepts the tokens from the lexer. CIF parsers can be
> modelled as performing at least the following actions with these
> tokens:
> (i) assignment of datavalue to dataname
> (ii) grouping looped datanames into a set
> (iii) assigning looped datavalues to the appropriate dataname and packet
> (iv) editing datavalues according to the syntax specification if
> this has not been performed in the lexer (e.g. stripping enclosing
> quotes, removing elides)
>
> 3. DDL dictionaries operate on and refer to the datavalues and
> datanames returned by the CIF parser after (2). They have no ability
> to influence the lexing process, or the parsing actions listed above
> (in particular the datavalue editing).
>
> 4. The 'string value' or 'value' of a token is that value returned by
> the parser in (2). In particular, this is the value that:
> (i) may be checked against regular expressions in the dictionary;
> (ii) is accessed by dREL expressions;
> (iii) is returned by dREL expressions;
> (iv) is referred to in dictionary descriptive text;
> (v) may be passed to client routines for further editing;
> (vi) may be passed to external applications
>
> [Side note: in other words the parser returns the CIF "infoset" and
> the dictionaries refer to the CIF "infoset", but we haven't been
> talking in those terms so I've been more explicit].
>
> So my voting question (2) is: should the 'string value' of a token
> referred to in (4) include the eliding characters?
>
>
> On Tue, Nov 24, 2009 at 10:57 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>> A few points to consider:
>>
>> James Hester wrote:
>> ...
>>> 2. Character(s) used to indicate elision should be part of the string value
>> This does not specify where the elision character should be stripped. It
>> could be done by the parser or the dictionary-level code. The rule only
>> refers to the final string for the final output text, right?
>>
>>>
>>> Now for the specifics:
>>>
>>> 3. Which of the following elision proposals do you support (more than one
>>> OK)?
>>>
>>> Proposal (a) (intended to correspond to Nick's)
>>> (i) A character which would otherwise be interpreted as a delimiter
>>> is elided by immediately preceding it with a reverse solidus.
>>> (ii) Otherwise a reverse solidus in the string has no special
>>> lexical significance.
>>>
>>> Proposal (b)
>>> (i) The combinations <reverse solidus><quote> or a <reverse
>>> solidus><double quote> always signify <quote> and <double quote>
>>> respectively, regardless of the delimiter used in a particular string.
>>> (ii) The combinations in (i) elide the <quote> or <double quote>
>>> character where that character would otherwise terminate the string
>>> (iii) Apart from (i) and (ii), the reverse solidus has no special
>>> significance
>>> (iv) If not used as the string delimiter, <quote> or <double quote>
>>> when not preceded by <reverse solidus> represent themselves.
>>
>> In both forms <reverse solidus><reverse solidus> should also be defined
>> in order to allow a literal string that ends in <reverse solidus>. For
>> example, a single <reverse solidus> character has to be written as "\\",
>> to avoid eliding the close quote.
>>
>> Joe Krahn
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>
>

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au

_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
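
For readers comparing the two positions in this message, the following Python sketch contrasts reading a string under the no-elision rule with reading it under quoted proposal (a), including Joe Krahn's double reverse solidus addition. It is a simplified illustration only: the function names are hypothetical, the closing delimiter is not checked for trailing whitespace, `read_proposal_a` handles single-character delimiters only, and whether the eliding reverse solidus is kept in the returned value (voting question 2) is assumed here to be "no".

```python
def read_raw(text: str, start: int = 0) -> tuple[str, int]:
    """No elision: the string ends at the first occurrence of its delimiter."""
    for delim in ('"""', "'''", '"', "'"):
        if text.startswith(delim, start):
            break
    else:
        raise ValueError("no string delimiter at this position")
    body_start = start + len(delim)
    end = text.find(delim, body_start)
    if end == -1:
        raise ValueError("unterminated string")
    # Everything between the delimiters is returned untouched.
    return text[body_start:end], end + len(delim)


def read_proposal_a(text: str, start: int = 0) -> tuple[str, int]:
    """Proposal (a): a reverse solidus elides the delimiter or itself."""
    delim = text[start]  # single-character delimiters only in this sketch
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in (delim, "\\"):
            out.append(text[i + 1])  # assumption: the elide itself is dropped
            i += 2
        elif ch == delim:
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated string")
```

Given the input `"a\"b"`, `read_raw` stops at the second double quote and returns `a\` as the value, whereas `read_proposal_a` reads through to the final quote and returns `a"b`.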
- Follow-Ups:
- Re: [ddlm-group] Use of elides in strings (Joe Krahn)
- Re: [ddlm-group] Use of elides in strings (James Hester)
- References:
- Re: [ddlm-group] Use of elides in strings (James Hester)