Re: [ddlm-group] Use of elides in strings
- To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: James Hester <jamesrhester@gmail.com>
- Date: Wed, 25 Nov 2009 09:14:18 +1100
- In-Reply-To: <C731AC95.125CB%nick@csse.uwa.edu.au>
- References: <279aad2a0911231800g6c26bdaancdd4a38fecebbb7a@mail.gmail.com><C731AC95.125CB%nick@csse.uwa.edu.au>
I wholeheartedly agree with Nick's suggestion.

On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> It appears to me that we have spent far too long on a syntactic issue which
> can be avoided 99.9999% of the time. Quite simply given the 5 ways to
> delimit strings, it is next to impossible to get a situation where you
> cannot choose one of those to make the problem go away.
>
> I think the RCSB systematically avoid it by choosing
>
> "ab'cd"
> 'ab"cd'
> ;ab'"cd
> ;
>
> But now we additionally have """ and ''' to choose from, making it even
> easier.
>
> So I propose in line with James' position there is NO eliding of terminator
> character at the CIF2 syntax level. ALL elides in the string are assumed to
> be user specific encoding (say TeX, IUCr \greek) which can be resolved at
> the dictionary level.
>
> This necessarily means NO terminator character can appear in a string
> delimited by the same terminator character. You will need to choose a
> different terminator character. That is
>
> No " in "strings"
> No ' in 'strings'
> No """ in """strings""" (but separable individual and doublet " are allowed)
> No ''' in '''strings''' (but separable individual and doublet ' are allowed)
>
> EVERYTHING in the string is returned as raw (except the initiating and
> terminating character).
>
> The only time you will not be able to encode anything in a delimited string
> is when you want to include ' " """ ''' and \n; in the one string. The
> likelihood of that is almost zero, unless you may want to include a CIF
> within a CIF (a silly thing to do IMHO). In that case the contents can be
> encoded in a dictionary driven way. I suggest it be declared as a BASE64
> type and then all the syntactic ambiguity disappears.
>
> Problem solved! No need to elide because of CIF2 syntax rules all elides are
> user driven, contents are returned raw.
>
> As for Herbs comment in a recent email what about line-folding, then the
> same holds. That is NOT a lexer issue and it has nothing to do with the
> parser, everything is read literally and returned raw and what to do with it
> is promulgated to the downstream application.
>
> Straw vote - No elides of terminator strings as described above - Nick
>
>
> On 24/11/09 10:00 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>
>> OK, my rewritten voting proposal appears to be an abject failure. Let
>> me repeat 1 as clearly as possible
>>
>> 1. Should CIF2 allow elision of terminator characters? In other
>> words, should we make it possible to include <quote> as a normal
>> character in a <quote> delimited string?
>>
>> Herbert: It's difficult to understand how to rephrase things if it is
>> not clear where exactly the problem lies.
>>
>> Joe: good point about double backslash. Consider this added to proposal (a).
>>
>> Before we discuss (2) precisely, can we agree to use the following
>> abstract model and terminology for CIF2 file parsing and dictionary
>> application? If not, please indicate your alternative.
>>
>> 1. A CIF lexer separates a CIF file into tokens according to the CIF2
>> syntax specification only, that is, this process cannot be altered by
>> DDL directives.
>>
>> 2. A CIF parser accepts the tokens from the lexer. CIF parsers can be
>> modelled as performing at least the following actions with these
>> tokens:
>>    (i) assignment of datavalue to dataname
>>    (ii) grouping looped datanames into a set
>>    (iii) assigning looped datavalues to the appropriate dataname and packet
>>    (iv) editing datavalues according to the syntax specification if
>> this has not been performed in the lexer (e.g. stripping enclosing
>> quotes, removing elides)
>>
>> 3. DDL dictionaries operate on and refer to the datavalues and
>> datanames returned by the CIF parser after (2). They have no ability
>> to influence the lexing process, or the parsing actions listed above
>> (in particular the datavalue editing).
>>
>> 4. The 'string value' or 'value' of a token is that value returned by
>> the parser in (2). In particular, this is the value that:
>>    (i) may be checked against regular expressions in the dictionary;
>>    (ii) is accessed by dREL expressions;
>>    (iii) is returned by dREL expressions;
>>    (iv) is referred to in dictionary descriptive text;
>>    (v) may be passed to client routines for further editing;
>>    (vi) may be passed to external applications
>>
>> [Side note: in other words the parser returns the CIF "infoset" and
>> the dictionaries refer to the CIF "infoset", but we haven't been
>> talking in those terms so I've been more explicit].
>>
>> So my voting question (2) is: should the 'string value' of a token
>> referred to in (4) include the eliding characters?
>>
>>
>> On Tue, Nov 24, 2009 at 10:57 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>>> A few points to consider:
>>>
>>> James Hester wrote:
>>> ...
>>>> 2. Character(s) used to indicate elision should be part of the string value
>>> This does not specify where the elision character should be stripped. It
>>> could be done by the parser or the dictionary-level code. The rule only
>>> refers to the final string for the final output text, right?
>>>
>>>>
>>>> Now for the specifics:
>>>>
>>>> 3. Which of the following elision proposals do you support (more than one
>>>> OK)?
>>>>
>>>> Proposal (a) (intended to correspond to Nick's)
>>>> (i) A character which would otherwise be interpreted as a delimiter
>>>> is elided by immediately preceding it with a reverse solidus.
>>>> (ii) Otherwise a reverse solidus in the string has no special
>>>> lexical significance.
>>>>
>>>> Proposal (b)
>>>> (i) The combinations <reverse solidus><quote> or a <reverse
>>>> solidus><double quote> always signify <quote> and <double quote>
>>>> respectively, regardless of the delimiter used in a particular string.
>>>> (ii) The combinations in (i) elide the <quote> or <double quote>
>>>> character where that character would otherwise terminate the string
>>>> (iii) Apart from (i) and (ii), the reverse solidus has no special
>>>> significance
>>>> (iv) If not used as the string delimiter, <quote> or <double quote>
>>>> when not preceded by <reverse solidus> represent themselves.
>>>
>>> In both forms <reverse solidus><reverse solidus> should also be defined
>>> in order to allow a literal string that ends in <reverse solidus>. For
>>> example, a single <reverse solidus> character has to be written as "\\",
>>> to avoid eliding the close quote.
>>>
>>> Joe Krahn
>>> _______________________________________________
>>> ddlm-group mailing list
>>> ddlm-group@iucr.org
>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>
>>
>>
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth, WA 6009 AUSTRALIA     w3: www.csse.uwa.edu.au/~nick
> MBDP M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
>
>
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
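
Illustration (not part of the original thread): the delimiter rule in Nick's proposal quoted above can be sketched in a few lines of Python. The names choose_delimiter and quote_value are hypothetical. Two assumptions go beyond what the proposal states: that '...' and "..." strings are confined to a single line (the CIF1 convention), and that a triple-quoted string must not end with its own quote character, so that the closing delimiter stays separable. Treat this as a rough sketch under those assumptions, not as a reference implementation.

```python
def choose_delimiter(value: str) -> tuple[str, str]:
    """Pick (opening, closing) delimiters that can enclose `value` unmodified.

    Follows the rule "no terminator inside a string delimited by that
    terminator"; nothing is elided and the content stays raw.
    """
    # Assumption: single- and double-quoted strings may not span lines.
    single_line = "\n" not in value and "\r" not in value
    if single_line:
        for q in ("'", '"'):
            if q not in value:
                return q, q

    # Triple quotes: isolated and doubled quote characters are allowed,
    # but the full triple must not occur. Assumption: the content must
    # also not end with the quote character, otherwise it would merge
    # with the closing delimiter.
    for q in ("'''", '"""'):
        if q not in value and not value.endswith(q[0]):
            return q, q

    # Semicolon text field, written here as "\n;" on both sides as in
    # the proposal; the content may not contain a line starting with ';'.
    if "\n;" not in value:
        return "\n;", "\n;"

    # The one case the proposal says cannot be delimited at all; it
    # suggests handling it at the dictionary level (e.g. a BASE64 type).
    raise ValueError("value contains every terminator; encode it at the dictionary level")


def quote_value(value: str) -> str:
    """Wrap `value` in the first delimiter pair that needs no elision."""
    opening, closing = choose_delimiter(value)
    return opening + value + closing


if __name__ == "__main__":
    for example in ("ab'cd", 'ab"cd', "ab'\"cd"):
        print(quote_value(example))
```

Run as a script, this prints the three example values from the proposal, each wrapped in the first delimiter that does not clash with its content. The order in which delimiters are tried is an arbitrary choice made for the sketch.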