Re: [ddlm-group] Use of elides in strings
- To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] Use of elides in strings
- From: "Herbert J. Bernstein" <yaya@bernstein-plus-sons.com>
- Date: Tue, 24 Nov 2009 17:23:40 -0500 (EST)
- Cc: Nick.Spadaccini@uwa.edu.au
- In-Reply-To: <279aad2a0911241414j1d89b6b3mfec464fdc401fbfd@mail.gmail.com>
- References: <279aad2a0911231800g6c26bdaancdd4a38fecebbb7a@mail.gmail.com><C731AC95.125CB%nick@csse.uwa.edu.au><279aad2a0911241414j1d89b6b3mfec464fdc401fbfd@mail.gmail.com>
The only problem with referring all elides to the application is that
with the removal of the requirement of a blank after a \n; for it to
be effective, the line folding protocol develops a slight gap. The case
is as follows:

;\
;\
;

is a valid single text field in CIF 1.1, which when handled with the
line folding protocol translates to the equivalent of ';' because the
embedded ;\ is not a valid text terminator.

If we require that a text field that begins with "\n;\\" must be
terminated by "\n; " or "\n;\n" or "\n;\t", that problem would be fixed.

=====================================================
 Herbert J. Bernstein, Professor of Computer Science
   Dowling College, Kramer Science Center, KSC 121
        Idle Hour Blvd, Oakdale, NY, 11769

                 +1-631-244-3035
                 yaya@dowling.edu
=====================================================

On Wed, 25 Nov 2009, James Hester wrote:

> I wholeheartedly agree with Nick's suggestion.
>
> On Tue, Nov 24, 2009 at 6:30 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
>> It appears to me that we have spent far too long on a syntactic issue which
>> can be avoided 99.9999% of the time. Quite simply given the 5 ways to
>> delimit strings, it is next to impossible to get a situation where you
>> cannot choose one of those to make the problem go away.
>>
>> I think the RCSB systematically avoid it by choosing
>>
>> "ab'cd"
>> 'ab"cd'
>> ;ab'"cd
>> ;
>>
>> But now we additionally have """ and ''' to choose from, making it even
>> easier.
>>
>> So I propose in line with James' position there is NO eliding of terminator
>> character at the CIF2 syntax level. ALL elides in the string are assumed to
>> be user specific encoding (say TeX, IUCr \greek) which can be resolved at
>> the dictionary level.
>>
>> This necessarily means NO terminator character can appear in a string
>> delimited by the same terminator character. You will need to choose a
>> different terminator character.
>> That is
>>
>> No " in "strings"
>> No ' in 'strings'
>> No """ in """strings""" (but separable individual and doublet " are allowed)
>> No ''' in '''strings''' (but separable individual and doublet ' are allowed)
>>
>> EVERYTHING in the string is returned as raw (except the initiating and
>> terminating character).
>>
>> The only time you will not be able to encode anything in a delimited string
>> is when you want to include ' " """ ''' and \n; in the one string. The
>> likelihood of that is almost zero, unless you may want to include a CIF
>> within a CIF (a silly thing to do IMHO). In that case the contents can be
>> encoded in a dictionary driven way. I suggest it be declared as a BASE64
>> type and then all the syntactic ambiguity disappears.
>>
>> Problem solved! No need to elide because of CIF2 syntax rules all elides are
>> user driven, contents are returned raw.
>>
>> As for Herb's comment in a recent email what about line-folding, then the
>> same holds. That is NOT a lexer issue and it has nothing to do with the
>> parser, everything is read literally and returned raw and what to do with it
>> is promulgated to the downstream application.
>>
>> Straw vote - No elides of terminator strings as described above - Nick
>>
>>
>> On 24/11/09 10:00 AM, "James Hester" <jamesrhester@gmail.com> wrote:
>>
>>> OK, my rewritten voting proposal appears to be an abject failure. Let
>>> me repeat 1 as clearly as possible
>>>
>>> 1. Should CIF2 allow elision of terminator characters? In other
>>> words, should we make it possible to include <quote> as a normal
>>> character in a <quote> delimited string?
>>>
>>> Herbert: It's difficult to understand how to rephrase things if it is
>>> not clear where exactly the problem lies.
>>>
>>> Joe: good point about double backslash. Consider this added to proposal (a).
>>>
>>> Before we discuss (2) precisely, can we agree to use the following
>>> abstract model and terminology for CIF2 file parsing and dictionary
>>> application?
>>> If not, please indicate your alternative.
>>>
>>> 1. A CIF lexer separates a CIF file into tokens according to the CIF2
>>> syntax specification only, that is, this process cannot be altered by
>>> DDL directives.
>>>
>>> 2. A CIF parser accepts the tokens from the lexer. CIF parsers can be
>>> modelled as performing at least the following actions with these
>>> tokens:
>>>   (i) assignment of datavalue to dataname
>>>   (ii) grouping looped datanames into a set
>>>   (iii) assigning looped datavalues to the appropriate dataname and packet
>>>   (iv) editing datavalues according to the syntax specification if
>>> this has not been performed in the lexer (e.g. stripping enclosing
>>> quotes, removing elides)
>>>
>>> 3. DDL dictionaries operate on and refer to the datavalues and
>>> datanames returned by the CIF parser after (2). They have no ability
>>> to influence the lexing process, or the parsing actions listed above
>>> (in particular the datavalue editing).
>>>
>>> 4. The 'string value' or 'value' of a token is that value returned by
>>> the parser in (2). In particular, this is the value that:
>>>   (i) may be checked against regular expressions in the dictionary;
>>>   (ii) is accessed by dREL expressions;
>>>   (iii) is returned by dREL expressions;
>>>   (iv) is referred to in dictionary descriptive text;
>>>   (v) may be passed to client routines for further editing;
>>>   (vi) may be passed to external applications
>>>
>>> [Side note: in other words the parser returns the CIF "infoset" and
>>> the dictionaries refer to the CIF "infoset", but we haven't been
>>> talking in those terms so I've been more explicit].
>>>
>>> So my voting question (2) is: should the 'string value' of a token
>>> referred to in (4) include the eliding characters?
>>>
>>>
>>> On Tue, Nov 24, 2009 at 10:57 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>>>> A few points to consider:
>>>>
>>>> James Hester wrote:
>>>> ...
>>>>> 2. Character(s) used to indicate elision should be part of the string value
>>>> This does not specify where the elision character should be stripped. It
>>>> could be done by the parser or the dictionary-level code. The rule only
>>>> refers to the final string for the final output text, right?
>>>>
>>>>>
>>>>> Now for the specifics:
>>>>>
>>>>> 3. Which of the following elision proposals do you support (more than one
>>>>> OK)?
>>>>>
>>>>> Proposal (a) (intended to correspond to Nick's)
>>>>>   (i) A character which would otherwise be interpreted as a delimiter
>>>>> is elided by immediately preceding it with a reverse solidus.
>>>>>   (ii) Otherwise a reverse solidus in the string has no special
>>>>> lexical significance.
>>>>>
>>>>> Proposal (b)
>>>>>   (i) The combinations <reverse solidus><quote> or a <reverse
>>>>> solidus><double quote> always signify <quote> and <double quote>
>>>>> respectively, regardless of the delimiter used in a particular string.
>>>>>   (ii) The combinations in (i) elide the <quote> or <double quote>
>>>>> character where that character would otherwise terminate the string
>>>>>   (iii) Apart from (i) and (ii), the reverse solidus has no special
>>>>> significance
>>>>>   (iv) If not used as the string delimiter, <quote> or <double quote>
>>>>> when not preceded by <reverse solidus> represent themselves.
>>>>
>>>> In both forms <reverse solidus><reverse solidus> should also be defined
>>>> in order to allow a literal string that ends in <reverse solidus>. For
>>>> example, a single <reverse solidus> character has to be written as "\\",
>>>> to avoid eliding the close quote.
>>>>
>>>> Joe Krahn
>>>> _______________________________________________
>>>> ddlm-group mailing list
>>>> ddlm-group@iucr.org
>>>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>>>
>>>
>>>
>>
>> cheers
>>
>> Nick
>>
>> --------------------------------
>> Associate Professor N. Spadaccini, PhD
>> School of Computer Science & Software Engineering
>>
>> The University of Western Australia    t: +61 (0)8 6488 3452
>> 35 Stirling Highway                    f: +61 (0)8 6488 1089
>> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
>> MBDP  M002
>>
>> CRICOS Provider Code: 00126G
>>
>> e: Nick.Spadaccini@uwa.edu.au
>>
>>
>>
>>
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>>
>
>
>
> --
> T +61 (02) 9717 9907
> F +61 (02) 9717 3145
> M +61 (04) 0249 4148
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
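
Illustration of Nick's quoted proposal (no terminator may appear inside a
string delimited by that terminator, so a different delimiter is chosen).
This is only a minimal Python sketch. The name choose_delimiter, the order
in which delimiters are tried, the rule that single-character quotes stay on
one line, and the rejection of a value ending in the quote character of a
triple delimiter are all assumptions made for illustration; none of them
comes from a CIF2 draft.

```python
from typing import Optional

# Inline delimiters from the quoted list; the trial order is an assumption.
INLINE_DELIMITERS = ["'", '"', "'''", '"""']


def choose_delimiter(value: str) -> Optional[str]:
    """Return a delimiter that can enclose `value` with no eliding, or None."""
    for delim in INLINE_DELIMITERS:
        if delim in value:
            continue  # the terminator may not appear in its own string
        if len(delim) == 1 and "\n" in value:
            continue  # assumption: single-quoted strings stay on one line
        if len(delim) == 3 and value.endswith(delim[0]):
            continue  # assumption: a trailing quote would not be "separable"
        return delim
    # Last resort: the semicolon text field, closed by a newline then ';'.
    if "\n;" not in value:
        return ";"
    return None  # the rare case Nick would hand to a dictionary-level encoding


print(choose_delimiter("ab'cd"))    # double quote
print(choose_delimiter("ab'\"cd"))  # triple single quote
```

A return value of None corresponds to the case described in the quoted text
where ' " """ ''' and \n; all occur in the one string.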
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
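
Illustration of the terminator rule suggested at the top of this message
(a text field that begins with ;\ is closed only by a line-initial ;
followed by a blank, newline or tab). This is a minimal Python sketch under
stated assumptions: the names folded_field_end and unfold are hypothetical,
end of input is also accepted after the closing ; as a convenience, and the
unfolding is simplified (it ignores trailing whitespace after a backslash).

```python
import re
from typing import Optional

# Closing ';' at the start of a line, followed by blank, tab or newline.
# Accepting end of input (\Z) as well is an assumption of this sketch.
_TERMINATOR = re.compile(r"\n;(?=[ \t\n]|\Z)")


def folded_field_end(text: str, start: int) -> Optional[int]:
    """`text[start]` is the ';' opening a text field.

    Returns the index of the newline that begins the terminator, or None.
    """
    match = _TERMINATOR.search(text, start + 1)
    return match.start() if match else None


def unfold(raw: str) -> str:
    """Simplified line unfolding of the raw field content."""
    lines = raw.split("\n")
    if lines[0] != "\\":
        return raw  # first line is not a lone backslash: not folded
    rest = lines[1:]
    pieces = []
    for i, line in enumerate(rest):
        if line.endswith("\\"):
            pieces.append(line[:-1])  # folded line: join with the next one
        else:
            pieces.append(line)
            if i < len(rest) - 1:
                pieces.append("\n")
    return "".join(pieces)


# The three-line case from the message: ;\  then  ;\  then  ;
text = ";\\\n;\\\n;\n"
end = folded_field_end(text, 0)
print(end, repr(unfold(text[1:end])))
```

In this sketch the embedded ;\ on the second line is skipped because the
character after the ; is a backslash, so the field runs to the final ; and
unfolds to a single ; as described in the message.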
- Follow-Ups:
- Re: [ddlm-group] Use of elides in strings (James Hester)
- References:
- Re: [ddlm-group] Use of elides in strings (James Hester)
- Re: [ddlm-group] Use of elides in strings (Nick Spadaccini)
- Re: [ddlm-group] Use of elides in strings (James Hester)