Re: [ddlm-group] CIF-2 changes
- To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] CIF-2 changes
- From: James Hester <jamesrhester@gmail.com>
- Date: Mon, 9 Nov 2009 22:30:45 +1100
- In-Reply-To: <C71DF323.12383%nick@csse.uwa.edu.au>
- References: <20091029225032.U8567@epsilon.pair.com><C71DF323.12383%nick@csse.uwa.edu.au>
I wrote:
> And the only CIF2 parsers that will fail when they see a square
> bracket in a dataname are those that are (incorrectly) prepared to
> accept no spaces between dataname and datavalue. So I repeat: the
> only reason we have moved away from square brackets as list delimiters
> is so that in the specific case that a space is missing between a
> dataname and a datavalue the parser can continue. I see no other
> justification.

Nick responded:
>Yes it is the reason. But short of re-visiting the long discussion on
>whitespace as token separators (as they usually are) versus whitespace being
>1 of the 2 token terminating characters, and the subsequent problem that
>there need to be two definitions for every type depending on their position
>in recursion, it is a necessary consequence.

I don't see any necessary consequence. We stopped using whitespace as one of two token terminating characters the moment we agreed that a closing quote/double quote finished a quote-delimited string regardless of the following character (and we have adopted the same philosophy for bracket-delimited values). Whitespace in CIF2 is purely a token separator, and remains so whether or not brackets are allowed inside datanames. I repeat, allowing brackets inside datanames will not change the grammar *at all*: it will simply mean two extra characters in the list of acceptable characters for a dataname. In particular, I see no relevance for recursive parsing or the need for two definitions for every type.

Take the following CIF fragment:

...
_foo[bar]_blahxyz [elephant, cow, orangutang, [xxx]]

A lexer will tokenize the first entry as 'dataname', with a value of '_foo[bar]_blahxyz', because it will continue eating characters until it gets to a disallowed character, or the token separator (whitespace). It then tokenises all whitespace the same way, by including all characters included in the definition of whitespace, and then tokenizes the single open square bracket.
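The tokenization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CIF2 grammar: the token names, the function `tokenize`, and the character classes (a dataname is '_' followed by any run of non-whitespace characters; a non-delimited value ends at whitespace, a comma, or a square bracket; commas are treated as separators) are all invented for this example.

```python
import re

# Hypothetical token classes for illustration only; not the CIF2 specification.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)               # whitespace: purely a token separator
  | (?P<DATANAME>_\S+)        # '_' then any non-whitespace run, brackets included
  | (?P<LIST_OPEN>\[)
  | (?P<LIST_CLOSE>\])
  | (?P<COMMA>,)
  | (?P<VALUE>[^\s,\[\]]+)    # non-delimited value
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs, dropping whitespace and commas."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        if kind not in ("WS", "COMMA"):
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


fragment = "_foo[bar]_blahxyz [elephant, cow, orangutang, [xxx]]"
for kind, lexeme in tokenize(fragment):
    print(kind, lexeme)
```

Under these assumptions the dataname token runs to the first whitespace, so the brackets inside it are never seen as list delimiters, and the list that follows is tokenized exactly as it would be after a bracket-free dataname.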
In what way has having an open square bracket inside the dataname complicated the parse? Would this be simpler without square brackets in the list of allowed characters for a dataname? Note that the parse is identical no matter what type of brackets are used to start the list, so why use braces anyway?

Put another way, we are in the nice position that following a whitespace we can almost always predict the token based purely on the first character.

If '_', then it is a dataname
If <quote> or <double quote>, it is a datavalue
If alphanumeric, then it is a non-delimited datavalue, unless the first characters are 'loop_' or 'data_'
If <open bracket>, then it is a list

This is true whether or not brackets of any sort are included in the allowed character set for a dataname. If you disagree with this, I would like to see an example of where having brackets in a dataname complicates the grammar compared to not having them.

On Mon, Nov 9, 2009 at 7:26 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
> As I said in my previous email. The gain is that you can determine where you
> are at a lexical level without having to go further in to the parsing. There
> is a reason why languages use [] and {} separately, and that ease.
>
> If computer scientists have learnt one thing in the last 50 years, it is how
> to design and specify languages so that you avoid ambiguity and complexity.

Agreed that a different type of bracket for tables is preferable.

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
- Follow-Ups:
- Re: [ddlm-group] CIF-2 changes (Nick Spadaccini)
- References:
- Re: [ddlm-group] CIF-2 changes (Herbert J. Bernstein)
- Re: [ddlm-group] CIF-2 changes (Nick Spadaccini)