Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- To: Group finalising DDLm and associated dictionaries <email@example.com>
- Subject: Re: [ddlm-group] THREAD 3: The alphabet of non-delimited strings.
- From: James Hester <firstname.lastname@example.org>
- Date: Fri, 16 Oct 2009 14:30:33 +0300
- In-Reply-To: <email@example.com>
- References: <C6FDA0C1.firstname.lastname@example.org><email@example.com>
Nick's latest draft looks promising. Remembering that standards involve being pedantic, note the following comments.

Nick writes:

    However an appropriate separator is required between tokens to unambiguously parse a CIF2 document. The appropriate separator is defined by the context in which it is used. For example at the highest-level, a whitespace serves this purpose. In a List object the ASCII , serves this purpose. In the Associative Array object, the separators are ASCII : and ASCII ,. The absence of a separator or use of the incorrect separator will give rise to ambiguity and possible error. The coercion rules for these cases need to be argued by the “community”.

As a result of the character set restrictions, the first line would more accurately read "However, an appropriate separator is *sometimes* required between tokens to unambiguously parse a CIF2 document.", and somewhere I would add something like: "For consistency, backwards compatibility and transferability into non-CIF applications, a separator must always appear between top-level tokens even when not strictly required in order to successfully scan the tokens".

Regarding coercion rules, I would like option (1) "give error message and die" to always be legal behaviour, and the only sanctioned behaviour for a validating parser. That said, discussion of recovery strategies as per option (2) is appropriate in notes on the standard, and somewhere it should be noted that these changes have made CIF2 somewhat more robust against certain types of file corruption. My reason for insisting on (1) being legal is that this should be sufficient to ensure that CIF2 writers always pad between tokens, and the only times that approaches in (2) will be required are when files have been corrupted.

I note also that STAR and CIF2 could diverge at this point without undue problems: e.g. STAR could adopt (2) and CIF2 could adopt (1), CIF2 could require whitespace, STAR could be less strict.
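To make option (1) concrete, here is a minimal sketch in Python of a strict top-level scan that gives an error and stops whenever two tokens are not separated by whitespace, even though they could be scanned apart. The token grammar is deliberately simplified and is an assumption for illustration only (it is not the CIF2 grammar), and the names `TOKEN`, `SeparatorError` and `scan_top_level` are hypothetical.

```python
import re

# Assumed, simplified token grammar (NOT the real CIF2 grammar):
# a quoted string with no embedded delimiter, or a bare word that
# contains no whitespace and no quote characters.
TOKEN = re.compile(r"'[^'\n]*'|\"[^\"\n]*\"|[^\s'\"]+")


class SeparatorError(ValueError):
    """Raised when the input cannot be accepted under option (1)."""


def scan_top_level(text):
    """Return the top-level tokens, insisting on whitespace between them."""
    tokens = []
    pos = 0
    n = len(text)
    while pos < n:
        if text[pos].isspace():
            pos += 1  # skip separator characters
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            # e.g. an opening quote with no matching close
            raise SeparatorError(f"cannot scan a token at offset {pos}")
        end = m.end()
        # The token itself scanned fine, but option (1) still requires
        # whitespace (or end of input) immediately after it.
        if end < n and not text[end].isspace():
            raise SeparatorError(
                f"missing whitespace after token ending at offset {end}"
            )
        tokens.append(m.group())
        pos = end
    return tokens
```

With this sketch, `scan_top_level("_tag 'a b' 12")` yields three tokens, while `'a''b'` and `_tag'a'` are rejected even though a more lenient scanner following option (2) could split them.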
Moving on to eliding apostrophes and quotes: I have yet to see the need for doing this at all, given that we will have triple-quoted and semicolon-delimited strings for the pathological cases of single-line strings which contain both quote and apostrophe characters. If we must have them, I agree with where Simon's original thinking was going, and with what Nick's latest email (as of this morning) mentioned. The source of the problem is that the elide character is overloaded: it fulfils a function on the lexical level and arbitrary functions at higher levels (IUCr, LaTeX, Unicode...). To simplify things, you need to decouple it as follows (as Nick wrote):

    For single quote strings:
    \'    ->  '   delivered to the application
    \\    ->  \   delivered to the application
    \\\'  ->  \'  delivered to the application
    \x for any other character  ->  \x

And why should the IUCr decide how to do things on this level - we give them a way to get an acute accent (or use Unicode).

But frankly, I fail to see the need for this eliding, as I failed to see the need for optional whitespace. Perhaps an example, however artificial, where only this eliding can produce the required string?

--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148
_______________________________________________
ddlm-group mailing list
firstname.lastname@example.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group
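For reference, the decoupled elide rules for single quote strings listed in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unelide_single_quoted` is hypothetical, and the input is assumed to be the string body with the enclosing apostrophes already removed by the lexer.

```python
def unelide_single_quoted(body):
    """Apply the lexical-level elide rules to a single-quoted string body.

    Only backslash-apostrophe and backslash-backslash are consumed here.
    Any other backslash sequence is passed through untouched, so that a
    higher level (IUCr, LaTeX, ...) can interpret it as it sees fit.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and body[i + 1] in ("'", "\\"):
            out.append(body[i + 1])  # drop the elide character, keep the target
            i += 2
        else:
            out.append(ch)  # ordinary character, or backslash before "other"
            i += 1
    return "".join(out)
```

The third rule needs no special case: a backslash pair is reduced to one backslash first, and the following backslash-apostrophe is then reduced to an apostrophe, which delivers a backslash followed by an apostrophe to the application.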