Re: [ddlm-group] A final non-delimited string definition.
- To: Nick.Spadaccini@uwa.edu.au, Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
- Subject: Re: [ddlm-group] A final non-delimited string definition.
- From: James Hester <jamesrhester@gmail.com>
- Date: Mon, 8 Feb 2010 16:36:14 +1100
- In-Reply-To: <C795C1FC.12C51%nick@csse.uwa.edu.au>
- References: <C795BACB.12C49%nick@csse.uwa.edu.au><C795C1FC.12C51%nick@csse.uwa.edu.au>
James.
On Mon, Feb 8, 2010 at 4:28 PM, Nick Spadaccini <nick@csse.uwa.edu.au> wrote:
Until now the definition of a non-delimited string was:
-----------------------
A data value in CIF2 may be a non-delimited string of UTF-8 characters, but
excluding the ASCII characters, : { } [ ].
As with CIF1, the first character of a non-delimited string cannot be any of
the ASCII characters, " ' _ $, since these have special meaning. A
non-delimited string cannot exactly match any STAR keyword, loop_ global_
save_* stop_ data_*, where * refers to zero or more characters.
-----------------------
We accept whitespace-delimited lists and tables, so if we also restrict
table indices to delimited strings, we can re-inject : { } [ and ] as
allowed characters within a non-delimited string.
This will significantly reduce handling issues with legacy CIFs and the
need for remediation.
This means the "users" (and I never know who these people are, but they want
to be able to do everything) can have anything in a non-delimited string
except <whitespace>, any of { [ " ' _ $ at the beginning, or } ] at the end,
and the string can't exactly match a STAR keyword.
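The rule as stated above can be sketched as a small Python check. This is only an illustration of the proposal, not part of any specification: the function name `is_nondelimited` is hypothetical, keyword matching is assumed to be case-insensitive, and an empty string is assumed to be invalid, neither of which is stated in this thread.

```python
import re

# STAR keywords from the definition; save_* and data_* accept any suffix.
# Case-insensitive matching is an assumption, not stated in the thread.
_KEYWORD = re.compile(r"(?:loop_|global_|stop_|save_.*|data_.*)", re.IGNORECASE)

FORBIDDEN_FIRST = set("{[\"'_$")  # may not start a non-delimited string
FORBIDDEN_LAST = set("}]")        # may not end a non-delimited string


def is_nondelimited(value: str) -> bool:
    """Return True if value fits the proposed non-delimited string rule."""
    if not value:
        return False  # assumption: an empty token is not a value
    if any(ch.isspace() for ch in value):
        return False  # whitespace terminates a token, so it cannot appear
    if value[0] in FORBIDDEN_FIRST or value[-1] in FORBIDDEN_LAST:
        return False
    # Only an exact keyword match is excluded; "loop_x" is still allowed.
    return _KEYWORD.fullmatch(value) is None
```

Under these assumptions `is_nondelimited("a:b[1]c")` is accepted, since : [ and ] are only restricted at the ends, while `is_nondelimited("abc]")` and `is_nondelimited("data_foo")` are rejected.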
I actually don't think this is a good way to go, but there seems to be a
widespread belief that users want all of this freedom, so I am happy for
parser developers and future dREL implementers to deal with it if the rest
of the group thinks it desirable.
On 8/02/10 12:57 PM, "Nick Spadaccini" <nick@csse.uwa.edu.au> wrote:
> This example is made more convoluted by including commas in lists, where
> they have no meaning. If I recall correctly we agreed on space-delimited
> list values. If we stick to that then the confusions below disappear. It
> also removes the dangling comma and double comma problem.
>
> The parsing within a compound data type is exactly the same as outside a
> compound data type (hence you can build a simpler recursive descent parser).
>
> The correct definition of the list below is
>
> [1 #one
> 2 #two
> 3 #three
> 4 #four
> ]
>
> Which is the list [1 2 3 4] with embedded comments #one #two #three and
> #four.
>
> The list below is (I add the quotes for clarity)
> ["1,#one" "2," 3 ",4"] with the embedded comments #two #three and #four
>
>
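The space-delimited reading described here can be sketched as a small recursive descent parser in Python. It is a minimal illustration under stated assumptions, not a CIF2 implementation: `parse_list` is a hypothetical name, `#` is assumed to start a comment only at the start of a token, `]` is assumed to always end a token (the earlier definition, in which [ and ] are excluded from non-delimited strings), and quoted strings are not handled.

```python
def parse_list(text: str):
    """Parse one whitespace-delimited list; return (values, comments)."""
    comments = []
    pos = 0

    def skip_blank_and_comments():
        nonlocal pos
        while pos < len(text):
            if text[pos].isspace():
                pos += 1
            elif text[pos] == "#":
                # A comment runs to the end of the line and acts as whitespace.
                end = text.find("\n", pos)
                end = len(text) if end == -1 else end
                comments.append(text[pos:end])
                pos = end
            else:
                break

    def parse_value():
        nonlocal pos
        if text[pos] == "[":
            pos += 1
            items = []
            while True:
                skip_blank_and_comments()
                if pos >= len(text):
                    raise ValueError("unterminated list")
                if text[pos] == "]":
                    pos += 1
                    return items
                # Same rule inside and outside a list: recurse for each value.
                items.append(parse_value())
        # Non-delimited token: runs until whitespace or a closing bracket.
        start = pos
        while pos < len(text) and not text[pos].isspace() and text[pos] != "]":
            pos += 1
        return text[start:pos]

    skip_blank_and_comments()
    if pos >= len(text) or text[pos] != "[":
        raise ValueError("expected '['")
    return parse_value(), comments


clean = "[1 #one\n2 #two\n3 #three\n4 #four\n]"
with_commas = "[1,#one\n2, #two\n3 #three\n,4 #four\n]"
print(parse_list(clean))
print(parse_list(with_commas))
```

With the comma-free input the values are 1 2 3 4 and all four comments are recovered. With the commas, they become part of the values, and `#one` is absorbed into the first value because no whitespace precedes it.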
> On 24/12/09 2:58 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
>
>> James Hester wrote:
>>> I would answer as follows:
>> ...
>>> 2) What are the rules for comments within lists and tables?
>>>
>>> I would treat them as whitespace
>> One detail is whether the "#" starting a comment requires preceding
>> whitespace. Herbert's example is:
>>
>> [1,#one
>> 2, #two
>> 3 #three
>> ,4 #four
>> ]
>>
>> He suggests that preceding whitespace is there only if needed to
>> terminate the preceding token, and not a requirement in the actual
>> comment syntax. This looks OK in the above example, but may not be clear
>> after a quoted string:
>> "string"#comment
>>
>> Or, perhaps this is no less clear than the lack of whitespace in a list:
>> ["string"]
>>
>>
>>> 4) Why require single or double quotes for table index strings, rather
>>> than just follow the normal quoting rules?
>>>
>>>
>>> No good reason - so let's just follow the normal rules.
>>>
>> Should quotes be required at all for the index string? Correct parsing
>> only requires quotes if the index string contains a colon. In the
>> current draft, that is imposed for all strings, not just table-index
>> strings. So, there is no need to mandate quotes here, unless the global
>> requirement to quote strings with : is dropped.
>>
>> Maybe the intention was to disallow multi-line index strings?
>>
>>
>>> Some of these are more technical details compared to the other issues.
>>> These came up while I was working on a big CIF2 regular-expression,
>>> where parsing details have to be considered more carefully.
>>>
>>> Actually, it would be rather good if you could post these regular
>>> expressions once we have a final specification, as they are likely to be
>>> useful to a broad audience.
>> I plan to do that. To be fully functional, it has to be done in Perl
>> syntax, which has a feature that allows recursion for table and list
>> values. Despite that caveat, it will be useful even where the full
>> recursive expression will not work.
>>
>> Joe
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
>
> cheers
>
> Nick
>
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
>
> The University of Western Australia t: +61 (0)8 6488 3452
> 35 Stirling Highway f: +61 (0)8 6488 1089
> CRAWLEY, Perth, WA 6009 AUSTRALIA w3: www.csse.uwa.edu.au/~nick
> MBDP M002
>
> CRICOS Provider Code: 00126G
>
> e: Nick.Spadaccini@uwa.edu.au
cheers
Nick
--
T +61 (02) 9717 9907
F +61 (02) 9717 3145
M +61 (04) 0249 4148