
[ddlm-group] A final non-delimited string definition.

Until now the definition of a non-delimited string was:

-----------------------
A data value in CIF2 may be a non-delimited string of UTF-8 characters, but
excluding the ASCII characters, : { } [ ].

As with CIF1, the first character of a non-delimited string cannot be any of
the ASCII characters, " ' _  $, since these have special meaning. A
non-delimited string cannot exactly match any STAR keyword, loop_ global_
save_* stop_ data_*, where * refers to zero or more characters.
-----------------------
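
For concreteness, the definition above can be sketched as a small Python
check. This is only an illustration, not part of any specification: the
function name is_nondelimited_old is hypothetical, and the rejection of empty
strings and embedded whitespace, as well as case-insensitive keyword matching,
are assumptions that the quoted text does not state.

```python
import re

# STAR keywords from the definition; save_* and data_* accept any suffix.
# Case-insensitive matching is an assumption.
_KEYWORD = re.compile(r"loop_|global_|stop_|save_.*|data_.*", re.IGNORECASE)


def is_nondelimited_old(s):
    """Check s against the earlier (more restrictive) definition."""
    # Assumption: an empty value or one containing whitespace is never valid.
    if not s or any(ch.isspace() for ch in s):
        return False
    # : { } [ ] are excluded anywhere in the string.
    if any(ch in ":{}[]" for ch in s):
        return False
    # " ' _ $ are excluded as the first character only.
    if s[0] in "\"'_$":
        return False
    # The whole string must not be a STAR keyword.
    return _KEYWORD.fullmatch(s) is None
```

Under this sketch a value such as a:b is rejected, which is the kind of
legacy-CIF content the change proposed here is meant to accommodate.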

We accept whitespace-delimited lists and tables, and if we restrict
ourselves to table indices being delimited strings then we can re-inject
: { } [ and ] as allowed characters within a non-delimited string.

This will significantly minimise handling issues with legacy CIFs and the
need for remediation.
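
As a rough illustration of the whitespace-delimited reading, the Python sketch
below splits the body of a flat list into values and comments. It assumes the
enclosing [ and ] have already been removed, that there is no nesting and no
quoted strings, and that a # starts a comment only when it begins a
whitespace-delimited token. The name split_list_body is hypothetical.

```python
import re


def split_list_body(body):
    """Split a flat list body into (values, comments)."""
    values, comments = [], []
    for line in body.splitlines():
        for m in re.finditer(r"\S+", line):
            if m.group().startswith("#"):
                # The comment runs from this token to the end of the line.
                comments.append(line[m.start():].rstrip())
                break
            values.append(m.group())
    return values, comments
```

Applied to the two list bodies discussed in the quoted message below, this
gives the values 1 2 3 4 for the space-delimited form, and the values
1,#one  2,  3  ,4 for the form written with commas.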

This means the "users" (and I never know who these people are, but they want
to be able to do everything) can have anything in a non-delimited string
except <whitespace> anywhere, { [ " ' _ $ at the beginning, or } ] at the
end, provided the string does not exactly match a STAR keyword.
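
A minimal Python sketch of the relaxed rule follows. It is illustrative only:
the function name is_nondelimited_new is hypothetical, and the rejection of
empty strings and the case-insensitive keyword match are assumptions not
stated in this message.

```python
import re

# STAR keywords; save_* and data_* accept any suffix.
# Case-insensitive matching is an assumption.
_KEYWORD = re.compile(r"loop_|global_|stop_|save_.*|data_.*", re.IGNORECASE)


def is_nondelimited_new(s):
    """Check s against the relaxed definition proposed in this message."""
    # Whitespace is excluded anywhere (assumption: empty is also invalid).
    if not s or any(ch.isspace() for ch in s):
        return False
    # { [ " ' _ $ are excluded as the first character.
    if s[0] in "{[\"'_$":
        return False
    # } ] are excluded as the last character.
    if s[-1] in "}]":
        return False
    # The whole string must not be a STAR keyword.
    return _KEYWORD.fullmatch(s) is None
```

With this rule a:b and x[1]y are accepted, while [1 and a[1] are still
rejected because of their first or last character.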

I actually don't think this is a good way to go, but there seems to be a
prevailing belief that users want all of this freedom, so I am happy for
parser developers and future dREL implementers to deal with it if the rest
of the group thinks it desirable.

On 8/02/10 12:57 PM, "Nick Spadaccini" <nick@csse.uwa.edu.au> wrote:

> This example is made more convoluted by including commas in lists, where
> they have no meaning. If I recall correctly we agreed on space delimited
> list values.
> If we stick to that then the confusions below disappear. It also removes the
> dangling comma and double comma problem.
> 
> The parsing within a compound data type is exactly the same as outside a
> compound data type (hence you can build a simpler recursive descent parser).
> 
> The correct definition of the list below is
> 
> [1 #one
>  2 #two
>  3 #three
>  4 #four
> ]
> 
> Which is the list [1 2 3 4] with embedded comments #one #two #three and
> #four.
> 
> The list below is (I add the quotes for clarity)
> ["1,#one" "2," 3 ",4"] with the embedded comments #two #three and #four
> 
> 
> On 24/12/09 2:58 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
> 
>> James Hester wrote:
>>> I would answer as follows:
>> ...
>>>     2) What are the rules for comments within lists and tables?
>>> 
>>> I would treat them as whitespace
>> One detail is whether the "#" starting a comment requires preceding
>> whitespace. Herbert's example is:
>> 
>>     [1,#one
>>      2, #two
>>      3 #three
>>      ,4 #four
>>      ]
>> 
>> He suggests that preceding whitespace is there only if needed to
>> terminate the preceding token, and not a requirement in the actual
>> comment syntax. This looks OK in the above example, but may not be clear
>> after a quoted string:
>>     "string"#comment
>> 
>> Or, perhaps this is no less clear than the lack of whitespace in a list:
>>    ["string"]
>> 
>> 
>>>     4) Why require single or double quotes for table index strings, rather
>>>     than just follow the normal quoting rules?
>>> 
>>> 
>>> No good reason - so let's just follow the normal rules.
>>> 
>> Should quotes be required at all for the index string? Correct parsing
>> only requires quotes if the index string contains a colon. In the
>> current draft, that is imposed for all strings, not just table-index
>> strings. So, there is no need to mandate quotes here, unless the global
>> requirement to quote strings with : is dropped.
>> 
>> Maybe the intention was to disallow multi-line index strings?
>> 
>> 
>>>     Some of these are more technical details compared to the other issues.
>>>     These came up while I was working on a big CIF2 regular-expression,
>>>     where parsing details have to be considered more carefully.
>>> 
>>> Actually, it would be rather good if you could post these regular
>>> expressions once we have a final specification, as they are likely to be
>>> useful to a broad audience.
>> I plan to do that. To be fully functional, it has to be done in Perl
>> syntax, which has a feature that allows recursion for table and list
>> values. Despite that caveat, it will be useful even where the full
>> recursive expression will not work.
>> 
>> Joe
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> cheers
> 
> Nick
> 
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
> 
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
> 
> CRICOS Provider Code: 00126G
> 
> e: Nick.Spadaccini@uwa.edu.au

cheers

Nick
