[ddlm-group] A final non-delimited string definition.

Until now, the definition of a non-delimited string was:

-----------------------
A data value in CIF2 may be a non-delimited string of UTF-8 characters, but
excluding the ASCII characters, : { } [ ].

As with CIF1, the first character of a non-delimited string cannot be any of
the ASCII characters, " ' _  $, since these have special meaning. A
non-delimited string cannot exactly match any STAR keyword, loop_ global_
save_* stop_ data_*, where * refers to zero or more characters.
-----------------------

We accept whitespace-delimited lists and tables, and if we restrict
ourselves to table indices being delimited strings, then we can re-admit
: { } [ and ] as allowed characters within a non-delimited string.

This will significantly reduce handling issues with legacy CIFs and the
need for remediation.

This means the "users" (and I never know who these people are, but they want
to be able to do everything) can have anything in a non-delimited string
except <whitespace>, or { [ " ' _ $ at the beginning, or } ] at the end, and
the string cannot exactly match a STAR keyword.
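
As a rough illustration of the rule just stated, here is a minimal Python
sketch of a validity check. The function name is_non_delimited and the
case-insensitive keyword comparison are assumptions made for the example,
not part of the proposal, and the sketch encodes only the restrictions
listed in this message (it does not consider comments or text fields).

```python
# Hypothetical helper: checks only the restrictions listed in this message.
STAR_EXACT = ("loop_", "global_", "stop_")   # keywords with no trailing part
STAR_PREFIX = ("save_", "data_")             # save_* and data_* (zero or more chars)

def is_non_delimited(value):
    """Return True if value is acceptable as a non-delimited string."""
    # Empty strings and strings containing whitespace must be delimited.
    if not value or any(ch.isspace() for ch in value):
        return False
    # Forbidden first characters: { [ " ' _ $
    if value[0] in "{[\"'_$":
        return False
    # Forbidden last characters: } ]
    if value[-1] in "}]":
        return False
    # Assumption: keywords are compared without regard to case.
    lowered = value.lower()
    if lowered in STAR_EXACT or lowered.startswith(STAR_PREFIX):
        return False
    return True
```

Under these assumptions a value such as a:b[1]{x}y passes, while [abc,
abc] and data_block do not.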

I actually don't think this is a good way to go, but there seems to be a
prevailing belief that users want all of this freedom, so I am happy for
parser developers and future dREL implementers to deal with it if the rest
of the group thinks it desirable.

On 8/02/10 12:57 PM, "Nick Spadaccini" <nick@csse.uwa.edu.au> wrote:

> This example is made more convoluted by including , in lists, where it has
> no meaning. If I recall correctly we agreed on space delimited list values.
> If we stick to that then the confusions below disappear. It also removes the
> dangling comma and double comma problem.
> 
> The parsing with a compound data type is exactly the same as outside a
> compound data type (hence you can build a simpler recursive descent parser).
> 
> The correct definition of the list below is
> 
> [1 #one
>  2 #two
>  3 #three
>  4 #four
> ]
> 
> Which is the list [1 2 3 4] with embedded comments #one #two #three and
> #four.
> 
> The list below is (I add the quotes for clarity)
> ["1,#one" "2," 3 ",4"] with the embedded comments #two #three and #four
> 
> 
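
The two readings above can be reproduced with a small Python sketch. It
assumes that list values are separated only by whitespace and that a #
starts a comment only when it begins a token; the function name list_items
is hypothetical. It takes the text between the brackets and does not
handle quoted strings or nested lists and tables.

```python
# Hypothetical sketch: split the body of a list (text between [ and ])
# into values, treating a token that starts with # as a comment that
# runs to the end of the line.
def list_items(body):
    items = []
    for line in body.splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # the rest of this line is a comment
            items.append(token)
    return items

space_delimited = "1 #one\n 2 #two\n 3 #three\n 4 #four\n"
comma_example = "1,#one\n 2, #two\n 3 #three\n ,4 #four\n"

print(list_items(space_delimited))  # ['1', '2', '3', '4']
print(list_items(comma_example))    # ['1,#one', '2,', '3', ',4']
```

In the second case the commas simply become part of the values, and #one
is absorbed into the first value because no whitespace precedes it.
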
> On 24/12/09 2:58 AM, "Joe Krahn" <krahn@niehs.nih.gov> wrote:
> 
>> James Hester wrote:
>>> I would answer as follows:
>> ...
>>>     2) What are the rules for comments within lists and tables?
>>> 
>>> I would treat them as whitespace
>> One detail is whether the "#" starting a comment requires preceding
>> whitespace. Herbert's example is:
>> 
>>     [1,#one
>>      2, #two
>>      3 #three
>>      ,4 #four
>>      ]
>> 
>> He suggests that preceding whitespace is there only if needed to
>> terminate the preceding token, and not a requirement in the actual
>> comment syntax. This looks OK in the above example, but may not be clear
>> after a quoted string:
>>     "string"#comment
>> 
>> Or, perhaps this is no less clear than the lack of whitespace in a list:
>>    ["string"]
>> 
>> 
>>>     4) Why require single or double quotes for table index strings, rather
>>>     than just follow the normal quoting rules?
>>> 
>>> 
>>> No good reason - so let's just follow the normal rules.
>>> 
>> Should quotes be required at all for the index string? Correct parsing
>> only requires quotes if the index string contains a colon. In the
>> current draft, that is imposed for all strings, not just table-index
>> strings. So, there is no need to mandate quotes here, unless the global
>> requirement to quote strings with : is dropped.
>> 
>> Maybe the intention was to disallow multi-line index strings?
>> 
>> 
>>>     Some of these are more technical details compared to the other issues.
>>>     These came up while I was working on a big CIF2 regular-expression,
>>>     where parsing details have to be considered more carefully.
>>> 
>>> Actually, it would be rather good if you could post these regular
>>> expressions once we have a final specification, as they are likely to be
>>> useful to a broad audience.
>> I plan to do that. To be fully functional, it has to be done in Perl
>> syntax, which has a feature that allows recursion for table and list
>> values. Despite that caveat, it will be useful even where the full
>> recursive expression will not work.
>> 
>> Joe
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> cheers
> 
> Nick
> 
> --------------------------------
> Associate Professor N. Spadaccini, PhD
> School of Computer Science & Software Engineering
> 
> The University of Western Australia    t: +61 (0)8 6488 3452
> 35 Stirling Highway                    f: +61 (0)8 6488 1089
> CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
> MBDP  M002
> 
> CRICOS Provider Code: 00126G
> 
> e: Nick.Spadaccini@uwa.edu.au
> 
> 
> 
> 
> _______________________________________________
> ddlm-group mailing list
> ddlm-group@iucr.org
> http://scripts.iucr.org/mailman/listinfo/ddlm-group

cheers

Nick
