
Re: [ddlm-group] Summary of proposed CIF syntax changes

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Summary of proposed CIF syntax changes
From: Nick Spadaccini <nick@csse.uwa.edu.au>
Date: Wed, 09 Dec 2009 13:15:50 +0800
In-Reply-To: <279aad2a0912051813p27ea25cdre57fe284efc81358@mail.gmail.com>




On 6/12/09 10:13 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> On Sat, Dec 5, 2009 at 9:45 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>> Semicolon and triple-quote strings do not emphasize that they cannot
>> contain embedded close-quotes, as done for single quotes.
> 
> That is, they cannot contain embedded triple quotes/embedded
> newline-semicolons.

Correct.

The wording in the document (pdf) Brian posted for me makes this clear for
all delimited strings, since the first subsequent terminating character
sequence delimits the token. Hence after the initialising """, the next
instance of """ terminates the string, so by definition it cannot contain
embedded """. Same for all the other string types.
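
As a rough illustration of that rule (a sketch only, not text from the
specification): the function name read_delimited and its signature are
made up for this example, and it assumes the opening and closing
sequences are identical, as they are for """.

```python
def read_delimited(text, start, delimiter):
    """Return (content, next_index) for a string opened by `delimiter` at `start`.

    Hypothetical helper: the first subsequent occurrence of the delimiter
    ends the token, so the content can never contain the delimiter itself.
    """
    if not text.startswith(delimiter, start):
        raise ValueError(f"no opening delimiter at position {start}")
    body_start = start + len(delimiter)
    end = text.find(delimiter, body_start)  # first terminating sequence wins
    if end == -1:
        raise ValueError("unterminated string")
    return text[body_start:end], end + len(delimiter)


# The second """ closes the string; whatever follows is not part of it.
content, next_index = read_delimited('"""abc"""def"""', 0, '"""')
print(repr(content), next_index)
```

Under these assumptions the content is 'abc' and scanning resumes at
index 9, immediately after the closing """.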
 
>> In change 9, this sentence is hard to understand: "That does NOT require
>> that whitespace is necessary between the beginning of one token and the
>> beginning of the next token...". The main problem is that "token" is not
>> defined. In the example "[[1 2 3] [4 5 6]]", does each inner list count as
>> a token when parsing the outer list, and the initial '[' does not? Maybe
>> describe it as: whitespace is required between all values within a list
>> or table, but not between the values and the begin/end token.
>> 
>> Was it decided that "[[1 2 3][4 5 6]]" is not allowed?
> 
> Yes, we were looking for a concise expression that encompassed the following
> cases:
> 
> 1. [[1 2 3][4 5 6]] is allowed and is equivalent to [ [ 1 2 3 ] [ 4 5 6 ] ]
> 2. [abc[1 2 3]qef] is allowed and is equivalent to [ abc [ 1 2 3 ] qef ]
> 3. [ "abc""qef" ] is not allowed
> 
> Perhaps someone can suggest a better formulation?

The current construction defines data tokens and requires a separator
between the end of one data token and the beginning of the next. If one is to
build a parser that strictly adheres to the specification, then

[[1 2 3] [4 5 6]] = [ [ 1 2 3 ] [ 4 5 6 ] ] is allowed and [[1 2 3][4 5 6]]
is strictly illegal.

[abc [1 2 3] qef] = [ abc [ 1 2 3 ] qef ] is allowed and [abc[1 2 3]qef] is
strictly illegal.

["abc" "qef"] is allowed and [ "abc""qef" ] is strictly illegal.

This would be for the "pedantic" implementation of the specification.
However, given that we now accept that the terminating sequence is one (or
more) characters, irrespective of any whitespace that follows, an
implementation of the parser can be more liberal.

In [[1 2 3][4 5 6]] the first ] by definition terminates the inner list. It
SHOULD be followed by a separator but isn't. The next [ by definition
initiates a new list, hence the parser can choose to interpret this as
[[1 2 3] [4 5 6]]. An application should ALWAYS write a CIF according to the
specification, that is, output [[1 2 3] [4 5 6]] and never [[1 2 3][4 5 6]].
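
To make the strict/liberal distinction concrete, here is a minimal
sketch of a list reader with a switch between the two behaviours. It is
an illustration under simplifying assumptions, not an implementation of
the draft: the names parse_value, _parse and _parse_list are invented,
only lists, double-quoted strings and bare words are handled, bare
words are taken to end at whitespace or a bracket, and values are kept
as plain strings.

```python
def parse_value(text, strict=True):
    """Parse a single value occupying the whole of `text` (hypothetical helper)."""
    text = text.strip()
    if not text:
        raise ValueError("empty input")
    value, pos = _parse(text, 0, strict)
    if text[pos:].strip():
        raise ValueError(f"unexpected trailing characters at position {pos}")
    return value


def _parse(text, pos, strict):
    ch = text[pos]
    if ch == "[":
        return _parse_list(text, pos + 1, strict)
    if ch == '"':
        end = text.find('"', pos + 1)  # first closing quote terminates the string
        if end == -1:
            raise ValueError("unterminated string")
        return text[pos + 1:end], end + 1
    # Bare word (simplified): runs until whitespace or a bracket.
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in "[]":
        end += 1
    return text[pos:end], end


def _parse_list(text, pos, strict):
    items = []
    separated = True  # no separator is needed right after the opening '['
    while True:
        start = pos
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos > start:
            separated = True
        if pos >= len(text):
            raise ValueError("unterminated list")
        if text[pos] == "]":  # the closing ']' may directly follow a value
            return items, pos + 1
        if strict and not separated:
            raise ValueError(f"missing whitespace before token at position {pos}")
        item, pos = _parse(text, pos, strict)
        items.append(item)
        separated = False  # the next value must be preceded by whitespace


print(parse_value("[[1 2 3][4 5 6]]", strict=False))
print(parse_value("[abc[1 2 3]qef]", strict=False))
```

With strict=False both calls succeed and give the same result as the
whitespace-separated forms; with strict=True (the "pedantic" reading)
each of them raises ValueError, as does ["abc""qef"]. A writer would
only ever need to emit the separated forms.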

BTW James is next to me and happy with this interpretation. The same sort of
interpretation can be made for the other two examples.
 
>> It is not clear whether white space is allowed adjacent to the
>> associative colon.
> 
> The plan was to disallow it for simplicity, although the parsing would be
> unambiguous even if whitespace were present

Agreed.

>> Why does the associative index require quotes? Are there any
>> restrictions on the string index such as maximum length, or whether it
>> can contain multiple lines? Is matching case sensitive?
> 
> No restrictions on length, multiple lines possible, case sensitive matching.

Agreed, though I can't fathom why someone would want multiple lines in a hash
index.

> Requirement of quotes for simplicity - should we drop this?

And ease of parsing.
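
A small sketch of why the quoted index with no whitespace around the
colon is easy to parse. Everything here is an assumption made for
illustration: the function name read_table_key is invented, either ' or
" is accepted as the quote character, and the surrounding table
delimiters are not handled at all.

```python
def read_table_key(text, pos=0):
    """Read a quoted table index followed immediately by ':'.

    Hypothetical helper: returns (key, index_where_the_value_starts).
    """
    quote = text[pos:pos + 1]
    if quote not in ('"', "'"):
        raise ValueError("table index must be quoted")
    end = text.find(quote, pos + 1)  # first closing quote ends the index
    if end == -1:
        raise ValueError("unterminated table index")
    if text[end + 1:end + 2] != ":":
        raise ValueError("colon must directly follow the closing quote")
    value_start = end + 2
    if value_start >= len(text) or text[value_start].isspace():
        raise ValueError("value must directly follow the colon")
    return text[pos + 1:end], value_start


# Indexes are compared exactly, so "Abc" and "abc" are different entries,
# and an index may span several lines.
table = {}
for entry in ('"Abc":1', '"abc":2', '"two\nlines":3'):
    key, value_start = read_table_key(entry)
    table[key] = entry[value_start:]
print(table)
```

Because the index is always quoted, the reader only has to find the
closing quote and check the next character; storing the entries in an
ordinary dictionary gives case-sensitive matching for free.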
 
>> Also, the "smart quotes" in the PDF should be fixed to be normal ASCII.
>> 
>> 
>> Joe
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au




