Re: [ddlm-group] Summary of proposed CIF syntax changes




On 6/12/09 10:13 AM, "James Hester" <jamesrhester@gmail.com> wrote:

> On Sat, Dec 5, 2009 at 9:45 AM, Joe Krahn <krahn@niehs.nih.gov> wrote:
>> Semicolon and triple-quote strings do not emphasize that they cannot
>> contain embedded close-quotes, as done for single quotes.
> 
> That is, they cannot contain embedded triple quotes/embedded
> newline-semicolons.

Correct.

The wording in the document (pdf) Brian posted for me makes this clear for
all delimited strings, since the first subsequent terminating character
sequence delimits the token. Hence after the initialising """, the next
instance of """ terminates the string, so by definition it cannot contain
embedded """. Same for all the other string types.
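
As a rough illustration of the "first subsequent terminator wins" rule, here
is a minimal Python sketch. The function name read_delimited is hypothetical,
and it assumes the opening and closing sequences are identical (true for """
but a simplification for semicolon text fields); it is not taken from the
specification document.

```python
def read_delimited(text, start, delimiter):
    """Return (content, index_after_close) for a string opened at `start`.

    Hypothetical helper: the string ends at the FIRST later occurrence of
    `delimiter`, so the content can never contain the delimiter itself.
    """
    if not text.startswith(delimiter, start):
        raise ValueError("no opening delimiter at this position")
    body_start = start + len(delimiter)
    close = text.find(delimiter, body_start)  # first subsequent terminator
    if close == -1:
        raise ValueError("unterminated string")
    return text[body_start:close], close + len(delimiter)
```

Given the input """abc"""def""" the function returns abc as the content and
leaves def""" to be handled as whatever follows the string.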
  
>> In change 9, this sentence is hard to understand: "That does NOT require
>> that whitespace is necessary between the beginning of one token and the
>> beginning of the next token...". The main problem is that "token" is not
>> defined. In the example "[[1 2 3] [4 5 6]]" does each inner list count as
>> a token when parsing the outer list, and the initial '[' does not? Maybe
>> describe it as: whitespace is required between all values within a list
>> or table, but not between the values and the begin/end token.
>> 
>> Was it decided that "[[1 2 3][4 5 6]]" is not allowed?
> 
> Yes, we were looking for a concise expression that encompassed the following
> cases:
> 
> 1. [[1 2 3][4 5 6]] is allowed and is equivalent to [ [ 1 2 3 ] [ 4 5 6 ] ]
> 2. [abc[1 2 3]qef] is allowed and is equivalent to [ abc [ 1 2 3 ] qef ]
> 3. [ "abc""qef" ] is not allowed
> 
> Perhaps someone can suggest a better formulation?

The current construction defines data tokens and requires a separator
between the end of one data token and the beginning of the next. If one is
to build a parser that strictly adheres to the specification, then

[[1 2 3] [4 5 6]] = [ [ 1 2 3 ] [ 4 5 6 ] ] is allowed and [[1 2 3][4 5 6]]
is strictly illegal.

[abc [1 2 3] qef] = [ abc [ 1 2 3 ] qef ] is allowed and [abc[1 2 3]qef] is
strictly illegal.

["abc" "qef"] is allowed and [ "abc""qef" ] is strictly illegal.

This would be the "pedantic" implementation of the specification. However,
given that we now accept that the terminating sequence is one (or more)
characters, regardless of any whitespace that follows, a parser
implementation can be more liberal.

In [[1 2 3][4 5 6]] the first ] terminates by definition the inner list. It
SHOULD have a separator but doesn't. The next [ initiates by definition a
new list, hence the parser can choose to interpret this as [[1 2 3] [4 5
6]]. An application should ALWAYS write a CIF according to the
specification, that is output [[1 2 3] [4 5 6]] and never [[1 2 3][4 5 6]].

BTW, James is next to me and is happy with this interpretation. The same
sort of interpretation can be made for the other two examples.
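
To make the pedantic/liberal distinction concrete, here is a minimal Python
sketch of a list reader with a strict switch, plus a writer that always emits
the separator. All names (parse_list, write_list, the strict flag) are
hypothetical, and the value grammar is deliberately reduced to bare words,
double-quoted strings and nested lists; it is an assumption-laden sketch, not
the grammar in the specification.

```python
def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def _parse_value(text, pos, strict):
    ch = text[pos]
    if ch == "[":
        return _parse_items(text, pos + 1, strict)
    if ch == '"':
        close = text.find('"', pos + 1)  # first later quote terminates
        if close == -1:
            raise ValueError("unterminated string")
        return text[pos + 1:close], close + 1
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] not in '[]"':
        end += 1
    if end == pos:
        raise ValueError("unexpected character at %d" % pos)
    return text[pos:end], end

def _parse_items(text, pos, strict):
    items = []
    while True:
        after_ws = _skip_ws(text, pos)
        if after_ws >= len(text):
            raise ValueError("unterminated list")
        if text[after_ws] == "]":
            return items, after_ws + 1
        # Pedantic mode: a separator is required between the end of one
        # value and the beginning of the next.
        if strict and items and after_ws == pos:
            raise ValueError("missing whitespace between values at %d" % pos)
        value, pos = _parse_value(text, after_ws, strict)
        items.append(value)

def parse_list(text, strict=True):
    pos = _skip_ws(text, 0)
    if pos >= len(text):
        raise ValueError("empty input")
    value, pos = _parse_value(text, pos, strict)
    if text[pos:].strip():
        raise ValueError("trailing characters after value")
    return value

def write_list(value):
    """Always write the separator, whatever the reader was willing to accept."""
    if isinstance(value, list):
        return "[" + " ".join(write_list(v) for v in value) + "]"
    if not value or any(c.isspace() or c in "[]" for c in value):
        return '"' + value + '"'
    return value
```

With strict=False the reader accepts [[1 2 3][4 5 6]], [abc[1 2 3]qef] and
[ "abc""qef" ]; with strict=True it rejects all three. In either mode,
write_list produces the form with separators, e.g. [[1 2 3] [4 5 6]].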
 
>> It is not clear whether white space is allowed adjacent to the
>> associative colon.
> 
> The plan was to disallow it for simplicity, although the parsing would be
> unambiguous even if whitespace were present

Agreed.

>> Why does the associative index require quotes? Are there any
>> restrictions on the string index such as maximum length, or whether it
>> can contain multiple lines? Is matching case sensitive?
> 
> No restrictions on length, multiple lines possible, case sensitive matching. 

Agreed, though I can't fathom why someone would want multiple lines in a hash
index.

> Requirement of quotes for simplicity - should we drop this?

And ease of parsing.
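
A minimal Python sketch of why the quoted index with no whitespace around the
colon is easy to parse. The function name read_table_key is hypothetical, and
only a plain double-quoted key is handled here (no triple-quoted or multi-line
keys); it is an illustration of the rules discussed above, not the
specification's grammar.

```python
def read_table_key(text, pos=0):
    """Return (key, index_of_value_start) for a quoted key followed by ':'.

    Hypothetical helper: the key must be quoted, the colon must follow the
    closing quote immediately, and the value must follow the colon
    immediately. Keys are returned as written, so matching is case sensitive.
    """
    if pos >= len(text) or text[pos] != '"':
        raise ValueError("table key must be a quoted string")
    close = text.find('"', pos + 1)
    if close == -1:
        raise ValueError("unterminated key")
    colon = close + 1
    if colon >= len(text) or text[colon] != ":":
        raise ValueError("expected ':' immediately after the closing quote")
    if colon + 1 >= len(text) or text[colon + 1].isspace():
        raise ValueError("expected a value immediately after ':'")
    return text[pos + 1:close], colon + 1
```

Because the key is always quoted, the reader never has to guess where an
unquoted key would end, and the colon check is a single character comparison.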
  
>> Also, the "smart quotes" in the PDF should be fixed to be normal ASCII.
>> 
>> 
>> Joe
>> _______________________________________________
>> ddlm-group mailing list
>> ddlm-group@iucr.org
>> http://scripts.iucr.org/mailman/listinfo/ddlm-group
> 
> 

cheers

Nick

--------------------------------
Associate Professor N. Spadaccini, PhD
School of Computer Science & Software Engineering

The University of Western Australia    t: +61 (0)8 6488 3452
35 Stirling Highway                    f: +61 (0)8 6488 1089
CRAWLEY, Perth,  WA  6009 AUSTRALIA   w3: www.csse.uwa.edu.au/~nick
MBDP  M002

CRICOS Provider Code: 00126G

e: Nick.Spadaccini@uwa.edu.au





