Re: [ddlm-group] Revised version of syntax change summary document

To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Subject: Re: [ddlm-group] Revised version of syntax change summary document
From: Brian McMahon <bm@iucr.org>
Date: Wed, 9 Dec 2009 14:03:56 +0000
In-Reply-To: <20091209100252.GA6642@emerald.iucr.org>
References: <20091209100252.GA6642@emerald.iucr.org>

A few comments on the latest version of the CIF2 syntax changes
summary document.

I'm glad to see the explanation of tokens and separators. I was going
to ask for something of the sort. The visual aid is quite a good way
of doing this - and it does emphasise that the word "token" is a
rather dangerous (i.e. potentially ambiguous) one, since it can
apply promiscuously to a complete list or to lists contained in lists
or - isn't that so? - to the individual elements within a list.

For the target audience of this document, this level of ambiguity,
normally resolved by context, is probably OK, but we should be very
careful in drafting the final complete specification document.

In a similar vein, a complete specification should probably define
very carefully what is meant by phrases such as "lexical characters".
Again, I don't think that degree of pedantry is necessary for the
purposes of getting this out to the developer community.

A few more specific points.

1. Permitted character set (under "Terminology" and/or "Encoding").
CIF 1.1 explicitly EXCLUDES some of the characters in the ASCII set,
usually thought of as 'control characters'. Specifically, the excluded
characters are (decimal values) 00-08, 11, 12, 14-31 and 127. Should
this be restated explicitly in this document?

[Possibly relevant: what are the "additional 20 UNICODE characters
that constitute whitespace" mentioned in the "Terminology" section?]
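
For concreteness, here is a minimal Python sketch of the CIF 1.1
exclusion rule exactly as stated above. The names EXCLUDED and
excluded_positions are purely illustrative, and the sketch covers only
the ASCII control ranges; it says nothing about which Unicode
characters CIF2 might additionally exclude.

```python
# Code points excluded by CIF 1.1, as listed above (decimal values):
# 0-8, 11, 12, 14-31 and 127. TAB (9), LF (10) and CR (13) stay allowed.
EXCLUDED = frozenset(list(range(0, 9)) + [11, 12] + list(range(14, 32)) + [127])


def excluded_positions(text: str) -> list[tuple[int, int]]:
    """Return (index, code point) for every character CIF 1.1 excludes."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in EXCLUDED]


if __name__ == "__main__":
    # A stray form feed (decimal 12) after the value is flagged.
    sample = "data_test\n_cell_length_a 5.4\x0c\n"
    print(excluded_positions(sample))  # prints [(28, 12)]
```
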

2. Encoding.
"UTF-8 directly supports an extensive range of printable objects that
are not accessible through ASCII." Not strictly true: acceptance of a
\uNNNN encoding would give you access to all of these using the ASCII
character set. Just drop this sentence. I suggest dropping the next
one as well. We haven't yet revisited my suggestion that the IUCr markup
conventions be disallowed in CIF 2 - which, of course, isn't a
syntactic issue at this level of discourse.
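
To illustrate the point, a Python sketch of such an escape convention,
assuming a hypothetical rule that a backslash, a lowercase 'u' and
exactly four hex digits denote one code point in the Basic
Multilingual Plane. Neither CIF 1.1 nor the draft defines this rule,
and the function names are made up for the example.

```python
import re

# Hypothetical convention (not part of CIF 1.1 or the draft): backslash,
# 'u', then exactly four hex digits name one BMP code point.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(ascii_text: str) -> str:
    """Replace each \\uNNNN escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), ascii_text)


def encode_u_escapes(text: str) -> str:
    """Rewrite non-ASCII BMP characters as \\uNNNN, giving pure ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        else:
            raise ValueError("code points above U+FFFF need a longer escape")
    return "".join(out)
```

A real scheme would also need a way to escape a literal backslash
followed by 'u'; otherwise text that already contains such a sequence
would not survive a round trip.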

3. Character set for data names.
The draft states "A data name ... may be followed by any number of
characters", but there is currently an implementation limit of 74
characters (plus the initial underscore). I don't recall our
specifically discussing a proposal to change that.

[Typo in the "Reasoning" paragraph - should be "they ARE excluded"]
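
If the limit were retained, a validator check might look something
like this Python sketch. MAX_NAME_LENGTH reflects the current
implementation limit (74 characters after the underscore, 75 in all);
making it a parameter is only my assumption about how one might leave
room for CIF2 to relax or drop it, and data_name_problems is a
hypothetical name.

```python
# Assumed limit: the current implementation limit of 74 characters after
# the leading underscore, i.e. 75 characters in total.
MAX_NAME_LENGTH = 75


def data_name_problems(name: str, max_length: int = MAX_NAME_LENGTH) -> list[str]:
    """Return a list of problems with a data name (empty if none found).

    Only the leading underscore and the length are checked here; the
    character-set rules for data names are deliberately left out.
    """
    problems = []
    if not name.startswith("_"):
        problems.append("does not begin with an underscore")
    if len(name) > max_length:
        problems.append(f"{len(name)} characters exceeds the limit of {max_length}")
    return problems
```
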

4. Delimited strings. The descriptions of single- and double-quote
delimited strings use the term "newline character" - would be better
as "newline sequence" as used elsewhere.
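
The distinction matters because CR LF is two characters but a single
line terminator. A Python sketch, assuming that a newline sequence
means CR LF, a lone LF or a lone CR (my reading of the draft; the
names are illustrative):

```python
import re

# Assumption: a "newline sequence" is CR LF, a lone LF or a lone CR.
# CR LF is listed first so that it is consumed as one terminator.
NEWLINE_SEQUENCE = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list[str]:
    """Split text into lines on newline sequences only."""
    return NEWLINE_SEQUENCE.split(text)


def valid_quoted_content(content: str) -> bool:
    """A single- or double-quoted value must not contain a newline sequence."""
    return NEWLINE_SEQUENCE.search(content) is None
```

I avoided str.splitlines() because it also breaks on form feed,
vertical tab and several Unicode separators, which is not what the
draft describes.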

5. List and Table data types. The phrase "In the context of being
outside of data tokens" is cumbersome, and I'm not sure I understand
how to parse it in an English grammatical sense. Would these
descriptions read better (and still be correct) if rephrased as:

A data value of type list is initiated by ... and terminated by ...
A data value of type table is initiated by ... and terminated by ...

Perhaps a simple example would also be useful, given that these
introduce the most disruptive syntax change, e.g.:

loop_
    _colour_name
    _colour_value_rgb
                         red    [1 0 0]
                         green  [0 1 0]
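
A minimal Python sketch of what this means for a tokenizer reading the
loop above: the bracketed group has to come back as one value, so each
row yields two values rather than four. Assumptions: elements are
unquoted and whitespace-separated, tables are not handled, and
parse_values is a hypothetical name. Nesting also shows the "token"
ambiguity mentioned earlier, since a whole list, an inner list and a
single element all come out of the same routine.

```python
import re

# Assumptions: list elements are unquoted, whitespace-separated and contain
# no brackets; tables ({...}) and quoted strings are not handled.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def parse_values(text: str) -> list:
    """Split text into top-level values; a bracketed list is one value.

    Lists are returned as (possibly nested) Python lists of strings.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_value():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "]":
            raise ValueError("unexpected ']'")
        if tok != "[":
            return tok  # a plain element such as 'red' or '1'
        items = []
        while True:
            if pos >= len(tokens):
                raise ValueError("unterminated list")
            if tokens[pos] == "]":
                pos += 1
                return items
            items.append(parse_value())

    values = []
    while pos < len(tokens):
        values.append(parse_value())
    return values
```

For the first row above, parse_values("red    [1 0 0]") gives
['red', ['1', '0', '0']], and a nested value such as [[1 0] [0 1]]
comes back as a single list of two lists.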

[In Change 8 there is a typo: "curly braces brackets" is redundant.]

Best wishes
Brian

On Wed, Dec 09, 2009 at 10:02:52AM +0000, Brian McMahon wrote:
> At Nick's request I have posted an updated version of the syntax
> change document which should clarify a few things in light of the
> most recent discussion. This is available at the URL
> 
> http://www.iucr.org/__data/assets/pdf_file/0017/27224/syntaxchangesproposed20091209.pdf
> 
> (Nick: perhaps an internal identifier - a date would do - would help to
> differentiate future versions if one prints them out and sets them side
> by side on one's desk?)
> 
> Cheers
> Brian
_______________________________________________
ddlm-group mailing list
ddlm-group@iucr.org
http://scripts.iucr.org/mailman/listinfo/ddlm-group