Re: [ddlm-group] Space as a list item separator

This was an attempt to find a description that encompassed what we've already come up with
but extended the character set of non-delimited strings as far as possible.
Without going into the details of my conversation, one question concerned whether it is
really necessary to require that atom labels such as O1' now be quoted, e.g. "O1'",
given that such labels are not uncommon. I know these are matters we have all discussed and
agreed upon, which is why I hesitated to suggest that there might be another way of describing the syntax
that could reduce the changes required and still realize the goals of CIF2.

To this end I had been about to suggest that my model could be altered to
define a special type of delimiter that also acts as a control character, and can be nested,
i.e. the list tokens, which would bring the description closer to CIF2.
But I have a few more messages to get through yet before deciding whether to take this any further.

From: James Hester <jamesrhester@gmail.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Monday, 30 November, 2009 1:39:59
Subject: Re: [ddlm-group] Space as a list item separator

OK: so could you take us through the advantages of what you are suggesting compared to what we have come up with?  And perhaps why 'the man who writes the cheques' has nudged you in this direction?

I would make the following point: if we add to your list the condition that:

"strings which have no meaning beyond their significance as tokens are not required to be separated by whitespace from the preceding or succeeding strings"

we remove the requirement for whitespace around brackets, commas and 'loop_'.  Of course, insofar as strings neighbouring these will require whitespace around them, this does not spoil our grammar at all.  (Note that in lexing/parsing terms, the condition that "strings are only significant as tokens" is supposed to be equivalent to discarding the 'value' assigned to a token when it is returned by the lexing stage.) 
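As a rough illustration of the condition above (a sketch only, not an agreed CIF2 lexer): the snippet below treats brackets and commas as pure tokens whose value is discarded, so they need no surrounding whitespace. The names `LEX` and `lex`, and the choice to exclude `[]{},` from non-delimited strings, are assumptions made for the example; 'loop_' and delimited strings are left out for brevity.

```python
import re

# Pure tokens (punctuation) are recognised wherever they occur;
# everything else is a whitespace-separated non-delimited string.
LEX = re.compile(r"""
    (?P<punct>[\[\]{},])        # tokens with no value of their own
  | (?P<bare>[^\s\[\]{},]+)     # non-delimited string (assumed character set)
""", re.VERBOSE)

def lex(text):
    """Return (kind, value) pairs; the value of a pure token is discarded."""
    tokens = []
    for m in LEX.finditer(text):
        if m.group("punct"):
            tokens.append((m.group("punct"), None))
        else:
            tokens.append(("STRING", m.group("bare")))
    return tokens
```

With this sketch, `lex("[1,2]")` and `lex("[ 1 , 2 ]")` produce the same token stream, which is the sense in which the whitespace around such tokens becomes optional.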

The insight I'd draw out of this for our current discussion is that, by taking your manifesto plus my above condition, we have a general statement of what we would like the surface syntax of a CIF file to look like.  The only difference from our current discussion is that we have restricted the character sets of the non-delimited string and dataname tag more than strictly necessary - is there some part of that character-set discussion that you'd like to reopen...in a different thread?

On Sun, Nov 29, 2009 at 9:29 PM, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:
Yes, that summarizes the differences. Unfortunately, the single-byte non-delimited strings have to be separated by
white space in this approach, which is perhaps counter-intuitive and might have some legacy issues?

From: James Hester <jamesrhester@gmail.com>
To: Group finalising DDLm and associated dictionaries <ddlm-group@iucr.org>
Sent: Sunday, 29 November, 2009 3:45:18
Subject: Re: [ddlm-group] Space as a list item separator

Hi Simon: I'm trying to read between the lines here as to how the syntax we have been discussing diverges from what you have described, and have come up with the following list:

1. Presumably the []{} characters must be surrounded by whitespace in your version
2. We have restricted the character sets of the non-delimited strings and tags more than strictly necessary.
3. Comma might be included in the single-byte non-delimited string list

Are there any other differences that you would identify?

On Sat, Nov 28, 2009 at 10:58 PM, SIMON WESTRIP <simonwestrip@btinternet.com> wrote:
Dear all

I was chatting with the man who 'writes the cheques' yesterday about some of the
changes he might expect with CIF2, and based on this I feel I ought to at least have
a go at exploring a 'minimally disruptive' approach, so at the risk of being shouted at,
here goes at a slightly different way of looking at CIF:

CIF contains a list of strings separated by whitespace.

A string can be nondelimited or delimited.

Nondelimited strings have a restricted character set (at a minimum, whitespace is excluded)

A nondelimited string cannot start with any of the delimiters (obviously)

Nondelimited strings can have special meaning governing what follows them:

    reserved words, e.g. loop_

    tags, e.g. data_ , _foo

    single-byte nondelimited strings, e.g. [ ] { } :

All other strings are treated as raw data values

There, at least I can say I tried :-)
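The model above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the names `TOKEN`, `RESERVED`, `SINGLE` and `classify` are hypothetical, only 'loop_' is listed as a reserved word, and delimited strings are limited to single-line '...' and "..." forms whose closing quote is followed by whitespace or end of input.

```python
import re

RESERVED = {"loop_"}      # reserved words (only the one named above)
SINGLE = set("[]{}:")     # single-byte nondelimited strings

# A string is either delimited (quoted) or a run of non-whitespace.
TOKEN = re.compile(r"""
    '(?P<sq>.*?)'(?=\s|$)     # single-quoted delimited string
  | "(?P<dq>.*?)"(?=\s|$)     # double-quoted delimited string
  | (?P<bare>\S+)             # nondelimited string
""", re.VERBOSE)

def classify(text):
    """Split text into whitespace-separated strings and label each one."""
    out = []
    for m in TOKEN.finditer(text):
        s = m.group("bare")
        if s is None:
            # Delimited strings are always raw data values.
            value = m.group("sq") if m.group("sq") is not None else m.group("dq")
            out.append(("value", value))
        elif s in RESERVED:
            out.append(("reserved", s))
        elif s.startswith("_") or s.lower().startswith("data_"):
            out.append(("tag", s))
        elif s in SINGLE:
            out.append(("single", s))
        else:
            out.append(("value", s))
    return out
```

In this sketch an atom label such as O1' is accepted as a nondelimited value, because it does not start with a delimiter, while [ and ] are only recognised when surrounded by whitespace.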
