Discussion List Archives

RE: A formal specification for CIF version 1.1 (Draft)

  • Subject: RE: A formal specification for CIF version 1.1 (Draft)
  • From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
  • Date: Wed, 10 Jul 2002 19:15:55 +0100 (BST)

Brian McMahon [mailto:bm@iucr.org] wrote:

[...]

> One point should be made carefully: this specification is for an
> extended version of CIF, not yet formally adopted by COMCIFS. The only
> significant extensions to the existing standard are: relaxation of the
> line-length constraint from 80 to 2048 characters, and the introduction
> of matching square brackets as additional delimiters for string values
> containing white space.

I think there are quite a few other differences, and no small number
of them incompatibilities.  Many of the incompatibilities are corner
cases, but there are some more important ones.

Here are the differences I detected on my read through the syntax
description:

How about that the formally reserved but unused stop_ and save_
keywords are now used in CIF 1.1, albeit the latter only in
dictionaries.  And speaking of dictionaries, that they are now
written in CIF rather than in their own STAR dialect.  (Well,
really they're still a slightly different dialect in that only
they can have save frames, but the draft spec says they are CIFs.)

And what about data values beginning with a substring matching a
reserved word?  (Paragraph 10)  In CIF 1.0 it was reasonably clear
that something like this applied to data_ because such a construct
had its own semantics defined, but it was not clear that this was
a general restriction applied to all the reserved words.  Did I
just miss it somewhere, or is this one of those points of 1.0 that
is being clarified via the 1.1 spec?  If the latter, then let me
throw in that I don't like it.  I think that's because it is a
departure from the normal sense of the term "reserved word."  In any
case, it makes a parser that incremental bit trickier to write.
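
To make the parser concern concrete, here is a minimal Python sketch of
the prefix check such a rule would require for every unquoted value.
The reserved-word list and the case-insensitive comparison are
assumptions made for illustration, not wording taken from the draft, and
the function name is hypothetical.

```python
# Assumed set of reserved words; consult the draft for the authoritative list.
RESERVED_PREFIXES = ("data_", "loop_", "global_", "save_", "stop_")


def starts_with_reserved_word(unquoted_value):
    """Return True if an unquoted value begins with a reserved word.

    Case-insensitive matching is an assumption of this sketch.
    """
    # str.startswith accepts a tuple and tests each prefix in turn.
    return unquoted_value.lower().startswith(RESERVED_PREFIXES)
```

Under a prefix rule, a value such as loop_length would have to be
quoted, whereas under an exact-match reading of "reserved word" only a
bare loop_ would be affected.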

In paragraph 17: "The end-of-line associated with the closing semicolon
does not form part of the data value."  Is this another
change/clarification, or another published detail that had previously
escaped me?  I had thought that that last eol was part of the value.
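
The two readings can be compared with a small Python sketch of a
semicolon-delimited text field reader. It is an illustration only: the
function name is hypothetical, it assumes the input has already been
split into lines without terminators, and it ignores anything that
follows the closing semicolon on its line.

```python
def read_text_field(lines, start):
    """Read a semicolon-delimited text field beginning at lines[start].

    `lines` is a list of lines without their terminators.
    Returns (value, index_of_next_line).
    """
    if not lines[start].startswith(";"):
        raise ValueError("text field must open with ';' in column one")
    body = [lines[start][1:]]  # text after the opening semicolon
    i = start + 1
    while i < len(lines) and not lines[i].startswith(";"):
        body.append(lines[i])
        i += 1
    if i == len(lines):
        raise ValueError("unterminated text field")
    # Joining with "\n" places an eol between lines only, so the eol that
    # immediately precedes the closing semicolon is left out of the value.
    return "\n".join(body), i + 1
```

For the lines ";first line", "second line", ";" this returns the value
"first line\nsecond line". Under the other reading, the value would
carry one more "\n" at the end.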

In paragraphs 22 and 41: Exclusion of ASCII characters 11 and 12
decimal is a departure from and incompatibility with CIF 1.0.  Not
that I particularly object -- handling these appropriately is a pain.

In paragraph 29: the data name length restriction to 75 characters is
another incompatibility with CIF 1.0 (as revised) where the data name
length was restricted only indirectly by the line length restriction.
Thus in CIF 1.0 data names could be 80 characters long.

Paragraph 42 makes it optional to support line termination semantics
different from the host OS'.  That would be another departure
from CIF 1.0, I think, and, in my opinion, an all-around bad idea if
CIFs are supposed to be portable.  As far as I can tell, the pseudo-
production presented for <eol> is in fact the required implementation
for a fully-conformant CIF 1.0 parser.

Paragraph 43: In combination with the formal grammar presented earlier,
the definitions of the <eol> and <noteol> non-terminals in fact seem
to _preclude_ CIF parsers from handling non-native line termination
semantics.  Even if that's not a departure from CIF 1.0, it's still
a bad idea.
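
For comparison, accepting all three common line-termination conventions
takes very little code. The following Python sketch is one possible
approach, not the production from the draft; the function name is
hypothetical.

```python
import re

# Accept CRLF, lone LF, or lone CR as a single line terminator each.
# CRLF is listed first so that it is not counted as two terminators.
_EOL = re.compile(r"\r\n|\n|\r")


def split_cif_lines(text):
    """Split text into lines regardless of which platform wrote it."""
    lines = _EOL.split(text)
    # A terminator at the very end yields a trailing empty string; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Python's built-in str.splitlines is deliberately not used here, because
it also breaks lines on other characters, including vertical tab and
form feed (ASCII 11 and 12).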

According to paragraph 60, a file containing only whitespace and
comments but no data block is not a valid 1.1 CIF.  That is another
departure from CIF 1.0 if it is really the intent.  One of the ciftest
trip files actually tests this case, in fact.

Paragraph 61: this is another departure from CIF 1.0, which did allow
data blocks without data items.  Another of the ciftest trip files
tests this case.  (vcif evidently produces a warning, which seems
reasonable, but this is not an error.)
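
The difference between the two readings in the last two points can be
stated as a small Python sketch. The in-memory form (a dict mapping data
block names to dicts of data items), the function name, and the message
texts are all assumptions made for illustration.

```python
def structural_problems(blocks, draft_1_1=True):
    """List structural complaints for already-parsed CIF content.

    `blocks` maps data block names to dicts of data items. With
    draft_1_1=False the CIF 1.0 reading applies, under which neither
    condition below is an error.
    """
    problems = []
    if not draft_1_1:
        return problems
    if not blocks:
        problems.append("no data block in file")
    for name, items in blocks.items():
        if not items:
            problems.append(f"data block {name!r} has no data items")
    return problems
```

A validator in the style described for vcif could report the same
conditions as warnings rather than errors.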

Regards,

John Bollinger
jobollin@indiana.edu
