Discussion List Archives


Problems with CIF BNF

  • To: "Discussion list of the IUCr Committee for the Maintenance of the CIF Standard (COMCIFS)" <comcifs@iucr.org>
  • Subject: Problems with CIF BNF
  • From: Joe Krahn <krahn@niehs.nih.gov>
  • Date: Mon, 12 Mar 2007 13:44:42 -0400
Some parts of CIF are vague. I hoped that the BNF syntax would be a
precise specification, but it has problems. Because the BNF is central
to properly defining the CIF format, it should be exact.

First, there are some plain syntax errors, such as unbalanced braces in
the production of <Float> and an empty token in the <TokenizedComments>
production.
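Unbalanced braces are easy to catch mechanically. The following is a minimal Python sketch, not part of any existing CIF tooling; the helper name `braces_balanced` is hypothetical, and it assumes every quoted character is written as a single-quoted literal such as 'e', so braces inside quotes are ignored (a quoted apostrophe would confuse it).

```python
def braces_balanced(production: str) -> bool:
    """Return True if '{' and '}' are balanced outside single-quoted literals.

    Hypothetical helper: it assumes quoted characters look like 'x' and
    does not handle a quoted apostrophe.
    """
    depth = 0
    in_quote = False
    for ch in production:
        if ch == "'":
            in_quote = not in_quote      # enter or leave a quoted literal
        elif not in_quote:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth < 0:            # a close with no matching open
                    return False
    return depth == 0 and not in_quote


print(braces_balanced("{ {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>"))  # True
print(braces_balanced("{ {'e' | 'E' } <UnsignedInteger>"))                                 # False
```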

There are also a few hacks, such as <noteol>, and there are no rules
for the content of quoted strings. I also think it is a hack to define
a production unit with two elements, such as "<eol><UnquotedString>".
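To illustrate what a rule for quoted-string content would have to express, here is a minimal Python sketch. It assumes the convention described in the CIF 1.1 prose rather than in the BNF: a quote character closes the string only when it is followed by whitespace or the end of the line, so a value such as 'a dog's life' is legal. The function `read_quoted` is hypothetical.

```python
def read_quoted(line: str, start: int) -> tuple[str, int]:
    """Read a single- or double-quoted value beginning at line[start].

    Assumes the matching quote only terminates the string when followed
    by whitespace or end of line. Returns (value, index past closing quote).
    """
    quote = line[start]
    if quote not in "'\"":
        raise ValueError("token does not start with a quote")
    i = start + 1
    while i < len(line):
        # A matching quote ends the value only before whitespace or end of line
        if line[i] == quote and (i + 1 == len(line) or line[i + 1] in " \t\r\n"):
            return line[start + 1:i], i + 1
        i += 1
    raise ValueError("unterminated quoted string")


print(read_quoted("'a dog's life' next", 0))  # ("a dog's life", 14)
```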

Does EOF count as whitespace? Normally a text file ends with an <eol>
on the last line, so this is not a problem. With Fortran, however, you
may not be able to tell whether the last line ended with an <eol>, so
it seems that EOF should probably count as a whitespace token.
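A minimal Python sketch of the "EOF counts as whitespace" reading: a sentinel newline is appended so the final token is terminated whether or not the file ended with an <eol>. The function `tokenize` is hypothetical and deliberately ignores quoting, comments, and text fields.

```python
WHITESPACE = " \t\r\n"


def tokenize(text: str) -> list[str]:
    """Split text into whitespace-delimited tokens, treating EOF as whitespace.

    Hypothetical sketch: no handling of quotes, comments, or text fields.
    """
    tokens = []
    current = []
    # The sentinel "\n" makes end-of-input behave like a trailing <eol>
    for ch in text + "\n":
        if ch in WHITESPACE:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    return tokens


# Same tokens with and without a final newline
print(tokenize("_cell_length_a 5.4\n") == tokenize("_cell_length_a 5.4"))  # True
```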

There are also places where the grammar could be simplified, such as:

  { {'e' | 'E' } | {'e' | 'E' } { '+' | '- ' } } <UnsignedInteger>

could be written as:

  {'e' | 'E' } { '+' | '-' }? <UnsignedInteger>

Also note the error in the first form, which is copied from the web
page: the quoted minus sign includes a trailing space ('- ').
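The two exponent forms can be spot-checked by translating them into Python regular expressions. This is a sketch under the assumption that <UnsignedInteger> means one or more decimal digits; checking a few samples is not a proof of equivalence. It also shows that the form as quoted, with '- ', would accept a space after the minus sign.

```python
import re

# <UnsignedInteger> is assumed to be one or more decimal digits.

# First form exactly as quoted, including the stray space in '- '
WITH_TYPO = re.compile(r"(?:[eE]|[eE](?:\+|- ))[0-9]+")
# First form with the space removed
FIRST_FORM = re.compile(r"(?:[eE]|[eE][+-])[0-9]+")
# Simplified form with an optional sign
SIMPLIFIED = re.compile(r"[eE][+-]?[0-9]+")


def same_language(a, b, samples):
    """True if patterns a and b accept or reject every sample identically."""
    return all(bool(a.fullmatch(s)) == bool(b.fullmatch(s)) for s in samples)


samples = ["e5", "E+12", "e-3", "E", "e+", "e- 3", "5", "ee1"]
print(same_language(FIRST_FORM, SIMPLIFIED, samples))  # True
print(bool(WITH_TYPO.fullmatch("e- 3")))               # True: the typo admits a space
print(bool(WITH_TYPO.fullmatch("e-3")))                # False
```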

Should the logical-OR symbol always be enclosed in braces? Usage
appears to be inconsistent, but perhaps the rule is that braces are
required when the alternatives include a quoted character element.
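The question matters if '|' binds more loosely than concatenation, as it does in regular expressions: leaving out the braces then changes the meaning. A small Python illustration, modeling BNF braces as non-capturing groups; whether the CIF BNF intends this precedence is exactly the open question, and the pattern names are hypothetical.

```python
import re

# Without grouping, alternation binds loosest: "e" OR "E<digits>"
UNGROUPED = re.compile(r"e|E[0-9]+")
# With grouping (the role braces play in the BNF): {e | E} then digits
GROUPED = re.compile(r"(?:e|E)[0-9]+")

for s in ["e", "e5", "E5"]:
    print(s, bool(UNGROUPED.fullmatch(s)), bool(GROUPED.fullmatch(s)))
# e True False
# e5 False True
# E5 True True
```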

I will try to edit my own version of the BNF to produce what I think it
is supposed to mean. Answers to some of the above questions will be
helpful in getting it right.

Joe Krahn
