Discussion List Archives

RE: Backus-Naur descriptions for STAR and CIF

  • Subject: RE: Backus-Naur descriptions for STAR and CIF
  • From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
  • Date: Wed, 17 May 2000 20:19:18 +0100 (BST)

Richard Ball wrote:
> Isn't the 80 chars/line something that is only relevant for CIF writing
> (including dictionary creating) programs? The BNF spec. for the parser
> doesn't need to worry about it since the parsing of the incoming file can
> be record length independent. Once the stream has been tokenized then such
> additional restrictions as length of datanames or datavalues can be
> applied and error conditions raised. Or am I missing something?

You are missing two things; or actually, two sides of the same thing.

1) Part of the purpose of BNF is to provide an authoritative
   description of exactly which constructs are valid "utterances"
   of the language described.  It is not just a description of how to
   write a parser for that language that provides the expected
   answers in a particular software context.  

   Why might we want such a thing?  Exactly for the reason that Nick
   provided: English prose, though highly expressive, is open to
   misinterpretation.  Forgetting completely about software for a
   moment, we humans have to be able to agree about exactly what is and
   what is not a valid CIF.  If the full CIF specification could be
   expressed in BNF then there would be no room for doubt on any
   question of the validity of any CIF.  Once the humans agree, the
   BNF has the secondary benefit of being useful as a guide for
   writing parsing software.

2) Because a putative CIF with one or more lines longer than 80
   characters is not valid, a correct parser must be able to identify
   it as erroneous.  If the line-length restriction cannot be
   formulated in BNF then no correct CIF parser can be written from
   any BNF-only description.

It should be clear by now that I have come around in my thinking about
whether a full BNF description of CIF would be desirable.  I definitely
think it would be.  For instance, I suspect that there may be some
disagreement on the correct answers to these questions: if a line in a
putative CIF contains a space character at position 81, then can it
be a valid CIF?  The accepted interpretation of the 80 char/line rule
seems to allow line termination characters past position 80; are line
termination characters special in that regard, or does the same apply
to all whitespace characters?  If the former, then does that mean that
if a line in my file ends with a [CR][LF] pair, with the [CR] at
position 81, then that file is a valid CIF on some platforms but not
on others?  A full BNF description would answer these questions.
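To make the ambiguity concrete, here is a minimal Python sketch of a raw line-length check that runs before any tokenizing. It assumes LF-terminated lines and exposes one hypothetical switch, `cr_counts`, for whether a CR immediately before the LF counts toward the 80 characters. Every other character, including a space at position 81, is always counted. The function name and the switch are illustrative only, not part of any CIF specification or tool.

```python
def overlong_lines(text: str, limit: int = 80, cr_counts: bool = True) -> list[int]:
    """Return the 1-based numbers of lines longer than `limit` characters.

    Assumption: LF terminates a line and is never counted. Whether a CR
    immediately preceding the LF is counted is controlled by `cr_counts`,
    which is exactly the interpretive question left open by the prose rule.
    """
    bad = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not cr_counts and line.endswith("\r"):
            line = line[:-1]  # treat the CR as part of the line terminator
        if len(line) > limit:
            bad.append(number)
    return bad

# 80 visible characters followed by a CR/LF pair: the CR sits at position 81.
sample = "#" + "x" * 79 + "\r\n" + "data_test\r\n"
print(overlong_lines(sample, cr_counts=True))   # [1]
print(overlong_lines(sample, cr_counts=False))  # []

# A trailing space at position 81 is counted under either setting here.
print(overlong_lines("#" + "x" * 79 + " \n"))   # [1]
```

The same file passes or fails depending on a single interpretive choice, which is the kind of question an authoritative grammar would settle.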

Or how about an example where the existing BNF provides an answer?  If
Nick's latest BNF for CIF were accepted as authoritative, then I could
say with absolute certainty that vcif handles Brian's ciftest10
test file incorrectly by interpreting the ^Z at the end as a data
value in the preceding loop.  It is absolutely clear from that
BNF that a ^Z character cannot be part (or all) of a data value.  I
would argue for that being the correct interpretation regardless, but
clearly Brian thought otherwise -- he wrote comments to that effect
into the file.
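For illustration, the Python sketch below tests a single token against an assumed character class for unquoted data values: printable, non-blank ASCII (0x21 through 0x7E). That class is a hypothetical simplification, not the text of Nick's BNF, and it ignores the additional rules about which characters may begin a value. It only shows how a grammar that names its permitted characters rules out ^Z (0x1A) without any further argument.

```python
import re

# Hypothetical simplification: an unquoted data value is one or more
# printable, non-blank ASCII characters (0x21 through 0x7E). This is an
# illustrative stand-in, not the BNF under discussion, and it does not
# enforce the restrictions on a value's first character.
UNQUOTED_VALUE = re.compile(r"[\x21-\x7e]+")

def is_unquoted_value(token: str) -> bool:
    """True if the whole token consists of characters from the assumed class."""
    return UNQUOTED_VALUE.fullmatch(token) is not None

print(is_unquoted_value("1.234(5)"))  # True
print(is_unquoted_value("\x1a"))      # False: ^Z (0x1A) is a control character
```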

As a side note, when it comes to checking line length restrictions
after tokenizing the stream, you have to be exceedingly careful to get
it right.  You must pay attention to the positions of newlines, of
course, but you must also count all the whitespace (in a way consistent
with the correct answers to the above questions).  It does not suffice
to just look at lengths of data name and value strings, because it is
without doubt the case that a line consisting of the tag '_t', 78
spaces, and the value '?' (for example) is not valid in a CIF.  You
also have to pay special attention to lines in a text block, which are
no exception to the 80 char/line rule.
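A minimal Python sketch of that pitfall follows. It assumes LF-terminated lines with no CR, and it uses plain whitespace splitting as a crude stand-in for a real CIF tokenizer (it does not understand quoted strings or text fields). The names `naive_token_check` and `physical_line_check` are hypothetical.

```python
LIMIT = 80

def naive_token_check(text: str) -> bool:
    """Insufficient: passes if every whitespace-separated token fits."""
    return all(len(tok) <= LIMIT for tok in text.split())

def physical_line_check(text: str) -> bool:
    """Counts every character between LF terminators, whitespace included.
    Assumes LF-only line endings (no CR)."""
    return all(len(line) <= LIMIT for line in text.split("\n"))

# The example from the text: tag '_t', 78 spaces, then the value '?'.
tagged = "data_x\n_t" + " " * 78 + "?\n"
# A text block whose second line is 81 characters long.
block = "data_x\n_t\n;\n" + "y " * 40 + "z\n;\n"

print(naive_token_check(tagged))    # True, although the file is invalid
print(physical_line_check(tagged))  # False
print(physical_line_check(block))   # False: text-block lines count too
```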

Gee, this got pretty lengthy.  Sorry about that.


Regards,

John

International Union of Crystallography
