Discussion List Archives


RE: parser validation tools

  • Subject: RE: parser validation tools
  • From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
  • Date: Fri, 12 May 2000 20:49:25 +0100 (BST)

Richard Ball wrote:
> On May 11,  4:05pm, Bollinger, John Clayton wrote:
> > The common theme among all of those items seems to be that they are
> > aimed at easing parsing via _record-oriented_ file handling, à la
> > Fortran.  They are no special boon when using stream-oriented file
> > handling, à la C, Java, and others.
> 
> Yes and no. It would help the record-oriented readers but it also would
> assist the grabbing tokens from a stream. My parser is in perl and it almost
> handles all the cases in Brian's ciftest5 file but I could simplify the
> matching expressions and other token/data extraction code if I could make
> some assumptions about the stream's structure.

I admit that Perl is not my best language, which probably explains
why I don't understand how the kind of CIF reorganization that you
describe is more than a minor win.
 
> Now I am not a lex programmer so I don't know how difficult it would be to
> take a complete BNF descriptor for CIF and turn it into a parser with a nice
> set of C/F77/perl/? bindings. Anyone?

I do happen to be a sometime lex and yacc programmer, and the answer
is "not hard."  At least until we get to the "/Perl/?" bindings part --
I don't know much about doing mixed-language programming with Perl.
Of course, much also depends on what you mean by "parser."  If all
you want is a software component that reads the input stream and
returns tokens and token types, then it's pretty easy -- I put
together a working prototype in an hour or two.  The more you want it
to do for you, the more complicated it gets, but I think it remains
pretty manageable.  For what it's worth, the lex source is only 50
lines long, and the definitions and rules account for less than half
of that.
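To make the "tokens and token types" idea concrete, here is a minimal sketch of such a tokenizer. It is not the lex prototype described above: it is written in Python, and it follows one reading of the CIF 1.0 rules (whitespace-separated tokens, '#' comments, quotes that close only when followed by whitespace, and ';' text fields that open and close at the start of a line). It is not an authoritative grammar, which is exactly what is still missing. The token-kind names are invented for this example, and the reserved words global_, save_ and stop_ are deliberately not handled.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # "DATA", "LOOP", "NAME", "VALUE" or "TEXT" (names invented for this sketch)
    text: str  # token text with quote / semicolon delimiters stripped


_WS = re.compile(r"[ \t\r\n]+")
_COMMENT = re.compile(r"#[^\n]*")
# Assumed rule: a closing quote only counts if whitespace or end of input
# follows it, so a value like 'O'Brien' stays a single token.
_QUOTED = {q: re.compile(q + r"([^\n]*?)" + q + r"(?=[ \t\r\n]|$)") for q in "'\""}
_BARE = re.compile(r"[^ \t\r\n]+")


def tokenize(src: str) -> Iterator[Token]:
    """Yield tokens from CIF-like source, treating the input as one stream."""
    i, n = 0, len(src)
    while i < n:
        if m := _WS.match(src, i):  # skip whitespace
            i = m.end()
            continue
        if m := _COMMENT.match(src, i):  # '#' at token start: comment to end of line
            i = m.end()
            continue
        ch = src[i]
        if ch == ";" and (i == 0 or src[i - 1] == "\n"):
            # Text field: runs until the next line that starts with ';'.
            end = src.find("\n;", i)
            if end < 0:
                raise ValueError(f"unterminated text field at offset {i}")
            yield Token("TEXT", src[i + 1:end])
            i = end + 2
            continue
        if ch in _QUOTED:
            m = _QUOTED[ch].match(src, i)
            if m is None:
                raise ValueError(f"unterminated quoted string at offset {i}")
            yield Token("VALUE", m.group(1))
            i = m.end()
            continue
        # Anything else is a bare word; classify it by its prefix.
        m = _BARE.match(src, i)
        word = m.group()
        lower = word.lower()
        if word.startswith("_"):
            kind = "NAME"
        elif lower.startswith("data_"):
            kind, word = "DATA", word[5:]
        elif lower == "loop_":
            kind = "LOOP"
        else:
            kind = "VALUE"
        yield Token(kind, word)
        i = m.end()


if __name__ == "__main__":
    demo = """data_demo
_cell_length_a 10.5(2)
_chemical_name_common 'O'Brien salt'
loop_
_atom_site_label
_atom_site_fract_x
C1 0.123  # comments are skipped
C2 0.456
"""
    for tok in tokenize(demo):
        print(tok.kind, repr(tok.text))
```

Because the sketch reads the input as a single stream, the only line-sensitive rule it needs is the check for ';' at the start of a line; every other decision is made from whitespace and the first character of a token.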

What I really need is an authoritative syntactic definition.  I got
a copy of Hall's JCICS article on STAR, but the BNF representation
is a bit at odds with the rest of the text, especially with regard to
which characters are allowed, which are whitespace, and which are not
recognized.  Hopefully the upcoming BNF representation that Brian
mentioned will be better.


Cheers,

John Bollinger
Indiana University
Molecular Structure Center

jobollin@indiana.edu

International Union of Crystallography
