Discussion List Archives

Re: parser validation tools

  • Subject: Re: parser validation tools
  • From: "Richard G. Ball" <richard_ball@xxxxxxxxx>
  • Date: Mon, 15 May 2000 14:22:36 +0100 (BST)
On May 12,  8:49pm, Bollinger, John Clayton wrote:
>
> I admit that Perl is not my best language, which probably explains
> why I don't understand how the kind of CIF reorganization that you
> describe is more than a minor win.


As an example: if a loop is structured so that all the datanames are at the
start of the loop, with no intervening gotchas (dataname-looking stuff in a
comment, for instance), then a single regexp will return all the datanames:

@datanames = ($loopheader =~ /(_\S+)\s/g);  # where \S is non-whitespace and
                                            # \s is whitespace

A slightly more complicated loop over the loop body can be used to extract all
the dataitems. The simpler the regexps needed, the faster the whole process is,
and if the amount of backtracking and forward tracking can be minimized by
anchoring the regexps to easily located substrings, then the whole
dataname-dataitem extraction process goes very quickly indeed.
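
To make the idea concrete, here is a minimal sketch of both steps in Perl. It
assumes a deliberately simple loop: whitespace-separated values only, with no
comments, quoted strings, or semicolon-delimited text fields. The sample data
and the names $loop, $loopbody, @values and @rows are made up for illustration;
only the dataname regexp is the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical, simplified loop: no comments, quoted strings or text fields.
my $loop = <<'END';
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
C1 0.1 0.2
O1 0.3 0.4
END

# Split the loop into header (loop_ plus datanames) and body (the values).
# The pattern is anchored at both ends so the regexp engine has little to search.
my ($loopheader, $loopbody) = $loop =~ /\A(loop_\s+(?:_\S+\s+)+)(.*)\z/s;

# One regexp returns all the datanames.
my @datanames = ($loopheader =~ /(_\S+)\s/g);

# In this simplified case the dataitems are just whitespace-separated tokens.
my @values = split ' ', $loopbody;

# Regroup the flat value list into rows, one value per dataname.
my @rows;
while (my @row = splice(@values, 0, scalar @datanames)) {
    my %record;
    @record{@datanames} = @row;    # hash slice: dataname => dataitem
    push @rows, \%record;
}

print "$_->{_atom_site_label}: $_->{_atom_site_fract_x}\n" for @rows;
```

Real CIFs need more care in the second step, since quoted strings and text
fields can contain whitespace, which is why the loop over the loop body is the
more complicated of the two.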


>
> > Now I am not a lex programmer so I don't know how difficult
> > it would be to
> > take a complete BNF descriptor for CIF and turn it into a
> > parser with a nice
> > set of C/F77/perl/? bindings. Anyone?
>
> I do happen to be a sometime lex and yacc programmer, and the answer
> is "not hard."  At least until we get to the "/Perl/?" bindings part --
> I don't know much about doing mixed-language programming with Perl.

Very easy (he says from a point of mild ignorance <g>). Perl is designed to
make it easy to call C routines (or to be called from C, or embedded in C), and
so the interface is well structured and well documented. I have looked at
numerous examples but I haven't had the need to write any of my own.

> Of course, much also depends on what you mean when you say "parser."
> If all you want is a software component to read the input stream and
> return tokens and token types then it's pretty easy -- I worked up a
> working prototype in an hour or two.


Ideally, what I would ask of the lexical analysis routine is that it return
three arrays: an array of all the datanames in the CIF, each tagged with a loop
indicator (0 for non-looped, otherwise a number for which loop that dataname
was in); a second array of all the non-looped dataitems; and a third array
containing all the looped dataitems (each with an id for which loop it came
from, so I know which datanames go with these dataitems). No other processing,
no other validation. Given those three arrays, my existing routines would
handle all the other processing needed. Is that something your prototype could
do?
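
To show the shape of what I mean, here is a rough sketch in Perl. The routine
name scan_cif and the sample data are hypothetical, and the tokenizing is
deliberately naive: it assumes whitespace-separated tokens only, with no
comments, quoted strings, or semicolon-delimited text fields. A real lexer
would replace the split, but the three returned arrays are the point.

```perl
use strict;
use warnings;

# Hypothetical scanner for a simplified CIF (see assumptions above).
# Returns references to three arrays:
#   datanames    - [ name, loop_id ] pairs, loop_id 0 meaning non-looped
#   items        - values of the non-looped datanames, in file order
#   looped_items - [ value, loop_id ] pairs
sub scan_cif {
    my ($text) = @_;
    my (@datanames, @items, @looped_items);
    my ($loop_id, $loop_count, $in_header) = (0, 0, 0);

    for my $token (split ' ', $text) {
        if ($token =~ /^data_/i) {           # a data block header ends any loop
            ($loop_id, $in_header) = (0, 0);
        }
        elsif (lc $token eq 'loop_') {       # start a new, numbered loop
            $loop_id   = ++$loop_count;
            $in_header = 1;
        }
        elsif ($token =~ /^_/) {
            # A dataname seen after loop values means that loop has ended.
            $loop_id = 0 unless $in_header;
            push @datanames, [ $token, $loop_id ];
        }
        elsif ($loop_id) {                   # a value inside a loop
            $in_header = 0;
            push @looped_items, [ $token, $loop_id ];
        }
        else {                               # a value of a non-looped dataname
            push @items, $token;
        }
    }
    return (\@datanames, \@items, \@looped_items);
}

my $cif = <<'END';
data_example
_cell_length_a 5.43
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1
O1 0.3
_cell_length_b 6.10
END

my ($names, $items, $looped) = scan_cif($cif);
printf "%-20s loop %d\n", @$_ for @$names;
```

With the loop id attached to both the datanames and the looped dataitems,
matching values to names afterwards is just a matter of counting the names in
each loop.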


>
> What I really need is an authoritative syntactic definition.  I got
> a copy of Hall's JCICS article on STAR, but the BNF representation
> is a bit at odds with the rest of the text, especially with regard to
> which characters are allowed, which are whitespace, and which are not
> recognized.  Hopefully the upcoming BNF representation that Brian
> mentioned will be better.

Brian, do you know how close Nick is to having the revised BNF ready?

Regards,
Richard


--
Dr R.G. Ball                    |  voice: 732-594-5341
Merck Research Laboratories     |  fax: 732-594-6793 or 6100
PO Box 2000, R50-105            |  email: Richard_Ball@merck.com
Rahway, NJ  07065   USA

