Re: parser validation tools

  • Subject: Re: parser validation tools
  • From: "Richard G. Ball" <richard_ball@xxxxxxxxx>
  • Date: Thu, 11 May 2000 14:30:37 +0100 (BST)
On May 11,  9:59am, Brian McMahon wrote:
> So the following is a valid STAR File:
>       data_is_this_a_valid_#STAR_File?
>            _the_answer_#_is   'yes'
> COMCIFS discussed some time ago whether restrictions should be imposed
> on non-alphanumeric characters in data names and datablock names within
> CIFs specifically. The conclusion was "no".
> (http://www.iucr.org/iucr-top/cif/comcifs/minutes/msg00017.html)
> Admittedly this does make life harder for regular-expression parsing,

Indeed :-)

I think I won't try to claim formal CIF-compliant status for my parser, then,
since it appears that it would need to be aware of too many special cases. As
both a user and a writer I see no need for the extreme looseness of the syntax,
but I am sure COMCIFS has discussed this ad nauseam :-)

How about a single-source, COMCIFS-approved tool, along the lines of vcif,
that would rewrite (very quickly, so you don't pay a big penalty to use it) a
pathological STAR/CIF file into a more easily parsed/used style? It would not
do any dictionary checking or data validation; that would be delegated to the
downstream parser. It would just structure things better:

 - every datablock header starts on a new line with no leading whitespace
 - comments are stripped or put on their own lines
 - each dataname in a loop header starts its own line, with no blank lines
   in between
 - the ; marking the end of a text block is the only character on its line
 - only one dataname/dataitem pair per line
 - others?

This tool would allow local parsers to be somewhat smaller and more efficient
and, when used in conjunction with the rewriter, even to be fully CIF-compliant.
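
To make the idea concrete, here is a rough sketch in Python of what such a
rewriter might look like. It is only an illustration under simplifying
assumptions, not a COMCIFS-approved or complete STAR/CIF implementation: the
names tokenize and normalize are made up for this example, comments are simply
dropped, save_ and global_ are treated like datablock headers, stop_ and
nested loops are ignored, and loop rows that contain a text block are not
re-aligned. A '#' is treated as a comment only when it begins a token, so
names such as data_is_this_a_valid_#STAR_File? survive intact.

```python
def tokenize(text):
    """Yield (kind, value) pairs; kind is 'bare', 'quoted' or 'text'."""
    in_text, buf = False, []
    for line in text.splitlines():
        if in_text:
            if not line.startswith(";"):
                buf.append(line)          # still inside the text block
                continue
            yield ("text", "\n".join(buf))
            # Anything after the closing ';' is tokenized normally.
            in_text, line = False, " " + line[1:]
        elif line.startswith(";"):
            in_text, buf = True, [line[1:]]
            continue
        i, n = 0, len(line)
        while i < n:
            ch = line[i]
            if ch.isspace():
                i += 1
            elif ch == "#":               # comment only at the start of a token
                break
            elif ch in "'\"":
                # A quote closes only when followed by whitespace or end of line.
                j = i + 1
                while j < n and not (line[j] == ch and
                                     (j + 1 == n or line[j + 1].isspace())):
                    j += 1
                yield ("quoted", line[i:j + 1])
                i = j + 1
            else:
                j = i
                while j < n and not line[j].isspace():
                    j += 1
                yield ("bare", line[i:j])
                i = j


def normalize(text):
    """Rewrite loosely formatted STAR/CIF text into one-item-per-line form."""
    out, row = [], []
    mode, width, pending = "item", 0, None   # pending: dataname awaiting a value

    def flush():
        nonlocal pending
        if pending is not None:           # dataname that never got a value
            out.append(pending)
            pending = None
        if row:
            out.append(" ".join(row))
            row.clear()

    for kind, val in tokenize(text):
        low = val.lower()
        bare = kind == "bare"
        if bare and (low.startswith(("data_", "save_")) or low in ("loop_", "global_")):
            flush()
            mode, width = ("loop_head" if low == "loop_" else "item"), 0
            out.append(val)               # header at column 0 on its own line
        elif bare and val.startswith("_"):
            flush()
            if mode == "loop_head":
                width += 1
                out.append(val)           # one loop dataname per line
            else:
                mode, pending = "item", val
        elif kind == "text":
            flush()
            if mode == "loop_head":
                mode = "loop_body"
            out.append(";" + val)
            out.append(";")               # terminator alone on its line
        elif mode == "item" and pending is not None:
            out.append(pending + " " + val)   # one name/value pair per line
            pending = None
        else:
            if mode == "loop_head":
                mode = "loop_body"
            row.append(val)
            if len(row) >= max(width, 1):     # one loop row per line
                flush()
    flush()
    return "\n".join(out) + "\n"
```

Running normalize over a file and handing the result to a simple line-oriented
parser is the intended use; applying it a second time to its own output
should leave the text unchanged.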


Dr R.G. Ball                    |  voice: 732-594-5341
Merck Research Laboratories     |  fax: 732-594-6793 or 6100
PO Box 2000, R50-105            |  email: Richard_Ball@merck.com
Rahway, NJ  07065   USA

