Discussion List Archives


Re: parser validation tools

  • Subject: Re: parser validation tools
  • From: "Richard G. Ball" <richard_ball@xxxxxxxxx>
  • Date: Thu, 11 May 2000 14:30:37 +0100 (BST)
On May 11,  9:59am, Brian McMahon wrote:
[snip]
>
> So the following is a valid STAR File:
>
>       data_is_this_a_valid_#STAR_File?
>            _the_answer_#_is   'yes'
>
> COMCIFS discussed some time ago whether restrictions should be imposed
> on non-alphanumeric characters in data names and datablock names within
> CIFs specifically. The conclusion was "no".
> (http://www.iucr.org/iucr-top/cif/comcifs/minutes/msg00017.html)
>
> Admittedly this does make life harder for regular-expression parsing,

Indeed :-)
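
To make the difficulty concrete, here is a minimal Python sketch using Brian's
example. The helper names strip_naive and strip_anchored are made up for this
illustration, and the trailing comment on the sample line is my own addition.
The first pattern treats every '#' as the start of a comment; the second only
accepts a '#' at the start of a line or after whitespace, which is closer to
the rule but still not a full answer.

```python
import re

# Sample line: Brian's data name plus an added trailing comment (illustrative).
LINE = "_the_answer_#_is   'yes'   # a real comment"


def strip_naive(line):
    """Treat every '#' as a comment start. This truncates the data name."""
    return re.sub(r"#.*", "", line).rstrip()


def strip_anchored(line):
    """Accept '#' as a comment start only at line start or after whitespace.

    Still only a sketch: a quoted value such as 'a #b' would be cut short,
    because the pattern knows nothing about quoting.
    """
    return re.sub(r"(^|\s)#.*", "", line).rstrip()


if __name__ == "__main__":
    print(repr(strip_naive(LINE)))
    print(repr(strip_anchored(LINE)))
```

The anchored version keeps the data name intact, but handling quoted strings
and text blocks properly seems to need a real tokenizer rather than a single
regular expression.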

I think I won't try to claim formal CIF-compliant status for my parser then,
since it appears that it needs to be aware of too many special cases. As both
a user and writer I see no need for the extreme looseness of the syntax, but I
am sure COMCIFS has discussed this ad nauseam :-)

How about a single-source COMCIFS-approved tool, along the lines of vcif,
that would rewrite a pathological STAR/CIF file into a more easily parsed/used
style (and do so very quickly, so you don't pay a big penalty to use it)? It
would not do any dictionary checking or data validation; that would be
delegated to the downstream parser. It would just structure things better:

 - all datablocks would start on a newline with no whitespace
 - strip comments or put them on their own lines
 - each dataname of a loop starts a line with no embedded blank lines
 - the ; marking the end of a text block is the only character on the line
 - only one dataname/dataitem pair per line
 - others?

This tool would allow local parsers to be somewhat smaller and more efficient
and, when used in conjunction with the rewriter, even be fully CIF-compliant.
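
As a rough illustration of what such a rewriter might look like, here is a
minimal Python sketch. It is not a COMCIFS-approved tool and not how vcif
works; the function names (classify, line_tokens, tokenize, normalize) are
hypothetical. It assumes well-formed input, does not handle save frames or
other STAR-only constructs, and does no dictionary checking or validation.
Comments are kept and moved onto their own lines.

```python
def classify(word):
    """Label a bare (unquoted) token."""
    low = word.lower()
    if low.startswith("data_"):
        return "data"
    if low == "loop_":
        return "loop"
    if word.startswith("_"):
        return "name"
    return "value"


def line_tokens(line):
    """Yield (kind, text) tokens from one line that is not inside a text block."""
    pos, n = 0, len(line)
    while pos < n:
        ch = line[pos]
        if ch.isspace():
            pos += 1
        elif ch == "#":
            # '#' opens a comment only at the start of a token.
            yield ("comment", line[pos:])
            return
        elif ch in "'\"":
            # A quoted value ends at the same quote followed by whitespace or EOL.
            end = pos + 1
            while end < n and not (
                line[end] == ch and (end + 1 == n or line[end + 1].isspace())
            ):
                end += 1
            yield ("quoted", line[pos:end + 1])
            pos = end + 1
        else:
            end = pos
            while end < n and not line[end].isspace():
                end += 1
            yield (classify(line[pos:end]), line[pos:end])
            pos = end


def tokenize(text):
    """Yield tokens for a whole file, treating ';' in column 1 as a text block."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            body = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                body.append(lines[i])
                i += 1
            yield ("text", "\n".join(body))
            # Anything after the closing ';' is tokenized as ordinary input.
            line = lines[i][1:] if i < len(lines) else ""
        yield from line_tokens(line)
        i += 1


def fmt(kind, value):
    """Render a value; text blocks get a closing ';' alone on its line."""
    return ";" + value + "\n;" if kind == "text" else value


def emit_row(out, row):
    """Write the collected loop values as one row, then clear the row."""
    if row:
        sep = "\n" if any(k == "text" for k, _ in row) else " "
        out.append(sep.join(fmt(k, v) for k, v in row))
        row.clear()


def normalize(text):
    """Rewrite input so each construct starts in column 1 on its own line."""
    out, names, row = [], [], []
    mode, pending = "top", None
    for kind, value in tokenize(text):
        if kind == "comment":
            out.append(value)
        elif kind in ("data", "loop"):
            emit_row(out, row)
            out.append(value)
            names, mode = [], ("names" if kind == "loop" else "top")
        elif kind == "name" and mode == "names":
            names.append(value)
            out.append(value)
        elif kind == "name":
            emit_row(out, row)  # a data name after loop values ends the loop
            mode, pending = "top", value
        elif mode != "top" and names:
            mode = "values"
            row.append((kind, value))
            if len(row) == len(names):
                emit_row(out, row)
        elif pending is not None:
            sep = "\n" if kind == "text" else " "
            out.append(pending + sep + fmt(kind, value))
            pending = None
        else:
            out.append(fmt(kind, value))  # stray value: pass through unchanged
    emit_row(out, row)
    return "\n".join(out) + "\n"
```

Calling normalize() on Brian's two-line example should give the datablock
header and the dataname/value pair each on its own line, starting in column 1,
with the embedded '#' characters left untouched. A downstream parser could
then work line by line without worrying about the layout cases listed above.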

Richard

--
Dr R.G. Ball                    |  voice: 732-594-5341
Merck Research Laboratories     |  fax: 732-594-6793 or 6100
PO Box 2000, R50-105            |  email: Richard_Ball@merck.com
Rahway, NJ  07065   USA

