Discussion List Archives


Re: parser validation tools

  • Subject: Re: parser validation tools
  • From: Brian McMahon <bm@xxxxxxxx>
  • Date: Thu, 11 May 2000 09:59:43 +0100 (BST)
> Are datanames and datablock names really allowed to have the comment
> indicator (#) as a valid character in the name (as indicated in the ciftest5
> file)?

Yup. The published STAR BNF [Hall, S. R. & Spadaccini, N. (1994), J. Chem.
Inf. Comput. Sci. 34, 505-508] has the following relevant entries:

     <data_heading>   ::=  data_<non_blank_char>+
     <data_name>      ::=  _<non_blank_char>+
     <non_blank_char> ::=  ! shriek character -> ~ tilde character
                                                     (ASCII 33 - 126)

So the following is a valid STAR File:

           _the_answer_#_is   'yes'
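Because <non_blank_char> is just the ASCII range ! to ~, the two BNF productions translate directly into regular expressions. Here is a minimal Python sketch that checks a token against them and confirms that the example name is legal. The function names are hypothetical and do not come from any CIF library. The `data_` prefix is matched literally, exactly as written in the BNF.

```python
import re

# <non_blank_char>: ASCII 33 ('!') through 126 ('~'), which includes '#'.
NON_BLANK = r"[!-~]"

# Direct transcriptions of the BNF entries quoted above.
DATA_HEADING = re.compile(rf"data_{NON_BLANK}+")
DATA_NAME = re.compile(rf"_{NON_BLANK}+")

def is_data_name(token: str) -> bool:
    """True if the whole token matches <data_name>."""
    return DATA_NAME.fullmatch(token) is not None

def is_data_heading(token: str) -> bool:
    """True if the whole token matches <data_heading>."""
    return DATA_HEADING.fullmatch(token) is not None

# Split the example line at the first run of white space.
line = "_the_answer_#_is   'yes'"
name, value = line.split(maxsplit=1)
print(name, is_data_name(name))   # prints: _the_answer_#_is True
```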

COMCIFS discussed some time ago whether restrictions should be imposed
on non-alphanumeric characters in data names and datablock names within
CIFs specifically. The conclusion was "no".

Admittedly this does make life harder for regular-expression parsing,
which is a useful tool in shell, Perl, Tcl, Python and similar languages.
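For example, if you're matching regexps within a line of text in order to
identify and discard a comment, you can't just scan for a bare hash mark
and throw away everything after it, because that would also chop a data
name like the one above in half.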

You need at least white space before the hash mark:
                   / #.*/
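The next Python sketch makes the difference concrete. It applies both patterns with `re.sub` to a hypothetical line that carries the example data item plus a trailing comment. The naive pattern truncates the data name. The space-anchored one leaves the name intact.

```python
import re

NAIVE = re.compile(r"#.*")     # any hash mark starts a comment (wrong for STAR)
SPACED = re.compile(r" #.*")   # the hash mark must follow a space

# Hypothetical input: the example data item followed by a real comment.
line = "_the_answer_#_is   'yes'   # a real comment"

print(NAIVE.sub("", line))             # prints: _the_answer_
print(SPACED.sub("", line).rstrip())   # prints: _the_answer_#_is   'yes'
```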

But in fact you also need to check that the hash isn't a legitimate
character within a quoted string, e.g. 'this is a # legal data value'.

As a challenge to anyone who is interested, I offer the problem of
supplying a regexp that will definitely match a comment on a line.
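A hand-written scanner is not a regexp, so it does not answer the challenge. It does show what any solution has to keep track of. The following minimal Python sketch rests on three assumptions:

* The line is not inside a semicolon-delimited text field. Such fields span several lines and cannot be judged one line at a time.
* A comment or quoted string can begin only at the start of the line or after white space.
* A quoted string is closed only by its matching quote followed by white space or the end of the line. This follows the CIF convention, so 'it's' is a single value.

`find_comment` and `strip_comment` are hypothetical names.

```python
def find_comment(line: str) -> int:
    """Return the index of the '#' that starts a comment, or -1 if none.

    Sketch only: ignores semicolon-delimited text fields and assumes a
    quote closes a string only when followed by white space or end of line.
    """
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        at_token_start = i == 0 or line[i - 1].isspace()
        if at_token_start and ch == "#":
            return i  # an unquoted hash at the start of a token
        if at_token_start and ch in "'\"":
            # Skip the quoted string. A matching quote followed by a
            # non-blank character (as in 'it's') does not close it.
            j = i + 1
            while j < n and not (
                line[j] == ch and (j + 1 == n or line[j + 1].isspace())
            ):
                j += 1
            i = j + 1  # past the closing quote (or past the end if unterminated)
            continue
        i += 1
    return -1


def strip_comment(line: str) -> str:
    """Remove a trailing comment, if any, and the white space before it."""
    k = find_comment(line)
    return line if k < 0 else line[:k].rstrip()


print(strip_comment("_note   'this is a # legal data value'   # comment"))
# prints: _note   'this is a # legal data value'
```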


International Union of Crystallography