
RE: Fine-tuning CIF dictionary regexes




Regarding these two specific REs from mm_cif:

> floating point numbers:
> 
> '-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'

This RE does not appear to agree with the CIF 1.1 formal grammar, which
puts the standard uncertainty after the exponent rather than before it.
(See the productions for <Numeric>, <Number>, and <Float>.)  Which is
right?
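
To make the discrepancy concrete, here is a minimal Python sketch. The
first pattern is the mm_cif RE exactly as quoted above.  The second is
only my own hypothetical reordering of the same pieces (exponent first,
uncertainty last), meant to mimic the ordering described for the CIF 1.1
grammar; it is not taken from the grammar itself.  The names
SU_BEFORE_EXP, SU_AFTER_EXP, and accepts are made up for illustration.

```python
import re

# RE as quoted from mm_cif: the parenthesized uncertainty precedes the exponent.
SU_BEFORE_EXP = re.compile(
    r'-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'
)

# Hypothetical variant (an assumption, not the formal grammar): the same
# sub-patterns with the exponent first and the uncertainty last.
SU_AFTER_EXP = re.compile(
    r'-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([eE][+-]?[0-9]+)?([(][0-9]+[)])?'
)

def accepts(pattern, text):
    """Return True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

if __name__ == "__main__":
    # Print which ordering each sample value satisfies.
    for sample in ("1.23(4)e5", "1.23e5(4)", "1.23(4)", "1.23e5"):
        print(sample, accepts(SU_BEFORE_EXP, sample), accepts(SU_AFTER_EXP, sample))
```

The two patterns agree on values that carry only an uncertainty or only
an exponent; they disagree only when both are present, which is exactly
the case the question is about.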

> symmetry operations
> '([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'

I think it's overkill to use the pattern to restrict the possible symop
number so specifically.  Which numbers are actually valid in any
particular case (and to which specific operations they correspond)
depends on other data in the CIF.  Since validation is needed after the
match anyway, making the RE a bit looser would allow a processor to
report errors more specifically.  I might write the symop RE like this:
'[1-9][0-9]*(_[1-9]{3,3})?'.  (That also happens to remove the
alternation problem, though that was not my objective.)  That way, if I
accidentally write 244_555 instead of 24_555, a processor can tell me
"bad symop number" instead of "unrecognized token".
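
A rough Python sketch of that two-step idea follows.  It assumes the
processor already knows how many symmetry operations the CIF defines;
the check_symop function and its n_ops parameter are hypothetical names
of mine, and a real processor would take that count from the other data
in the CIF.  The loose pattern is the one suggested above, with a group
added around the number so it can be extracted.

```python
import re

# Strict pattern as quoted from mm_cif.
STRICT = re.compile(r'([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?')

# Looser pattern suggested in this message; the first group captures the number.
LOOSE = re.compile(r'([1-9][0-9]*)(_[1-9]{3,3})?')

def check_symop(token, n_ops):
    """Classify a symop token.

    n_ops is assumed to be the number of symmetry operations defined
    elsewhere in the CIF (a hypothetical input for this sketch).
    """
    match = LOOSE.fullmatch(token)
    if match is None:
        # The token does not even have the shape of a symop code.
        return "unrecognized token"
    if int(match.group(1)) > n_ops:
        # Well-formed, but the number refers to no defined operation.
        return "bad symop number"
    return "ok"

if __name__ == "__main__":
    for token in ("24_555", "244_555", "24_055"):
        print(token, check_symop(token, n_ops=24))
```

With the strict pattern, 244_555 fails to match at all, so the only
thing a processor can say is that the token is unrecognized; with the
loose pattern the failure is caught in the validation step, where the
cause is known.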


Regards,

John Bollinger

-- 

John C. Bollinger, Ph.D.
Indiana University
Molecular Structure Center

jobollin@indiana.edu 
_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers

