
RE: Fine-tuning CIF dictionary regexes

Subject: RE: Fine-tuning CIF dictionary regexes
From: "Bollinger, John Clayton" <jobollin@xxxxxxxxxxx>
Date: Mon, 18 Apr 2005 10:09:24 -0500


Regarding these two specific REs from mm_cif:

> floating point numbers:
> 
> '-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'

This RE does not appear to agree with the CIF 1.1 formal grammar, which
puts the standard uncertainty after the exponent rather than before it.
(See the productions for <Numeric>, <Number>, and <Float>.)  Which is
right?
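
To make the difference concrete, here is a minimal sketch in Python.
The first pattern is the mm_cif RE exactly as quoted above.  The second
is only a hypothetical rearrangement of the same pieces with the
standard uncertainty moved after the exponent; it is not a
transcription of the CIF 1.1 grammar.  The names MMCIF_FLOAT,
SU_LAST_FLOAT, and accepts are mine, for illustration only.

```python
import re

# RE quoted from mm_cif: standard uncertainty comes BEFORE the exponent.
MMCIF_FLOAT = re.compile(
    r"-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?"
)

# Hypothetical variant (an assumption, not the formal grammar): the same
# sub-patterns, with the standard uncertainty moved AFTER the exponent.
SU_LAST_FLOAT = re.compile(
    r"-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([eE][+-]?[0-9]+)?([(][0-9]+[)])?"
)


def accepts(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ("1.5(3)e2", "1.5e2(3)", "1.5(3)", "-12"):
        print(text, accepts(MMCIF_FLOAT, text), accepts(SU_LAST_FLOAT, text))
```

The two patterns agree on values that carry only an uncertainty or only
an exponent; they disagree exactly on values that carry both.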

> symmetry operations
> '([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'

I think it is overkill for the pattern to restrict the possible symop
number so specifically.  Which numbers are actually valid in any
particular case (and to which operation each corresponds) depends on
other data in the CIF.  Since validation is needed after the match
anyway, making the RE a bit looser would allow a processor to report
errors more specifically.  I might write the symop RE like this:
'[1-9][0-9]*(_[1-9]{3,3})?'.  (That also happens to remove the
alternation problem, though that was not my objective.)  That way, if I
accidentally write 244_555 instead of 24_555, a processor can tell me
"bad symop number" instead of "unrecognized token".
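
As a rough sketch of that two-stage check in Python: the strict pattern
is the one quoted above, and the loose pattern is the one I proposed,
with one capture group added around the number so that it can be
range-checked afterwards.  The function check_symop and its n_symops
parameter are hypothetical; in a real processor the count of valid
operations would have to come from the symmetry data elsewhere in the
CIF.

```python
import re

# Strict RE as quoted from mm_cif: the numeric range is baked into the pattern.
STRICT_SYMOP = re.compile(
    r"([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?"
)

# Looser RE from the message, plus a capture group around the number
# (the group is an addition made here for illustration).
LOOSE_SYMOP = re.compile(r"([1-9][0-9]*)(_[1-9]{3,3})?")


def check_symop(token, n_symops):
    """Classify `token`; `n_symops` is the assumed number of valid operations."""
    match = LOOSE_SYMOP.fullmatch(token)
    if match is None:
        # The token is not even shaped like a symop code.
        return "unrecognized token"
    if int(match.group(1)) > n_symops:
        # Well-formed, but the number is out of range for this CIF.
        return "bad symop number"
    return "ok"


if __name__ == "__main__":
    for token in ("24_555", "244_555", "24_505"):
        strict_ok = STRICT_SYMOP.fullmatch(token) is not None
        print(token, strict_ok, check_symop(token, n_symops=24))
```

With the strict pattern alone, 244_555 simply fails to match; with the
loose pattern plus the range check, it is identified as a well-formed
symop code with an invalid number.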


Regards,

John Bollinger

-- 

John C. Bollinger, Ph.D.
Indiana University
Molecular Structure Center

[email protected] 
_______________________________________________
cif-developers mailing list
[email protected]
http://scripts.iucr.org/mailman/listinfo/cif-developers
