RE: Fine-tuning CIF dictionary regexes

James Hester wrote:

> The point I want to discuss boils down to: should the regular 
> expressions in the CIF dictionary be fine-tuned to be 
> compatible not only with POSIX-compliant regular expression engines?

It seems to me that it is desirable for the REs to be as general as
possible.  POSIX does have the advantage of being a formal (series of)
standard(s).  Perl-compatible REs, on the other hand, have the advantage
of widespread use, support, and acceptance, to the extent that I'd have
to call them a de facto standard.  POSIX compliance is attractive from
the formal standards point of view, but Perl compatibility is more
likely to be useful to software developers.  If a particular RE in the
dictionary can satisfy only one of the two, then Perl compatibility is
the direction I think I favor.

> The following two constructs from mm_cif, although POSIX 
> compliant, will not correctly match in a Perl or Python or 
> Tcl regular expression (and any other NFA engine)
> 
> floating point numbers:
> 
> '-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?'
> 
> symmetry operations
> '([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?'
> 
> The problem is that the non-POSIX engines will go through the 
> alternations (separated by |) in the above expressions from 
> left to right, returning the first match, and as the second 
> part is optional, there is no requirement to match it.  In 
> contrast, a POSIX engine must return the longest match.  So 
> e.g. if Python is fed the number 78.456(22), "78." will be 
> matched by the floating point expression, as this satisfies 
> the first part of the alternation, and everything else in the 
> regular expression is optional.

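Isn't it implied that the provided REs must match an entire input
token?  If so, a backtracking engine that is required to consume the
whole token will go on to try the later alternatives whenever an
earlier one leaves input unmatched.  As far as I can tell, that makes
this particular distinction between RE semantics moot.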
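
To make that concrete, here is a minimal sketch in Python (one of the
engines James mentions).  The two patterns are copied verbatim from
the quoted message; the helper names prefix_match and
whole_token_match are mine, purely for illustration, and
re.fullmatch assumes Python 3.4 or later.  It suggests that the short
match only shows up when the RE is allowed to stop before the end of
the token.

```python
import re

# Patterns copied verbatim from the quoted message (mm_cif).
FLOAT_RE = r"-?(([0-9]+)[.]?|([0-9]*[.][0-9]+))([(][0-9]+[)])?([eE][+-]?[0-9]+)?"
SYMOP_RE = r"([1-9]|[1-9][0-9]|1[0-8][0-9]|19[0-2])(_[1-9][1-9][1-9])?"


def prefix_match(pattern, token):
    """Hypothetical helper: return the text matched at the start of token.

    re.match anchors only at the start, so a leftmost-first engine may
    stop as soon as the first alternative succeeds.
    """
    m = re.match(pattern, token)
    return m.group(0) if m else None


def whole_token_match(pattern, token):
    """Hypothetical helper: True only if the pattern consumes all of token.

    re.fullmatch anchors at both ends, so the engine has to backtrack
    into later alternatives until the whole token is consumed or every
    path has failed.
    """
    return re.fullmatch(pattern, token) is not None


# Unanchored at the end: the short matches James describes.
print(prefix_match(FLOAT_RE, "78.456(22)"))
print(prefix_match(SYMOP_RE, "192_555"))

# Anchored at both ends: the full tokens are accepted.
print(whole_token_match(FLOAT_RE, "78.456(22)"))
print(whole_token_match(SYMOP_RE, "192_555"))
```

The same effect could presumably be had in other engines by wrapping
each dictionary RE as ^(...)$ before use, though I have only sketched
the Python case here.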


Regards,

John Bollinger

-- 

John C. Bollinger, Ph.D.
Indiana University
Molecular Structure Center

jobollin@indiana.edu 