James Hester wrote:
[...]
> My question came up in connection with validating a CIF against a
> dictionary: all I want is to be able to determine whether or
> not a given string matches the regexp, so rather than
> throwing a series of regexps at a string to get a token, I'm
> throwing a string corresponding to a data item value at a
> single regexp. I had hoped to be able to read the regexps
> from the dictionary rather than hard code them.

For your particular case, it seems that you ought to be able to read a regex from the dictionary, prepend a '^', append a '$', and go. Alternatively, some regex engines (e.g. Java's) allow you to exert control at the API level over whether or not the whole string, the beginning of the string, or just any old part of the string needs to match.

> >> One suggestion is that these two regular expressions are re-ordered
> >> so that those alternatives in an alternation which are a subset of
> >> other alternatives come later. This remains POSIX-compliant and
> >> means many non-POSIX engines will find the longest match.
> >
> > Are you sure you can order the rules such that it eliminates all
> > instances of the problem you allude to?
>
> Not at all. However, such a reordering will increase the
> number of regexp engines which will match the entire string.
> POSIX correctness is maintained, so nothing is lost and
> something (not necessarily all the time) practical is gained in that
> Perl/Python/Tcl/? programmers can automate type checking.

To the extent it is feasible, I agree that it is useful to arrange the regexes so that they exhibit favorable behavior in the widest possible range of regex engines. Some standard needed to be chosen to unambiguously establish the meaning of the regexes, however, and it may not be possible to arrange all the regexes so that they have the same meaning to regex engines that do not conform to the chosen standard (POSIX).
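As a rough illustration of the two points above (whole-string matching, and the effect of alternation order in a non-POSIX engine), here is a minimal Python sketch. The pattern `a|ab` and the helper names are hypothetical and are not taken from any CIF dictionary; the sketch also assumes the dictionary regex stays within the syntax that Python's `re` module shares with POSIX. Note that when anchors are added by hand, the pattern should be wrapped in a group first, since `^a|ab$` would otherwise parse as `(^a)|(ab$)`.

```python
import re

# Hypothetical pattern: the first alternative is a prefix of the second.
PATTERN = r"a|ab"


def prefix_match(pattern, value):
    """Return the text matched at the start of value, or None."""
    m = re.match(pattern, value)
    return m.group(0) if m else None


def whole_match_api(pattern, value):
    """Whole-string test using API-level control (re.fullmatch)."""
    return re.fullmatch(pattern, value) is not None


def whole_match_anchored(pattern, value):
    """Whole-string test using hand-added anchors around a group."""
    return re.search(r"^(?:" + pattern + r")$", value) is not None


def whole_match_ungrouped(pattern, value):
    """Naive anchoring without a group; shown only to expose the pitfall."""
    return re.search("^" + pattern + "$", value) is not None


# Python takes the first alternative that succeeds, so a prefix match
# stops at "a"; a POSIX engine would take the longest match, "ab".
print(prefix_match(PATTERN, "ab"))

# With the longer alternative listed first, the prefix match covers "ab".
print(prefix_match(r"ab|a", "ab"))

# Requiring the whole string to match makes the engine backtrack into
# the second alternative, so the original ordering is accepted too.
print(whole_match_api(PATTERN, "ab"), whole_match_anchored(PATTERN, "ab"))

# Without the group, "^a|ab$" accepts "ax" because "^a" alone matches.
print(whole_match_ungrouped(PATTERN, "ax"), whole_match_anchored(PATTERN, "ax"))
```

For simple validation of this kind, the whole-string forms sidestep the ordering issue in this example; the ordering only matters when the engine is asked for a prefix or for the matched text itself.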
One could document how the regexes used in the dictionary are affected by the different regex semantics of some other engine(s) (e.g. Perl's), and that might be useful, but one cannot write a generic document of that nature.

--
John C. Bollinger, Ph.D.
Indiana University
Molecular Structure Center
jobollin@indiana.edu

_______________________________________________
cif-developers mailing list
cif-developers@iucr.org
http://scripts.iucr.org/mailman/listinfo/cif-developers