Hi,
first of all:
1. Please use the mailing list.
http://weitz.de/cl-ppcre/#mail
2. It's called CL-PPCRE and not PCRE. PCRE is something else... :)
On Tue, 13 Feb 2007 12:23:36 -0800 (PST), Brent Fulgham bfulg@pacbell.net wrote:
Although your Lisp implementation of a Perl-compatible regular expression engine already handily beats the original Perl version, it could be modified to be even faster for expressions that do not contain back-references. See the following article that discusses the 1960's-era algorithm used in Awk/Grep that discusses this (http://swtch.com/~rsc/regexp/regexp1.html).
In my testing, CL-PCRE isn't quite faster than Perl, though it makes a very creditable showing (http://shootout.alioth.debian.org/debian/benchmark.php?test=regexdna%E2%8C%A...). Tcl, which uses the "Thompson DFA" algorithm discussed in the paper I referenced is nearly an order of magnitude faster on this benchmark than Perl.
Please let me know if you have any interest exploring this. I might try to play with this and see if I can make any headway...
I'm aware of the advantages of DFAs over NFAs for "simple" regular expressions, but I shied away from them until now because having two engines in CL-PPCRE would make the code base even bigger and more complicated than it already is. (And having /only/ a DFA engine wouldn't be enough, right? I haven't read the article yet, but I'm pretty sure you'd have to let go some of Perl's more advanced regex features.)
Also, although I boast about CL-PPCRE's performance on its web site, I'm not too concerned about its speed anymore. It's fast enough for what I'm doing with it.
Having said that, the idea of automatically switching to a fast DFA engine if possible (I guess this is what you want to do) is kind of tempting. If you come up with something that's really a big improvement and adheres to CL-PPCREs current coding and documentation standards, I'd be willing to review and possibly integrate it. Right now, I'm to busy to help with that, though.
Cheers, Edi.