[cl-ppcre-devel] Re: Question: PCRE -> Thompson NFA Implementation

13 Feb 2007


Hi,

first of all:

1. Please use the mailing list.
   http://weitz.de/cl-ppcre/#mail

2. It's called CL-PPCRE and not PCRE.  PCRE is something else... :)

On Tue, 13 Feb 2007 12:23:36 -0800 (PST), Brent Fulgham bfulg@pacbell.net wrote:

> ...
> Although your Lisp implementation of a Perl-compatible regular
> expression engine already handily beats the original Perl version,
> it could be modified to be even faster for expressions that do not
> contain back-references.  See the following article that discusses
> the 1960's-era algorithm used in Awk/Grep that discusses this
> (http://swtch.com/~rsc/regexp/regexp1.html).
>
> In my testing, CL-PCRE isn't quite faster than Perl, though it makes
> a very creditable showing
> (http://shootout.alioth.debian.org/debian/benchmark.php?test=regexdna%E2%8C%A...).
> Tcl, which uses the "Thompson DFA" algorithm discussed in the paper
> I referenced is nearly an order of magnitude faster on this
> benchmark than Perl.
>
> Please let me know if you have any interest exploring this.  I might
> try to play with this and see if I can make any headway...

I'm aware of the advantages of DFAs over NFAs for "simple" regular
expressions, but I shied away from them until now because having two
engines in CL-PPCRE would make the code base even bigger and more
complicated than it already is.  (And having /only/ a DFA engine
wouldn't be enough, right?  I haven't read the article yet, but I'm
pretty sure you'd have to let go some of Perl's more advanced regex
features.)

Also, although I boast about CL-PPCRE's performance on its web site,
I'm not too concerned about its speed anymore.  It's fast enough for
what I'm doing with it.

Having said that, the idea of automatically switching to a fast DFA
engine if possible (I guess this is what you want to do) is kind of
tempting.  If you come up with something that's really a big
improvement and adheres to CL-PPCRE's current coding and documentation
standards, I'd be willing to review and possibly integrate it.  Right
now, I'm too busy to help with that, though.
Cheers,
Edi.
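
[Illustration, not part of the original message.]  The "switch engines
automatically" idea above could be sketched roughly as follows.  This
is a minimal sketch in Python, not CL-PPCRE code: the names
has_backreference and choose_engine are hypothetical, and it only looks
at the pattern text for \1..\9 and \k<name>.  It is deliberately
conservative (for example, it also flags \1 inside a character class),
and a real dispatcher would presumably have to check for other features
an automaton cannot handle as well.

```python
import re

# Matches a backslash followed by any single character.  Scanning escape
# pairs left to right means an escaped backslash followed by a digit
# (backslash, backslash, "1") is not mistaken for a back-reference.
_ESCAPE_PAIR = re.compile(r"\\(.)", re.DOTALL)


def has_backreference(pattern: str) -> bool:
    # Hypothetical helper: True if the pattern text appears to contain a
    # numbered (backslash + 1..9) or named (backslash + k) back-reference.
    for match in _ESCAPE_PAIR.finditer(pattern):
        escaped = match.group(1)
        if escaped in "123456789" or escaped == "k":
            return True
    return False


def choose_engine(pattern: str) -> str:
    # Hypothetical dispatcher: patterns with back-references stay on the
    # backtracking engine; everything else may go to an automaton-based one.
    return "backtracking" if has_backreference(pattern) else "automaton"


if __name__ == "__main__":
    for pat in (r"(a|b)*c", r"(a)\1", r"a\\1"):
        print(pat, "->", choose_engine(pat))
```

Working on the pattern string keeps the sketch short; inspecting the
parsed regex instead would avoid the false positives mentioned above.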
