While trying to fix a Montezuma issue, http://code.google.com/p/montezuma/issues/detail?id=3, I think I found a bug in CL-PPCRE.
CL-USER> (cl-ppcre:scan (cl-ppcre:create-scanner "(\\w+)*\\@\\w+") "______________________________________" :start 0)
;; Evaluation aborted.

It hangs once the number of underscores reaches a critical value. I first suspected this was because '\w' matches the underscore, but replacing the underscore with another character matched by '\w' causes the same problem:

CL-USER> (cl-ppcre:scan (cl-ppcre:create-scanner "(a\\w+)*\\@\\w+") "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" :start 0)
;; Evaluation aborted.

But if I remove the last \w+, it works:

CL-USER> (cl-ppcre:scan (cl-ppcre:create-scanner "(_\\w+)*\\@") "_______________________________________" :start 0)
NIL
I also checked it in Perl. Perhaps Perl is more efficient at regular expression matching: even after I increased the number of underscores, it still finishes without problems.
$str = "john._______________________________________ __________________________________";
if ($str =~ m/(_*\w+)*@\w+/) { print "ok\n"; }
Please take a look and let me know what you think.
"A wisp of cloud, as distant as the sky; through the long night, as solitary as the moon."
Hi,
On Thu, Jun 25, 2009 at 4:31 AM, Xiangjun Wu <netawater@gmail.com> wrote:
"(\w+)*\@\w+"
That's the type of regular expression that typically leads to a combinatorial explosion in regex engines unless they use specific "tricks" to deal with this. Recent versions of Perl are pretty clever in this regard (they look for "floating" substrings) while CL-PPCRE isn't, but - frankly - I don't really see the point of this. I think this is mainly so that the regex engine looks good in benchmarks. I definitely wouldn't call this a bug.
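To get a rough feel for the size of that explosion, here is a small sketch that counts how many ways a group like (\w+)* can carve up a run of n word characters. The function name count-splits is made up for this illustration, and the assumption is a plain backtracking matcher with no special optimizations, which may end up trying each of these splits before it can report failure.

```lisp
;; Hypothetical helper, for illustration only: counts the ways to cut a
;; run of N characters into consecutive non-empty pieces, which is what
;; the nested quantifier in "(\\w+)*" allows the matcher to do.
(defun count-splits (n)
  "Number of ways to split a run of N characters into non-empty pieces."
  (if (zerop n)
      1                                  ; nothing left: exactly one way
      (loop for first-piece from 1 to n  ; length of the first \w+ piece
            sum (count-splits (- n first-piece)))))

(count-splits 10) ; => 512, i.e. 2^(n-1) for n = 10
```

The count doubles with every extra character, so a few dozen underscores are already enough to make the matcher appear to hang.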
The question is - what do you want to achieve with this regular expression? Can't you write it in a simpler way?
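As one possible simplification, assuming the captured group is not actually needed: (\w+)* matches exactly the same runs of characters as \w*, so the nested quantifier can be dropped. The sketch below assumes CL-PPCRE is already loaded (for example through Quicklisp), and the variable name *simple-scanner* is only illustrative.

```lisp
;; Assumption: CL-PPCRE is loaded, e.g. via (ql:quickload :cl-ppcre).
;; "\\w*" accepts the same runs as "(\\w+)*" but gives the matcher only
;; one way to consume each run, so there is no nested backtracking.
(defparameter *simple-scanner*
  (cl-ppcre:create-scanner "\\w*@\\w+"))

;; No #\@ in the target, so there is no match.
(cl-ppcre:scan *simple-scanner*
               (make-string 40 :initial-element #\_))
;; => NIL

;; A target that does contain an #\@ between word characters.
(cl-ppcre:scan *simple-scanner* "john@example")
;; => 0, 12 (followed by two empty register arrays)
```

If the text before the @ has to be captured, a single group such as (\w*) around the first part keeps that information without nesting one quantifier inside another.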
Cheers, Edi.
cl-ppcre-devel@common-lisp.net