Sorry for the delay, I had moved this email into the wrong IMAP folder... :(
On Sun, 20 Jun 2004 12:07:08 +0200, Daniel Skarda 0rfelyus@ucw.cz wrote:
After more regexp experiments I found that the main difference between Perl and GNU Regexp is not the syntax of regexps (as I naively thought), but the definition of "the best match" (especially for the `|' alt node).
One can agree with the Perl man pages that Perl's definition may be better (and more comprehensible) for handwritten regexps. But is the "first match" strategy also better for writing lexers? I doubt it.
Consider languages where one word (token) can be a prefix of another. This is not unusual: remember that in Lisp `12345' is a number and `12345a' is a symbol :)
When writing a "first match" lexer (and your deflexer macro is a "first match" lexer) one has to be careful with rule ordering and think about possible prefix ambiguity:
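To make the prefix problem concrete, here is a minimal sketch in Python, whose `re` module uses Perl-style "first match" semantics. It is not CL-PPCRE or deflexer code; the rule names, the patterns, and the `first_match` helper are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical token rules for a tiny Lisp-like lexer (illustration only).
NUMBER = r"[0-9]+"
SYMBOL = r"[0-9a-z]+"

def first_match(rules, text):
    """Return (name, lexeme) for the first rule that matches at the start of text."""
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m:
            return name, m.group(0)
    return None

number_first = [("number", NUMBER), ("symbol", SYMBOL)]
symbol_first = [("symbol", SYMBOL), ("number", NUMBER)]

# Number rule first: the symbol 12345a is cut short after its numeric prefix.
print(first_match(number_first, "12345a"))   # ('number', '12345')
# Symbol rule first: 12345a is read whole ...
print(first_match(symbol_first, "12345a"))   # ('symbol', '12345a')
# ... but now a plain number is classified as a symbol.
print(first_match(symbol_first, "12345"))    # ('symbol', '12345')
```

With these particular patterns neither ordering is right on its own; a first-match lexer would also need something like a lookahead on the number rule (for example `[0-9]+(?![0-9a-z])`) to resolve the ambiguity.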
Yes. But if you prefer not to be careful you'll definitely sacrifice performance...
My conclusion is that the 'success node is meaningful only for "longest match" regexp engines, because one can expect such an engine to do better than matching all 'alt nodes in sequence and returning the longest match.
My new question is: how hard would it be to add a :longest-match option to create-scanner?
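For reference, the naive strategy mentioned above (try every alternative in turn and keep the longest result) can be sketched as follows. This is again Python rather than CL-PPCRE; the `longest_match` helper and the rule table are hypothetical, and a real longest-match engine would be expected to do this in one pass instead of one match attempt per alternative.

```python
import re

# Hypothetical rule table; order only matters for breaking ties.
RULES = [("number", r"[0-9]+"), ("symbol", r"[0-9a-z]+")]

def longest_match(rules, text):
    """Try every rule at the start of text and keep the longest lexeme.

    Ties go to the earlier rule, as in lex-style generators.
    """
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (name, m.group(0))
    return best

print(longest_match(RULES, "12345a"))  # ('symbol', '12345a')
print(longest_match(RULES, "12345"))   # ('number', '12345')
```

Note that each individual pattern is still matched with the engine's usual backtracking semantics; only the choice between the top-level alternatives is made by length.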
Pretty hard. This is not going to be done by me. However, if you manage to add this yourself without breaking the rest of CL-PPCRE (and without making it slower) I'll gladly accept your patches.
ps: I am not subscribed to cl-ppcre-devel mailing list. Please "Cc:" me your replies.
Subscribing to the list is easy and the list is low-volume. If you'd like to continue this discussion please either subscribe to the list or use it via nntp:
http://common-lisp.net/nntp.shtml
Cheers, Edi.
cl-ppcre-devel@common-lisp.net