[cl-ppcre-devel] Re: cl-ppcre

7 Jul 2004


Sorry for the delay, I had moved this email into the wrong IMAP
folder... :(

On Sun, 20 Jun 2004 12:07:08 +0200, Daniel Skarda <0rfelyus@ucw.cz> wrote:
...
> After more regexp experiments I found that the main difference
> between Perl and GNU Regexp is not the syntax of regexps (as I
> naively thought), but the definition of "the best match" (especially
> for the `|' alt node).
>
> One can agree with the Perl man pages that the Perl definition could
> be better (and more comprehensible) for handwritten regexps. Is the
> "first match" strategy also better for writing lexers? I doubt it.
>
> Consider languages where some word (token) can be a prefix of
> another word. This is not unusual: remember that in Lisp `12345' is
> a number and `12345a' is a symbol :)
>
> While writing a "first match" lexer (and your deflexer macro is a
> "first match" lexer) one has to be careful with rule ordering and
> think about possible prefix ambiguity:

Yes. But if you prefer not to be careful you'll definitely sacrifice
performance...
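
To make the prefix ambiguity concrete, here is a minimal sketch. It uses Python's `re` module rather than CL-PPCRE, purely as an illustration: like Perl, it tries the branches of an alternation from left to right and keeps the first one that succeeds. The token patterns and the helper name `first_match` are hypothetical and are not taken from the deflexer macro.

```python
import re

# Hypothetical token patterns, loosely modelled on the Lisp example:
# digits alone form a number, digits mixed with letters form a symbol.
NUMBER = r"[0-9]+"
SYMBOL = r"[0-9a-z]+"

def first_match(alternatives, text):
    """Return the text matched at position 0 by an ordered alternation."""
    combined = "|".join(f"(?:{alt})" for alt in alternatives)
    m = re.compile(combined).match(text)
    return m.group(0) if m else None

# Same input, same rules, different rule order.
number_first = first_match([NUMBER, SYMBOL], "12345a")
symbol_first = first_match([SYMBOL, NUMBER], "12345a")
```

With the number rule first, the alternation stops after the digits and leaves the trailing `a` behind. With the symbol rule first, the whole token is consumed. That sensitivity to ordering is the care a first-match lexer demands.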
...
> My conclusion is that the 'success node is meaningful only for
> "longest match" regexp engines, because one can expect that such an
> engine could do better than matching all 'alt nodes in sequence and
> returning the longest match.
>
> My new question is: how hard would it be to add a :longest-match
> option to create-scanner?

Pretty hard. This is not going to be done by me. However, if you
manage to add this yourself without breaking the rest of CL-PPCRE (and
without making it slower) I'll gladly accept your patches.
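
For reference, the naive fallback mentioned in the question (try every alternative in sequence and keep the longest result) could be sketched as follows. This is again Python, not CL-PPCRE, and `longest_match` is a hypothetical helper. It only picks the longest among the top-level alternatives; each individual pattern is still matched with first-match semantics internally, so it is not a true longest-match engine.

```python
import re

def longest_match(alternatives, text, pos=0):
    """Try every alternative at pos and return (index, matched_text)
    for the longest match. Ties go to the earlier alternative.
    Returns None if no alternative matches."""
    best = None
    for index, pattern in enumerate(alternatives):
        m = re.compile(pattern).match(text, pos)
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (index, m.group(0))
    return best

# Hypothetical number and symbol rules, as in the Lisp example above.
rules = [r"[0-9]+", r"[0-9a-z]+"]
as_symbol = longest_match(rules, "12345a")
as_number = longest_match(rules, "12345")
```

Every alternative is run at every token position, so the cost grows with the number of rules. That is the performance trade-off raised earlier in this thread.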
...
> ps: I am not subscribed to the cl-ppcre-devel mailing list. Please
> "Cc:" your replies to me.

Subscribing to the list is easy and the list is low-volume. If you'd
like to continue this discussion please either subscribe to the list
or use it via nntp:
http://common-lisp.net/nntp.shtml
Cheers,
Edi.