On Mon, 12 Feb 2007 16:18:43 -0800, Harold Lee harold@hotelling.net wrote:
For an :ALTERNATION, is there an O(1) way to tell which alternate matched? e.g. for a regexp (a)|(b)|(c)|(d)|... I don't want the O(n) performance of scanning the list returned by SCAN or SCAN-TO-STRINGS. Instead, I'd just like a number saying which one matched.
You could use filters:
http://weitz.de/cl-ppcre/#filters
However, are you really sure that the O(n) operation of looping through the registers is causing performance problems? Have you profiled the code? This looks like premature optimization to me.
I'm pretty sure that whatever you'll do to "improve" this (filters or changing CL-PPCRE internally to return more information for example) you'll end up with something even slower.
For a quick and dirty scanner (similar to lex) I thought I could use CL-PPCRE. The "rules" (in lex terminology) are just regular expressions, and so I combine them into a set of alternatives, e.g.
\s+ -> whitespace [0-9]+ -> number [a-zA-Z][a-zA-Z0-9]* -> identifier
=> "(\s+)|([0-9]+)|([a-zA-Z][a-zA-Z0-9]*)"
Actually, I use CL-PPCRE::PARSE-STRING to get a parse tree for each expression, change any :REGISTER tags to :GROUP tags, and combine them under an :ALTERNATION tag:
That doesn't feel right. I wouldn't create regex strings first just to parse them into s-expressions afterwards. Why don't you start with s-expressions right away?
Cheers, Edi.