For an :ALTERNATION, is there an O(1) way to tell which alternate matched? e.g. for a regexp (a)|(b)|(c)|(d)|... I don't want the O(n) performance of scanning the list returned by SCAN or SCAN-TO-STRINGS. Instead, I'd just like a number saying which one matched. I understand that there can normally be nested groups, alternatives that are not groups, and other complications - so maybe a list/vector of groups that were matched would work as a general interface.
More background:
For a quick and dirty scanner (similar to lex) I thought I could use CL-PPCRE. The "rules" (in lex terminology) are just regular expressions, and so I combine them into a set of alternatives, e.g.
\s+ -> whitespace
[0-9]+ -> number
[a-zA-Z][a-zA-Z0-9]* -> identifier
=> "(\s+)|([0-9]+)|([a-zA-Z][a-zA-Z0-9]*)"
Actually, I use CL-PPCRE::PARSE-STRING to get a parse tree for each expression, change any :REGISTER tags to :GROUP tags, and combine them under an :ALTERNATION tag:
(defun combine-regexps (&rest regexps)
  "Combine several regexp strings into a CL-PPCRE parse tree of alternatives."
  ;; All registers are changed into groups, i.e. (x) -> (?:x) in regexp syntax.
  ;; That keeps the registers from messing up the scanner's expectation that
  ;; each register is one of the rules, and allows the list of rules to use
  ;; () instead of (?:) throughout for readability.
  (let ((registers (mapcar (lambda (r)
                             `(:register ,(subst :group :register
                                                 (cl-ppcre::parse-string r))))
                           regexps)))
    `(:sequence :start-anchor (:group (:alternation ,@registers)))))
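For reference, here is a rough sketch of the O(n) lookup I'd like to avoid: it walks the register-start vector returned by SCAN and takes the first non-NIL entry as the rule number. MATCH-RULE and *SCANNER* are names made up for this illustration, and the scanner is built from the regex string that should be equivalent to what COMBINE-REGEXPS produces for the three rules above.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. via (ql:quickload :cl-ppcre).

(defun match-rule (scanner string &key (start 0))
  "Return the index of the rule that matched and the end of the match,
or NIL if nothing matched. Hypothetical helper, not part of CL-PPCRE."
  (multiple-value-bind (match-start match-end reg-starts)
      (cl-ppcre:scan scanner string :start start)
    (when match-start
      ;; Linear scan: registers that did not take part in the match are NIL.
      (values (position-if #'identity reg-starts) match-end))))

(defparameter *scanner*
  (cl-ppcre:create-scanner "^(?:(\\s+)|([0-9]+)|([a-zA-Z][a-zA-Z0-9]*))"))

(match-rule *scanner* "foo 42")   ; => 2, 3  (third rule, identifier)
```

With only a few rules the POSITION-IF call is cheap, but it grows with the number of rules, which is why I'm asking for something constant-time.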