Hi Daniel!
On Sat, 12 Jun 2004 15:54:11 +0200, Daniel Skarda <0rfelyus@ucw.cz> wrote:
> Today I explored the regular expression implementations available in various Debian Common Lisp packages. I really liked your library - thank you for writing cl-ppcre.
You're welcome.
> I also looked into the elegant cl-lexer package, which is built on top of the cl-regex library. What I missed in cl-ppcre is a parse-tree node similar to cl-regex's 'success node, which defines the return value of the match/scan functions. With a 'success node one could build a `deflexer' macro on top of cl-ppcre as easily as on top of cl-regex.
>
> Is it possible to extend cl-ppcre with a similar feature?
I might look into this for a future version but see below.
> Footnote: In cl-lexer, the deflexer macro
>
>   (deflexer foo
>     ("regexp" some action)            ; 0
>     ("another regexp" another action) ; 1
>     ...)
>
> numbers each pair of regexp and action, then combines the regexp parse trees into one big parse tree
>
>   `(alt (seq (regexp tree) (success 0))
>         (seq (another regexp tree) (success 1))
>         ...)
>
> and uses the return value from the match (i.e. the regexp's serial number) to select the action associated with the matching regexp.
I've recently written demo code like this for another CL-PPCRE user who also wanted to build a lexer:
(in-package :cl-user)
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defmacro with-unique-names ((&rest bindings) &body body)
    ;; see http://www.cliki.net/Common%20Lisp%20Utilities
    `(let ,(mapcar #'(lambda (binding)
                       (check-type binding (or cons symbol))
                       (if (consp binding)
                           (destructuring-bind (var x) binding
                             (check-type var symbol)
                             `(,var (gensym ,(etypecase x
                                               (symbol (symbol-name x))
                                               (character (string x))
                                               (string x)))))
                           `(,binding (gensym ,(symbol-name binding)))))
                   bindings)
       ,@body)))
(defmacro deflexer (name &body body)
  (with-unique-names (regex-table regex token sexpr-regex anchored-regex
                      string start scanner next-pos)
    `(let ((,regex-table
            (loop for (,regex . ,token)
                    in (list ,@(loop for (regex token) in body
                                     collect `(cons ,regex ,token)))
                  for ,sexpr-regex = (etypecase ,regex
                                       (function
                                        (error "Compiled scanners are not allowed here"))
                                       (string (cl-ppcre::parse-string ,regex))
                                       (list ,regex))
                  for ,anchored-regex = (cl-ppcre:create-scanner
                                         `(:sequence :modeless-start-anchor ,,sexpr-regex))
                  collect (cons ,anchored-regex ,token))))
       (defun ,name (,string &key ((:start ,start) 0))
         (loop for (,scanner . ,token) in ,regex-table
               for ,next-pos = (nth-value 1 (cl-ppcre:scan ,scanner ,string :start ,start))
               when ,next-pos
                 do (return (values ,token ,next-pos)))))))
You should be able to use it like this:
* (deflexer mylexer
    ("'.*'" 'string)
    ("#.*$" 'comment)
    ("[ \\t\\r\\f]+" 'ws)
    (":=" 'assign)
    ("[[]" 'lbrack)
    ("[]]" 'rbrack)
    ("[,]" 'comma)
    ("[:]" 'colon)
    ("[;]" 'semicolon)
    ("[+-]?[0-9]*[.][0-9]+([eE][+-]?[0-9]+)?" 'float)
    ("[+-]?[0-9]+" 'integer)
    ("[a-zA-Z0-9_]+" 'id)
    ("." 'unknown))
; Converted MYLEXER.

MYLEXER
* (mylexer "a:=123.4?")

ID
1
* (mylexer "a:=123.4?" :start 1)

ASSIGN
3
* (mylexer "a:=123.4?" :start 3)

FLOAT
8
* (mylexer "a:=123.4?" :start 8)

UNKNOWN
9
This one only returns tokens but it should be trivial to change the macro such that the newly-defined lexer invokes functions instead. Wouldn't that already do what you want? I'm not sure what the approach you sketched above would buy you compared to this one.
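As a rough illustration of that last point, here is a minimal sketch of a driver that walks a whole string with such a lexer and optionally calls a function per token. It wraps the lexer rather than changing the macro itself. The names TOKENIZE and ACTIONS are made up for this example and are not part of CL-PPCRE; the only assumption about the lexer is that it has the interface shown above (it takes a string and a :START keyword and returns the token and the next position).

```lisp
;; Hypothetical helper, not part of CL-PPCRE: drive a lexer over a whole string.
;; LEXER is any function with the interface of the lexers defined by DEFLEXER
;; above: (lexer string :start pos) => token, next-pos.
;; ACTIONS is an optional alist mapping tokens to functions of the matched text.
(defun tokenize (lexer string &optional actions)
  (loop with pos = 0
        while (< pos (length string))
        collect (multiple-value-bind (token next-pos)
                    (funcall lexer string :start pos)
                  ;; Stop instead of looping forever if nothing matched
                  ;; or if a rule matched the empty string.
                  (unless (and next-pos (> next-pos pos))
                    (error "No token matches at position ~D" pos))
                  (let ((action (cdr (assoc token actions)))
                        (text (subseq string pos next-pos)))
                    (setf pos next-pos)
                    (if action
                        ;; An action is registered: return whatever it returns.
                        (funcall action text)
                        ;; Otherwise return the token together with its text.
                        (list token text))))))
```

With the MYLEXER defined above, (tokenize #'mylexer "a:=123.4?") should give a list along the lines of ((ID "a") (ASSIGN ":=") (FLOAT "123.4") (UNKNOWN "?")). Passing something like (list (cons 'integer #'parse-integer)) as the third argument would turn integer tokens into numbers instead.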
Cheers, Edi.
PS: Please, if possible, continue this conversation on the mailing list. Thanks.