Working with cl-ppcre, I have found that I increasingly use the s-expr representation rather than the traditional string representation with its infix operators. To make the s-expressions easier to work with, I've developed 'defpatt', a package which implements a notation for defining and referring to regular expressions in terms of cl-ppcre s-expressions. I thought it might interest the readers of this list.
The package can be downloaded from http://www.harbo.net/downloads/defpatt-0.2.tar.gz .
Suggestions, comments, improvements are welcome.
best regards,
-Klaus.
------ defpatt examples (from defpatt.lisp): ------
#| EXAMPLES
; If you want to try the examples, be sure to evaluate the
; expression below first - otherwise the other ones won't work.
(defpatt:defpatt-set-default-macro-char)
; Defines #\¤ as a macro character
=> T
(cl-ppcre:all-matches-as-strings ¤(alt "a" "c" "f") "abcdefghi")
; Note: Equivalent to "a|c|f"
=> ("a" "c" "f")
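A rough sketch of what `alt' produces, assuming it behaves like the ALT helper in the macroexpansion at the end of these examples. There it is a local function bound by LABELS; here it is a plain DEFUN. ALT->STRING is hypothetical and is not part of defpatt or cl-ppcre. It is a deliberately tiny renderer that handles only an alternation of literal strings and does no escaping, included only to make the "a|c|f" equivalence concrete.

```lisp
;; Sketch: ALT mirrors the ALT helper in defpatt's macroexpansion.
(defun alt (&rest args)
  "Build a cl-ppcre :ALTERNATION parse tree from ARGS."
  (cons :alternation args))

;; Hypothetical helper, not part of defpatt or cl-ppcre. It handles only
;; (:ALTERNATION "x" "y" ...) and does not escape regex metacharacters.
(defun alt->string (tree)
  "Render an alternation of literal strings in infix string form."
  (assert (eq (first tree) :alternation))
  (format nil "~{~A~^|~}" (rest tree)))

(alt "a" "c" "f")                ; => (:ALTERNATION "a" "c" "f")
(alt->string (alt "a" "c" "f"))  ; => "a|c|f"
```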
; That's all very well, but doesn't buy us very much.
; However `defpatt' (as per cl-ppcre's sexpr-based
; representation of REs) enables us to both document
; the patterns much better by letting us insert comments
; into REs...
(cl-ppcre:scan-to-strings
 ¤(seq digit+   ; used space
       ws+
       digit+   ; available space
       ws+
       digit+   ; remaining space
       )
 "123 4567 7887")
; Note: `ws+' and `digit+' are defined above, in `defpatt-initialize'.
=> "123 4567 7887", #()
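The comment-carrying pattern above can be reproduced without the reader macro. In this sketch, SEQ mirrors the helper in the macroexpansion at the end of these examples. The global symbol macros for `digit+' and `ws+' are stand-ins whose values are copied from the raw expansion shown there. How `defpatt-initialize' actually defines these names is not shown in these examples, so treat that part as an assumption.

```lisp
;; Sketch, not defpatt's code: SEQ mirrors the helper in the macroexpansion.
(defun seq (&rest args)
  "Build a cl-ppcre :SEQUENCE parse tree from ARGS."
  (cons :sequence args))

;; Assumed stand-ins for the names set up by `defpatt-initialize'; the
;; values are taken from the raw cl-ppcre expansion in these examples.
(define-symbol-macro digit+
  '(:greedy-repetition 1 nil (:char-class (:range #\0 #\9))))
(define-symbol-macro ws+
  '(:greedy-repetition 1 nil :whitespace-char-class))

;; Returns a :SEQUENCE of five sub-trees; the comments are discarded by
;; the Lisp reader and never reach the parse tree.
(seq digit+   ; used space
     ws+
     digit+   ; available space
     ws+
     digit+)  ; remaining space
```

Because the comments are ordinary Lisp comments, documenting a pattern this way costs nothing in the resulting tree.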
; ...as well as capture data in a structured fashion...
(cl-ppcre:register-groups-bind (used avail remain)
    (¤(seq (reg digit+)   ; used space
           ws+
           (reg digit+)   ; available space
           ws+
           (reg digit+)   ; remaining space
           )
     "123 4567 7887")
  (mapcar #'parse-integer (list used avail remain)))
; Note: `(reg ...)' creates a register binding
=> (123 4567 7887)
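`register-groups-bind' binds its variables to the registers in order, so the three (reg ...) forms line up with USED, AVAIL and REMAIN. This sketch defines REG like the helper in the macroexpansion at the end of these examples. It also adds COUNT-REGISTERS, a hypothetical helper that is not part of defpatt or cl-ppcre. COUNT-REGISTERS counts :REGISTER nodes in a parse tree, which is one way to check that a pattern has as many registers as the variable list.

```lisp
;; Sketch: REG mirrors the REG helper in defpatt's macroexpansion.
(defun reg (&rest args)
  "Wrap ARGS in a cl-ppcre :REGISTER node."
  (cons :register args))

;; Hypothetical helper, not part of defpatt or cl-ppcre.
(defun count-registers (tree)
  "Return the number of :REGISTER nodes anywhere in TREE."
  (if (consp tree)
      (+ (if (eq (first tree) :register) 1 0)
         (reduce #'+ (rest tree) :key #'count-registers))
      0))

(let ((digits '(:greedy-repetition 1 nil (:char-class (:range #\0 #\9))))
      (spaces '(:greedy-repetition 1 nil :whitespace-char-class)))
  (count-registers (list :sequence
                         (reg digits) spaces
                         (reg digits) spaces
                         (reg digits))))
;; => 3, one per variable in (used avail remain)
```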
; ...but also lets us _FIRST_ define and document the abstraction...
(defpatt match-nums ()
  ¤(seq (reg digit+)   ; used space
        ws+
        (reg digit+)   ; available space
        ws+
        (reg digit+)   ; remaining space
        ))
=> MATCH-NUMS
; ...and _THEN_ use it...
(cl-ppcre:register-groups-bind (used avail remain)
    (¤match-nums "123 4567 7887")
  (mapcar #'parse-integer (list used avail remain)))
=> (123 4567 7887)
; which is a lot more easily understood, as I am sure you will
; agree.
(cl-ppcre:scan-to-strings ¤(upto "efg") "abcdefghi")
=> "abcd", #()
(cl-ppcre:scan-to-strings ¤(upto+ "efg") "abcdefghi")
=> "abcdefg", #()
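Both results follow from the UPTO and UPTO+ helpers that appear in the macroexpansion at the end of these examples. In defpatt they are local functions bound by LABELS. This sketch rewrites them as plain DEFUNs with the same bodies.

```lisp
;; Sketch: same parse trees as the UPTO/UPTO+ helpers in the macroexpansion.
(defun upto (patt)
  "Parse tree for the text leading up to PATT, without PATT itself."
  `(:sequence (:flags :single-line-mode-p)   ; lets :EVERYTHING match newlines
              ;; consume characters, each of which is NOT followed by PATT
              (:greedy-repetition 0 nil
               (:sequence :everything (:negative-lookahead ,patt)))
              ;; then one more character: the last one before PATT
              :everything))

(defun upto+ (patt)
  "Like UPTO, but also match PATT itself."
  `(:sequence ,(upto patt) ,patt))
```

Tracing (upto "efg") against "abcdefghi": the repetition consumes a, b and c, since none of them is followed by "efg". It cannot consume d, because "efg" follows it. The final :EVERYTHING then takes d, giving "abcd". UPTO+ appends "efg" itself, giving "abcdefg".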
; To see the raw cl-ppcre expansion of a `defpatt' expression,
; simply enter it:
¤(seq (reg digit+)   ; used space
      ws+
      (reg digit+)   ; available space
      ws+
      (reg digit+)   ; remaining space
      )
=> (:SEQUENCE
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
    (:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
    (:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9)))))
; To see _HOW_ `defpatt' expands an expression, use `macroexpand-1':
(macroexpand-1 '¤(seq (reg digit+)   ; used space
                      ws+
                      (reg digit+)   ; available space
                      ws+
                      (reg digit+)   ; remaining space
                      ))
=> (LABELS ((++ (PATT) (REP PATT 1 NIL))
            (UPTO (PATT)
              `(:SEQUENCE (:FLAGS :SINGLE-LINE-MODE-P)
                          (:GREEDY-REPETITION 0 NIL
                           (:SEQUENCE :EVERYTHING (:NEGATIVE-LOOKAHEAD ,PATT)))
                          :EVERYTHING))
            (?? (PATT) (REP PATT 0 1))
            (UPTO+ (PATT) `(:SEQUENCE ,(UPTO PATT) ,PATT))
            (ALT (&REST ARGS) `(:ALTERNATION ,@ARGS))
            (** (PATT) (REP PATT 0 NIL))
            (SEQ (&REST ARGS) `(:SEQUENCE ,@ARGS))
            (REG (&REST ARGS) `(:REGISTER ,@ARGS))
            (REP (PATT &OPTIONAL (MIN 0) (MAX NIL))
              `(:GREEDY-REPETITION ,MIN ,MAX ,PATT)))
     (SYMBOL-MACROLET ((WS+ '(:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS))
                       (WS* '(:GREEDY-REPETITION 0 NIL :WHITESPACE-CHAR-CLASS))
                       (DIGIT '(:CHAR-CLASS (:RANGE #\0 #\9)))
                       (DIGIT+ (++ DIGIT))
                       (MATCH-NUMS (DEFPATT-PATTERN (SEQ (REG DIGIT+) WS+ (REG DIGIT+) WS+ (REG DIGIT+))))
                       (DIGIT* (** DIGIT)))
       (SEQ (REG DIGIT+) WS+ (REG DIGIT+) WS+ (REG DIGIT+))))
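The expansion shows the mechanism: helper functions are bound with LABELS, named patterns are bound with SYMBOL-MACROLET, and the pattern itself is an ordinary Lisp form evaluated inside both. WITH-PATTERNS is a hypothetical macro that imitates that shape in cut-down form. It is not defpatt's actual code, and it is limited to SEQ, REG, REP, `ws+', `digit' and `digit+'.

```lisp
;; Hypothetical macro, a cut-down imitation of defpatt's expansion.
;; LABELS binds the helper functions; SYMBOL-MACROLET binds pattern names.
(defmacro with-patterns (form)
  `(labels ((seq (&rest args) (cons :sequence args))
            (reg (&rest args) (cons :register args))
            (rep (patt &optional (min-count 0) max-count)
              (list :greedy-repetition min-count max-count patt)))
     (declare (ignorable #'seq #'reg #'rep))
     (symbol-macrolet ((ws+ '(:greedy-repetition 1 nil :whitespace-char-class))
                       (digit '(:char-class (:range #\0 #\9)))
                       ;; expanded where used, so it can refer to DIGIT and REP
                       (digit+ (rep digit 1 nil)))
       ,form)))

;; Evaluates to the same tree as the raw cl-ppcre expansion shown above.
(with-patterns
  (seq (reg digit+)   ; used space
       ws+
       (reg digit+)   ; available space
       ws+
       (reg digit+))) ; remaining space
```

A plausible reason for SYMBOL-MACROLET rather than LET: a symbol macro's expansion is expanded where the name is used. That lets `digit+' refer to `digit', and MATCH-NUMS refer to `digit+' and `ws+', without the init-form scoping limits a LET would impose.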
; `upto' and `upto+' are good examples of how an abstraction
; mechanism helps keep REs maintainable and understandable. See
; their definitions above.
|#
cl-ppcre-devel@common-lisp.net