Hi!
Sorry for the delay, I was on vacation.
Oh, and please use the mailing list - see Cc.
On Sat, 3 Jun 2006 22:28:29 +0200, "Alexander Kjeldaas" <alexander.kjeldaas@gmail.com> wrote:
I'm a big fan of REGISTER-GROUPS-BIND. For some longer texts that I'm parsing, I have code that looks like this:
    (defun parse-stuff (string)
      "Parse various account information"
      (let ((result nil)
            (end-pos 0))  ; 0 rather than NIL, since :START expects an integer
        (register-groups-bind (foo)
            ("Kontoinformasjon for (.*)\\s+" string :end-holder end-pos)
          (push `(:username ,foo) result))
        (register-groups-bind (foo)
            ("Navn:[ \\t]+(.*)\\s+" string :start end-pos :end-holder end-pos)
          (push `(:name ,foo) result))
        ;; Use (?=Telefon) to not affect the end-position.
        (register-groups-bind (addr1 addr2 addr3)
            ("Adresse:[ \\t]+(.*)\\s+(?:(.*)\\s+)?(?:(.*)\\s+)?(?=Telefon)"
             string :start end-pos :end-holder end-pos)
          (push (remove-if-not #'identity `(:address ,addr1 ,addr2 ,addr3))
                result))
        (register-groups-bind (foo)
            ("Telefon:[ \\t]+(\\+?\\d+)" string :start end-pos :end-holder end-pos)
          (push `(:phone ,foo) result))
        (register-groups-bind (foo)
            ("Mobil:[ \\t]+(\\+?\\d+)" string :start end-pos :end-holder end-pos)
          (push `(:mobile-phone ,foo) result))
        (register-groups-bind (foo)
            ("Epost:[ \\t]+(.*)\\n" string :start end-pos :end-holder end-pos)
          (push `(:email ,foo) result))
        (nreverse result)))
The idiom is: you want to go through the text matching various regexps, but you want to hold on to the end position in order to do incremental matching. So I added an :end-holder keyword parameter that names a variable which is set to the end of the match (I also added :start-holder, but I don't use it).
To me, this type of parsing is relatively simple to debug, but it is also not too sloppy, in that we don't start from the beginning all the time.
:end-place and :start-place might be better names, I don't know.
I think that the idea is basically OK, but I'm not sure I like that END-POS apparently has to be an existing variable which is then set. I think it'd be more Lisp-y to bind it within the body of the form. Admittedly, the indentation would get quite nasty then.
Cheers, Edi.
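For comparison, the same incremental idiom can be approximated with what CL-PPCRE already provides: CL-PPCRE:SCAN accepts a :START keyword and returns four values (match start, match end, and the vectors of register starts and ends), so the caller can thread the end position through explicitly. The sketch below assumes a hypothetical helper, PARSE-SEQUENTIALLY, which is not part of CL-PPCRE; it walks a list of (KEY REGEX) pairs where each regex has exactly one register, and it leaves the position unchanged when a field is missing. Driving the fields from a list is one way to avoid the nested-indentation problem mentioned above.

```lisp
;; Load CL-PPCRE (assumes it is installed where ASDF can find it).
(require :asdf)
(asdf:load-system :cl-ppcre)

(defun parse-sequentially (string fields)
  "Hypothetical helper, not part of CL-PPCRE.  FIELDS is a list of
(KEY REGEX) pairs, each REGEX containing one register.  Each regex is
scanned starting where the previous successful match ended; a field
that does not match is skipped and the position is left unchanged."
  (let ((pos 0)
        (result '()))
    (dolist (field fields (nreverse result))
      (destructuring-bind (key regex) field
        (multiple-value-bind (match-start match-end reg-starts reg-ends)
            (cl-ppcre:scan regex string :start pos)
          (when match-start
            ;; Register 0 may be NIL if its group did not participate.
            (let ((s (aref reg-starts 0)))
              (push (list key (and s (subseq string s (aref reg-ends 0))))
                    result))
            ;; Continue the next scan from the end of this match.
            (setf pos match-end)))))))

;; Example:
;; (parse-sequentially
;;  (format nil "Navn: Ola~%Telefon: +4712345~%Epost: ola@example.com~%")
;;  '((:name "Navn: *(.*)")
;;    (:phone "Telefon: *(\\+?\\d+)")
;;    (:email "Epost: *(.*)")))
;; => ((:NAME "Ola") (:PHONE "+4712345") (:EMAIL "ola@example.com"))
```

The trade-off is that every field gets the same treatment; fields needing several registers (like the address) would need a richer field description.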
"Alexander Kjeldaas" <alexander.kjeldaas@gmail.com> wrote:
The idiom is - you want to go through the text matching various regexps, but you want to keep on to the end-position in order to do incremental matching.
I agree with Edi that the idea is basically sound, but finding a good syntax will be challenging.
In other languages (mostly Perl), I've used the same idiom for quick-and-dirty parsers. The difference is that I've always done a replace on the target string instead of just a match, with the empty string "" as the replacement text. That way, I remove what I match and can carry on from there.
As an example, I'll invent REPLACE-REGISTER-GROUPS-BIND and use it to parse the name, rank, and serial number out of a string.
    (defun parse-stuff (string)
      (replace-register-groups-bind (name) ("Name: (\\S+)" string "")
        (process-name name))
      (replace-register-groups-bind (rank) ("Rank: (\\S+)" string "")
        (process-rank rank))
      (replace-register-groups-bind (sn) ("Serial Number: (\\S+)" string "")
        (process-sn sn)))
I probably should have copied the string first, but you get the idea.
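REPLACE-REGISTER-GROUPS-BIND is invented for the example and does not exist in CL-PPCRE. A minimal sketch of how such a macro might be built on the real CL-PPCRE:REGISTER-GROUPS-BIND and CL-PPCRE:REGEX-REPLACE follows, assuming the target is a SETF-able place. Because it stores the new string returned by REGEX-REPLACE back into the place instead of mutating the original object, the caller's string is left untouched. For simplicity it matches the regex twice (once to bind, once to replace) and evaluates REGEX and PLACE more than once.

```lisp
;; Load CL-PPCRE (assumes it is installed where ASDF can find it).
(require :asdf)
(asdf:load-system :cl-ppcre)

(defmacro replace-register-groups-bind (vars (regex place replacement) &body body)
  "Hypothetical macro sketched for illustration, not part of CL-PPCRE.
If REGEX matches the string in PLACE, bind VARS to its registers,
replace the first match in PLACE with REPLACEMENT (via SETF), then
run BODY.  Returns NIL and leaves PLACE alone if there is no match.
Note: REGEX and PLACE are evaluated more than once."
  `(cl-ppcre:register-groups-bind ,vars (,regex ,place)
     ;; REGISTER-GROUPS-BIND copies registers by default (SHAREDP is NIL),
     ;; so the bindings stay valid after PLACE is replaced.
     (setf ,place (cl-ppcre:regex-replace ,regex ,place ,replacement))
     ,@body))

(defun parse-name-rank-serial (string)
  "Collect name, rank and serial number, consuming the local STRING
binding as it goes.  Returns the fields and what is left of STRING."
  (let ((result '()))
    (replace-register-groups-bind (name) ("Name: (\\S+)" string "")
      (push name result))
    (replace-register-groups-bind (rank) ("Rank: (\\S+)" string "")
      (push rank result))
    (replace-register-groups-bind (sn) ("Serial Number: (\\S+)" string "")
      (push sn result))
    (values (nreverse result) string)))
```

For "Name: Smith Rank: Major Serial Number: 42" this returns ("Smith" "Major" "42") plus the two leftover spaces.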
If you want to go with a non-destructive solution, I think the syntax is tough. The best I could come up with in 30 seconds of contemplation was the mythical REGISTER-GROUPS-BIND-2 form, which also binds the start and end of the match:
    (defun parse-stuff (string)
      (let ((last-end 0))
        (register-groups-bind-2 (name) (match-start match-end)
            (*name-re* string :start last-end)
          (process-name name)
          (setf last-end match-end))
        (register-groups-bind-2 (rank) (match-start match-end)
            (*rank-re* string :start last-end)
          (process-rank rank)
          (setf last-end match-end))
        (register-groups-bind-2 (sn) (match-start match-end)
            (*sn-re* string :start last-end)
          (process-sn sn)
          (setf last-end match-end))))
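REGISTER-GROUPS-BIND-2 is likewise hypothetical. A minimal sketch of it can be built on CL-PPCRE:SCAN, whose first two return values are exactly the match start and end and whose last two are the register bounds. The macro name and lambda list follow the form above; passing extra keyword arguments such as :START straight through to SCAN is an assumption of this sketch. In the usage function, literal regex strings stand in for *NAME-RE*, *RANK-RE* and *SN-RE*.

```lisp
;; Load CL-PPCRE (assumes it is installed where ASDF can find it).
(require :asdf)
(asdf:load-system :cl-ppcre)

(defmacro register-groups-bind-2 (vars (start-var end-var)
                                  (regex target &rest scan-args)
                                  &body body)
  "Hypothetical macro, not part of CL-PPCRE.  Like REGISTER-GROUPS-BIND,
but also binds START-VAR and END-VAR to the bounds of the whole match.
BODY runs only if REGEX matches.  SCAN-ARGS (such as :START) are passed
through to CL-PPCRE:SCAN.  Assumes VARS is no longer than the number of
registers in REGEX."
  (let ((str (gensym "TARGET"))
        (reg-starts (gensym "REG-STARTS"))
        (reg-ends (gensym "REG-ENDS"))
        (s (gensym "S")))
    `(let ((,str ,target))
       (multiple-value-bind (,start-var ,end-var ,reg-starts ,reg-ends)
           (cl-ppcre:scan ,regex ,str ,@scan-args)
         (declare (ignorable ,end-var ,reg-starts ,reg-ends))
         (when ,start-var
           ;; Copy each register out of the target string, or bind NIL
           ;; if its group did not participate in the match.
           (let ,(loop for var in vars
                       for i from 0
                       collect `(,var (let ((,s (aref ,reg-starts ,i)))
                                        (and ,s (subseq ,str ,s (aref ,reg-ends ,i))))))
             ,@body))))))

(defun parse-name-rank-serial-2 (string)
  "Example use: each search starts where the previous match ended."
  (let ((last-end 0)
        (result '()))
    (register-groups-bind-2 (name) (match-start match-end)
        ("Name: (\\S+)" string :start last-end)
      (push name result)
      (setf last-end match-end))
    (register-groups-bind-2 (rank) (match-start match-end)
        ("Rank: (\\S+)" string :start last-end)
      (push rank result)
      (setf last-end match-end))
    (register-groups-bind-2 (sn) (match-start match-end)
        ("Serial Number: (\\S+)" string :start last-end)
      (push sn result)
      (setf last-end match-end))
    (values (nreverse result) last-end)))
```

Because the match bounds are ordinary lexical bindings inside the body, nothing outside the form is set implicitly; the caller decides whether to carry MATCH-END forward.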
BTW, Perl has an anchoring metacharacter, \G, which matches at the position where the previous match left off, but I don't think that is what you are looking for.
Cheers, Chris Dean
cl-ppcre-devel@common-lisp.net