Hi!
On Wed, 3 Aug 2005 11:59:48 -0700, Derek Peschel dpeschel@eskimo.com wrote:
> I've been reading the CL-PPCRE docs and code to get a clear specification of the syntax.
Uh, I think there is no clear specification of the syntax. Your best bets probably are `man perlre' and the Camel Book but these are moving targets.
> Ultimately I'd like to add syntax highlighting for CL-PPCRE regexps to the Climacs text editor.
Cool...
> But there seems to be a certain amount of defensive or sloppy programming (things being done in more than one place).
I wouldn't be surprised.
> The scanner knows something about skipping # comments but the lexer does too.
See below.
> The lexer has code to ignore \E markers but I get the impression the scanner removes them before the lexer starts. If this kind of duplication does exist, is there a useful reason for it?
The \Q\E stuff (*allow-quoting*) was added pretty late, almost a year after CL-PPCRE's first release. The problem with \Q\E and friends is that they're not really part of Perl's regex syntax either - they're part of Perl's string syntax:
  edi@vmware:~$ perl -le '$a = "\Q*\E"; print $a'
  \*
That's why I ignored them at first and later implemented them as a kind of "pre-parsing" of the regex string (which itself uses regular expressions). During this pre-parsing a dangling \E can remain in the regex string, and that's why the lexer is instructed to specifically ignore these. Maybe this could be done in api.lisp as well, but at that time it seemed easier to me to do it in the lexer. (The lexer is pretty ugly anyway because it has to cope with a very ugly syntax.)
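To make the two stages concrete, here is a minimal sketch in Python. It is not CL-PPCRE's actual code: the function names `quote_sections` and `strip_dangling_end` are hypothetical, and the exact quoting rule (backslash-escape every non-word character) is an assumption modelled on Perl's behaviour. It only illustrates why a \E without a matching \Q can survive the pre-parsing step and must be skipped later.

```python
import re


def quote_sections(regex: str) -> str:
    """Pre-parsing stage (hypothetical): expand each \\Q...\\E span.

    Characters inside the span are made literal by backslash-escaping
    every non-word character. A \\Q without a closing \\E runs to the
    end of the string. A \\E with no preceding \\Q is left untouched.
    """
    out = []
    i, n = 0, len(regex)
    while i < n:
        if regex.startswith("\\Q", i):
            end = regex.find("\\E", i + 2)
            body_end, next_i = (n, n) if end == -1 else (end, end + 2)
            body = regex[i + 2:body_end]
            out.append(re.sub(r"(\W)", r"\\\1", body))
            i = next_i
        elif regex[i] == "\\" and i + 1 < n:
            # Copy any other escape pair verbatim, including a dangling \E.
            out.append(regex[i:i + 2])
            i += 2
        else:
            out.append(regex[i])
            i += 1
    return "".join(out)


def strip_dangling_end(regex: str) -> str:
    """Lexer-side stage (hypothetical): skip \\E markers left behind."""
    out = []
    i, n = 0, len(regex)
    while i < n:
        if regex.startswith("\\E", i):
            i += 2  # ignore the leftover marker
        elif regex[i] == "\\" and i + 1 < n:
            out.append(regex[i:i + 2])  # keep escape pairs such as \\ intact
            i += 2
        else:
            out.append(regex[i])
            i += 1
    return "".join(out)
```

With these assumptions, `quote_sections(r"a\Eb")` returns the string unchanged because there is no \Q to pair with, and only the second pass turns it into `ab`. That is the kind of split responsibility described above.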
If you have a patch to make the code cleaner without breaking it I'd be happy to incorporate it.
Cheers, Edi.