Hello Edi,
I've followed your list of suggestions and am sending you the patch [note: ASDF recognizes the new system as :cl-ppcre-testing]. The parser extension is now user-controllable through the *ALLOW-NAMED-REGISTERS* switch, and the changes are documented in the source and in the HTML doc.
I've also discovered a subtle problem - according to the *ALLOW-QUOTING* documentation:
* (let ((cl-ppcre:*allow-quoting* t))
(cl-ppcre:scan "^\\Qa+\\E$" "a+"))
0
2
#()
#()
but my SBCL simply returns NIL. It will be immediately obvious what's happening from the following code:
(let ((cl-ppcre:*allow-named-registers* t))
(cl-ppcre:scan "(?<reg>.*)" "abc"))
=> error
...
; (LOAD-TIME-VALUE (CL-PPCRE:CREATE-SCANNER "(?<reg>.*)"))
;
; caught ERROR:
; (during EVAL of LOAD-TIME-VALUE)
; Character 'r' may not follow '(?<' at position 3 in string "(?<reg>.*)"
; ==>
; (CL-PPCRE:SCAN (LOAD-TIME-VALUE (CL-PPCRE:CREATE-SCANNER "(?<reg>.*)")) "abc")
...
The SCAN function has a compiler macro that precompiles constant regex strings at load time. But LOAD-TIME-VALUE (of course) knows nothing about runtime bindings that affect the creation of the scanner closure. Since compiler macros may or may not get expanded, what happens is implementation-dependent. This code is likely to work in an interpreted REPL (SBCL compiles all forms by default, hence it doesn't work here), but is less likely to work when compiled. The situation probably affects more special variables than the two mentioned.
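The effect can be reproduced without CL-PPCRE. The following is only a minimal sketch, meant to be entered form by form at the REPL; *FLAG*, MAKE-THING and DEMO are made-up names for illustration and are not part of the library:

```lisp
;; Hypothetical stand-in for a special variable such as
;; *ALLOW-NAMED-REGISTERS*.
(defvar *flag* nil)

;; Hypothetical stand-in for CREATE-SCANNER: the result depends on the
;; dynamic binding of *FLAG* at the moment of the call.
(defun make-thing ()
  (if *flag* :on :off))

(defun demo ()
  (let ((*flag* t))
    ;; In compiled code the first form is evaluated once, before DEMO is
    ;; ever called, i.e. outside of the LET binding above. The second
    ;; form is evaluated at run time, inside the binding.
    (list (load-time-value (make-thing))
          (make-thing))))

(demo)
```

When DEMO is compiled, the two list elements differ, because only the second call sees *FLAG* bound to T. An interpreter that evaluates LOAD-TIME-VALUE forms on each call may return the same value twice, which is the same implementation dependence as described above.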
Again, this is a rather subtle problem, and an unsuspecting user can get quite puzzled by it. I can think of the following remedies:
1. Clearly mention the pitfall in the doc and warn users to always use CREATE-SCANNER explicitly when binding special variables that affect closure generation. They can even use LOAD-TIME-VALUE, provided that the desired binding is inside it.
2. Don't use LOAD-TIME-VALUE in the SCAN compiler macro (I think there are more similar places that would have to be fixed too, but I haven't investigated them), but rather some kind of "FIRST-TIME-VALUE" - I mean some simple sort of memoization which would compute a scanner closure when it is needed for the first time and remember it afterwards. This would fix the problem with binding of specials (safe only for constant values, though, as only the binding encountered the first time would be remembered and effective). It would also have the effect of spreading closure creation through program execution time. This could sometimes be seen as a benefit, e.g. when a program uses lots of constant regexes which cause a noticeable start-up pause while they are compiled at load time (hypothetically; I haven't run across such a case).
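To make the two remedies above more concrete, here is a rough sketch of both. It assumes the patched CL-PPCRE with *ALLOW-NAMED-REGISTERS* is loaded. FIRST-TIME-VALUE is only the working name used above, not an existing operator, and *REG-SCANNER* and SCAN-NAMED are made-up names as well:

```lisp
;; Remedy 1: create the scanner explicitly while the special is bound.
;; CREATE-SCANNER is called directly here, so the SCAN compiler macro
;; plays no part in building the closure.
(defparameter *reg-scanner*
  (let ((cl-ppcre:*allow-named-registers* t))
    (cl-ppcre:create-scanner "(?<reg>.*)")))

(cl-ppcre:scan *reg-scanner* "abc")

;; Remedy 2 (hypothetical): evaluate FORM the first time control
;; reaches this place and reuse the cached value afterwards.
;; The cache is a cons created at load time: the CAR says whether the
;; value has been computed, the CDR holds it. Not thread-safe.
(defmacro first-time-value (form)
  (let ((cell (gensym "CELL")))
    `(let ((,cell (load-time-value (cons nil nil))))
       (unless (car ,cell)
         (setf (cdr ,cell) ,form
               (car ,cell) t))
       (cdr ,cell))))

;; The scanner is now built on the first call, inside the binding.
(defun scan-named (string)
  (let ((cl-ppcre:*allow-named-registers* t))
    (cl-ppcre:scan (first-time-value
                    (cl-ppcre:create-scanner "(?<reg>.*)"))
                   string)))
```

As noted in item 2, whichever binding is in effect on the first call is the one that sticks, so this is only safe when the special always has the same value at that place.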
Maybe there are other possibilities as well; that's why I have only mentioned this issue and haven't done anything to fix it.
I hope this helped.
Regards,
Ondrej
[Cc to mailing list.]
Hi Ondrej,
On Sat, 17 Mar 2007 00:31:46 +0100, "Ondrej Svitek" <ondrej.svitek@gmail.com> wrote:
> I've written a little extension to your wonderful CL-PPCRE library -
> support for named registers and back-references. I don't know if
> Perl has them (never used it), but ACL does and they proved useful
> for me in certain situations.
>
> [...]
>
> Feel free to incorporate this change, if you like it. Or not, if not
> :)
Thanks for the code. I'd be interested to incorporate this, but for
that I'd like you to do the following:
1. Send a "unified diff" (diff -u) of your changes instead of a full
tarball.
2. Make sure to (if necessary) update all docstrings of functions that
changed their behaviour and to add docstrings for functions,
classes, or slots you added.
3. Add a user-visible switch to turn this new behaviour on or off, so
users can opt to have the old, Perl-compatible syntax instead. The
default should be off.
4. Update the HTML documentation accordingly.
Thanks in advance,
Edi.