On Tue, 23 Jun 2026 20:47:32 +0200, Marco Antoniotti said:
Hi
I am sure I missed something in the CL-PPCRE documentation, but here it is.
I am fooling around with a C tokenizer and I am running into the following problem.
The string esre holds a regex matching the C escape sequences (also used by Perl, etc.):
CL Prompt > esre
"(\\\\(['\\\"\\?\\\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))"
If I use CL-PPCRE:REGEX-REPLACE-ALL I get the following (I tried to boil it down):
CL Prompt > (regex-replace-all "\\$<ES>" "/$<ES>/" esre)
"/(\\(['\\\"\\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))/"
T
Apart from the bounding slashes, I expected CL-PPCRE:REGEX-REPLACE-ALL to return esre unchanged, not a shorter string.
What exactly (backslash escaping for sure) am I missing?
It looks like an undocumented feature of the replacement string: \\ is converted to a single backslash by CL-PPCRE::BUILD-REPLACEMENT-TEMPLATE. In your call ESRE is the third argument to REGEX-REPLACE-ALL, i.e. the replacement, so its backslashes go through that conversion. That means the result will always be shorter when \\ is present in the replacement string, and if you want \\ in the output you'll need to use \\\\ (i.e. 8 backslashes in the Lisp syntax of that string!).

To make it more confusing, if \ is followed by anything other than \, &, `, ', a digit or {digits}, then it is kept as a single literal backslash, i.e. \a stays \a.

--
Martin Simmons
LispWorks Ltd
http://www.lispworks.com/
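A minimal sketch of the backslash rules described above, assuming only what the reply states. MODEL-REPLACEMENT-BACKSLASHES is a hypothetical helper, not CL-PPCRE code: it models just what happens to backslashes in a replacement string, and deliberately signals an error on \&, \`, \', \N or \{N}, which CL-PPCRE treats as references to the match or to registers rather than literal text.

```lisp
;;; Hypothetical helper, NOT part of CL-PPCRE: a simplified model of the
;;; replacement-string backslash rules (\\ -> \, any other \x kept as-is).
;;; Escapes that refer to the match (\& \` \' \N \{N}) are out of scope.
(defun model-replacement-backslashes (replacement)
  "Hypothetical model: return REPLACEMENT with its backslashes collapsed
the way the rules above describe. Not CL-PPCRE code."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length replacement)))
      (loop while (< i n)
            do (let ((c (char replacement i)))
                 (if (and (char= c #\\) (< (1+ i) n))
                     (let ((next (char replacement (1+ i))))
                       (cond ((char= next #\\)
                              ;; \\ collapses to a single backslash.
                              (write-char #\\ out)
                              (incf i 2))
                             ((or (find next "&`'{") (digit-char-p next))
                              ;; \& \` \' \N \{N} refer to the match; not modeled.
                              (error "Escape \\~C not modeled by this sketch." next))
                             (t
                              ;; Any other \x: the backslash is kept literally.
                              (write-char #\\ out)
                              (incf i))))
                     (progn
                       ;; Ordinary character, or a lone trailing backslash.
                       (write-char c out)
                       (incf i))))))))

;; The esre string from the question, in the same Lisp syntax.
(defparameter *esre* "(\\\\(['\\\"\\?\\\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))")

(model-replacement-backslashes *esre*)
;; => "(\\(['\\\"\\?\\abfnrtv]|[0-7]{1,3}|x[a-fA-F0-9]+))"

;; 8 backslashes in Lisp syntax = 4 characters; 2 come out.
(model-replacement-backslashes "\\\\\\\\")
;; => "\\\\"
```

Applied to esre, the model reproduces the string between the slashes in the REPL output above: each \\ collapses to \, while \" and \? pass through unchanged, so the result is two characters shorter.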