On Sun, 17 Jul 2005 20:02:05 -0400, pete-cl-ppcre@kazmier.com wrote:
How hard would it be to modify cl-ppcre to work on byte vectors instead of strings? I'm trying to get better performance when parsing large log files. Most of the time spent processing the logs is wasted on creating strings. I want to use read-sequence on a stream opened with an element type of unsigned-byte to avoid that overhead. Of course, this means I need a regexp library that can handle byte vectors.
As a newbie, is it even worth hacking cl-ppcre to use byte vectors, or is the difficulty level too high? I am also considering learning an FFI and just writing an interface to a standard C regexp library that works with bytes. However, if I can use cl-ppcre, I'd prefer to, as it's written in CL.
Hi Pete!
If I'm not mistaken, this has already been done. I seem to remember that someone patched CL-PPCRE to work on arbitrary sequences for the CLIMACS project. If you can't find it in the CLIMACS sources, which should be online somewhere, you could ask Robert Strandh; he should know about it. Google will find his homepage. There may also be an earlier discussion of this topic in the archives of this mailing list.
Sorry that I can't be more helpful at the moment, but I'm in a hurry.
Cheers, Edi.
PS: In case you have to do it yourself, it shouldn't be /too/ hard, but maybe a bit tedious.