Hi Edi,
How hard would it be to modify cl-ppcre to work on byte vectors instead of strings? I'm trying to get better performance when parsing large log files. Most of the time spent processing the logs is wasted on creating strings. I want to use read-sequence with unsigned-byte as the element type to avoid that processing. Of course, this means I need a regexp library that can handle byte vectors.
As a newbie, is it even worth hacking cl-ppcre to use byte vectors, or is the difficulty level too high? I am also considering learning FFI and just writing an interface to a standard C regexp library that works with bytes. However, if I can use cl-ppcre, I'd prefer that, as it's written in CL.
Thanks, Pete
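For readers following the thread, the byte-oriented reading described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names read-file-bytes and count-lines-in-bytes are hypothetical, the file is assumed to fit in memory, and lines are assumed to end with a linefeed (byte 10). It uses only standard Common Lisp and does not touch cl-ppcre.

```lisp
;; Hypothetical helper: slurp a whole file as raw octets, no character decoding.
(defun read-file-bytes (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    ;; For a binary stream, FILE-LENGTH is measured in stream elements (octets here).
    (let ((buffer (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence buffer in)
      buffer)))

;; Hypothetical helper: count linefeed bytes without ever building a string.
(defun count-lines-in-bytes (bytes)
  (count 10 bytes))
```

Something like (count-lines-in-bytes (read-file-bytes "access.log")), with "access.log" standing in for a real log file, would then scan the log purely at the byte level. Matching regular expressions against such a vector is the part that still needs library support.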
On Sun, 17 Jul 2005 20:02:05 -0400, pete-cl-ppcre@kazmier.com wrote:
> How hard would it be to modify cl-ppcre to work on byte vectors instead of strings? I'm trying to get better performance when parsing large log files. Most of the time spent processing the logs is wasted on creating strings. I want to use read-sequence with unsigned-byte as the element type to avoid that processing. Of course, this means I need a regexp library that can handle byte vectors.
> As a newbie, is it even worth hacking cl-ppcre to use byte vectors, or is the difficulty level too high? I am also considering learning FFI and just writing an interface to a standard C regexp library that works with bytes. However, if I can use cl-ppcre, I'd prefer that, as it's written in CL.
Hi Pete!
If I'm not mistaken this has already been done. I seem to remember someone patched CL-PPCRE to work on arbitrary sequences and this was done for the CLIMACS project. If you can't find it in the CLIMACS sources which should be online somewhere you could ask Robert Strandh - he should know about it. Google will find his homepage. Maybe there's also an initial conversation about this topic in the archives of this mailing list.
Sorry that I can't be more helpful at the moment but I'm in a hurry.
Cheers, Edi.
PS: And in case you have to do it yourself: It shouldn't be /too/ hard but maybe a bit tedious.
On Mon, 18 Jul 2005 02:09:07 +0200, Edi Weitz <edi@agharta.de> wrote:
> If I'm not mistaken this has already been done. I seem to remember someone patched CL-PPCRE to work on arbitrary sequences and this was done for the CLIMACS project.
Googling for "CLIMACS CL-PPCRE" revealed this one:
http://common-lisp.net/pipermail/climacs-devel/2005-May/000210.html
On Mon, Jul 18, 2005 at 02:09:07AM +0200, Edi Weitz wrote:
> If I'm not mistaken this has already been done. I seem to remember someone patched CL-PPCRE to work on arbitrary sequences and this was done for the CLIMACS project. If you can't find it in the CLIMACS sources which should be online somewhere you could ask Robert Strandh - he should know about it. Google will find his homepage. Maybe there's also an initial conversation about this topic in the archives of this mailing list.
> Sorry that I can't be more helpful at the moment but I'm in a hurry.
Great! I should have read the archives before posting (sorry). I'll investigate further. Thanks for the suggestions!
Pete
cl-ppcre-devel@common-lisp.net