Hi!
On Sat, 29 Jan 2005 16:16:50 +0000, Lawrence Mitchell wencel@gmail.com wrote:
I'm looking at trying to use cl-ppcre to add regular expression support to the Climacs editor (URL: http://common-lisp.net/project/climacs/).
Sounds cool... :)
A few things spring to mind:
o Licensing differences. Climacs is released under the LGPL, while cl-ppcre is under a BSD-style license. I don't think this is a problem (as far as I can tell from reading the licenses), but if you know otherwise, I'd be grateful to hear.
I don't see a problem but IANAL. It is my understanding that the BSD license basically means that you can do with CL-PPCRE whatever you want as long as you credit my original work - this is what I intended. So you could, e.g., incorporate it into an LGPL project without a problem. Of course, the original CL-PPCRE will still be available under the old license.
o How to best match up cl-ppcre's matching on strings with climacs' idea of a buffer.
A climacs buffer is a sequence of objects (which may or may not be characters, but we'll ignore that for the moment). Now, I can easily generate a string of the contents of the buffer, and call SCAN (or whatever) on the string. However, this is going to be slow for large buffers (especially if we find something just after point, we've still constructed the whole buffer-string).
The "obvious" solution to this is to use streams instead (probably), so, I wonder if cl-ppcre would be amenable to something like this?
Well, supporting all of Perl's regex facilities implies that you need to have random access to the target - I don't think you can fit streams into this picture. I'm not a CS guy but my understanding is that CL-PPCRE is based on an NFA and you can't change that easily. You can build a DFA that implements a subset of CL-PPCRE and that would work with streams but that wouldn't be CL-PPCRE anymore... :)
Now, using other kinds of structures (like, say, your buffers) that aren't strings but are random-access - that wouldn't be /too/ hard. It would involve going through three or four files and changing SCHAR to something else, but basically I don't really see a problem. However, as CL-PPCRE has a reputation for being quite fast I wouldn't want to sacrifice this for greater flexibility (buffers instead of strings, arbitrary objects instead of characters - you name it). I think the right way to do it would be to offer the ability to build different versions of CL-PPCRE based on *FEATURES*, i.e. at compile time you decide whether you want a fast regex engine for strings or a not-so-fast regex engine for, say, buffers. Would that be OK for you?
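To make the *FEATURES* idea a bit more concrete, here is a rough sketch of what such a compile-time switch could look like. It is not taken from the CL-PPCRE sources: the feature name :REGEX-GENERIC-TARGETS, the macro TARGET-CHAR and the function LITERAL-MATCH-P are all made up for illustration, and ELT merely stands in for whatever accessor a buffer type would really need.

```lisp
;; Hypothetical sketch, not actual CL-PPCRE code.
;; If :REGEX-GENERIC-TARGETS is on *FEATURES* when this file is compiled,
;; element access goes through the generic ELT; otherwise it uses the
;; fast SCHAR, which only works on simple strings.
(defmacro target-char (target index)
  "Expand into the element accessor chosen at compile time."
  #+regex-generic-targets `(elt ,target ,index)
  #-regex-generic-targets `(schar ,target ,index))

(defun literal-match-p (literal target start)
  "Return true if the simple string LITERAL occurs in TARGET at index START."
  (let ((end (+ start (length literal))))
    (and (<= end (length target))
         (loop for i from 0 below (length literal)
               always (char= (schar literal i)
                             (target-char target (+ start i)))))))
```

The point of doing this with a macro and reader conditionals is that the string build pays nothing for the flexibility: the SCHAR call is what ends up in the compiled code, just as if it had been written there directly.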
On another, somewhat unrelated note: one thing one would like to do is regexp search and replace. Now, if I know beforehand how many groups the user is going to put into their regexp, I can use REGISTER-GROUPS-BIND to get at the substring matches via variables. I guess there isn't any way to do this without knowing the input beforehand. So is the idea then just to use SCAN and then manually grab the substrings via REG-STARTS and REG-ENDS, or have I missed something obvious?
No, I don't see a better way. If you don't know the regex then you have to check the return value and see how long the register arrays are.
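A minimal sketch of that approach, assuming CL-PPCRE can be loaded via ASDF. The function name MATCH-WITH-REGISTERS is made up for this example; SCAN and its four return values are the real API. A register that did not take part in the match shows up as NIL in the arrays, so the sketch keeps NIL in that position.

```lisp
;; Assumption: ASDF is available and knows where to find CL-PPCRE.
(require :asdf)
(asdf:load-system :cl-ppcre)

(defun match-with-registers (regex target &key (start 0))
  "Scan TARGET for REGEX beginning at START.
Returns two values: the matched substring and a list with one entry
per register (a substring, or NIL if that register did not match).
Returns NIL if there is no match at all."
  (multiple-value-bind (match-start match-end reg-starts reg-ends)
      (cl-ppcre:scan regex target :start start)
    (when match-start
      (values (subseq target match-start match-end)
              ;; The length of REG-STARTS is the number of groups in the
              ;; regex, which is how we cope with not knowing it up front.
              (loop for s across reg-starts
                    for e across reg-ends
                    collect (and s (subseq target s e)))))))
```

For example, (match-with-registers "(\\d+)-(\\d+)" "pages 12-34") returns "12-34" and the list ("12" "34"). If only the strings are needed and not the positions, SCAN-TO-STRINGS returns much the same information directly.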
Cheers, Edi.