On 23/10/06, Brad Beveridge brad.beveridge@gmail.com wrote:
On 23/10/06, Brad Beveridge brad.beveridge@gmail.com wrote:
If it's not nailed down, steal it. That's my motto. Vial originally started life as a fun way to learn Common Lisp, which meant doing some stuff myself that maybe I shouldn't have. Particularly the buffer stuff. It might be worthwhile moving to Flexichain (http://common-lisp.net/project/flexichain/), which is the underlying implementation that Climacs uses for buffer management.
I'm going to look at it harder tonight, and if I do decide to change then it will probably mean big changes for Vial's internal code - but probably changes that clean the code up quite a bit. If we do move to flexichain, the first step will be to abstract out all the code that currently plays with the internals of buffer or cursor objects, then re-implement those functions using Flexichain.
Thoughts?
Cheers Brad
I've looked at Flexichain tonight. It is a good idea, and worth moving to. However I want to get something done, and changing out the buffer, cursor and undo system is a good chunk of work. So Flexichains are not going to happen anytime soon. However, we will start tightening up abstraction. Right now lots of modules directly use the internal bits of various classes, that is going to stop. The next few changes I make will probably be with cleaning up intermodule interfaces.
Cheers Brad
Flexichain is a buffer API used by Climacs and refactored into a library. Flexichain provides CLOS classes and generic methods that abstract a sequence that has insert/delete/get functions. Cursorchain provides the concept of a point or mark into a Flexichain, with the semantics that you may insert or delete at the cursor. It also moves cursors so that they maintain their position when text is inserted or deleted. Flexi* also provides an implementation for the protocol in the form of a circular gap buffer. It is obvious that we could wrap our own List of lines method with Flexi* protocols, so I will just discuss the advantages/disadvantages of a gap buffer implementation and a list of lines implementation. Note - flexichains only provides for accessing single elements in an abstract way, but if you wish to write implementation specific code you can make performance improvements.
The main slow operations in a gap buffer are 1) Moving the gap 2) Moving by lines as the protocol does not know about lines.
I timed 1 & 2 on Clisp on a 2Ghz Pentium-M: a middle of the road CPU and the slowest Lisp. I didn't actually use Flexichains, but did the kind of operations it would need to do. Moving the gap for a 1Mb buffer takes ~0.3 of a second. Scanning a 1Mb buffer for the whole length of the buffer takes ~0.5 of a second. This I think is absolute worst case behaviour, on a compiled Lisp, on a decent machine, on a reglar sized file (even on a big 300K file) these operations will take no time at all. I have no idea how a LineList implementation would stack up, but I suspect it would be slightly worse for scanning for chars. LL = List of lines FC = Flexichain
Random Access: * FC lets you treat the whole buffer as an array, providing O(1) access. * LL is worst case O(n), though average case can be better. (N = num lines in buffer)
Sequential Access: * Both are O(1). * LL is O(n) for the first access (N = num lines in buffer)
Moving by Line: * LL is O(1) * FC is O(n) (N = chars in line) * Note, here N is small (< 100) and moving by char is VERY fast
Moving to specific line: * LL is O(n) (N = line to move to) * FC is O(n*m) (N = line to move to, M = average chars per line) * Note, here M is small (< 100) and moving by char is VERY fast
Converting buffer to a string (read only access): * FC is O((n*m)/2), it needs to simply move the gap to the end of the buffer, a very fast and non-consing operation. (N = line to move to, M = average chars per line) * LL is O(n), but must make an entire copy of the buffer, consing horribly. (N = number of lines in buffer) ** FC is the CLEAR winner here **
Converting part of a buffer to a read only string: * Both are O(n), where N is chars in the destination string. * FC moves the gap and won't need to cons. * It may be better to cons up a new string rather than move the gap. * You will only need to move the gap to just before or after the sequence you are taking out. * The gap will need to move back when you continue editing. * LL will need to cons for any string that spans lines.
Converting part of a buffer to a new string: * Both about O(n) + book keeping. N is chars in the destination string. Consing the new string is the biggest issue.
Sequential inserting into buffer: * Both are O(1), plus book keeping (move the gap, or find the line first)
Marks: * A clever mark implementation for LL will have no cost for marks when doing ops. * FC will be O(n) every time the gap buffer is moved. (N = number of marks to move)
Use cases: All in all, both methods are similar orders of magnitude in most operations, you could pick either and be satisifed with performance for most operations. The only clear difference in operations is converting to read only strings, and only then when the read-only string involves large sections of the buffer. So, at what times would we like to deal with large strings? Saving: * LL, needs to loop over lines and use WRITE-LINE * FC, move the gap and save the array in one call
Searching the whole buffer: * LL, needs to loop over lines * FC, move the gap, one call to CL-PPCRE * FC will be much easier for multiline search strings * On text changing (active highlighting): * much the same for both, need to rescan the text as a string * LL, always keeps the line as a plain string, just pass it to CL-PPCRE * FC, need to reconstruct the line: move gap to end of line, pass to CL-PPCRE, move gap back * Neither matters that much, it is happening at human speed - so just whatever works.
Coping from one buffer/register to another: * FC, move the gap and do a single call to string copy * LL, loop over lines, copying to new buffer * FC is easier for partial lines - it has no concept of lines. * LL needs to do fiddly work at the start/end of a region to copy partial lines.
Partial lines: * FC has no concept of lines, so there is no special handling of "lines" you just copy chars around. * LL has "lines" ingrained in it. Any time you want to access text, you need to * consider you may only want to deal with a partial line * account for moving over line chars and do something special
Conclusion: I think that using FC makes for a simpler system, that may have performance gains and will certainly not have large performance losses. I now believe that switching to Flexichains is a prudent move. Unfortuantly, it will mean a re-write of perhaps 70% of Vial, but that 70% should shrink significantly. About the only thing that can stay is the command handling and ncurses input layer. I will write up a plan on how to move to FC soon.
Thoughts?
Cheers Brad