[vial-devel] Re: Flexichain and Cursorchain

24 Oct 2006

      On 23/10/06, Brad Beveridge <brad.beveridge@gmail.com> wrote:
...
On 23/10/06, Brad Beveridge <brad.beveridge@gmail.com> wrote:
...
If it's not nailed down, steal it.  That's my motto.  Vial originally
started life as a fun way to learn Common Lisp, which meant doing some
stuff myself that maybe I shouldn't have.
Particularly the buffer stuff.
It might be worthwhile moving to Flexichain
(http://common-lisp.net/project/flexichain/), which is the underlying
implementation that Climacs uses for buffer management.
I'm going to look at it harder tonight, and if I do decide to change
then it will probably mean big changes for Vial's internal code - but
probably changes that clean the code up quite a bit.
If we do move to flexichain, the first step will be to abstract out
all the code that currently plays with the internals of buffer or
cursor objects, then re-implement those functions using Flexichain.
Thoughts?
Cheers
Brad
I've looked at Flexichain tonight.  It is a good idea, and worth
moving to.  However I want to get something done, and changing out the
buffer, cursor and undo system is a good chunk of work.  So
Flexichains are not going to happen anytime soon.  However, we will
start tightening up abstraction.  Right now lots of modules directly
use the internal bits of various classes, that is going to stop.  The
next few changes I make will probably be with cleaning up intermodule
interfaces.
Cheers
Brad
Flexichain is a buffer API used by Climacs and refactored into a
library.  Flexichain provides CLOS classes and generic methods that
abstract a sequence that has insert/delete/get functions.
Cursorchain provides the concept of a point or mark into a Flexichain,
with the semantics that you may insert or delete at the cursor.  It
also moves cursors so that they maintain their position when text is
inserted or deleted.
Flexi* also provides an implementation for the protocol in the form of
a circular gap buffer.  It is obvious that we could wrap our own List
of lines method with Flexi* protocols, so I will just discuss the
advantages/disadvantages of a gap buffer implementation and a list of
lines implementation.
Note - flexichains only provides for accessing single elements in an
abstract way, but if you wish to write implementation specific code
you can make performance improvements.

The main slow operations in a gap buffer are
1) Moving the gap
2) Moving by lines as the protocol does not know about lines.

I timed 1 & 2 on Clisp on a 2Ghz Pentium-M: a middle of the road CPU
and the slowest Lisp.  I didn't actually use Flexichains, but did the
kind of operations it would need to do.
Moving the gap for a 1Mb buffer takes ~0.3 of a second.  Scanning a
1Mb buffer for the whole length of the buffer takes ~0.5 of a second.
This I think is absolute worst case behaviour, on a compiled Lisp, on
a decent machine, on a reglar sized file (even on a big 300K file)
these operations will take no time at all.
I have no idea how a LineList implementation would stack up, but I
suspect it would be slightly worse for scanning for chars.
LL = List of lines
FC = Flexichain

Random Access:
 * FC lets you treat the whole buffer as an array, providing O(1) access.
 * LL is worst case O(n), though average case can be better. (N = num
lines in buffer)

Sequential Access:
 * Both are O(1).
 * LL is O(n) for the first access (N = num lines in buffer)

Moving by Line:
 * LL is O(1)
 * FC is O(n) (N = chars in line)
   * Note, here N is small (< 100) and moving by char is VERY fast

Moving to specific line:
 * LL is O(n)   (N = line to move to)
 * FC is O(n*m) (N = line to move to, M = average chars per line)
   * Note, here M is small (< 100) and moving by char is VERY fast

Converting buffer to a string (read only access):
 * FC is O((n*m)/2), it needs to simply move the gap to the end of the
buffer, a very fast and non-consing operation.  (N = line to move to,
M = average chars per line)
 * LL is O(n), but must make an entire copy of the buffer, consing
horribly.  (N = number of lines in buffer)
 ** FC is the CLEAR winner here **

Converting part of a buffer to a read only string:
 * Both are O(n), where N is chars in the destination string.
 * FC moves the gap and won't need to cons.
   * It may be better to cons up a new string rather than move the gap.
   * You will only need to move the gap to just before or after the
sequence you are taking out.
   * The gap will need to move back when you continue editing.
 * LL will need to cons for any string that spans lines.

Converting part of a buffer to a new string:
 * Both about O(n) + book keeping.  N is chars in the destination
string.  Consing the new string is the biggest issue.

Sequential inserting into buffer:
 * Both are O(1), plus book keeping (move the gap, or find the line first)

Marks:
 * A clever mark implementation for LL will have no cost for marks
when doing ops.
 * FC will be O(n) every time the gap buffer is moved.  (N = number of
marks to move)

Use cases:
 All in all, both methods are similar orders of magnitude in most
operations, you could pick either and be satisifed with performance
for most operations.
The only clear difference in operations is converting to read only
strings, and only then when the read-only string involves large
sections of the buffer.
So, at what times would we like to deal with large strings?
Saving:
 * LL, needs to loop over lines and use WRITE-LINE
 * FC, move the gap and save the array in one call

Searching the whole buffer:
 * LL, needs to loop over lines
 * FC, move the gap, one call to CL-PPCRE
 * FC will be much easier for multiline search strings
 * On text changing (active highlighting):
  * much the same for both, need to rescan the text as a string
  * LL, always keeps the line as a plain string, just pass it to CL-PPCRE
  * FC, need to reconstruct the line: move gap to end of line, pass to
CL-PPCRE, move gap back
  * Neither matters that much, it is happening at human speed - so
just whatever works.

Coping from one buffer/register to another:
 * FC, move the gap and do a single call to string copy
 * LL, loop over lines, copying to new buffer
 * FC is easier for partial lines - it has no concept of lines.
 * LL needs to do fiddly work at the start/end of a region to copy
partial lines.

Partial lines:
 * FC has no concept of lines, so there is no special handling of
"lines" you just copy chars around.
 * LL has "lines" ingrained in it.  Any time you want to access text,
you need to
  * consider you may only want to deal with a partial line
  * account for moving over line chars and do something special

Conclusion:
I think that using FC makes for a simpler system, that may have
performance gains and will certainly not have large performance
losses.
I now believe that switching to Flexichains is a prudent move.
Unfortuantly, it will mean a re-write of perhaps 70% of Vial, but that
70% should shrink significantly.  About the only thing that can stay
is the command handling and ncurses input layer.  I will write up a
plan on how to move to FC soon.

Thoughts?

Cheers
Brad