[slime-devel] Fuzzy completion

11 Jun 2004

      Hello Slime developers,

For the past while, I've been using a Mac OS X program called
Quicksilver, which is sort of a multi-purpose quick access tool.  It
features a fuzzy completion algorithm that I've quite grown to like.
For example I can hit Command-Space and type "liswp<return>", and it
will match and launch "LISpWorks Personal".

I decided I wanted to add something like this (only better :) to Slime,
to go alongside the existing completion algorithm.  It can't really be a
replacement as they are good at different things.  The existing one is
good for incrementally typing and completing a symbol.  The fuzzy one is
good for typing a short version of the symbol, running the completer,
and then picking the right result, which hopefully scored at or near the
top.

While I've attached a patch, this is more a request for comments than a
request to commit the code.  As I'll discuss, the Emacs interface is
incredibly lacking, mostly on account of this being my first time using
elisp for anything but configuration.  It's very ugly.  Also, the
completion core code in Swank should be cleaned up - it still looks
somewhat hacky.  Finally, everything needs documentation.

The elisp code has some chunks from isearch.el.  Since they are both
GPL, I figured this is okay.  Besides, it will probably change anyway.
:)

----------------------------------------------------------------------

The low level core of the algorithm are implemented in Swank, just like
the old completer.  The interface looks very similar:

; SLIME 2004-06-10
CL-USER> (setq *print-pretty* t)
T
CL-USER> (swank::completions "m-v-" "CL-USER")
(("multiple-value-bind"
  "multiple-value-call"
  "multiple-value-list"
  "multiple-value-prog1"
  "multiple-value-setq"
  "multiple-values-limit")
 "multiple-value-")
CL-USER> (swank::fuzzy-completions "mvb" "CL-USER")
(("multiple-value-bind" 26.588236 ((0 "m") (9 "v") (15 "b")))
 ("most-negative-double-float" 12.416667 ((0 "m") (11 "v") (17 "b")))
 ("most-positive-double-float" 12.416667 ((0 "m") (11 "v") (17 "b")))
 ("named-variable" 10.833333 ((2 "m") (6 "v") (11 "b")))
 ("*compile-verbose*" 10.666667 ((3 "m") (9 "v") (12 "b")))
 ("environment-variable" 10.555555 ((7 "m") (12 "v") (17 "b")))
 ("remove-symbol-profiler" 3.5 ((2 "m") (4 "v") (10 "b"))))

The return value is obviously a bit more complicated.  :)  The fields are
(completion score chunks), where "chunks" are a description of where the
completion matched.

There's a convenience function in Swank that is not used by the Emacs
interface to print this information in a human readable form.  I used
this to tune the scoring algorithm.  This shows a little more clearly
how the algorithm matches:

CL-USER> (swank::format-fuzzy-completions
          (swank::fuzzy-completions "mvb" "CL-USER"))
Multiple-Value-Bind         score    26.59  ((0 m) (9 v) (15 b))
Most-negatiVe-douBle-float  score    12.42  ((0 m) (11 v) (17 b))
Most-positiVe-douBle-float  score    12.42  ((0 m) (11 v) (17 b))
naMed-VariaBle              score    10.83  ((2 m) (6 v) (11 b))
*coMpile-VerBose*           score    10.67  ((3 m) (9 v) (12 b))
environMent-VariaBle        score    10.56  ((7 m) (12 v) (17 b))
reMoVe-symBol-profiler      score     3.50  ((2 m) (4 v) (10 b))
NIL

Here are some more examples, which might show why I think something like
this can be useful.  (There is an optional third argument to
swank::fuzzy-completions, limit.  This limits the number of results,
which is important for this completer since completing, say, "a" will
match many, many symbols.  This isn't currently used from Emacs
however.)

CL-USER> (swank::format-fuzzy-completions
          (swank::fuzzy-completions "stdout" "CL-USER" 5))
*STanDard-OUtpuT*            score    34.83  ((1 st) (5 d) (10 ou) (15 t))
moST-negative-DOUble-floaT   score    22.48  ((2 st) (14 dou) (25 t))
moST-positive-DOUble-floaT   score    22.48  ((2 st) (14 dou) (25 t))
leaST-positive-DOUble-floaT  score    22.45  ((3 st) (15 dou) (26 t))
leaST-negative-DOUble-floaT  score    22.45  ((3 st) (15 dou) (26 t))
NIL
CL-USER> (swank::format-fuzzy-completions
          (swank::fuzzy-completions "prp" "CL-USER" 5))
*PRint-Pretty*           score    23.83  ((1 pr) (7 p))
PRint-Profile-list       score    23.62  ((0 pr) (6 p))
*PRint-Pprint-dispatch*  score    23.48  ((1 pr) (7 p))
*step-PRint-Pretty*      score    20.59  ((6 pr) (12 p))
*trace-PRint-Pretty*     score    20.56  ((7 pr) (13 p))
NIL
CL-USER> (swank::format-fuzzy-completions
          (swank::fuzzy-completions "norm-df" "CL-USER" 5))
least-positive-NORMalized-Double-Float  score    32.31  ((15 norm) ...)
least-negative-NORMalized-Double-Float  score    32.31  ((15 norm) ...)
NIL
CL-USER> (swank::format-fuzzy-completions
          (swank::fuzzy-completions "udc" "CL-USER" 1))
Update-instance-for-Different-Class  score    26.30  ((0 u) (20 d) (30 c))
NIL
CL-USER> (swank::format-fuzzy-completions
          (swank::fuzzy-completions "urc" "CL-USER" 1))
Update-instance-for-Redefined-Class  score    26.30  ((0 u) (20 r) (30 c))
NIL

Despite being expensive (it tries all combinations of matches up to a
relatively high cutoff per symbol to maximize the score) and not
optimized at all, the completion speed doesn't seem bad at all on
compiled Lisps on my 500mhz Powerbook.  I haven't tried it on
interpreted Lisps, but I imagine it wouldn't be too fun.

There are some scoring configuration parameters in swank.lisp that are
mostly self-explanitory.  Adjust to fit your own symbol naming schemes;
these seemed to be the five-minute opinion of #lisp.  Some way to set
these from Emacs would probably be nice in the future.

(defparameter *fuzzy-completion-symbol-prefixes* "*+-%&?<")
(defparameter *fuzzy-completion-symbol-suffixes* "*+->")
(defparameter *fuzzy-completion-word-separators* "-/.")

The scoring algorithm is currently completely static.  It would probably
be a nifty project to implement some machine learning on it.

----------------------------------------------------------------------

Now, the Emacs interface.  Well, it kind of sucks.  I bound
slime-fuzzy-complete-symbol to C-c M-i in the Slime map.  The problem is
that since the first letters of the short form aren't necessarily the
first letters of the matches, the existing Emacs completion system will
fail miserably with this algorithm.

What I did was, if there was more than one completion, pop up a window
with the results (formatted nicely like above, but with a bold face
instead of upcasing).  Point starts on the highest scoring result, and
the result is immediately placed into the Lisp buffer in place of the
short form.  Then the user can hit 'n' and 'p' to navigate the list of
completions.  As they do, the replacement is updated.  To pick a
completion and keep it, hit Return.  To abort and revert to the short
form, hit almost anything else.  It is a recursive edit, like isearch,
so doing anything else in Emacs will abort the completion.  It's also
implemented very poorly.

I don't really like this, but I'm not sure what a better option is.  If
you have the completion buffer come up normally, it will have embedded
state from the Lisp source buffer.  If the Lisp source buffer changes
before a completion is picked, it may modify the wrong position.  And
again, it can't really work like the (stateless) normal completion.

----------------------------------------------------------------------

So anyway, I'd appreciate it if people could try it out.  If anyone has
good interface ideas (or even better, would implement them :) I'd like
to hear them, since I'm not so happy with what I've got.

Thanks,
-bcd
-- 
*** Brian Downing <bdowning at lavos dot net>