Hey, aren't you supposed to be writing your book? ;-)
What do you guys mean by "race condition"? (In the context of SLIME, that is--I know what a race condition is in general.)
I guess you've seen my recent description of the push-down automaton that keeps track of protocol state. This can be seen as a representation in Emacs of relevant parts of the Lisp stack. For example, when we push into the EVALUATING state in Emacs we ask Lisp to call this (abbreviated) function:
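  (defslimefun eval-string (string buffer-package)
    (let (ok result)
      (unwind-protect
           ;; OK becomes true only if the evaluation returns normally.
           (progn (setq result (eval (read-form string)))
                  (setq ok t))
        ;; Cleanup form: runs on normal return and on non-local exit.
        (send-to-emacs (if ok `(:ok ,result) '(:aborted))))))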
Before the corresponding Lisp stack frame returns it will send either (:ok RESULT) or (:aborted) to Emacs. When Emacs receives either message, it will pop its EVALUATING state off the stack. In this way the two stacks stay synchronized, and Emacs knows whether it is in the debugger, or waiting on an RPC result, etc.
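To make the bookkeeping concrete, here is a minimal sketch of the Emacs side, written in Common Lisp to match the function above. Everything in it is illustrative: *state-stack*, push-state, pop-state and handle-message are made-up names, and the (:debug) and (:debug-return) messages are assumptions for the example rather than a description of the real protocol.

```lisp
;; Hypothetical sketch; none of these names are taken from SLIME.
(defvar *state-stack* '()
  "Emacs-side view of the protocol state; the first element is current.")

(defun push-state (state)
  (push state *state-stack*))

(defun pop-state (expected)
  ;; Assert that the state being left is the one we believe we are in.
  (assert (eq (first *state-stack*) expected) ()
          "Out of sync: expected ~S, stack is ~S" expected *state-stack*)
  (pop *state-stack*))

(defun handle-message (message)
  "Update the state stack for one message received from Lisp."
  (ecase (first message)
    ((:ok :aborted) (pop-state :evaluating))   ; eval-string's frame returned
    (:debug         (push-state :debugging))   ; assumed: Lisp entered the debugger
    (:debug-return  (pop-state :debugging))))  ; assumed: Lisp left the debugger
```

A session in which an evaluation drops into the debugger and is then aborted would look like this:

```lisp
(push-state :evaluating)            ; Emacs asks Lisp to call eval-string
(handle-message '(:debug))          ; the evaluation signalled an error
(handle-message '(:debug-return))   ; the user left the debugger
(handle-message '(:aborted))        ; eval-string's cleanup form ran
;; *state-stack* is empty again
```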
The protocol is free of race conditions provided that at any given time only one of Lisp and Emacs is able to cause a state change (or otherwise perform a state-dependent operation). If both are able to, they could act at the same time, each pushing and popping its stack in a different order, and the two stacks would lose synchronization.
That's the sort of race condition we mean. When the stacks go out of sync, chaos ensues (or would if not caught by assertions).
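Here is a toy illustration of that kind of race. It is only a sketch: apply-event and replay-events are hypothetical helpers, and the scenario (Emacs sending a new request at the moment Lisp enters the debugger) is an assumed example, not a documented failure. Each side applies its own event first and the other side's event when it arrives.

```lisp
;; Hypothetical sketch; these helpers are not part of SLIME.
(defun apply-event (stack event)
  "Return STACK after EVENT, which is (:push STATE) or (:pop STATE)."
  (destructuring-bind (op state) event
    (ecase op
      (:push (cons state stack))
      (:pop  (if (eq (first stack) state)
                 (rest stack)
                 (error "Out of sync: cannot pop ~S from ~S" state stack))))))

(defun replay-events (stack events)
  "Apply EVENTS to STACK in order and return the resulting stack."
  (reduce #'apply-event events :initial-value stack))

(let ((emacs-event '(:push :evaluating))   ; Emacs sends a new request
      (lisp-event  '(:push :debugging)))   ; Lisp enters the debugger
  (list (replay-events '() (list emacs-event lisp-event))    ; Emacs's view
        (replay-events '() (list lisp-event emacs-event))))  ; Lisp's view
;; => ((:DEBUGGING :EVALUATING) (:EVALUATING :DEBUGGING))
```

The two sides now disagree about the current state. A later (:pop :debugging) succeeds on one side and trips the out-of-sync check on the other.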
We've considered three ways to cope with this:
Ensure that only one of Emacs and Lisp is allowed to talk at a time. This worked well in the beginning, but it's now breaking down.
Remove enough state from Emacs so that races can be tolerated. This is an appealing ideal, but no specifics have been discussed.
Add a mechanism to the protocol to detect out-of-order events and resolve them in some deterministic way. This has been the subject of recent mails.
The thing that seems hard about removing state from Emacs is that some operations are state-dependent. For example:
If Lisp is "busy" evaluating an RPC, we won't bother with things like fetching arglists. That would just create a backlog of requests that probably aren't interesting by the time they're done. (Though Helmut's idea of sending TCP-OOB requests to be served by a signal handler sounds like fun :-)
Our debugger wants to know if Lisp is sitting in the debugger loop. If it has started doing something else then the backtrace we present in our debug buffer is wrong.
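Both examples amount to a check against the current state before acting. A rough sketch of what that looks like, assuming the state is kept as a list with the current state first; the function names and the (:arglist NAME) request format are invented for illustration:

```lisp
;; Hypothetical sketch; names and message format are assumptions.
(defun lisp-busy-p (state-stack)
  "True if Lisp is currently evaluating a request."
  (eq (first state-stack) :evaluating))

(defun debugger-current-p (state-stack)
  "True if Lisp is sitting in the debugger loop, so the backtrace is valid."
  (eq (first state-stack) :debugging))

(defun maybe-request-arglist (state-stack name send-fn)
  "Call SEND-FN with an arglist request for NAME unless Lisp is busy.
Return true if the request was sent."
  (unless (lisp-busy-p state-stack)
    (funcall send-fn (list :arglist name))
    t))
```

Without the state stack, Emacs has nothing to base either check on, which is the difficulty with removing it.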
Possibly these could be solved in some simple and clever way.
I'm not sure right now whether all of our race conditions span short time frames (as in network latency). There aren't any documented cases of them occurring in the wild as far as I know. Still, we must have a correct protocol (non-robust ones are well known to piss people off royally), and already people are running SLIME with Emacs and Lisp on separate machines so latency isn't necessarily on the order of a millisecond.
-Luke