Luke Gorrie luke@member.fsf.org writes:
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
What do you think about Gary Byers' comment that multi-threading support could be seen as a special case of multi-session support?
Would it be easier if each thread had its own connection to Emacs? Each connection with its own state machine and buffers etc.
I hadn't taken it in before -- I reread it and it does sound appealing. With each Lisp thread having a separate socket, they shouldn't have to do any synchronization amongst themselves for our benefit.
The first question that occurs is how to do the flow control. I wonder if it could be done neatly by putting the "foreground" Emacs network-process into asynchronous/process-filter mode, and switching the others to synchronous I/O and not reading from them?
Hmm, not sure if I understand the problem. I think, if every thread has its own connection with a separate state machine and associated buffers, we can do almost everything like we do now. We just have to make sure that we do it in the right buffer, something like per-session variables. And of course, we need a way to allow the thread to initiate the new connection.
Also, by allowing more things to happen asynchronously, it seems like we're getting more race-conditions and it could complicate the state machine quite a lot. I'm starting to wonder if we could move some of the state out of Emacs and into Lisp alone, so that it doesn't have to be synchronized. I haven't got any good examples yet.
Yes, if we have only one state machine on the Emacs side, it's probably better to move the protocol checking to the Lisp side.
BTW, I notice you replied privately -- better not to air too much dirty laundry on the list, do you think?
This was not intended, I must have pressed the wrong key.
Helmut.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
Hmm, not sure if I understand the problem. I think, if every thread has its own connection with a separate state machine and associated buffers, we can do almost everything like we do now. We just have to make sure that we do it in the right buffer, something like per-session variables. And of course, we need a way to allow the thread to initiate the new connection.
Perhaps I didn't understand the idea. I was thinking of "if ten threads hit the debugger at once, we don't want the user to be preempted with ten different debugger buffers".
But now I see the light! We could create ten debugger buffers, but not actually pop them up unless it's from the nominated current/foreground thread. Then the whole notion of foreground/background threads is just a window-management issue, and doesn't involve any fancy mutex hacking or any of that road to hell I embarked on over the weekend :-)
Sounds good so far :-)
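A minimal sketch of the "window-management only" idea, written as a Python model rather than as actual Emacs Lisp. The class and method names are hypothetical and a buffer is reduced to a dictionary entry; the point is only that every thread gets a debugger buffer while nothing is shown unless it belongs to the foreground thread.

```python
class DebuggerBuffers:
    """Tracks one debugger buffer per thread; only the foreground one is shown."""

    def __init__(self, foreground):
        self.foreground = foreground  # the nominated current/foreground thread
        self.buffers = {}             # thread name -> debugger buffer contents
        self.visible = None           # thread whose debugger buffer is on screen

    def enter_debugger(self, thread, condition):
        """Always create the buffer, but pop it up only for the foreground thread."""
        self.buffers[thread] = condition
        if thread == self.foreground:
            self.visible = thread
        return self.visible == thread

    def select_foreground(self, thread):
        """Switching foreground is pure window management: show what is already there."""
        self.foreground = thread
        self.visible = thread if thread in self.buffers else None
        return self.visible
```

No locking appears anywhere in the sketch: ten threads entering the debugger produce ten buffers, and the user sees another one only by choosing a different foreground thread.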
-Luke
Luke Gorrie luke@bluetail.com writes:
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
Hmm, not sure if I understand the problem. I think, if every thread has its own connection with a separate state machine and associated buffers, we can do almost everything like we do now. We just have to make sure that we do it in the right buffer, something like per-session variables. And of course, we need a way to allow the thread to initiate the new connection.
Perhaps I didn't understand the idea. I was thinking of "if ten threads hit the debugger at once, we don't want the user to be preempted with ten different debugger buffers".
But now I see the light! We could create ten debugger buffers, but not actually pop them up unless it's from the nominated current/foreground thread. Then the whole notion of foreground/background threads is just a window-management issue, and doesn't involve any fancy mutex hacking or any of that road to hell I embarked on over the weekend :-)
Sounds good so far :-)
Yes, this sounds (after being passed through my own biases) like what I was thinking of. To be clear, I was thinking of something like this:
Have one (or maybe two) threads dedicated to talking to emacs. (Two if the thread that reads expressions *from* emacs is going to be blocked on IO). When an expression comes in from emacs it is dispatched to the appropriate "execution" thread by putting it on a queue associated with that thread. (These queues are implemented as monitors since each is going to have at least two threads touching it--one producer and one consumer). Similarly there is a queue for messages bound for emacs.
Other threads go about their business. Any thread that needs to communicate with emacs creates the queue mentioned above. When it expects something from emacs (say it's entered the debugger) it does a blocking GET on its queue. Eventually the user does whatever thing is required to cause emacs to send the event addressed to that thread that will allow that thread to take a step in the protocol. That event will be dropped on the thread's queue by the IO thread, the GET will return, and the thread does whatever it does. When it wants to send something *to* emacs, it puts an event on the to-emacs queue which the outbound-IO thread picks up and sends to emacs.
On the emacs side a similar dispatching is happening, except that instead of messages being handed off to threads via queues they are handed to a per-buffer state machine.
The nice thing about this approach is that it allows arbitrary threads to send events to emacs just by throwing them on the outbound (toward emacs) queue. Thus I can spin up a thread that periodically sends a message to emacs to be dropped in the associated buffer even if that thread never gets any events *from* emacs.
And the bits that need to be thread-safe are limited to the queues. The only tricky bit (depending on the primitives for blocking or non-blocking IO) is coordinating the input-from and output-to emacs threads. In the worst case scenario you simply have two, one that blocks on reads and dispatches events to the appropriate thread queues and another that blocks in a GET on the to-emacs queue and then writes on the socket when an event is available. Depending on how the socket data structure is implemented that may require another mutex to keep those two threads from stomping on each other. Or not. I'd actually expect not but who knows.
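The queue arrangement described above could be sketched roughly as follows. This is a Python model under stated assumptions, not SLIME code: `EmacsLink`, `worker`, `run_demo` and the event names are all hypothetical, and the socket plus the two IO threads are replaced by a loop in the main thread that answers each outbound event.

```python
import queue
import threading

class EmacsLink:
    """Routes events between one Emacs connection and many worker threads."""

    def __init__(self):
        self.to_emacs = queue.Queue()  # shared outbound queue, any thread may put
        self.inboxes = {}              # thread id -> that thread's inbound queue
        self.lock = threading.Lock()   # protects the inboxes table only

    def register(self, thread_id):
        """Create the inbound queue for a thread that wants to hear from Emacs."""
        with self.lock:
            inbox = self.inboxes[thread_id] = queue.Queue()
        return inbox

    def dispatch(self, thread_id, event):
        """Called by the inbound-IO thread for each event read from Emacs."""
        with self.lock:
            inbox = self.inboxes[thread_id]
        inbox.put(event)

    def send(self, thread_id, event):
        """Called by any thread; the outbound-IO thread would write it to the socket."""
        self.to_emacs.put((thread_id, event))

def worker(link, thread_id, inbox, results):
    link.send(thread_id, ":debug")   # e.g. this thread has entered the debugger
    results[thread_id] = inbox.get() # blocking GET until Emacs answers this thread

def run_demo():
    link = EmacsLink()
    results = {}
    workers = []
    for tid in ("t1", "t2"):
        inbox = link.register(tid)
        w = threading.Thread(target=worker, args=(link, tid, inbox, results))
        w.start()
        workers.append(w)
    # Stand-in for Emacs and the IO threads: answer every outbound event.
    for _ in workers:
        tid, event = link.to_emacs.get()
        link.dispatch(tid, (":restart", event))
    for w in workers:
        w.join()
    return results
```

The only shared mutable structures are the queues and the small table mapping threads to queues, which matches the claim that thread-safety concerns stay confined to the queues.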
-Peter
P.S. I'm only contributing this stuff on the hope that it will be useful or at least will spark some ideas among the folks actually cutting code--until I have some time to put my hacking where my mouth is I realize I'm a pure kibitzer.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
Also, by allowing more things to happen asynchronously, it seems like we're getting more race-conditions and it could complicate the state machine quite a lot. I'm starting to wonder if we could move some of the state out of Emacs and into Lisp alone, so that it doesn't have to be synchronized. I haven't got any good examples yet.
Yes, if we have only one state machine on the Emacs side, it's probably better to move the protocol checking to the Lisp side.
Been thinking about this a bit.
Even in the single-threaded case the state machine is becoming awkward as we get more ambitious. Originally our protocol push-down automaton was very simple, with just these states and transitions:
  IDLE:
    push into EVALUATING when Emacs makes an RPC
  EVALUATING:
    pop on result (or abort) arriving from Lisp
    push into DEBUGGING if Lisp enters the debugger
  DEBUGGING:
    pop when restart causes exit from debugger
    push into EVALUATING when Emacs makes an RPC
Very simple, and it has been very good during debugging that illegal transitions are nicely detected.
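For illustration, the original three-state automaton can be modelled as a transition table plus a stack. This is a Python sketch, not the Emacs Lisp implementation; the event names (`emacs-rpc`, `result`, `abort`, `debug`, `debug-return`) are made up for the example.

```python
class ProtocolViolation(Exception):
    """Raised when an event arrives that the current state does not allow."""

# state -> {event: (operation, target state)}
TRANSITIONS = {
    "IDLE": {
        "emacs-rpc": ("push", "EVALUATING"),
    },
    "EVALUATING": {
        "result": ("pop", None),
        "abort": ("pop", None),
        "debug": ("push", "DEBUGGING"),
    },
    "DEBUGGING": {
        "debug-return": ("pop", None),
        "emacs-rpc": ("push", "EVALUATING"),
    },
}

class ProtocolStack:
    def __init__(self):
        self.stack = ["IDLE"]

    def handle(self, event):
        """Apply one event to the top state and return the new top state."""
        state = self.stack[-1]
        try:
            operation, target = TRANSITIONS[state][event]
        except KeyError:
            raise ProtocolViolation(
                f"{event!r} is illegal in state {state}") from None
        if operation == "push":
            self.stack.append(target)
        else:
            self.stack.pop()
        return self.stack[-1]
```

With this table, a `debug` event arriving while the stack is just `["IDLE"]` raises `ProtocolViolation`, which is exactly the kind of illegal transition the strict automaton catches.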
The trouble is that it didn't cope with Lisp initiating "asynchronous" transitions, as does happen in real life. In general, we end up either disallowing these things, or introducing protocol-level race conditions to permit them with extra transitions.
Some examples:
  IDLE -> DEBUGGING (added for asynchronously occurring errors in Lisp)
    Introduces a protocol race condition between Emacs sending an RPC and Lisp entering the debugger.

  Lack of:
      DEBUGGING -> DEBUGGING
      DEBUGGING -> READ-STRING
      IDLE -> READ-STRING
      READ-STRING -> DEBUGGING
    Without these transitions, certain things that can actually happen due to asynchronous activities (such as SERVE-EVENT, signal handlers, etc) will cause a protocol violation panic.

  :ONEWAY-EVALUATE event in IDLE, DEBUGGING, READ-STRING
    More of a code factoring issue perhaps, but adding a new way to evaluate expressions meant adding transitions separately to three states.
One trend I see is that states increasingly have a lot of transitions in common, e.g. anything could transition into the debugger. We may end up with so many common transitions that it's easier to say which transitions _aren't_ allowed in each state -- in which case maybe we end up rewriting the state machine as a single function.
The other thing is that there are fundamental race conditions, because in reality both Emacs and Lisp can initiate transitions, potentially at the same time. We need to cook up a scheme for resolving these races in a sane way.
-Luke
Luke Gorrie luke@bluetail.com writes:
[...]
Very simple, and it has been very good during debugging that illegal transitions are nicely detected.
The trouble is that it didn't cope with Lisp initiating "asynchronous" transitions, as does happen in real life. In general, we end up either disallowing these things, or introducing protocol-level race conditions to permit them with extra transitions.
Yes, that's a good observation. Also, disallowing asynchronous things has not worked very well. People happily use the *inferior-lisp* buffer and multi-threading, even if we say it's not supported :-)
[...]
One trend I see is that states increasingly have a lot of transitions in common, e.g. anything could transition into the debugger.
Yeah, I noticed this too.
We may end up with so many common transitions that it's easier to say which transitions _aren't_ allowed in each state -- in which case maybe we end up rewriting the state machine as a single function.
Interesting point of view. The first question that arises is "which transitions should be disabled?" We probably shouldn't allow evaluation requests when Lisp is already busy. It seems to be preferable to do this check on the Emacs side, because it isn't cool if slime-space makes an RPC just to discover that Lisp is busy. OTOH, this is difficult with asynchronous events, because the automaton on the Emacs side may be out of sync with Lisp's actual state, e.g., Lisp may execute an endless loop in an fd-handler, but Emacs thinks Lisp is idle. It is not easy to tell Emacs everything that Lisp is doing.
The other thing is that there are fundamental race conditions, because in reality both Emacs and Lisp can initiate transitions, potentially at the same time. We need to cook up a scheme for resolving these races in a sane way.
Good point. I think it will be easier to resolve these race conditions on the Lisp side, because it is easier to access the actual state.
One idea is that we include a "stack descriptor" in each message. A stack descriptor could be something like (idle eval debugging) or a string with the initial letters of the states "IED". To detect a race condition we compare the stack descriptor in the message with the local stack. We know there's a race condition (or some other problem) if the descriptors don't match. The problem can then be handled depending on the current state and the message.
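A small sketch of the descriptor comparison, in Python and under assumptions: the message layout (a dictionary with `stack` and `event` keys) and the function names are invented for illustration, and what to do once a mismatch is found is left open, as in the description above.

```python
def descriptor(stack):
    """Encode a state stack as its initial letters, e.g. IED."""
    return "".join(state[0].upper() for state in stack)

def make_message(sender_stack, event):
    """Attach the sender's view of the stack to an outgoing event."""
    return {"stack": descriptor(sender_stack), "event": event}

def check_message(local_stack, message):
    """Compare the sender's descriptor with ours.

    Returns ("ok", event) when both sides agree on the stack, and
    ("mismatch", theirs, ours) when a race (or some other problem) occurred.
    """
    ours = descriptor(local_stack)
    theirs = message["stack"]
    if theirs == ours:
        return ("ok", message["event"])
    return ("mismatch", theirs, ours)
```

For example, if Emacs sends an RPC while it believes the stack is `IE` but Lisp has meanwhile entered the debugger (`IED`), the receiver sees the mismatch before acting on the event.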
Another idea that might be useful: we could use signal-driven I/O on the socket, so that Lisp responds to our messages even when it is busy. This might also be useful if Lisp is on a different host. Hemlock uses oob data in this situation, but alas, Emacs doesn't support that.
Helmut.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
One idea is that we include a "stack descriptor" in each message. A stack descriptor could be something like (idle eval debugging) or a string with the initial letters of the states "IED".
I haven't thought this through yet -- busily cleaning out my office in preparation for a move and fun stuff like that :-). But lemme float an idea I had been thinking of:
Suppose we designate that Lisp has the "master" state and Emacs is essentially a replica, so race conditions are resolved by Emacs backing off its changes and syncing with Lisp. A problem statement is then:
If Lisp and Emacs both make transitions simultaneously, Lisp should ignore the transition from Emacs, and Emacs should abort/rollback its own transition and then process the transition from Lisp.
(I have no argument for the correctness of that prepared, but it sounds right.. :-))
This could be implemented with sequence numbers. Suppose each side numbers the transitions it initiates in sequence 0, 1, 2, etc. Each transition message includes two numbers: its own sequence number, and an acknowledgement number -- the highest sequence number received from the other side at the time the message was sent.
We could add this to the current protocol without changing its semantics. Then we could detect race conditions: if you get a message whose acknowledgement number is lower than the sequence number of the last message that you sent, then both messages were "on the wire" at the same time and you need to resynchronize.
An algorithm to resynchronize is:
If Lisp receives a message from Emacs with an old acknowledgement number, it discards it.
If Emacs receives such a message, it aborts (rolls back) all state from transitions it has initiated with higher sequence numbers than the acknowledgement number. Now Emacs is in the same state as Lisp was when it sent the message, so the message can be safely processed.
The overall effect should be that Emacs and Lisp can only reach inconsistent states temporarily.
It requires Emacs to keep track of all unacknowledged transitions it has made, and to have a way of rolling them back. At a glance this doesn't seem very hard -- EVALUATING state transitions could be rolled back as with the :ABORT event; DEBUGGING and READ-STRING have simple analogous "Lisp returned; pop the stack" transitions.
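The sequence-number scheme can be tried out in a toy model. The Python below is only a sketch under assumptions: the `Message` and `Endpoint` names are hypothetical, -1 stands for "nothing received yet", a rollback is merely recorded in a log, and the master counts a discarded message as "received" for acknowledgement purposes (a choice the description above leaves open).

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    seq: int      # sender's own sequence number
    ack: int      # highest peer sequence number the sender had received
    payload: str

@dataclass
class Endpoint:
    name: str
    is_master: bool               # True for Lisp, False for Emacs
    next_seq: int = 0
    peer_seq: int = -1            # highest sequence number received from the peer
    pending: list = field(default_factory=list)  # own unacknowledged transitions
    log: list = field(default_factory=list)

    def send(self, payload):
        msg = Message(self.next_seq, self.peer_seq, payload)
        self.next_seq += 1
        self.pending.append(msg)
        self.log.append(("did", payload))
        return msg

    def receive(self, msg):
        self.peer_seq = max(self.peer_seq, msg.seq)
        # Transitions the peer had already seen are acknowledged and drop off.
        self.pending = [m for m in self.pending if m.seq > msg.ack]
        if self.pending:
            # The peer sent this without having seen our latest transition: a race.
            if self.is_master:
                return "discarded"
            for m in reversed(self.pending):
                self.log.append(("undo", m.payload))
            self.pending.clear()
            self.log.append(("apply", msg.payload))
            return "rolled-back"
        self.log.append(("apply", msg.payload))
        return "applied"
```

If Emacs sends an evaluation request while Lisp simultaneously announces the debugger, Lisp discards the request and Emacs undoes its own transition before applying Lisp's, so both sides end up agreeing that Lisp is in the debugger.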
Just an idea...
Luke Gorrie luke@bluetail.com writes:
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
One idea is that we include a "stack descriptor" in each message. A stack descriptor could be something like (idle eval debugging) or a string with the initial letters of the states "IED".
I haven't thought this through yet -- busily cleaning out my office in preparation for a move and fun stuff like that :-). But lemme float an idea I had been thinking of:
Suppose we designate that Lisp has the "master" state and Emacs is essentially a replica, so race conditions are resolved by Emacs backing off its changes and syncing with Lisp. A problem statement is then:
What do you guys mean by "race condition"? (In the context of SLIME, that is--I know what a race condition is in general.)
-Peter
Hey, aren't you supposed to be writing your book? ;-)
What do you guys mean by "race condition"? (In the context of SLIME, that is--I know what a race condition is in general.)
I guess you've seen my recent description of the push-down automaton that keeps track of protocol state. This can be seen as a representation in Emacs of relevant parts of the Lisp stack. For example, when we push into the EVALUATING state in Emacs we ask Lisp to call this (abbreviated) function:
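  (defslimefun eval-string (string buffer-package)
    (let (ok result)
      (unwind-protect
           ;; OK is set only if evaluation completes normally.
           (progn
             (setq result (eval (read-form string)))
             (setq ok t))
        ;; Cleanup form: runs on normal return and on non-local exit alike.
        (send-to-emacs (if ok `(:ok ,result) '(:aborted))))))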
Before the corresponding Lisp stack frame returns it will send either (:ok RESULT) or (:aborted) to Emacs. When Emacs receives either message, it will pop its EVALUATING state off the stack. In this way the two stacks stay synchronized, and Emacs knows whether it is in the debugger, or waiting on an RPC result, etc.
The protocol is free of race conditions provided that at any given time only one of Lisp and Emacs is able to cause a state change (or otherwise perform a state-dependent operation). If both are able to, then they could do them at the same time, and then they would each push/pop their stacks in a different order and lose synchronization.
That's the sort of race condition we mean. When the stacks go out of sync, chaos ensues (or would if not caught by assertions).
We've considered three ways to cope with this:
  1. Ensure that only one of Emacs and Lisp is allowed to talk at a time. This worked well in the beginning, but it's now breaking down.
  2. Remove enough state from Emacs so that races can be tolerated. This is an appealing ideal, but no specifics have been discussed.
  3. Add a mechanism to the protocol to detect out-of-order events and resolve them in some deterministic way. This has been the subject of recent mails.
The thing that seems hard about removing state from Emacs is that some operations are state-dependent. For example:
If Lisp is "busy" evaluating an RPC, we won't bother with things like fetching arglists. That would just create a backlog of requests that probably aren't interesting by the time they're done. (Though Helmut's idea of sending TCP-OOB requests to be served by a signal handler sounds like fun :-)
Our debugger wants to know if Lisp is sitting in the debugger loop. If it has started doing something else then the backtrace we present in our debug buffer is wrong.
Possibly these could be solved in some simple and clever way.
I'm not sure right now whether all of our race conditions span short time frames (as in network latency). There aren't any documented cases of them occurring in the wild as far as I know. Still, we must have a correct protocol (non-robust ones are well known to piss people off royally), and already people are running SLIME with Emacs and Lisp on separate machines so latency isn't necessarily on the order of a millisecond.
-Luke
Helmut Eller writes:
has not worked very well. People happily use the *inferior-lisp* buffer and multi-threading, even if we say it's not supported :-)
I'd be happy to dump *inferior-lisp*, but it's the only way I can make mp::startup-idle-and-top-level-loops work with CMUCL. If I evaluate it from *slime-repl*, I no longer get the prompt back.
Paolo
Paolo Amoroso amoroso@mclink.it writes:
I'd be happy to dump *inferior-lisp*, but it's the only way I can make mp::startup-idle-and-top-level-loops work with CMUCL. If I evaluate it from *slime-repl*, I no longer get the prompt back.
The thing is that mp::startup-idle-and-top-level-loops never actually returns. It goes into an endless SERVE-EVENT loop, and spawns a separate thread to take over the top-level. You can see this change of processes from a fresh lisp:
  * (mp:current-process)
  #<Process Initial {48006B35}>
  * (mp::startup-idle-and-top-level-loops)
  * (mp:current-process)
  #<Process Top Level Loop {485F2FA5}>
  *
or in the absence of a result here
  * (progn (mp::startup-idle-and-top-level-loops) 'done)
  *
This breaks in the REPL because we call the function as an RPC, and damned well expect to get a return value :-)
But if you (setq slime-multiprocessing t) in Emacs, mp::startup-idle-and-top-level-loops will get called automatically (via *inferior-lisp*) during slime startup.
-Luke