TL;DR: Thread interrupts in ABCL don't seem to work reliably. Interrupts, for example in slime, don't reliably get acted on. Sometimes threads are killed instead of interrupted. Some effort to amend this is discussed below.
Background:
There are two interrupt systems in ABCL, one defined on org.armedbear.lisp.Lisp, and one on org.armedbear.lisp.LispThread. It is unclear why there are two parallel systems.
The interrupt defined on org.armedbear.lisp.Lisp works as follows:
Call (interrupt-lisp). A variable "interrupted" is set. At several places in the interpreter (eval, tagbody) the variable is checked. When compiling code, the compiler emits code, typically at branch points, to check the variable and, if it is set, call handleInterrupts. handleInterrupts starts a break loop. It's not clear which thread that will happen in, since there is a delay between when an interrupt is signaled and when the break is called, so you might be in a different thread than the one in which interrupt-lisp is called.
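To make the flag-polling scheme concrete, here is a minimal Java sketch. It is not ABCL's code: the class and method names (FlagPollingSketch, interruptLisp, work, demo) are made up for illustration, and only "interrupted" and handleInterrupts echo names from the description above. It shows why the break can land in a different thread than the one that requested it: whichever thread reaches a check first handles the interrupt.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical names throughout; this is an illustration, not ABCL's code.
class FlagPollingSketch {
    // Stands in for the global "interrupted" variable.
    static final AtomicBoolean interrupted = new AtomicBoolean(false);
    // Records which thread ended up handling the interrupt.
    static volatile String handledBy = null;

    // Stands in for (interrupt-lisp): it only sets the flag.
    static void interruptLisp() {
        interrupted.set(true);
    }

    // Stands in for handleInterrupts: clear the flag, then "break".
    static void handleInterrupts() {
        if (interrupted.compareAndSet(true, false)) {
            handledBy = Thread.currentThread().getName();
        }
    }

    // A loop with a check at its branch point, as compiled code would have.
    static long work(long iterations) {
        long sum = 0;
        for (long i = 0; i < iterations; i++) {
            if (interrupted.get()) handleInterrupts();
            sum += i;
        }
        return sum;
    }

    // Request the interrupt from the calling thread, then let a worker run.
    static String demo() {
        handledBy = null;
        interrupted.set(false);
        interruptLisp();
        Thread worker = new Thread(() -> work(1000), "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return handledBy;
    }

    public static void main(String[] args) {
        System.out.println("interrupt handled by: " + demo());
    }
}
```

In this sketch the interrupt is requested by the calling thread but handled by the thread named "worker", because that is the first thread to reach a check.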
This interrupt system is *not* the one used in slime or bordeaux threads.
The second interrupt system, which *is* used by slime and bordeaux threads, uses an instance variable on LispThread called "threadInterrupted". One calls (interrupt-thread thread function &rest args) to interrupt. At that point the variable threadInterrupted is set to true and the function is queued for eventual (hopefully prompt) execution. Then the Java thread's built-in interrupt call is made, which also sets some state in the thread (JVM internal) indicating a request for interrupt.
Java's interrupt is checked by Java in some set of internals that wait, such as when sleep is called, or presumably when there is blocking IO, or by explicit checks in user code. When Java detects the interrupt, an InterruptedException is thrown. When user code does the check, it has to throw the exception itself. It is intended that there are exception handlers to process the interrupts.
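The two detection paths can be seen with plain JDK calls. This is a small sketch with a made-up class name (JavaInterruptSketch). The first method relies on Thread.sleep noticing the pending interrupt, the second polls with Thread.interrupted() and throws by hand.

```java
// Illustrative only; the class and method names are hypothetical.
class JavaInterruptSketch {
    // Blocking call: the JVM notices the pending interrupt and throws.
    static boolean sleepIsInterrupted() {
        Thread.currentThread().interrupt();   // set this thread's interrupt status
        try {
            Thread.sleep(1000);               // throws at once, status already set
            return false;
        } catch (InterruptedException e) {
            return true;                      // the status is cleared when thrown
        }
    }

    // Busy code: nothing happens unless the code polls the status itself.
    static boolean explicitCheckThrows() {
        Thread.currentThread().interrupt();
        try {
            for (int i = 0; i < 1000; i++) {
                if (Thread.interrupted()) {   // tests and clears the status
                    throw new InterruptedException("noticed at iteration " + i);
                }
            }
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("sleep interrupted: " + sleepIsInterrupted());
        System.out.println("explicit check threw: " + explicitCheckThrows());
    }
}
```

Code that neither blocks in an interruptible call nor polls never sees the request, which is the gap described for compiled Lisp code.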
In several places ABCL explicitly catches the exception and calls processThreadInterrupts, which executes the queued set of interrupt functions and should then proceed with whatever it was doing. Presumably the issue with threads dying arises when an InterruptedException is not handled.
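A rough sketch of that catch-and-resume pattern follows. It assumes a queue of pending functions per thread; the class QueuedInterruptSketch and the methods interruptThread, interruptibleSleep and demo are hypothetical, and only processThreadInterrupts borrows its name from ABCL. The real implementation may differ in the details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of "queue a function, interrupt, run it, carry on".
class QueuedInterruptSketch {
    final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
    final List<String> log = new ArrayList<>();   // written only by the worker
    final CountDownLatch started = new CountDownLatch(1);

    // Stands in for (interrupt-thread thread function).
    void interruptThread(Thread target, Runnable fn) {
        pending.add(fn);
        target.interrupt();
    }

    // Run everything that was queued for this thread.
    void processThreadInterrupts() {
        Runnable fn;
        while ((fn = pending.poll()) != null) fn.run();
    }

    // A sleep that services interrupts and then finishes sleeping.
    void interruptibleSleep(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            try {
                Thread.sleep(Math.max(1, remaining / 1_000_000L));
            } catch (InterruptedException e) {
                processThreadInterrupts();        // handle, then keep sleeping
            }
        }
    }

    static List<String> demo() {
        QueuedInterruptSketch s = new QueuedInterruptSketch();
        Thread worker = new Thread(() -> {
            s.log.add("sleeping");
            s.started.countDown();
            s.interruptibleSleep(300);
            s.log.add("woke normally");
        });
        worker.start();
        try {
            s.started.await();
            s.interruptThread(worker, () -> s.log.add("interrupt function ran"));
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return s.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the InterruptedException were thrown somewhere without such a handler, it would propagate up and end the thread, which matches the suspected cause of threads dying.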
Handling of InterruptedException happens at only a few points: thread-join, sleep, object-wait. Unlike with interrupt-lisp, the compiler does *not* generate code to check. As a consequence, if a thread is doing anything else it will not notice the interrupt. This leads to a poor interactive experience: slime's control-c often does not work in a timely manner.
Handling the interrupt can be tricky if called at the wrong time as there is no guarantee that other lisp state is consistent.
Related to this, the implementation of destroy-thread is suboptimal. Bordeaux threads documents that it is implementation-defined whether unwind-protected forms are handled on destroy. In practice it seems many implementations do handle them, as the package lparallel (highly recommended!) depends on that and implements the ability to kill worker threads on all supported platforms *except* ABCL.
Thread destroying is implemented in a similar manner to interrupts. There is a variable that is set when the destruction is requested, with a single place where it is checked and acted on (the beginning of eval). When detected, a ThreadDestroyed exception is thrown. ThreadDestroyed is caught at the top of the thread's execution, which seems to explain why unwind-protects are not handled. That the only check is in eval means that if you are executing only compiled code the thread will not actually be destroyed.
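The consequence of checking in only one place can be shown with a toy model. Everything here is hypothetical (DestroyCheckSketch, eval, compiled, run), and the ThreadDestroyed class is a stand-in that merely shares the name of ABCL's exception: the "interpreted" path checks the flag on entry, the "compiled" path never does.

```java
// Toy model; not ABCL's code. ThreadDestroyed is a local stand-in class.
class DestroyCheckSketch {
    static class ThreadDestroyed extends RuntimeException {}

    static volatile boolean destroyed = false;

    // Interpreted path: the single place where the flag is checked.
    static int eval(int form) {
        if (destroyed) throw new ThreadDestroyed();
        return form + 1;
    }

    // Compiled path: same work, but no check is emitted.
    static int compiled(int form) {
        return form + 1;
    }

    // Request destruction, then run five steps on the chosen path.
    static String run(boolean interpreted) {
        destroyed = true;
        try {
            int x = 0;
            for (int i = 0; i < 5; i++) {
                x = interpreted ? eval(x) : compiled(x);
            }
            return "finished with " + x;
        } catch (ThreadDestroyed e) {
            return "destroyed";               // caught at the top of the thread
        } finally {
            destroyed = false;
        }
    }

    public static void main(String[] args) {
        System.out.println("interpreted: " + run(true));
        System.out.println("compiled:    " + run(false));
    }
}
```

In this model the interpreted run stops at the first step while the compiled run finishes all five, ignoring the request.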
---
As you can see, this is something of a mess. I've made some initial attempts to remedy it but am not confident enough that they are the correct way of doing things, or whether they will work reliably. The changes are:
1) Whenever Lisp interrupts are checked, also check for thread interrupts. There are a few places in the java code that do this and I simply add a check for thread interrupts at those points as well. In addition I modify the code generation so that when checks for lisp interrupts are generated as part of compiled code, I also generate a check for thread interrupts.
2) Don't call JVMs thread.interrupt. The benefit of not calling it is that you remove the possibility that the lisp will be in an inconsistent state when it handles the interrupt. The disadvantage is that you won't be able to interrupt anything that's not in lisp code.
3) Have destroy-thread use interrupt-thread to throw to a new catch tag which is set around the thread run function. There is already a provision for defining a wrapper around a thread's run function - a lisp function called (unsurprisingly) THREAD-FUNCTION-WRAPPER. Currently it has an abort restart handler. However destroying a thread is not necessarily the same as abort.
Since the throw works correctly wrt unwind protect and other lisp state, behavior of destroy is predictable - active unwind-protections are run before the thread exits.
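As a Java analogy for change 3, the sketch below queues a function that throws, polls the queue from the running code (no Thread.interrupt, in line with change 2), and catches the throw in a wrapper around the run function. All names are hypothetical: DestroyThrow stands in for a Lisp throw to the new catch tag, wrapper for THREAD-FUNCTION-WRAPPER, and the finally block for an unwind-protect cleanup form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical analogy; not the code in the branch.
class DestroyViaThrowSketch {
    // Stands in for a Lisp THROW to the catch tag around the run function.
    static class DestroyThrow extends RuntimeException {}

    final ConcurrentLinkedQueue<Runnable> pending = new ConcurrentLinkedQueue<>();
    final List<String> log = new ArrayList<>();   // written only by the worker
    final CountDownLatch started = new CountDownLatch(1);

    // Check point polled by running code.
    void checkInterrupts() {
        Runnable fn;
        while ((fn = pending.poll()) != null) fn.run();
    }

    // destroy-thread expressed as an interrupt function that throws.
    void destroyThread() {
        pending.add(() -> { throw new DestroyThrow(); });
    }

    // The thread's body, with a cleanup form in the unwind-protect role.
    void body() {
        try {
            started.countDown();
            while (true) {
                checkInterrupts();
                Thread.yield();
            }
        } finally {
            log.add("cleanup ran");
        }
    }

    // Wrapper around the run function that establishes the "catch tag".
    void wrapper() {
        try {
            body();
        } catch (DestroyThrow t) {
            log.add("thread destroyed");
        }
    }

    static List<String> demo() {
        DestroyViaThrowSketch s = new DestroyViaThrowSketch();
        Thread worker = new Thread(s::wrapper);
        worker.start();
        try {
            s.started.await();
            s.destroyThread();
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return s.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cleanup is logged before the wrapper sees the throw, which is the ordering lparallel relies on when it kills worker threads.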
Note: the ThreadDestroyed exception is never thrown now.
---
The above seem to work "ok" but haven't been extensively tested. Responsiveness to slime's control-c is often fast, and lparallel can kill worker threads.
So the first question I have is: Is there something I've missed, or does it seem like the strategy above should work? Is anybody doing thread-heavy work that they could test this against?
Having done the above and verified that it passes the smoke test, I'd like to enable using java's thread.interrupt(), so that interrupts can happen even when JVM or foreign java code is running. I've enabled it to see what would happen and things work ok in some cases, but there are gaps. For example, in one case lparallel code complained about working with a lock.
So the second question is: What needs to be done to have thread.interrupt be reliable?
My current speculation is that the way to handle it is in code generation. The code generation already uses exception catching for unwind-protect and some other cases. One thought is to, effectively, automatically add more cleanup code wherever the compiler generates a (JVM exception) catch. However, the compiler is complicated and I don't understand enough of how it works yet.
The differences, as they stand now, can be seen at https://github.com/armedbear/abcl/compare/master...alanruttenberg:thread-int...
Comments, suggestions, help, would very much be welcomed.
Regards, Alan Ruttenberg
armedbear-devel@common-lisp.net