Next message by Gabor on same thread...
[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ] In reviewing the history of the English Government, its wars and its taxes, a bystander, not blinded by prejudice nor warped by interest, would declare that taxes were not raised to carry on wars, but that wars were raised to carry on taxes. -- Thomas Paine, "Rights of Man"
---------- Forwarded message ----------
From: Gábor Melis mega@hotpop.com
Date: 15-Sep-2005 11:08
Subject: Re: [Sbcl-devel] async signals
To: sbcl-devel@lists.sourceforge.net
Time flies. With #lisp becoming immune to async unwind, it's now sbcl-devel's time to think.
The following might become part of the internals manual if the patch goes in, hence the repetition from earlier posts.
* Issues with asynchronous unwinds (AUs)
Consider the following example:
(with-timeout 1
  (let (fd)
    (unwind-protect
         (progn
           (setq fd (open ...))
           ...)
      (when fd (close fd)))))
There are several ways things can go wrong:
** (SETQ FD (OPEN ...)) can be interrupted and unwound just after OPEN returns but before SETQ is done
** foreign code may not like being aborted: imagine a Berkeley DB update call in the second ...; it will not like being unwound in the middle
** cleanups may not be run: if the cleanup is interrupted it can be unwound without closing the fd
** AUs can clobber each other
Suppose we have a thread that can be terminated cleanly:
(defun greet ()
  (unwind-protect
       (loop (write-line "Hello, world")
             (sleep 1))
    (write-line "Good bye")))

(let ((thread (make-thread #'greet)))
  (sleep (random 6))
  (terminate-thread thread))
So far so good. But what if two other threads try to terminate it at the same time? The second terminate can hit while the first unwinding is still in progress. This is not a problem since the target of the two throws is the same. Now, what happens if there is another AU with a different target?
(defun greet-impatiently ()
  (handler-case (sb-ext:with-timeout 3 (greet))
    (sb-ext:timeout ())))
Let's try terminating it:
(let ((thread (make-thread #'greet-impatiently)))
  (sleep (random 6))
  (terminate-thread thread))
There are several possible outcomes, but the most interesting is when TERMINATE-THREAD starts unwinding, then the timeout hits and does another NLX to the HANDLER-CASE, and the thread termination request is lost. Note that the UNWIND-PROTECT in GREET is not strictly needed for this scenario to occur.
In general it is very hard to write reliable programs when multiple AUs are in play and can cancel or steer away (see http://www.lisp.org/HyperSpec/Issues/iss152-writeup.html) an ongoing unwind.
* Implementation
** Interruption
By definition an interruption is a function that is invoked asynchronously.
** AU unsafe zone
A thread is said to be in an AU unsafe zone iff execution is within a WITHOUT-ASYNCHRONOUS-UNWIND form or an UNWIND-PROTECT cleanup, or the thread is unwinding.
Note that there can be multiple simultaneous unwinds:
(catch 'aaa
  (unwind-protect
       (throw 'aaa 'a)
    (catch 'bbb (throw 'bbb 'b))))
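To see the two unwinds overlap, the form above can be instrumented to record the order of events. This is only a sketch in portable Common Lisp; NESTED-UNWIND-DEMO and the event keywords are names made up for illustration and are not part of the patch.

```lisp
(defun nested-unwind-demo ()
  "Return the value of the outer CATCH and the ordered list of events."
  (let ((events '()))
    (let ((result
            (catch 'aaa
              (unwind-protect
                   (progn
                     (push :throw-aaa events)
                     (throw 'aaa 'a))
                ;; Cleanup form: a second throw/catch pair runs to
                ;; completion while the unwind to AAA is still pending.
                (push (catch 'bbb (throw 'bbb 'b)) events)
                (push :cleanup-done events)))))
      (values result (reverse events)))))
```

The inner throw to BBB starts and finishes inside the cleanup, after which the original unwind to AAA resumes, so the outer CATCH still returns A.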
The implementation (on x86 only for the time being) keeps track of the outermost unwind-protect block that's being unwound to.
If an AU occurs in an unsafe zone then the current AU handlers get to decide what happens.
** AU handlers
Interruptions are run in a pretty normal dynamic environment with all the condition handlers that were set up in the thread. When an NLX occurs that leaves the interruption, UNSAFE-ASYNCHRONOUS-UNWIND (a SERIOUS-CONDITION) is signalled if the interrupted thread is in an unsafe zone.
Now if this condition were handled by the normal condition handlers, a simple HANDLER-CASE for SERIOUS-CONDITION could unknowingly unwind the stack when it's unsafe. Hence, UNSAFE-ASYNCHRONOUS-UNWIND is signalled with a different set of condition handlers active. These handlers can be established by ASYNCHRONOUS-UNWIND-HANDLER-BIND or from within the interruption with PUSH-INTERRUPTION-UNWIND-HANDLERS, POP-INTERRUPTION-UNWIND-HANDLERS and the setfable INTERRUPTION-UNWIND-HANDLERS. (Bleh. I can see no way around this.)
These handlers are run, protected by a WITHOUT-INTERRUPTS, when the interruption is about to be left (from an UNWIND-PROTECT cleanup around the interruption, in fact). The *only* restarts available are RETRY-LATER, ABORT (aborts the unwind) and CONTINUE (forces the unsafe unwind to continue). The AU handler may do an NLX by any of the usual suspects (THROW, RETURN-FROM, GO), but if it signals a condition only the AU handlers are there to help.
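As a rough illustration of how a handler function picks one of the three restarts, here is a sketch that uses only the standard condition system. UNSAFE-UNWIND-DEMO and ATTEMPT-UNWIND are hypothetical names; the sketch signals with ordinary HANDLER-BIND and does not model the separate AU handler set or the WITHOUT-INTERRUPTS protection described above.

```lisp
;; Hypothetical stand-in for UNSAFE-ASYNCHRONOUS-UNWIND.
(define-condition unsafe-unwind-demo (serious-condition) ())

(defun attempt-unwind (handler)
  "Signal the demo condition with HANDLER bound and report the chosen restart."
  (restart-case
      (handler-bind ((unsafe-unwind-demo handler))
        (signal 'unsafe-unwind-demo)
        ;; Reached only if HANDLER declines by returning normally.
        :unhandled)
    (retry-later () :retry-later)   ; would reschedule the interruption
    (abort () :aborted)             ; would abort the unwind
    (continue () :forced)))         ; would force the unsafe unwind
```

With these definitions, (attempt-unwind #'continue) selects the CONTINUE restart, (attempt-unwind #'abort) selects ABORT, and a handler that calls (invoke-restart 'retry-later) selects RETRY-LATER. Passing #'CONTINUE as the handler function works because the standard CONTINUE function accepts the condition as its optional argument.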
Behind the scenes, interruptions are run by INVOKE-INTERRUPTION, which on an NLX from the interruption signals an UNSAFE-ASYNCHRONOUS-UNWIND (a SERIOUS-CONDITION).
** The RETRY-LATER restart
The default AU handler invokes this restart.
This is similar to the ABORT restart, but it sets up a timer to run the whole interruption again in a short time (~0.1s, randomized). Alternatively the implementation could poll for interruptions when the unsafe zone is potentially exited. This is not done since detecting unsafe zones is currently expensive, because the stack is searched for cleanup frames.
** Foreign code
Foreign code is to be wrapped in WITHOUT-ASYNCHRONOUS-UNWIND by default. System calls are different: they can be interrupted without ill effects.
** Examples
Consider the slightly modified example:
(with-timeout 1
  (let (fd)
    (unwind-protect
         (progn
           (without-asynchronous-unwind ()
             (setq fd (open ...)))
           ...)
      (when fd (close fd)))))
This is bullet-proof with respect to open/close. The problems outlined above are gone: the fd cannot be lost by unwinding just after OPEN returns but before the SETQ is completed; the cleanup and the CLOSE within it are guaranteed to run without much disturbance.
But an unwind due to timeout cannot happen in the AU unsafe zones, and that means OPEN and maybe CLOSE should take a timeout argument.
Also note that WITH-TIMEOUT doesn't work in cleanup forms (!), so this will not return in one second if CLOSE runs into problems:
(unwind-protect
     ...
  (with-timeout 1 (close fd)))
One can work around that by:
(unwind-protect
     ...
  (asynchronous-unwind-handler-bind ((unsafe-asynchronous-unwind #'continue))
    (with-timeout 1 (close fd))))
but more care is needed as it allows an ongoing unwind to be lost.
* An alternative: double cross implementation
It can be argued that an AU becomes a problem only if it would cross a WITHOUT-ASYNCHRONOUS-UNWIND, a cleanup, or cancel/steer away an ongoing unwind (i.e. it crosses a border of an interruption and an unsafe zone delimiter).
Ultimately, it is about user expectations about what can go wrong and when. It's unclear if such a weakening of the semantic guarantees is worthwhile. In the above example if CLOSE has a HANDLER-CASE for ERROR then an AU can still make it fail without closing the fd.
* TODO
** detection of cleanups is slow (walks the stack)
** detection of cleanups is racy
And here is the patch: http://retes.hu/~mega/au.patch
It has just reached a state where one can experiment with C-c-ing:
(without-asynchronous-unwind () (sleep 5))
(unwind-protect t (sleep 5))
and get reasonable behaviour, so it must be ready for comments.
Cheers, Gábor