[erlisp-devel] Fwd: [Sbcl-devel] async signals

15 Sep 2005

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
In reviewing the history of the English Government, its wars and its taxes,
a bystander, not blinded by prejudice nor warped by interest, would declare
that taxes were not raised to carry on wars, but that wars were raised
to carry on taxes.              -- Thomas Paine, "Rights of Man"

---------- Forwarded message ----------
From: Gábor Melis <mega@hotpop.com>
Date: 15-Sep-2005 11:08
Subject: Re: [Sbcl-devel] async signals
To: sbcl-devel@lists.sourceforge.net

Time flies. With #lisp becoming immune to async unwind, it's now
sbcl-devil's time to think.

The following might become part of the internals manual if the patch
goes in, hence the repetition from earlier posts.

* Issues with asynchronous unwinds (AUs)

  Consider the following example:

    (with-timeout 1
      (let (fd)
        (unwind-protect
             (progn
               (setq fd (open ...))
               ...)
          (when fd
            (close fd)))))

   There are several ways things can go wrong:

** (SETQ FD (OPEN ...)) can be interrupted and unwound just after OPEN
   returns but before SETQ is done
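
   A minimal sketch of this window, simulated synchronously in portable
   Common Lisp: the THROW stands in for the asynchronous unwind arriving
   right after the open returns. FAKE-OPEN, FAKE-CLOSE and
   *OPEN-HANDLES* are hypothetical stand-ins, not part of SBCL.

    (defvar *open-handles* 0
      "Number of handles opened but not yet closed (illustration only).")

    (defun fake-open ()
      (incf *open-handles*)
      :handle)

    (defun fake-close (handle)
      (declare (ignore handle))
      (decf *open-handles*))

    (defun leaky-open ()
      (catch 'timeout
        (let (fd)
          (unwind-protect
               ;; FAKE-OPEN has returned, but the THROW unwinds before
               ;; SETQ stores the handle in FD.
               (setq fd (prog1 (fake-open)
                          (throw 'timeout :timed-out)))
            ;; FD is still NIL here, so the handle is never closed.
            (when fd
              (fake-close fd))))))

   Starting from zero open handles, (leaky-open) returns :TIMED-OUT and
   leaves *OPEN-HANDLES* at 1: the cleanup ran, but had nothing to close.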

** foreign code may not like being aborted: imagine a Berkeley DB
   update call in the second "..."; it will not like being unwound in
   the middle

** cleanups may not be run: if the cleanup is interrupted it can be
   unwound without closing the fd

** AUs can clobber each other

  Suppose we have a thread that can be terminated cleanly:

    (defun greet ()
      (unwind-protect
          (loop (write-line "Hello, world") (sleep 1))
        (write-line "Good bye")))

    (let ((thread (make-thread #'greet)))
      (sleep (random 6))
      (terminate-thread thread))

  So far so good. But what if two other threads try to terminate it at
  the same time? The second terminate can hit while the first unwind
  is still in progress. This is not a problem, since the target of the
  two throws is the same. Now, what happens if there is another
  AU with a different target?

    (defun greet-impatiently ()
      (handler-case
          (sb-ext:with-timeout 3
            (greet))
        (sb-ext:timeout ())))

  Let's try terminating it:

    (let ((thread (make-thread #'greet-impatiently)))
      (sleep (random 6))
      (terminate-thread thread))

  There are several possible outcomes, but the most interesting is when
  TERMINATE-THREAD starts unwinding, then the timeout hits and does
  another NLX to the HANDLER-CASE, so the thread termination request is
  lost. Note that the UNWIND-PROTECT in GREET is not strictly needed for
  this scenario to occur.

  In general it is very hard to write reliable programs when multiple
  AUs are in play and can cancel or steer away an ongoing unwind (see
  http://www.lisp.org/HyperSpec/Issues/iss152-writeup.html).
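
  The lost termination can be imitated synchronously with CATCH and
  THROW. This is only a sketch: the tags TERMINATE and TIMEOUT are
  made up for the illustration, and throwing to an exit point that an
  ongoing unwind has already passed is not portable (the HyperSpec
  issue write-up referenced above discusses exactly this), so the
  result depends on the implementation.

    (defun lost-termination ()
      (catch 'terminate                       ; stands in for thread exit
        (catch 'timeout                       ; stands in for HANDLER-CASE
          (unwind-protect
               (throw 'terminate :terminated) ; first unwind begins
            ;; Second unwind, started while the first is in progress,
            ;; targets the inner exit point.
            (throw 'timeout :timed-out)))
        ;; Reached only if the inner THROW won and the outer one was lost.
        :still-running))

  Where the inner THROW is honoured, as in the scenario described
  above, the function returns :STILL-RUNNING and the throw to TERMINATE
  never completes.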

* Implementation

** Interruption

  By definition an interruption is a function that is invoked
  asynchronously.

** AU unsafe zone

  A thread is said to be in an AU unsafe zone iff execution is within a
  WITHOUT-ASYNCHRONOUS-UNWIND form or an UNWIND-PROTECT cleanup, or the
  thread is unwinding.

  Note that there can be multiple simultaneous unwinds:

    (catch 'aaa
      (unwind-protect
           (throw 'aaa 'a)
        (catch 'bbb
          (throw 'bbb 'b))))
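
  The nesting can be made visible by recording events. The sketch below
  has the same shape as the example above with an extra inner
  UNWIND-PROTECT; NESTED-UNWINDS and the event keywords are names
  chosen for the illustration.

    (defun nested-unwinds ()
      "Return the CATCH value and the order in which things happened."
      (let ((events '()))
        (flet ((note (event) (push event events)))
          (let ((result
                  (catch 'aaa
                    (unwind-protect
                         (progn (note :throw-aaa)
                                (throw 'aaa 'a))
                      ;; The unwind to AAA is suspended while this runs.
                      (note :cleanup-start)
                      (catch 'bbb
                        (unwind-protect
                             (progn (note :throw-bbb)
                                    (throw 'bbb 'b))
                          (note :inner-cleanup)))
                      (note :cleanup-end)))))
            (values result (nreverse events))))))

  The second unwind (to BBB) starts and finishes inside the cleanup of
  the first, which then resumes towards AAA; the primary value is A.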

  The implementation (on x86 only for the time being) keeps track of
  the outermost unwind-protect block that's being unwound to.

  If an AU occurs in an unsafe zone then the current AU handlers get to
  decide what happens.

** AU handlers

  Interruptions are run in a pretty normal dynamic environment with
  all the condition handlers that were set up in the thread. When an
  NLX occurs that leaves the interruption, UNSAFE-ASYNCHRONOUS-UNWIND
  (a SERIOUS-CONDITION) is signalled if the interrupted thread is in
  an unsafe zone.

  Now, if this condition were handled by the normal condition handlers,
  a simple HANDLER-CASE for SERIOUS-CONDITION could unknowingly unwind
  the stack when it's unsafe. Hence, UNSAFE-ASYNCHRONOUS-UNWIND is
  signalled with a different set of condition handlers active. These
  handlers can be established by ASYNCHRONOUS-UNWIND-HANDLER-BIND or,
  from within the interruption, with PUSH-INTERRUPTION-UNWIND-HANDLERS,
  POP-INTERRUPTION-UNWIND-HANDLERS and the setfable
  INTERRUPTION-UNWIND-HANDLERS. (Bleh. I can see no way around this.)

  These handlers are run, protected by a WITHOUT-INTERRUPTS, when the
  interruption is about to be left (from an UNWIND-PROTECT cleanup
  around the interruption, in fact). The *only* restarts available are
  RETRY-LATER, ABORT (aborts the unwind) and CONTINUE (forces the
  unsafe unwind to continue). The AU handler may do an NLX by any of
  the usual suspects (THROW, RETURN-FROM, GO), but if it signals a
  condition only the AU handlers are there to help.

  Behind the scenes, interruptions are run by INVOKE-INTERRUPTION, which
  on an NLX from the interruption signals an UNSAFE-ASYNCHRONOUS-UNWIND
  (a SERIOUS-CONDITION).
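
  The idea of a separate handler set can be sketched in portable
  Common Lisp. This is not the code from the patch: *AU-HANDLERS*,
  AU-HANDLER-BIND, SIGNAL-UNSAFE-UNWIND, UNSAFE-UNWIND-DEMO and
  AU-HANDLER-DEMO are hypothetical names, and the sketch only shows
  that ordinary condition handlers never get to see the condition.

    (define-condition unsafe-unwind-demo (serious-condition) ())

    (defvar *au-handlers* '()
      "Private handler stack, innermost first (illustration only).")

    (defmacro au-handler-bind ((handler) &body body)
      `(let ((*au-handlers* (cons ,handler *au-handlers*)))
         ,@body))

    (defun signal-unsafe-unwind ()
      "Offer the condition to the private handlers only.
    Returns the name of the restart that was chosen."
      (let ((condition (make-condition 'unsafe-unwind-demo)))
        (restart-case
            (progn
              ;; Each handler runs with only the outer handlers visible.
              (loop for (handler . outer) on *au-handlers*
                    do (let ((*au-handlers* outer))
                         (funcall handler condition)))
              ;; Every handler declined: fall back to the default.
              (invoke-restart 'retry-later))
          (retry-later () :retry-later)
          (abort () :abort)
          (continue () :continue))))

    (defun au-handler-demo ()
      (handler-case
          (list (signal-unsafe-unwind)
                (au-handler-bind (#'continue)
                  (signal-unsafe-unwind)))
        ;; Never reached: SIGNAL is not called, so ordinary handlers
        ;; are not consulted.
        (serious-condition () :caught-by-ordinary-handler)))

  With no private handler bound the default picks RETRY-LATER; with
  #'CONTINUE bound the CONTINUE restart is taken, so (au-handler-demo)
  returns (:RETRY-LATER :CONTINUE) and the surrounding HANDLER-CASE is
  bypassed both times.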

** The RETRY-LATER restart

  The default AU handler invokes this restart.

  This is similar to the ABORT restart, but it sets up a timer to run
  the whole interruption again in a short time (~0.1s,
  randomized). Alternatively the implementation could poll for
  interruptions when the unsafe zone is potentially exited. This is
  not done since detecting unsafe zones is currently expensive,
  because the stack is searched for cleanup frames.
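
  The retry behaviour can be approximated by a synchronous loop. This
  is a sketch under stated assumptions, not the patch's mechanism: the
  patch uses a timer, whereas RETRY-LATER-DEMO (a hypothetical name)
  simply sleeps, and the BASE, JITTER and MAX-TRIES defaults are
  illustrative values.

    (defun retry-later-demo (interruption unsafe-p
                             &key (base 0.1) (jitter 0.05) (max-tries 20))
      "Call INTERRUPTION once UNSAFE-P returns false, sleeping a
    randomized short time between attempts. Returns :GAVE-UP if the
    zone is still unsafe after MAX-TRIES checks."
      (loop repeat max-tries
            do (if (funcall unsafe-p)
                   ;; Still in an unsafe zone: back off and try again.
                   (sleep (+ base (random jitter)))
                   (return (funcall interruption)))
            finally (return :gave-up)))

  The randomized delay keeps several pending interruptions from
  retrying in lockstep; the cost is that delivery can lag behind the
  moment the unsafe zone is actually left.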

** Foreign code

  Foreign code is to be wrapped in WITHOUT-ASYNCHRONOUS-UNWIND by
  default. System calls are different: they can be interrupted
  without ill effects.

** Examples

   Consider the slightly modified example:

    (with-timeout 1
      (let (fd)
        (unwind-protect
             (progn
               (without-asynchronous-unwind ()
                 (setq fd (open ...)))
               ...)
          (when fd
            (close fd)))))

  This is bullet-proof with respect to open/close. The problems outlined
  above are gone: the fd cannot be lost by unwinding just after OPEN
  returns but before the SETQ is completed; the cleanup and the CLOSE
  within it are guaranteed to run without much disturbance.

  But an unwind due to timeout cannot happen in the AU unsafe zones, and
  that means OPEN and maybe CLOSE should take a timeout argument.

  Also note that WITH-TIMEOUT doesn't work in cleanup forms (!), so this
  will not return in one second if CLOSE runs into problems:

    (unwind-protect
         ...
      (with-timeout 1
        (close fd)))

  One can work around that by:

    (unwind-protect
         ...
      (asynchronous-unwind-handler-bind
          ((unsafe-asynchronous-unwind #'continue))
        (with-timeout 1
          (close fd))))

  but more care is needed as it allows an ongoing unwind to be lost.

* An alternative: double cross implementation

  It can be argued that an AU becomes a problem only if it would cross
  a WITHOUT-ASYNCHRONOUS-UNWIND, a cleanup, or cancel/steer away an
  ongoing unwind (i.e. it crosses a border of an interruption and an
  unsafe zone delimiter).

  Ultimately, it is about user expectations about what can go wrong
  and when. It's unclear if such a weakening of the semantic
  guarantees is worthwhile. In the above example if CLOSE has a
  HANDLER-CASE for ERROR then an AU can still make it fail without
  closing the fd.

* TODO

  ** detection of cleanups is slow (walks the stack)

  ** detection of cleanups is racy

And here is the patch: http://retes.hu/~mega/au.patch

It has just reached a state where one can experiment with C-c-ing:

  (without-asynchronous-unwind () (sleep 5))
  (unwind-protect t (sleep 5))

and get reasonable behaviour, so it must be ready for comments.

Cheers, Gábor


Faré