[slime-devel] Shutting down Swank within an Image.

So I've been looking into the problem of restarting a Lisp image saved from Slime. It appears to me that, in some cases, Swank is still blocked on the socket listening for connections in the restarted image (I'm not sure why it's inconsistent). As a result, when the image is restarted it's stuck listening on a connection that will never receive data, and it can't execute the Swank initialization process to connect to the new socket which Slime has chosen.
I was thinking that I would implement swank-backend:save-image to shut down Swank before dumping the image (or possibly have the restart function shut down any existing Swank processes) so there's nothing in the way of Swank being re-initialized when the image is loaded again. (Does this sound like a reasonable thing to do?) I'm not sure how to do this, though: there only seem to be methods for shutting down the Swank server, not individual connections. Looking at other implementations of save-image wasn't very illuminating, so I seem to be missing something.
It looks like the connection struct and the *connections* defvar might have what I need, but there is only sanctioned access to the most recently opened connection via `default-connection`. Presumably I would need to close _all_ *connections*?
Can anyone point me in the right direction?
Thanks,
Andrew

As a data point, I can reliably restart images in which I've executed this form:
(setf *restart-init-function*
      (lambda ()
        (dolist (connection swank::*connections*)
          (format t "closing ~a~%" connection)
          (swank::close-connection connection nil nil))))
Without that, the reliability ranges from "works most of the time" to "never ever works", depending on the OS I'm testing on. This solution uses a number of non-exported Swank features, and I'm still not sure if it's the right direction to go. Any thoughts?
Thanks,
Andrew

I'm not sure this will help, but here's the way I create an executable image with Swank that works (100% of the time). It's SBCL-specific, but I'm guessing you can generalize it.
;; Shut down Swank and anyone else by terminating all threads
(dolist (thread (sb-thread:list-all-threads))
  (unless (equal sb-thread:*current-thread* thread)
    (sb-thread:terminate-thread thread)))
;; Set the function to run on startup of the core executable
(setf sb-ext:*init-hooks* (list #'start-the-servers))
;; Dump core and exit
(sb-ext:save-lisp-and-die *core* :executable t)
Note that since all the threads have to be shut down first, you cannot do this from within Slime, but from the command line it is:
sbcl make-core.lisp
Jeff

I've got an updated version that seems to work quite well:
(defun aux-save-image (image-name)
  (let ((old-restart excl:*restart-init-function*))
    (setf excl:*restart-init-function*
          (lambda ()
            (dolist (connection swank::*connections*)
              (format t "closing ~a~%" connection)
              (swank::close-connection connection nil nil))
            (when old-restart (funcall old-restart))))
    (excl:dumplisp :name image-name)
    (setf excl:*restart-init-function* old-restart)))
Is there anything wrong with doing things this way? I get the same number of threads after I restart an image saved this way (i.e. no leftovers from prior runs) and things don't hang.
If this seems like a clean solution, I may add a (close-all-swank-connections) method to the swank package and make the swank-backend:save-image implementation for Allegro do something like this. Does this sound like a valid solution to the problem that other Lispers would be happy with?
Andrew
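One fragile spot in the aux-save-image approach is that the previous restart hook is only put back if the dump call returns normally. A minimal, portable sketch of the same wrap-and-restore pattern using unwind-protect might look like the following. The names *restart-hook* and call-with-chained-restart-hook are hypothetical stand-ins (for excl:*restart-init-function* and the code around excl:dumplisp); they are not part of Swank or Allegro CL.

```lisp
;; Hypothetical stand-in for an implementation's restart hook,
;; e.g. excl:*restart-init-function* in the message above.
(defvar *restart-hook* nil)

(defun call-with-chained-restart-hook (cleanup dump)
  "Install a hook that runs CLEANUP and then the previous hook, call DUMP,
and restore the previous hook afterwards, even if DUMP signals an error."
  (let ((old-hook *restart-hook*))
    (setf *restart-hook*
          (lambda ()
            (funcall cleanup)
            (when old-hook (funcall old-hook))))
    (unwind-protect
         (funcall dump)
      ;; Runs on normal return and on non-local exit alike.
      (setf *restart-hook* old-hook))))

;; Usage with placeholder functions. A real caller would pass the
;; connection-closing loop as CLEANUP and the dump call as DUMP.
(call-with-chained-restart-hook
 (lambda () (format t "closing connections~%"))
 (lambda () (format t "dumping image~%")))
```

The hook is assigned with setf rather than rebound with let on the assumption that the dumped image should see it as the global value; whether that matters depends on the implementation.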

On 4 November 2011 17:54, Andrew Myers <asm198@gmail.com> wrote:
> I've got an updated version that seems to work quite well:
> (defun aux-save-image (image-name)
>   (let ((old-restart excl:*restart-init-function*))
>     (setf excl:*restart-init-function*
>           (lambda ()
>             (dolist (connection swank::*connections*)
>               (format t "closing ~a~%" connection)
>               (swank::close-connection connection nil nil))
>             (when old-restart (funcall old-restart))))
>     (excl:dumplisp :name image-name)
>     (setf excl:*restart-init-function* old-restart)))
> Is there anything wrong with doing things this way?
Closing a connection on initialization is risky. Consider: what if the underlying FD is already in use in the new image?
You should clean up before you save, not when you initialize.
Cheers,
-- Nikodemus

Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer, but I had thought Lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
The problem with cleaning up _before_ saving the image is that you can't use the image any more. I was hoping to have a solution that didn't require an exit and restart every time an image was dumped.
Andrew

* Andrew Myers [2011-11-04 17:24] writes:
> Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer, but I had thought Lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
Lisp and C use the same API to the OS, so it's the same problem and there is no general solution.
> The problem with cleaning up _before_ saving the image is that you can't use the image any more. I was hoping to have a solution that didn't require an exit and restart every time an image was dumped.
On Unix, one trick is to fork before saving. In the child process, clean up OS resources, dump the image and exit. The parent process continues normally.
Helmut
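The fork trick described above could be sketched roughly as follows for SBCL on Unix. This is only an illustration under stated assumptions: save-image-in-child and its arguments are hypothetical names, the sb-posix contrib must be available, and no other threads may be running when it is called (as far as I know, SBCL refuses to fork or to save a core otherwise, which is the same constraint behind the terminate-all-threads script earlier in the thread).

```lisp
;; SBCL-only sketch. Assumes a Unix platform and that the calling
;; thread is the only thread in the process.
(require :sb-posix)

(defun save-image-in-child (core-path cleanup)
  "Fork. In the child, run CLEANUP, dump the image to CORE-PATH and exit.
The parent waits for the child and returns whatever WAITPID returns."
  (let ((pid (sb-posix:fork)))
    (if (zerop pid)
        ;; Child: release OS resources, then dump the image.
        ;; SAVE-LISP-AND-DIE does not return; the child exits
        ;; once the core has been written.
        (progn
          (funcall cleanup)
          (sb-ext:save-lisp-and-die core-path))
        ;; Parent: its own descriptors are untouched by the child's
        ;; cleanup, so it can carry on after the child finishes.
        (sb-posix:waitpid pid 0))))
```

CLEANUP here would be something like the connection-closing loop posted earlier. Because the child works on its own copy of the descriptor table, closing descriptors there does not close them in the parent.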

On 4 November 2011 19:24, Andrew Myers <asm198@gmail.com> wrote:
> Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer, but I had thought Lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
Well...
It is possible that your implementation keeps track of all open "system" streams, and cleans up such objects on its own when you save a core. I would not know if it does that, but I would not rely on it either.
If it doesn't do that, at some point closing a socket means calling close(2) or an equivalent system call on the file descriptor / handle -- which means it is risky on init.
> The problem with cleaning up _before_ saving the image is that you can't use the image any more. I was hoping to have a solution that didn't require an exit and restart every time an image was dumped.
Unless you're on Windows, the time-honored solution is to fork(2) before saving. If you can't do that, then by digging into implementation specifics you may be able to nuke the connections without closing the file descriptors / handles.
...or you can just live dangerously. Just be aware that you're doing something potentially tricksy, so that if and when something mysteriously breaks down the road, you know where to look. :)
Cheers,
-- Nikodemus
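The risk of closing a stale descriptor on init can be illustrated with a toy model in portable Common Lisp. Nothing here touches real descriptors: *fd-table* and the sim- functions are invented for the illustration and only mimic the POSIX rule that a newly opened descriptor gets the lowest free number.

```lisp
;; Toy model: *FD-TABLE* stands in for the kernel's per-process
;; descriptor table. None of these names exist in Swank or in any Lisp.
(defvar *fd-table* (make-hash-table))

(defun sim-open (owner)
  "Give OWNER the lowest free descriptor number and return that number."
  (loop for fd from 0
        unless (gethash fd *fd-table*)
          do (setf (gethash fd *fd-table*) owner)
             (return fd)))

(defun sim-close (fd)
  "Close FD, whoever currently owns it."
  (remhash fd *fd-table*))

(defun sim-restart-image ()
  "A new process starts with a fresh table; numbers remembered by
objects saved in the image are not updated."
  (clrhash *fd-table*))

;; Old process: the three standard streams, then the Swank socket.
(sim-open :stdin) (sim-open :stdout) (sim-open :stderr)
(defvar *stale-swank-fd* (sim-open :swank-socket)) ; 3

;; Restarted image: something unrelated is opened before the init hook runs.
(sim-restart-image)
(sim-open :stdin) (sim-open :stdout) (sim-open :stderr)
(defvar *new-fd* (sim-open :log-file))             ; 3 again

;; The init hook "closes the old connection" by number ...
(sim-close *stale-swank-fd*)
;; ... and the unrelated log file is the thing that got closed.
(gethash *new-fd* *fd-table*)                      ; => NIL
```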

>> Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer, but I had thought Lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
> Well...
> It is possible that your implementation keeps track of all open "system" streams, and cleans up such objects on its own when you save a core. I would not know if it does that, but I would not rely on it either.
> If it doesn't do that, at some point closing a socket means calling close(2) or an equivalent system call on the file descriptor / handle -- which means it is risky on init.
This is pretty much what I was hoping it would do. I've emailed Franz technical support about what the behavior is in this scenario. I guess I won't be too surprised if they don't want to promise any particular behavior in this area.
I'll look into the fork method, should have thought of that before, that would at least give us a working solution on Unix platforms (which is all I need personally). How does Slime feel about platform-specific functionality like that?

* Andrew Myers [2011-11-07 12:46] writes:
> I'll look into the fork method, should have thought of that before, that would at least give us a working solution on Unix platforms (which is all I need personally).
You may also want to look at contrib/swank-snapshot.lisp and how it works for SBCL.
> How does Slime feel about platform-specific functionality like that?
I don't know if Slime has already reached consciousness, but Emacs's position is clear: GNU first; other platforms have low priority.
Helmut

That suits me fine; that's pretty much my position too. Thanks for the tip on where to look for inspiration. I'll implement a fork-based version and send you a patch.
Andrew
Participants (4): Andrew Myers, Helmut Eller, Jeffrey Cunningham, Nikodemus Siivola