So I've been looking into the problem of restarting a Lisp image saved from Slime. It appears to me that, in some cases, Swank is still blocked on the socket listening for connections in the restarted image (I'm not sure why it's inconsistent). As a result, when the image is restarted it's stuck listening on a connection that will never receive data, and it can't execute the Swank initialization process to connect to the new socket which Slime has chosen.
I was thinking that I would implement swank-backend:save-image to shut down Swank before dumping the image (or possibly have the restart function shut down any existing Swank processes) so there's nothing in the way of Swank being re-initialized when the image is loaded again. (Does this sound like a reasonable thing to do?) I'm not sure how to do this, though: there only seem to be methods for shutting down the Swank server, not individual connections. Looking at other implementations of save-image wasn't very illuminating, so I seem to be missing something.
It looks like the connection struct and the *connections* defvar might have what I need, but the only sanctioned access is to the most recently opened connection via `default-connection`. Presumably I would need to close _all_ *connections*?
Can anyone point me in the right direction?
Thanks, Andrew
As a data point, I can reliably restart images in which I've executed this form:
(setf *restart-init-function*
      (lambda ()
        (dolist (connection swank::*connections*)
          (format t "closing ~a~%" connection)
          (swank::close-connection connection nil nil))))
Without that, the reliability ranges from "works most of the time" to "never ever works" depending on the OS I'm testing on. This solution uses a number of non-exported Swank features and I'm still not sure if it's the right direction to go. Any thoughts? Thanks, Andrew
On Fri, Nov 4, 2011 at 8:56 AM, Andrew Myers asm198@gmail.com wrote:
[...]
On Fri, 04 Nov 2011 06:13:25 -0700, Andrew Myers asm198@gmail.com wrote:
[...]
I'm not sure this will help, but here's the way I create an executable image with swank that works (100% of the time). It's SBCL specific but I'm guessing you can generalize it.
;; Shut down Swank and anyone else by terminating all threads
(dolist (thread (sb-thread:list-all-threads))
  (unless (equal sb-thread:*current-thread* thread)
    (sb-thread:terminate-thread thread)))
;; Set the function to run on startup of the core executable
(setf sb-ext:*init-hooks* (list #'start-the-servers))
;; Dump core and exit
(sb-ext:save-lisp-and-die *core* :executable t)
Note that since all the threads have to be shut down first, you cannot do this from within Slime, but from the command line it is:
sbcl make-core.lisp
Jeff
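The snippet above leaves `start-the-servers` undefined. A minimal sketch of what such a startup hook might look like follows, assuming Swank was loaded into the image before it was dumped. The function body, the default port 4005, and the `:dont-close t` choice are illustrative assumptions, not the code actually used in the message above. `swank:create-server` is looked up at run time so that the definition loads even where Swank is absent.

```lisp
;; Hypothetical startup hook for a dumped core; not taken from the thread.
(defun start-the-servers (&optional (port 4005))
  "Start a Swank listener on PORT in the freshly started image."
  (let* ((pkg (or (find-package "SWANK")
                  (error "Swank is not loaded in this image")))
         (create-server (find-symbol "CREATE-SERVER" pkg)))
    ;; :dont-close t keeps the listener open after a client disconnects.
    (funcall create-server :port port :dont-close t)))
```

Because the listener is created fresh on every startup, nothing in the restarted image depends on a socket that belonged to the process that dumped it.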
I've got an updated version that seems to work quite well:
(defun aux-save-image (image-name)
  (let ((old-restart excl:*restart-init-function*))
    (setf excl:*restart-init-function*
          (lambda ()
            (dolist (connection swank::*connections*)
              (format t "closing ~a~%" connection)
              (swank::close-connection connection nil nil))
            (when old-restart
              (funcall old-restart))))
    (excl:dumplisp :name image-name)
    (setf excl:*restart-init-function* old-restart)))
Is there anything wrong with doing things this way? I get the same number of threads after I restart an image saved this way (i.e. no leftovers from prior runs) and things don't hang. If this seems like a clean solution I may add a (close-all-swank-connections) function to the swank package and make the swank-backend:save-image implementation for Allegro do something like this.
Does this sound like a valid solution to the problem that other Lispers would be happy with?
Andrew
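The proposed `close-all-swank-connections` helper could look roughly like the sketch below. It is only an illustration: `swank::*connections*` and the three-argument `swank::close-connection` are the internal names used in the code above and may differ between Slime versions, and the defensive list copy and `ignore-errors` are assumptions about what a robust version would want. The symbols are resolved at run time so the definition loads even without Swank.

```lisp
;; Sketch of the proposed helper. SWANK::*CONNECTIONS* and
;; SWANK::CLOSE-CONNECTION are internal names taken from the code above;
;; they are looked up at run time, so this loads even without Swank.
(defun close-all-swank-connections ()
  "Close every connection Swank knows about.
Returns the number of connections visited, or NIL when Swank
(or the expected internals) is not available."
  (let* ((pkg (find-package "SWANK"))
         (connections (and pkg (find-symbol "*CONNECTIONS*" pkg)))
         (close-fn (and pkg (find-symbol "CLOSE-CONNECTION" pkg))))
    (when (and connections (boundp connections)
               close-fn (fboundp close-fn))
      (let ((count 0))
        ;; Iterate over a copy in case closing modifies the list.
        (dolist (connection (copy-list (symbol-value connections)) count)
          ;; One broken connection should not stop the rest from closing.
          (ignore-errors (funcall close-fn connection nil nil))
          (incf count))))))
```

Whether this runs before the dump or in the restart function is a separate decision; the helper itself does not care.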
On Fri, Nov 4, 2011 at 10:39 AM, Jeffrey Cunningham jeffrey@jkcunningham.com wrote:
[...]
On 4 November 2011 17:54, Andrew Myers asm198@gmail.com wrote:
[...]
Is there anything wrong with doing things this way? I get the same
Closing a connection on initialization is risky. Consider: what if the underlying FD is already in use in the new image?
You should clean up before you save, not when you initialize.
Cheers,
-- Nikodemus
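The risk described above can be illustrated with a toy model. The sketch below is a pure simulation in portable Common Lisp: no real file descriptors are touched, and `sim-open`, `sim-close`, and `stale-close-demo` are hypothetical names invented for the illustration. The one real-world fact it relies on is that POSIX hands out the lowest unused descriptor number, so a number remembered from the old process can easily name a different resource in the new one.

```lisp
(defvar *fd-table* (make-hash-table)
  "Simulated per-process descriptor table: integer -> description.")

(defun sim-open (description)
  "Allocate the lowest unused descriptor number from 3 upwards."
  (loop for fd from 3
        unless (nth-value 1 (gethash fd *fd-table*))
          do (setf (gethash fd *fd-table*) description)
             (return fd)))

(defun sim-close (fd)
  "Close whatever currently owns FD; the number is all we know."
  (remhash fd *fd-table*))

(defun stale-close-demo ()
  "Return two values: whether the stale and new descriptors share a
number, and whether the new resource is still open afterwards."
  ;; Old process: the Swank socket is given some descriptor.
  (clrhash *fd-table*)
  (let ((stale-fd (sim-open "swank socket (old process)")))
    ;; Save and restart: the Lisp object still remembers STALE-FD,
    ;; but the new process starts with an empty descriptor table.
    (clrhash *fd-table*)
    (let ((log-fd (sim-open "application log file (new process)")))
      ;; An init hook now closes the stale connection.
      (sim-close stale-fd)
      (values (= stale-fd log-fd)
              (nth-value 1 (gethash log-fd *fd-table*))))))
```

In this model `(stale-close-demo)` returns `T` and `NIL`: the stale connection and the new log file share a number, and closing the former silently closes the latter.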
Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer but I had thought lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
The problem with cleaning up _before_ saving the image is that you can't use the image any more. I was hoping to have a solution that didn't require an exit and restart every time an image was dumped.
Andrew
On Fri, Nov 4, 2011 at 12:03 PM, Nikodemus Siivola nikodemus@random-state.net wrote:
[...]
* Andrew Myers [2011-11-04 17:24] writes:
Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer but I had thought lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
Lisp and C use the same API to the OS so it's the same problem and there is no general solution.
The problem with cleaning up _before_ saving the image is that you can't use the image any more. I was hoping to have a solution that didn't require an exit and restart every time an image was dumped.
On Unix, one trick is to fork before saving. In the child process, clean up OS resources, dump the image and exit. The parent process continues normally.
Helmut
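The fork trick described above can be sketched as a small, implementation-neutral skeleton. Everything here is an assumption made for illustration: the function name `save-image-in-child` and its four keyword arguments are hypothetical, and the caller is expected to supply the real primitives (a fork that returns 0 in the child, a cleanup that closes connections, the implementation's image-dumping call, and a process exit).

```lisp
(defun save-image-in-child (&key fork cleanup dump exit)
  "Fork, then clean up and dump the image in the child only.
FORK must return 0 in the child and the child's pid in the parent.
In the parent, returns the child's pid."
  (let ((pid (funcall fork)))
    (if (zerop pid)
        ;; Child: release OS resources and dump. EXIT runs whenever
        ;; control leaves this form, normally or by a non-local exit,
        ;; so a second copy of the session does not keep running.
        (unwind-protect
             (progn (funcall cleanup)
                    (funcall dump))
          (funcall exit))
        ;; Parent: nothing was closed here, so it carries on as before.
        pid)))
```

On SBCL the arguments would presumably be built from `sb-posix:fork` and `sb-ext:save-lisp-and-die`, with the caveat that forking a multi-threaded SBCL process is restricted; on Allegro, `excl:dumplisp` would take the place of the dump step. The parent will likely also want to wait for the child so that it does not linger as a zombie.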
On 4 November 2011 19:24, Andrew Myers asm198@gmail.com wrote:
Is this a problem in Lisp? I know in C that's an issue since you're just calling close on an integer but I had thought lisp would handle that more gracefully. Although I guess the behavior isn't specified since saving an image is outside the spec?
Well...
It is possible that your implementation keeps track of all open "system" streams, and cleans up such objects on its own when you save a core. I would not know if it does that, but I would not rely on it either.
If it doesn't do that, at some point closing a socket means calling close(2) or an equivalent system call on the file descriptor / handle -- which is risky on init.
The problem with cleaning up _before_ saving the image is that you can't use the image any more. I was hoping to have a solution that didn't require an exit and restart every time an image was dumped.
Unless you're on Windows, the time honored solution is to fork(2) before saving.
If you can't do that, then digging into implementation specifics you may be able to nuke the connections without closing the file descriptors / handles.
...or you can just live dangerously. Just be aware that you're doing something potentially tricksy, so that if and when something mysteriously breaks down the road, you know where to look. :)
Cheers,
-- Nikodemus
It is possible that your implementation keeps track of all open "system" streams, and cleans up such objects on its own when you save a core. I would not know if it does that, but I would not rely on it either.
If it doesn't do that, at some point closing a socket means calling close(2) or an equivalent system call on the file descriptor / handle -- which is risky on init.
This is pretty much what I was hoping it would do. I've emailed Franz technical support about what the behavior is in this scenario. I guess I won't be too surprised if they don't want to promise any particular behavior in this area.
I'll look into the fork method, should have thought of that before, that would at least give us a working solution on Unix platforms (which is all I need personally).
How does Slime feel about platform specific functionality like that?
* Andrew Myers [2011-11-07 12:46] writes:
I'll look into the fork method, should have thought of that before, that would at least give us a working solution on Unix platforms (which is all I need personally).
You may also want to look at contrib/swank-snapshot.lisp and how it works for SBCL.
How does Slime feel about platform specific functionality like that?
I don't know if Slime has already reached consciousness but Emacs's position is clear: GNU first; other platforms have low priority.
Helmut
That suits me fine; that's pretty much my position too. Thanks for the tip on where to look for inspiration. I'll implement a fork-based version and send you a patch. Andrew
On Mon, Nov 7, 2011 at 10:47 AM, Helmut Eller heller@common-lisp.net wrote:
[...]
slime-devel site list slime-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/slime-devel