Recently, when I tried to start slime, most of the time it threw an error message:
Connecting to Swank on port 45744.. [2 times]
Lisp connection closed unexpectedly: connection broken by remote peer
In *inferior-lisp* it shows:
;; Swank started at port: 45744.
45744
* ;; swank:close-connection: end of file on #<SB-SYS:FD-STREAM for "a socket" {B1593F1}>
But if I run slime again without creating an additional inferior lisp, it works very well.
I don't know why, and I cannot get any debug information about it.
I use Emacs 23.1.1, SLIME 2009-12-03, and SBCL 1.0.31-2; my inferior-lisp-program is "/home/plato/clbuild/clbuild --implementation sbcl lisp".
Thank you in advance!
* Plato Wu [2009-12-06 10:08+0100] writes:
> Recently, when I tried to start slime, most of the time it threw an error message:
> Connecting to Swank on port 45744.. [2 times]
> Lisp connection closed unexpectedly: connection broken by remote peer
> In *inferior-lisp* it shows:
> ;; Swank started at port: 45744.
> 45744
> * ;; swank:close-connection: end of file on #<SB-SYS:FD-STREAM for "a socket" {B1593F1}>
> But if I run slime again without creating an additional inferior lisp, it works very well.
> I don't know why, and I cannot get any debug information about it.
This sounds like a hard-to-reproduce Emacs problem, but maybe you can find something out with strace.
Helmut
* Helmut Eller m2iqckcnjt.fsf@common-lisp.net : Wrote on Sun, 06 Dec 2009 13:08:38 +0100:
| * Plato Wu [2009-12-06 10:08+0100] writes:
|
|> Recently, when I tried to start slime, most of the time it threw an
|> error message:
|>
|> Connecting to Swank on port 45744.. [2 times]
|> Lisp connection closed unexpectedly: connection broken by remote peer
[...]
|> I don't know why, and I cannot get any debug information about it.
|
| This sounds like a hard-to-reproduce Emacs problem, but maybe you can
| find something out with strace.
No, strace will most likely make the problem go away (in a heisenbug sense). At least if this is the timing issue in Fmake_network_process that I suspect it is, which I've seen since mentioning it here last year in http://permalink.gmane.org/gmane.lisp.slime.devel/7938
-- Madhu
Helmut Eller heller@common-lisp.net writes:
> * Plato Wu [2009-12-06 10:08+0100] writes:
>> Recently, when I tried to start slime, most of the time it threw an error message:
>> Connecting to Swank on port 45744.. [2 times]
>> Lisp connection closed unexpectedly: connection broken by remote peer
>> In *inferior-lisp* it shows:
>> ;; Swank started at port: 45744.
>> 45744
>> * ;; swank:close-connection: end of file on #<SB-SYS:FD-STREAM for "a socket" {B1593F1}>
>> But if I run slime again without creating an additional inferior lisp, it works very well.
>> I don't know why, and I cannot get any debug information about it.
> This sounds like a hard-to-reproduce Emacs problem, but maybe you can find something out with strace.
I've seen it too at work, but only if I ssh to my workplace from home and use emacs -nw; it does not appear with the GUI version. It's also 23.1.x, I think.
-T.
"Tobias C. Rittweiler" tcr@freebits.de writes:
> Helmut Eller heller@common-lisp.net writes:
>> * Plato Wu [2009-12-06 10:08+0100] writes:
>>> Recently, when I tried to start slime, most of the time it threw an error message:
>>> Connecting to Swank on port 45744.. [2 times]
>>> Lisp connection closed unexpectedly: connection broken by remote peer
>>> In *inferior-lisp* it shows:
>>> ;; Swank started at port: 45744.
>>> 45744
>>> * ;; swank:close-connection: end of file on #<SB-SYS:FD-STREAM for "a socket" {B1593F1}>
>>> But if I run slime again without creating an additional inferior lisp, it works very well.
>>> I don't know why, and I cannot get any debug information about it.
>> This sounds like a hard-to-reproduce Emacs problem, but maybe you can find something out with strace.
> I've seen it too at work, but only if I ssh to my workplace from home and use emacs -nw; it does not appear with the GUI version. It's also 23.1.x, I think.
> -T.
Yes, I am in the same situation: ssh, -nw, and 23.1.x.
I'm looking forward to a perfect solution. :)
* Plato Wu [2009-12-06 15:33+0100] writes:
>> I've seen it too at work, but only if I ssh to my workplace from home and use emacs -nw; it does not appear with the GUI version. It's also 23.1.x, I think.
>> -T.
> Yes, I am in the same situation: ssh, -nw, and 23.1.x.
I was able to produce a network capture with tshark. I changed swank.lisp so that it always uses port 4444 and captured with:
  tshark -i lo -w /tmp/x.dump port 4444
The file is attached below and can be opened with Wireshark. Some TCP packets are flagged "TCP CHECKSUM INCORRECT". This seems very odd to me. Does somebody know what it means?
Helmut
On Sun, 06 Dec 2009 20:16:50 +0100 Helmut Eller heller@common-lisp.net wrote:
> I was able to produce a network capture with tshark. I changed swank.lisp so that it always uses port 4444 and captured with:
>   tshark -i lo -w /tmp/x.dump port 4444
> The file is attached below and can be opened with Wireshark. Some TCP packets are flagged "TCP CHECKSUM INCORRECT". This seems very odd to me. Does somebody know what it means?
The text results of tcpdump (especially using -nvvxX flags) or a binary tcpdump result would be easier for me (and perhaps others?) to read.
When I encountered packet checksum errors here, it was due to a card/driver-specific TCP hardware acceleration feature being enabled, or to hardware problems, although a faulty software packet translator or an IP stack bug is not impossible... Since your test appears to be local, however, I doubt NIC TCP acceleration is the problem.
In case your question was literal about "what it means?" (sorry for stating the obvious if it wasn't): a TCP checksum is wrong for a packet when the value in the header doesn't match the checksum computed over the segment's header and payload (indicating it was probably corrupted in transit, or wrongly calculated). This is rarely calculated or verified at the application layer, except by userland packet-level analysis/manipulation tools.
It could even be a tshark-specific problem...
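To make the checksum explanation above concrete, here is a minimal sketch of the 16-bit ones' complement sum that TCP uses, in the style of RFC 1071. It only shows the arithmetic: a real TCP checksum is computed over a pseudo-header (with the IP addresses), the TCP header, and the payload, which this sketch does not build. The function names and the sample bytes are illustrative and do not come from the capture discussed in this thread.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(data_with_checksum: bytes) -> bool:
    """A block that already carries a correct checksum yields 0 here."""
    return internet_checksum(data_with_checksum) == 0


sample = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(sample)
good = sample + struct.pack("!H", csum)
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit

print(hex(csum))                                 # 0x220d
print(checksum_ok(good), checksum_ok(corrupted)) # True False
```

A capture tool reports an incorrect checksum when a check like `checksum_ok` fails for the bytes it captured, whatever the reason the stored value differs.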
* Matthew Mondor [2009-12-07 04:31+0100] writes:
>> The file is attached below and can be opened with Wireshark. Some TCP packets are flagged "TCP CHECKSUM INCORRECT". This seems very odd to me. Does somebody know what it means?
> The text results of tcpdump (especially using -nvvxX flags) or a binary tcpdump result would be easier for me (and perhaps others?) to read.
> When I encountered packet checksum errors here, it was due to a card/driver-specific TCP hardware acceleration feature being enabled, or to hardware problems, although a faulty software packet translator or an IP stack bug is not impossible... Since your test appears to be local, however, I doubt NIC TCP acceleration is the problem.
> In case your question was literal about "what it means?" (sorry for stating the obvious if it wasn't): a TCP checksum is wrong for a packet when the value in the header doesn't match the checksum computed over the segment's header and payload (indicating it was probably corrupted in transit, or wrongly calculated). This is rarely calculated or verified at the application layer, except by userland packet-level analysis/manipulation tools.
> It could even be a tshark-specific problem...
Everything goes through the loopback device and no real hardware is involved, so it is strange to see checksum errors. Maybe it's just a Wireshark weirdness.
Anyway, my current hypothesis is that Emacs' connect() is interrupted by a timer and, instead of retrying, Emacs closes the socket and starts over with a new socket. That would explain the two connections from different ports in the trace. It's also not entirely unreasonable to start with a fresh socket if we assume that the server accepts multiple connections, but our server accepts one connection only. It's still strange that the 3-way handshake for the second connection succeeds. Need to dig deeper.
Helmut
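The hypothesis above contrasts two ways of reacting to an interrupted connect(): throwing the socket away and starting over, or keeping the same socket and waiting for the connection attempt to finish. The following is a minimal sketch of the second approach, written in Python for brevity even though Emacs itself is C. It is only an illustration of the general technique; it is not taken from Emacs, and nothing in this thread confirms that a fix would look like this. The function name, the timeout, and the demo server are assumptions made for the example.

```python
import errno
import os
import select
import socket


def connect_keep_socket(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Connect without ever replacing the socket after an early return."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    rc = sock.connect_ex((host, port))
    # These codes mean "not finished yet", not "failed".
    pending = (0, errno.EINPROGRESS, errno.EINTR, errno.EWOULDBLOCK)
    if rc not in pending:
        sock.close()
        raise OSError(rc, os.strerror(rc))
    # The attempt continues in the kernel; the socket becomes writable
    # once it has either succeeded or failed.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        sock.close()
        raise TimeoutError("connect did not complete in time")
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        sock.close()
        raise OSError(err, os.strerror(err))
    sock.setblocking(True)
    return sock


def demo() -> bytes:
    """Connect to a throwaway local listener and send four bytes through."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    host, port = listener.getsockname()
    client = connect_keep_socket(host, port)
    server_side, _ = listener.accept()
    client.sendall(b"ping")
    data = server_side.recv(4)
    for s in (client, server_side, listener):
        s.close()
    return data


print(demo())
```

With this approach the server only ever sees one connection per attempt, which matters for a server that accepts a single connection.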
On Mon, Dec 7, 2009 at 1:53 PM, Helmut Eller heller@common-lisp.net wrote:
> * Matthew Mondor [2009-12-07 04:31+0100] writes:
> Everything goes through the loopback device and no real hardware is involved, so it is strange to see checksum errors. Maybe it's just a Wireshark weirdness.
It's probably just a Wireshark/tshark problem. Over the years I've seen invalid checksum notes in Wireshark before where there otherwise wasn't anything wrong going on that I could tell, and I've come to believe it can't be trusted in that regard.
On Sun, 2009-12-06 at 20:16 +0100, Helmut Eller wrote:
> * Plato Wu [2009-12-06 15:33+0100] writes:
>>> I've seen it too at work, but only if I ssh to my workplace from home and use emacs -nw; it does not appear with the GUI version. It's also 23.1.x, I think.
>>> -T.
>> Yes, I am in the same situation: ssh, -nw, and 23.1.x.
> I was able to produce a network capture with tshark. I changed swank.lisp so that it always uses port 4444 and captured with:
>   tshark -i lo -w /tmp/x.dump port 4444
> The file is attached below and can be opened with Wireshark. Some TCP packets are flagged "TCP CHECKSUM INCORRECT". This seems very odd to me. Does somebody know what it means?
When creating TCP/UDP packets, the kernel could compute the checksum itself, but that's slow, so most of the time nowadays it delegates the checksum calculation to the network card. Since Wireshark intercepts the packet before it is sent to the NIC, it sees incorrect checksums.
Furthermore, for packets sent on the loopback interface, Linux avoids calculating or checking the checksum at all (one could always use ECC RAM for safety).
* Plato Wu [2009-12-06 15:33+0100] writes:
> I'm looking forward to a perfect solution. :)
I sent a bug report with a fix to the Emacs maintainers: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=5173 but I haven't yet received an answer.
In the meantime you can start Swank so that it accepts multiple connections, which opens a small security hole but has no other drawbacks. You can achieve that if you put something like this in your .emacs:
(defun load-swank-dont-close (port-filename encoding)
  (format "%S\n\n"
          `(progn
             (load ,(expand-file-name slime-backend slime-path) :verbose t)
             (funcall (read-from-string "swank-loader:init"))
             (funcall (read-from-string "swank:start-server")
                      ,port-filename
                      :coding-system ,(slime-coding-system-cl-name encoding)
                      :dont-close t))))

(setq slime-lisp-implementations
      '((sbcl-noclose ("sbcl") :init load-swank-dont-close)))
Helmut
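Why accepting multiple connections works around the problem can be seen in a small simulation. The sketch below is a rough model, not Swank or Emacs code: a client connects, abandons that socket, and connects again, while the server either stops listening after its first accept or keeps listening, which is roughly what :dont-close t asks for. The function name and the "hello" payload are made up for the example. In this model the second connect succeeds in both cases because the kernel completes the handshake for connections waiting in the listen backlog, before the server has accepted anything.

```python
import socket


def second_connection_reply(dont_close: bool):
    """Return (server saw EOF on the first connection, bytes the second got)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    addr = listener.getsockname()

    # Client side: the first socket is abandoned, the second one is kept.
    first = socket.create_connection(addr)
    first.close()
    second = socket.create_connection(addr)
    second.settimeout(2.0)

    # Server side: the first queued connection is the abandoned one,
    # so reading from it yields end-of-file immediately.
    conn, _ = listener.accept()
    conn.settimeout(2.0)
    first_eof = conn.recv(1024) == b""
    conn.close()

    if dont_close:
        # Keep listening: the client's real connection gets served.
        conn2, _ = listener.accept()
        conn2.sendall(b"hello")
        conn2.close()
    # In one-shot mode the listener is closed with the second
    # connection still waiting in the queue.
    listener.close()

    try:
        reply = second.recv(1024)
    except OSError:  # reset or timeout: the connection is unusable
        reply = b""
    finally:
        second.close()
    return first_eof, reply


print(second_connection_reply(dont_close=False))
print(second_connection_reply(dont_close=True))
```

In one-shot mode the server sees end-of-file on the connection it accepted and the client's second connection typically fails with a reset or end-of-file, which resembles the symptoms reported at the start of this thread. With dont_close set, the second connection is served normally.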
Helmut Eller heller@common-lisp.net writes:
> * Plato Wu [2009-12-06 15:33+0100] writes:
>> I'm looking forward to a perfect solution. :)
> I sent a bug report with a fix to the Emacs maintainers: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=5173 but I haven't yet received an answer.
> In the meantime you can start Swank so that it accepts multiple connections, which opens a small security hole but has no other drawbacks. You can achieve that if you put something like this in your .emacs:
> (defun load-swank-dont-close (port-filename encoding)
>   (format "%S\n\n"
>           `(progn
>              (load ,(expand-file-name slime-backend slime-path) :verbose t)
>              (funcall (read-from-string "swank-loader:init"))
>              (funcall (read-from-string "swank:start-server")
>                       ,port-filename
>                       :coding-system ,(slime-coding-system-cl-name encoding)
>                       :dont-close t))))
>
> (setq slime-lisp-implementations
>       '((sbcl-noclose ("sbcl") :init load-swank-dont-close)))
> Helmut
Thanks, it works. I hope the next Emacs release will include your fix.