Hi,
A few days ago I started running Hunchentoot from trunk on the site lisper.ru, and found that the system stopped working after a few hours. I watched the behavior of the system during the day and found that some threads never complete their work and hang in "finish-output" (in "process-connection"). When enough of them accumulate, the server stops working.
I have this problem on Debian 64-bit with SBCL 1.0.47 and SBCL 1.0.37. I did not have this problem when using Hunchentoot 1.1.1.
After replacing "finish-output" with "force-output" in "process-connection", the problem went away for me.
Andrey
Andrey,
thank you for your report:
On Thu, Mar 31, 2011 at 10:22 AM, Andrey Moskvitin archimag@gmail.com wrote:
> A few days ago I started running Hunchentoot from trunk on the site lisper.ru, and found that the system stopped working after a few hours. I watched the behavior of the system during the day and found that some threads never complete their work and hang in "finish-output" (in "process-connection"). When enough of them accumulate, the server stops working.
In principle, I am fine with making that change in Hunchentoot, but I do wonder what is really going on. According to the spec, the difference between finish-output and force-output is that finish-output actually waits for the output to drain, while force-output only initiates the flushing and returns immediately. What you seem to report is that the flushing never completes when finish-output is used. Just to understand what's going on: what kind of setup did you observe the bad behavior in? Can you reproduce the problem and tell us more about the connections that seem to be hanging? Are they still in ESTABLISHED or in FIN_WAIT state?
It'd be great if you could supply us with some more information.
Thanks! Hans
> Are they still in ESTABLISHED or in FIN_WAIT state?
> It'd be great if you could supply us with some more information.
Hmm...
Threads (via *slime-threads*):
27 repl-thread                            Running
28 auto-flush-thread                      Running
29 reader-thread                          Running
30 control-thread                         Running
31 hunchentoot-worker-62.78.43.253:8576   Running
 7 hunchentoot-listener-*:80              Running
 8 Swank 4005                             Running
 9 initial thread                         Running
Backtrace:
Interrupt from Emacs
   [Condition of type SIMPLE-ERROR]
Restarts:
 0: [CONTINUE] Continue from break.
 1: [TERMINATE-THREAD] Terminate this thread (#<THREAD "hunchentoot-worker-62.78.43.253:8576" RUNNING {1002BD0451}>)
Backtrace:
  0: ("bogus stack frame")
  1: (SB-IMPL::SUB-SUB-SERVE-EVENT NIL NIL)
  2: (SB-IMPL::SUB-SERVE-EVENT NIL NIL NIL)
  3: (SB-SYS:SERVE-ALL-EVENTS NIL)
  4: (SB-IMPL::FINISH-FD-STREAM-OUTPUT #<SB-SYS:FD-STREAM for "a socket" {1002BCE731}>)
  5: (FINISH-OUTPUT #<SB-SYS:FD-STREAM for "a socket" {1002BCE731}>)
  6: ((SB-PCL::FAST-METHOD HUNCHENTOOT:PROCESS-CONNECTION (HUNCHENTOOT:ACCEPTOR T)) ..)
  7: ((SB-PCL::FAST-METHOD HUNCHENTOOT:PROCESS-CONNECTION :AROUND (HUNCHENTOOT:ACCEPTOR T)) ..)
  8: ((LAMBDA ()))
  9: ((LAMBDA ()))
 10: ((FLET #:WITHOUT-INTERRUPTS-BODY-[BLOCK358]363))
 11: ((FLET SB-THREAD::WITH-MUTEX-THUNK))
 12: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]293))
 13: (SB-THREAD::CALL-WITH-MUTEX ..)
 14: (SB-THREAD::INITIAL-THREAD-FUNCTION)
 15: ("foreign function: #x4220F0")
 16: ("foreign function: #x418D67")
But netstat does not show any connection to 62.78.43.253:8576. It looks like a bug in SBCL.
Andrey
On Fri, Apr 1, 2011 at 4:02 PM, Andrey Moskvitin archimag@gmail.com wrote:
>> Are they still in ESTABLISHED or in FIN_WAIT state?
>> It'd be great if you could supply us with some more information.
> But netstat does not show any connection to 62.78.43.253:8576. It looks like a bug in SBCL.
Thanks for this report. Would you be able to try another SBCL version? Does anyone else have any ideas?
-Hans
> Would you be able to try another SBCL version?
I tested with SBCL 1.0.47 and SBCL 1.0.37. I have not had this problem on a local network, only on http://lisper.ru/.
Andrey
2011/4/1 Hans Hübner hans.huebner@gmail.com:
> On Fri, Apr 1, 2011 at 4:02 PM, Andrey Moskvitin archimag@gmail.com wrote:
>>> Are they still in ESTABLISHED or in FIN_WAIT state?
>>> It'd be great if you could supply us with some more information.
>> But netstat does not show any connection to 62.78.43.253:8576. It looks like a bug in SBCL.
> Thanks for this report. Would you be able to try another SBCL version? Does anyone else have any ideas?
> -Hans
On Fri, Apr 1, 2011 at 4:21 PM, Andrey Moskvitin archimag@gmail.com wrote:
>> Would you be able to try another SBCL version?
> I tested with SBCL 1.0.47 and SBCL 1.0.37. I have not had this problem on a local network, only on http://lisper.ru/.
I have made the change, as there does not seem to be any downside.
-Hans
Hi Hans,
I think changing FINISH-OUTPUT back to FORCE-OUTPUT is wrong. It reintroduces the bug I reported a couple of months ago: http://common-lisp.net/pipermail/tbnl-devel/2011-February/005411.html
Using (FORCE-OUTPUT S) in conjunction with (CLOSE S :ABORT T) does not guarantee that all buffered output is sent to the socket before the socket is closed. That is probably why replacing FINISH-OUTPUT with FORCE-OUTPUT "fixed" Andrey's problem: the new version simply discards data instead of writing it to the socket.
Ilya
Hi Ilya,
thank you for the heads-up, and sorry for the improper fix, which I will be backing out. I made the change after discussing it with David Lichteblau, and I wrongly got the impression that FORCE-OUTPUT and FINISH-OUTPUT without any other keyword parameters should be basically equivalent. I have been pondering whether FINISH-OUTPUT, which waits for the flush to finish, is the right thing. What is really required at this point is making sure that the data eventually arrives at the client, not necessarily waiting for that to happen.
I am not quite sure what to do now. I personally don't use SBCL, so I am not affected by the problem. It seems that neither of the two approaches that have been tried works properly on SBCL: FORCE-OUTPUT can lose data, and FINISH-OUTPUT can hang the server.
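A possible middle ground between the two approaches, offered here only as an untested sketch and not as anything proposed in this thread, is to keep FINISH-OUTPUT but bound how long it may block, falling back to an aborting CLOSE when the bound is exceeded. It assumes SBCL's internal SB-SYS:WITH-DEADLINE macro and SB-SYS:DEADLINE-TIMEOUT condition (neither is part of Hunchentoot or of standard Common Lisp); FLUSH-AND-CLOSE and its TIMEOUT parameter are hypothetical names. Whether a deadline actually interrupts the serve-event wait shown in the backtrace would need to be checked on the SBCL versions in question.

```lisp
;;; Sketch only: bounded flush before closing a connection stream.
;;; Assumes SBCL; SB-SYS:WITH-DEADLINE and SB-SYS:DEADLINE-TIMEOUT are
;;; SBCL-internal and may behave differently between versions.
;;; FLUSH-AND-CLOSE is a hypothetical helper, not a Hunchentoot function.
(defun flush-and-close (stream &key (timeout 5))
  "Try to drain STREAM's buffered output within TIMEOUT seconds, then close it.
Returns T if the output was flushed, NIL if the deadline expired."
  (let ((flushed
          (handler-case
              ;; Inside WITH-DEADLINE, SBCL's blocking waits are expected to
              ;; signal DEADLINE-TIMEOUT once TIMEOUT seconds have passed,
              ;; instead of waiting indefinitely for the peer to read.
              (sb-sys:with-deadline (:seconds timeout)
                (finish-output stream)
                t)
            (sb-sys:deadline-timeout ()
              ;; The peer is not draining data; give up on the flush.
              nil))))
    ;; Abort (discarding unsent data) only when the flush did not finish;
    ;; otherwise close normally so that no buffered output is lost.
    (close stream :abort (not flushed))
    flushed))
```

With a plain string stream, (flush-and-close (make-string-output-stream)) returns T immediately. The 5-second default is arbitrary; the trade-off is that a client that stops reading ties up a worker thread for at most TIMEOUT seconds instead of indefinitely, at the cost of possibly truncating that one response.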
Any suggestions?
Thanks, Hans
On Mon, Apr 4, 2011 at 8:08 AM, Ilya Perminov iperminov@dwavesys.com wrote:
> Hi Hans,
> I think changing FINISH-OUTPUT back to FORCE-OUTPUT is wrong. It reintroduces the bug I reported a couple of months ago: http://common-lisp.net/pipermail/tbnl-devel/2011-February/005411.html
> Using (FORCE-OUTPUT S) in conjunction with (CLOSE S :ABORT T) does not guarantee that all buffered output is sent to the socket before the socket is closed. That is probably why replacing FINISH-OUTPUT with FORCE-OUTPUT "fixed" Andrey's problem: the new version simply discards data instead of writing it to the socket.
> Ilya
tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel