Hi Mathieu,
you may want write a log entry in the DO clause of the loop in this function:
https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L250
It could be that somehow, you're running out of worker threads, but the code fails to correctly decrement the free worker count. You can also inspect request-count slot of the taskmaster when you see the hanging server again.
Keep us posted about what you find. -Hans
On Wed, Aug 15, 2012 at 6:02 PM, Mathieu Lemoine mlemoine@mentel.com wrote:
Hi Hans,
I though this was the problem because of the error message I find in my *standard-error* log: Error while creating worker thread for new incoming connection: Socket error in "getpeername": ENOTCONN (Transport endpoint is not connected) However, I noticed reading your patches, this is the error message introduced by the patch. Since the error message is present many times, I think your code is working, otherwise the acceptor would crash instantly.
Currently, the symptoms I saw are these:
- Timeout when trying to reach port 80 (on which Hunchentoot is listening)
- No error message in either *standard-error* nor *standard-output* logs
- Only one thread (hunchentoot listener?) running in addition to the main
process thread (as shown by htop)
This problem occurs in production environment after a few hours of operations. I restarted it once again with a swank server this time. Next time the problem occurs, I will try to dig in it to gather more informations on the state of my software.
I still don't know what is causing the problem or how to reproduce it, but I will try to check it up and keep you posted.
Thanks for your time and help. Mathieu.
2012/8/15 Hans Hübner hans.huebner@gmail.com
Hi Mathieu,
I have mined the Subversion repository for the commits that you referenced, and I found them to represent this patch:
Index: taskmaster.lisp
--- taskmaster.lisp (revision 4427) +++ taskmaster.lisp (revision 4435) @@ -127,9 +127,21 @@
#-:lispworks (defmethod handle-incoming-connection ((taskmaster one-thread-per-connection-taskmaster) socket)
- (bt:make-thread (lambda ()
(process-connection (taskmaster-acceptor
taskmaster) socket))
:name (format nil "Hunchentoot worker \(client:
~A)" (client-as-string socket))))
- ;; We are handling all conditions here as we want to make sure that
- ;; the acceptor process never crashes while trying to create a
- ;; worker thread. One such problem exists in
- ;; GET-PEER-ADDRESS-AND-PORT which can signal socket conditions on
- ;; some platforms in certain situations.
- (handler-case
(bt:make-thread (lambda ()
(process-connection (taskmaster-acceptor
taskmaster) socket))
:name (format nil "Hunchentoot worker \(client:
~A)" (client-as-string socket)))
- (error (cond)
;; Need to bind *ACCEPTOR* so that LOG-MESSAGE can do its work.
(let ((*acceptor* (taskmaster-acceptor taskmaster)))
(log-message *lisp-errors-log-level*
"Error while creating worker thread for new
incoming connection: ~A" cond)))))
;; LispWorks implementation
Now, the patch actually does not apply, but it was on trunk and has been evolved into the current version:
https://github.com/edicl/hunchentoot/blob/master/taskmaster.lisp#L355
If I understand you correctly, you say that the code does not work. Can you somehow provide a way to reproduce the problem?
Thanks, Hans
On Wed, Aug 15, 2012 at 3:29 PM, Mathieu Lemoine mlemoine@mentel.com wrote:
Hi,
I am currently hitting the problem described in the following thread: http://lists.common-lisp.net/pipermail/tbnl-devel/2009-July/004768.html
I'm using SBCL 1.0.58 with hunchentoot 1.2.3 and usocket 0.5.5 (both from quicklisp).
I understand the long-term fix will be some hard work. However, in this thread, Hans is referencing a few diff hunks :
http://bknr.net/trac/changeset/4430?format=diff&new=4430 http://bknr.net/trac/changeset/4434?format=diff&new=4434 http://bknr.net/trac/changeset/4435?format=diff&new=4435
Since the old trac is not working anymore, could it be possible to either point me toward the equivalent commits in the github repository or the content of these patches.
By the way, Peter, if you're still around, did you need to apply only 4435 or both this and 4434 in order to solve the second issue (unbound variable on *acceptor*)?
If anybody has information with respect to these issues, including the corresponding bug report in usocket, so I can do some follow up with them, it would be greatly appreciated.
Thanks in advance for your help and time.
Mathieu.
tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel
tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel
tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel