I am having a problem that I can't replicate on demand but that has been happening with increasing frequency. I am hoping some of you might suggest ways to troubleshoot it. I don't know that it is a Hunchentoot problem, per se; it may be an interaction problem between Hunchentoot and Apache2 via mod_lisp2.
The symptom is that the server is hung first thing in the morning when I check it. The server image (SBCL) is at 99% CPU. When I look in Apache's error_log I see dozens of these:
[Wed Feb 20 05:10:10 2008] [error] (70007)The timeout specified has expired: error reading from Lisp
[Wed Feb 20 05:12:23 2008] [error] (70007)The timeout specified has expired: error reading from Lisp
[Wed Feb 20 05:15:14 2008] [error] (70007)The timeout specified has expired: error reading from Lisp
[Wed Feb 20 05:18:57 2008] [error] (70007)The timeout specified has expired: error reading from Lisp
These generally correspond to Googlebot activity in the access log, with each request logged one minute before the matching timeout error:
66.249.73.165 - - [20/Feb/2008:05:09:10 -0800] "GET /intrepid-obstacle-map.html?hunchentoot-session=18%3A15F2776FDB3BD7CA852EBF74A3B40B2B HTTP/1.1" 500 673 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.165 - - [20/Feb/2008:05:11:23 -0800] "GET /index.html?hunchentoot-session=17%3A549FDE65186DCCFD1D2552AA79A49871 HTTP/1.1" 500 673 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.165 - - [20/Feb/2008:05:14:14 -0800] "GET /intrepid-obstacle-map.html?hunchentoot-session=17%3A549FDE65186DCCFD1D2552AA79A49871 HTTP/1.1" 500 673 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.73.165 - - [20/Feb/2008:05:17:57 -0800] "GET /intrepid-robothon-2007.html?hunchentoot-session=15%3A3C06276A2A5CD0567B6A332AE52B3940 HTTP/1.1" 500 673 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
And finally, netstat shows a large number of connections on port 4242 (this is just a small subset):
tcp 680 0 127.0.0.1:4242 127.0.0.1:38121 CLOSE_WAIT
tcp 517 0 127.0.0.1:4242 127.0.0.1:44273 CLOSE_WAIT
tcp 0 0 127.0.0.1:42641 127.0.0.1:4242 ESTABLISHED
tcp 0 0 127.0.0.1:4242 127.0.0.1:51379 ESTABLISHED
tcp 509 0 127.0.0.1:4242 127.0.0.1:33727 CLOSE_WAIT
tcp 515 0 127.0.0.1:4242 127.0.0.1:44279 CLOSE_WAIT
tcp 0 0 127.0.0.1:4242 127.0.0.1:48386 ESTABLISHED
tcp 561 0 127.0.0.1:4242 127.0.0.1:44283 CLOSE_WAIT
tcp 0 0 127.0.0.1:4242 127.0.0.1:42646 ESTABLISHED
tcp 557 0 127.0.0.1:4242 127.0.0.1:43923 CLOSE_WAIT
tcp 0 0 192.168.1.102:39210 192.168.1.2:993 ESTABLISHED
tcp 0 0 127.0.0.1:42651 127.0.0.1:4242 ESTABLISHED
tcp 0 0 127.0.0.1:4242 127.0.0.1:51382 ESTABLISHED
tcp 589 0 127.0.0.1:4242 127.0.0.1:48377 CLOSE_WAIT
tcp 589 0 127.0.0.1:4242 127.0.0.1:56928 CLOSE_WAIT
tcp 589 0 127.0.0.1:4242 127.0.0.1:54195 CLOSE_WAIT
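In case it helps with comparing one morning to the next, the socket states on the Lisp side could be tallied with a short script. This is only a rough sketch: the function name count_states is made up for illustration, and it assumes the six-column layout of the netstat output above (protocol, Recv-Q, Send-Q, local address, foreign address, state). It counts only sockets whose local address is on port 4242, so the Apache ends of the same connections are ignored.

```python
from collections import Counter


def count_states(netstat_text, port=4242):
    """Tally TCP states for sockets whose local address is on `port`.

    Hypothetical helper; assumes lines shaped like:
    tcp 680 0 127.0.0.1:4242 127.0.0.1:38121 CLOSE_WAIT
    """
    suffix = ":" + str(port)
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        # Only count the listening side (local port == port), not the peer.
        if local_address.endswith(suffix):
            counts[state] += 1
    return counts
```

Feeding it the text of a netstat listing (for example, saved from `netstat -tn`) returns a Counter keyed by state. CLOSE_WAIT means the peer has already closed the connection while the local process has not yet closed its end, so a growing CLOSE_WAIT count on port 4242 would point at sockets the Lisp image is not closing.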
The only way I can recover is to kill the Lisp process, restart it, and then restart Apache.
Any ideas?
Thanks. --Jeff