Hi!
Again, a little patch. This time it's not about leaked sockets but rather about "leaked children," so to speak. It was Stefan Scholl who prompted me to investigate this further, and here's what I found out:
This piece of code
  if (ReadLength < ContentLength || r->connection->aborted) {
    char buffer[HUGE_STRING_LEN];
    ContentLength -= ReadLength;
    do {
      ReadLength = ForceGets(buffer, (BUFF *) BuffSocket,
                             HUGE_STRING_LEN > ContentLength ? ContentLength : HUGE_STRING_LEN);
      ContentLength -= ReadLength;
    } while (ReadLength > 0 && ContentLength > 0);
  }
doesn't work if ContentLength is large enough. What's happening is that the process of emptying BuffSocket (the loop above) always hangs when exactly the last 8192 (HUGE_STRING_LEN) bytes are waiting to be removed.
I tried with ap_bread instead of ForceGets but got the same result. (Note that ap_bgets which is used by ForceGets does CR/LF handling which is not needed here.)
The result is that this child becomes unusable but is still there: it is never killed by the Apache root process. An easy way to reproduce this (with TBNL) is to do
  (asdf:oos 'asdf:load-op :tbnl-test)
  (tbnl:start-test)
with a proper Apache configuration (see TBNL docs) and then call ApacheBench with large values like so:
ab -n 2000 -c 200 http://localhost/tbnl/test/image.jpg
(The important point is that image.jpg is big enough - about 20kB in this case.)
After doing this you'll see a large number of Apache processes with ps(1) and the same number of Lisp processes from within your Lisp image. Call ApacheBench often enough (two or three times) and Apache will completely stop responding because it has reached its 'MaxClients' limit of 150 (unless you've changed it, of course) but none of the 150 clients is usable.
This happens because ApacheBench will abort all pending connections as soon as it's finished with its tests. The pattern can obviously be used as a DoS attack on mod_lisp.
Now, what to do? One option would probably be to set a timeout before emptying the buffer (not tested). But I think the better (and faster) solution is to get rid of the buffer and the socket as well. The next time the client is used we'll have to open up a new socket to Lisp but this won't need 300 seconds (the default Apache value for 'Timeout'). So...
  if (ReadLength < ContentLength || r->connection->aborted) {
    ap_log_error("mod_lisp", 0, APLOG_WARNING|APLOG_NOERRNO, r->server,
                 "Could not send complete body to client, closing socket to Lisp");
    ap_bclose(BuffSocket);
    KeepSocket = 0;
    LispSocket = 0;
  }
The appended patch also adds a couple of braces to appease gcc and it removes some code that is never used at all. Hope that's OK.
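For the record, the timeout option mentioned above (which I have not tested inside mod_lisp) could look roughly like the sketch below. It assumes plain POSIX poll(2) and read(2) on a raw file descriptor rather than Apache's BUFF API, and the names drain_with_timeout and DRAIN_CHUNK are made up for illustration; they are not part of mod_lisp or Apache.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define DRAIN_CHUNK 8192  /* assumed to match HUGE_STRING_LEN in the loop above */

/*
 * Hypothetical helper (not part of mod_lisp or the Apache API): discard up
 * to `remaining` bytes from `fd`, waiting at most `timeout_ms` for each
 * chunk. Returns the number of bytes still undrained: 0 means the
 * descriptor is clean, anything else means the caller should close it.
 */
static long drain_with_timeout(int fd, long remaining, int timeout_ms)
{
    char buffer[DRAIN_CHUNK];

    /* A negative timeout would make poll() block forever. */
    assert(timeout_ms >= 0);

    while (remaining > 0) {
        struct pollfd pfd;
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;

        /* poll() returns 0 on timeout and -1 on error; give up in both cases. */
        if (poll(&pfd, 1, timeout_ms) <= 0)
            break;

        size_t want = remaining < DRAIN_CHUNK ? (size_t) remaining : DRAIN_CHUNK;
        ssize_t got = read(fd, buffer, want);
        if (got <= 0)  /* EOF or read error: nothing more will arrive */
            break;

        remaining -= (long) got;
    }
    return remaining;
}
```

A caller would keep the socket only if the function returns 0 and close it otherwise. Even with a short timeout this still makes the child wait on every aborted request, which is why I prefer simply closing the socket.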
Cheers, Edi.