I have a file handle leak in my code (that is, I run out of file handles after running for a while) and I suspect that I'm not using drakma correctly.
I wish to use the :want-stream t parameter and read the resulting stream directly, so I have this code:
(defun simple-get (url)
  "Download the url using GET and return the body as a string."
  (handler-case
      (multiple-value-bind (stream code headers dummy-uri dummy-stream must-close?)
          (drakma:http-request url :want-stream t :keep-alive nil :method :get)
        (declare (ignore headers dummy-stream dummy-uri))
        (unwind-protect
            (and stream code (= code 200)
                 (with-output-to-string (out)
                   (do ((ch (read-char stream nil :eof)
                            (read-char stream nil :eof)))
                       ((not (characterp ch)))
                     (princ ch out))))
          (when (and stream must-close?)
            (ignore-errors (close stream)))))
    (error (condition)
      (format t "Error ~s: ~a~%" url condition)
      nil)))
Is there anything extra I need to do to make sure that all the streams opened by drakma are closed?
My production code is much more complex, but the simple stub above will generate the out-of-file-handles problem. Besides the actual error, I can use lsof on Linux and Mac OS and see many sockets stuck in CLOSED or CLOSE_WAIT states. This is all under LispWorks 5.0.1.
Also, when using :want-stream nil I never encounter the problem.
Cheers, Chris Dean
Hi, Dean.
I've tried your code, but I can't reproduce the socket handle leak. I'm on Windows + CLISP.
I've made several calls to (SIMPLE-GET "http://microsoft.com") and (SIMPLE-GET "http://google.com") and watched the sockets using netstat. All sockets are closed properly.
What URLs lead to socket handle leak?
Could it be that the URLs you use point to servers that speak HTTP 1.1 but don't return the close HTTP header properly?
As far as I understand, MUST-CLOSE? = NIL means that the stream may be reused in further calls of HTTP-REQUEST for the same server. If so, and you don't intend to reuse the stream in further calls, you can always CLOSE it.
Try to always CLOSE returned streams, regardless of MUST-CLOSE?.
Regards, -Anton
Anton Vodonosov vodonosov@mail.ru writes:
Could it be that the URLs you use point to servers that speak HTTP 1.1 but don't return the close HTTP header properly?
The problem is very data dependent and I have a test set of 1647 urls that exercises the leak. I can send the data in a private email if you wish.
Thanks for running a test on Windows.
Cheers, Chris Dean
On Fri, 02 Feb 2007 18:27:34 -0800, Chris Dean ctdean@sokitomi.com wrote:
I have a file handle leak in my code (that is, I run out of file handles after running for a while) and I suspect that I'm not using drakma correctly.
Is there anything extra I need to do to make sure that all the streams opened by drakma are closed?
My production code is much more complex, but the simple stub above will generate the out of file handles problem. Besides the actual error I can use lsof on Linux and Mac OS and see many sockets stuck in CLOSED or CLOSE_WAIT states. This is all under LispWorks 5.0.1
Also, when using :want-stream nil I never encounter the problem.
The meaning of the sixth return value (MUST-CLOSE) is that you're not allowed to re-use the stream, because according to the reply headers the server will close the stream on its side.
However, if you do /not/ want to re-use the stream (which is obviously the case in your example as your function doesn't return the stream), you must of course always close it. Drakma can't close it for you as it doesn't know when you're done with it, and why would you want to keep an open stream hanging around in your image that can't be accessed by your code anyway?
In other words: It should be
(when stream (ignore-errors (close stream)))))
above.
I'll re-word this in the documentation to make it more clear (hopefully).
HTH, Edi.
On Sat, 03 Feb 2007 16:09:48 +0100, Edi Weitz edi@agharta.de wrote:
I'll re-word this in the documentation to make it more clear (hopefully).
Although, after reading it once more, I think the documentation was already pretty clear:
HTTP-REQUEST will always close the stream to the server before it returns unless WANT-STREAM is true or if the headers exchanged between Drakma and the server determine that the connection will be kept alive - for example if both client and server used the HTTP 1.1 protocol and no explicit "Connection: close" header was sent. In these cases /you/ will have to close the stream manually.
[...]
If WANT-STREAM is true, the message body is not read and instead the (open) socket stream is returned as the first return value. If the sixth value of HTTP-REQUEST is true, the stream should be closed (and not be re-used) after the body has been read.
Anyway, I'll try to be even more precise... :)
Edi Weitz edi@agharta.de writes:
However, if you do /not/ want to re-use the stream (which is obviously the case in your example as your function doesn't return the stream), you must of course always close it.
Sure, of course.
(when stream (ignore-errors (close stream)))))
Fair enough. FWIW, my production code looks exactly like this. (During my debugging I noticed that must-close was always t in my case.)
Regardless, if I make that change I still see the leak.
I have a data set I can send off-list if anyone is interested.
Cheers, Chris Dean
(defun simple-get (url)
  "Download the url using GET and return the body as a string."
  (handler-case
      (multiple-value-bind (stream code)
          (drakma:http-request url :want-stream t :keep-alive nil :method :get)
        (unwind-protect
            (and stream code (= code 200)
                 (with-output-to-string (out)
                   (do ((ch (read-char stream nil :eof)
                            (read-char stream nil :eof)))
                       ((not (characterp ch)))
                     (princ ch out))))
          (when stream
            (ignore-errors (close stream)))))
    (error (condition)
      (format t "Error ~s: ~a~%" url condition)
      nil)))
On Sat, 03 Feb 2007 13:10:31 -0800, Chris Dean ctdean@sokitomi.com wrote:
Regardless, if I make that change I still see the leak.
I have a data set I can send off-list if anyone is interested.
OK, thanks for the data. I think I've found the leak: it happened when there was a redirect and the server explicitly wanted to close the first connection, the one which sent the 302. Drakma realized that it couldn't re-use the socket stream and created a new one, but it "forgot" to close the old one.
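[Editor's note: the leak pattern Edi describes can be sketched as below. This is a self-contained illustration using string streams in place of sockets; FOLLOW-REDIRECT and its arguments are invented names, not Drakma's actual internals.]

(defun follow-redirect (old-stream must-close-p make-new-stream)
  "Close OLD-STREAM if the server asked us to, then open the new connection."
  (when (and old-stream must-close-p)
    (ignore-errors (close old-stream)))   ; the fix: don't leak the old socket
  (funcall make-new-stream))

;; Usage: after following the 302 the old stream is closed, not dropped.
(defparameter *old* (make-string-input-stream "HTTP/1.1 302 Found"))
(defparameter *new*
  (follow-redirect *old* t
                   (lambda () (make-string-input-stream "HTTP/1.1 200 OK"))))

Before the fix, the old stream was simply abandoned at this point, which is exactly the socket left stuck in CLOSE_WAIT that lsof showed.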
Please try the new release and see if you still have the same problems.
Thanks for the bug report, Edi.
Edi Weitz edi@agharta.de writes:
Please try the new release and see if you still have the same problems.
I will, and I'll let you know the results of my tests.
Cheers, Chris Dean
Please try the new release and see if you still have the same problems.
I've tried it out on my data and it is certainly much better. But there are still some connections left after a run of 1646 urls.
I'll take another look at the code later tonight and see if I can discover anything. Also, I'll send along my run data in a separate email to interested parties.
Cheers, Chris Dean
Chris Dean ctdean@sokitomi.com writes:
I've tried it out on my data and it is certainly much better. But there are still some connections left after a run of 1646 urls.
Some URLs give an error while parsing the header. We do hit the final UNWIND-PROTECT in HTTP-REQUEST, but since the error occurs during the parsing of the header, the caller (me) doesn't have the stream object available to close. One solution is below: set another flag that indicates whether or not to leave the stream open.
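[Editor's note: a minimal sketch of the flag idea Chris describes. All names here (REQUEST-WITH-CLEANUP, *CURRENT-STREAM*) are hypothetical, not Drakma's actual code: a variable visible to the cleanup form records the socket, so the UNWIND-PROTECT can close it even when header parsing signals an error before the stream is handed back to the caller.]

(defvar *current-stream* nil)

(defun request-with-cleanup (make-stream parse-headers)
  (let ((*current-stream* nil))
    (unwind-protect
        (progn
          (setf *current-stream* (funcall make-stream))
          (funcall parse-headers *current-stream*)
          ;; Success: return the stream and transfer ownership to the caller.
          (prog1 *current-stream*
            (setf *current-stream* nil)))
      ;; A parse error leaves *CURRENT-STREAM* set, so we close it here.
      (when *current-stream*
        (ignore-errors (close *current-stream*))))))

;; Usage: a header-parsing error no longer leaks the stream.
(defparameter *leaked* nil)
(ignore-errors
  (request-with-cleanup
   (lambda () (setf *leaked* (make-string-input-stream "garbage")))
   (lambda (stream) (declare (ignore stream)) (error "corrupt header"))))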
I'll continue testing in case I come across any other issues.
Cheers, Chris Dean
On Sun, 04 Feb 2007 18:20:05 -0800, Chris Dean ctdean@sokitomi.com wrote:
I've tried it out on my data and it is certainly much better. But there are still some connections left after a run of 1646 urls.
I've now done a full run through the test URLs you sent, and they provide a lot of interesting problematic cases. I'll update Drakma and probably Chunga with bugfixes and/or workarounds in the next days.
Thanks, Edi.
On Tue, 06 Feb 2007 01:46:02 +0100, Edi Weitz edi@agharta.de wrote:
I've now done a full run through the test URLs you sent and they provide for a lot of interesting problematic cases. I'll update Drakma and probably Chunga with bugfixes and/or workarounds in the next days.
OK, see the new releases of Drakma and Chunga. I can now run Drakma (tested on LWW 5.0.1) through Chris Dean's 1600+ test cases with only a few warnings and errors. These are:
1. Charsets like GB2312 that FLEXI-STREAMS doesn't know.
2. Headers sent by the server which are really corrupt.
3. Network-related errors like "Unknown host".
4. Five cases of "End of file while reading ...".
I'm only concerned about #4, but unfortunately these aren't reproducible. I'll see what I can find out, but if someone has an idea, please step forward.
FWIW, this is the function I used for testing:
(defun simple-get (url)
  (handler-case
      (let ((puri:*strict-parse* nil)
            (flex:*provide-use-value-restart* t)
            (flex:*substitution-char* #\?))
        (multiple-value-bind (stream code)
            (drakma:http-request url
                                 :cookie-jar (make-instance 'drakma:cookie-jar)
                                 :want-stream t)
          (unwind-protect
              (and stream
                   (eql code 200)
                   (with-output-to-string (out)
                     (do ((ch (read-char stream nil :eof)
                              (read-char stream nil :eof)))
                         ((not (characterp ch)))
                       (princ ch out))))
            (when stream
              (ignore-errors (close stream :abort t))))))
    (error (condition)
      (format t "~&Error (~A): ~A~%" url condition)
      nil)))
Edi.
That's great! I'll pull the new versions and run them through my code.
Cheers, Chris Dean