Hi all,
I've been using tbnl behind mod_lisp2 for quite a while.
After Edi merged tbnl with hunchentoot I wanted to get rid of Apache and run hunchentoot in standalone mode for easier deployment (not to mention the coolness factor of running everything in Lisp).
However, I've found that the file upload performance of hunchentoot is 4-10 times slower than tbnl (standalone or behind mod_lisp). I tested this by uploading a 30 MB file on the test upload page.
I'm guessing that Chunga might be the reason, since this is the only new component introduced.
[Chunga is currently not optimized towards performance - it is rather intended to be easy to use and (if possible) to behave correctly.]
On a side note, while the file is being uploaded the CPU goes up to 90-100% (this happens with both tbnl and hunchentoot).
This is not the case when I upload the same file to a PHP script with Apache.
Again, my guess is that rfc2388 repeatedly calls READ-CHAR instead of grabbing a buffer with READ-SEQUENCE and decoding it in chunks.
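Something like the difference between these two sketches (hypothetical code, not what rfc2388 actually does):

```lisp
;; Hypothetical sketch -- not rfc2388's actual code.
;; Character-at-a-time: one (typically expensive) stream method
;; dispatch per character:
(defun slurp-by-char (stream)
  (with-output-to-string (out)
    (loop for char = (read-char stream nil nil)
          while char
          do (write-char char out))))

;; Buffered: one READ-SEQUENCE call fills a whole chunk at once,
;; and each chunk is then processed as a block:
(defun slurp-by-chunks (stream &key (buffer-size 8192))
  (let ((buffer (make-string buffer-size)))
    (with-output-to-string (out)
      (loop for end = (read-sequence buffer stream)
            while (plusp end)
            do (write-string buffer out :end end)))))
```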
These two issues make migrating to hunchentoot particularly painful because if one of the users uploads a huge file the whole site will become very unresponsive (in tbnl the CPU spike goes away a lot faster, but improving the I/O routine is still a big win).
I don't want to blindly guess, but I don't know how to use LispWorks' profiler to profile a multi-threaded server app.
The LispWorks profiler requires you to run (PROFILE <form>), and it returns the profiling data after running the <form>.
Is there a way to profile hunchentoot without writing an individual test case that simulates uploading a 30 MB file?
What about people using SBCL? How do you go about profiling apps like hunchentoot?
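The only lead I have for SBCL is the bundled statistical profiler, which (if I read the docs right) can sample the running image without wrapping a form - roughly:

```lisp
;; A sketch for SBCL's statistical profiler (sb-sprof) -- untested,
;; check the SBCL manual for the exact options:
(require :sb-sprof)

(sb-sprof:start-profiling :mode :cpu)   ; start sampling
;; ... perform the 30 MB upload against the running server ...
(sb-sprof:stop-profiling)
(sb-sprof:report :type :flat)           ; flat per-function report
```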
Thanks.
On Tue, 14 Nov 2006 18:10:52 -0800, "Mac Chan" emailmac@gmail.com wrote:
> I've been using tbnl behind mod_lisp2 for quite a while.
> After Edi merged tbnl with hunchentoot I wanted to get rid of Apache and run hunchentoot in standalone mode for easier deployment (not to mention the coolness factor of running everything in Lisp).
> However, I've found that the file upload performance of hunchentoot is 4-10 times slower than tbnl (standalone or behind mod_lisp). I tested this by uploading a 30 MB file on the test upload page.
Yes, I've seen the same.
> I'm guessing that Chunga might be the reason, since this is the only new component introduced.
That would be my guess as well - Chunga and/or FLEXI-STREAMS, not least the fact that both make heavy use of Gray streams.
> Again, my guess is that rfc2388 repeatedly calls READ-CHAR instead of grabbing a buffer with READ-SEQUENCE and decoding it in chunks.
That is certainly part of the problem.
The only way out is probably to write our own version of the RFC2388 library - which is one of my long-term plans.
> These two issues make migrating to hunchentoot particularly painful because if one of the users uploads a huge file the whole site will become very unresponsive (in tbnl the CPU spike goes away a lot faster, but improving the I/O routine is still a big win).
> I don't want to blindly guess, but I don't know how to use LispWorks' profiler to profile a multi-threaded server app.
> The LispWorks profiler requires you to run (PROFILE <form>), and it returns the profiling data after running the <form>.
> Is there a way to profile hunchentoot without writing an individual test case that simulates uploading a 30 MB file?
I'd like to know the answer too... :)
I think you have better chances of getting an answer to this question on the LW mailing list.
Cheers, Edi.
On Tue, 14 Nov 2006 18:10:52 -0800, "Mac Chan" emailmac@gmail.com wrote:
> I'm guessing that Chunga might be the reason, since this is the only new component introduced.
Actually, the more I think about it the more I'm sure that FLEXI-STREAMS is the culprit. I also have an idea how to make it faster, but I'm not sure if I'll find the time to do it in the next days.
> Again, my guess is that rfc2388 repeatedly calls READ-CHAR instead of grabbing a buffer with READ-SEQUENCE and decoding it in chunks.
Because of the way the streams are layered now, you probably wouldn't win much (if anything at all) if you used READ-SEQUENCE instead.
More later, Edi.
On 11/15/06, Edi Weitz edi@agharta.de wrote:
> Actually, the more I think about it the more I'm sure that FLEXI-STREAMS is the culprit. I also have an idea how to make it faster, but I'm not sure if I'll find the time to do it in the next days.
No hurry, Edi. As long as we know there's a solution I'm at ease :-)
>> Again, my guess is that rfc2388 repeatedly calls READ-CHAR instead of grabbing a buffer with READ-SEQUENCE and decoding it in chunks.
> Because of the way the streams are layered now, you probably wouldn't win much (if anything at all) if you used READ-SEQUENCE instead.
So I just did some profiling earlier using the tip from
http://thread.gmane.org/gmane.lisp.lispworks.general/5563/focus=5573
and found that most of the time is spent in tons of calls to READ-CHAR.
I then tried something like reading the whole buffer into a string and using the (PARSE-MIME string) method instead of (PARSE-MIME stream) (this is not a fix, because if you upload a 100 MB file it will allocate a 100 MB buffer).
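Roughly what I tried (a sketch from memory - the actual rfc2388 entry points and how the content length is obtained may differ):

```lisp
;; A sketch of the experiment (from memory -- the actual rfc2388
;; entry points and how the content length is obtained may differ):
(defun parse-mime-via-string (stream content-length boundary)
  ;; Slurp the whole request body into one big string ...
  (let ((content (make-string content-length)))
    (read-sequence content stream)
    ;; ... then use the string method instead of the stream method.
    ;; Allocates CONTENT-LENGTH characters up front, so a 100 MB
    ;; upload means a 100 MB buffer.
    (rfc2388:parse-mime content boundary)))
```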
The result is worse... I'll investigate later. I also plan to do a similar test with AllegroServe to see if it uses 100% CPU performing the same task.
On Wed, 15 Nov 2006 19:04:10 -0800, "Mac Chan" emailmac@gmail.com wrote:
> I then tried something like reading the whole buffer into a string and using the (PARSE-MIME string) method instead of (PARSE-MIME stream) (this is not a fix, because if you upload a 100 MB file it will allocate a 100 MB buffer).
You could divide this into smaller buffers, but then it gets ugly. See below.
> The result is worse...
Yes, as I conjectured in my previous email. The reason is that FLEXI-STREAMS currently has to read in octet-size steps anyway.
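That is, a portable READ-SEQUENCE over a Gray stream boils down to something like this per-element loop (a sketch, not FLEXI-STREAMS' actual code):

```lisp
;; Sketch of a generic Gray-streams fallback for READ-SEQUENCE --
;; one method dispatch and one character decode per element, so a
;; buffer at a higher layer doesn't save any work:
(defmethod stream-read-sequence ((stream flexi-input-stream) sequence start end)
  (loop for index from start below end
        for char = (stream-read-char stream)
        until (eq char :eof)
        do (setf (elt sequence index) char)
        finally (return index)))
```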
> I'll investigate later. I also plan to do a similar test with AllegroServe to see if it uses 100% CPU performing the same task.
I'm pretty sure it doesn't, mainly for two reasons:
1. For chunking and external format switching it uses AllegroCL's built-in "simple streams" - you're unlikely to beat that with portable Gray stream solutions like Chunga and FLEXI-STREAMS.
2. Lots of AllegroServe's source looks (to me) like C code with parentheses around it - pre-allocated buffers and all that stuff. That's probably good for performance, but it's not the road I want to go down.
Cheers, Edi.
On Thu, 16 Nov 2006 01:30:05 +0100, Edi Weitz edi@agharta.de wrote:
> Actually, the more I think about it the more I'm sure that FLEXI-STREAMS is the culprit. I also have an idea how to make it faster, but I'm not sure if I'll find the time to do it in the next days.
You might want to try out the new release (0.9.0) of FLEXI-STREAMS to see if this makes things more acceptable for you.
SBCL (and probably CMUCL) users please note this, though:
http://thread.gmane.org/gmane.lisp.steel-bank.general/1400/
Cheers, Edi.