"Edi Weitz" edi@agharta.de wrote:
I'm currently having some trouble with a long-running TBNL/mod_lisp application on a public website which gets about 500,000 requests per month. It's an application delivered with LispWorks professional 4.4.6 and it runs on Linux behind Apache 2.0.54.
From time to time I see errors in the server log file and sometimes (very rarely) users have also reported errors. It turns out they're all of the "error reading from Lisp" kind. I hacked mod_lisp2.c a bit to make this more specific and I now get one of "error reading from Lisp (fill status)" or "error reading header name." This means that the error happens either when the reply entity is copied from Lisp to the client or when mod_lisp tries to read a header name from Lisp.
I've attached a couple of these error messages to the end of this email. What's suspicious to me is that they almost always come in chunks - half a dozen of them or more in a couple of seconds, then several hours without errors.
I'm calling MARK-AND-SWEEP from time to time but at intervals that don't match the time of the error messages. (My first thought was that a full GC blocks LW long enough for mod_lisp to time out but that doesn't seem to be the case.) Other than that I only use locks in admin parts of the website which are clearly not used in the middle of the night when some of these errors happened.
I'm lost. Does anyone have an idea where I should look to find the cause for these problems?
Have you tried logging the time taken to process each mod_lisp command in Lisp (internally in the Lisp process) to see if there is a correlation with the Apache errors?
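
A minimal sketch of such timing, using only standard Common Lisp. The names CALL-WITH-TIMING, ELAPSED-SECONDS, *SLOW-REQUEST-THRESHOLD* and *TIMING-LOG* are hypothetical and not part of TBNL or mod_lisp, and the 2-second threshold is an assumed value; how the wrapper gets hooked into the actual request loop depends on the application.

```lisp
;; Hypothetical helpers for illustration only; none of these names
;; come from TBNL, mod_lisp, or LispWorks.

(defvar *slow-request-threshold* 2.0
  "Seconds. Requests that take at least this long are logged (assumed value).")

(defvar *timing-log* *error-output*
  "Stream that receives the timing lines.")

(defun elapsed-seconds (start end)
  "Convert two GET-INTERNAL-REAL-TIME readings into elapsed seconds."
  (/ (- end start) (float internal-time-units-per-second)))

(defun call-with-timing (label thunk)
  "Call THUNK and return its values. If the call took at least
*SLOW-REQUEST-THRESHOLD* seconds of wall-clock time, write a line with
the universal time, LABEL, and the duration to *TIMING-LOG*. The check
also runs when THUNK exits non-locally, e.g. by signalling an error."
  (let ((start (get-internal-real-time)))
    (unwind-protect
         (funcall thunk)
      (let ((seconds (elapsed-seconds start (get-internal-real-time))))
        (when (>= seconds *slow-request-threshold*)
          (format *timing-log* "~&[~D] slow request ~A: ~,3F s~%"
                  (get-universal-time) label seconds))))))
```

Wrapping the per-request entry point, for example (call-with-timing uri (lambda () (handle-request))) where HANDLE-REQUEST stands in for whatever function processes one mod_lisp command, gives timestamps that can be compared against the times in the Apache error log.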
Marc