On Sun, 22 Jan 2006 10:20:04 -0700, "Robert J. Macomber" <tbnl@rojoma.com> wrote:
In my apache2 config, I've got this [simplified]:
LispServer 127.0.0.1 3000 lisp
<VirtualHost *:80 *:54321>
    ServerName lisp.rojoma.com
    DocumentRoot /home/robertm/mod_lisp
    <LocationMatch /.*.l>
        SetHandler lisp-handler
    </LocationMatch>
</VirtualHost>
If you visit the URL http://lisp.rojoma.com/%ef.l, Apache sends bytes for the script-filename parameter that can be interpreted as the Latin-1 string "/home/robertm/mod_lisp/ï.l". If they are instead interpreted as UTF-8, as SBCL's READ-LINE tries to do when running in a UTF-8 locale (kmrcl doesn't specify an explicit :external-format in ACCEPT-TCP-CONNECTION), an error is signaled in GET-REQUEST-DATA, because the octet 0xEF followed by "." is not valid UTF-8.
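To make the failure concrete, here is a small Python sketch (Python only for illustration; the code under discussion is Common Lisp) that decodes the same octets the way a stream with a fixed external format would. The path comes from the example above; the helper name try_decode is hypothetical.

```python
# Octets Apache passes along for script-filename when the URL contains %ef:
# the escape is decoded to the single byte 0xEF.
RAW = b"/home/robertm/mod_lisp/\xef.l"


def try_decode(octets: bytes, encoding: str):
    """Decode octets like a stream with a fixed external format would.

    Returns (text, None) on success, or (None, reason) if decoding fails.
    """
    try:
        return octets.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err.reason


# Latin-1 maps every octet 0x00-0xFF to a character, so this cannot fail.
latin1_text, latin1_err = try_decode(RAW, "latin-1")

# 0xEF starts a three-byte UTF-8 sequence, but the next octet is ".",
# so strict UTF-8 decoding rejects the input.
utf8_text, utf8_err = try_decode(RAW, "utf-8")
```

The second call is the analogue of READ-LINE signaling an error in a UTF-8 locale.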
In fact, a user can make Apache send arbitrary bytes to Lisp in other ways while it's reading the request headers (e.g., by sending a "Mumble: {random octets}" header), but script-filename is the only one where Apache will do so more or less on its own.
I changed kmrcl here rather than making TBNL send a "bad request" reply, because you never know what random junk proxies and such will throw into users' HTTP request headers, and I didn't want to deny service just because some proxy adds the (accented) name of its ISP to user-agent strings or something. I haven't actually seen that happen yet, but the net is vast.
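A sketch of the idea in Python, not the actual kmrcl change: read the header lines as Latin-1, which maps every octet to a character, so random octets from proxies can never make decoding fail. It assumes mod_lisp's framing of alternating name and value lines terminated by an "end" line; read_request_headers and its encoding parameter are hypothetical names.

```python
import io


def read_request_headers(stream, encoding="latin-1"):
    """Read alternating name/value lines until an "end" line.

    Assumes mod_lisp-style framing (hypothetical helper for illustration).
    With the default Latin-1 encoding, decoding never raises, whatever
    octets the client or an intermediate proxy put into a header.
    """
    headers = {}
    while True:
        name = stream.readline()
        if not name:
            raise EOFError("connection closed before 'end' line")
        name = name.rstrip(b"\n").decode(encoding)
        if name == "end":
            return headers
        value = stream.readline()
        if not value:
            raise EOFError("connection closed in the middle of a header")
        # Later duplicates overwrite earlier ones; good enough for a sketch.
        headers[name] = value.rstrip(b"\n").decode(encoding)


# Usage: a request containing the 0xEF path and a junk user-agent value.
wire = io.BytesIO(
    b"script-filename\n/home/robertm/mod_lisp/\xef.l\n"
    b"user-agent\nMumble \xff\xfe\n"
    b"end\n"
)
headers = read_request_headers(wire)
```

Latin-1 round-trips losslessly, so an application that knows a particular header is UTF-8 can still recover it afterwards with value.encode("latin-1").decode("utf-8"), and only that header needs error handling.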
OK, thanks for the info. I've been thinking about extracting the relevant bits from KMRCL and integrating them into TBNL directly anyway. We only use a small fraction of what's in there and it might make sense to make some of this user-configurable, not only because of the external format but also because of the interface(s) TBNL will listen on, for example. I'll do that one day... :)
Cheers, Edi.