About 6 months ago I got some strange encoding errors with a Hunchentoot web server. There are a few places in Hunchentoot where the +latin-1+ character encoding is used as the external format regardless of the headers received from the client:
- GET-POST-DATA always returns a +latin-1+ externally encoded stream when the WANT-STREAM parameter is true.
- PARSE-MULTIPART-FORM-DATA creates a +latin-1+ stream from the CONTENT-STREAM of the request (relevant RFC: RFC 2388).
- MAYBE-READ-POST-PARAMETERS uses +latin-1+ to process POST bodies with the "application/x-www-form-urlencoded" content type.
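To make the symptom concrete, here is a small illustration. It is in Python rather than Lisp only because it is compact; nothing in it is Hunchentoot code, and the sample string is made up. It decodes the same UTF-8 bytes once with a fixed latin-1 format and once with the charset the client actually used.

```python
# Illustration only (not Hunchentoot): what a fixed latin-1 external format
# does to a body that the client actually encoded as UTF-8.
body = "name=J\u00fcrgen".encode("utf-8")  # "ü" becomes the two octets C3 BC

as_latin1 = body.decode("latin-1")  # fixed format, header ignored: each octet becomes one char
as_utf8 = body.decode("utf-8")      # the charset the client really used

print(as_latin1)  # name=JÃ¼rgen  (mojibake)
print(as_utf8)    # name=Jürgen
```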
In addition, RECOMPUTE-REQUEST-PARAMETERS seems to interpret both the message body and the query string according to the charset given in the request header. I thought that Content-Type was only supposed to affect the message body, not the headers (which are assumed to be ASCII). If so, shouldn't the URL and query string always be read as ASCII? RFC 2047 discusses non-ASCII headers for MIME, but I don't know whether it is relevant here except for parsing multipart forms.
I'm not thoroughly versed in the HTTP protocol, but these look like bugs in Hunchentoot to me. I have a half-completed patch, but I want to get some more opinions before I go any further. There may also be other lurking encoding issues in Hunchentoot, or I may be entirely mistaken.
Proposed solution:
- GET-POST-DATA, PARSE-MULTIPART-FORM-DATA, and MAYBE-READ-POST-PARAMETERS should respect the Content-Type header of the request and use its charset to choose the external format of the stream used for parsing.
- RECOMPUTE-REQUEST-PARAMETERS should use the Content-Type external format only to parse the POST parameters.
- PARSE-MULTIPART-FORM-DATA may need additional review to bring it in line with RFC 2047 and RFC 2388.
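A rough sketch of the first two points, again in Python rather than Hunchentoot's Lisp. The function names (charset_from_content_type, parse_urlencoded_body) are hypothetical and do not correspond to anything in Hunchentoot. The idea: read the charset parameter from Content-Type and use it only to decode the percent-escaped octets of an "application/x-www-form-urlencoded" body. Falling back to latin-1 when no charset is given is an assumption that mirrors the current behaviour; multipart bodies and the query string are not covered.

```python
from email.message import Message
from urllib.parse import parse_qsl


def charset_from_content_type(content_type, default="latin-1"):
    """Return the (lowercased) charset parameter of a Content-Type value.

    Hypothetical helper; the latin-1 default is an assumption that mirrors
    the current fixed external format.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def parse_urlencoded_body(body, content_type):
    """Decode an application/x-www-form-urlencoded body (hypothetical helper).

    The raw body is ASCII (percent-escapes); the header's charset only
    says how to turn the escaped octets back into characters.
    """
    charset = charset_from_content_type(content_type)
    return parse_qsl(body.decode("ascii"), keep_blank_values=True, encoding=charset)


body = b"name=J%C3%BCrgen"
print(parse_urlencoded_body(body, "application/x-www-form-urlencoded; charset=UTF-8"))
# [('name', 'Jürgen')]
print(parse_urlencoded_body(body, "application/x-www-form-urlencoded"))
# [('name', 'JÃ¼rgen')]  (no charset given, so the latin-1 fallback applies)
```

Keeping the charset lookup separate from the body parser makes it easy to reuse the same lookup for GET-POST-DATA and the multipart case.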
Feedback, please.
Thanks,
Red