About 6 months ago I got some strange encoding errors with a Hunchentoot web server. There are a few places in Hunchentoot where the +latin-1+ character encoding is used as the external format regardless of headers received from the client:
- GET-POST-DATA returns a +latin-1+ externally encoded stream no matter what when the WANT-STREAM parameter is true.
- PARSE-MULTIPART-FORM-DATA creates a +latin-1+ stream from the CONTENT-STREAM of the request. (relevant RFC: 2388)
- MAYBE-READ-POST-PARAMETERS uses +latin-1+ to process "application/x-www-form-urlencoded" content-type POST bodies
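To make the symptom concrete, here is a minimal sketch in Python (not Hunchentoot code) of what happens when bytes that a client encoded as UTF-8 are read through a fixed Latin-1 decoder. The sample value "café" is an assumption chosen for illustration.

```python
# Bytes a client might send for the form value "café", encoded as UTF-8.
body = "café".encode("utf-8")

# What a stream with a fixed Latin-1 external format would yield.
as_latin1 = body.decode("latin-1")

# What the client intended.
as_utf8 = body.decode("utf-8")

# Latin-1 maps every byte to a character, so nothing is lost:
# re-encoding as Latin-1 recovers the original bytes.
recovered = as_latin1.encode("latin-1").decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
print(repr(recovered))
```

Because Latin-1 decoding never fails, this kind of mismatch shows up as garbled characters rather than as a decoding error.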
In addition, RECOMPUTE-REQUEST-PARAMETERS seems to interpret both the message body and the query string according to a charset in the request header. I thought that Content-Type was only supposed to affect the message body, not the headers (which are assumed to be in ASCII). Then shouldn't the URL and query string always be read as ASCII? RFC 2047 discusses non-ASCII headers for MIME, but I don't know if that is relevant except for parsing multipart forms.
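One distinction that may help with the query-string question: a percent-encoded query string is pure ASCII on the wire, but the octets it encodes still have to be turned into characters using some charset. The sketch below uses Python's `urllib.parse.unquote` purely as an illustration; the sample value is hypothetical, and the sketch does not settle which charset a server should assume.

```python
from urllib.parse import unquote

# Hypothetical query-string value as it appears in the request line.
raw_query_value = "caf%C3%A9"

# The transmitted text itself contains only ASCII characters.
on_the_wire_is_ascii = raw_query_value.isascii()

# Percent-decoding yields the octets C3 A9, which must then be
# interpreted in some charset to produce characters.
decoded_utf8 = unquote(raw_query_value, encoding="utf-8")
decoded_latin1 = unquote(raw_query_value, encoding="latin-1")

print(on_the_wire_is_ascii, repr(decoded_utf8), repr(decoded_latin1))
```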
I'm not thoroughly versed in the HTTP protocol, but it seems that these are bugs in Hunchentoot. I have a half-completed patch but I want to get some more opinions before I go any further. There may also be other lurking encoding issues in Hunchentoot, or I may be entirely mistaken.
Proposed solution:
- GET-POST-DATA, PARSE-MULTIPART-FORM-DATA, and MAYBE-READ-POST-PARAMETERS should respect the Content-Type header in the request and use that to define the external-format of the stream used to parse
- RECOMPUTE-REQUEST-PARAMETERS should only use the Content-Type external format to parse the post parameters
- PARSE-MULTIPART-FORM-DATA may need additional review to be in accordance with RFC2047 and RFC2388
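The general shape of the first proposal can be sketched as follows. This is an illustration in Python, not a patch against Hunchentoot: the function names `charset_from_content_type` and `parse_form_body` are hypothetical, and the Latin-1 fallback is an assumption that mirrors the current behaviour described above. It covers only "application/x-www-form-urlencoded" bodies, not multipart forms.

```python
from email.message import Message
from urllib.parse import parse_qs

# Assumption: fall back to Latin-1 when the client names no charset.
DEFAULT_CHARSET = "latin-1"


def charset_from_content_type(header_value, default=DEFAULT_CHARSET):
    """Return the charset parameter of a Content-Type header, or the default."""
    if not header_value:
        return default
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(failobj=default)


def parse_form_body(body, content_type):
    """Decode a urlencoded POST body using the charset the client declared."""
    charset = charset_from_content_type(content_type)
    # The body itself is percent-encoded ASCII; the charset applies to
    # the octets produced by percent-decoding.
    return parse_qs(body.decode("ascii"), encoding=charset)


declared = parse_form_body(
    b"name=caf%C3%A9", "application/x-www-form-urlencoded; charset=UTF-8"
)
undeclared = parse_form_body(b"name=caf%E9", "application/x-www-form-urlencoded")
print(declared, undeclared)
```

Whether clients reliably send a charset parameter for form posts is a separate question that this sketch does not address.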
Feedback, please.
Thanks, Red
Red,
I'd suggest that you make yourself thoroughly familiar with the relevant RFCs and supply a patch once you are sure that Hunchentoot is buggy. I know that there are some places in Hunchentoot that assume Latin 1 encoding, but I also faintly remember that I have checked RFC conformance in some of these cases years ago. Additionally, before changing Hunchentoot, it'd be very nice to have a case that exposes non-conformant behavior. I'm not saying that Hunchentoot is bug free, but clients are generally buggy as well and we don't want to cater for buggy clients in general.
-Hans
On Sat, Jun 5, 2010 at 20:57, Red Daly reddaly@gmail.com wrote:
[...]