Jingtao,
please refer to http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7. It clearly states that a media type consists of exactly one type/subtype indicator followed by optional attribute=value pairs. The content type that you have presented is not valid according to these rules. Neither a lax parser like the one in CL-HTTP nor the fact that a large site sends these bogus headers makes them valid. I do not want to include code in Hunchentoot that tries to interpret such bogus data.
However, if you cannot get your trading partner to fix their client, I can offer this solution:
(defclass request-with-bad-content-type (hunchentoot:request)
  ())

(defmethod hunchentoot:header-in :around ((name (eql :content-type))
                                          (request request-with-bad-content-type))
  ;; Drop a second "type/subtype" that follows the first one.
  ;; Note the doubled backslash: "\1" in a Lisp string would read as just "1".
  (alexandria:when-let (content-type (call-next-method))
    (ppcre:regex-replace-all "^([^/]+/[^/]+); *[^/]+/[^/;]+" content-type "\\1")))
You'll then have to use the :request-class argument to your acceptor instantiation to make it use the request-with-bad-content-type class. You also want to review the regular expression carefully and maybe profile your application to see whether you need to cache or otherwise improve performance.
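If reviewing the regular expression turns out to be a concern, the same cleanup can be sketched in plain Common Lisp without CL-PPCRE. This is only a rough illustration: split-on-semicolons and drop-extra-media-types are made-up names, not part of Hunchentoot, Chunga or Drakma, and the splitting assumes that no parameter value is a quoted string containing a semicolon.

```lisp
;;; Hypothetical helpers for illustration only; not part of any library.

(defun split-on-semicolons (string)
  "Split STRING at every semicolon and trim blanks from each part.
Assumption: no quoted-string parameter value contains a semicolon."
  (loop with start = 0
        for pos = (position #\; string :start start)
        collect (string-trim '(#\Space #\Tab) (subseq string start pos))
        while pos
        do (setf start (1+ pos))))

(defun drop-extra-media-types (content-type)
  "Keep the leading type/subtype and every attribute=value parameter of
CONTENT-TYPE; drop any later part that contains no equals sign."
  (destructuring-bind (media-type &rest parameters)
      (split-on-semicolons content-type)
    (format nil "~A~{; ~A~}"
            media-type
            (remove-if-not (lambda (part) (find #\= part)) parameters))))
```

Inside the :around method above, (drop-extra-media-types content-type) could then take the place of the regex-replace-all call. Whether it is actually faster would still have to be measured.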
-Hans
On Sun, May 26, 2013 at 5:07 AM, Jingtao Xu jingtaozf@gmail.com wrote:
Hi Hans,
I don't agree that this content type header is just bogus. Since it is sent by the largest B2B/B2C site in China, there must be a reason for it.
If you try CL-HTTP, you will find that it parses such a content type correctly.
(parse-mime-content-type-header "application/x-www-form-urlencoded; text/html; charset=UTF-8") ==> (:APPLICATION :X-WWW-FORM-URLENCODED :CHARSET :UTF-8)
You can find the definition in cl-http/server/headers.lisp
(define-header-type :content-type-header (:header)
  :parse-function parse-mime-content-type-header
  :print-function print-mime-content-type-header)
Even if this content-type header is bogus (actually I don't think so), hunchentoot/drakma should parse the header without raising an error when a special variable like *accept-bogus-content-type* is true.
With Best Regards, jingtao.
On Sat, May 25, 2013 at 8:11 PM, Hans Hübner hans.huebner@gmail.com wrote:
Jingtao,
the content-type header "application/x-www-form-urlencoded; text/html; charset=UTF-8" is just bogus. I do not want to include code that makes Hunchentoot work with clearly broken clients. Better error reporting would be acceptable, though.
-Hans
On Sat, May 25, 2013 at 12:38 PM, Jingtao Xu jingtaozf@gmail.com wrote:
Hi all,
I found the content type header that triggers the bug in the message.log generated by Hunchentoot. It happens when Hunchentoot gets the following content type header:
application/x-www-form-urlencoded; text/html; charset=UTF-8
I noticed that in Drakma's file read.lisp, the function `get-content-type' also assumes "/" is a token separator.
I hope that chunga/drakma/hunchentoot could accept such a content type header without raising an exception. As Edi said, a new special variable similar to *accept-bogus-eols* or *treat-semicolon-as-continuation*, under which only " ,;" are treated as token separators, may be a good idea and would solve my problem.
Anyway, the RFC standard does not fit the real world well.
Thanks very much.
With Best Regards, jingtao.
On Thu, May 23, 2013 at 2:01 PM, Edi Weitz edi@agharta.de wrote:
I'm not the maintainer anymore, but my take is that if some Ruby or Java client misinterprets the RFC I wouldn't change Hunchentoot's (or rather Chunga's) default behavior because of that. I'd rather introduce a new special variable similar to *accept-bogus-eols* or *treat-semicolon-as-continuation*.
Just my .02 Euros, Edi.
On Thu, May 23, 2013 at 2:52 AM, Jingtao Xu jingtaozf@gmail.com wrote:
Hi All,
1. The function `read-name-value-pair' is called by `parse-content-type' in hunchentoot/util.lisp, not by my code.
2. The slash is a token constituent in the Java/Ruby implementations, and I think some web clients/servers treat it as a token constituent too, but I am waiting for the Hunchentoot log to give us a live example.
With Best Regards, jingtao
On Wed, May 22, 2013 at 11:40 PM, Edi Weitz edi@agharta.de wrote:
If I'm not mistaken, the slash is a "separator" and thus not a token constituent according to RFC 2616, which means "path=/foo" is not legal input for READ-NAME-VALUE-PAIR.
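The token rule from RFC 2616, section 2.2, can be sketched in a few lines of plain Common Lisp. This is not Chunga's code; the names *rfc2616-separators*, token-char-p and token-p are invented for this illustration, and the check assumes that CHAR-CODE returns ASCII codes for ASCII characters, which holds on the common implementations.

```lisp
;;; Illustration only; these names are not part of Chunga or Hunchentoot.

(defparameter *rfc2616-separators*
  (list #\( #\) #\< #\> #\@ #\, #\; #\: #\\ #\" #\/
        #\[ #\] #\? #\= #\{ #\} #\Space #\Tab)
  "The separators listed in RFC 2616, section 2.2.")

(defun token-char-p (char)
  "True if CHAR may appear in an RFC 2616 token: a US-ASCII character
that is neither a control character nor a separator."
  (let ((code (char-code char)))          ; assumes ASCII-compatible codes
    (and (< 31 code 127)
         (not (member char *rfc2616-separators*)))))

(defun token-p (string)
  "True if STRING is a non-empty sequence of token characters."
  (and (plusp (length string))
       (every #'token-char-p string)))
```

With these definitions (token-p "path") is true and (token-p "/foo") is false. The RFC allows a parameter value to be either a token or a quoted-string, so a value containing a slash would have to be sent in double quotes.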
On Wed, May 22, 2013 at 5:27 PM, Ron Garret ron@flownet.com wrote:
> Very likely Jingtao's code is calling READ-NAME-VALUE-PAIR without
> being wrapped in this macro
>
> But there's still a bug in READ-NAME-VALUE-PAIR:
>
> ? (WITH-INPUT-FROM-VECTOR (S (MAP '(VECTOR (UNSIGNED-BYTE 8))
>                                  'CHAR-CODE "path=/foo"))
>     (chunga:with-character-stream-semantics
>       (CHUNGA:READ-NAME-VALUE-PAIR S)))
> ("path" . "")
>
> On May 22, 2013, at 8:19 AM, Edi Weitz wrote:
>
>> On Wed, May 22, 2013 at 4:18 PM, Ron Garret ron@flownet.com wrote:
>>> I found a bug in CHUNGA:READ-NAME-VALUE-PAIR.
>>
>> It's not quite clear to me yet what the bug is supposed to be.
>>
>> The documentation clearly says that calls to READ-NAME-VALUE-PAIR and
>> friends must be wrapped with this macro:
>>
>> http://weitz.de/chunga/#with-character-stream-semantics
>>
>> (You might argue that this isn't very user-friendly, but Chunga wasn't
>> really intended to be used that way.)
>