Re: A bug in function parse-content-type.

8 Apr 2015

Jingtao,

Please refer to http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7,
which clearly states that a media type consists of exactly one type/subtype
pair followed by optional attribute=value parameters.  The content type
that you have presented is not valid according to these rules.  Neither a
lax parser like the one in CL-HTTP nor the fact that a large site sends
these bogus headers makes them valid.  I do not want to include code in
Hunchentoot that tries to interpret such bogus data.
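To make the rule concrete, here is a minimal sketch in portable Common Lisp (no Hunchentoot or Chunga involved) of a strict check that follows the section 3.7 grammar.  PARSE-MEDIA-TYPE and its helpers are hypothetical names, and quoted-string parameter values are deliberately not supported, so treat this as an illustration rather than a complete parser.

```lisp
;;; Minimal sketch of the RFC 2616 section 3.7 grammar:
;;;   media-type = type "/" subtype *( ";" parameter )
;;;   parameter  = attribute "=" value
;;; Simplifications (assumptions): quoted-string values are not supported,
;;; and only spaces and tabs around ";" and "=" are skipped.

(defun token-char-p (char)
  "True if CHAR may appear in an RFC 2616 token: printable US-ASCII
excluding the separators listed in section 2.2."
  (and (< 31 (char-code char) 127)
       (not (find char "()<>@,;:\\\"/[]?={} "))))

(defun token-p (string)
  "True if STRING is a non-empty RFC 2616 token."
  (and (plusp (length string))
       (every #'token-char-p string)))

(defun trim-lws (string)
  (string-trim '(#\Space #\Tab) string))

(defun split-at (string char)
  "Split STRING at every occurrence of CHAR."
  (loop for start = 0 then (1+ end)
        for end = (position char string :start start)
        collect (subseq string start end)
        while end))

(defun parse-media-type (string)
  "Return (TYPE SUBTYPE (ATTRIBUTE . VALUE) ...) if STRING is a valid
media type, or NIL if it is not."
  (destructuring-bind (head &rest params)
      (mapcar #'trim-lws (split-at string #\;))
    (let ((slash (position #\/ head)))
      (when slash
        (let ((type-name (subseq head 0 slash))
              (subtype-name (subseq head (1+ slash))))
          (when (and (token-p type-name) (token-p subtype-name))
            (loop for param in params
                  for pos = (position #\= param)
                  for attribute = (and pos (trim-lws (subseq param 0 pos)))
                  for value = (and pos (trim-lws (subseq param (1+ pos))))
                  ;; Any parameter that is not attribute=value, such as a
                  ;; second "text/html", makes the whole header invalid.
                  unless (and pos (token-p attribute) (token-p value))
                    do (return nil)
                  collect (cons attribute value) into pairs
                  finally (return (list* type-name subtype-name pairs)))))))))
```

With this check, "text/html; charset=UTF-8" yields ("text" "html" ("charset" . "UTF-8")), while the header from this thread yields NIL because "text/html" is not an attribute=value parameter.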

However, if you cannot get your trading partner to fix their client, I can
offer this solution:

(defclass request-with-bad-content-type (hunchentoot:request)
  ())

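(defmethod hunchentoot:header-in :around ((name (eql :content-type))
                                          (request request-with-bad-content-type))
  ;; Remove a stray second type/subtype that directly follows the first
  ;; one, e.g. "a/b; text/html; charset=UTF-8" => "a/b; charset=UTF-8".
  (alexandria:when-let (content-type (call-next-method))
    (ppcre:regex-replace-all "^([^/]+/[^/]+); *[^/]+/[^/;]+" content-type
                             "\\1")))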

You will then have to pass the :request-class argument when instantiating
your acceptor so that it uses the request-with-bad-content-type class.  You
should also review the regular expression carefully and perhaps profile your
application to see whether you need to cache the result or otherwise improve
performance.
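If you want to try the repair without loading CL-PPCRE, the following portable Common Lisp sketch approximates the regular expression above (it drops the first parameter when that parameter contains a slash but no equals sign) and memoizes results in a bounded hash table.  All names here are hypothetical; the cache is not thread-safe, and whether it pays off at all is something only profiling can tell.

```lisp
;;; Hypothetical helpers, not part of Hunchentoot: a plain-CL approximation
;;; of the CL-PPCRE rewrite, plus a small per-string cache.

(defvar *repaired-content-types* (make-hash-table :test 'equal)
  "Cache from raw Content-Type strings to their repaired form.
Not thread-safe: guard it with a lock in a multi-threaded acceptor.")

(defparameter *max-cached-content-types* 1000
  "Clear the cache once it holds this many entries; header values are
client-controlled, so an unbounded cache could grow without limit.")

(defun repair-content-type-uncached (content-type)
  "Drop the first parameter of CONTENT-TYPE if it looks like a second
type/subtype pair (contains a slash but no equals sign)."
  (let* ((first-semi (position #\; content-type))
         (second-semi (and first-semi
                           (position #\; content-type :start (1+ first-semi))))
         (segment (and first-semi
                       (subseq content-type (1+ first-semi) second-semi))))
    (if (and segment (find #\/ segment) (not (find #\= segment)))
        (concatenate 'string
                     (subseq content-type 0 first-semi)
                     (if second-semi (subseq content-type second-semi) ""))
        content-type)))

(defun repair-content-type (content-type)
  "Cached version of REPAIR-CONTENT-TYPE-UNCACHED."
  (multiple-value-bind (repaired found)
      (gethash content-type *repaired-content-types*)
    (if found
        repaired
        (progn
          (when (>= (hash-table-count *repaired-content-types*)
                    *max-cached-content-types*)
            (clrhash *repaired-content-types*))
          (setf (gethash content-type *repaired-content-types*)
                (repair-content-type-uncached content-type))))))

;; In the :AROUND method one could then write
;;   (alexandria:when-let (content-type (call-next-method))
;;     (repair-content-type content-type))
;; and create the acceptor with, for example,
;;   (make-instance 'hunchentoot:easy-acceptor :port 8080
;;                  :request-class 'request-with-bad-content-type)
```

For the header from this thread, REPAIR-CONTENT-TYPE returns "application/x-www-form-urlencoded; charset=UTF-8"; well-formed headers come back unchanged.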

-Hans

On Sun, May 26, 2013 at 5:07 AM, Jingtao Xu <jingtaozf@gmail.com> wrote:
...
Hi Hans,
I don't agree that this content type header is just bogus.
Since this content type is sent by the largest B2B/B2C site in China,
there must be a reason for it.
And if you try CL-HTTP, you will find that it parses such a
content type correctly.
-----------------------------------------------------------------------------
(parse-mime-content-type-header "application/x-www-form-urlencoded; text/html; charset=UTF-8")
   ==> (:APPLICATION :X-WWW-FORM-URLENCODED :CHARSET :UTF-8)
-----------------------------------------------------------------------------
You can find the definition in cl-http/server/headers.lisp:
-----------------------------------------------------------------------------
(define-header-type :content-type-header (:header)
  :parse-function parse-mime-content-type-header
  :print-function print-mime-content-type-header)
-----------------------------------------------------------------------------
Even if this content-type header is bogus (actually I don't think so),
Hunchentoot/Drakma should parse the header without raising an error
when a special variable such as *accept-bogus-content-type* is true.
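A minimal sketch of how such a switch might behave, in portable Common Lisp.  *ACCEPT-BOGUS-CONTENT-TYPE* is the hypothetical variable proposed here, not an existing Hunchentoot or Drakma setting, and SPLIT-CONTENT-TYPE is an illustrative helper that ignores quoted strings.

```lisp
;;; Hypothetical sketch: *ACCEPT-BOGUS-CONTENT-TYPE* and SPLIT-CONTENT-TYPE
;;; do not exist in Hunchentoot, Drakma or Chunga.

(defvar *accept-bogus-content-type* nil
  "When true, Content-Type parameters that are not attribute=value pairs
are dropped instead of signalling an error.")

(defun split-on-semicolons (string)
  "Split STRING at each semicolon and trim surrounding spaces."
  (loop for start = 0 then (1+ end)
        for end = (position #\; string :start start)
        collect (string-trim " " (subseq string start end))
        while end))

(defun split-content-type (header)
  "Return the media type of HEADER and an alist of its parameters."
  (destructuring-bind (media-type &rest params) (split-on-semicolons header)
    (values media-type
            (loop for param in params
                  for pos = (position #\= param)
                  when pos
                    collect (cons (subseq param 0 pos) (subseq param (1+ pos)))
                  else
                    ;; A parameter without "=", such as "text/html".
                    do (unless *accept-bogus-content-type*
                         (error "Malformed Content-Type parameter ~S in ~S."
                                param header))))))
```

By default the header from this thread signals an error; with *ACCEPT-BOGUS-CONTENT-TYPE* bound to T, the stray "text/html" is dropped and the result is "application/x-www-form-urlencoded" with the parameters (("charset" . "UTF-8")).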
With Best Regards,
jingtao.
On Sat, May 25, 2013 at 8:11 PM, Hans Hübner <hans.huebner@gmail.com> wrote:
...
Jingtao,
the content-type header "application/x-www-form-urlencoded; text/html;
charset=UTF-8" is just bogus.  I do not want to include code that makes
Hunchentoot work with clearly broken clients.  Better error reporting
would be acceptable, though.
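Better error reporting could, for example, take the form of a dedicated condition that names both the whole header and the offending parameter.  The following is a sketch under that assumption; BOGUS-CONTENT-TYPE and CHECK-CONTENT-TYPE are hypothetical names, not existing Hunchentoot or Chunga API.

```lisp
;;; Hypothetical sketch: BOGUS-CONTENT-TYPE and CHECK-CONTENT-TYPE are
;;; illustrative names, not part of Hunchentoot or Chunga.

(define-condition bogus-content-type (error)
  ((header :initarg :header :reader bogus-content-type-header)
   (segment :initarg :segment :reader bogus-content-type-segment))
  (:report (lambda (condition stream)
             (format stream "Malformed Content-Type header ~S: ~
                             ~S is not an attribute=value parameter."
                     (bogus-content-type-header condition)
                     (bogus-content-type-segment condition)))))

(defun check-content-type (header)
  "Return HEADER, or signal BOGUS-CONTENT-TYPE naming the first parameter
that lacks an equals sign."
  (loop for start = (position #\; header) then end
        for end = (and start (position #\; header :start (1+ start)))
        while start
        do (let ((segment (string-trim " " (subseq header (1+ start) end))))
             (unless (find #\= segment)
               (error 'bogus-content-type :header header :segment segment))))
  header)
```

A handler can then log (bogus-content-type-segment condition), which would be "text/html" for the header discussed here, instead of a generic parse error.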
-Hans
On Sat, May 25, 2013 at 12:38 PM, Jingtao Xu <jingtaozf@gmail.com> wrote:
...
Hi all,
I found the content type header that triggers the bug in the message.log
generated by Hunchentoot.
It happens when Hunchentoot receives the following content type header:

...
...
application/x-www-form-urlencoded; text/html; charset=UTF-8

...
...
I noticed that the function 'get-content-type' in Drakma's file read.lisp
also treats "/" as a token separator.
I hope Chunga, Drakma and Hunchentoot can accept such a content type header
without raising an exception.  As Edi said, a new special variable
similar to *accept-bogus-eols* or *treat-semicolon-as-continuation*,
under which only " ,;" are treated as token separators, may be a good
idea and would solve my problem.
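The difference between the two separator sets can be sketched in portable Common Lisp.  READ-NAME-VALUE and *LAX-TOKEN-SEPARATORS* are hypothetical, and this is not Chunga's implementation; it only shows why "/" matters.

```lisp
;;; Hypothetical sketch, not Chunga's READ-NAME-VALUE-PAIR: the value of a
;;; name=value pair ends at any RFC 2616 separator by default, or only at
;;; space, comma or semicolon when *LAX-TOKEN-SEPARATORS* is true.

(defparameter *rfc-2616-separators*
  (format nil "()<>@,;:\\\"/[]?={} ~C" #\Tab)
  "Separator characters from RFC 2616 section 2.2, including SP and HT.")

(defvar *lax-token-separators* nil
  "When true, treat only space, comma and semicolon as separators in values.")

(defun scan-token (string start separators)
  "Index of the first character at or after START that is in SEPARATORS,
or the length of STRING if there is none."
  (or (position-if (lambda (char) (find char separators)) string :start start)
      (length string)))

(defun read-name-value (string)
  "Read NAME=VALUE from the start of STRING and return (NAME . VALUE).
VALUE is the empty string if no '=' follows NAME."
  (let ((name-end (scan-token string 0 *rfc-2616-separators*)))
    (if (and (< name-end (length string))
             (char= (char string name-end) #\=))
        (let* ((value-start (1+ name-end))
               (value-end (scan-token string value-start
                                      (if *lax-token-separators*
                                          " ,;"
                                          *rfc-2616-separators*))))
          (cons (subseq string 0 name-end)
                (subseq string value-start value-end)))
        (cons (subseq string 0 name-end) ""))))
```

Under the RFC 2616 separators, (read-name-value "path=/foo") returns ("path" . ""), matching the behavior reported for READ-NAME-VALUE-PAIR further down in this thread; with *LAX-TOKEN-SEPARATORS* bound to T it returns ("path" . "/foo").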
Anyway, the RFC does not fit the real world well.
Thanks very much.
With Best Regards,
jingtao.
On Thu, May 23, 2013 at 2:01 PM, Edi Weitz <edi@agharta.de> wrote:
...
I'm not the maintainer anymore, but my take is that if some Ruby or
Java client misinterprets the RFC I wouldn't change Hunchentoot's (or
rather Chunga's) default behavior because of that.  I'd rather
introduce a new special variable similar to *accept-bogus-eols* or
*treat-semicolon-as-continuation*.
Just my .02 Euros,
Edi.
On Thu, May 23, 2013 at 2:52 AM, Jingtao Xu <jingtaozf@gmail.com> wrote:
...
...
...
Hi All,
1. The function `read-name-value-pair' is called by `parse-content-type'
   in hunchentoot/util.lisp, not by my code.
2. The slash is a token constituent in the Java/Ruby implementations, and I
   think some web clients/servers treat it as a token constituent too,
   but I am waiting for the Hunchentoot log to give us a live example.
With Best Regards,
jingtao
On Wed, May 22, 2013 at 11:40 PM, Edi Weitz <edi@agharta.de> wrote:
...
If I'm not mistaken, the slash is a "separator" and thus not a token
constituent according to RFC 2616, which means "path=/foo" is not legal
input for READ-NAME-VALUE-PAIR.
On Wed, May 22, 2013 at 5:27 PM, Ron Garret <ron@flownet.com> wrote:
> Very likely Jingtao's code is calling READ-NAME-VALUE-PAIR without
> wrapping it in this macro.
>
> But there's still a bug in READ-NAME-VALUE-PAIR:
>
> ? (WITH-INPUT-FROM-VECTOR (S (MAP '(VECTOR (UNSIGNED-BYTE 8))
> 'CHAR-CODE "path=/foo"))
>   (chunga:with-character-stream-semantics
>       (CHUNGA:READ-NAME-VALUE-PAIR S)))
> ("path" . "")
>
> On May 22, 2013, at 8:19 AM, Edi Weitz wrote:
>
>> On Wed, May 22, 2013 at 4:18 PM, Ron Garret <ron@flownet.com> wrote:
>>> I found a bug in CHUNGA:READ-NAME-VALUE-PAIR.
>>
>> It's not quite clear to me yet what the bug is supposed to be.
>>
>> The documentation clearly says that calls to READ-NAME-VALUE-PAIR and
>> friends must be wrapped with this macro:
>>
>>  http://weitz.de/chunga/#with-character-stream-semantics
>>
>> (You might argue that this isn't very user-friendly, but Chunga wasn't
>> really intended to be used that way.)
>