Re: [hunchentoot-devel] [hunchentoot] wrong character set in parsing request (#1)

John DeSoi

31 Oct 2011

12:50 p.m.

Since this issue has been closed on github, I'd like to repost to the list and get some feedback. It still seems like a bug to me.

...

An incoming request has

Content-Type: application/json; charset=UTF-8

but the server treats it as latin-1 (the default). It seems the code correctly parses the character set from the content type, but ignores it unless the type is "text". See parse-content-type.

On Oct 30, 2011, at 4:29 PM, Hans Hübner wrote:

...

application/json is always encoded as utf-8, at least that is what I understood so far.

So in order to correctly handle an application/json request, I have to explicitly control the request parsing, even though the request has already specified that it is UTF-8.

    (defhandler (rpc-handler :uri "/rpc" :default-request-type :post) ()
      (setf (tbnl:content-type*) "application/json")
      (setf (tbnl:reply-external-format*) +utf-8+)
      (invoke-rpc (tbnl:raw-post-data :external-format +utf-8+)))

The parse-content-type function parses the "charset" part, but then completely ignores it if the MIME type does not have "text" in it. If the content type had been "text/json", it would have worked correctly.

John DeSoi, Ph.D.


Hans Hübner

31 Oct 2011

2:11 p.m.


On Mon, Oct 31, 2011 at 1:50 PM, John DeSoi <desoi@pgedit.com> wrote:

...

Since this issue has been closed on github, I'd like to repost to the list and get some feedback. It still seems like a bug to me.

I have continued the topic on github, even though the issue is closed.

Hunchentoot makes no attempt to determine the content type for non-text bodies. I think that the piece of code that you've posted does the right thing. I also think that RAW-POST-DATA should return the body as an octet vector rather than making a possibly false attempt at guessing the character set. If you have another suggestion, please let us know.

-Hans

John DeSoi

2:44 p.m.


Hans,

On Oct 31, 2011, at 10:11 AM, Hans Hübner wrote:

...

I have continued the topic on github, even though the issue is closed.

Hunchentoot makes no attempt in determining the content type for non-text bodies. I think that the piece of code that you've posted does the right thing. I also think that RAW-POST-DATA should return the body as octet vector rather than making a possibly false attempt at guessing the character set. If you have another suggestion, please let us know.

Doesn't the fact that a charset is provided tell you that the content is textual? There is nothing magic about the MIME type having "text" in it. I think there are a lot of MIME types that are textual but don't have "text" as the top-level type. In all of those cases Hunchentoot is going to give you the wrong character set by default (unless it matches by luck).

The handler I provided is not a complete solution because it assumes UTF-8, which is incorrect: other character sets are allowed. To handle any text request correctly (one that does not have "text" in the MIME type), the handler will need to parse the charset itself. This repeats work that Hunchentoot has already done but then discarded because the MIME type was not "text".

This is not a big deal and I can work around it. But I suspect it will come up again in the near future :).

Thanks for your feedback,

John DeSoi, Ph.D.
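The charset lookup described in this message, done by the handler itself and independent of the media type, can be sketched in plain Common Lisp. This is only an illustration: SPLIT-ON and CONTENT-TYPE-CHARSET are hypothetical helper names, not Hunchentoot functions, and the parsing is deliberately naive (it does not handle a quoted parameter value that itself contains a semicolon).

```lisp
;; Illustrative helpers, NOT part of Hunchentoot. They pull the charset
;; parameter out of a Content-Type header value whatever the media type.

(defun split-on (char string)
  "Split STRING at each occurrence of CHAR and return the list of pieces."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun content-type-charset (header)
  "Return the value of the charset parameter in HEADER, or NIL if absent."
  ;; The first piece is the media type itself; the rest are parameters.
  (loop for param in (rest (split-on #\; header))
        for eq-pos = (position #\= param)
        when (and eq-pos
                  (string-equal "charset"
                                (string-trim " " (subseq param 0 eq-pos))))
          ;; Strip surrounding spaces and optional double quotes.
          return (string-trim "\" " (subseq param (1+ eq-pos)))))

;; (content-type-charset "application/json; charset=UTF-8") ; => "UTF-8"
;; (content-type-charset "application/json")                ; => NIL
```

A handler could pass the request's raw Content-Type header value to such a function and use the result to pick an external format, instead of hard-coding UTF-8.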

Hans Hübner

2:58 p.m.


I think it is silly to conduct a conversation in two channels.

Anyway: The applicability of the charset attribute in content type specifications is restricted to text content types as per RFC 2045. At least that is my reading of it.

Now, there is nothing wrong with being liberal in accepting, but in this case I am actually inclined to make RAW-POST-DATA return an octet vector for non-text content types and have the application handle the desired decoding.

-Hans
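The approach Hans proposes, where the application receives raw octets and does the decoding itself, might look roughly like the sketch below. It assumes the FLEXI-STREAMS library is already loaded (for example via Quicklisp); DECODE-BODY is a hypothetical helper name, and the UTF-8 fallback is an assumption made for illustration, not something the thread settles on.

```lisp
;; Sketch only. Assumes FLEXI-STREAMS is loaded, for example with
;; (ql:quickload :flexi-streams). DECODE-BODY is a hypothetical helper,
;; not part of Hunchentoot or FLEXI-STREAMS.

(defun decode-body (octets charset &optional (default :utf-8))
  "Decode the octet vector OCTETS into a string.
CHARSET is a charset name as a string, or NIL to use DEFAULT."
  (let ((format-name (if charset
                         ;; Turn a name such as \"utf-8\" into :UTF-8.
                         (intern (string-upcase charset) :keyword)
                         default)))
    (flexi-streams:octets-to-string
     octets
     :external-format (flexi-streams:make-external-format format-name))))
```

In a handler, the octets could come from (hunchentoot:raw-post-data :force-binary t) and the charset name from the request's Content-Type header. The sketch does not handle a charset name that FLEXI-STREAMS does not recognize, and interning client-supplied strings as keywords is something a production handler would probably want to restrict to a known list.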

John DeSoi

8:25 p.m.


I'll put further comments here. Fun stuff for issue #1 :)

https://github.com/edicl/hunchentoot/issues/1

John DeSoi, Ph.D.
