Since this issue has been closed on github, I'd like to repost to the list and get some feedback. It still seems like a bug to me.
An incoming request has
Content-Type: application/json; charset=UTF-8
but the server treats it as latin-1 (the default). It seems the code correctly parses the character set from the content type, but ignores it unless the type is "text". See parse-content-type.
On Oct 30, 2011, at 4:29 PM, Hans Hübner wrote:
> application/json is always encoded as utf-8, at least that is what I understood so far.
So in order to correctly handle an application/json request, I have to explicitly control the request parsing, even though the request has already specified that it is UTF-8.
(defhandler (rpc-handler :uri "/rpc" :default-request-type :post) ()
  (setf (tbnl:content-type*) "application/json")
  (setf (tbnl:reply-external-format*) +utf-8+)
  (invoke-rpc (tbnl:raw-post-data :external-format +utf-8+)))
The parse-content-type function parses the "charset" part, but then completely ignores it if the mime type does not have "text" in it. If the content type had been "text/json", it would have worked correctly.
John DeSoi, Ph.D.
On Mon, Oct 31, 2011 at 1:50 PM, John DeSoi desoi@pgedit.com wrote:
> Since this issue has been closed on github, I'd like to repost to the list and get some feedback. It still seems like a bug to me.
I have continued the topic on github, even though the issue is closed.
Hunchentoot makes no attempt at determining the content type for non-text bodies. I think that the piece of code that you've posted does the right thing. I also think that RAW-POST-DATA should return the body as an octet vector rather than making a possibly false attempt at guessing the character set. If you have another suggestion, please let us know.
-Hans
Hans,
On Oct 31, 2011, at 10:11 AM, Hans Hübner wrote:
> I have continued the topic on github, even though the issue is closed.
> Hunchentoot makes no attempt at determining the content type for non-text bodies. I think that the piece of code that you've posted does the right thing. I also think that RAW-POST-DATA should return the body as an octet vector rather than making a possibly false attempt at guessing the character set. If you have another suggestion, please let us know.
Doesn't the fact that a charset is provided tell you that the body is textual? There is nothing magic about the MIME type having "text" in it. I think there are a lot of MIME types that are textual but don't have "text" as the type. In all of those cases Hunchentoot is going to give you the wrong character set by default (unless it matches by luck). The handler I provided is not a complete solution because I assumed UTF-8, which is not correct in general: other character sets are allowed. To handle any textual request correctly (one that does not have "text" in the MIME type), the handler will need to parse the charset itself. This repeats work that Hunchentoot has already done but discarded because "text" was not in the MIME type.
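To make the point about the handler having to parse the charset itself concrete, here is a minimal sketch in portable Common Lisp. CHARSET-FROM-CONTENT-TYPE is a hypothetical helper, not part of Hunchentoot, and it only handles simple parameter lists (a quoted value containing ";" is not supported).

```lisp
(defun charset-from-content-type (header)
  "Return the charset parameter of the Content-Type string HEADER,
or NIL if there is none.  Hypothetical helper for illustration only."
  (loop with start = (position #\; header)
        while start
        do (let* ((end (or (position #\; header :start (1+ start))
                           (length header)))
                  ;; One \"name=value\" parameter, surrounding blanks removed.
                  (param (string-trim '(#\Space #\Tab)
                                      (subseq header (1+ start) end)))
                  (eq-pos (position #\= param)))
             (when (and eq-pos
                        ;; Parameter names are compared case-insensitively.
                        (string-equal "charset"
                                      (string-trim '(#\Space #\Tab)
                                                   (subseq param 0 eq-pos))))
               ;; Strip blanks and optional double quotes around the value.
               (return (string-trim '(#\Space #\Tab #\")
                                    (subseq param (1+ eq-pos)))))
             ;; Move on to the next parameter, or stop at end of header.
             (setf start (and (< end (length header)) end)))))
```

In a handler, the header string would presumably come from (tbnl:header-in* :content-type), and the returned name could be converted to an external format and passed to RAW-POST-DATA via :external-format, falling back to a default when the result is NIL.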
This is not a big deal and I can work around it. But I suspect it will come up again in the near future :).
Thanks for your feedback,
John DeSoi, Ph.D.
I think it is silly to conduct a conversation in two channels. Anyway: The applicability of the charset attribute in content type specifications is restricted to text content types as per RFC 2045. At least that is my reading of it.
Now, there is nothing wrong with being liberal in accepting, but in this case I am actually inclined to make RAW-POST-DATA return an octet vector for non-text content types and have the application handle the desired decoding.
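For illustration, application-side decoding along these lines might look like the sketch below. It assumes Quicklisp is already loaded and that Hunchentoot and FLEXI-STREAMS can be installed through it. DECODE-OCTETS is a hypothetical helper, INVOKE-RPC is the application function from the handler earlier in the thread, and hard-coding "utf-8" is an assumption that a real handler would replace with the charset sent by the client.

```lisp
;; Load this file with LOAD (or paste it into the REPL) so that the
;; packages exist before the forms that use them are read.
(ql:quickload '(:hunchentoot :flexi-streams))

(defun decode-octets (octets charset)
  "Decode the octet sequence OCTETS using the external format named by
the string CHARSET.  Signals an error if FLEXI-STREAMS does not know
the name."
  (flexi-streams:octets-to-string
   octets
   :external-format (flexi-streams:make-external-format
                     (intern (string-upcase charset) :keyword))))

(hunchentoot:define-easy-handler (rpc-handler :uri "/rpc"
                                              :default-request-type :post) ()
  (setf (hunchentoot:content-type*) "application/json")
  ;; :FORCE-BINARY T asks for the raw body as octets, regardless of
  ;; the request's Content-Type.
  (let ((octets (hunchentoot:raw-post-data :force-binary t)))
    (when octets
      ;; Assumption: the body is UTF-8.
      (invoke-rpc (decode-octets octets "utf-8")))))
```

The handler always reads the body as octets and makes the decoding decision in one visible place, which is the division of labor proposed above.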
-Hans
On Mon, Oct 31, 2011 at 3:44 PM, John DeSoi desoi@pgedit.com wrote:
> Hans,
> On Oct 31, 2011, at 10:11 AM, Hans Hübner wrote:
>> I have continued the topic on github, even though the issue is closed.
>> Hunchentoot makes no attempt at determining the content type for non-text bodies. I think that the piece of code that you've posted does the right thing. I also think that RAW-POST-DATA should return the body as an octet vector rather than making a possibly false attempt at guessing the character set. If you have another suggestion, please let us know.
> Doesn't the fact that a charset is provided tell you that the body is textual? There is nothing magic about the MIME type having "text" in it. I think there are a lot of MIME types that are textual but don't have "text" as the type. In all of those cases Hunchentoot is going to give you the wrong character set by default (unless it matches by luck). The handler I provided is not a complete solution because I assumed UTF-8, which is not correct in general: other character sets are allowed. To handle any textual request correctly (one that does not have "text" in the MIME type), the handler will need to parse the charset itself. This repeats work that Hunchentoot has already done but discarded because "text" was not in the MIME type.
> This is not a big deal and I can work around it. But I suspect it will come up again in the near future :).
> Thanks for your feedback,
> John DeSoi, Ph.D.
tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel
I'll put further comments here. Fun stuff for issue #1 :)
https://github.com/edicl/hunchentoot/issues/1
John DeSoi, Ph.D.
On Oct 31, 2011, at 10:58 AM, Hans Hübner wrote:
> I think it is silly to conduct a conversation in two channels. Anyway: The applicability of the charset attribute in content type specifications is restricted to text content types as per RFC 2045. At least that is my reading of it.
> Now, there is nothing wrong with being liberal in accepting, but in this case I am actually inclined to make RAW-POST-DATA return an octet vector for non-text content types and have the application handle the desired decoding.