[hunchentoot-devel] url-decode with Unicode text

Let's say I have some Greek text in the URL, which is encoded in UTF-8. The string is "ελληνική". When the ht server receives the request with the string, it goes to url-decode, which fails with the message:

  junk in string "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE"
     [Condition of type SB-INT:SIMPLE-PARSE-ERROR]

It can be reproduced by evaluating:

  (HUNCHENTOOT:URL-DECODE "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE"
                          (flex:make-external-format :utf-8 :eol-style :lf))

Backtrace:
  0: (PARSE-INTEGER "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE")[:EXTERNAL]
  1: (URL-DECODE "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE" #<FLEXI-STREAMS::FLEXI-UTF-8-FORMAT (:UTF-8 :EOL-STYLE :LF) {18422651}>)

How can I decode Greek (or any other language, for that matter) text? I took Greek as an example, as Latin/German text worked.

Thank you,
Andrew

On Tue, 2 Sep 2008 18:27:16 -0400, "Andrei Stebakov" <lispercat@gmail.com> wrote:
> Let's say I have some Greek text in the URL, which is encoded in UTF-8. The string is "ελληνική". When the ht server receives the request with the string, it goes to url-decode, which fails with the message: junk in string "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE"

That's a non-standard syntax which is not supported.

http://en.wikipedia.org/wiki/Url_encoding#Non-standard_implementations
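To make the difference concrete, here is a minimal JavaScript sketch (runnable in Node.js or a browser console) that compares the legacy escape() function with encodeURIComponent() on the Greek word from the original report. The variable names are illustrative only and do not come from the thread.

```javascript
// The Greek word from the original report.
const greek = "ελληνική";

// escape() is a legacy function: code points above 0xFF are written as
// %uXXXX, which is the non-standard form that URL-DECODE rejected.
const legacy = escape(greek);

// encodeURIComponent() percent-encodes the UTF-8 bytes of each character,
// which is the standard form a UTF-8 aware decoder expects.
const standard = encodeURIComponent(greek);

console.log(legacy);   // %u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE
console.log(standard); // %CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AE
```

The first output is exactly the string quoted in the error message, which suggests the request was built with escape() or something equivalent.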

But both IE and FF send this kind of encoding. Can anything be done to make them send the supported encoding?

Thank you,
Andrew

On Wed, Sep 3, 2008 at 2:50 AM, Edi Weitz <edi@agharta.de> wrote:
> That's a non-standard syntax which is not supported.
> http://en.wikipedia.org/wiki/Url_encoding#Non-standard_implementations

_______________________________________________ tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel

They do it when I use the JavaScript escape() function to encode some characters like #@, line feed, etc. If I don't use this function I lose those characters, but then I don't get the error message for Greek or other languages (those characters are displayed as question marks instead).

On Wed, Sep 3, 2008 at 10:30 AM, Edi Weitz <edi@agharta.de> wrote:
> On Wed, 3 Sep 2008 10:13:52 -0400, "Andrei Stebakov" <lispercat@gmail.com> wrote:
>> But both IE and FF send this kind of encoding.
>
> When do they do that?

I would be in favour of supporting the non-standard behavior and will come up with a patch to try, hoping that Edi accepts it.

-Hans

On Wed, Sep 3, 2008 at 17:40, Andrei Stebakov <lispercat@gmail.com> wrote:
> They do it when I use JavaScript escape() function to encode some characters like #@, line feed, etc. If I don't use this function I lose those characters, but then I don't get the error message for the Greek or other languages (those characters are being displayed as question marks).

On Wed, Sep 3, 2008 at 17:45, Hans Hübner <hans@huebner.org> wrote:
> I would be in favour of supporting the non-standard behavior and will come up with a patch to try, hoping that Edi accepts it.

I took this as an opportunity to work on overcoming my aversion to LOOP. Please see http://bknr.net/trac/changeset/3785 (diff: http://bknr.net/trac/changeset/3785?format=diff&new=3785) for a patch. Let me know if it works for you.

-Hans

Kilian reported that URL-DECODE of an empty string did not work with my new implementation. Here is the corrected patch:

http://bknr.net/trac/changeset?format=diff&new=3790&old=3784&new_path=trunk%2Fthirdparty%2Fhunchentoot%2Futil.lisp&old_path=trunk%2Fthirdparty%2Fhunchentoot%2Futil.lisp

-Hans

On Thu, Sep 4, 2008 at 11:28, Hans Hübner <hans@huebner.org> wrote:
> I took this as an opportunity to work on overcoming my aversion against loop. Please see http://bknr.net/trac/changeset/3785 (http://bknr.net/trac/changeset/3785?format=diff&new=3785) for a patch. Let me know if it works for you.
> -Hans

For hunchentoot-0.15.7 the patch is rejected. Which version of hunchentoot is this patch for?

Thank you,
Andrew

On Thu, Sep 4, 2008 at 9:12 AM, Hans Hübner <hans@huebner.org> wrote:
> Kilian reported that URL-DECODE of an empty string did not work with my new implementation. Here is the corrected patch:
> -Hans

On Thu, Sep 4, 2008 at 20:54, Andrei Stebakov <lispercat@gmail.com> wrote:
> For hunchentoot-0.15.7 the patch is rejected. What version of hunchentoot this patch is for?

The patch is against the development version, but it should be possible to apply it to the release version by hand, too, as it only reimplements URL-DECODE.

-Hans
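For readers who want to see what tolerating the non-standard syntax can look like, the following is a rough JavaScript sketch of a decoder that accepts both standard %XX escapes (interpreted as UTF-8 bytes) and non-standard %uXXXX escapes (interpreted as UTF-16 code units). It is not Hans's patch and not Hunchentoot's URL-DECODE; the function name urlDecodeLenient is hypothetical, and treating "+" as a space is an assumption about form-style encoding.

```javascript
// Hypothetical helper: NOT Hunchentoot's URL-DECODE and NOT the patch
// from the thread, only an illustration of the general technique.
function urlDecodeLenient(input) {
  const utf8 = new TextDecoder("utf-8");
  let out = "";
  let pending = []; // bytes collected from consecutive %XX escapes

  // Decode any buffered bytes as UTF-8 and append them to the output.
  const flush = () => {
    if (pending.length > 0) {
      out += utf8.decode(Uint8Array.from(pending));
      pending = [];
    }
  };

  let i = 0;
  while (i < input.length) {
    const rest = input.slice(i);
    let m;
    if ((m = /^%u([0-9a-fA-F]{4})/.exec(rest))) {
      // Non-standard form: one UTF-16 code unit given as four hex digits.
      flush();
      out += String.fromCharCode(parseInt(m[1], 16));
      i += 6;
    } else if ((m = /^%([0-9a-fA-F]{2})/.exec(rest))) {
      // Standard form: one byte; buffered so multi-byte UTF-8 sequences
      // are decoded together.
      pending.push(parseInt(m[1], 16));
      i += 3;
    } else {
      // Literal character; "+" is treated as a space (assumption).
      flush();
      out += input[i] === "+" ? " " : input[i];
      i += 1;
    }
  }
  flush();
  return out; // an empty input simply yields an empty string
}

console.log(urlDecodeLenient("%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE"));
console.log(urlDecodeLenient("%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AE"));
```

Both calls print the same Greek word. Buffering the %XX bytes before decoding is the important design choice here, because a single non-ASCII character in UTF-8 spans several escapes.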

Yes, Hans, it works with the patch. It displays boxes instead of characters, but I think that's because the fonts are not installed on the system. Much better than the crash I used to have!

Thank you,
Andrew

On Thu, Sep 4, 2008 at 3:10 PM, Hans Hübner <hans@huebner.org> wrote:
> The patch is against the development version, but it should be manually appliable to the release version, too, as it only reimplements URL-DECODE.
> -Hans

Oh, you mean using JS urlEncode and then using HT url-decode? I'll try it.

On Wed, Sep 3, 2008 at 12:07 PM, Edi Weitz <edi@agharta.de> wrote:
> On Wed, 3 Sep 2008 11:40:32 -0400, "Andrei Stebakov" <lispercat@gmail.com> wrote:
>> They do it when I use JavaScript escape() function
>
> See the link I sent earlier.

Well, using encodeURIComponent() for the text-only part solves the non-standard language problem (strange characters, but no Lisp exception), but it messes up some characters which are handled properly by the escape() function (e.g. "Ciarán" is handled by escape() but not by encodeURIComponent()).

Andrew

On Wed, Sep 3, 2008 at 12:11 PM, Andrei Stebakov <lispercat@gmail.com> wrote:
> Oh, you mean using JS urlEncode and then use HT url-decode? I'll try it.
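One plausible explanation for the "Ciarán" observation, offered here as an assumption rather than something confirmed in the thread, is that the receiving side interpreted the decoded bytes as Latin-1 instead of UTF-8. The JavaScript sketch below shows what each encoder emits and what a Latin-1 interpretation of the UTF-8 bytes looks like; unescape() is used only because it happens to map each %XX escape to a single Latin-1 character.

```javascript
const personName = "Ciarán";

// escape() writes characters up to 0xFF as one %XX escape,
// i.e. the Latin-1 byte for "á".
const viaEscape = escape(personName);
console.log(viaEscape); // Ciar%E1n

// encodeURIComponent() writes the two UTF-8 bytes for "á".
const viaComponent = encodeURIComponent(personName);
console.log(viaComponent); // Ciar%C3%A1n

// Assumption: a receiver that reads every byte as one Latin-1 character.
// unescape() behaves that way for %XX escapes, so it shows the mojibake.
const asLatin1 = unescape(viaComponent);
console.log(asLatin1); // CiarÃ¡n

// A UTF-8 aware decoder recovers the original text.
const asUtf8 = decodeURIComponent(viaComponent);
console.log(asUtf8); // Ciarán
```

If this assumption holds, the fix would be on the decoding side (choosing UTF-8 as the external format) rather than in the choice of JavaScript encoder.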

On Wed, 3 Sep 2008 12:37:16 -0400, "Andrei Stebakov" <lispercat@gmail.com> wrote:
> Well, encodeURIComponent used for text only part solves the non-standard language problem (strange characters, but no lisp exception), but it messes up some characters which are handled properly by escape() function (i.e "Ciarán" is handled by escape() but not encodeURIComponent()).

The Wiki page mentioned a function called encodeURI. That's not there?

Yes, it's there. I tried it, but for some reason I get an exception on the Lisp side when the whole URI is encoded. Then I started to use encodeURIComponent(), with which I encode only the text part of the GET request. This works, except that it messes up some of the non-standard characters.

On Wed, Sep 3, 2008 at 1:57 PM, Edi Weitz <edi@agharta.de> wrote:
> The Wiki page mentioned a function called encodeURI. That's not there?
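The difference between the two functions mentioned above can be sketched in JavaScript as follows. The base URL, the parameter name q, and the sample text are hypothetical. The sketch does not explain the Lisp-side exception reported in the thread; it only shows that encodeURI() leaves reserved characters such as #, @, & and = untouched, while encodeURIComponent() escapes them.

```javascript
// Hypothetical request pieces, not taken from the thread.
const base = "http://localhost:4242/search";
const text = "ελ #@&=";

// encodeURI() is meant for a whole URI: it keeps reserved characters,
// so "#", "&" and "=" inside the value still act as URI delimiters.
const wholeUri = encodeURI(base + "?q=" + text);
console.log(wholeUri);
// http://localhost:4242/search?q=%CE%B5%CE%BB%20#@&=

// encodeURIComponent() is meant for a single component: it escapes the
// reserved characters too, so the value survives as data.
const componentUri = base + "?q=" + encodeURIComponent(text);
console.log(componentUri);
// http://localhost:4242/search?q=%CE%B5%CE%BB%20%23%40%26%3D
```

Encoding only the parameter value with encodeURIComponent(), as described in the message above, is therefore the variant that preserves characters like # and & in user-entered text.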

participants (3)
- Andrei Stebakov
- Edi Weitz
- Hans Hübner