Let's say I have some Greek text in the URL, which is encoded in UTF-8. The string is "ελληνική". When the Hunchentoot server receives the request with the string, it calls url-decode, which fails with the message: junk in string "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE" [Condition of type SB-INT:SIMPLE-PARSE-ERROR]
It can be reproduced by evaluating: (HUNCHENTOOT:URL-DECODE "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE" (flex:make-external-format :utf-8 :eol-style :lf))
Backtrace:
0: (PARSE-INTEGER "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE")[:EXTERNAL]
1: (URL-DECODE "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE" #<FLEXI-STREAMS::FLEXI-UTF-8-FORMAT (:UTF-8 :EOL-STYLE :LF) {18422651}>)
How can I decode Greek text (or text in any other language, for that matter)? I took Greek as an example because Latin/German text worked.
Thank you, Andrew
On Tue, 2 Sep 2008 18:27:16 -0400, "Andrei Stebakov" lispercat@gmail.com wrote:
Let's say I have some Greek text in the url, which is encoded in utf-8. The string is "ελληνική". When the ht server receives the request with the string, it goes to url-decode with fails with the message: junk in string "%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE"
That's a non-standard syntax which is not supported.
http://en.wikipedia.org/wiki/Url_encoding#Non-standard_implementations
But both IE and FF send this kind of encoding. Can anything be done to make them send the supported encoding?
Thank you, Andrew
_______________________________________________ tbnl-devel site list tbnl-devel@common-lisp.net http://common-lisp.net/mailman/listinfo/tbnl-devel
On Wed, 3 Sep 2008 10:13:52 -0400, "Andrei Stebakov" lispercat@gmail.com wrote:
But both IE and FF send this kind of encoding.
When do they do that?
They do it when I use the JavaScript escape() function to encode characters such as #, @, line feed, etc. If I don't use this function I lose those characters, but then I don't get the error message for Greek or other languages (those characters are displayed as question marks instead).
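For reference, the two encodings under discussion can be compared directly. The following is a minimal sketch, assuming a JavaScript engine that still ships the legacy escape() function (current browsers and Node.js do); the variable names are only illustrative.

```javascript
// The Greek sample string from the original report.
const greek = "ελληνική";

// Legacy escape(): each code unit above 0xFF becomes the non-standard
// %uXXXX form, which is what url-decode rejected in this thread.
const legacyForm = escape(greek);

// encodeURIComponent(): the string is converted to UTF-8 first and every
// byte is percent-encoded as %XX, which is the standard form.
const standardForm = encodeURIComponent(greek);

console.log(legacyForm);   // %u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE
console.log(standardForm); // %CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AE
```

The first output is exactly the string quoted in the error message above; the second is what a decoder expecting UTF-8 percent-encoding would accept.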
I would be in favour of supporting the non-standard behavior and will come up with a patch to try, hoping that Edi accepts it.
-Hans
On Wed, Sep 3, 2008 at 17:45, Hans Hübner hans@huebner.org wrote:
I would be in favour of supporting the non-standard behavior and will come up with a patch to try, hoping that Edi accepts it.
I took this as an opportunity to work on overcoming my aversion to LOOP. Please see http://bknr.net/trac/changeset/3785 (http://bknr.net/trac/changeset/3785?format=diff&new=3785) for a patch. Let me know if it works for you.
-Hans
Kilian reported that URL-DECODE of an empty string did not work with my new implementation. Here is the corrected patch:
http://bknr.net/trac/changeset?format=diff&new=3790&old=3784&new...
-Hans
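To illustrate what a decoder that tolerates the non-standard form has to do, here is a rough sketch in JavaScript. It is not Hans's patch and says nothing about how the Lisp implementation is written; the function name urlDecodeLenient is hypothetical, and the sketch assumes that %XX escapes carry UTF-8 bytes and that a global TextDecoder is available (modern browsers and Node.js).

```javascript
// Hypothetical helper, for illustration only: decodes standard %XX escapes
// (interpreted as UTF-8 bytes) as well as non-standard %uXXXX escapes.
function urlDecodeLenient(input) {
  const utf8 = new TextDecoder("utf-8");
  return input.replace(
    // Either one %uXXXX escape, or a run of consecutive %XX escapes.
    /%u([0-9a-fA-F]{4})|((?:%[0-9a-fA-F]{2})+)/g,
    (match, unit, run) => {
      if (unit !== undefined) {
        // %uXXXX names a UTF-16 code unit directly.
        return String.fromCharCode(parseInt(unit, 16));
      }
      // A run such as "%CE%B5" is collected into bytes and decoded together,
      // so that multi-byte UTF-8 sequences are reassembled correctly.
      const bytes = run.slice(1).split("%").map((hex) => parseInt(hex, 16));
      return utf8.decode(new Uint8Array(bytes));
    }
  );
}

console.log(urlDecodeLenient("%u03B5%u03BB%u03BB%u03B7%u03BD%u03B9%u03BA%u03AE"));
console.log(urlDecodeLenient("%CE%B5%CE%BB%CE%BB%CE%B7%CE%BD%CE%B9%CE%BA%CE%AE"));
console.log(JSON.stringify(urlDecodeLenient(""))); // the empty-string case
```

Text that contains no escapes, including the empty string, passes through unchanged because the replacement callback is never invoked.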
The patch is rejected for hunchentoot-0.15.7. Which version of Hunchentoot is this patch for?
Thank you, Andrew
On Thu, Sep 4, 2008 at 20:54, Andrei Stebakov lispercat@gmail.com wrote:
For hunchentoot-0.15.7 the patch is rejected. What version of hunchentoot this patch is for?
The patch is against the development version, but it should be possible to apply it manually to the release version, too, as it only reimplements URL-DECODE.
-Hans
Yes, Hans, it works with the patch. It displays cubes instead of characters but I think it's because the fonts are not installed on the system. Much better than the crash I used to have!
Thank you, Andrew
On Wed, 3 Sep 2008 11:40:32 -0400, "Andrei Stebakov" lispercat@gmail.com wrote:
They do it when I use JavaScript escape() function
See the link I sent earlier.
Oh, you mean using JS urlEncode and then using HT url-decode? I'll try it.
Well, using encodeURIComponent() for the text-only part solves the non-standard language problem (strange characters, but no Lisp exception), but it messes up some characters which are handled properly by the escape() function (e.g. "Ciarán" is handled by escape() but not by encodeURIComponent()).
Andrew
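The two functions produce different bytes for "Ciarán", which may be relevant to the behaviour described above. The sketch below only shows what each function emits and how the results decode under each convention; which decoding the server actually applied in this case is not established in the thread, so treat any conclusion as an assumption.

```javascript
const personName = "Ciarán";

// escape() emits a single Latin-1 style byte for "á".
const latin1Form = escape(personName);           // Ciar%E1n
// encodeURIComponent() emits the two UTF-8 bytes for "á".
const utf8Form = encodeURIComponent(personName); // Ciar%C3%A1n

// Decoding the UTF-8 form as UTF-8 round-trips.
const utf8AsUtf8 = decodeURIComponent(utf8Form);
// Decoding the UTF-8 form one byte per character (what unescape() does)
// yields two wrong characters in place of "á".
const utf8AsLatin1 = unescape(utf8Form);

// Decoding the Latin-1 form as UTF-8 is rejected outright, because a lone
// 0xE1 byte followed by "n" is not a valid UTF-8 sequence.
let latin1AsUtf8;
try {
  latin1AsUtf8 = decodeURIComponent(latin1Form);
} catch (e) {
  latin1AsUtf8 = e.name;
}

console.log(latin1Form, utf8Form);
console.log(utf8AsUtf8);   // Ciarán
console.log(utf8AsLatin1); // CiarÃ¡n
console.log(latin1AsUtf8); // URIError
```

In other words, each form only decodes cleanly when the receiving side assumes the same character encoding that the sending side used.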
On Wed, 3 Sep 2008 12:37:16 -0400, "Andrei Stebakov" lispercat@gmail.com wrote:
Well, encodeURIComponent used for text only part solves the non-standard language problem (strange characters, but no lisp exception), but it messes up some characters which are handled properly by escape() function (i.e "Ciarán" is handled by escape() but not encodeURIComponent()).
The Wiki page mentioned a function called encodeURI. That's not there?
Yes, it's there. I tried it, but for some reason I get an exception on the Lisp side when the whole URI is encoded. Then I started to use encodeURIComponent(), with which I encode only the text part of the GET request. This works, except that it messes up some of the non-standard characters.
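As a side note on the difference between the two functions mentioned here: encodeURI() is meant for a complete URI and therefore leaves URI delimiters such as "#" and "@" untouched, while encodeURIComponent() escapes them. The sketch below uses a made-up URL (example.com and the q parameter are assumptions for illustration) and does not attempt to explain the Lisp-side exception reported above.

```javascript
// Sample text containing Greek letters, a space, "#", "@" and a line feed.
const query = "ελ #@\n";

// Encoding the whole URI: "#" and "@" are kept literally, so a browser
// would treat everything after "#" as a fragment and not send it.
const wholeUri = encodeURI("http://example.com/search?q=" + query);

// Encoding only the parameter value: every delimiter is escaped.
const queryOnly = "http://example.com/search?q=" + encodeURIComponent(query);

console.log(wholeUri);  // http://example.com/search?q=%CE%B5%CE%BB%20#@%0A
console.log(queryOnly); // http://example.com/search?q=%CE%B5%CE%BB%20%23%40%0A
```

Encoding only the value, as in the second form, keeps characters such as "#", "@" and line feed intact on the way to the server.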