[Please use the mailing list - see Cc (and register first).]
Hi Will!
On Wed, 21 Sep 2005 23:52:41 -0700, Will will@glozer.net wrote:
I came across a problem in the URL-DECODE function of TBNL: it doesn't decode UTF-8 encoded URLs correctly. I see there was a thread on this in July, http://common-lisp.net/pipermail/tbnl-devel/2005-July/000358.html, but apparently no resolution. Enclosed is a new version of the function. I'm a Lisp newbie, so it may not be ideal =)
This particular function only works in Allegro, but it would work in any Lisp that has a function to convert a UTF-8 encoded octet array to a string. I believe SBCL has a similar OCTETS-TO-STRING function; I didn't see anything really obvious for LispWorks, though. At the moment I only have ACL.
(defun url-decode (string)
  (let ((string-length (length string)))
    (flet ((parse-hex-escape (start)
             (if (<= (+ start 3) string-length)
                 (parse-integer string :start (+ start 1) :end (+ start 3) :radix 16)
                 (error "invalid hex encoding in string '~A'" string))))
      (let ((vector (make-array string-length
                                :adjustable t
                                :element-type '(unsigned-byte 8)
                                :fill-pointer 0)))
        (loop for i below string-length
              for char = (aref string i)
              do (vector-push-extend
                  (case char
                    ((#\+) (char-code #\Space))
                    ((#\%) (parse-hex-escape (prog1 i (incf i 2))))
                    (otherwise (char-code char)))
                  vector))
        #+allegro (excl:octets-to-string vector :external-format :utf-8)))))
Thanks for that. I admit that the current version of URL-DECODE is not ideal but your version will break existing code. Note that browsers will use different URL encodings based on the charset of the HTML document they're responding to. For example, if the charset is ISO-8859-1 (which AFAIK is the default charset for Apache) the string "äöü" (that's umlaut a, umlaut o, umlaut u in case it doesn't make it through email) will be sent as
%E4%F6%FC
which the version of URL-DECODE above won't decode correctly - it'll expect
%C3%A4%C3%B6%C3%BC
instead. Unfortunately, the browsers don't tell you which charset they're using... :(
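To make the difference concrete, here is a minimal sketch in portable Common Lisp that percent-encodes the same three characters under both charsets. The helper names (LATIN-1-OCTETS, UTF-8-OCTETS, PERCENT-ENCODE-OCTETS) are made up for this illustration and are not part of TBNL. The sketch also assumes that CHAR-CODE returns Latin-1/Unicode code points, which holds on the common implementations but is not guaranteed by the standard.

```lisp
;;; Hypothetical helpers for illustration only; none of these names come from TBNL.

(defun latin-1-octets (string)
  "Encode STRING as ISO-8859-1: one octet per character, codes below 256 only."
  (map 'list
       (lambda (char)
         (let ((code (char-code char)))
           (assert (< code 256) () "~S is not representable in ISO-8859-1" char)
           code))
       string))

(defun utf-8-octets (string)
  "Encode STRING as UTF-8 and return a list of octets."
  (loop for char across string
        for code = (char-code char)
        append (cond ((< code #x80)
                      (list code))
                     ((< code #x800)
                      (list (logior #xC0 (ash code -6))
                            (logior #x80 (logand code #x3F))))
                     ((< code #x10000)
                      (list (logior #xE0 (ash code -12))
                            (logior #x80 (logand (ash code -6) #x3F))
                            (logior #x80 (logand code #x3F))))
                     (t
                      (list (logior #xF0 (ash code -18))
                            (logior #x80 (logand (ash code -12) #x3F))
                            (logior #x80 (logand (ash code -6) #x3F))
                            (logior #x80 (logand code #x3F)))))))

(defun percent-encode-octets (octets)
  "Write every octet as %XX. A real encoder would leave unreserved characters alone."
  (format nil "~{%~2,'0X~}" octets))

;; Built from character codes so the example does not depend on the
;; encoding of this source text: umlaut a, umlaut o, umlaut u.
(defparameter *umlauts* (map 'string #'code-char '(#xE4 #xF6 #xFC)))

(percent-encode-octets (latin-1-octets *umlauts*)) ; => "%E4%F6%FC"
(percent-encode-octets (utf-8-octets *umlauts*))   ; => "%C3%A4%C3%B6%C3%BC"
```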
The right way to do it would be to add a second optional argument for the charset to URL-DECODE and make the default value user-configurable on a per-request basis. Does that sound OK? I'll probably add something like this in the next days.
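Roughly, the idea could look like the following sketch. It is not the TBNL implementation: the names URL-DECODE* and *DEFAULT-EXTERNAL-FORMAT*, the keywords :LATIN-1 and :UTF-8, and the choice of :LATIN-1 as default are all assumptions made for this example. It is written in portable Common Lisp and again assumes that CODE-CHAR maps code points to the corresponding characters.

```lisp
;;; A sketch of a charset-aware decoder; all names here are hypothetical.

(defvar *default-external-format* :latin-1
  "Assumed default for this sketch; meant to be rebound or set by the user.")

(defun percent-decode-to-octets (string)
  "Turn a URL-encoded STRING into a list of octets: + becomes a space, %XX becomes octet XX."
  (let ((length (length string))
        (octets '())
        (i 0))
    (loop while (< i length)
          do (let ((char (char string i)))
               (case char
                 (#\+ (push (char-code #\Space) octets)
                      (incf i))
                 (#\% (unless (<= (+ i 3) length)
                        (error "Invalid hex encoding in string ~S" string))
                      (push (parse-integer string :start (+ i 1) :end (+ i 3) :radix 16)
                            octets)
                      (incf i 3))
                 (otherwise (push (char-code char) octets)
                            (incf i)))))
    (nreverse octets)))

(defun utf-8-octets-to-string (octets)
  "Decode a list of OCTETS as UTF-8. Malformed input is not validated in this sketch."
  (with-output-to-string (out)
    (loop while octets
          do (let* ((lead (pop octets))
                    ;; Number of continuation octets that follow the lead octet.
                    (extra (cond ((< lead #x80) 0)
                                 ((< lead #xE0) 1)
                                 ((< lead #xF0) 2)
                                 (t 3)))
                    (code (case extra
                            (0 lead)
                            (1 (logand lead #x1F))
                            (2 (logand lead #x0F))
                            (t (logand lead #x07)))))
               (loop repeat extra
                     do (setf code (logior (ash code 6)
                                           (logand (pop octets) #x3F))))
               (write-char (code-char code) out)))))

(defun url-decode* (string &optional (external-format *default-external-format*))
  "Decode STRING, interpreting the escaped octets according to EXTERNAL-FORMAT."
  (let ((octets (percent-decode-to-octets string)))
    (ecase external-format
      (:latin-1 (map 'string #'code-char octets))
      (:utf-8 (utf-8-octets-to-string octets)))))

;; Both calls yield the same three-character string (umlaut a, o, u):
(url-decode* "%E4%F6%FC")                 ; uses the default, :LATIN-1
(url-decode* "%C3%A4%C3%B6%C3%BC" :utf-8)
```

Binding *DEFAULT-EXTERNAL-FORMAT* with LET around a call is one way a caller could switch the default for a single request in this sketch.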
Cheers, Edi.
PS: For LispWorks use EXTERNAL-FORMAT:DECODE-EXTERNAL-STRING and EXTERNAL-FORMAT:ENCODE-LISP-STRING but see the recent discussion on the LW mailing list w.r.t. delivered applications:
On Thu, 22 Sep 2005 14:30:55 +0200, Edi Weitz edi@agharta.de wrote:
The right way to do it would be to add a second optional argument for the charset to URL-DECODE and make the default value user-configurable on a per-request basis. Does that sound OK? I'll probably add something like this in the next days.
OK, I've done that now - version 0.8.0. Please try it out yourself. I've added some examples to test/test.lisp.
There's code in there for LispWorks, AllegroCL and Unicode-SBCL but I haven't tested with SBCL yet. Likewise, I only tested with the latest Firefox and IE on Windows.
Note that the part about "user-configurable on a per-request basis" is kind of hard because URL-DECODE is used when the REQUEST object is created, i.e. before the user's dispatchers and handlers are called. Thus the new function RECOMPUTE-REQUEST-PARAMETERS.
For some more info about HTML forms and character sets see this nice article:
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html
ChangeLog:
Version 0.8.0
2005-09-24
Added the ability to cope with different external formats (incorporating suggestions from Will Glozer and Ivan Shvedunov)
Raw post data is now always saved (so *SAVE-RAW-POST-DATA-P* is gone)
Download:
http://weitz.de/files/tbnl.tar.gz
Cheers, Edi.
On 2005-09-24 01:43:40, Edi Weitz wrote:
On Thu, 22 Sep 2005 14:30:55 +0200, Edi Weitz edi@agharta.de wrote:
The right way to do it would be to add a second optional argument for the charset to URL-DECODE and make the default value user-configurable on a per-request basis. Does that sound OK? I'll probably add something like this in the next days.
OK, I've done that now - version 0.8.0. Please try it out yourself.
(CMUCL 19a) A project of mine now shows "An error has occured" in the browser. I use URL-DECODE. TBNL:*ERROR* contains a SIMPLE-WARNING with the "Ignoring external format ..." warning.
On 2005-09-26 14:32:19, Stefan Scholl wrote:
On 2005-09-24 01:43:40, Edi Weitz wrote:
On Thu, 22 Sep 2005 14:30:55 +0200, Edi Weitz edi@agharta.de wrote:
The right way to do it would be to add a second optional argument for the charset to URL-DECODE and make the default value user-configurable on a per-request basis. Does that sound OK? I'll probably add something like this in the next days.
OK, I've done that now - version 0.8.0. Please try it out yourself.
(CMUCL 19a) A project of mine now shows "An error has occured" in the browser. I use URL-DECODE. TBNL:*ERROR* contains a SIMPLE-WARNING with the "Ignoring external format ..." warning.
Please ignore for the moment. This stupid browser makes two requests: The page and a favicon.ico which isn't there. So the TBNL:*ERROR* doesn't correspond to the displayed error in the browser.
After a test with a text-based browser I now have an UNDEFINED-FUNCTION in TBNL:*ERROR*. Most likely my own fault ...
On 2005-09-26 14:50:58, Stefan Scholl wrote:
On 2005-09-26 14:32:19, Stefan Scholl wrote:
On 2005-09-24 01:43:40, Edi Weitz wrote:
OK, I've done that now - version 0.8.0. Please try it out yourself.
(CMUCL 19a) A project of mine now shows "An error has occured" in the browser. I use URL-DECODE. TBNL:*ERROR* contains a SIMPLE-WARNING with the "Ignoring external format ..." warning.
Please ignore for the moment. This stupid browser makes two requests: The page and a favicon.ico which isn't there. So the TBNL:*ERROR* doesn't correspond to the displayed error in the browser.
After a test with a text-based browser I now have an UNDEFINED-FUNCTION in TBNL:*ERROR*. Most likely my own fault ...
Hmm!? Inspected error:
FUNCTION-NAME: KERNEL::UNDEFINED-SYMBOL-ERROR-HANDLER
ACTUAL-INITARGS: (:FUNCTION-NAME "<error finding name>" :NAME TBNL:URL-ENCODE)
ASSIGNED-SLOTS: (CONDITIONS::NAME TBNL:URL-ENCODE)
TBNL:URL-ENCODE is gone? I see it's shadowed in packages.lisp ... Yes, gone.
Error in KERNEL:%COERCE-TO-FUNCTION: the function TBNL:URL-ENCODE is undefined. [Condition of type UNDEFINED-FUNCTION]
On Mon, 26 Sep 2005 14:56:30 +0200, Stefan Scholl stesch@no-spoon.de wrote:
TBNL:URL-ENCODE is gone? I see it's shadowed in packages.lisp ... Yes, gone.
No, it isn't. The version from URL-REWRITE is shadowed in favor of a new definition which is exported from the TBNL package:
http://weitz.de/tbnl/#url-encode
Cheers, Edi.
On 2005-09-26 15:13:28, Edi Weitz wrote:
On Mon, 26 Sep 2005 14:56:30 +0200, Stefan Scholl stesch@no-spoon.de wrote:
TBNL:URL-ENCODE is gone? I see it's shadowed in packages.lisp ... Yes, gone.
No, it isn't. The version from URL-REWRITE is shadowed in favor of a new definition which is exported from the TBNL package:
OK, then maybe DEFPACKAGE in CMUCL 19a is broken. I have no TBNL:URL-ENCODE. I can access URL-REWRITE:URL-ENCODE for the moment.
|sigh|
I'll have to make some tests in CMUCL for a bug report ...
On Mon, 26 Sep 2005 15:27:40 +0200, Stefan Scholl stesch@no-spoon.de wrote:
OK, then maybe DEFPACKAGE in CMUCL 19a is broken.
Works in 19b:
* (lisp-implementation-version)
"CVS 19b 19b-release-20050628-3 + minimal debian patches (19B)"
* (asdf:oos 'asdf:load-op :tbnl)
; loading system definition from /usr/local/lisp/Registry/tbnl.asd into
; #<The ASDF1521 package>
; registering #<SYSTEM TBNL {580D405D}> as TBNL
; loading system definition from /usr/local/lisp/Registry/url-rewrite.asd into
; #<The ASDF1524 package>
; registering #<SYSTEM URL-REWRITE {580FDEED}> as URL-REWRITE
; loading system definition from /usr/local/lisp/Registry/rfc2388.asd into
; #<The ASDF1526 package>
; registering #<SYSTEM :RFC2388 {581276ED}> as RFC2388
; loading system definition from /usr/local/lisp/Registry/kmrcl.asd into
; #<The ASDF1528 package>
; registering #<SYSTEM KMRCL {58147D25}> as KMRCL
; Compiling LAMBDA (.PV-CELL. .NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; loading system definition from /usr/local/lisp/Registry/cl-ppcre.asd into
; #<The ASDF1569 package>
; registering #<SYSTEM #:CL-PPCRE {58323955}> as CL-PPCRE
; loading system definition from /usr/local/lisp/Registry/cl-base64.asd into
; #<The ASDF1571 package>
; registering #<SYSTEM CL-BASE64 {58352EDD}> as CL-BASE64
; Compiling LAMBDA (.PV-CELL. .NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; registering #<SYSTEM CL-BASE64-TESTS {584E733D}> as CL-BASE64-TESTS
; Compiling LAMBDA (.PV-CELL. .NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; loading system definition from /usr/local/lisp/Registry/md5.asd into
; #<The ASDF1633 package>
; registering #<SYSTEM MD5 {58612975}> as MD5
NIL
* (find-symbol "URL-ENCODE" :tbnl)
TBNL:URL-ENCODE
:EXTERNAL
I have no TBNL:URL-ENCODE. I can access URL-REWRITE:URL-ENCODE for the moment.
|sigh|
I'll have to make some tests in CMUCL for a bug report ...
19b is current. I guess they won't fix old versions.
On 2005-09-26 15:37:36, Edi Weitz wrote:
On Mon, 26 Sep 2005 15:27:40 +0200, Stefan Scholl stesch@no-spoon.de wrote:
OK, then maybe DEFPACKAGE in CMUCL 19a is broken.
Works in 19b:
Nice. Then I don't need to write a bug report.
For the moment I just use URL-REWRITE:URL-ENCODE until the next release.
"CVS 19b 19b-release-20050628-3 + minimal debian patches (19B)"
[...]
19b is current. I guess they won't fix old versions.
I think 19b isn't that current. There was some misunderstanding and trouble regarding this release. But a 19c is on its way.
On Mon, 26 Sep 2005 15:27:40 +0200, Stefan Scholl stesch@no-spoon.de wrote:
OK, then maybe DEFPACKAGE in CMUCL 19a is broken.
Looks OK here:
* (lisp-implementation-version)
"CVS release-19a 19a-release-20040728 + minimal debian patches"
* (asdf:oos 'asdf:load-op :tbnl)
; loading system definition from /home/edi/lisp/tbnl-0.8.0/tbnl.asd into
; #<The ASDF1503 package>
; registering #<SYSTEM TBNL {5803C755}> as TBNL
; loading system definition from /home/edi/lisp/url-rewrite/url-rewrite.asd
; into #<The ASDF1506 package>
; registering #<SYSTEM URL-REWRITE {5805FE65}> as URL-REWRITE
; loading system definition from /home/edi/lisp/rfc2388/rfc2388.asd into
; #<The ASDF1508 package>
; registering #<SYSTEM :RFC2388 {5808442D}> as RFC2388
; loading system definition from /home/edi/lisp/kmrcl-1.84/kmrcl.asd into
; #<The ASDF1510 package>
; registering #<SYSTEM KMRCL {5809FC85}> as KMRCL
; Compiling LAMBDA (PCL::.PV-CELL. PCL::.NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; loading system definition from /home/edi/lisp/cl-ppcre-1.2.11/cl-ppcre.asd
; into #<The ASDF1551 package>
; registering #<SYSTEM #:CL-PPCRE {582706CD}> as CL-PPCRE
; loading system definition from /home/edi/lisp/cl-base64-3.3.1/cl-base64.asd
; into #<The ASDF1553 package>
; registering #<SYSTEM CL-BASE64 {5829C365}> as CL-BASE64
; Compiling LAMBDA (PCL::.PV-CELL. PCL::.NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; registering #<SYSTEM CL-BASE64-TESTS {5842CCCD}> as CL-BASE64-TESTS
; Compiling LAMBDA (PCL::.PV-CELL. PCL::.NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; loading system definition from /home/edi/lisp/md5-1.8.5/md5.asd into
; #<The ASDF1615 package>
; registering #<SYSTEM MD5 {5855077D}> as MD5
NIL
* (find-symbol "URL-ENCODE" :tbnl)
TBNL:URL-ENCODE
:EXTERNAL
On 2005-09-26 15:52:17, Edi Weitz wrote:
On Mon, 26 Sep 2005 15:27:40 +0200, Stefan Scholl stesch@no-spoon.de wrote:
OK, then maybe DEFPACKAGE in CMUCL 19a is broken.
Looks OK here:
* (lisp-implementation-version)
"CVS release-19a 19a-release-20040728 + minimal debian patches"
* (asdf:oos 'asdf:load-op :tbnl)
SIGH!
OK, I've just deleted my test directory ... Now I have to do it again. I use the vanilla CMUCL without the "minimal debian patches".
On Mon, 26 Sep 2005 15:59:54 +0200, Stefan Scholl stesch@no-spoon.de wrote:
SIGH!
OK, I've just deleted my test directory ... Now I have to do it again. I use the vanilla CMUCL without the "minimal debian patches".
I don't know what those patches do but I don't expect them to fix upstream bugs.
On 2005-09-26 15:52:17, Edi Weitz wrote:
* (find-symbol "URL-ENCODE" :tbnl)
TBNL:URL-ENCODE
:EXTERNAL
By the way: I can find the symbol that way, too. But not call it.
CL-USER> (find-symbol "URL-ENCODE" :tbnl)
TBNL:URL-ENCODE
:EXTERNAL
CL-USER> (tbnl:url-encode "Schlumpf")
==> Error in KERNEL:%COERCE-TO-FUNCTION: the function TBNL:URL-ENCODE is undefined.
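For readers following along: finding an exported symbol and being able to call it are separate questions, because DEFPACKAGE creates and exports the symbol before any function definition has been loaded. The small sketch below shows the two states in portable Common Lisp. The package SHADOW-DEMO, the function GREET, and the helper FUNCTION-STATUS are invented for the example and have nothing to do with TBNL's actual source.

```lisp
;; Hypothetical package and names, used only for this demonstration.
(defpackage :shadow-demo
  (:use :cl)
  (:export #:greet))

(defun function-status (name package)
  "Return two values: how the symbol NAME is accessible in PACKAGE
(or NIL if it is not found), and whether it currently names a function."
  (multiple-value-bind (symbol status) (find-symbol name package)
    (values status (and status (fboundp symbol) t))))

;; The symbol exists and is exported, but nothing is attached to it yet.
;; Calling it at this point would signal UNDEFINED-FUNCTION.
(function-status "GREET" :shadow-demo)   ; => :EXTERNAL, NIL

;; Once a definition is installed, the same symbol becomes callable.
(setf (symbol-function (find-symbol "GREET" :shadow-demo))
      (lambda (name) (format nil "Hello, ~A" name)))

(function-status "GREET" :shadow-demo)   ; => :EXTERNAL, T
```

So a successful FIND-SYMBOL only says that the package definition was processed; checking FBOUNDP tells you whether the file containing the DEFUN was actually loaded.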
On Mon, 26 Sep 2005 16:38:35 +0200, Stefan Scholl stesch@no-spoon.de wrote:
By the way: I can find the symbol that way, too. But not call it.
CL-USER> (find-symbol "URL-ENCODE" :tbnl)
TBNL:URL-ENCODE
:EXTERNAL
CL-USER> (tbnl:url-encode "Schlumpf")
==> Error in KERNEL:%COERCE-TO-FUNCTION: the function TBNL:URL-ENCODE is undefined.
Hmm...
* (lisp-implementation-version)
"CVS release-19a 19a-release-20040728 + minimal debian patches"
* (asdf:oos 'asdf:load-op :tbnl)
; loading system definition from /home/edi/lisp/tbnl-0.8.0/tbnl.asd into
; #<The ASDF1503 package>
; registering #<SYSTEM TBNL {5803C755}> as TBNL
; loading system definition from /home/edi/lisp/url-rewrite/url-rewrite.asd
; into #<The ASDF1506 package>
; registering #<SYSTEM URL-REWRITE {5805FE25}> as URL-REWRITE
; loading system definition from /home/edi/lisp/rfc2388/rfc2388.asd into
; #<The ASDF1508 package>
; registering #<SYSTEM :RFC2388 {5808442D}> as RFC2388
; loading system definition from /home/edi/lisp/kmrcl-1.84/kmrcl.asd into
; #<The ASDF1510 package>
; registering #<SYSTEM KMRCL {5809FC85}> as KMRCL
; Compiling LAMBDA (PCL::.PV-CELL. PCL::.NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; loading system definition from /home/edi/lisp/cl-ppcre-1.2.11/cl-ppcre.asd
; into #<The ASDF1551 package>
; registering #<SYSTEM #:CL-PPCRE {582706CD}> as CL-PPCRE
; loading system definition from /home/edi/lisp/cl-base64-3.3.1/cl-base64.asd
; into #<The ASDF1553 package>
; registering #<SYSTEM CL-BASE64 {5829C365}> as CL-BASE64
; Compiling LAMBDA (PCL::.PV-CELL. PCL::.NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; registering #<SYSTEM CL-BASE64-TESTS {5842CCCD}> as CL-BASE64-TESTS
; Compiling LAMBDA (PCL::.PV-CELL. PCL::.NEXT-METHOD-CALL. O C):
; Compiling Top-Level Form:
; loading system definition from /home/edi/lisp/md5-1.8.5/md5.asd into
; #<The ASDF1615 package>
; registering #<SYSTEM MD5 {5855077D}> as MD5
NIL
* (tbnl:url-encode "Schlumpf")
"Schlumpf"
* (describe 'tbnl:url-encode)
URL-ENCODE is an external symbol in the TBNL package.
Function: #<Function TBNL:URL-ENCODE {58B2C2C9}>
Function arguments:
  (string &optional (external-format *tbnl-default-external-format*))
Function documentation:
  URL-encodes a string using the external format EXTERNAL-FORMAT.
Its defined argument types are:
  (T &OPTIONAL T)
Its result type is:
  SIMPLE-BASE-STRING
On Monday, 9/26/05 03:50:17 pm [-2] it was compiled from:
/home/edi/lisp/tbnl-0.8.0/util.lisp
  Created: Saturday, 9/24/05 01:33:39 am [-2]
Works on 19b as well.
On 2005-09-26 16:45:09, Edi Weitz wrote:
* (tbnl:url-encode "Schlumpf")
"Schlumpf"
I hate computers. After deleting all *.x86f files from the tbnl directory everything works fine.
TBNL:URL-ENCODE is found.
No, I haven't upgraded CMUCL between TBNL installs.
Always a little voodoo ...
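For anyone who runs into the same thing, clearing out stale compiled files can be scripted. The following is only a sketch: DELETE-COMPILED-FILES and FASL-TYPE are hypothetical helper names, the directory in the usage comment is a placeholder, and the function only looks at files directly inside the given directory. It asks the running Lisp for its own compiled-file type via COMPILE-FILE-PATHNAME instead of hard-coding "x86f".

```lisp
;;; Hypothetical helpers for removing stale compiled files.

(defun fasl-type ()
  "Return the pathname type this implementation uses for compiled files,
for example \"x86f\" on CMUCL/x86."
  (pathname-type (compile-file-pathname "dummy.lisp")))

(defun delete-compiled-files (directory &optional (type (fasl-type)))
  "Delete every file with pathname type TYPE directly inside DIRECTORY.
DIRECTORY must be a directory pathname, i.e. end with a slash.
Returns the list of deleted pathnames."
  (let ((files (directory (merge-pathnames (make-pathname :name :wild :type type)
                                           directory))))
    (mapc #'delete-file files)
    files))

;; Usage (placeholder path, adjust to your own installation):
;; (delete-compiled-files #p"/path/to/tbnl/")
```

After that, reloading the system recompiles everything from source, so the compiled files match the current package definition again.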
On Mon, 26 Sep 2005 14:32:19 +0200, Stefan Scholl stesch@no-spoon.de wrote:
A project of mine now shows "An error has occured" in the browser.
On your development machine you should set *SHOW-LISP-ERRORS-P* and *SHOW-LISP-BACKTRACES-P* to true for easier debugging.