Hi!
I have a problem with drakma and character encoding. My goal is to make a small web utility which first GETs some content from a certain page on my wiki, then optionally prepends stuff to this content, and then POSTs the new content back. It works as long as only ASCII characters are involved but fails when I use characters from the higher part of Latin-1, in my case the Swedish character "ä".
Below I have simplified the code so that it just GETs the old content and POSTs it back. What one sees is that the Swedish character is correctly handled in the GET, I see the "ä" in its full glory, but after having posted it back, the content gets corrupted and the next time I run the function I get an error because of the strange character.
Ok, enough blabbing, here is the code:
;;;;; code starts here
(defvar *boundary* "-------------------------1852275791466338532535335716")

(defconstant +crlf+ #.(format nil "~C~C" #\Return #\Linefeed))

(defun format-field (name value)
  (format nil "--~a~aContent-Disposition: form-data; name=\"~a\"~a~a~a~a"
          *boundary* +crlf+ name +crlf+ +crlf+ value +crlf+))

(defun foo ()
  (let* ((old-content
           (drakma:http-request
            "http://klibb.com/cgi-bin/wiki.pl?action=browse;id=2007-05-31;raw=1"))
         (cookie-jar (make-instance 'drakma:cookie-jar))
         (new-content
           (concatenate 'string
                        (format-field "title" "2007-05-31")
                        (format-field "text" old-content)
                        (format-field "recent_edit" "on")
                        (format-field "username" "MathiasDahl")
                        "--" *boundary* "--" +crlf+)))
    (format t "Old content: ~a" old-content)
    (setf (drakma:cookie-jar-cookies cookie-jar)
          (list (make-instance 'drakma:cookie
                               :name "pwd"
                               :value "editeramera"
                               :expires (+ (get-universal-time) 36000)
                               :domain "klibb.com")))
    (format t "New content: ~a" new-content)
    (drakma:http-request "http://klibb.com/cgi-bin/wiki.pl"
                         :method :post
                         :cookie-jar cookie-jar
                         :content-type (format nil "multipart/form-data; boundary=~a" *boundary*)
                         :content new-content)))
;;;;; code ends here
Again, the code is simplified, some parts are hardcoded etc, but the above is enough to recreate the problem. Note that after running the code one time, you cannot test it again, because the content on the page is now changed.
Here is what I get after running the function the first time:
====
* (foo)
Old content: blä
New content: ---------------------------1852275791466338532535335716
Content-Disposition: form-data; name="title"

2007-05-31
---------------------------1852275791466338532535335716
Content-Disposition: form-data; name="text"

blä

---------------------------1852275791466338532535335716
Content-Disposition: form-data; name="recent_edit"

on
---------------------------1852275791466338532535335716
Content-Disposition: form-data; name="username"

MathiasDahl
---------------------------1852275791466338532535335716--
NIL
302
((:DATE . "Sat, 02 Jun 2007 09:30:53 GMT")
 (:SERVER . "Apache/2.2.3 (Mandriva Linux/PREFORK-1mdv2007.0)")
 (:SET-COOKIE . "MuuWiki=username%1EMathiasDahl; path=/; expires=Mon, 01-Jun-2009 09:30:53 GMT")
 (:LOCATION . "http://klibb.com/cgi-bin/wiki.pl/2007-05-31")
 (:CONTENT-LENGTH . "0")
 (:CONNECTION . "close")
 (:CONTENT-TYPE . "application/x-perl"))
#<PURI:URI http://klibb.com/cgi-bin/wiki.pl>
#<FLEXI-STREAMS:FLEXI-IO-STREAM {C5728E1}>
T
====
As you can see, all looks well; the old content ("blä") looks like it should, and the new content looks the same (it's the data in the form field "text"). However, when I now run the function again, I get this:
====
* (foo)
debugger invoked on a FLEXI-STREAMS:FLEXI-STREAM-ENCODING-ERROR in thread #<THREAD "initial thread" {AC14469}>:
  Unexpected value #xA in UTF-8 sequence.
Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
restarts (invokable by number or by possibly-abbreviated name):
  0: [USE-VALUE] Specify a character to be used instead.
  1: [ABORT    ] Exit debugger, returning to top level.
(FLEXI-STREAMS::SIGNAL-ENCODING-ERROR #<FLEXI-STREAMS:FLEXI-IO-STREAM {C5B37D1}> "Unexpected value #x~X in UTF-8 sequence." 10)
====
It fails because that "ä" is now something else.
When I do the same thing from a browser, i.e. POST the page again and again, I don't see any problems. I have done some network sniffing with Wireshark and what I can see is that when the browser POSTs the content, the "ä" is correctly encoded in UTF-8 as xC3 xA4. In the POST done by drakma, the character is sent as the single byte xE4 (which IS the Unicode code point, and also the Latin-1 encoding, but not the UTF-8 encoding, if I understand things correctly).
At first I tried to include the encoding in Content-Type, but when I saw that it did not make any difference and also saw that Firefox does not include this, I removed it. Oh, and I should show this as well:
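To make the byte-level difference concrete, here is a small sketch that prints how "ä" (U+00E4) is encoded in Latin-1 and in UTF-8. It is SBCL-specific, since it relies on SB-EXT:STRING-TO-OCTETS; the names *A-UMLAUT* and OCTETS-OF are made up for illustration. Note that a lone #xE4 byte is the lead byte of a three-byte UTF-8 sequence, so a following linefeed (#xA) cannot be a valid continuation byte, which would be consistent with the error message shown above.

```lisp
;; SBCL-specific sketch; *A-UMLAUT* and OCTETS-OF are hypothetical names.
(defparameter *a-umlaut* (string (code-char #xE4)))  ; the one-character string "ä"

(defun octets-of (string external-format)
  "Return the octets of STRING, encoded in EXTERNAL-FORMAT, as a list."
  (coerce (sb-ext:string-to-octets string :external-format external-format)
          'list))

;; One byte in Latin-1, two bytes in UTF-8.
(format t "Latin-1: ~{#x~2,'0X~^ ~}~%" (octets-of *a-umlaut* :latin-1))
(format t "UTF-8:   ~{#x~2,'0X~^ ~}~%" (octets-of *a-umlaut* :utf-8))
;; prints:
;; Latin-1: #xE4
;; UTF-8:   #xC3 #xA4
```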
* (sb-impl::default-external-format)
:UTF-8
Just so that we are clear that I DO see the content correctly and UTF-8 is used.
I also tried with a version where I even hardcoded the content to be sent to be "blä", and that gives the same problem. Maybe I should have shortened the code above to that, but what I wanted to show was that the same content I can GET nicely enough cannot be POSTed without problems.
Any ideas on how I can continue debugging this? I feel kinda lost. It feels frustrating to get stuck on a problem like this when I have got the other logic to work, GETing and POSTing and stuff...
I am running this in SBCL 1.0 under Mandriva GNU/Linux.
Thanks!
/Mathias
Hi Mathias, try to use (http-request ... :external-format-out :UTF-8 ...)
See http://weitz.de/drakma/#external-format-out
Best regards, -Anton
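Applied to the POST from the original message, the suggestion might look roughly like the sketch below. It assumes Drakma is already loaded (the QL:QUICKLOAD line presumes Quicklisp is available, which is an assumption and not something from this thread); POST-CONTENT is a hypothetical helper name, and the cookie jar from the original code is left out for brevity.

```lisp
;; Assumption: Quicklisp is installed; any other way of loading Drakma works too.
(ql:quickload :drakma)

(defvar *boundary* "-------------------------1852275791466338532535335716")

;; POST-CONTENT is a hypothetical helper; NEW-CONTENT is the multipart body
;; string built as in the original post.
(defun post-content (new-content)
  (drakma:http-request "http://klibb.com/cgi-bin/wiki.pl"
                       :method :post
                       :content-type (format nil "multipart/form-data; boundary=~a"
                                             *boundary*)
                       :content new-content
                       ;; Ask Drakma to encode the request body as UTF-8.
                       :external-format-out :utf-8))
```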
On Sat, 2 Jun 2007 11:44:04 +0200, "Mathias Dahl" mathias.dahl@gmail.com wrote:
(defvar *boundary* "-------------------------1852275791466338532535335716")
(defconstant +crlf+ #.(format nil "~C~C" #\Return #\Linefeed))
(defun format-field (name value)
  (format nil "--~a~aContent-Disposition: form-data; name=\"~a\"~a~a~a~a"
          *boundary* +crlf+ name +crlf+ +crlf+ value +crlf+))
Any reason you're doing all this instead of just using :FORM-DATA?
Yes, but not a very good one; it is a leftover from old code. I previously used S-HTTP-CLIENT but ran into problems, so I switched to DRAKMA and tried to leave the code as intact as possible, without looking into all the nice features that DRAKMA might have. I will try that and see how it works for me.
Thanks for a great package! It feels nice to be able to do stuff in CL, although I progress quite slowly... :)
/Mathias
I have now tested this and it works like a charm, so I don't need my hacks anymore. Yay! :)
I first used the :PARAMETERS keyword without specifying the :FORM-DATA keyword, and that works as well as providing it. Are there any drawbacks of NOT using :FORM-DATA when just sending form fields (i.e. no files)? Note: one of the form fields can be quite large (maybe up to a thousand characters) sometimes.
Thanks!
/Mathias
On Mon, 4 Jun 2007 22:46:33 +0200, "Mathias Dahl" mathias.dahl@gmail.com wrote:
I have now tested this and it works like a charm, so I don't need my hacks anymore. Yay! :)
Good... :)
I first used the :PARAMETERS keyword without specifying the :FORM-DATA keyword, and that works as well as providing it. Are there any drawbacks of NOT using :FORM-DATA when just sending form fields (i.e. no files)? Note: one of the form fields can be quite large (maybe up to a thousand characters) sometimes.
AFAIK there aren't if you're /not/ sending files. The size of the request body shouldn't make a difference as it's a POST request anyway.
The only other difference that comes to mind right now is that with multipart/form-data you can potentially use different encodings for different fields. I never did that, though, and I wouldn't know why one should do it (and if the receiving end can cope with it).
Oh, one other difference that just occurs to me is that for large fields Drakma might be a tad faster if you're using :FORM-DATA. But you should only care about this if you really have performance problems.
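For readers following along, the :PARAMETERS / :FORM-DATA approach discussed above might be sketched as follows. This is not the code Mathias ended up with (he did not post it); POST-PAGE is a hypothetical name, the field names are taken from the original message, and whether :EXTERNAL-FORMAT-OUT is still needed in this form is an assumption. The QL:QUICKLOAD line presumes Quicklisp is available.

```lisp
;; Assumption: Quicklisp is installed; any other way of loading Drakma works too.
(ql:quickload :drakma)

;; POST-PAGE is a hypothetical helper. Drakma builds the request body from the
;; alist in :PARAMETERS, so no hand-written boundary or CRLF handling is needed.
(defun post-page (title text)
  (drakma:http-request "http://klibb.com/cgi-bin/wiki.pl"
                       :method :post
                       :parameters (list (cons "title" title)
                                         (cons "text" text)
                                         (cons "recent_edit" "on")
                                         (cons "username" "MathiasDahl"))
                       ;; T sends multipart/form-data; leaving it out sends
                       ;; the fields URL-encoded instead.
                       :form-data t
                       :external-format-out :utf-8))
```

A call such as (post-page "2007-05-31" old-content) would then replace the manual FORMAT-FIELD concatenation from the first message.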