Jeff,
you can pass the :FORCE-BINARY keyword argument to make DRAKMA return the response body as a vector of octets, and then call FLEXI-STREAMS:OCTETS-TO-STRING with an explicit :EXTERNAL-FORMAT to decode them however you like:
(flexi-streams:octets-to-string (drakma:http-request "http://www.walmart.com" :force-binary t) :external-format :ascii)
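If you do not know in advance which encoding a site really uses, something along the following lines might help. It is only a sketch: DECODE-OCTETS and FETCH-PAGE-AS-STRING are names I made up for illustration, not part of DRAKMA or FLEXI-STREAMS, and I am assuming both libraries are already loaded. It tries each external format in turn and moves on whenever FLEXI-STREAMS signals a decoding error.

```lisp
;; Assumes DRAKMA and FLEXI-STREAMS are already loaded, e.g. via
;; (ql:quickload '(:drakma :flexi-streams)).

(defun decode-octets (octets &optional (external-formats '(:utf-8 :iso-8859-1)))
  "Hypothetical helper: decode OCTETS with the first of EXTERNAL-FORMATS
that does not signal a decoding error.  Returns the string and the
format used, or NIL if every format fails."
  (dolist (external-format external-formats)
    (handler-case
        (return (values (flexi-streams:octets-to-string
                         octets :external-format external-format)
                        external-format))
      (flexi-streams:external-format-encoding-error ()
        ;; This format cannot decode the data; try the next one.
        nil))))

(defun fetch-page-as-string (url &optional (external-formats '(:utf-8 :iso-8859-1)))
  "Hypothetical helper: fetch URL as raw octets and decode them locally."
  (decode-octets (drakma:http-request url :force-binary t)
                 external-formats))
```

You would call it as (fetch-page-as-string "http://www.walmart.com"). Putting :ISO-8859-1 last is deliberate: it maps every octet to a character, so it works as a catch-all, although the result may not be what the site intended.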
HTH, Hans
On Mon, Sep 24, 2012 at 7:01 PM, Jeff Cunningham <jeffrey@jkcunningham.com> wrote:
I've been running into some trouble using drakma to retrieve pages from certain commercial websites. It is very likely the HTML they are generating is broken one way or another. But the problem still remains as to how one can retrieve their pages using drakma.
For example, if you try this simple case:
(http-request "http://www.walmart.com")
It will display the following:
WARNING: Problems determining charset (falling back to binary): Corrupted Content-Type header: Read character #\;, but expected #\=.
And the returned body is a vector of octets rather than a string. This can be converted to a real string, of course, but it is inconvenient, to say the least.
Often the problem is that the charset they declare is simply wrong. Sometimes I can figure out the correct one and supply it myself, like this:
(http-request "http://www.walmart.com" :external-format-in :UTF-8)
and it will solve the problem. But this particular example does not lend itself to this, at least not with any of the following charsets:
:UTF-8 :UTF-7 :iso-8859-1 :iso-8859-2 :iso-8859-3 :iso-8859-4 :iso-8859-5 :iso-8859-6 :iso-8859-7 :iso-8859-8 :iso-8859-9 :BIG5 :US-ASCII :UTF-16 :UTF-32
I have no idea what their server is actually sending - it appears to be invalid for any of these charsets.
Is there any way to get around this problem?
Best regards, Jeff Cunningham
drakma-devel mailing list drakma-devel@common-lisp.net http://lists.common-lisp.net/cgi-bin/mailman/listinfo/drakma-devel