On Mon, 29 Jan 2007 18:20:17 -0800, Chris Dean ctdean@sokitomi.com wrote:
> The problem is that I regularly download web pages and many of them are poorly formed. I'd like my software to be permissive and return something reasonable.
Sure, I agree.
> Drakma is nicely designed and I'd like to keep using it. If I were to add this "feature" of less-strict UTF-8 where should I do that?
> I could modify (define-char-reader (stream flexi-utf-8-input-stream) ...) in some clever way I suppose.
My hope is that FLEXI-STREAMS is already "flexible" enough to deal with this:
CL-USER 22 > (drakma:http-request "http://zappa.agharta.de/test.html")

Error: Unexpected value #xF6 in UTF-8 sequence.
  1 (abort) Return to level 0.
  2 Return to top loop level 0.

Type :b for backtrace, :c <option number> to proceed, or :? for other options

CL-USER 23 : 1 > :a
CL-USER 24 > (defun use-replacement-char (condition)
               (declare (ignore condition))
               (use-value #.(code-char 65533)))
USE-REPLACEMENT-CHAR
CL-USER 25 > (let ((flex:*provide-use-value-restart* t))
               (handler-bind ((flex:flexi-stream-encoding-error #'use-replacement-char))
                 (drakma:http-request "http://zappa.agharta.de/test.html")))
"<html> <body> This is not really UTF-8: �� </body> </html> "
200
((:DATE . "Tue, 30 Jan 2007 07:47:59 GMT") (:SERVER . "Apache") (:CONNECTION . "close") (:TRANSFER-ENCODING . "chunked") (:CONTENT-TYPE . "text/html; charset=utf-8"))
#<URI http://zappa.agharta.de/test.html>
#<FLEXI-STREAMS::FLEXI-BINARY-UTF-8-IO-STREAM 226B80FB>
T
CL-USER 26 > (let ((flex:*provide-use-value-restart* t)
                   (flex:*substitution-char* #\?))
               (drakma:http-request "http://zappa.agharta.de/test.html"))
"<html> <body> This is not really UTF-8: ?? </body> </html> "
200
((:DATE . "Tue, 30 Jan 2007 07:50:30 GMT") (:SERVER . "Apache") (:CONNECTION . "close") (:TRANSFER-ENCODING . "chunked") (:CONTENT-TYPE . "text/html; charset=utf-8"))
#<URI http://zappa.agharta.de/test.html>
#<FLEXI-STREAMS::FLEXI-BINARY-UTF-8-IO-STREAM 2263F957>
T
http://weitz.de/flexi-streams/#*provide-use-value-restart*
http://weitz.de/flexi-streams/#*substitution-char*
Does that help?
Cheers,
Edi.