Hi,
I'm stuck with a problem: I'm using CL-ZMQ, which in turn uses CFFI, which in turn uses BABEL for tasks such as FOREIGN-STRING-TO-LISP conversion. There seems to be a problem with 0 (#\Nul) characters in such strings, as can be seen below:
Illegal :UTF-8 character starting at position 328.
   [Condition of type BABEL-ENCODINGS:INVALID-UTF8-CONTINUATION-BYTE]

Restarts:
  ...

Backtrace:
  0: ((LAMBDA (BABEL-ENCODINGS::SRC BABEL-ENCODINGS::START BABEL-ENCODINGS::END BABEL-ENCODINGS::DEST BABEL-ENCODINGS::D-START)) ..)
  1: (CFFI:FOREIGN-STRING-TO-LISP #.(SB-SYS:INT-SAP #X0808E13C))[:EXTERNAL]
  ...
The translated string in the current example is this:
#(#\5 #\4 #\c #\6 #\7 #\5 #\5 #\b #\- #\9 #\6 #\2 #\8 #\- #\4 #\0 #\a #\4 #\-
  #\9 #\a #\2 #\d #\- #\c #\c #\8 #\2 #\a #\8 #\1 #\6 #\3 #\4 #\5 #\e
  #\  #\1 #\8 #\  #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\  #\2 #\6 #\0 #\Space
  #\{ #\" #\P #\A #\T #\H #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
  #\" #\M #\E #\T #\H #\O #\D #\" #\Space #\" #\G #\E #\T #\" #\,
  #\" #\V #\E #\R #\S #\I #\O #\N #\" #\Space #\" #\H #\T #\T #\P #\/ #\1 #\. #\1 #\" #\,
  #\" #\U #\R #\I #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
  #\" #\P #\A #\T #\T #\E #\R #\N #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
  #\" #\A #\c #\c #\e #\p #\t #\" #\Space #\" #\* #\/ #\* #\" #\,
  #\" #\H #\o #\s #\t #\" #\Space #\" #\l #\o #\c #\a #\l #\h #\o #\s #\t #\Space #\6 #\7 #\6 #\7 #\" #\,
  #\" #\U #\s #\e #\r #\- #\A #\g #\e #\n #\t #\" #\Space #\"
  #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0 #\  #\( #\i #\4 #\8 #\6 #\- #\p #\c #\- #\l #\i #\n #\u #\x #\- #\g #\n #\u #\)
  #\  #\l #\i #\b #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0
  #\  #\O #\p #\e #\n #\S #\S #\L #\/ #\0 #\. #\9 #\. #\8 #\n
  #\  #\z #\l #\i #\b #\/ #\1 #\. #\2 #\. #\3 #\. #\4
  #\  #\l #\i #\b #\i #\d #\n #\/ #\1 #\. #\1 #\5
  #\  #\l #\i #\b #\s #\s #\h #\2 #\/ #\1 #\. #\2 #\. #\4 #\" #\} #\,
  #\0 #\Space #\, #\n #\S #\S #\L #\/ #\0 #\. #\Nul #\Nul)
Maybe someone here can explain why these 0 characters are not recognized as proper UTF-8 ones?
Thanks! Vsevolod
On Wed, Aug 4, 2010 at 3:07 PM, Vsevolod Dyomkin vseloved@gmail.com wrote:
Maybe someone here can explain why these 0 characters are not recognized as proper UTF-8 ones?
Seems to work for me. Can you come up with a short reproducible example?
CL-USER> (defparameter *array*
           #(#\5 #\4 #\c #\6 #\7 #\5 #\5 #\b #\- #\9 #\6 #\2 #\8 #\- #\4 #\0 #\a #\4 #\-
             #\9 #\a #\2 #\d #\- #\c #\c #\8 #\2 #\a #\8 #\1 #\6 #\3 #\4 #\5 #\e
             #\  #\1 #\8 #\  #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\  #\2 #\6 #\0 #\Space
             #\{ #\" #\P #\A #\T #\H #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
             #\" #\M #\E #\T #\H #\O #\D #\" #\Space #\" #\G #\E #\T #\" #\,
             #\" #\V #\E #\R #\S #\I #\O #\N #\" #\Space #\" #\H #\T #\T #\P #\/ #\1 #\. #\1 #\" #\,
             #\" #\U #\R #\I #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
             #\" #\P #\A #\T #\T #\E #\R #\N #\" #\Space #\" #\/ #\h #\a #\n #\d #\l #\e #\r #\t #\e #\s #\t #\" #\,
             #\" #\A #\c #\c #\e #\p #\t #\" #\Space #\" #\* #\/ #\* #\" #\,
             #\" #\H #\o #\s #\t #\" #\Space #\" #\l #\o #\c #\a #\l #\h #\o #\s #\t #\Space #\6 #\7 #\6 #\7 #\" #\,
             #\" #\U #\s #\e #\r #\- #\A #\g #\e #\n #\t #\" #\Space #\"
             #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0 #\  #\( #\i #\4 #\8 #\6 #\- #\p #\c #\- #\l #\i #\n #\u #\x #\- #\g #\n #\u #\)
             #\  #\l #\i #\b #\c #\u #\r #\l #\/ #\7 #\. #\2 #\0 #\. #\0
             #\  #\O #\p #\e #\n #\S #\S #\L #\/ #\0 #\. #\9 #\. #\8 #\n
             #\  #\z #\l #\i #\b #\/ #\1 #\. #\2 #\. #\3 #\. #\4
             #\  #\l #\i #\b #\i #\d #\n #\/ #\1 #\. #\1 #\5
             #\  #\l #\i #\b #\s #\s #\h #\2 #\/ #\1 #\. #\2 #\. #\4 #\" #\} #\,
             #\0 #\Space #\, #\n #\S #\S #\L #\/ #\0 #\. #\Nul #\Nul))
*ARRAY*
CL-USER> (cffi:with-foreign-string (fs (coerce *array* 'string) :encoding :utf-8)
           (cffi:foreign-string-to-lisp fs :encoding :utf-8))
"54c6755b-9628-40a4-9a2d-cc82a816345e 18 /handlertest 260 {\"PATH\" \"/handlertest\",\"METHOD\" \"GET\",\"VERSION\" \"HTTP/1.1\",\"URI\" \"/handlertest\",\"PATTERN\" \"/handlertest\",\"Accept\" \"*/*\",\"Host\" \"localhost 6767\",\"User-Agent\" \"curl/7.20.0 (i486-pc-linux-gnu) libcurl/7.20.0 OpenSSL/0.9.8n zlib/1.2.3.4 libidn/1.15 libssh2/1.2.4\"},0 ,nSSL/0."
328
Thanks,
Luís, thanks for the answer!
The issue came up in my recent experiment with creating CL bindings for the upcoming mongrel2 web server, and it arises only intermittently.
You can see the initial variant at http://github.com/vseloved/cl-mongrel2. If you are willing to dive in and spend some time, try to run the example code in http://github.com/vseloved/cl-mongrel2/blob/master/example.lisp
It will also require you to install and run mongrel2 itself (see http://mongrel2.org/doc/tip/docs/manual/book.wiki for details), which in turn requires setting up a working Python environment (if you don't have one already, obviously). All the other instructions are in example.lisp. If something is unclear, feel free to write to me.
It's also worth mentioning that I'm using babel-0.3.
Looking forward to the results, Vsevolod
On Wed, Aug 4, 2010 at 10:02 PM, Luís Oliveira luismbo@gmail.com wrote:
-- Luís Oliveira http://r42.eu/~luis/
Hello Vsevolod,
On Wed, Aug 4, 2010 at 9:42 PM, Vsevolod Dyomkin vseloved@gmail.com wrote:
It's also worth mentioning that I'm using babel-0.3.
Perhaps you should try with the development versions of babel and CFFI. Let me know if that helps.
Cheers,
Hello Luís,
after some more examination I've discovered that the error was not really connected with Babel, but rather with an "impedance mismatch" between CFFI and ZMQ: CFFI currently supports only null-terminated strings, while ZMQ operates on blobs of known size, so TRANSLATE-FROM-FOREIGN was fed data that it was not prepared to handle.
I've prepared a patch to CFFI that can handle this situation (an additional TRANSLATE- method for string blobs of known size) and will send it soon, if no better solution is found.
Best regards, Vsevolod
On Thu, Aug 5, 2010 at 12:35 AM, Luís Oliveira luismbo@gmail.com wrote:
-- Luís Oliveira http://r42.eu/~luis/
On Fri, Aug 6, 2010 at 12:58 PM, Vsevolod Dyomkin vseloved@gmail.com wrote:
I've prepared a patch to CFFI that can handle this situation (an additional TRANSLATE- method for string blobs of known size) and will send it soon, if no better solution is found.
Cool. I think you can support that through a :size argument for the :string type. Let me know if you need help implementing such a thing.
Actually, the string type doesn't have such a slot now (and adding one was more or less what I had in mind). But the simpler solution (which doesn't require patching CFFI) was just to use (foreign-string-to-lisp data :count size), which translate-from-foreign also uses internally (but without the count parameter).
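For readers following along, here is a minimal sketch of the difference that the :count argument makes. It assumes CFFI can be loaded through ASDF; the buffer contents, the FILL-BYTES helper and the *RESULTS* variable are made up for illustration and are not taken from cl-mongrel2. In the real code the pointer and the size would come from the ZMQ message.

```lisp
(require :asdf)
(asdf:load-system :cffi)   ; assumes CFFI is installed where ASDF can find it

(defun fill-bytes (pointer octets)
  "Copy the list OCTETS into foreign memory starting at POINTER (illustrative helper)."
  (loop for octet in octets
        for i from 0
        do (setf (cffi:mem-aref pointer :unsigned-char i) octet)))

(defparameter *results*
  (cffi:with-foreign-object (buf :unsigned-char 9)
    ;; "hello" is the 5-byte payload.  "XYZ" stands in for whatever happens
    ;; to follow a blob in memory.  The trailing 0 is there only so that the
    ;; unbounded call below has somewhere to stop.
    (fill-bytes buf (append (map 'list #'char-code "helloXYZ") '(0)))
    (list
     ;; Bounded: decode exactly 5 bytes, no terminator needed.
     (cffi:foreign-string-to-lisp buf :count 5 :encoding :utf-8)
     ;; Unbounded: scan forward until the first NUL byte.
     (cffi:foreign-string-to-lisp buf :encoding :utf-8))))

;; *results* => ("hello" "helloXYZ")
```

Without :count the decoder keeps reading past the end of the payload, which is how stray bytes after a blob can end up being decoded and rejected as invalid UTF-8.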
Thanks! Vsevolod
On Sat, Aug 7, 2010 at 9:11 AM, Luís Oliveira luismbo@gmail.com wrote:
-- Luís Oliveira http://r42.eu/~luis/
On Sat, Aug 7, 2010 at 7:23 AM, Vsevolod Dyomkin vseloved@gmail.com wrote:
Actually, the string type doesn't have such a slot now (and adding one was more or less what I had in mind). But the simpler solution (which doesn't require patching CFFI) was just to use (foreign-string-to-lisp data :count size), which translate-from-foreign also uses internally (but without the count parameter).
Sorry I wasn't clear. What I meant is that you could add such a slot, then use it in translate-from-foreign to feed foreign-string-to-lisp's count parameter.
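A rough sketch of what such a slot could look like, written as a separate user-defined type rather than as a patch to CFFI's own :string type. The names SIZED-STRING-TYPE, SIZED-STRING, SIZED-STRING-SIZE, *BLOB-TYPE* and *DECODED* are hypothetical, and this is only one possible shape for the idea, not the change that was eventually discussed for CFFI. It assumes CFFI can be loaded through ASDF.

```lisp
(require :asdf)
(asdf:load-system :cffi)   ; assumes CFFI is installed where ASDF can find it

;; Hypothetical type: a pointer to a string blob whose byte size is known.
(cffi:define-foreign-type sized-string-type ()
  ((size :initarg :size :reader sized-string-size))
  (:actual-type :pointer)
  (:simple-parser sized-string))

(defmethod cffi:translate-from-foreign (pointer (type sized-string-type))
  ;; Feed the slot to FOREIGN-STRING-TO-LISP's :count parameter instead of
  ;; letting it scan for a terminating NUL.
  (values (cffi:foreign-string-to-lisp pointer
                                       :count (sized-string-size type)
                                       :encoding :utf-8)))

;; Kept in a variable so the type spec is looked up at run time.
(defparameter *blob-type* '(sized-string :size 5))

(defparameter *decoded*
  (cffi:with-foreign-object (buf :unsigned-char 8)
    ;; 8 bytes with no NUL terminator at all; only the first 5 are payload.
    (loop for octet across (map 'vector #'char-code "helloXYZ")
          for i from 0
          do (setf (cffi:mem-aref buf :unsigned-char i) octet))
    (cffi:convert-from-foreign buf *blob-type*)))

;; *decoded* => "hello"
```

One limitation of this shape is that the size is fixed when the type spec is parsed, so for messages whose length varies per call, passing :count directly to foreign-string-to-lisp may remain the more practical option.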
Cheers,