Christophe Rhodes csr21@cam.ac.uk writes:
> Does that clear things up? I think there are more problems than just this in the communications layer, but maybe this will do until after dinner... :-)
Well, I don't understand what the locale has to do with the SLIME coding system. Of course, the locale may have a multibyte coding system as its default, but for SLIME we can use whatever we like. I think we should use a unibyte coding system. Simple, easy to understand, and efficient. You will not be able to send your emails with SLIME, but that's a restriction I can live with :-)
We already do
(set-buffer-multibyte nil)
in slime-make-net-buffer. So, character positions in the *cl-connection* buffer should correspond to byte offsets.
In slime-net-connect we do:
(set-process-coding-system proc 'no-conversion 'no-conversion)
Maybe we should use 'iso-latin-1-unix for writing.
How does that sound?
Helmut.
Helmut Eller e9626484@stud3.tuwien.ac.at writes:
> Christophe Rhodes csr21@cam.ac.uk writes:
>> Does that clear things up? I think there are more problems than just this in the communications layer, but maybe this will do until after dinner... :-)
> Well, I don't understand what the locale has to do with the SLIME coding system. Of course, the locale may have a multibyte coding system as its default, but for SLIME we can use whatever we like. I think we should use a unibyte coding system. Simple, easy to understand, and efficient. You will not be able to send your emails with SLIME, but that's a restriction I can live with :-)
> We already do
> (set-buffer-multibyte nil)
> In slime-net-connect we do:
> (set-process-coding-system proc 'no-conversion 'no-conversion)
> How does that sound?
Ah, I see... this is more complicated (much more complicated) than I thought. Sorry.
Consider what happens when I do (in my lisp) (code-char #x3bb)
because I want to get hold of a greek-lower-case-lambda character? I suppose slime will do prin1-to-string of the result as part of its repl implementation, and get the in-memory byte sequence (assuming an allegro/lispworks-like representation of strings for simplicity; the details are different in sbcl, but not very different)
00 23 00 5C 03 BB 00 00
   #     \    <l>    ^@
   <l> = lowercase lambda
What should the lisp system do when it has to transmit this string through a socket to emacs? The stream between emacs and lisp is (on the lisp side) a character stream, so the lisp will look at this sequence as characters and apply an external format transformation: in a utf-8 locale, this will be to transmit the utf-8 corresponding to this sequence. If the emacs side doesn't read that as utf-8, problems of the sort that we're seeing now will occur.
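A minimal sketch of the byte sequences involved, written in Python rather than Lisp purely for illustration. It assumes the in-memory layout shown above is big-endian 16-bit code units (which is what the hex dump suggests) and that the socket's external format is UTF-8; the variable names are made up for this example.

```python
# Printed representation of (code-char #x3bb), plus the trailing NUL
# that appears in the dump above.
printed = "#\\\u03bb\x00"

# Assumption: strings are held as 16-bit big-endian code units.
in_memory = printed.encode("utf-16-be")
print(in_memory.hex(" ").upper())

# What a UTF-8 external format would put on the wire for the three
# characters  #  \  lambda  (no trailing NUL).
on_wire = "#\\\u03bb".encode("utf-8")
print(on_wire.hex(" ").upper())

# A reader that treats every byte as one character (Latin-1 stands in
# for that here) ends up with four characters instead of three.
misread = on_wire.decode("latin-1")
print(len(misread), repr(misread))
```

The lambda becomes the two bytes CE BB on the wire, so a byte-per-character reader sees one character too many and any length or offset bookkeeping done in characters no longer agrees between the two sides.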
And something of this form will happen whenever the lisp wants to emit a character which is unrepresentable in the non-multibyte buffer used by slime -- what does non-multibyte mean, anyway?
Meanwhile, it seems to me that there must still be some other encoding-dependent interaction around " *cl-connection*", as otherwise my "fix" to use base 128 in the communication would not have improved matters even slightly.
I feel we're still grappling with what the problem actually is, I'm afraid... sorry for not being clearer.
Cheers,
Christophe
Christophe Rhodes writes:
> Helmut Eller e9626484@stud3.tuwien.ac.at writes:
>
> I feel we're still grappling with what the problem actually is, I'm afraid... sorry for not being clearer.
I'm puzzled. It works perfectly well with clisp under ilisp.
(setq clisp-hs-program "/usr/local/bin/clisp -ansi -q -K full -m 32M -I -Eterminal UTF-8")
(add-hook 'ilisp-init-hook (lambda () (set-buffer-process-coding-system 'mule-utf-8 'mule-utf-8)))
So, let's see what you're doing differently under slime (and sbcl?).
First, swank creates its own channel instead of using *terminal-io*. When opening it, swank must specify the desired encoding (:external-format charset:??? in clisp; what is it in sbcl?).
Then slime must set the buffer's process coding system to match the external format specified by swank.
Christophe Rhodes csr21@cam.ac.uk writes:
> And something of this form will happen whenever the lisp wants to emit a character which is unrepresentable in the non-multibyte buffer used by slime -- what does non-multibyte mean, anyway?
From the Emacs manual:
In unibyte representation, each character occupies one byte and therefore the possible character codes range from 0 to 255. Codes 0 through 127 are ASCII characters; the codes from 128 through 255 are used for one non-ASCII character set (you can choose which character set by setting the variable `nonascii-insert-offset').
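To make the "one byte per character" property concrete, here is a small sketch in Python (not Emacs Lisp), with Latin-1 standing in for a unibyte coding system. The sample string is arbitrary; the point is only that character positions and byte offsets coincide in a unibyte encoding and drift apart in UTF-8.

```python
# Every byte value 0..255 maps to exactly one character and back.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# Arbitrary sample text whose characters all have codes below 256.
text = "na\u00efve caf\u00e9"

latin1 = text.encode("latin-1")
utf8 = text.encode("utf-8")

# Unibyte: character index == byte offset.
assert len(latin1) == len(text)
assert latin1.index(b"c") == text.index("c")

# UTF-8: each of the two non-ASCII characters takes two bytes, so
# offsets after the first one are shifted.
assert len(utf8) == len(text) + 2
assert utf8.index(b"c") == text.index("c") + 1
```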
> Meanwhile, it seems to me that there must still be some other encoding-dependent interaction around " *cl-connection*", as otherwise my "fix" to use base 128 in the communication would not have improved matters even slightly.
> I feel we're still grappling with what the problem actually is, I'm afraid... sorry for not being clearer.
Assuming that Zach didn't change the coding system on the Emacs side we can say that SBCL changed the coding system in recent versions. Since older SBCLs had only 8-bit chars we can also say that Zach's Emacs uses a unibyte coding system for reading and writing.
I think we have two options:
1) use a fixed coding system between Emacs and Lisp. This coding system should be unibyte, since that is supported by all Lisps and covers most non-exotic uses. In this case we have to tell SBCL that it should use iso-latin-1 or similar instead of utf-8 for the socket stream to Emacs. Your lambda cannot be sent to Emacs and the write operation should signal an error.
2) make the coding system configurable. One advantage is that you can send lambda to Emacs. The disadvantage is that we have to make the Emacs side multibyte-clean, have to write about it in the manual, and have to live with the constant feeling that the coding systems don't match.
I strongly prefer option 1. What do you think?
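The difference between the two options can be sketched in a few lines of Python. This is only an illustration of the trade-off, not SLIME or swank code: encode_for_emacs and its coding parameter are hypothetical names, and Latin-1 stands in for the fixed unibyte coding system.

```python
def encode_for_emacs(message: str, coding: str = "latin-1") -> bytes:
    """Encode an outgoing message; raise if a character does not fit.

    Hypothetical helper for illustration only.
    """
    # errors="strict" turns an unrepresentable character into an error
    # instead of silently replacing it.
    return message.encode(coding, errors="strict")


# Option 1: fixed unibyte coding system. Plain text goes through...
print(encode_for_emacs("(car '(a b))"))

# ...but the lambda cannot be sent, and the write signals an error.
try:
    encode_for_emacs("#\\\u03bb")
except UnicodeEncodeError as err:
    print("cannot send:", err.reason)

# Option 2: configurable coding system. The lambda can be sent, but
# only round-trips if the reader decodes with the same coding system.
wire = encode_for_emacs("#\\\u03bb", coding="utf-8")
assert wire.decode("utf-8") == "#\\\u03bb"
```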
Helmut.
On Wednesday 10 November 2004 00:03, Helmut Eller wrote:
> I think we have two options:
> 1) use a fixed coding system between Emacs and Lisp. This coding system should be unibyte, since that is supported by all Lisps and covers most non-exotic uses. In this case we have to tell SBCL that it should use iso-latin-1 or similar instead of utf-8 for the socket stream to Emacs. Your lambda cannot be sent to Emacs and the write operation should signal an error.
> 2) make the coding system configurable. One advantage is that you can send lambda to Emacs. The disadvantage is that we have to make the Emacs side multibyte-clean, have to write about it in the manual, and have to live with the constant feeling that the coding systems don't match.
> I strongly prefer option 1. What do you think?
Once the dust settles and I make the switch to the Unicode-capable SBCL (something I've been waiting on for a long time), I'll have source code (HTML generation mostly) with embedded multilingual text, not to mention the obligatory lambda on the front page. It would be a great annoyance not to be able to see a backtrace with funky characters.
As a user I very much prefer option 2.
Gabor