Re: [cffi-devel] a thought on string encodings

Hoehle, Joerg-Cyril

2 Jan 2006

3:03 p.m.

James Bielman wrote:

...

Now that this is working, I will start pulling the implementation-specific code into an encoding-aware %FOREIGN-STRING-TO-LISP function in CFFI-SYS.

How are you going to represent the encodings?
- use the implementation's objects?
- introduce your own names?
- restrict yourself to known encodings, e.g. UTF-8 (users on MS-Windows would appreciate UTF-16)?

...

    #+clisp (defcfun "strlen" :unsigned-int (s :pointer))

Hmm, I lost track of whether that works on MS-Windows in a MS-VC build (as opposed to mingw/cygwin/unix-like).

Yeah, I should really add this string stuff to clisp instead.

Regards,
Jorg Hohle.

Yaroslav Kavenchuk

3 Jan

7:48 a.m.

Hoehle, Joerg-Cyril wrote:

...

Now that this is working, I will start pulling the implementation-specific code into an encoding-aware %FOREIGN-STRING-TO-LISP function in CFFI-SYS.

How are you going to represent the encodings? - use the implementation's objects? - introduce your own names? - restrict yourself to known encodings, e.g. UTF-8 (users on MS-Windows would appreciate UTF-16)?

... and other encodings which depend on the Windows locale.

Thanks!
--
WBR, Yaroslav Kavenchuk.

James Bielman

11:44 a.m.

On Mon, 2006-01-02 at 16:03 +0100, Hoehle, Joerg-Cyril wrote:

...

How are you going to represent the encodings? - use the implementation's objects? - introduce your own names? - restrict yourself to known encodings, e.g. UTF-8 (users on MS-Windows would appreciate UTF-16)?

Currently the plan is to represent encodings with keywords, which we can map in CFFI-SYS to whatever objects are necessary for the implementation. Ultimately, you would be able to do stuff like:

    (defcfun "use_a_latin1_string" :void
      (s (:string :encoding :iso-8859-1)))

    (defctype utf8-string (:string :encoding :utf-8))

    (defcfun "getenv" utf8-string
      (name utf8-string))

I'd like to see at least :ascii, :iso-8859-1, :utf-8, and :utf-16 to start with. Then we can start adding support for extra encodings. It looks like CLISP supports the most encodings currently, so it will probably be the main test platform once this moves beyond the basics.

Also, I think plain :string will be modified to use whatever the default encoding is set to in an implementation-specific manner (possibly based on the user's locale, or whatever the Windows equivalent is, etc). I haven't thought about this too hard yet.

I don't think it will support much beyond :ascii or :iso-8859-1 in non-Unicode Lisps---I don't want to encumber CFFI with a bunch of character code tables.
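A minimal sketch of the keyword-to-implementation mapping described above. The names *ENCODING-MAP* and IMPLEMENTATION-ENCODING are hypothetical and not part of CFFI-SYS, and the LIST-ENCODINGS here is only a stand-in for the proposed function. The sketch assumes that on CLISP the objects of interest are the encodings in the CHARSET package; on other Lisps it simply passes the keyword through as a placeholder.

```lisp
;; Hypothetical table: CFFI encoding keyword -> implementation object.
;; On CLISP the values are CHARSET encodings; elsewhere this sketch
;; keeps the keyword itself as a placeholder.
(defparameter *encoding-map*
  (list (cons :ascii      #+clisp charset:ascii      #-clisp :ascii)
        (cons :iso-8859-1 #+clisp charset:iso-8859-1 #-clisp :iso-8859-1)
        (cons :utf-8      #+clisp charset:utf-8      #-clisp :utf-8)
        (cons :utf-16     #+clisp charset:utf-16     #-clisp :utf-16)))

(defun list-encodings ()
  "Return the encoding keywords known to this sketch."
  (mapcar #'car *encoding-map*))

(defun implementation-encoding (keyword)
  "Return the implementation object for KEYWORD, or signal an error."
  (let ((entry (assoc keyword *encoding-map*)))
    (if entry
        (cdr entry)
        (error "Unsupported encoding: ~S" keyword))))
```

With a table like this, a type such as (:string :encoding :utf-8) would only need to carry the keyword, and the lookup could happen once, when the type is parsed.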

...

Yeah, I should really add this string stuff to clisp instead.

My plan for the CFFI-SYS interface so far looks like (modulo some % prefixes):

Function: LIST-ENCODINGS

  Return a list of CFFI encodings (keyword symbols) supported by this implementation.

Function: FOREIGN-STRING-LENGTH pointer encoding &key (offset 0)

  Return the length in octets of the null terminated foreign string at POINTER plus OFFSET octets, assumed to be encoded in ENCODING, a CFFI encoding. This should be smart enough to look for 8-bit vs 16-bit null terminators, as appropriate for the encoding.

Function: LISP-STRING-OCTET-LENGTH string encoding &key start end

  Return the length of STRING from START to END, converted to ENCODING, in octets. This can be used to preallocate a buffer to pass to LISP-STRING-TO-FOREIGN.

Function: LISP-STRING-TO-FOREIGN string encoding &key start end buffer => pointer

  Convert characters from START to END (character indices) in STRING to a foreign string, encoded in ENCODING, a CFFI encoding. If BUFFER, a pointer, is supplied, the foreign string will be written to that location. BUFFER must be large enough to accommodate the foreign string---this can be queried with LISP-STRING-OCTET-LENGTH. If BUFFER is not supplied, a freshly allocated string will be returned. Free this string with CFFI:FOREIGN-FREE.

Function: FOREIGN-STRING-TO-LISP pointer encoding &key start end => string

  Convert octets from START to END (octet indices) from POINTER, assumed to be encoded in ENCODING, to a Lisp string. If not supplied, END should default to:

    (foreign-string-length pointer encoding :offset start)

I think CLISP has enough to implement these fairly efficiently, but any additional primitives you want to add or comments on this interface will certainly be helpful!

James
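To make the "8-bit vs 16-bit null terminators" point concrete, here is a small portable sketch that works on a Lisp octet vector as a stand-in for foreign memory. The function names CODE-UNIT-SIZE, OCTETS-STRING-LENGTH and STRING-OCTET-LENGTH are hypothetical and are not the proposed CFFI-SYS functions. The sketch assumes that CHAR-CODE returns a Unicode code point, which does not hold in every Lisp, that lengths exclude the terminator, and it does not check whether a character is representable in :ascii or :iso-8859-1.

```lisp
(defun code-unit-size (encoding)
  "Octets per code unit (and per null terminator) for ENCODING."
  (ecase encoding
    ((:ascii :iso-8859-1 :utf-8) 1)
    (:utf-16 2)))

(defun octets-string-length (octets encoding &key (offset 0))
  "Length in octets, excluding the terminator, of the null-terminated
string stored in OCTETS starting at OFFSET. The scan advances one code
unit at a time, so a single zero octet inside a UTF-16 code unit is
not mistaken for the terminator."
  (let ((unit (code-unit-size encoding)))
    (loop for i from offset by unit
          while (<= (+ i unit) (length octets))
          when (loop for j from i below (+ i unit)
                     always (zerop (aref octets j)))
            return (- i offset)
          finally (error "No null terminator found"))))

(defun string-octet-length (string encoding &key (start 0) end)
  "Octets needed to encode STRING from START to END in ENCODING,
not counting a terminator. Assumes CHAR-CODE yields code points."
  (loop for i from start below (or end (length string))
        for code = (char-code (char string i))
        sum (ecase encoding
              ((:ascii :iso-8859-1) 1)
              (:utf-8 (cond ((< code #x80) 1)
                            ((< code #x800) 2)
                            ((< code #x10000) 3)
                            (t 4)))
              (:utf-16 (if (< code #x10000) 2 4)))))

;; "hi" as little-endian UTF-16 followed by a 16-bit terminator.
;; A byte-wise strlen would stop at the zero in position 1.
(defparameter *hi-utf-16* #(104 0 105 0 0 0))
```

For example, (octets-string-length *hi-utf-16* :utf-16) scans past the zero high octets of each code unit and stops only at the two-octet terminator, while (string-octet-length "hi" :utf-16) gives the buffer size to reserve before adding room for that terminator.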

Yaroslav Kavenchuk

11:52 a.m.

James Bielman wrote:

...

I don't think it will support much beyond :ascii or :iso-8859-1 in non-Unicode Lisps---I don't want to encumber CFFI with a bunch of character code tables.

All of this can be taken from the iconv library, but it most likely will not help mono-encoding Lisps.

Thanks! I am waiting for it impatiently. :)
--
WBR, Yaroslav Kavenchuk.
