ChangeLog:
Version 0.1.1 2008-07-24 Make ADD-HANGUL-NAMES faster for ClozureCL
Download:
http://weitz.de/files/cl-unicode.tar.gz
Edi.
So when will utf-8 be the natural encoding of ppcre? (Yes, I know I'm greedy)
regards
On Thu, 24 Jul 2008 17:18:21 +0100, "Dave Pawson" dave.pawson@gmail.com wrote:
So when will utf-8 be the natural encoding of ppcre?
I don't understand the question. CL-PPCRE deals with strings and not with arrays of octets.
2008/7/24 Edi Weitz edi@agharta.de:
On Thu, 24 Jul 2008 17:18:21 +0100, "Dave Pawson" dave.pawson@gmail.com wrote:
So when will utf-8 be the natural encoding of ppcre?
XML has long dealt with 'strings of characters' encoded in utf-8. That way I can include an umlaut, an Arabic glyph or a Chinese symbol.
Any reason lisp should not enjoy that level of internationalisation?
regards
On Thu, 24 Jul 2008 17:39:40 +0100, "Dave Pawson" dave.pawson@gmail.com wrote:
xml has long dealt with 'strings of characters' encoded in utf-8.
I think you are confused. In Lisp, characters and strings are really characters and strings.
CL-USER 4 > #\ä
#\ä

CL-USER 5 > (type-of *)
CHARACTER

CL-USER 6 > (char-name **)
"Latin-Small-Letter-A-With-Diaeresis"
If you want to convert between octets and characters (that's where encodings like UTF-8 make sense), most CL implementations have facilities for this out of the box. For portable solutions see for example here:
http://weitz.de/flexi-streams/
http://common-lisp.net/project/babel/
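To make the octets-versus-characters distinction concrete, here is a minimal sketch of a hand-rolled UTF-8 encoder in portable Common Lisp. It is not how FLEXI-STREAMS or Babel do it, and UTF-8-OCTETS is a hypothetical name introduced only for this illustration. It assumes that CHAR-CODE returns Unicode code points, which holds on most current implementations but is not guaranteed by the standard, and it performs no validation (surrogate code points, for instance, are encoded as-is).

```lisp
;; Illustration only: UTF-8-OCTETS is a hypothetical helper, not part of
;; CL-PPCRE, FLEXI-STREAMS or Babel.
;; Assumption: CHAR-CODE yields Unicode code points on this implementation.
(defun utf-8-octets (string)
  "Return a list of the UTF-8 octets that encode STRING."
  (let ((octets '()))
    (flet ((emit (octet) (push octet octets)))
      (loop for ch across string
            for code = (char-code ch)
            do (cond ((< code #x80)          ; 1 octet: 0xxxxxxx
                      (emit code))
                     ((< code #x800)         ; 2 octets: 110xxxxx 10xxxxxx
                      (emit (logior #xC0 (ash code -6)))
                      (emit (logior #x80 (logand code #x3F))))
                     ((< code #x10000)       ; 3 octets
                      (emit (logior #xE0 (ash code -12)))
                      (emit (logior #x80 (logand (ash code -6) #x3F)))
                      (emit (logior #x80 (logand code #x3F))))
                     (t                      ; 4 octets
                      (emit (logior #xF0 (ash code -18)))
                      (emit (logior #x80 (logand (ash code -12) #x3F)))
                      (emit (logior #x80 (logand (ash code -6) #x3F)))
                      (emit (logior #x80 (logand code #x3F)))))))
    (nreverse octets)))

;; One character (U+00E4), two octets once encoded:
;; (length (string (code-char #xE4)))        ; => 1
;; (utf-8-octets (string (code-char #xE4)))  ; => (195 164)
```

The string itself holds a single character. The two octets appear only when that character is serialized, which is the step where an encoding such as UTF-8 comes into play.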
That way I can include an umlaut, an arabic glyph or a chinese symbol
See above.
Any reason lisp should not enjoy that level of internationalisation?
It does already.
HTH, Edi.
2008/7/24 Edi Weitz edi@agharta.de:
I think you are confused. In Lisp, characters and strings are really characters and strings.
CL-USER 6 > (char-name **)
"Latin-Small-Letter-A-With-Diaeresis"
Sorry ** doesn't look like u00e4
If you want to convert between octets and characters (that's where encodings like UTF-8 make sense), most CL implementations have facilities for this out of the box. For portable solutions see for example here:
http://weitz.de/flexi-streams/
http://common-lisp.net/project/babel/
I don't want to convert, I want to read utf-8 from a file, work in 'characters', build them into strings and write them back to file, in utf-8
Any reason lisp should not enjoy that level of internationalisation?
It does already.
seems we have a different definition of 'working'.
regards
< Sorry ** doesn't look like u00e4 >
http://www.supelec.fr/docs/cltl/clm/node181.html
Daniel
On Thu, Jul 24, 2008 at 11:09 AM, Dave Pawson dave.pawson@gmail.com wrote:
2008/7/24 Edi Weitz edi@agharta.de:
I think you are confused. In Lisp, characters and strings are really characters and strings.
CL-USER 6 > (char-name **)
"Latin-Small-Letter-A-With-Diaeresis"
Sorry ** doesn't look like u00e4
If you want to convert between octets and characters (that's where encodings like UTF-8 make sense), most CL implementations have facilities for this out of the box. For portable solutions see for example here:
http://weitz.de/flexi-streams/
http://common-lisp.net/project/babel/
I don't want to convert, I want to read utf-8 from a file, work in 'characters', build them into strings and write them back to file, in utf-8
Any reason lisp should not enjoy that level of internationalisation?
It does already.
seems we have a different definition of 'working'.
regards
--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
_______________________________________________
cl-ppcre-devel site list
cl-ppcre-devel@common-lisp.net
http://common-lisp.net/mailman/listinfo/cl-ppcre-devel
Dave,
sorry to be harsh, but the problem here is that you don't understand external formats and how they relate to characters. Most modern Lisps use Unicode as their character set, and most of them represent characters as 16- or 32-bit integers internally. UTF-8, by contrast, is an external encoding scheme for Unicode characters, and again, most Lisps support reading and writing characters in UTF-8 encoding.
The external format of files read and written is usually specified using the :external-format keyword argument to OPEN, WITH-OPEN-FILE, etc. Also, there are portability libraries like BABEL that can help convert Lisp strings to arbitrary external formats, for example when calling foreign functions or reading and writing binary files.
CL-PPCRE uses Lisp characters and strings and works with Unicode characters just fine. The CL-UNICODE library is a portability library for working with Unicode directly, but most users never really need to do that.
Please read up on external formats in your Lisp implementation's manual.
-Hans
"Dave Pawson" dave.pawson@gmail.com writes:
I don't want to convert, I want to read utf-8 from a file, work in 'characters', build them into strings and write them back to file, in utf-8
This just works. You probably need to use external-format with OPEN (or more likely WITH-OPEN-FILE) to indicate the encoding you are using. This will read one line of a file in LispWorks:

  (with-open-file (in file-name
                      :external-format :utf-8
                      :element-type 'character)
    (read-line in))
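For the full round trip described in the quoted request (read UTF-8 from a file, work on characters, write UTF-8 back), something along these lines may serve as a starting point. It is a sketch only and was not posted in the thread: REVERSE-LINES-UTF-8 and its parameter names are hypothetical, and the :utf-8 external-format designator is implementation-defined, so whether it is accepted should be checked in the manual of the Lisp in use.

```lisp
;; Sketch under stated assumptions: REVERSE-LINES-UTF-8 is a hypothetical
;; helper, and :UTF-8 is assumed to be a valid external-format designator
;; on this implementation (the standard does not guarantee it).
(defun reverse-lines-utf-8 (in-path out-path)
  "Copy IN-PATH to OUT-PATH, reversing the characters of every line."
  (with-open-file (in in-path :external-format :utf-8)
    (with-open-file (out out-path
                         :direction :output
                         :if-exists :supersede
                         :external-format :utf-8)
      ;; READ-LINE returns NIL at end of file because EOF-ERROR-P is NIL.
      (loop for line = (read-line in nil)
            while line
            ;; LINE is an ordinary Lisp string; REVERSE works on
            ;; characters, never on octets.
            do (write-line (reverse line) out)))))

;; Usage with hypothetical file names:
;; (reverse-lines-utf-8 "input.txt" "output.txt")
```

Decoding and encoding happen only at the two stream boundaries. Everything in between is plain string handling, and the same would be true of a CL-PPCRE call placed where REVERSE is. Note that REVERSE swaps individual characters, so text containing combining characters would need more care than this sketch takes.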
seems we have a different definition of 'working'.
Please explain what doesn't work. Maybe a code sample would help.
Cheers, Chris Dean
On Thu, 24 Jul 2008 18:09:51 +0100, "Dave Pawson" dave.pawson@gmail.com wrote:
2008/7/24 Edi Weitz edi@agharta.de:
I think you are confused. In Lisp, characters and strings are really characters and strings.
CL-USER 6 > (char-name **)
"Latin-Small-Letter-A-With-Diaeresis"
Sorry ** doesn't look like u00e4
Get a good book about Common Lisp and come back once you've understood the basic issues.
http://www.lispworks.com/documentation/HyperSpec/Body/v__stst_.htm
I don't want to convert, I want to read utf-8 from a file, work in 'characters', build them into strings and write them back to file, in utf-8
Sigh...
seems we have a different definition of 'working'.
Humor me - please give me a short description of what I need to change to make UTF-8 "the natural encoding" of CL-PPCRE. I'm really looking forward to that.