cl-ppcre-devel

cl-ppcre-devel@common-lisp.net

January 2009

[cl-ppcre-devel] Questions regarding cl-unicode
by Juan Jose Garcia-Ripoll 15 Jan '09

Hi,

I just subscribed to this mailing list, which I believe is not only for cl-ppcre but also for cl-unicode. If I am wrong, please point me in the right direction :-)

My name is Juanjo and I am the maintainer of ECL (http://ecls.sourceforge.net). I am currently interested in completing the support for Unicode in ECL, which is, more or less, at the level of what SBCL provides and, in my opinion, far from optimal.

I have been pondering several options, but all of them seem like reinventing the wheel, so I finally came to the conclusion that the most sensible strategy would be to turn cl-unicode into a full (optional) replacement of the ANSI Common Lisp functions for dealing with characters and strings, and hope that this would become a de facto standard. Perhaps that is too ambitious a goal, or maybe it is even futile, given the level of adoption of Unicode among lispers.

My concerns are now centered on several questions.

1) Optimize the database information that is built into cl-unicode. ECL currently uses the SBCL procedure for compressing the database, and I believe it can be optimized even further. Instead of binary trees or hashes, this procedure produces a two-stage byte table that encodes the 209 different combinations of properties that currently occur. This is important for ECL because we need it to stay lean and simple, and because our procedures for exporting data structures in compiled code are not efficient, due to constraints in C compilers. One possibility is that CL-UNICODE reuses the SBCL and ECL databases. Another possibility is to look for even more efficient data structures.

2) Add support for the most important Unicode algorithms, which are canonical decomposition of strings, string upper/lower/titlecasing, and string collation. Ideally this should be transparently incorporated into new Common Lisp functions that can be used to replace the old ones, such as char-upcase, string-equal, etc. Of course, due to the differences between Unicode and ANSI CL, the specifications would change.

3) Add support for the locales database provided by the Unicode Consortium. This is essential for implementing string collation, since the ordering of characters is locale dependent.

4) Integration and shipping of cl-unicode with different implementations, if possible. I would be interested in having CL-UNICODE as a contributed package in the ECL source tree, so that it can be activated with a simple configuration option. I believe there are no license issues, and the only problem is that CL-UNICODE depends on CL-PPCRE (is this dependency essential? could it be eliminated?).

Well, maybe this is all BS, but I would like to read your opinions on the topic.

Juanjo

--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28009 (Spain)
http://juanjose.garciaripoll.googlepages.com
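
For readers unfamiliar with the two-stage table mentioned in point 1, here is a minimal sketch of the general technique. It is written in Python so that it can be run without any Lisp setup, and it is not the actual SBCL, ECL or cl-unicode layout. The names (`build_two_stage`, `lookup`, `property_ids`), the block size of 256, and the choice of properties (general category plus canonical combining class, taken from Python's `unicodedata` module) are all assumptions made for illustration, so the number of combinations it finds will differ from the 209 quoted above.

```python
import unicodedata

BLOCK_SIZE = 256           # assumed block size; real implementations may differ
MAX_CODE_POINT = 0x10FFFF  # last Unicode code point


def property_ids():
    """Give every code point a small integer naming its property combination.

    Here a "combination" is (general category, canonical combining class),
    an illustrative stand-in for whatever a real database would store.
    """
    combos, index, ids = [], {}, []
    for cp in range(MAX_CODE_POINT + 1):
        ch = chr(cp)
        key = (unicodedata.category(ch), unicodedata.combining(ch))
        if key not in index:
            index[key] = len(combos)
            combos.append(key)
        ids.append(index[key])
    return combos, ids


def build_two_stage(ids):
    """Split ids into fixed-size blocks and store each distinct block once.

    stage1 maps a block number (cp // BLOCK_SIZE) to an index into stage2;
    stage2 holds the distinct blocks themselves.
    """
    stage1, stage2, seen = [], [], {}
    for start in range(0, len(ids), BLOCK_SIZE):
        block = tuple(ids[start:start + BLOCK_SIZE])
        if block not in seen:
            seen[block] = len(stage2)
            stage2.append(block)
        stage1.append(seen[block])
    return stage1, stage2


def lookup(stage1, stage2, cp):
    """Two array accesses: pick the block, then the entry inside it."""
    return stage2[stage1[cp // BLOCK_SIZE]][cp % BLOCK_SIZE]


combos, ids = property_ids()
stage1, stage2 = build_two_stage(ids)

if __name__ == "__main__":
    print(len(combos), "distinct property combinations")
    print(len(stage1), "first-stage entries,", len(stage2), "distinct blocks")
    print(combos[lookup(stage1, stage2, ord("A"))])
```

The saving comes from the many blocks that are identical (large unassigned ranges, long runs of ideographs), which are stored only once. When there are at most 256 combinations, every entry of a second-stage block fits in one byte, which is what makes the tables easy to emit as plain C arrays.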
