These are the changes I had to make in tests.lisp to get the tests to pass in the latest ITA version of Clozure Common Lisp (formerly known as OpenMCL).
CCL does not support a character with code #xDCF0, and its reader signals a condition when it sees #\udcf0. Unfortunately, guarding the form with #-ccl does not seem to solve the problem, presumably because the #- reader macro skips the form by calling "read", and that call does not suppress the condition, or something like that. It might be hard to fix that in a robust way.
In order to make progress, I had to just comment these tests out. I do not suggest merging that change into the official sources, but it would be very nice if we could find a way to write tests.lisp so that these tests run when the characters are supported and are skipped when they are not.
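One possible direction, as a rough sketch only: build the test input at run time with code-char instead of writing #\udcf0 literally, so the reader never sees the unsupported character. The helper name string-from-codes is hypothetical and is not part of babel; it also produces a plain string rather than the unicode-string type the real tests use.

```lisp
;; Hypothetical helper; not part of babel or its tests.lisp.
(defun string-from-codes (&rest codes)
  "Return a string of the characters with CODES, or NIL if the
implementation cannot represent one of them."
  (let ((chars (mapcar (lambda (code)
                         ;; CODE-CHAR may return NIL; also guard the limit.
                         (and (< code char-code-limit) (code-char code)))
                       codes)))
    (unless (member nil chars)
      (coerce chars 'string))))

;; Usage: decide at run time whether the surrogate-based tests apply.
(let ((input (string-from-codes 97 98 #xdcf0)))
  (if input
      (format t "surrogate characters supported; utf-8b tests can run~%")
      (format t "surrogate characters unsupported; skipping utf-8b tests~%")))
```

The expected values in the tests would need the same treatment, since they also contain surrogate character literals.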
The (or (code-char ..) ...) change, on the other hand, I think should be made in the official sources. The HyperSpec says clearly that code-char is allowed to return nil when no character with the given code exists.
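In isolation, the pattern looks like the sketch below. The function name string-of-all-codes is made up for illustration, and the choice of #\a as the fallback simply mirrors the patch.

```lisp
;; Hypothetical function; illustrates the (or (code-char ..) ...) pattern.
(defun string-of-all-codes (limit &optional (fallback #\a))
  "Return a string of length LIMIT whose element I is the character
with code I, or FALLBACK where CODE-CHAR returns NIL."
  (let ((string (make-string limit)))
    (dotimes (i limit string)
      (setf (char string i) (or (code-char i) fallback)))))
```

LIMIT is assumed to be at most char-code-limit, as it is in the test, where unicode-char-code-limit is passed.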
What do you think?
-- Dan
Index: trunk/qres/lisp/libs/babel/tests/tests.lisp
===================================================================
--- trunk/qres/lisp/libs/babel/tests/tests.lisp (revision 249746)
+++ trunk/qres/lisp/libs/babel/tests/tests.lisp (revision 262389)
@@ -259,22 +259,25 @@
   #(97 98 99))
 
-(defstest utf-8b.1
-    (string-to-octets (coerce #(#\a #\b #\udcf0) 'unicode-string)
-                      :encoding :utf-8b)
-  #(97 98 #xf0))
-
-(defstest utf-8b.2
-    (octets-to-string (ub8v 97 98 #xcd) :encoding :utf-8b)
-  #(#\a #\b #\udccd))
-
-(defstest utf-8b.3
-    (octets-to-string (ub8v 97 #xf0 #xf1 #xff #x01) :encoding :utf-8b)
-  #(#\a #\udcf0 #\udcf1 #\udcff #\udc01))
-
-(deftest utf-8b.4 ()
-  (let* ((octets (coerce (loop repeat 8192 collect (random (+ #x82)))
-                         '(array (unsigned-byte 8) (*))))
-         (string (octets-to-string octets :encoding :utf-8b)))
-    (is (equalp octets (string-to-octets string :encoding :utf-8b)))))
+;; CCL does not suppport Unicode characters between d800 and e000.
+;(defstest utf-8b.1
+;    (string-to-octets (coerce #(#\a #\b #\udcf0) 'unicode-string)
+;                      :encoding :utf-8b)
+;  #(97 98 #xf0))
+
+;; CCL does not suppport Unicode characters between d800 and e000.
+;(defstest utf-8b.2
+;    (octets-to-string (ub8v 97 98 #xcd) :encoding :utf-8b)
+;  #(#\a #\b #\udccd))
+
+;; CCL does not suppport Unicode characters between d800 and e000.
+;(defstest utf-8b.3
+;    (octets-to-string (ub8v 97 #xf0 #xf1 #xff #x01) :encoding :utf-8b)
+;  #(#\a #\udcf0 #\udcf1 #\udcff #\udc01))
+
+;(deftest utf-8b.4 ()
+;  (let* ((octets (coerce (loop repeat 8192 collect (random (+ #x82)))
+;                         '(array (unsigned-byte 8) (*))))
+;         (string (octets-to-string octets :encoding :utf-8b)))
+;    (is (equalp octets (string-to-octets string :encoding :utf-8b)))))
 
 ;;; The following tests have been adapted from SBCL's
@@ -338,5 +341,6 @@
     (let ((string (make-string unicode-char-code-limit)))
       (dotimes (i unicode-char-code-limit)
-        (setf (char string i) (code-char i)))
+        ;; CCL does not suppport Unicode characters between d800 and e000.
+        (setf (char string i) (or (code-char i) #\a)))
       (let ((string2 (octets-to-string (string-to-octets string :encoding enc