In Clozure CL, code-char returns nil for two ranges:
    (loop with in = t
          for i below 1114110
          as c = (code-char i)
          do (cond ((and in (not c))
                    (setf in nil)
                    (format t "~&gap: #x~X" i))
                   ((and (not in) c)
                    (setf in t)
                    (format t " .. #x~X" (1- i)))))
    gap: #xD800 .. #xDFFF
    gap: #xFFFE .. #xFFFF
    NIL
Searching for phrases like "Range: D800–DBFF. The High Surrogate Area does not contain any character" and "the value FFFE is guaranteed not to be a Unicode character at all" will throw some additional light on that, and it seems related to this discussion: http://thread.gmane.org/gmane.lisp.babel.devel/15
These gaps cause the file unicode.lisp in cxml-rng to signal an error when compiling. The following lets it compile, but I doubt this is how anybody who knew what this code was doing would recommend doing it. It reorders the system's components so that the range functions from clex are available, and then modifies massage-ranges to cut these gaps out of any ranges that pass through.
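For illustration only, here is a minimal sketch of what "cutting the gaps out of a range" means. It is not the clex ranges- function, whose argument conventions I have not checked; the names subtract-gap and subtract-gaps are hypothetical, and the sketch assumes every range is an inclusive (low high) pair of character codes.

```lisp
;; Hypothetical helpers, NOT the clex RANGES- implementation.
;; Assumption: a range is an inclusive (LOW HIGH) pair of character codes.

(defun subtract-gap (range gap)
  "Return a list of the parts of RANGE that lie outside GAP (both inclusive)."
  (destructuring-bind (lo hi) range
    (destructuring-bind (glo ghi) gap
      (if (or (< hi glo) (> lo ghi))
          ;; No overlap: keep the range unchanged.
          (list range)
          (append
           ;; Piece of the range below the gap, if any.
           (when (< lo glo) (list (list lo (1- glo))))
           ;; Piece of the range above the gap, if any.
           (when (> hi ghi) (list (list (1+ ghi) hi))))))))

(defun subtract-gaps (ranges gaps)
  "Remove every gap in GAPS from every range in RANGES."
  (dolist (gap gaps ranges)
    (setf ranges
          (loop for range in ranges
                append (subtract-gap range gap)))))
```

With the two gaps reported above, a range that straddles the surrogate area is split in two, and a range that straddles #xFFFE .. #xFFFF likewise:

```lisp
(subtract-gaps '((#xD000 #xE000) (#xFFF0 #x10010))
               '((#xD800 #xDFFF) (#xFFFE #xFFFF)))
;; => ((#xD000 #xD7FF) (#xE000 #xE000) (#xFFF0 #xFFFD) (#x10000 #x10010))
;;    (the REPL prints these codes in decimal)
```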
But my interest is limited to getting the file to compile so my current project, which doesn't need cxml-rng, can get back on track as I experiment with ccl. So I'll admit I haven't tried testing this patch at all.
- ben
bash-3.2$ git diff
diff --git a/cxml-rng.asd b/cxml-rng.asd
index e64adff..64582ca 100644
--- a/cxml-rng.asd
+++ b/cxml-rng.asd
@@ -17,12 +17,12 @@
     :components
     ((:file "package")
      (:file "floats")
+     (:file "clex")
      (:file "unicode")
      (:file "nppcre")
      (:file "types")
      (:file "parse")
      (:file "validate")
      (:file "test")
-     (:file "clex")
      (:file "compact"))
     :depends-on (:cxml :cl-ppcre :yacc :parse-number :cl-base64))
diff --git a/unicode.lisp b/unicode.lisp
index 42b686a..c5ea17f 100644
--- a/unicode.lisp
+++ b/unicode.lisp
@@ -57,6 +57,8 @@
      `(defranges ,name ',ranges)))
 
 (defun massage-ranges (l)
+  #+ccl (setf l (cxml-clex::ranges- l '((#xd800 #xDFFF)
+                                         (#xFFFE #x10000))))
   (mapcan (lambda (x)
             (let ((a (code-char (car x)))
                   (b (code-char (cadr x))))
bash-3.2$