Hi,
Quoting Sunil Mishra (smishra@sfmishras.com):
> Attached is an example. You should be able to load up the latest cxml in the latest openmcl snapshot, and then load bug.lisp. That's enough for me to see the error.
Thanks.
What you are seeing is due to the automatic recoding to strings that I introduced to make things easier for cxml users on non-Unicode Lisps. :-)
When cxml:parse-file is used on such Lisps, the parser will use rods internally, but since users don't tend to be interested in working with runes (Closure itself is probably the only application that really wants to see runes), the default is to recode those rods into UTF-8 strings before handing them to the user-specified SAX handler.
So in this case, your DOM builder gets Lisp strings containing UTF-8 octets, but you explicitly created a DOM builder for runes, which is the reason for the mismatch:
  ;; fails:
  (cxml:parse-file "redirect.xml"
                   (rune-dom:make-dom-builder)
                   :recode t                      ;<--- default setting
                   :entity-resolver #'default-entity-resolver)
There are two solutions. One is to disable recoding and use runes:
  ;; works (using runes)
  (cxml:parse-file "redirect.xml"
                   (rune-dom:make-dom-builder)
                   :recode nil                    ;<--- disable recoding
                   :entity-resolver #'default-entity-resolver)
The other is what most users will be interested in:
  ;; works (using characters representing UTF-8 octets)
  (cxml:parse-file "redirect.xml"
                   (cxml-dom:make-dom-builder)    ;<--- note cxml-dom package
                   :entity-resolver #'default-entity-resolver)
On Lisps without Unicode support, cxml-dom is an alias for utf8-dom.
On Unicode-aware Lisps, cxml-dom is an alias for rune-dom, since runes are characters anyway.
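To make "characters representing UTF-8 octets" a bit more concrete, here is a rough sketch in portable Common Lisp. The two helper functions are hypothetical names made up for this illustration; they are not part of cxml, and cxml's own recoding code may well look different. The sketch assumes code-char returns a character for every code from 0 to 255, which should hold on the usual implementations.

```lisp
;; Illustrative sketch only: these helpers are hypothetical and are
;; NOT part of cxml's API.

(defun code-point-to-utf8-octets (code)
  "Return the UTF-8 encoding of the code point CODE as a list of octets."
  (cond ((< code #x80)                  ; 1 octet: plain ASCII
         (list code))
        ((< code #x800)                 ; 2 octets
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)               ; 3 octets
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t                              ; 4 octets
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

(defun code-points-to-utf8-string (codes)
  "Encode the code points in the list CODES as UTF-8 and return a
string holding one character per octet (assumes codes 0-255 all map
to characters via CODE-CHAR)."
  (let ((octets (loop for code in codes
                      append (code-point-to-utf8-octets code))))
    (map 'string #'code-char octets)))
```

For example, (code-points-to-utf8-string '(#x61 #xE9)) encodes "a" followed by e-acute and yields a string of three characters with codes #x61, #xC3 and #xA9. The string is one character longer than the two-rune rod it came from, and a handler that expects runes will not interpret it correctly.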
d.