Hi,
What is the status of CXML on OpenMCL? I'm using the OpenMCL build from 070214, a pre-release version that runs on Intel Macs.
I see the following when cxml is loaded:
  ;;; Checking for wide character support... no, reverting to octet strings.
  ;;; Building cxml with (UNSIGNED-BYTE 16) RUNES
However, when I trace runes:rod=, I see:
  0> Calling (RUNES:ROD= #(69 78 84 73 84 89) #(69 78 84 73 84 89))
  <0 RUNES:ROD= returned T
  0> Calling (RUNES:ROD= "cent" "iexcl")
  <0 RUNES:ROD= returned NIL
  0> Calling (RUNES:ROD= "cent" "nbsp")
I think this only happens when a DTD file requests the loading of an entities file. My guess is that files are being opened inconsistently in different places in cxml. I don't understand CXML's architecture well enough to track this down; perhaps one of you could help?
Thanks,
Sunil
Hi,
Quoting Sunil Mishra (smishra@sfmishras.com):
> What is the status of CXML on OpenMCL? I'm using the OpenMCL build from 070214. This is a pre-release version that runs on intel macs.
CXML is meant to work on OpenMCL, and it passes its test suites on last December's AMD64 snapshot.
> 0> Calling (RUNES:ROD= "cent" "nbsp")
> I think this only happens when a DTD file requests the loading of an entities file. My guess is that files are being opened inconsistently in different spots in cxml. I don't understand CXML's architecture well enough to figure out this problem, perhaps one of you could help with this?
That would be a bug, perhaps meaning that some code path in cxml is coercing rods to strings and then using the wrong functions.
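To make the suspected failure mode concrete, here is a toy model, assuming (as the "(UNSIGNED-BYTE 16) RUNES" build message suggests) that a rune is a 16-bit integer and a rod is a vector of them. TOY-ROD= and TOY-STRING->ROD are hypothetical names for illustration only; this is not how RUNES:ROD= is actually implemented.

```lisp
;;; Hypothetical toy model, NOT cxml's RUNES:ROD= implementation.
;;; Assumption: without wide characters a rune is an (UNSIGNED-BYTE 16)
;;; integer and a rod is a vector of such integers.
(defun toy-string->rod (string)
  "Convert a Lisp string into a rod of character codes (BMP text only)."
  (map '(vector (unsigned-byte 16)) #'char-code string))

(defun toy-rod= (rod1 rod2)
  "Compare two rods rune by rune with EQL."
  (and (= (length rod1) (length rod2))
       (every #'eql rod1 rod2)))
```

In this model the same text compares unequal as soon as one argument is a rod and the other a Lisp string, because the integer 99 is not EQL to #\c. That is the kind of mismatch a code path mixing rods and strings would produce.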
However, I cannot reproduce the problem. The DTD for XHTML has *.ent files similar to what you seem to be using, and that works for me.
Can you send me a test case?
Thanks, David
Hi David,
Attached is an example. You should be able to load the latest cxml into the latest OpenMCL snapshot and then load bug.lisp; that's enough for me to see the error.
Sunil
David Lichteblau wrote:
[quoted message snipped]
Hi,
Quoting Sunil Mishra (smishra@sfmishras.com):
> Attached is an example. You should be able to load up the latest cxml in the latest openmcl snapshot, and then load bug.lisp. That's enough for me to see the error.
Thanks.
What you are seeing is due to the automatic recoding to strings that I introduced to make things easier for cxml users on non-Unicode Lisps. :-)
When cxml:parse-file is used on such Lisps, the parser uses rods internally. Since users don't tend to be interested in working with runes (Closure itself is probably the only application that really wants to see them), the default is to recode those rods into UTF-8 strings before handing them to the user-specified SAX handler.
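A minimal sketch of what "recoding rods into UTF-8 strings" means, assuming a rod is a vector of 16-bit code units: each output character's code is one UTF-8 octet. ROD-TO-UTF8-STRING is a hypothetical name, not part of cxml's API, and this sketch does not combine surrogate pairs, so it only covers BMP text.

```lisp
;;; Hypothetical sketch: ROD-TO-UTF8-STRING is an illustrative name, not
;;; part of cxml's API.  Assumption: a rod is a vector of (UNSIGNED-BYTE 16)
;;; code units; surrogate pairs are not combined, so only BMP text is
;;; encoded correctly.
(defun rod-to-utf8-string (rod)
  "Return a string in which each character's code is one UTF-8 octet of ROD."
  (with-output-to-string (out)
    (flet ((emit (octet)
             ;; Store the octet as a character whose code equals the octet.
             (write-char (code-char octet) out)))
      (loop for code across rod
            do (cond ((< code #x80)         ; 1 octet: ASCII
                      (emit code))
                     ((< code #x800)        ; 2 octets
                      (emit (logior #xC0 (ash code -6)))
                      (emit (logior #x80 (logand code #x3F))))
                     (t                     ; 3 octets
                      (emit (logior #xE0 (ash code -12)))
                      (emit (logior #x80 (logand (ash code -6) #x3F)))
                      (emit (logior #x80 (logand code #x3F)))))))))
```

For example, the single rune #xA2 (the cent sign) becomes a two-character string with codes 194 and 162, so code that expects rods no longer sees the rune it was looking for.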
So in this case your DOM builder receives Lisp strings containing UTF-8 octets, but you explicitly created a DOM builder for runes, hence the mismatch:
  ;; fails:
  (cxml:parse-file "redirect.xml"
                   (rune-dom:make-dom-builder)
                   :recode t             ;<--- default setting
                   :entity-resolver #'default-entity-resolver)
There are two solutions. One is to disable recoding and use runes:
  ;; works (using runes)
  (cxml:parse-file "redirect.xml"
                   (rune-dom:make-dom-builder)
                   :recode nil           ;<--- disable recoding
                   :entity-resolver #'default-entity-resolver)
The other is what most users will be interested in:
  ;; works (using characters representing UTF-8 octets)
  (cxml:parse-file "redirect.xml"
                   (cxml-dom:make-dom-builder)   ;<--- note cxml-dom package
                   :entity-resolver #'default-entity-resolver)
On Lisps without Unicode support, cxml-dom is an alias for utf8-dom.
On Unicode-aware Lisps, cxml-dom is an alias for rune-dom, since runes are characters anyway.
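The aliasing rule can be sketched as a simple check, assuming that "wide character support" means CHAR-CODE-LIMIT covers the full 16-bit range. WIDE-CHARACTERS-P and DOM-IMPLEMENTATION-NAME are hypothetical names; cxml's actual build-time test may well differ.

```lisp
;;; Hypothetical sketch: these names are illustrative, not cxml API.
;;; Assumption: "wide character support" means CHAR-CODE-LIMIT covers at
;;; least the 16-bit range; cxml's actual build-time check may differ.
(defun wide-characters-p ()
  "True if this Lisp's characters can represent every 16-bit code point."
  (>= char-code-limit #x10000))

(defun dom-implementation-name ()
  "Name of the DOM package that CXML-DOM would alias on this Lisp."
  (if (wide-characters-p)
      "RUNE-DOM"    ; runes are characters, so rods are ordinary strings
      "UTF8-DOM"))  ; strings hold UTF-8 octets as characters
```

Code written against cxml-dom therefore gets ordinary strings either way, which is why it is the safer default for most users.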
d.