[cxml-devel] Bugs in namespace parsing.
 
I think I have run into a pretty bad bug in namespace handling in Closure XML. Of course, I may just be using the code wrongly.

I am using the 2007-02-18 tarball of cxml on a piece of XML code from RFC 4741 (NETCONF protocol):

(defvar example "<rpc message-id=\"101\" xmlns=\"urn:ietf:params:xml:ns:netconf:base:1.0\"> <my-own-method xmlns=\"http://example.net/me/my-own/1.0\"> <my-first-parameter>14</my-first-parameter> <another-parameter>fred</another-parameter> </my-own-method> </rpc>")

I run

(defvar p-example (cxml:parse-stream (make-string-input-stream example) (cxml-xmls:make-xmls-builder)))

This is the result:

(("rpc" (("xmlns" "urn:ietf:params:xml:ns:netconf:base:1.0") ("message-id" "101")) " " ("my-own-method" (("xmlns" "http://example.net/me/my-own/1.0")) " " ("my-first-parameter" NIL "14") " " ("another-parameter" NIL "fred") " ") " "))

It should have been something like

((("rpc" . "urn:ietf:params:xml:ns:netconf:base:1.0") (("xmlns" "urn:ietf:params:xml:ns:netconf:base:1.0") (("message-id" . "urn:ietf:params:xml:ns:netconf:base:1.0") "101")) " " (("my-own-method" . "http://example.net/me/my-own/1.0") (("xmlns" "http://example.net/me/my-own/1.0")) " " (("my-first-parameter" . "http://example.net/me/my-own/1.0") NIL "14") " " (("another-parameter" . "http://example.net/me/my-own/1.0") NIL "fred") " ") " "))

There seems to be not one, but two problems. First, the parser or the builder ignores the rule that the scope of an xmlns attribute starts from the tag of the element in which it is included. Second (and that is probably the builder), it fails to apply the default namespace to internal elements.

FYI, XMLS has approximately the same kind of bug. So does the PXML suite from Franz. It is evidently something that is a bit tough to get right, maybe because it involves two scans of a <> construction: you really don't know the namespace of tags and attributes until you have parsed the entire <> construction and located the zero-or-one "xmlns=" and the zero-one-or-many "xmlns:<name>=" attributes.

best regards
--
Peder Chr. Nørgaard          e-mail: pcn@pogt.dk
Gefionsvej 19                scout e-mail: hathi@gallerne.dk
DK-8230 Åbyhøj               tel: +45 87 44 11 99
Denmark                      mob: +45 30 91 84 31
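Peder's point that no name in a start tag can be resolved until all of its attributes have been seen can be illustrated with a small, self-contained Common Lisp sketch. This is not CXML code, and every function name in it is hypothetical. Pass 1 collects the xmlns and xmlns:<name> declarations into an alist of bindings; pass 2 resolves the element and attribute names against them. One detail worth flagging: under the Namespaces in XML recommendation, a default namespace declaration does not apply to unprefixed attribute names, so message-id ends up in no namespace rather than in the netconf namespace shown in the expected output above.

```lisp
;;; Sketch: resolving the names in one start tag per Namespaces in XML.
;;; BINDINGS is an alist of (prefix . uri); the key "" stands for the
;;; default namespace.  All function names here are hypothetical.

(defun split-qname (qname)
  "Return (values prefix local-name); PREFIX is NIL if QNAME has no colon."
  (let ((colon (position #\: qname)))
    (if colon
        (values (subseq qname 0 colon) (subseq qname (1+ colon)))
        (values nil qname))))

(defun xmlns-attribute-p (name)
  "True for a namespace declaration: xmlns or xmlns:p."
  (multiple-value-bind (prefix local) (split-qname name)
    (or (equal prefix "xmlns")
        (and (null prefix) (string= local "xmlns")))))

(defun collect-declarations (attributes bindings)
  "Pass 1: add every xmlns / xmlns:p declaration to BINDINGS, shadowing
outer ones.  The caller's list is not modified, so the new scope ends
with this element."
  (dolist (attr attributes bindings)
    (multiple-value-bind (prefix local) (split-qname (car attr))
      (cond ((equal prefix "xmlns")
             (push (cons local (cdr attr)) bindings))
            ((and (null prefix) (string= local "xmlns"))
             (push (cons "" (cdr attr)) bindings))))))

(defun namespace-for (prefix bindings)
  "URI bound to PREFIX, where \"\" means the default namespace.  A missing
default (or one reset with xmlns=\"\") yields NIL, i.e. no namespace."
  (let ((entry (assoc prefix bindings :test #'string=)))
    (cond ((and entry (string/= (cdr entry) "")) (cdr entry))
          ((string= prefix "") nil)
          ((string= prefix "xml") "http://www.w3.org/XML/1998/namespace")
          (t (error "Undeclared namespace prefix ~S" prefix)))))

(defun resolve-start-tag (qname attributes bindings)
  "Return (values element-name attributes scope).  Names become
(local-name . uri); SCOPE is the bindings for the element's content."
  (let ((scope (collect-declarations attributes bindings)))   ; pass 1
    (multiple-value-bind (prefix local) (split-qname qname)   ; pass 2
      (values
       ;; Unprefixed element names take the default namespace.
       (cons local (namespace-for (or prefix "") scope))
       ;; Unprefixed attribute names are in NO namespace.
       (loop for (name . value) in attributes
             unless (xmlns-attribute-p name)
               collect (multiple-value-bind (p l) (split-qname name)
                         (cons (cons l (and p (namespace-for p scope)))
                               value)))
       scope))))
```

Because COLLECT-DECLARATIONS pushes onto its own copy of the list head instead of modifying the caller's alist, an element's declarations go out of scope as soon as the caller returns to the parent's bindings.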
 
Quoting Peder Chr. Nørgaard (pcn@pogt.dk):
> There seems to be not one, but two problems. First, the parser or the builder ignores the rule that the scope of an xmlns attribute starts from the tag of the element in which it is included. Second (and that is probably the builder), it fails to apply the default namespace to internal elements.
The XML parser implements namespace handling correctly. The XMLS-compatible builder is incorrect, because it is imitating XMLS behaviour. If I may quote from my own documentation:

  fixme: It is unclear to me how namespaces are meant to work in xmls, since xmls documentation differs from how xmls actually works in current releases. Usually applications need to know both the namespace prefix and the namespace URI. We currently follow the xmls implementation and use the namespace prefix instead of following its documentation which shows the URI. We do not follow xmls in munging xmlns attribute values. Attributes themselves have namespaces and it is not clear to me how that works in xmls.
> FYI, XMLS has approximately the same kind of bug. So does the PXML suite from Franz. It is evidently something that is a bit tough to get right, maybe because it involves two scans of a <> construction: you really don't know the namespace of tags and attributes until you have parsed the entire <> construction and located the zero-or-one "xmlns=" and the zero-one-or-many "xmlns:<name>=" attributes.
Based on CXML's SAX parser, it is not hard at all to write a builder with correct namespace support; you just have to pick the format you want and implement it. The SAX parser already does all namespace processing, so the events SAX:START-ELEMENT and SAX:END-ELEMENT include all necessary information, as separate arguments (or slots, in the case of the attributes). The XMLS builder source code in CXML is also meant as copy&paste material in this regard, because it is a simple SAX handler that can easily be adapted.

In fact, Edi Weitz has already done exactly that. Download CL-WEBDAV (http://weitz.de/cl-webdav/) and have a look at xml.lisp in that tarball. As Edi writes on his page:

  We're representing XML as XMLS nodes which are very similar to CXML's XMLS nodes but try to get namespaces right because they don't purport to be compatible with XMLS

Apparently there is some WebDAV-specific code in xml.lisp, but that should be easy to remove.

Regards,
David
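The kind of builder described above can be sketched without CXML installed. The code below keeps a stack of open elements and produces nodes of the form ((local-name . uri) attributes . children), close to the format Peder asked for. Its three BUILDER-* functions are hypothetical stand-ins for methods on the SAX:START-ELEMENT, SAX:END-ELEMENT and SAX:CHARACTERS events, and the attribute triples stand in for CXML's attribute objects. It is an illustration under those assumptions, not the CL-WEBDAV or CXML code.

```lisp
;;; Sketch of a namespace-aware tree builder.  The BUILDER-* functions are
;;; hypothetical stand-ins for SAX event methods; like those events, they
;;; receive the namespace URI and the local name as separate arguments.
;;; Nodes look like ((local-name . uri) attributes . children).

(defstruct builder
  (stack '())   ; open element nodes, innermost first
  (root nil))   ; the finished document element

(defun builder-start-element (builder namespace-uri local-name attributes)
  "ATTRIBUTES is a list of (namespace-uri local-name value) triples,
standing in for SAX attribute objects."
  (push (list (cons local-name namespace-uri)
              (loop for (uri local value) in attributes
                    collect (list (cons local uri) value)))
        (builder-stack builder)))

(defun builder-characters (builder data)
  "Add text to the innermost open element (children are kept reversed
until the element is closed)."
  (push data (cddr (first (builder-stack builder)))))

(defun builder-end-element (builder)
  "Close the innermost element and attach it to its parent, or make it
the root if it was the outermost element."
  (let ((node (pop (builder-stack builder))))
    (setf (cddr node) (nreverse (cddr node)))
    (if (builder-stack builder)
        (push node (cddr (first (builder-stack builder))))
        (setf (builder-root builder) node))))
```

In an actual CXML handler, the start-element method would take the namespace URI and local name straight from the event's arguments, read each attribute's URI, local name and value through the SAX attribute accessors (for example sax:attribute-namespace-uri; check the CXML documentation for the exact names), and a method on SAX:END-DOCUMENT would return the root node.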
 
            On Tuesday 22 May 2007, David Lichteblau wrote:
> Quoting Peder Chr. Nørgaard (pcn@pogt.dk):
> > There seems to be not one, but two problems. First, the parser or the builder ignores the rule that the scope of an xmlns attribute starts from the tag of the element in which it is included. Second (and that is probably the builder), it fails to apply the default namespace to internal elements.
> The XML parser implements namespace handling correctly.
> The XMLS-compatible builder is incorrect, because it is imitating XMLS behaviour.
> If I may quote from my own documentation:
>   fixme: It is unclear to me how namespaces are meant to work in xmls, since xmls documentation differs from how xmls actually works in current releases. Usually applications need to know both the namespace prefix and the namespace URI. We currently follow the xmls implementation and use the namespace prefix instead of following its documentation which shows the URI. We do not follow xmls in munging xmlns attribute values. Attributes themselves have namespaces and it is not clear to me how that works in xmls.
Thanks for your answer. That is all good news.
> > FYI, XMLS has approximately the same kind of bug. So does the PXML suite from Franz. It is evidently something that is a bit tough to get right, maybe because it involves two scans of a <> construction: you really don't know the namespace of tags and attributes until you have parsed the entire <> construction and located the zero-or-one "xmlns=" and the zero-one-or-many "xmlns:<name>=" attributes.
> Based on CXML's SAX parser, it is not hard at all to write a builder with correct namespace support; you just have to pick the format you want and implement it. The SAX parser already does all namespace processing, so the events SAX:START-ELEMENT and SAX:END-ELEMENT include all necessary information, as separate arguments (or slots, in the case of the attributes).
> The XMLS builder source code in CXML is also meant as copy&paste material in this regard, because it is a simple SAX handler that can easily be adapted.
Building another builder sounds like a winner for my purpose. Actually, I would then rather try to emulate the builder of Franz's XML parser (without the bug, that is!).
> In fact, Edi Weitz has already done exactly that. Download CL-WEBDAV (http://weitz.de/cl-webdav/) and have a look at xml.lisp in that tarball. As Edi writes on his page:
>   We're representing XML as XMLS nodes which are very similar to CXML's XMLS nodes but try to get namespaces right because they don't purport to be compatible with XMLS
> Apparently there is some WebDAV-specific code in xml.lisp, but that should be easy to remove.
I will look into that, too. Thanks for your answer.

best regards
--
Peder Chr. Nørgaard
participants (2)
- David Lichteblau
- Peder Chr. Nørgaard