Quoting Andrei Stebakov (lispercat@gmail.com):
Say I need to parse html that I got from some external source and for some reason there are namespaces in the text:
(chtml:parse "<a href='someurl.com' somens:url='someurl.com'>text</a>" (stp:make-builder))
The parser will choke on somens namespace since it's not mapped to any url:
0: (CXML-STP:STP-ERROR "attribute with prefix but no URI")[:EXTERNAL]
1: (CXML-STP:RENAME-ATTRIBUTE #<error printing object>)
2: (CXML-STP:MAKE-ATTRIBUTE "someurl.com" "somens:url" "")
3: ((SB-PCL::FAST-METHOD SAX:START-ELEMENT (CXML-STP-IMPL::BUILDER T T T T)) ..)
Indeed, something needs to be done to fix this, since chtml purports to fix bogus html without erroring out.
At the moment, chtml liberally accepts these attributes for its own internal PT representation, but then accidentally turns PT attributes into HAX events (and then SAX events) without further validation.
I think it might be easiest to continue allowing them in PT, but to change PT serialization to fix them before constructing hax attribute objects.
Here is a simple patch that just discards the attribute (changing its name would be another option). Note that the patch isn't good enough to commit at this point, because it introduces a dependency from chtml to cxml.
--- a/src/parse/html-parser.lisp
+++ b/src/parse/html-parser.lisp
@@ -98,16 +98,20 @@
 ;;;                  (merge-pathnames (or pathname (pathname input))))))
         (parse-xstream xstream handler)))))
 
+(defun good-attribute-name-p (name)
+  (and (cxml::valid-name-p name)
+       (not (or (string-equal name "xmlns")
+                (position #\: name)))))
+
 (defun serialize-pt-attributes (plist recode)
   (loop
      for (name value) on plist by #'cddr
-     unless
-       ;; better don't emit as HAX what would be bogus as SAX anyway
-       (string-equal name "xmlns")
+     for n = #+rune-is-character (coerce (symbol-name name) 'rod)
+             #-rune-is-character (symbol-name name)
+     ;; don't emit as HAX what would be bogus as SAX anyway
+     if (good-attribute-name-p n)
      collect
-     (let* ((n #+rune-is-character (coerce (symbol-name name) 'rod)
-               #-rune-is-character (symbol-name name))
-            (v (etypecase value
+     (let ((v (etypecase value
                 (symbol (coerce (string-downcase (symbol-name value)) 'rod))
                 (rod (funcall recode value))
                 (string (coerce value 'rod)))))
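To see the filtering idea in isolation, here is a rough, standalone sketch in portable Common Lisp. It is not part of the patch: PLAUSIBLE-ATTRIBUTE-NAME-P and FILTER-ATTRIBUTE-PLIST are hypothetical names, the check is a simplified stand-in that deliberately leaves out cxml::valid-name-p (so it needs no cxml dependency, at the cost of not validating XML name characters), and it assumes plain strings rather than rods and attribute names stored as symbols, as the symbol-name call in the patch suggests.

```lisp
;; Hypothetical, simplified stand-in for the patch's GOOD-ATTRIBUTE-NAME-P.
;; It only rejects empty names, "xmlns", and names containing a colon;
;; it does NOT check that the name is otherwise a valid XML name.
(defun plausible-attribute-name-p (name)
  (and (plusp (length name))
       (not (string-equal name "xmlns"))
       (not (position #\: name))))

;; Drop name/value pairs whose name fails the check above.
;; PLIST alternates symbol names and values, like the plist in the patch.
(defun filter-attribute-plist (plist)
  (loop for (name value) on plist by #'cddr
        when (plausible-attribute-name-p (symbol-name name))
          append (list name value)))

;; The prefixed attribute from the original example is discarded:
(filter-attribute-plist
 '(:href "someurl.com" :|somens:url| "someurl.com"))
;; => (:HREF "someurl.com")
```

Discarding is the simplest choice; renaming the attribute instead would keep the value but means deciding on a replacement name.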
Is there a way to specify some global variable to turn off namespace processing? I saw *namespace-processing* variable in some other package but it doesn't seem to be relevant in this case.
You could use DOM instead of STP, I suppose. DOM doesn't do these sorts of checks IIRC.
(Personally I strongly prefer STP over DOM, but one reason for that preference is that STP is stricter, which is nice when actually working with XML.)
d.