A few years back I posted this request, but got no reply at the time. Since there has been some activity here recently, I'm asking again.
Currently, closure-html throws an error when trying to parse HTML generated by Microsoft Outlook. The reason for this is that Outlook generates a lot of tags with a colon (:) in them, which closure-html considers a syntax error.
I don't know whether it actually is a syntax error, but closure-html should be lenient here and simply ignore these tags. That's what my fix does. So far, I have had to advise my users to patch closure-html manually. It would be very nice if this were integrated into the official version.
Here is the patch:
diff --git a/src/parse/html-parser.lisp b/src/parse/html-parser.lisp
index 1fdd457..4e45b81 100644
--- a/src/parse/html-parser.lisp
+++ b/src/parse/html-parser.lisp
@@ -106,7 +106,10 @@
                 for (name value) on plist by #'cddr
                 unless
                   ;; better don't emit as HAX what would be bogus as SAX anyway
-                  (string-equal name "xmlns")
+                  (let ((s (string name))
+                        (prefix "xmlns:"))
+                    (or (string-equal s "xmlns")
+                        (string-equal s prefix :end1 (min (length s) (length prefix)))))
                 collect
                   (let* ((n #+rune-is-character (coerce (symbol-name name) 'rod)
                             #-rune-is-character (symbol-name name))
diff --git a/src/parse/sgml-parse.lisp b/src/parse/sgml-parse.lisp
index faa9029..a277ece 100644
--- a/src/parse/sgml-parse.lisp
+++ b/src/parse/sgml-parse.lisp
@@ -182,7 +182,8 @@
   (or (name-start-rune-p char)
       (digit-rune-p char)
       (rune= char #/.)
-      (rune= char #/-)))
+      (rune= char #/-)
+      (rune= char #/:)))
 
 (definline sloopy-name-rune-p (char)
   (or (name-rune-p char)
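
For anyone reviewing the first hunk, here is a minimal standalone sketch of the attribute-name check it adds. The function name xmlns-attribute-name-p is hypothetical (the patch inlines the expression in the loop rather than defining a function), and the sketch uses only standard Common Lisp, so it can be tried in a REPL without loading closure-html.

```lisp
;; Hypothetical helper for illustration only; the patch itself inlines
;; this expression in the attribute loop of html-parser.lisp.
(defun xmlns-attribute-name-p (name)
  "Return true if NAME is \"xmlns\" or starts with \"xmlns:\", ignoring case."
  (let ((s (string name))        ; accepts strings and symbols
        (prefix "xmlns:"))
    (or (string-equal s "xmlns")
        ;; Compare only the leading part of S against PREFIX. Clamping
        ;; :end1 to (length s) keeps the bounds valid for short names;
        ;; a name shorter than the prefix can then never match.
        (string-equal s prefix :end1 (min (length s) (length prefix))))))
```

With this check, attribute names such as "xmlns" and "xmlns:o" are skipped, while ordinary names such as "class" or "xml" are still emitted.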