Hi,
Quoting Russell Kliese (russell@kliese.id.au):
It may be possible, however, to solve the problem without having to perfect the parser. I found that HTML Tidy can deal with the malformed attributes if you use the option that removes proprietary attributes. This way only a set of known good attributes are passed to the SAX builder. Would this be appropriate for Closure HTML?
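To make sure we mean the same thing by "a set of known good attributes", here is a minimal Python sketch of that kind of filtering. It is neither Tidy's nor Closure HTML's code: the whitelist contents and the names KNOWN_ATTRIBUTES, AttributeFilter and filter_attributes are made up for illustration, and the event list merely stands in for whatever would be handed to the SAX builder.

```python
from html.parser import HTMLParser

# Hypothetical whitelist, for illustration only; a real one would be
# derived from the DTD of the HTML version being targeted.
KNOWN_ATTRIBUTES = {"href", "src", "alt", "title", "class", "id"}


class AttributeFilter(HTMLParser):
    """Collects parse events, dropping attributes not on the whitelist."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        kept = [(name, value) for name, value in attrs
                if name in KNOWN_ATTRIBUTES]
        self.events.append(("start", tag, kept))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def filter_attributes(html):
    """Parse html and return the filtered list of events."""
    parser = AttributeFilter()
    parser.feed(html)
    parser.close()
    return parser.events
```

Note that this only drops unknown attributes; it does nothing about how the rest of the markup gets cleaned up, which is where my concern below comes in.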
I've been using HTML Tidy for several years to clean up HTML. It's a very extensive solution, and was helpful to us at the time. However, I had to modify its code to fix some of its problems. The basic issue seems to be that Tidy is too aggressive: it does not agree with the gentler cleanup performed by actual browsers (and specified by HTML5). So if you want to pass W3C validators for HTML 4 Strict, then Tidy will indeed give output that achieves that goal. But that output won't actually look like the input when rendered by an actual browser...
None of that is an excuse for Closure HTML to be too lenient, though. Lack of time has so far prevented me from migrating my Tidy-based codebase to Closure HTML, which would entail working through tests for proper HTML5-style cleanup.
d.