Hi David,
Thanks for your reply.
It sounds like I would need to do some careful reading of the HTML and XML specs in order to come up with a patch that gets the parsing right.
It may be possible, however, to solve the problem without having to perfect the parser. I found that HTML Tidy can deal with the malformed attributes if you use the option that removes proprietary attributes. That way, only known-good attributes are passed to the SAX builder. Would this be appropriate for Closure HTML?
If you care to look at Tidy (I was using HTML Tidy for Linux released on 25 March 2009), here is an example invocation:
echo '<html> <body> <table> <tr><td vertical-align:="" ;=""></td></tr> </table> </body> </html> ' | tidy -q -asxml --force-output yes --drop-proprietary-attributes yes
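In case it is useful, the same pre-cleaning step could be scripted so that the markup goes through Tidy before it reaches the parser. Below is a rough sketch in Python, assuming a `tidy` executable is on the PATH. The names `build_tidy_command` and `clean_html` are made up for illustration and are not part of Tidy or Closure HTML; the options are the ones from the invocation above.

```python
import subprocess

# Options copied from the command-line invocation above.
TIDY_OPTIONS = [
    "-q",
    "-asxml",
    "--force-output", "yes",
    "--drop-proprietary-attributes", "yes",
]


def build_tidy_command(executable="tidy"):
    """Return the argument list for the pre-cleaning call (hypothetical helper)."""
    return [executable] + TIDY_OPTIONS


def clean_html(markup, executable="tidy"):
    """Pipe markup through tidy and return whatever it writes to stdout.

    tidy exits with status 1 on warnings and 2 on errors, so a non-zero
    status is not treated as a failure here; --force-output asks tidy to
    emit a document even when it reports errors.
    """
    result = subprocess.run(
        build_tidy_command(executable),
        input=markup,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example call (requires tidy to be installed):
# cleaned = clean_html('<table><tr><td vertical-align:="" ;=""></td></tr></table>')
```

The cleaned string could then be handed to the HTML parser in place of the original input. I have not checked how much this costs on large documents.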
Regards,
Russell
On 28/10/11 23:13, David Lichteblau wrote:
Quoting Russell Kliese (russell@kliese.id.au):
I am looking for suggestions on how to continue parsing even when attribute names are malformed.
When Closure was written, I think a lot of effort was put into making it correct errors, but that logic is just not 100% complete. We'd need a set of good test cases.
Unfortunately I don't have a ready-to-use patch at this point; do you have a suggestion?