Parsing a GitHub project page takes forever; the call below is still running when the 10-second timeout fires.
(setf sgml::*parse-warn-level* 5)
(let ((page (drakma:http-request "https://github.com/rss-sync/corpus")))
  (handler-case
      (bt:with-timeout (10)
        (chtml:parse page (make-instance 'hax:default-handler)))
    (condition (c) c)))
#<bordeaux-threads:timeout #x302009C80C6D>
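Multiple <div> elements inside a <thead> element are the root cause. With parser warnings enabled, a minimal example shows the same no-op patch being reported over and over: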
(setf sgml::*parse-warn-level* 0)
0
(chtml:parse "<table><thead><div>a</div><div>b</div></thead></table>" (make-instance 'hax:default-handler))
;; Parser warning: Line 1, column 26 : **** [-] Saw <div> in thead -- nuked <div>.
;; Parser warning: Line 1, column 31 : **** [H] Saw </div> in thead -- ??? patched (</div> <div>) -> (<div> </div>)
;; Parser warning: Line 1, column 31 : **** [-] Saw <div> in thead -- nuked <div>.
;; Parser warning: Line 1, column 32 : **** [H] Saw </div> in thead -- ??? patched (</div> <pcdata>) -> (<pcdata> </div>)
;; Parser warning: Line 1, column 32 : **** [-] Saw <pcdata> in thead -- nuked <pcdata>.
;; Parser warning: Line 1, column 38 : **** [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
;; Parser warning: Line 1, column 38 : **** [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
;; Parser warning: Line 1, column 38 : **** [H] Saw </div> in thead -- ??? patched (</div> </div>) -> (</div> </div>)
…
So far, I'm not clever enough to fix this.
- ben
P.S. Thanks for the awesome library.