[closure-devel] Array access out of bounds in Closure HTML's sgml parser
We're using Closure HTML and Drakma to extract information from Web pages. We've run across an intermittent fault with one page in particular from YouTube. We had a little difficulty reproducing the bug at first, but we discovered that YouTube was sending us different contents each time. We ran our code in a loop and captured several hundred deliveries of the Web page in question until we got another instance that failed.

I've put a copy of the HTML that trips the bug up at

    http://www.deepsky.com/~tuxedo/youtube-sgml-breaker.html

You can see the problem by loading closure-html and drakma and evaluating this form:

    (chtml:parse
     (drakma:http-request "http://www.deepsky.com/~tuxedo/youtube-sgml-breaker.html")
     (chtml:make-lhtml-builder))

On SBCL 1.0.53, I'm getting this error:

    Index 8192 out of bounds for (SIMPLE-ARRAY CHARACTER (8192)), should be nonnegative and <8192.
       [Condition of type SB-INT:INVALID-ARRAY-INDEX-ERROR]

The error is raised in SGML::READ-LITERAL. I only vaguely understand what's going on in that function. I note that it's raising the error when it's parsing the big block of flashvar-related stuff on line 244 of the HTML file, and if I delete or add an extra character earlier in that line, I can make the error go away. I infer that there's something happening in the character decoding at the point where it needs to grow the buffer that's making it lose, but I can't figure out just what it is.

Keith Browne