We're using Closure HTML and Drakma to extract information from Web pages. We've run across an intermittent fault with one page in particular from YouTube. We had some difficulty reproducing the bug at first because, as we discovered, YouTube was sending us different content on each request. We ran our code in a loop and captured several hundred deliveries of the page in question until we got another instance that failed.
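For anyone who wants to repeat the capture, the loop was roughly of the following shape. This is a minimal sketch rather than our actual code: CAPTURE-FAILING-INPUT, FETCH, PARSE and MAX-TRIES are names made up for illustration, and the fetch and parse steps are passed in as functions so the sketch itself depends on nothing beyond standard Common Lisp.

```lisp
;; Sketch only: CAPTURE-FAILING-INPUT and its parameter names are
;; hypothetical, not part of Drakma or Closure HTML.
(defun capture-failing-input (fetch parse &key (max-tries 500))
  "Call FETCH up to MAX-TRIES times, passing each result to PARSE.
Return (values input condition attempt) for the first input on which
PARSE signals an ERROR, or NIL if every attempt parses cleanly."
  (loop for attempt from 1 to max-tries
        do (let ((input (funcall fetch)))
             (handler-case (funcall parse input)
               (error (condition)
                 ;; Keep the exact text that broke the parser so it can
                 ;; be saved and replayed later.
                 (return (values input condition attempt)))))))

;; Intended use, with Drakma and Closure HTML loaded (the URL here is
;; a placeholder, not the real page):
;;
;; (capture-failing-input
;;  (lambda () (drakma:http-request "http://example.com/page"))
;;  (lambda (html) (chtml:parse html (chtml:make-lhtml-builder))))
```

Returning the input along with the condition is the point of the exercise, since the page content changes from one request to the next and the failing copy can't be fetched again on demand.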
I've put a copy of the HTML that triggers the bug at
http://www.deepsky.com/~tuxedo/youtube-sgml-breaker.html
You can see the problem by loading closure-html and drakma and evaluating this form:
(chtml:parse
 (drakma:http-request "http://www.deepsky.com/~tuxedo/youtube-sgml-breaker.html")
 (chtml:make-lhtml-builder))
On SBCL 1.0.53, I'm getting this error:
Index 8192 out of bounds for (SIMPLE-ARRAY CHARACTER (8192)), should be nonnegative and <8192.
   [Condition of type SB-INT:INVALID-ARRAY-INDEX-ERROR]
The error is raised in SGML::READ-LITERAL. I only vaguely understand what's going on in that function. I note that it's raising the error when it's parsing the big block of flashvar-related stuff on line 244 of the HTML file, and if I delete or add an extra character earlier in that line, I can make the error go away. I infer that there's something happening in the character decoding at the point where it needs to grow the buffer that's making it lose, but I can't figure out just what it is.
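The add-a-character experiment can be automated along these lines. Again this is only a sketch under assumptions: INSERT-CHAR-AT, PARSE-FAILS-P and SHIFT-SENSITIVE-P are hypothetical helper names, PARSE stands for whatever one-argument function does the parsing, and it assumes the page is already in hand as a string.

```lisp
;; Sketch only: these three helpers are hypothetical names, written in
;; plain Common Lisp with no library dependencies.
(defun insert-char-at (string position &optional (char #\Space))
  "Return a copy of STRING with CHAR inserted before index POSITION."
  (concatenate 'string
               (subseq string 0 position)
               (string char)
               (subseq string position)))

(defun parse-fails-p (parse input)
  "Return the condition if PARSE signals an ERROR on INPUT, else NIL."
  (handler-case (progn (funcall parse input) nil)
    (error (condition) condition)))

(defun shift-sensitive-p (parse input position)
  "True if PARSE fails on INPUT but succeeds once one character is
inserted at POSITION, i.e. the failure depends on alignment."
  (and (parse-fails-p parse input)
       (not (parse-fails-p parse (insert-char-at input position)))))

;; Intended use, with Closure HTML loaded and HTML bound to the page text:
;;
;; (shift-sensitive-p
;;  (lambda (s) (chtml:parse s (chtml:make-lhtml-builder)))
;;  html
;;  0)
```

A true result for a position before the failing literal would be consistent with the buffer-boundary guess, though it wouldn't by itself show where in READ-LITERAL the index goes wrong.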
Keith Browne