On 2/24/10 5:25 PM, Ville Voutilainen wrote:
On 24 February 2010 17:55, Mark Evenson <evenson@panix.com> wrote:
Everything seems to work, but I notice that I seem to have 35 failing ANSI compiled tests, whereas I thought we should only have 34. N.B. We should really mark the expected failures, because chasing the set differences of these failure lists always gives me headaches for what should be a simple task.
It's not so difficult to compare failure lists. Dump the ansi test results into files and diff.
Explicitly marking which tests are expected to fail has the property that one doesn't have to compare anything with anything. Checking the tests would then provide immediate, unmistakable feedback.
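To make the idea concrete, here is a minimal sketch of such a check, assuming the failing tests of a run are available as a list of symbols. The names *EXPECTED-FAILURES* and CHECK-FAILURES and the test names FOO.1, BAR.2 and BAZ.3 are hypothetical placeholders, not part of ABCL or the ANSI test suite.

```lisp
;; Hypothetical list of tests known to fail on the baseline.
(defparameter *expected-failures* '(foo.1 bar.2))

(defun check-failures (actual &optional (expected *expected-failures*))
  "Return two values: the failures in ACTUAL that were not expected,
and the expected failures that no longer appear in ACTUAL."
  (values (set-difference actual expected)
          (set-difference expected actual)))

;; A run in which FOO.1 still fails, BAR.2 now passes, and BAZ.3 is a
;; new failure:
;; (check-failures '(foo.1 baz.3))  =>  (BAZ.3), (BAR.2)
```

If both returned values are NIL, the run matches the expectations exactly and nothing needs to be compared by hand.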
I wrote the attached utility based on SET-DIFFERENCE to parse and report on a simple s-expr based database of test results, and I have also attached the test results I have collected over the last day. These results now contradict my earlier email, showing:
CL-USER> (difference 'compileit 'r12506 '0.18.0)
R12506: 34 failures
NIL
0.18.0: 35 failures
(MAKE-BROADCAST-STREAM.8)
NIL
CL-USER> (difference 'doit 'r12506 '0.18.0)
R12506: 33 failures
NIL
0.18.0: 34 failures
(MAKE-BROADCAST-STREAM.8)
which shows we actually improved our coverage in this development cycle (Ville fixed MAKE-BROADCAST-STREAM.8 in r12397).
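For readers without the attachment, a comparison of this kind could be sketched roughly as follows. This is not the attached utility: the database layout, the FAILURES helper, the version names OLD and NEW, and the test names are all hypothetical. Only the name DIFFERENCE and the general shape of its report follow the transcript above.

```lisp
;; Hypothetical database: each entry is ((test-kind version) . failing-tests).
(defparameter *results*
  '(((compileit old) foo.1 bar.2 baz.3)
    ((compileit new) foo.1 bar.2)))

(defun failures (kind version)
  "Look up the list of failing tests recorded for KIND and VERSION."
  (cdr (assoc (list kind version) *results* :test #'equal)))

(defun difference (kind version-1 version-2)
  "Report the failure count of each version, followed by the tests
that fail only in that version."
  (let ((f1 (failures kind version-1))
        (f2 (failures kind version-2)))
    (format t "~A: ~D failures~%~S~%"
            version-1 (length f1) (set-difference f1 f2))
    (format t "~A: ~D failures~%~S~%"
            version-2 (length f2) (set-difference f2 f1))))
```

With the placeholder data, (difference 'compileit 'new 'old) reports two failures for NEW with nothing unique to it, and three failures for OLD with BAZ.3 as the only test failing there alone.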
INVOKE-DEBUGGER.1 and WITH-STANDARD-IO-SYNTAX.23 shouldn't be there - they didn't fail in my baseline test (taken on 0.18 release) and they didn't fail the last time I ran the tests on trunk.
My testing shows that the WITH-STANDARD-IO-SYNTAX.23 failure has been present since 0.18.0. Perhaps this is an OS X-specific thing? I'll test that out next.