On Wed, Apr 15, 2009 at 8:55 PM, Boris Smilga <boris.smilga@gmail.com> wrote:
> On Tue, Apr 14, 2009 at 1:53 PM, Henrik Hjelte <henrik@evahjelte.com> wrote:
>> One thing you might want to know before upgrading: when comparing the performance test cases, the new version seems a bit slower on my SBCL (see below).
> To be sure, that's 200% deceleration, and it stays that way if the COUNT in the test is increased 10, 100, etc. times. “A bit” seems like an understatement here. I'm afraid that's the price we pay for dynamic customization, as most of the accrued run time is used up (prima facie) by the handler invocation machinery, which is there for every little char! Methinks there is an evident way to optimize this, and I'm going to try it out (tomorrow if I have time).
Guess who is out in left field again... Bypassing the handler mechanism does not, per se, gain much (my first impression was evidently wrong); moreover, checking whether to take the standard or the handler-free track actually impairs performance.
However, I was able to pinpoint one factor which is responsible for about half of the performance regression (in the DECODER-PERFORMANCE test, the deceleration is down to ≈80% from ≈160% on Darwin/PPC, and to ≈120% from ≈190% on FreeBSD/i386). This factor was the use of vector accumulators in member handlers as opposed to list accumulators, the same thing that caused the test failures in SBCL 1.0.27. Ironically, it was intended as an optimization. (CLISP is the only implementation where it does work as one, and even there the improvement is slight; in the others, the effect is negative.)
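For illustration, the two accumulation styles look roughly like the following. This is only a minimal sketch: the function names ACCUMULATE-IN-LIST and ACCUMULATE-IN-VECTOR are hypothetical and this is not the actual cl-json handler code. It shows the shape of the two approaches and says nothing by itself about which is faster on a given implementation.

```lisp
;; Hypothetical sketch, not the actual cl-json member handlers.

(defun accumulate-in-list (items)
  "Collect ITEMS by pushing onto a list, then reverse once at the end."
  (let ((acc '()))
    (dolist (item items)
      (push item acc))          ; O(1) cons per element
    (nreverse acc)))            ; restore the original order

(defun accumulate-in-vector (items)
  "Collect ITEMS in an adjustable vector with a fill pointer,
then convert the result to a list."
  (let ((acc (make-array 8 :adjustable t :fill-pointer 0)))
    (dolist (item items)
      (vector-push-extend item acc)) ; may have to grow the vector
    (coerce acc 'list)))
```

Both functions return the elements in their original order; the difference is only in the intermediate storage, which is where the run time differs between implementations.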
The attached patch discards vector accumulators altogether. I'll see whether more can be done to improve performance.
- B. Sm.