forwarding
---------- Forwarded message ----------
From: Alessio Stalla alessiostalla@gmail.com
Date: Wed, Oct 28, 2009 at 11:20 PM
Subject: Re: [j-devel] Improving startup time: sanity check
To: Erik Huelsmann ehuels@gmail.com
Cc: armedbear-j-devel@lists.sourceforge.net, Alex Muscar muscar@gmail.com
On Wed, Oct 28, 2009 at 9:15 PM, Erik Huelsmann ehuels@gmail.com wrote:
Last weekend, we experimented with better autoloading. It turned out to shave roughly 0.4 seconds off a cold startup time of 1.7 seconds, roughly a 25% improvement.
However, the reason we started on the startup time improvements in the first place was ABCL's startup time on Google App Engine. It turns out that our CPU usage during startup hasn't really decreased much (as per their benchmark indicator - they can't give an actual figure).
So I asked for advice on #appengine (on freenode). Their reaction was "we can't imagine the startup time being related to the size of the JAR", even though Peter Graves calculated a 34% ratio between ABCL and Clojure jar sizes and a 35% ratio between startup times - which looks like a linear relationship. Their reaction continued: "you're probably just doing too much work during the init() phase."
The init() phase is where the ABCL environment gets loaded and all function objects get created.
Let's assume for a second that they're right. In that case we must assume it's not I/O holding us up: it's the work the CPU must do to get us up and running. If that's true, profiling the application should tell us something about the bottlenecks we're running into. I happen to have done quite a number of such profiles over the course of the last week. The conclusion that stands out is that ABCL, during the startup process, spends roughly 40% of its time finding class constructors: the main component of creating function objects.
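To make that concrete: the hot path is essentially a reflective lookup-and-instantiate sequence along the lines sketched below. This is a simplified stand-in for illustration only, not ABCL's actual loader code; the classloader and class name are placeholders.

import java.lang.reflect.Constructor;

// Simplified stand-in for the reflective work done per compiled function:
// locate the class, find its constructor, create the function object.
// Not ABCL's actual loader code; the names here are placeholders.
public class ReflectiveFunctionLoad {
    static Object instantiate(ClassLoader loader, String className) throws Exception {
        Class<?> c = loader.loadClass(className);   // locate the compiled-function class
        Constructor<?> ctor = c.getConstructor();   // constructor lookup: the ~40% hot spot
        return ctor.newInstance();                  // build the function object
    }
}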
This brought me to the conclusion that our startup process could be much faster if we delayed function object creation until the function is actually used: instead of constructing every function object as soon as one of its siblings is loaded, we would only create it the first time it is needed.
The idea is to create another Autoload derivative which will be "installed" in the appropriate places and which, when invoked, loads the actual class from the byte array. I'm hoping this will result in a more evenly spread "initialization load". The performance hit will only be on the first call to the function: after the function has been converted from the byte array, the autoload object will remove itself from the function call chain.
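A rough sketch of what such a proxy could look like follows. The names (LazyFunctionProxy, resolve) are hypothetical, not existing ABCL classes; in ABCL it would presumably be another Autoload subclass, but the mechanism is the same: keep the byte array, define and instantiate the class on first call, then get out of the way.

// Hypothetical sketch of the proposed proxy: the compiled function's
// bytecode is kept as a byte[], and the class is only defined and
// instantiated on the first call; afterwards the real function object
// should be installed in place of the proxy.
public class LazyFunctionProxy {
    private final String className;      // binary name of the compiled-function class
    private final byte[] classBytes;     // bytecode read eagerly, class defined lazily
    private volatile Object realFunction;

    public LazyFunctionProxy(String className, byte[] classBytes) {
        this.className = className;
        this.classBytes = classBytes;
    }

    // Called on the first invocation; the caller then replaces the proxy
    // with the returned object so it drops out of the call chain.
    public synchronized Object resolve() {
        if (realFunction == null) {
            try {
                Class<?> c = new ClassLoader(getClass().getClassLoader()) {
                    Class<?> define() {
                        return defineClass(className, classBytes, 0, classBytes.length);
                    }
                }.define();
                realFunction = c.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("failed to load " + className, e);
            }
        }
        return realFunction;
    }
}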
So, how about it? Comments most welcome!
I have mixed feelings about the idea. I think it's clever; but I also think we (I, at least) need more data to know whether it will actually be beneficial.
If the goal is speeding up startup time in a context like App Engine - where not only Lisp, but the whole user application, will be loaded from scratch from time to time - then it is critical to know how many Lisp functions a generic application uses on average (both directly and indirectly). If it turns out that, say, 50% of Lisp is commonly used, then no matter how clever an autoloading scheme you implement, you'll cut loading times by roughly 50% at best. If getting constructors through reflection is really the bottleneck, and if we determine that using new instead of reflection is significantly faster (from a quick test of mine, it seems it *really* is [1]), then it might be sensible to avoid reflection altogether and devise another scheme. For example, the compiler-generated class X could contain in its static initialization block the equivalent of something like
Lisp.someThreadLocal.set(new X())
and loadCompiledFunction (or whatever it's called) could just fetch the instance from the thread-local; not very elegant, but if it speeds things up...
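Concretely, a toy version of that handoff might look like the following; the names FUNCTION_HANDOFF and CompiledFn42 are made up for illustration and are not ABCL's.

// Toy illustration of the thread-local handoff; FUNCTION_HANDOFF stands in
// for Lisp.someThreadLocal and CompiledFn42 for a compiler-generated class.
public class ThreadLocalHandoff {
    static final ThreadLocal<Object> FUNCTION_HANDOFF = new ThreadLocal<>();

    // A compiler-generated function class would publish an instance of
    // itself from its static initializer instead of waiting to be
    // instantiated reflectively.
    public static class CompiledFn42 {
        static {
            FUNCTION_HANDOFF.set(new CompiledFn42());
        }
    }

    // Stand-in for loadCompiledFunction: force static initialization of
    // the class, then pick the instance up from the thread-local.
    static Object loadCompiledFunction(String className) throws ClassNotFoundException {
        Class.forName(className, true, ThreadLocalHandoff.class.getClassLoader());
        Object fn = FUNCTION_HANDOFF.get();
        FUNCTION_HANDOFF.remove();
        return fn;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadCompiledFunction("ThreadLocalHandoff$CompiledFn42"));
    }
}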
Alessio
[1] This is the astounding result of a couple of runs of 50000 iterations (test files attached; times in ns):

REFLECTION: 16262373155  NEW:  84267527  % SLOWER: 19298
REFLECTION: 15917190176  NEW: 103681915  % SLOWER: 15351
REFLECTION: 15838714133  NEW:  77235481  % SLOWER: 20507

i.e. reflection as we use it is roughly 150-200 times slower than new, and that's on a very simple class with no superclasses and a single constructor! The test might be wrong, as I wrote it quickly and it's quite tricky. It does use the very same classloader as abcl, though (copy-pasted).
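The attached test files are not reproduced here; a minimal stand-alone comparison in the same spirit (constructor lookup through reflection on every iteration versus plain new, but without abcl's classloader, so the numbers will not match those above) could look like this:

import java.lang.reflect.Constructor;

// Minimal stand-alone comparison of reflective instantiation vs. new.
// This is not the attached test: it skips abcl's classloader entirely,
// so the numbers will not match those quoted above.
public class ReflectionVsNew {
    public static class Probe {}   // trivial class: no superclass logic, one default constructor

    public static void main(String[] args) throws Exception {
        final int iterations = 50000;
        Object sink = null;        // keep results reachable so the JIT can't drop the loops

        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Constructor<Probe> ctor = Probe.class.getConstructor();  // lookup inside the loop
            sink = ctor.newInstance();
        }
        long reflectionNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = new Probe();
        }
        long newNs = System.nanoTime() - t1;

        System.out.println("REFLECTION: " + reflectionNs);
        System.out.println("NEW:        " + newNs);
        System.out.println("RATIO:      " + (double) reflectionNs / Math.max(newNs, 1));
        if (sink == null) throw new AssertionError();
    }
}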
Erik Huelsmann writes:
This brought me to the conclusion that our startup process could be much faster if we delayed function object creation until the function is actually used: instead of constructing every function object as soon as one of its siblings is loaded, we would only create it the first time it is needed.
I haven't followed the actual issue, but the autoloading stuff makes abcl-svn feel pretty unresponsive at times, for example when it has to load in the actual pretty-printer code when trying to print a backtrace.
If you want to add more laziness, please make it optional.
-T.
On Thu, Oct 29, 2009 at 10:49 AM, Tobias C. Rittweiler tcr@freebits.de wrote:
I haven't followed the actual issue, but the autoloading stuff makes abcl-svn feel pretty unresponsive at times, for example when it has to load in the actual pretty-printer code when trying to print a backtrace.
If you want to add more laziness, please make it optional.
We could add a function, say system:resolve-all-autoloads, that you can call in your init file. Then abcl's startup will be pretty long, but later it won't be unresponsive.
On Thu, Oct 29, 2009 at 12:02 PM, Alessio Stalla alessiostalla@gmail.com wrote:
On Thu, Oct 29, 2009 at 10:49 AM, Tobias C. Rittweiler tcr@freebits.de wrote:
I haven't followed the actual issue, but the autoloading stuff makes abcl-svn feel pretty unresponsive at times, for example when it has to load in the actual pretty-printer code when trying to print a backtrace.
If you want to add more laziness, please make it optional.
We could add a function, say system:resolve-all-autoloads, that you can call in your init file. Then abcl's startup will be pretty long, but later it won't be unresponsive.
Well, the thing I'm talking about now is expressly *not* to autoload all the stuff from a file at once: just load the byte arrays (which is not the performance bottleneck) once and leave it at that until a specific function is required for the first time. This speeds up the autoload itself, but makes the first call to each function slower.
This mechanism is thus not really the same kind of autoloading as the one we already have. However, the idea of having a proxy which delays some resolving/loading work is the same.
I hope that clarifies the intent.
Bye,
Erik.