Towards save-world for ABCL

14 Sep 2017

A limiting factor in the uptake and wider-scale use of ABCL, and Java
applications in general, has been startup time. Some native lisps have a
save-world or save-image facility
<https://stackoverflow.com/questions/39133421/how-to-properly-save-common-lisp-image-using-sbcl#39148561>
that lets you create a snapshot of a running lisp which can then be quickly
resumed. I've long wanted something like this for ABCL - loading the system
I work on (LSW <https://github.com/alanruttenberg/lsw2>) takes maybe 15
seconds on my reasonably fast laptop. This makes it annoying to run for
short tasks, or to run in a pipe with non-lisp tools. I also work with OWL
ontologies that can take 10s of minutes to classify. I'd very much like to
save that work so that I can fire off a quick query, or reduce development
time turnaround.

Recently I've started to play with docker
<https://www.docker.com/community-edition> and found that there is starting
to be support for CRIU <https://github.com/xemul/criu>, which allows one to
save the state of linux processes/containers and then restart them. I've
just run a first successful test of this and I wanted to share the recipe,
see if others have poked at this, see if there is interest in thinking
about / implementing more transparent support for this for ABCL, and in
case it is useful, as is, for someone.

1. You may need to install support for CRIU into docker. I had to do this
with docker for mac <https://www.docker.com/docker-mac>, which is what I'm
running. I think it's built into docker for linux, and am not sure about
docker for windows. To install the mac support I followed instructions at
https://hub.docker.com/r/boucher/criu-for-mac/

run: *docker run --rm -it --privileged --pid=host boucher/criu-for-mac*

2. Start a container that loads up ABCL and give it enough time to load
(about 20 seconds in my case)

*docker run -id lsw2/devel /home/lsw/repos/lsw2/bin/lsw*

Note that the shell prompt returns immediately. There's no automatic way,
at the moment, to see when the lisp has finished loading.

In this case lsw2/devel is my image built according to instructions at
https://github.com/alanruttenberg/lsw2/tree/owlapiv4/virtual-machine

/home/lsw/repos/lsw2/bin/lsw is the command that runs ABCL and loads my
system inside the container.

Replace these with your own, or use Mark Evenson's easye/abcl as an image
to try (or use his Dockerfile
<https://github.com/armedbear/abcl/blob/master/Dockerfile> as a base).
That image starts ABCL by itself, so you don't need to do it on the docker
run command line as I do above.

The -i switch hooks up stdin and the -d switch lets the container sit
running in the background. Once ABCL is finished starting, the process in
the container is effectively sitting at the repl prompt, waiting for input,
but its stdin isn't connected to anything at the moment.
Note that if you use the -t switch, as one commonly sees in docker
examples, the checkpoint stage below will fail.
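
On the point above that there is no automatic way to see when the lisp has
finished loading: one possible workaround is to poll the container's output
until something that looks like the REPL prompt shows up. This is only a
sketch. The function name wait_for_repl, the DOCKER override variable, and
the default pattern "CL-USER" are all assumptions made for illustration, and
whether the prompt actually reaches *docker logs* depends on how your startup
script buffers its output.

```shell
#!/bin/sh
# Sketch: block until a container's output contains a "ready" marker.
# DOCKER may be overridden (e.g. with a stub); the default is the real CLI.
DOCKER=${DOCKER:-docker}

# wait_for_repl CONTAINER [PATTERN] [TIMEOUT_SECONDS]
# PATTERN is an assumption: the ABCL prompt usually starts with "CL-USER",
# but your startup command may print something different.
wait_for_repl() {
    container=$1
    pattern=${2:-CL-USER}
    timeout=${3:-120}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # docker logs replays everything the container has written so far
        if $DOCKER logs "$container" 2>&1 | grep -q -- "$pattern"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out waiting for '$pattern' in $container" >&2
    return 1
}
```

Something like *wait_for_repl <container-id> && docker checkpoint create
<container-id> <checkpoint-name>* could then replace the fixed wait before
step 4.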

3. run: *docker ps* to get the Container ID for the container just created.

4. run: *docker checkpoint create <container-id> <checkpoint-name>*
<container-id> is the id you just got from docker ps.
<checkpoint-name> is a label you choose for the saved-state.

On completion of the command the container stops running.

5. run: *docker start --checkpoint <checkpoint-name> <container-id>*

After a much quicker startup (a fraction of a second), the container is
running again. You can interact with it by

6. run: *docker attach <container-id>*

Hit return and you'll get the REPL prompt and can type something in to
eval. Use control-d (EOF) to detach. You can reattach the same way again.
Type (quit) to the repl, or run *docker kill <container-id>* to stop the
container for good and exit.
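
Steps 2 through 5 can be strung together in a small script. The following is
a sketch rather than a tested tool: the function name snapshot_and_resume and
the DOCKER override variable are made up for illustration, and the fixed
warm-up sleep is a crude stand-in for knowing when the lisp has finished
loading. It relies on *docker run -d* printing the new container's ID, which
makes the *docker ps* lookup in step 3 unnecessary.

```shell
#!/bin/sh
# Sketch of steps 2-5: start, wait, checkpoint, restart from the checkpoint.
# DOCKER may be overridden (e.g. with a stub); the default is the real CLI.
DOCKER=${DOCKER:-docker}

# snapshot_and_resume IMAGE CHECKPOINT_NAME WARMUP_SECONDS [COMMAND...]
# Prints the container ID on success.
snapshot_and_resume() {
    image=$1
    checkpoint=$2
    warmup=$3
    shift 3
    # Step 2: -i keeps stdin open, -d detaches. No -t, or checkpointing fails.
    cid=$($DOCKER run -id "$image" "$@") || return 1
    # Give the lisp time to load (assumption: a fixed delay is good enough).
    sleep "$warmup"
    # Step 4: save the process state; the container stops when this completes.
    $DOCKER checkpoint create "$cid" "$checkpoint" >/dev/null || return 1
    # Step 5: restart from the saved state.
    $DOCKER start --checkpoint "$checkpoint" "$cid" >/dev/null || return 1
    echo "$cid"
}
```

With my image this would be invoked as *cid=$(snapshot_and_resume lsw2/devel
warm 20 /home/lsw/repos/lsw2/bin/lsw)*, followed by *docker attach "$cid"* as
in step 6. The checkpoint name "warm" is arbitrary.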

--

Obviously there is MUCH more that needs to be done to make this easy and
practical, e.g.

- packaging: While one can package up an image that a container can start
from and easily serve it, the checkpoints are not easily attachable to
images. Rather, they are specific to containers, and saving process state
isn't coupled with saving file-system state. That could be managed with some
wrapper scripts that checkpoint on first run, etc. I'm also not sure of
the mechanics of running several containers from the same checkpoint, at
the same time. I'm guessing that sort of thing will be more directly
supported in docker within a year or 2, if not sooner.

- Doing this without docker: While using docker means that you can create
an artifact that runs on all platforms, it may be easier to package up an
application that will certainly run under linux to use CRIU directly. It
may even be more practical to use CRIU inside the container instead of
docker checkpoint. dunno.

- reliability: I haven't run many tests yet to try to ensure that the
checkpointed process doesn't have any problems - I've just tried simple
things at the prompt. I haven't experimented with, e.g. running a web
server inside ABCL and seeing if there are any issues with ports or network
processes on restart.

- restart hooks: some things may not be able to be saved. We'll need a
provision for hooks that run when restarting and documentation about when
they have to be used.
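
As a rough illustration of the "checkpoint on first run" wrapper mentioned
under packaging, here is an untested sketch. The function name
resume_or_create, the DOCKER override variable, and the idea of keying
everything on a fixed container name are assumptions made for this example.
It also assumes the named container is stopped when the wrapper is called and
that a checkpoint can be restored more than once, neither of which has been
verified here.

```shell
#!/bin/sh
# Sketch: checkpoint on first run, resume from the checkpoint afterwards.
# DOCKER may be overridden (e.g. with a stub); the default is the real CLI.
DOCKER=${DOCKER:-docker}

# resume_or_create NAME IMAGE CHECKPOINT_NAME WARMUP_SECONDS [COMMAND...]
resume_or_create() {
    name=$1
    image=$2
    checkpoint=$3
    warmup=$4
    shift 4
    if $DOCKER checkpoint ls "$name" 2>/dev/null | grep -q -w -- "$checkpoint"; then
        # Later runs: the checkpoint already exists, so just resume from it.
        $DOCKER start --checkpoint "$checkpoint" "$name" >/dev/null
    else
        # First run: start under a fixed name, let the lisp load, checkpoint.
        $DOCKER run -id --name "$name" "$image" "$@" >/dev/null || return 1
        sleep "$warmup"
        $DOCKER checkpoint create "$name" "$checkpoint" >/dev/null || return 1
        $DOCKER start --checkpoint "$checkpoint" "$name" >/dev/null
    fi
}
```

Because checkpoints belong to a container rather than an image, this only
helps on the machine where the first run happened. It does not address
shipping a checkpoint along with an image, or running several containers from
one checkpoint.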

Alan Ruttenberg
