A limiting factor in the uptake and wider-scale use of ABCL, and of Java applications in general, has been startup time. Some native lisps have a save-world or save-image facility https://stackoverflow.com/questions/39133421/how-to-properly-save-common-lisp-image-using-sbcl#39148561 that lets you create a snapshot of a running lisp which can then be quickly resumed. I've long wanted something like this for ABCL - loading the system I work on (LSW https://github.com/alanruttenberg/lsw2) takes maybe 15 seconds on my reasonably fast laptop. This makes it annoying to run for short tasks, or to run in a pipe with non-lisp tools. I also work with OWL ontologies that can take 10s of minutes to classify. I'd very much like to save that work so that I can fire off a quick query, or reduce development turnaround time.
Recently I've started to play with docker https://www.docker.com/community-edition and found that there is starting to be support for CRIU https://github.com/xemul/criu, which allows one to save the state of linux processes/containers and then restart them. I've just run a first successful test of this and I wanted to share the recipe, see if others have poked at this, see if there is interest in thinking about / implementing more transparent support for this for ABCL, and in case it is useful, as is, for someone.
1. You may need to install support for CRIU into docker. I had to do this with docker for mac https://www.docker.com/docker-mac, which is what I'm running. I think it's built into docker for linux, and am not sure about docker for windows. To install the mac support I followed instructions at https://hub.docker.com/r/boucher/criu-for-mac/
run: *docker run --rm -it --privileged --pid=host boucher/criu-for-mac*
2. Start a container that loads up ABCL and give it enough time to load (about 20 seconds in my case)
*docker run -id lsw2/devel /home/lsw/repos/lsw2/bin/lsw*
Note that the shell prompt returns immediately. There's no automatic way, at the moment, to see when the lisp has finished loading.
In this case lsw2/devel is my image built according to instructions at https://github.com/alanruttenberg/lsw2/tree/owlapiv4/virtual-machine
/home/lsw/repos/lsw2/bin/lsw is the command that runs ABCL and loads my system inside the container.
Replace these with your own, or use Mark Evenson's easye/abcl as an image to try (or use his Dockerfile https://github.com/armedbear/abcl/blob/master/Dockerfile as a base). That image starts ABCL by itself, so you don't need to do it on the docker run command line as I do above.
The -i switch hooks up stdin and the -d switch lets the container sit running in the background. Once ABCL is finished starting, the process in the container is effectively sitting at the repl prompt, waiting for input, but its stdin isn't connected to anything at the moment. Note that if you use the -t switch, as one commonly sees in docker examples, the checkpoint stage below will fail.
3. run: *docker ps* to get the Container ID for the container just created.
4. run: *docker checkpoint create <container-id> <checkpoint-name>*
<container-id> is the id you just got from docker ps. <checkpoint-name> is a label you choose for the saved state.
On completion of the command the container stops running.
5. run: *docker start --checkpoint <checkpoint-name> <container-id>*
After a much quicker startup (a fraction of a second), the container is running again. You can interact with it by attaching:
6. run: *docker attach <container-id>*
Hit return and you'll get the REPL prompt and can type something in to eval. Use control-d (EOF) to detach. You can reattach the same way again. Type (quit) to the repl, or run *docker kill <container-id>* to stop the container for good and exit.
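For anyone who wants to try this without retyping the steps, here is a rough sketch of steps 2-5 as shell functions. It is not something I'd call finished. The function and variable names are made up for illustration, the prompt text CL-USER(1): is an assumption about what the REPL prints, and I'm assuming the prompt actually shows up in *docker logs* when the container runs without a tty (if it doesn't, the wait simply times out and the checkpoint is taken anyway). It relies on *docker run -d* printing the new container's id, which saves the trip through docker ps.

```shell
#!/bin/sh
# Sketch only: wraps steps 2-5 of the recipe in shell functions.
# All variable and function names are illustrative, not part of docker or ABCL.

DOCKER="${DOCKER:-docker}"                    # set DOCKER=echo for a dry run
IMAGE="${IMAGE:-lsw2/devel}"                  # image used in step 2
CMD="${CMD:-/home/lsw/repos/lsw2/bin/lsw}"    # command that starts ABCL
PROMPT="${PROMPT:-CL-USER(1):}"               # ASSUMED text of the REPL prompt
LOAD_TIMEOUT="${LOAD_TIMEOUT:-60}"            # give up waiting after this many seconds

# wait_until_loaded <container-id>
# Polls the container's output once a second until the prompt text appears.
# Returns 0 if it was seen, 1 if LOAD_TIMEOUT seconds passed without it.
wait_until_loaded() {
    waited=0
    while [ "$waited" -lt "$LOAD_TIMEOUT" ]; do
        if $DOCKER logs "$1" 2>&1 | grep -qF "$PROMPT"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# start_and_checkpoint <checkpoint-name>
# Starts the container with -i and -d (no -t), waits for the lisp to load,
# creates the checkpoint, and prints the container id on stdout.
start_and_checkpoint() {
    # $CMD is left unquoted on purpose so it may carry arguments.
    cid=$($DOCKER run -id "$IMAGE" $CMD) || return 1
    wait_until_loaded "$cid" || echo "prompt not seen; checkpointing anyway" >&2
    $DOCKER checkpoint create "$cid" "$1" >/dev/null || return 1
    printf '%s\n' "$cid"
}

# restore <container-id> <checkpoint-name>
# Restarts the stopped container from the named checkpoint.
restore() {
    $DOCKER start --checkpoint "$2" "$1"
}
```

Typical use would be to capture the id with cid=$(start_and_checkpoint loaded), then call restore "$cid" loaded and finally *docker attach* with that id, as in step 6.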
--
Obviously there is MUCH more that needs to be done to make this easy and practical, e.g.
- packaging: While one can package up an image that a container can start from and easily serve it, the checkpoints are not easily attachable to images. Rather, they are specific to containers, and saving process state isn't coupled with saving file-system state. That could be managed with some wrapper scripts that checkpoint on first run, etc. I'm also not sure of the mechanics of running several containers from the same checkpoint at the same time. I'm guessing that sort of thing will be more directly supported in docker within a year or two, if not sooner.
- Doing this without docker: While using docker means that you can create an artifact that runs on all platforms, it may be easier to package up an application that will certainly run under linux to use CRIU directly. It may even be more practical to use CRIU inside the container instead of docker checkpoint. dunno.
- reliability: I haven't run many tests yet to try to ensure that the checkpointed process doesn't have any problems - I've just tried simple things at the prompt. I haven't experimented with, e.g. running a web server inside ABCL and seeing if there are any issues with ports or network processes on restart.
- restart hooks: some things may not be able to be saved. We'll need a provision for hooks that run when restarting and documentation about when they have to be used.
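To make the packaging point a bit more concrete, a wrapper that checkpoints on first run might look roughly like the following. This is a sketch under assumptions, not a tested tool: the names (fast_start, STATE_FILE and so on) are hypothetical, the fixed sleep stands in for a real "finished loading" check, and it assumes a checkpoint can be restored more than once into the same container as long as that container is stopped between runs. Since checkpoints belong to a container rather than an image, the wrapper remembers the container id in a file.

```shell
#!/bin/sh
# Sketch of a "checkpoint on first run" wrapper. Every name below is illustrative.

DOCKER="${DOCKER:-docker}"                        # set DOCKER=echo for a dry run
IMAGE="${IMAGE:-lsw2/devel}"                      # image to start from
CMD="${CMD:-/home/lsw/repos/lsw2/bin/lsw}"        # command that starts ABCL
CHECKPOINT="${CHECKPOINT:-loaded}"                # label for the saved state
LOAD_WAIT="${LOAD_WAIT:-20}"                      # seconds to let the lisp load
STATE_FILE="${STATE_FILE:-./abcl-container.id}"   # remembers which container owns the checkpoint

# fast_start
# First call: slow start, wait, checkpoint, record the container id.
# Every call (including the first): restart that container from the checkpoint.
fast_start() {
    if [ ! -s "$STATE_FILE" ]; then
        # $CMD is left unquoted on purpose so it may carry arguments.
        cid=$($DOCKER run -id "$IMAGE" $CMD) || return 1
        sleep "$LOAD_WAIT"
        # Creating the checkpoint also stops the container.
        $DOCKER checkpoint create "$cid" "$CHECKPOINT" >/dev/null || return 1
        printf '%s\n' "$cid" > "$STATE_FILE"
    fi
    cid=$(cat "$STATE_FILE")
    $DOCKER start --checkpoint "$CHECKPOINT" "$cid"
}
```

After fast_start returns, the container id is in the state file and can be handed to *docker attach*. This does nothing about file-system state or about running several containers from one checkpoint; those remain open questions.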
Alan Ruttenberg
armedbear-devel@common-lisp.net