Hello folks,
I'd like to share some thoughts on the "usual" problem of in-memory compilation in abcl. I haven't written any code to back them up - I just hope to make things clearer for myself and for other people on the list, and I'm asking you to share your ideas and comment on mine. Warning - this is going to be a long post.
Let's start by describing how abcl works now. Every piece of code in the JVM needs to be contained in a method of some class, so when abcl compiles a function it produces - surprise - a class, or rather, a stream of bytes that the JVM knows how to interpret to create a class. Classes produced by the compiler extend a common abcl class providing the methods that will be used to actually invoke the function; the compiler, among other things, will override some of those methods. When using the runtime compiler, the class is immediately loaded; when using the file compiler, it is stored on the filesystem for later use. So far, nothing requires temporary files to work. However, a Lisp function can contain nested functions introduced with FLET or LABELS, and those have to be compiled to classes as well. Also, they need to be loaded together with the main function. How does abcl solve this? By adding instructions to the main class that load the local functions when the class itself is loaded, relative to the location it was loaded from. And here lies the problem. That location is always assumed to be a file in some filesystem subtree. So even when using the runtime compiler, all the classes must be stored in files, at least temporarily, for the load machinery to work. Changing this is hard: classes are resolved from strings, so how do we know when a given string represents a file and when it represents some object in memory instead? There are workarounds, but I think it's the approach itself that's brittle.
So let's step back a bit and take a look at how the JVM and Java the language work with respect to loading classes. The JVM uses dedicated objects called classloaders. They are responsible for translating a class' symbolic name (a string) into a class metaobject, much like the CLOS find-class function does. Classloaders are organized hierarchically: every classloader has a parent, which is consulted first to see if it already has the class (there is of course a built-in bootstrap classloader to end the chain); if the parent has it, it returns it; if not, the class is loaded in a manner that depends on the particular classloader (e.g. from a file, from http, from memory, ...). Turning a byte array into a class is built into the JVM, so classloaders only get to decide where the byte array comes from and what it contains. Now to the more interesting things:
1. A class never exists in isolation; to do its work it will need to refer to other classes (at a bare minimum, its superclass and any interface it implements). The JVM - automatically! - uses the classloader that loaded a class to resolve, at link time, all of its dependencies.
2. If I had to manually redo in Java what the abcl compiler does with functions, I'd use static nested classes (loosely speaking, static "inner" classes) to represent local functions. Inner classes are classes which are textually defined inside another class and share some data with it. Inner classes do not exist as such at the bytecode level, only at the Java language level: the compiler (javac) translates them to regular classes, with their names mangled. For example, a class Inner defined inside a class Outer will be referred to as Outer.Inner in Java, but compiled to Outer$Inner.class by javac.
3. Inner classes are then treated exactly like the others: referred to by strings inside code and resolved by a classloader (generally - always? - by the classloader of the containing class).
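The naming and loader-sharing points above can be checked with a few lines of plain Java. This is only a minimal sketch: the class names Outer, Inner and NestedNames are made up for illustration and have nothing to do with abcl's actual classes.

```java
// Hypothetical demo classes; none of these names come from abcl.
class NestedNames {
    // The binary name the JVM sees for the nested class (javac mangles it with '$').
    static String binaryName() {
        return Outer.Inner.class.getName();
    }

    // True when the nested class was defined by the same loader as its container.
    static boolean sameLoader() {
        return Outer.Inner.class.getClassLoader() == Outer.class.getClassLoader();
    }

    // Resolve the nested class from a string, through the container's classloader.
    static Class<?> resolveByName() {
        String name = Outer.class.getName() + "$Inner";
        try {
            return Outer.class.getClassLoader().loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(binaryName());
        System.out.println(sameLoader());
        System.out.println(resolveByName() == Outer.Inner.class);
    }
}

class Outer {
    // A static nested class, standing in for a local function of the enclosing one.
    static class Inner {}
}
```

Nothing here is specific to nested classes: the lookup in resolveByName is an ordinary by-name resolution that happens to use the containing class' loader.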
Back to abcl. As you may have guessed, I propose that we no longer make classes autoload their dependencies, but properly use classloaders instead, in a fashion similar to how inner classes work. We will have an InMemoryClassLoader which will load classes from a Map<String, byte[]>, and a slightly extended URLClassLoader to load classes from the filesystem. Both, in addition to loading classes, will also be used by the compiler to write them, so the compiler will continue to use the class-file abstraction, changing only the code that actually writes the bytes. Every time the compiler would have emitted a call to loadCompiledFunction(classname), it will now emit something like functionFoo.class.getClassLoader().loadClass(classname), where functionFoo is the compiler-generated name of the class representing the compiled Lisp function. Everything else should stay the same.
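To make the in-memory half of the proposal a bit more concrete, here is a rough sketch of what such a loader could look like. Everything in it is an assumption made for illustration: the store method, the ClassBytes helper (which fabricates empty class files as a stand-in for real compiler output) and the names sketch.FunctionFoo and sketch.FunctionFoo$Local1 are hypothetical, not existing abcl code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InMemoryDemo {
    // Store a "function" class and its "local function", then resolve the local
    // one by name through the loader of the main one, as proposed in the text.
    static boolean localResolvesThroughOwner() {
        InMemoryClassLoader loader = new InMemoryClassLoader(InMemoryDemo.class.getClassLoader());
        loader.store("sketch.FunctionFoo", ClassBytes.emptyClass("sketch.FunctionFoo"));
        loader.store("sketch.FunctionFoo$Local1", ClassBytes.emptyClass("sketch.FunctionFoo$Local1"));
        try {
            Class<?> functionFoo = loader.loadClass("sketch.FunctionFoo");
            Class<?> local = functionFoo.getClassLoader().loadClass("sketch.FunctionFoo$Local1");
            return local.getClassLoader() == loader;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(localResolvesThroughOwner());
    }
}

// Loads classes from a Map<String, byte[]>; also the place the compiler writes to.
class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classes = new ConcurrentHashMap<>();

    InMemoryClassLoader(ClassLoader parent) {
        super(parent);
    }

    // "Write" side (hypothetical API): keep the bytes instead of creating a file.
    void store(String name, byte[] bytecode) {
        classes.put(name, bytecode);
    }

    // Called by loadClass only after the parent failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytecode = classes.get(name);
        if (bytecode == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}

// Stand-in for compiler output: a class file with no fields or methods.
class ClassBytes {
    static byte[] emptyClass(String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);                  // magic number
            out.writeShort(0);                         // minor version
            out.writeShort(52);                        // major version (Java 8)
            out.writeShort(5);                         // constant pool count (entries + 1)
            out.writeByte(7); out.writeShort(2);       // #1 Class -> #2
            out.writeByte(1); out.writeUTF(name.replace('.', '/')); // #2 Utf8: this class
            out.writeByte(7); out.writeShort(4);       // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");     // #4 Utf8: superclass
            out.writeShort(0x0021);                    // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                         // this_class
            out.writeShort(3);                         // super_class
            out.writeShort(0);                         // interfaces
            out.writeShort(0);                         // fields
            out.writeShort(0);                         // methods
            out.writeShort(0);                         // attributes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Overriding findClass rather than loadClass keeps the usual parent-first delegation, so abcl's own runtime classes would still come from the parent loader. The filesystem variant is not sketched here.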
Does this sound convincing? I admit I have left many things to elaborate on, and I haven't re-read the code in the compiler, going mainly from memory instead. But I believe this approach has not been proposed before and looks doable. Over the next few days I'll try writing some sketch code to back up my ideas, if no one finds any serious problem with them.
Peace, Alessio