Hello folks,
I'd like to share some thoughts on the "usual" problem of in-memory compilation in abcl. I haven't written any code to support my thoughts - I just hope to make things clearer for me and other people on the list, and I'm asking you to share your ideas and comment on mine. Warning - this is going to be a long post.
Let's start by describing how abcl works now. Every piece of code in the JVM needs to be contained in a method of some class, so when abcl compiles a function it produces - surprise - a class, or better, a stream of bytes that the JVM knows how to interpret to create a class. Classes produced by the compiler extend a common abcl class providing the methods that will be used to actually invoke the function; the compiler, among other things, will override some of those methods. When using the runtime compiler, the class is immediately loaded; when using the file compiler, it is stored on the filesystem for later use.
So far, nothing requires temporary files to work. However, a Lisp function can contain nested functions introduced with FLET or LABELS, and those have to be compiled to classes as well. Also, they need to be loaded together with the main function. How does abcl solve this? By adding instructions in the main class to load the local functions when the class itself is loaded, relative to where it is loaded from. And here lies the problem. That location is always assumed to be a file in some filesystem subtree. So even when using the runtime compiler all the classes must be, at least temporarily, stored in files for the load machinery to work. Changing this is hard; classes are resolved from strings, so how do we know when a given string represents a file and when it represents some object in memory instead? There are workarounds, but I think it's the approach itself that's brittle.
So let's step back a bit and take a look at how the JVM and Java the language work with respect to loading classes. The JVM uses dedicated objects called classloaders. They are responsible for translating from a class's symbolic name (a string) to a class metaobject, much like the CLOS find-class function does. Classloaders are organized hierarchically: every classloader has a parent which is consulted first to see if it already has the class (there is of course a built-in bootstrap classloader to break the circularity); if it has, it returns it; if it hasn't, the class is loaded in a manner dependent on the particular classloader (e.g. from a file, from http, from memory, ...). The process of loading a class from a byte array is native in the JVM, so classloaders only get to decide where the byte array comes from and what it contains. Now to the more interesting things:
1. A class never exists in isolation; to do its work it will need to refer to other classes (at a bare minimum, its superclass and any interface it implements). The JVM - automatically! - uses the same classloader to load a class and, at linking time, all of its dependencies.
2. If I had to manually redo in Java what the abcl compiler does with functions, I'd use static inner classes to represent local functions. Inner classes are classes which are textually defined inside another class and share some data with it. Inner classes do not exist at the bytecode level, only at the Java language level: the compiler (javac) translates them to regular classes, with their name mangled. For example, a class Inner defined inside a class Outer will be referred to as Outer.Inner in Java, but compiled to Outer$Inner.class by javac.
3. Inner classes then are treated exactly like the others: referred to using strings inside code, resolved by a classloader (generally - always? - by the classloader of the containing class).
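The name mangling in point 2 and the shared classloader in point 3 can be observed directly from Java. This is only a small sketch; Outer and Inner are the hypothetical names from the example above, not abcl classes.

```java
// Hypothetical names mirroring the Outer/Inner example in the text.
class Outer {
    // A static nested class: javac emits it as its own class file, Outer$Inner.class.
    static class Inner {}

    public static void main(String[] args) {
        // JVM-level (binary) name, which uses '$' as the separator.
        System.out.println(Inner.class.getName());
        // Source-level (canonical) name, which uses '.' as the separator.
        System.out.println(Inner.class.getCanonicalName());
        // Both classes were defined by the same classloader.
        System.out.println(Outer.class.getClassLoader() == Inner.class.getClassLoader());
    }
}
```

Nothing in the nested class's bytecode ties it to a file next to Outer; it is found purely by name through whichever classloader defined Outer.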
Let's return to abcl. As you may have guessed, I propose that we no longer make classes autoload their dependencies, but properly use classloaders instead, in a fashion similar to how inner classes work. We will have an InMemoryClassLoader which will load classes from a Map<String, byte[]>, and a slightly extended URLClassLoader to load classes from the filesystem. Both, in addition to loading classes, will be used by the compiler to write classes as well, so it will continue to use the class-file abstraction, changing only the code that actually writes the bytes. Every time the compiler would have written a call to loadCompiledFunction(classname) it will now use something like functionFoo.class.getClassLoader().loadClass(classname) where functionFoo is the compiler-generated name of the class representing the compiled Lisp function. Everything else should stay the same.
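To make the proposal a bit more concrete, here is a minimal sketch of what such an InMemoryClassLoader could look like. It is not abcl code: the store method, the emptyClass helper and the class names foo41/foo42 are assumptions made up for illustration, and emptyClass merely stands in for the compiler by emitting a valid but empty class file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class InMemoryLoading {

    /** Hypothetical loader: resolves class names against a map of class-file bytes. */
    static final class InMemoryClassLoader extends ClassLoader {
        private final Map<String, byte[]> classes = new ConcurrentHashMap<>();

        InMemoryClassLoader(ClassLoader parent) { super(parent); }

        /** The compiler would call this instead of writing a temporary file. */
        void store(String name, byte[] bytes) { classes.put(name, bytes); }

        /** Invoked by loadClass only after the parent loader failed to find the class. */
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] bytes = classes.get(name);
            if (bytes == null) throw new ClassNotFoundException(name);
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Stand-in for compiler output: a minimal class file with no fields or methods. */
    static byte[] emptyClass(String name) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                                // magic number
        out.writeShort(0);                                       // minor version
        out.writeShort(52);                                      // major version (Java 8)
        out.writeShort(5);                                       // constant pool count = entries + 1
        out.writeByte(7); out.writeShort(2);                     // #1 Class, name at #2
        out.writeByte(1); out.writeUTF(name.replace('.', '/'));  // #2 Utf8: this class
        out.writeByte(7); out.writeShort(4);                     // #3 Class, name at #4
        out.writeByte(1); out.writeUTF("java/lang/Object");      // #4 Utf8: superclass
        out.writeShort(0x0021);                                  // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                       // this_class = #1
        out.writeShort(3);                                       // super_class = #3
        out.writeShort(0);                                       // interfaces count
        out.writeShort(0);                                       // fields count
        out.writeShort(0);                                       // methods count
        out.writeShort(0);                                       // attributes count
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        InMemoryClassLoader loader =
                new InMemoryClassLoader(InMemoryLoading.class.getClassLoader());
        loader.store("foo41", emptyClass("foo41"));
        loader.store("foo42", emptyClass("foo42"));

        Class<?> functionFoo = loader.loadClass("foo41");
        // The pattern from the text: ask the loader of an already loaded class.
        Class<?> local = functionFoo.getClassLoader().loadClass("foo42");
        System.out.println(local.getName() + " defined by our loader: "
                + (local.getClassLoader() == loader));
    }
}
```

Only findClass is overridden, so the usual parent-first delegation stays in place and the map is consulted just for names that no parent knows about.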
Does this sound convincing? I admit I have left many things to elaborate on, and I haven't rehashed the code in the compiler, going mainly from memory instead. But I believe this approach has not been proposed before and looks doable. The next few days I'll try writing some sketch code to back up my ideas, if no-one finds any serious problem with them.
Peace, Alessio
Alessio Stalla writes:
Let's start by describing how abcl works now. Every piece of code in the JVM needs to be contained in a method of some class, so when abcl compiles a function it produces - surprise - a class, or better, a stream of bytes that the JVM knows how to interpret to create a class.
This already seems counterintuitive to me. Consider this: (defun f (x) (lambda (y) (+ y x))) I hope that (f 1) does not have to create a new class. I hope that it creates a new "object" that is a member of a class that is only created once. Furthermore, a form like the defun above when interpreted does not have to create a new class, does it? It creates a new object of the class interpreted-function containing an fdefinition slot (field? member? whatever you call those data items in a class) which is a lisp object like (lambda (x) (lambda (y) (+ y x))).
I would hope that there is also a class compiled-function containing as one of its slots a byte code vector. Then the result of compiling a function in memory or loading a compiled function would be an instance of this compiled-function class. I see no need for subclasses of that class.
Classes produced by the compiler extend a common abcl class providing the methods that will be used to actually invoke the function; the
I argue that there is no need for the compiler to create new classes. It should do something like new compiledFunction(byteCodeVector)
compiler, among other things, will override some of those methods. When using the runtime compiler, the class is immediately loaded; when using the file compiler, it is stored on the filesystem for later use. So far, nothing requires temporary files to work. However, a Lisp function can contain nested functions introduced with FLET or LABELS, and those have to be compiled to classes as well.
Into additional objects of type compiled-function, still not requiring any new classes.
Does this model violate any requirements of Java? Does java already have a class of compiled code objects that could be used, i.e., where creating a vector full of byte codes gives you something that can be directly executed? (Or maybe it has to go through some verifier first?) Would this approach solve all of your problems related to class loaders and temporary files? Maybe it would solve some other problems that you've not yet mentioned?
On Mon, 21 Sep 2009 21:48:57 -0700 don-sourceforge-xxz@isis.cs3-inc.com (Don Cohen) wrote:
Alessio Stalla writes:
Let's start by describing how abcl works now. Every piece of code in the JVM needs to be contained in a method of some class, so when abcl compiles a function it produces - surprise - a class, or better, a stream of bytes that the JVM knows how to interpret to create a class.
This already seem counterintuitive to me.
If you are compiling code to the JVM where would the code go but in a method? If you are defining a method, where would you put it but in a class.
Matt
Matthew D. Swank writes:
If you are compiling code to the JVM where would the code go but in a method? If you are defining a method, where would you put it but in a class.
I don't know what you mean by compiling code "to the jvm". Code is anything that can be interpreted by some interpreter and comes in many forms. There are even many different forms of byte code. I'd expect byte code to be represented in byte vectors. Suppose we don't want it to be jvm byte code, but some other sort, maybe clisp byte code. We can then write one single java function (in a method of a class) that can interpret any vector of our (clisp) byte code. We then declare a single class

    class compiledFunction {
        byte[] code;
        compiledFunction(byte[] codevector) { code = codevector; }
    }
    compiledFunction f = new compiledFunction(vector);

We then interpret that by calling interpret(f).
In this case the compiled code is in a byte vector, or perhaps you would say it's in a compiledFunction object. I don't think you'd say it's in a class.
If you can do that with clisp byte codes then why not with jvm byte codes? The only obvious difference is that the interpret function for jvm byte code vectors is already in the jvm so you don't even have to write it.
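For concreteness, the single-class model described above could be sketched as follows. The opcodes are invented for this illustration (they are neither clisp nor JVM byte codes), and interpret takes the function's argument as a second parameter, which is an addition to the interpret(f) call mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ToyByteCode {
    // Hypothetical opcodes, made up for this sketch.
    static final byte PUSH_ARG = 0;    // push the function's argument
    static final byte PUSH_CONST = 1;  // push the next byte as a constant
    static final byte ADD = 2;         // pop two values, push their sum
    static final byte RETURN = 3;      // pop a value and return it

    /** One class for all compiled functions: it only holds a code vector. */
    static final class CompiledFunction {
        final byte[] code;
        CompiledFunction(byte[] code) { this.code = code.clone(); }
    }

    /** The one interpreter method that can run any such code vector. */
    static int interpret(CompiledFunction f, int arg) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < f.code.length) {
            switch (f.code[pc++]) {
                case PUSH_ARG:   stack.push(arg); break;
                case PUSH_CONST: stack.push((int) f.code[pc++]); break;
                case ADD:        stack.push(stack.pop() + stack.pop()); break;
                case RETURN:     return stack.pop();
                default: throw new IllegalArgumentException("unknown opcode at " + (pc - 1));
            }
        }
        throw new IllegalStateException("code vector ended without RETURN");
    }

    public static void main(String[] args) {
        // Roughly (lambda (y) (+ y 1)) in the toy instruction set.
        CompiledFunction f =
                new CompiledFunction(new byte[] {PUSH_ARG, PUSH_CONST, 1, ADD, RETURN});
        System.out.println(interpret(f, 41));
    }
}
```

Every function here is an instance of the same class, and no new class is created per function.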
On Mon, 21 Sep 2009 22:58:25 -0700 don-sourceforge-xxz@isis.cs3-inc.com (Don Cohen) wrote:
Matthew D. Swank writes:
If you are compiling code to the JVM where would the code go but in a method? If you are defining a method, where would you put it but in a class.
I don't know what you mean by compiling code "to the jvm".
Well in the context of abcl, a Common Lisp implementation that _targets_ the JVM, perhaps we could have a more fruitful discussion if you focus on what jvm bytecodes represent: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html.
No matter how you are interacting with a class loader, no matter what the concrete representation of the byte code is in the compiler implementation, when it's loaded, when java sees it, it's all classes and methods.
It's not clisp bytecode, it's not p-code, it's not even Squeak s-code (though that's probably closer), it's Java. That means the issues are very concrete: how is it possible to make compiled classes available to the JVM (presumably w/o using the file system).
Matt
Matthew D. Swank writes:
Well in the context of abcl, a Common Lisp implementation that _targets_ the JVM, perhaps we could have a more fruitful discussion if you focus on what jvm bytecodes represent: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html.
No matter how you are interacting with a class loader, no matter what
I don't understand why you have to interact with a class loader in order to create a compiled function. Perhaps you're saying that the only way to create executable code in JVM is via the class loader. In other words that JVM does not support methods that make a compiled function object out of a byte vector, and doesn't even give you any way to build such a thing. I've looked at the jvm spec a little, but it takes a lot of searching and analysis to know for sure that something like this CANNOT be done. If it cannot then I would tend to view that as an unfortunate omission from the spec. Is there any reason the spec should not allow that? The only thing I can see is that the designers of the language just didn't imagine it being used this way.
the concrete representation of the byte code is in the compiler implementation, when it's loaded, when java sees it, it's all classes and methods. It's not clisp bytecode, it's not p-code, it's not even Squeak s-code (though that's probably closer), it's Java. That means the issues are very concrete: how is it possible to make compiled classes available to the JVM (presumably w/o using the file system).
I don't see why compiled code can only come from compiled classes. But if all compiled code has to be in a class then I also don't see why compiled classes (is there any other kind?) have to come from the class loader. Why are they not objects that can be created by the JVM by code like new Class(...). Perhaps that's what the class loader is, but it only supports something like new Class(classfile) ? Again I don't see why it should not support new Class on some other sort of object specifying the class and allowing byte vectors as the representation of compiled code. Why should it not just allow creation of classes by new Class() and then allow you to add functions (and data fields, and subclasses, etc) to the class?
Could this be a difference in perspective between java and lisp programmers/implementers/designers?
On Tue, Sep 22, 2009 at 9:03 AM, Don Cohen don-sourceforge-xxz@isis.cs3-inc.com wrote:
Matthew D. Swank writes:
> Well in the context of abcl, a Common Lisp implementation that _targets_ the JVM, perhaps we could have a more fruitful discussion if you focus on what jvm bytecodes represent: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html.
>
> No matter how you are interacting with a class loader, no matter what
I don't understand why you have to interact with a class loader in order to create a compiled function. Perhaps you're saying that the only way to create executable code in JVM is via the class loader. In other words that JVM does not support methods that make a compiled function object out of a byte vector, and doesn't even give you any way to build such a thing. I've looked at the jvm spec a little, but it takes a lot of searching and analysis to know for sure that something like this CANNOT be done. If it cannot then I would tend to view that as an unfortunate omission from the spec. Is there any reason the spec should not allow that? The only thing I can see is that the designers of the language just didn't imagine it being used this way.
I'm not a language lawyer, but I have some experience in Java programming and I assure you that the JVM cannot execute code which is not in a method of a class. Even constructors and static initialization blocks - which apparently by looking at the Java source are not methods - are implemented with methods when compiled. In other words, the JVM is natively object oriented, it has no concept of function, only methods.
> the concrete representation of the byte code is in the compiler implementation, when it's loaded, when java sees it, it's all classes and methods.
> It's not clisp bytecode, it's not p-code, it's not even Squeak s-code (though that's probably closer), it's Java. That means the issues are very concrete: how is it possible to make compiled classes available to the JVM (presumably w/o using the file system).
I don't see why compiled code can only come from compiled classes. But if all compiled code has to be in a class then I also don't see why compiled classes (is there any other kind?) have to come from the class loader. Why are they not objects that can be created by the JVM by code like new Class(...). Perhaps that's what the class loader is,
Exactly, the classloader is the object responsible for doing the equivalent of new Class(...)
but it only supports something like new Class(classfile) ? Again I don't see why it should not support new Class on some other sort of object specifying the class and allowing byte vectors as the representation of compiled code.
It doesn't only support loading from files. Out of the box there are classloaders that know how to load code from arbitrary URLs, from JARs inside WARs (packaged web applications), and more; you can write your own and even generate code on the fly if you want, but, no matter what, the end result will always be a class with methods holding the code.
Why should it not just allow creation of classes by new Class() and then allow you to add functions (and data fields, and subclasses, etc) to the class?
Because a Java class is unmodifiable; once loaded, it is carved in stone. You can arbitrarily construct the bytecode representing the class adding fields and methods as you wish, before loading it - that's what the abcl compiler does, in fact. Adding a subclass does not modify the superclass, you merely create a new class.
Could this be a difference in perspective between java and lisp programmers/implementers/designers?
Absolutely, the JVM does not seem to have been designed with dynamic languages in mind, although it is dynamic enough to run a Lisp...
I hope I have made things a little bit clearer.
Cheers, Alessio
Alessio Stalla writes:
In other words, the JVM is natively object oriented, it has no concept of function, only methods.
This is not, in my mind, the same thing as being natively object oriented. A function could be an object too. If you wish I'll be happy to view a function as a method of a class that is nothing more than a container for that function.
It doesn't only supports loading from files. Out of the box there are classloaders that know how to load code from arbitrary URLs, from JARs inside WARs (packaged web applications), and more; you can write your own and even generate code on the fly if you want, but, no matter what, the end result will always be a class with methods holding the code.
Good, so you should be able to write a "classloader" that supports creation of an anonymous subclass of "compiledFunction" (of which there might never be any instances) given a single argument of type byte vector.
On Tue, Sep 22, 2009 at 9:32 AM, Don Cohen don-sourceforge-xxz@isis.cs3-inc.com wrote:
Alessio Stalla writes:
> In other words, the JVM is natively object oriented, it has no concept of function, only methods.
This is not, in my mind, the same thing as being natively object oriented. A function could be an object too. If you wish I'll be happy to view a function as a method of a class that is nothing more than a container for that function.
Exactly, we agree. The point I was trying to make is that the container class must exist, so the compiler must create it.
> It doesn't only supports loading from files. Out of the box there are classloaders that know how to load code from arbitrary URLs, from JARs inside WARs (packaged web applications), and more; you can write your own and even generate code on the fly if you want, but, no matter what, the end result will always be a class with methods holding the code.
Good, so you should be able to write a "classloader" that supports creation of an anonymous subclass of "compiledFunction" (of which there might never be any instances) given a single argument of type byte vector.
Anonymous classes don't exist in the JVM, but apart from that, you have just described the abcl classloader ;) However, my original point was not about the general compilation model of abcl, which is more than fine by me; rather, it was about the specific fact that the generated code includes instructions to load the other code it needs (compiled local functions), and I think that is a Bad Idea and we can use classloaders properly to avoid it.
Bye, Ale
2009/9/22 Alessio Stalla alessiostalla@gmail.com:
have just described the abcl classloader ;) However, my original point was not about the general compilation model of abcl, which is more than fine by me; rather, it was about the specific fact that the generated code includes instructions to load the other code it needs (compiled local functions), and I think that is a Bad Idea and we can use classloaders properly to avoid it.
How exactly do we emit the instructions to load the other code that the classes need? AFAIK the classloader approach loads referenced classes automatically if they're not already present, without us having to load/autoload anything. Thus the classloader approach may end up being simpler after all.
On Tue, Sep 22, 2009 at 9:56 AM, Ville Voutilainen ville.voutilainen@gmail.com wrote:
2009/9/22 Alessio Stalla alessiostalla@gmail.com:
have just described the abcl classloader ;) However, my original point was not about the general compilation model of abcl, which is more than fine by me; rather, it was about the specific fact that the generated code includes instructions to load the other code it needs (compiled local functions), and I think that is a Bad Idea and we can use classloaders properly to avoid it.
How exactly do we emit the instructions to load the other code that the classes need? AFAIK the classloader approach loads referenced classes automatically if they're not already present, without us having to load/autoload anything. Thus the classloader approach may end up being simpler after all.
That's what I think too, however currently the compiler emits calls to loadCompiledFunction(classname) bypassing the standard classloader machinery.
2009/9/22 Alessio Stalla alessiostalla@gmail.com:
That's what I think too, however currently the compiler emits calls to loadCompiledFunction(classname) bypassing the standard classloader machinery.
That's odd. If we have a classloader present that's capable of loading our fasls, JVM will consult that loader when it needs a class. Therefore there should be no need to emit loadCompiledFunction calls, because the loader will do that automatically?
On Tue, Sep 22, 2009 at 10:32 AM, Ville Voutilainen ville.voutilainen@gmail.com wrote:
2009/9/22 Alessio Stalla alessiostalla@gmail.com:
That's what I think too, however currently the compiler emits calls to loadCompiledFunction(classname) bypassing the standard classloader machinery.
That's odd. If we have a classloader present that's capable of loading our fasls, JVM will consult that loader when it needs a class. Therefore there should be no need to emit loadCompiledFunction calls, because the loader will do that automatically?
No, because IIRC we never ask the JVM for a class, we always do everything on our own :D i.e. there's never a reference to a compiler-generated class name, besides the string passed to loadCompiledFunction. Our classloader is only used to load the main class (top-level function) directly from the byte[], then everything else is loaded by the class itself.
Ale
Alessio Stalla writes:
Exactly, we agree. The point I was trying to make is that the container class must exist, so the compiler must create it.
Or the compiler could create only the code vector and the class loader could create the class.
Good, so you should be able to write a "classloader" that supports creation of an anonymous subclass of "compiledFunction" (of which there might never be any instances) given a single argument of type byte vector.
Anonymous classes don't exist in the JVM, but apart from that, you
(Does this mean that you have to look for names that don't yet exist and worry that later you're going to be asked to create a class of the same name that you made up?)
have just described the abcl classloader ;) However, my original point
That does not need any temporary files, right?
was not about the general compilation model of abcl, which is more than fine by me; rather, it was about the specific fact that the generated code includes instructions to load the other code it needs (compiled local functions), and I think that is a Bad Idea and we can use classloaders properly to avoid it.
I guess you mean that when you call compile the result could be something like a list of pairs of code vectors and generated names and that a class loader for that list (multiple class loader?) could create classes for all of them and allow them to refer to each other as necessary. If in lisp you do (compile (defun f(x) (g x))) when g is not yet defined, what does f compile into? Can you load a class that refers to another class that does not exist? How does that work if the classes are all immutable?
On Tue, Sep 22, 2009 at 9:59 AM, Don Cohen don-sourceforge-xxz@isis.cs3-inc.com wrote:
Alessio Stalla writes:
> Exactly, we agree. The point I was trying to make is that the container class must exist, so the compiler must create it.
Or the compiler could create only the code vector and the class loader could create the class.
Right, that's what happens right now, however in my view the code vector and the class are two representations of the same thing, that is, when the compiler dumps the code vector it is writing the class in binary form.
> > Good, so you should be able to write a "classloader" that supports creation of an anonymous subclass of "compiledFunction" (of which there might never be any instances) given a single argument of type byte vector.
> Anonymous classes don't exist in the JVM, but apart from that, you
(Does this mean that you have to look for names that don't yet exist and worry that later you're going to be asked to create a class of the same name that you made up?)
Yes, or do like the abcl compiler does - invent class names sufficiently weird that you can safely hope no one will use the same name, and if one actually will, that's his problem, not yours ;)
> have just described the abcl classloader ;) However, my original point
That does not need any temporary files, right?
Right, it doesn't.
> was not about the general compilation model of abcl, which is more than fine by me; rather, it was about the specific fact that the generated code includes instructions to load the other code it needs (compiled local functions), and I think that is a Bad Idea and we can use classloaders properly to avoid it.
I guess you mean that when you call compile the result could be something like a list of pairs of code vectors and generated names and that a class loader for that list (multiple class loader?) could create classes for all of them and allow them to refer to each other as necessary.
The point is that classloaders already work like this! When classloader X loads a class, and that class refers to class C, X is asked by the JVM to load C as well. In abcl we are not using this feature, rather we use an ad-hoc loading model that needs temporary files. I propose to replace that completely and adopt the JVM way.
If in lisp you do (compile (defun f(x) (g x))) when g is not yet defined, what does f compile into? Can you load a class that refers to another class that does not exist? How does that work if the classes are all immutable?
Afaik you cannot have forward-referenced classes, or better: you can only until the referring class must be 'resolved' (dynamically linked). I don't know precisely when that happens but surely before instantiation. However note that your example will probably be compiled by abcl into a call like G.getSymbolFunction().execute(x) i.e. G will not be resolved to a class at compile time (else how would you handle redefinition?). However, what I'm talking about is (compile 'f (lambda (x) (flet ((g (k) k)) (g x)))) - here g is known at compile time, and compiled to its own class. What I propose is this: suppose f is compiled to a class named foo41 and g to foo42, then in one of foo41's methods you'll have code like LispObject g = new foo42(). When loading foo41 we'll use our abcl-specific classloader which knows how to resolve foo42 too (because it knows where to read it from memory or from the file system, depending on the type of compilation), and everything magically works (I hope :)
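A rough sketch of the mechanism follows, with loud caveats. The names RecordingLoader and classBytes are made up, and the classes are empty class files written by hand rather than compiler output. To keep the byte generation short, foo41 refers to foo42 through its superclass link instead of a `new foo42()` inside a method; a superclass is loaded eagerly when the class is defined, whereas a reference from method code would be resolved later, but in both cases the JVM asks the loader that defined foo41.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DependencyResolution {

    /** Hypothetical in-memory loader that records which classes it was asked to find. */
    static final class RecordingLoader extends ClassLoader {
        final Map<String, byte[]> classes = new HashMap<>();
        final List<String> found = new ArrayList<>();

        RecordingLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] bytes = classes.get(name);
            if (bytes == null) throw new ClassNotFoundException(name);
            found.add(name); // recorded before defineClass triggers further loads
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Minimal class file: no fields or methods, only a name and a superclass. */
    static byte[] classBytes(String name, String superName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                                     // magic number
        out.writeShort(0);                                            // minor version
        out.writeShort(52);                                           // major version (Java 8)
        out.writeShort(5);                                            // constant pool count
        out.writeByte(7); out.writeShort(2);                          // #1 Class, name at #2
        out.writeByte(1); out.writeUTF(name.replace('.', '/'));       // #2 Utf8: this class
        out.writeByte(7); out.writeShort(4);                          // #3 Class, name at #4
        out.writeByte(1); out.writeUTF(superName.replace('.', '/'));  // #4 Utf8: superclass
        out.writeShort(0x0021);                                       // ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                            // this_class = #1
        out.writeShort(3);                                            // super_class = #3
        out.writeShort(0);                                            // interfaces count
        out.writeShort(0);                                            // fields count
        out.writeShort(0);                                            // methods count
        out.writeShort(0);                                            // attributes count
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        RecordingLoader loader =
                new RecordingLoader(DependencyResolution.class.getClassLoader());
        loader.classes.put("foo42", classBytes("foo42", "java.lang.Object"));
        loader.classes.put("foo41", classBytes("foo41", "foo42"));

        // We only ask for foo41; the JVM comes back to the same loader for foo42.
        loader.loadClass("foo41");
        System.out.println(loader.found);
    }
}
```

The code never requests foo42 explicitly, which is the part of the loading work the generated classes would no longer have to do themselves.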
Cheers, Ale
Alessio Stalla writes:
The point is that classloaders already work like this! When classloader X loads a class, and that class refers to class C, X is asked by the JVM to load C as well. In abcl we are not using this feature, rather we use an ad-hoc loading model that needs temporary files. I propose to replace that completely and adopt the JVM way.
Ok, I think I finally understand all you said initially. I'm sorry to have put you to so much trouble explaining it. I hope a few others will benefit from this explanation.
... However note that your example will probably be compiled by abcl into a call like G.getSymbolFunction().execute(x)
Ok, that's a reasonable escape hatch.
i.e. G will not be resolved to a class at compile time (else how would you handle redefinition?). However, what I'm talking about is (compile 'f (lambda (x) (flet ((g (k) k)) (g x)))) - here g is known at compile time, and compiled to its own class.
(I understood that. The question above was a new one that arose in my mind from what you had been describing.)
What I propose is this: suppose f is compiled to a class named foo41 and g to foo42, then in one of foo41's methods you'll have code like LispObject g = new foo42(). When loading foo41 we'll use our abcl-specific classloader which knows how to resolve foo42 too (because it knows where to read it from memory or from the file system, depending on the type of compilation), and everything magically works (I hope :)
Sounds plausible but I can see why you say you hope. Good luck. And thanks for your patience.
On Tue, Sep 22, 2009 at 10:28 AM, Don Cohen don-sourceforge-xxz@isis.cs3-inc.com wrote:
Alessio Stalla writes:
> The point is that classloaders already work like this! When classloader X loads a class, and that class refers to class C, X is asked by the JVM to load C as well. In abcl we are not using this feature, rather we use an ad-hoc loading model that needs temporary files. I propose to replace that completely and adopt the JVM way.
Ok, I think I finally understand all you said initially. I'm sorry to have put you to so much trouble explaining it. I hope a few others will benefit from this explanation.
No problem at all, I myself find this explanation much better than the first one :D
> ... However note that your example will probably be compiled by abcl into a call like G.getSymbolFunction().execute(x)
Ok, that's a reasonable escape hatch.
I believe it works like this, but I haven't the sources at hand to check it.
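As a sketch of the idea only (these are not the actual abcl classes; Symbol, LispFunction and their methods are simplified stand-ins I'm assuming here), the indirection through the symbol's function cell is what lets G be redefined after F has been compiled:

```java
final class SymbolIndirection {

    /** Hypothetical stand-in for a compiled Lisp function of one argument. */
    interface LispFunction {
        Object execute(Object arg);
    }

    /** Hypothetical stand-in for a Lisp symbol with a mutable function cell. */
    static final class Symbol {
        private final String name;
        private volatile LispFunction function;

        Symbol(String name) { this.name = name; }

        LispFunction getSymbolFunction() {
            LispFunction f = function;
            if (f == null) throw new IllegalStateException("undefined function: " + name);
            return f;
        }

        void setSymbolFunction(LispFunction f) { function = f; }
    }

    static final Symbol G = new Symbol("G");

    // Roughly what (defun f (x) (g x)) could compile to: G is looked up on every call,
    // so F never refers to the class that happens to implement G.
    static final LispFunction F = x -> G.getSymbolFunction().execute(x);

    public static void main(String[] args) {
        G.setSymbolFunction(x -> (Integer) x + 1);
        System.out.println(F.execute(1));
        G.setSymbolFunction(x -> (Integer) x * 10); // redefinition, F is untouched
        System.out.println(F.execute(1));
    }
}
```

Calling F before G has any definition fails at call time, not at load time, which matches the forward-reference case discussed above.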
> i.e. G will not be resolved to a class at compile time (else how would you handle redefinition?). However, what I'm talking about is (compile 'f (lambda (x) (flet ((g (k) k)) (g x)))) - here g is known at compile time, and compiled to its own class.
(I understood that. The question above was a new one that arose in my mind from what you had been describing.)
> What I propose is this: suppose f is compiled to a class named foo41 and g to foo42, then in one of foo41's methods you'll have code like LispObject g = new foo42(). When loading foo41 we'll use our abcl-specific classloader which knows how to resolve foo42 too (because it knows where to read it from memory or from the file system, depending on the type of compilation), and everything magically works (I hope :)
Sounds plausible but I can see why you say you hope. Good luck.
I say hope because I don't believe in magic :D
And thanks for your patience.
I have great patience, however I needn't use it to answer your posts; making things clear for others makes them clearer to me, so it's not a waste (provided the questions are intelligent, and yours were).
Peace, A.
On Tue, Sep 22, 2009 at 2:03 AM, Don Cohen don-sourceforge-xxz@isis.cs3-inc.com wrote:
Matthew D. Swank writes:
> Well in the context of abcl, a Common Lisp implementation that _targets_ the JVM, perhaps we could have a more fruitful discussion if you focus on what jvm bytecodes represent: http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html.
>
> No matter how you are interacting with a class loader, no matter what
I don't understand why you have to interact with a class loader in order to create a compiled function. Perhaps you're saying that the only way to create executable code in JVM is via the class loader. In other words that JVM does not support methods that make a compiled function object out of a byte vector, and doesn't even give you any way to build such a thing. I've looked at the jvm spec a little, but it takes a lot of searching and analysis to know for sure that something like this CANNOT be done.
Maybe you would have an easier time showing it CAN be done by providing an implementation in the context of ABCL?
-- Gaby
Don Cohen writes:
I don't know what you mean by compiling code "to the jvm". Code is anything that can be interpreted by some interpreter and comes in many forms. There are even many different forms of byte code.
The most interesting aspect of ABCL is that it can compile Lisp to JVM in a fashion that allows (fsvo) arbitrary intermixture between Java and Lisp, i.e. Java sees Lisp as Java, Lisp sees Java as Lisp. A natural intermixture, no ffi involved. Hence, ABCL cannot fence itself into a closed world as you seem to suggest.
-T.
There is a little more to the explanation of how ABCL loads its FASLs that should be included for completeness, namely the use of the so-called init-fasl Lisp code. For every compilation unit ("packed FASL") there is a section of Lisp code that is read in by the Java implementation of LOAD that contains forms like:
    (SYSTEM:INIT-FASL :VERSION 32)
    (SETQ SYSTEM:*SOURCE* #P"/Users/evenson/work/abcl/dir.lisp")
    (SYSTEM:%IN-PACKAGE "CL-USER")
    (DEFPARAMETER *TEST-PATHNAME* *LOAD-TRUENAME*)
    (SYSTEM:FSET 'RUN (SYSTEM:LOAD-COMPILED-FUNCTION "dir-1.cls") 71 'NIL NIL)
    (SYSTEM:FSET 'RUN-2 (SYSTEM:LOAD-COMPILED-FUNCTION "dir-2.cls") 485 'NIL NIL)
Special variables, like *TEST-PATHNAME* here, are declared, the proper IN-PACKAGE forms are emitted, and then the forms at the end issue the loadCompiledFunction() calls to load all of the top-level compilands created by compiling "dir.lisp", which are then FSET to the proper functions. It is classes like "dir-1.cls" that may contain further static initializers.
Any new load mechanism consisting purely of Java code would have to do the equivalent on the Java side, presumably in a static initializer of some kind. The current implementation keeps things simple by restricting what sort of Java classes can exist: they can only represent functions. I suppose the static initializer might be able to feed a string containing these forms to EVAL.
Alessio's musings here are interesting, but I think there is a simpler solution to the problem of removing the need for intermediate temporary files on the filesystem: changing the behavior of loadCompiledFunction() to look for some sort of special variable (like *LOAD-TRUENAME*?) to figure out where to find the bytes to turn into a class that can be loaded by the JVM. By "simpler solution" I mean something that can be worked out without drastic impact on the current codebase.
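The special-variable idea in the paragraph above could look roughly like the following on the Java side. This is a sketch under assumptions, not abcl code: the class CompiledFunctionBytes, its methods, and the use of a ThreadLocal as a stand-in for a dynamically bound Lisp special variable are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

final class CompiledFunctionBytes {
    /** Stand-in for a special variable; null means "no binding, read a file". */
    private static final ThreadLocal<Function<String, byte[]>> SOURCE = new ThreadLocal<>();

    /** Runs body with SOURCE rebound and restores the previous value afterwards. */
    static <T> T withSource(Function<String, byte[]> source, Supplier<T> body) {
        Function<String, byte[]> previous = SOURCE.get();
        SOURCE.set(source);
        try {
            return body.get();
        } finally {
            SOURCE.set(previous);
        }
    }

    /** What a modified loadCompiledFunction() might call to obtain class bytes. */
    static byte[] bytesFor(String name) {
        Function<String, byte[]> source = SOURCE.get();
        if (source != null) {
            byte[] bytes = source.apply(name);
            if (bytes != null) {
                return bytes;
            }
        }
        // No binding, or the bound source does not know the name: use the file system.
        try {
            return Files.readAllBytes(Paths.get(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes, not a real class file.
        Map<String, byte[]> inMemory = Map.of("dir-1.cls", new byte[] {1, 2, 3});
        byte[] bytes = withSource(inMemory::get, () -> bytesFor("dir-1.cls"));
        System.out.println(bytes.length + " bytes found without touching the file system");
    }
}
```

The runtime compiler would bind the source around the load, while the file compiler would leave it unbound, so existing call sites that pass a name like "dir-1.cls" would not need to change.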
On Tue, Sep 22, 2009 at 3:13 PM, Mark Evenson evenson@panix.com wrote:
[…]
I'd like to point out that I'm not advocating a pure Java solution: the Lisp init-fasl code is perfectly fine and I don't think it should be changed. As for impact on the codebase, I might agree with you; I still need to look closely at the code to really understand the implications of what I'm suggesting. The compiler is a complex piece, maybe the most complex of abcl, and I know and understand only a tiny part of it.

I think that, as a personal experiment, I'll start changing the implementation of LOAD-COMPILED-FUNCTION to just do something like (pseudocode) new 'sys::*compiler-classloader-class*'().loadClass("dir-1.cls"), move the current loadCompiledFunction to, say, AbclSystemFileClassLoader.findClass(), and change the compiler to stop emitting loadCompiledFunction calls and refer to the class directly instead. That should preserve the current behavior yet demonstrate my solution. If it turns out to be working and actually simplifying things, I will report back and, as a next step, go on and try to add the ability to work in memory.
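The pseudocode above could be sketched in plain Java as follows. Everything here is an assumption for illustration: the static field stands in for sys::*compiler-classloader-class*, TracingClassLoader is a made-up loader that only records what it is asked for, and java.util.ArrayList stands in for a compiled function class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader: delegates to its parent but records each requested name. */
class TracingClassLoader extends ClassLoader {
    static final List<String> requested = new ArrayList<>();

    public TracingClassLoader() {
        super();
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        requested.add(name);
        return super.loadClass(name);
    }
}

final class CompiledFunctionLoader {
    /** Hypothetical counterpart of sys::*compiler-classloader-class*. */
    static Class<? extends ClassLoader> compilerClassLoaderClass = TracingClassLoader.class;

    /** Instantiates the configured loader, loads the class, and instantiates it. */
    static Object loadCompiledFunction(String className) {
        try {
            ClassLoader loader = compilerClassLoaderClass.getDeclaredConstructor().newInstance();
            return loader.loadClass(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        Object function = loadCompiledFunction("java.util.ArrayList");
        System.out.println(function.getClass().getName() + " requested: "
                + TracingClassLoader.requested);
    }
}
```

Swapping the loader (file-based, jar-based, in-memory) then means assigning a different class to the field, with no change to the callers of loadCompiledFunction.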
Bye, Ale
On 9/22/09 4:52 PM, Alessio Stalla wrote: […]
> I think that, as a personal experiment, I'll start changing the implementation of LOAD-COMPILED-FUNCTION to just do something like (pseudocode) new 'sys::*compiler-classloader-class*'().loadClass("dir-1.cls"), move the current loadCompiledFunction to, say, AbclSystemFileClassLoader.findClass(), and change the compiler to stop emitting loadCompiledFunction calls and refer to the class directly instead. That should preserve the current behavior yet demonstrate my solution. If it turns out to be working and actually simplifying things, I will report back and, as a next step, go on and try to add the ability to work in memory.
Great! I think experimentation with the codebase is a great way to proceed, but hopefully you aren't deterred from posting your musings here. I, for one, really appreciate both!
Mark
2009/9/22 Mark Evenson evenson@panix.com:
> Great! I think experimentation with the codebase is a great way to proceed, but hopefully you aren't deterred from posting your musings here. I, for one, really appreciate both!
Me too, I'm looking forward to what this experiment leads to. I wonder why abcl does the loadCompiledFunction emission, is it just a remnant of an old design or is there some clever idea behind it?
On Tue, Sep 22, 2009 at 9:15 PM, Ville Voutilainen ville.voutilainen@gmail.com wrote:
> 2009/9/22 Mark Evenson evenson@panix.com:
>> Great! I think experimentation with the codebase is a great way to proceed, but hopefully you aren't deterred from posting your musings here. I, for one, really appreciate both!
>
> Me too, I'm looking forward to what this experiment leads to.
I did the experiment, more or less the way I described. I haven't touched the init-fasl code, which still uses load-compiled-function; however, loadCompiledFunction itself has been changed to use two new classloaders, one for .cls files and one for entries in jar files. (The existing code handles two other cases, zipped .abcl files inside jars and a fallback case which I don't know when it is used; I left both out.) The compiler has been changed in only one place, to stop using loadCompiledFunction and directly instantiate the class.
From preliminary testing it appears to work well. It is not optimized at all (a different classloader is created for every function to load, even if one could be reused), yet it is only slightly slower than before: on my machine compiling abcl used to take a few seconds less than 2 minutes, and with the changes it takes a few seconds more than 2 minutes. Code complexity, imho, has not changed much, as I basically just moved some code from loadCompiledFunction to the classloaders. The classloaders themselves are not complex, and they could be simplified by moving common code into a base superclass. This design is more modular than before, but not by much, because everything is still controlled in one central place, loadCompiledFunction.
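As a rough illustration of the two classloaders and of the shared base class mentioned above, here is a sketch that is not the code from the experiment. The class names are hypothetical; the .cls extension, the org.armedbear.lisp package, and the different layout of plain files and jar entries follow the conventions described in this message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

/** Shared part: subclasses only decide where the bytes of a class come from. */
abstract class CompiledClassLoader extends ClassLoader {
    static final String PACKAGE = "org.armedbear.lisp.";

    /** Returns the class-file bytes for a fully qualified name, or null if unknown. */
    protected abstract byte[] readBytes(String className) throws IOException;

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            byte[] bytes = readBytes(name);
            if (bytes == null) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

/** Reads <directory>/<simple name>.cls, dropping the package prefix. */
final class FileClassLoader extends CompiledClassLoader {
    private final Path directory;

    FileClassLoader(Path directory) {
        this.directory = directory;
    }

    @Override
    protected byte[] readBytes(String className) throws IOException {
        if (!className.startsWith(PACKAGE)) {
            return null;
        }
        Path file = directory.resolve(className.substring(PACKAGE.length()) + ".cls");
        return Files.isRegularFile(file) ? Files.readAllBytes(file) : null;
    }
}

/** Reads the entry org/armedbear/lisp/<simple name>.cls from a jar file. */
final class JarClassLoader extends CompiledClassLoader {
    private final Path jar;

    JarClassLoader(Path jar) {
        this.jar = jar;
    }

    @Override
    protected byte[] readBytes(String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            ZipEntry entry = jarFile.getEntry(className.replace('.', '/') + ".cls");
            if (entry == null) {
                return null;
            }
            try (InputStream in = jarFile.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }
}
```

With this split, an in-memory variant would be a third subclass whose readBytes looks the name up in a table, and a single loader instance could serve many functions instead of one loader per function.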
A few probably minor limitations:
- compiled classes are always in the org.armedbear.lisp package. In jar files they appear to be placed inside org/armedbear/lisp, while normal files are not, so the file classloader always removes "org.armedbear.lisp." from the class name to obtain the file name.
- compiled class files are always assumed by the classloader to end in .cls
- some name mangling is necessary on class names and file names, as not every Lisp symbol is a valid Java class name. I coded the minimal amount of mangling necessary for abcl to compile itself, but there are probably some awkward symbols that can break it.
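The mangling scheme used in the experiment is not shown in this thread. As an illustration only, one possible reversible scheme escapes every character that is not allowed in a Java identifier as an underscore followed by four hex digits. The class ClassNameMangler is hypothetical, and the sketch does not handle names that collide with Java keywords or the empty name.

```java
final class ClassNameMangler {
    /** Maps a Lisp symbol name to a string made only of Java identifier characters. */
    static String mangle(String symbolName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < symbolName.length(); i++) {
            char c = symbolName.charAt(i);
            boolean legal = (i == 0)
                    ? Character.isJavaIdentifierStart(c)
                    : Character.isJavaIdentifierPart(c);
            if (legal && c != '_' && c != '$') {
                out.append(c);
            } else {
                // Escape as _XXXX. '_' is escaped too, so the mapping stays reversible,
                // and '$' is escaped to stay clear of javac's inner-class naming.
                out.append(String.format("_%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"dir-1", "1+", "a_b"}) {
            System.out.println(name + " -> " + mangle(name));
        }
    }
}
```

Because the escape character is itself escaped, two different symbol names cannot map to the same mangled name, which matters when the mangled name is also used as a file name.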
So, I think I can try to extend my experiment to memory compilation.
> I wonder why abcl does the loadCompiledFunction emission, is it just a remnant of an old design or is there some clever idea behind it?
Who knows? Classloaders are a bit tricky, maybe the initial design avoided them for simplicity.
Bye, Alessio
armedbear-devel@common-lisp.net