The spec allows functions in a fasl to hard code references to each other, except when declared not-inline. Currently, we take advantage of that fact only a little bit: the only bit we eliminate is the function lookup from the symbol. Anything else works just like function calls which are not in the same file. The mail below comes from my long-standing desire to take better advantage of the room offered by the spec.
What things would I like to improve on?
1. We only inline backward referenced functions in the same fasl 2. We parse the argument list of each function call with respect to keywords and optional parameters 3. We store the names of symbols referenced in many .cls files each of which gets separately zipped 4. We store (cache) function references to inlined functions as objects
What's the basic idea?
The idea is to shift paradigms: instead of compiling each Lisp function into a class, we will start compiling each lisp function to a Java function. A FASLs would become one or more classes -- depending on if everything fits into a single class or not. The idea is by the way not to change the way we model functions in symbol function slots, but instead to change the way everything is stored in a fasl.
Does it address the items mentioned above?
1. Since class files are written once the class is complete, we have to delay serializing the lisp functions to disk until they're all known. This means that we will know all functions defined in a fasl before it's serialized. With an appropriate linker phase, we could easily resolve all forward referenced function calls.
2. Because lisp functions have become Java functions, we can't directly expose them to the Lisp world anymore. This basically creates the option to have an 'internal function signature' and an 'external function signature'. SBCL has this: XEPs -- eXternal Entry Points. These entry points into a function sort the arguments into the right order before calling the internal entry point. Code which is compiled into the same fasl pre-sorts the arguments at compile time (when possible) and calls the internal entry point -- eliminating the need to sort keyword parameters.
3. By having a single (or fewer) class files, we can achieve higher re-use ratios of the same constants which would otherwise be included in many-many .cls files.
4. Because function calls to 'inline' functions will become calls to sibling methods, they will become clear candidates for inlining to the JIT. With the object references, it is not clear this is an option.
What I think needs to happen to get this designed:
a. The file compiler and function compiler should be based on a shared 'method compiler' which takes its context from the caller instead of the existing way of basing the file compiler on the function compiler.
b. We need to implement a linker phase separate from the pass2 phase, which can re-order arguments and call sibling methods instead of Lisp Function objects.
c. We need to find a way to correctly handle the interaction between the successive IN-PACKAGE, DEFPACKAGE, EVAL-WHEN, etc, forms appearing in the input file and the initialization of fields in the resulting class file.
d. We need a way to expose the external entry points to the lisp world.
e. We need to find a way to split fasls over multiple class files.
Comments?
Bye,
Erik.
On Sat, Jan 21, 2012 at 9:55 PM, Erik Huelsmann ehuels@gmail.com wrote:
The spec allows functions in a fasl to hard code references to each other, except when declared not-inline. Currently, we take advantage of that fact only a little bit: the only bit we eliminate is the function lookup from the symbol. Anything else works just like function calls which are not in the same file. The mail below comes from my long-standing desire to take better advantage of the room offered by the spec.
Thanks a lot for thinking thoroughly about this. I have only a few minor comments/questions.
What things would I like to improve on?
- We only inline backward referenced functions in the same fasl
- We parse the argument list of each function call with respect to
keywords and optional parameters 3. We store the names of symbols referenced in many .cls files each of which gets separately zipped 4. We store (cache) function references to inlined functions as objects
What's the basic idea?
The idea is to shift paradigms: instead of compiling each Lisp function into a class, we will start compiling each lisp function to a Java function. A FASLs would become one or more classes -- depending on if everything fits into a single class or not. The idea is by the way not to change the way we model functions in symbol function slots, but instead to change the way everything is stored in a fasl.
Does it address the items mentioned above?
- Since class files are written once the class is complete, we have
to delay serializing the lisp functions to disk until they're all known. This means that we will know all functions defined in a fasl before it's serialized. With an appropriate linker phase, we could easily resolve all forward referenced function calls.
- Because lisp functions have become Java functions, we can't
directly expose them to the Lisp world anymore. This basically creates the option to have an 'internal function signature' and an 'external function signature'. SBCL has this: XEPs -- eXternal Entry Points. These entry points into a function sort the arguments into the right order before calling the internal entry point. Code which is compiled into the same fasl pre-sorts the arguments at compile time (when possible) and calls the internal entry point -- eliminating the need to sort keyword parameters.
That's awesome. To clarify, do you intend these XEPs to be method overloads in the same FASL-class?
- By having a single (or fewer) class files, we can achieve higher
re-use ratios of the same constants which would otherwise be included in many-many .cls files.
- Because function calls to 'inline' functions will become calls to
sibling methods, they will become clear candidates for inlining to the JIT. With the object references, it is not clear this is an option.
That's a very good point. Currently by always passing through getSymbolFunction() we basically disable JIT across Lisp functions. Invokedynamic would help here, but it's not behind the corner. Your proposal is very effective.
What I think needs to happen to get this designed:
a. The file compiler and function compiler should be based on a shared 'method compiler' which takes its context from the caller instead of the existing way of basing the file compiler on the function compiler.
b. We need to implement a linker phase separate from the pass2 phase, which can re-order arguments and call sibling methods instead of Lisp Function objects.
c. We need to find a way to correctly handle the interaction between the successive IN-PACKAGE, DEFPACKAGE, EVAL-WHEN, etc, forms appearing in the input file and the initialization of fields in the resulting class file.
Off the top of my head (but the standard might imply otherwise) the only problem is IN-PACKAGE (and DEFPACKAGE) and that can be solved by 1) adding a static initializer to the fasl-class that pre-installs all the packages and 2) serializing all symbols explicitly as package::symbol. Everything else (mostly EVAL-WHEN combined with stuff that affects the reader) is handled by isolating the reader used to parse serialized stuff in the class from the reader used to read forms in the fasl, and I believe it's already like that - isn't it?
d. We need a way to expose the external entry points to the lisp world.
We could lazily construct (by generating bytecode at runtime) a LispFunction instance that calls the right method. It would be generated the first time someone calls, or otherwise references, a function in the fasl, and then associated with the symbol as its symbol-function. Hopefully these runtime-generated classes will be in much smaller numbers than if we compiled each and every function to a separate class. Additionally with invokedynamic the LispFunction will only need to be generated when the function is reified (basically the first time someone directly reads a symbol's function slot), while regular function calls could be directed to the target method.
e. We need to find a way to split fasls over multiple class files.
Comments?
Bye,
Erik.
Cheers, Alessio
- Because lisp functions have become Java functions, we can't
directly expose them to the Lisp world anymore. This basically creates the option to have an 'internal function signature' and an 'external function signature'. SBCL has this: XEPs -- eXternal Entry Points. These entry points into a function sort the arguments into the right order before calling the internal entry point. Code which is compiled into the same fasl pre-sorts the arguments at compile time (when possible) and calls the internal entry point -- eliminating the need to sort keyword parameters.
That's awesome. To clarify, do you intend these XEPs to be method overloads in the same FASL-class?
Yes: the XEPs would be methods accepting an array argument, just like we accept now. The XEP would use the Closure.processArgs() code to figure out the optional and keyword arguments, just like today. Unlike today, the XEP would then forward the call unpacking the argument array into individual parameters and call the internal entry point.
[...]
c. We need to find a way to correctly handle the interaction between the successive IN-PACKAGE, DEFPACKAGE, EVAL-WHEN, etc, forms appearing in the input file and the initialization of fields in the resulting class file.
Off the top of my head (but the standard might imply otherwise) the only problem is IN-PACKAGE (and DEFPACKAGE) and that can be solved by
- adding a static initializer to the fasl-class that pre-installs all
the packages and 2) serializing all symbols explicitly as package::symbol. Everything else (mostly EVAL-WHEN combined with stuff that affects the reader) is handled by isolating the reader used to parse serialized stuff in the class from the reader used to read forms in the fasl, and I believe it's already like that - isn't it?
Yes, the reader fasl reader is indeed separated from the compile-time reader. One thing that got us in the past when serializing symbols was EXPORT, which was first evaluated and then serialized. When reading the resulting fasl, the export command tries to export a non-existing symbol.
However, serializing all symbols with their package prefix breaks a use case I know of to be actually in use - which may not be explicitly supported by the CLHS: when loading our current fasls while having a different current package than during compilation, we simply load everything into the new current package. This is because the preconditions before loading are assumed to be the same as the preconditions before compilation.
What I'm thinking now is that I might try to compile the top level forms into an initialization function in the class. That initialization function could interleave class field initialization with "lisp world initialization". That way, I think, we should be able to get our dependencies ordered correctly.
d. We need a way to expose the external entry points to the lisp world.
We could lazily construct (by generating bytecode at runtime) a LispFunction instance that calls the right method. It would be generated the first time someone calls, or otherwise references, a function in the fasl, and then associated with the symbol as its symbol-function. Hopefully these runtime-generated classes will be in much smaller numbers than if we compiled each and every function to a separate class. Additionally with invokedynamic the LispFunction will only need to be generated when the function is reified (basically the first time someone directly reads a symbol's function slot), while regular function calls could be directed to the target method.
This is one method. The other - which might have roughly equal performance characteristics - that I was thinking about is to create a class which uses introspection to look up the right method to call. The initial introspection lookup has bad performance drawbacks. However, after the initial lookup, I've been told, the performance drawbacks have been fixed for quite some time (Java 1.4?).
That way, we would not even need to increase the number of classes in the system when the number of entry points into the fasl grows. The same approach could apply when a function reference is to be returned: the function reference could simply be an encapsulating function object.
Thanks for your comments!
Bye,
Erik.
Thinking some more about my own proposal:
[ .. ]
What I think needs to happen to get this designed:
Well, I found that it will probably all work, with one exception - to which I'd love to hear your comments:
Currently, all functions are class instances. For normal functions we don't require that, because the class itself doesn't store more than the parsed function arglist. I have an idea how to handle that situation though. Anyway, that's really globally applicable data and doesn't require instance allocation in case of compiled classes.
The problem arises when a compiled class wants to allocate instance data: i.e. when it wants to be a closure. Closures store a 'context' record in their instance. Since my idea revolves around creating static functions which call each other, this is an issue.
One thing I can come up with is: make the outer-most entry point a static function, make it allocate the context record (an array of ClosureBindings) and pass that around between the static functions as the first argument. This solves the common case for closures, like this one:
(defun foo (a-list) (mapcar #'(lambda (x) (member x a-list)) '(value-1 value-2 ...)))
The closure above can't be entered through any function which requires a context record to exist on the first call. However, there are cases (even in our own code base) where there are no entry points without a context record:
(let (something) (defun foo () (setf something 'a))
(defun goo () something))
(defun moo () (goo))
Compiling a direct call from moo to goo can't be a plain static call - it'll need the context record created at function definition time. Anybody who has any ideas how to solve this case?
Thanks for your comments!
Bye,
Erik.
On Wed, Jan 25, 2012 at 11:05 PM, Erik Huelsmann ehuels@gmail.com wrote:
Thinking some more about my own proposal:
The closure above can't be entered through any function which requires a context record to exist on the first call. However, there are cases (even in our own code base) where there are no entry points without a context record:
(let (something) (defun foo () (setf something 'a))
(defun goo () something))
(defun moo () (goo))
Compiling a direct call from moo to goo can't be a plain static call - it'll need the context record created at function definition time. Anybody who has any ideas how to solve this case?
Following up on myself: the call from MOO to GOO will go through the symbol function because the DEFUNs for FOO and GOO aren't toplevel forms, which means the functions don't get recorded as 'defined in the same file'. When we set up the FOO and GOO functions correctly, they'll have associated context records.
Due to the way our compiler works, MOO *is* registered as a 'function defined in the same file' which means that if MOO were defined before FOO and GOO and either would call MOO, that call would simply be a static function call. (Still get it? :-)
Concluding: I can simply go forward in the direction that I described in my orginal mail; the "DEFUNs inside closure" issue raised above will 'solve itself'.
Bye,
Erik.
armedbear-devel@common-lisp.net