Vladimir Sedach wrote:
Hello everybody,
I just finished pushing the compiler refactoring patch that I mentioned earlier. The big change is that the intermediate ParenScript representation is now just s-expressions instead of CLOS objects. This provides several benefits. The intermediate code is much easier to manipulate and to write code walkers for, which should make things like optimization passes easier to write. It is also easily inspectable and serializable, which enables easier debugging and unit testing (I plan to add unit tests for the compiler and the printer now that they have been decoupled).
I do not see the benefits of avoiding an object representation of the syntax tree in favor of a flat SEXP representation. What sort of code walkers do you mean? Can we just expose the macro-expanded Parenscript, rather than the `internal representation'? Optimization passes often require additional information about nodes of the syntax tree. With classes/structs, this information can be added as additional slots. Doesn't a transition from objects to SEXPs hinder the ability to attach information to nodes? SLIME makes inspecting CLOS objects straightforward, and a print function can be defined to display syntax nodes in an informative fashion.
SBCL uses structs as an internal representation and some developers would like more OO functionality:
"[aside by WHN: The data representation of IR1 [(Intermediate Representation 1)] was set up before OO design was commonplace and before CLOS was part of the standard, and it shows it. On the other hand, a lot of things in it show good taste and anticipate OO design. On the third hand, a lot of things are done by mutating data structures which could be done much more cleanly by other methods, often simply by initializing something completely at constructor time. So the system could really benefit from some refactoring to take advantage of more modern design ideas (OO, invariants..) and the existence of CLOS. However, since we can't use CLOS to implement the target compiler until we restructure the system so that CLOS is built by the cross-compiler [...] most of those refactorings, even the obvious ones, can't be done as of sbcl-0.pre7.x.]" [1]
We, of course, do not have to worry about cross-compilation and bootstrapping, and we have full CLOS at our disposal. I do not see much reason to avoid it for something like a syntax tree, which has an obvious class/object representation.
On the other hand, SBCL has to deal with the full Common Lisp specification, a multi-stage compilation/optimization pipeline, and multiple architecture and OS targets. For our (currently) very simple compiler, a SEXP representation is clearly sufficient to accomplish Parenscript's modest goals. Optimization and semantic analysis phases, my next concern, are free to build up a different representation from the primitive list format.
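To make the trade-off concrete, here is a minimal sketch of what a walker over a list-based intermediate form might look like. It is written in Python rather than Lisp purely for illustration. The operator names, the `walk` and `fold_constants` helpers, and the `annotate` wrapper are all assumptions made up for this example, not Parenscript's actual intermediate format or API. The wrapper node is one possible way a later pass could attach extra information without slots.

```python
# Hypothetical IR: nested lists whose first element names the operator.
# The operator names here are illustrative, not Parenscript's real ones.

def walk(form, visit):
    """Rebuild `form` bottom-up, passing every rebuilt list node to `visit`."""
    if not isinstance(form, list):
        return form  # atoms (numbers, symbol strings) are returned unchanged
    rebuilt = [form[0]] + [walk(arg, visit) for arg in form[1:]]
    return visit(rebuilt)

def fold_constants(node):
    """Toy optimization pass: collapse an all-numeric "+" node into a literal."""
    op, *args = node
    if op == "+" and all(isinstance(a, (int, float)) for a in args):
        return sum(args)
    return node

def annotate(node, **info):
    """Assumed convention: wrap a node to carry extra analysis results."""
    return ["annotated", info, node]

ir = ["return", ["+", "x", ["+", 1, 2]]]
folded = walk(ir, fold_constants)
typed = annotate(folded, result_type="number")
```

Because the tree is plain data, the result can be compared with `==` in a unit test or printed directly, which is the inspectability argument in a nutshell. The cost is that conventions such as the wrapper node are enforced only by discipline, not by a class definition.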
The other big change that accompanies this is that the decision whether to produce expression or statement code has been pushed from the printing code to the actual compiling code, which makes the aforementioned code walkers even more practical. This has also had the effect of simplifying the printing interface considerably (from three functions down to one).
Great. The decision to compile to statement/expression is best made earlier in compilation.
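As a rough illustration of why deciding early helps, the sketch below picks the node kind at compile time so that a single printer function only has to dispatch on it. This is Python for illustration only. The node kinds `js-conditional` and `js-if-statement` and the functions `compile_if` and `print_js` are hypothetical names, not the ones used in the patch.

```python
def compile_if(test, then, otherwise, want_expression):
    """Choose the node kind while compiling, so the printer need not decide."""
    kind = "js-conditional" if want_expression else "js-if-statement"
    return [kind, test, then, otherwise]

def print_js(node):
    """Single printer entry point: dispatches only on the node kind."""
    if not isinstance(node, list):
        return str(node)  # atoms print as themselves
    kind, *args = node
    parts = [print_js(a) for a in args]
    if kind == "js-conditional":
        return "({} ? {} : {})".format(*parts)
    if kind == "js-if-statement":
        return "if ({}) {{ {}; }} else {{ {}; }}".format(*parts)
    raise ValueError("unknown node kind: " + str(kind))

as_expression = print_js(compile_if("a", "b", "c", want_expression=True))
as_statement = print_js(compile_if("a", "b", "c", want_expression=False))
```

A walker that runs between the two steps sees an unambiguous node kind and never has to guess which form the printer would have chosen.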
The other changes I made are removing the last of the namespace functions, and the compilation environments. In the place of the namespace code is a mechanism that associates Lisp packages with a string prefix: any symbol in that package is printed with that prefix. I think that unless there are requests for further namespace functionality, that's really all that is necessary for most use cases (avoiding name clashes). The compilation environment and toplevel code I have removed because I feel that the eval-when functionality can be provided in a better way by just using Lisp code, and also to simplify the compiler interface (it's now down to one function from three) and implementation (which, although much simpler now, I think still has some room for improvement).
I'll check out these changes and see what I think. Ideally we should still be able to layer a package system on top of the simple symbol package to prefix mapping.
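For reference while reviewing, my understanding of the package to prefix mapping amounts to something like the sketch below. It is Python for illustration, and the table and function names (`PACKAGE_PREFIXES`, `register_prefix`, `print_symbol`) are assumptions, not the names in the patch. Any other symbol-name translation the printer performs is left out.

```python
# Hypothetical table from package name to the string prefix used when printing.
PACKAGE_PREFIXES = {}

def register_prefix(package, prefix):
    """Associate a package with the prefix given to all of its symbols."""
    PACKAGE_PREFIXES[package] = prefix

def print_symbol(package, name):
    """Print a symbol, prepending its package's prefix if one is registered."""
    return PACKAGE_PREFIXES.get(package, "") + name

register_prefix("my-lib", "mylib_")
prefixed = print_symbol("my-lib", "init")
unprefixed = print_symbol("other-lib", "init")
```

A fuller package system could sit on top of this by choosing the prefixes automatically, since the printer only ever consults the table.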
On a more (or less?) controversial note, I have renamed all the foo-script-blah functions to be foopsblah (except for compile-script-form, which is now compile-parenscript-form). I've come to a pretty definite conclusion that it's not only clearer but also saves considerable typing.
That's all right for me.
The next course of action I'm planning is to rewrite the way printing is done to simplify things, and correct the currently wonky indentation (a part of the decision for how to indent blocks was done by the compiler code previously, and of course this is currently gone). Once the rewrite is done, I'm planning to update the documentation, fix the deprecated interface, and then make a release.
How are you planning to change the way printing is done? As far as I remember, DWIM-JOIN is one of the slower parts of the compiler, though I have not profiled in a while.
Happy hacking, Vladimir
Thanks, Red