Hello everybody,
I just finished pushing the compiler refactoring patch that I mentioned earlier. The big change is that the intermediate ParenScript representation is now just s-expressions instead of CLOS objects, which provides several benefits: the intermediate code is much easier to manipulate and to write code walkers for, which should make things like optimization passes easier to write; and it is easily inspectable and serializable, which makes debugging and unit testing easier (I plan to add unit tests for the compiler and the printer now that they have been decoupled). The other big change that accompanies this is that the decision whether to produce expression or statement code has been pushed from the printing code to the actual compiling code, which makes the aforementioned code walkers even more practical. This has also simplified the printing interface considerably (from three functions down to one).
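A minimal sketch of the "inspectable and serializable" point, assuming a hypothetical IR in which every node is a list headed by a keyword (:var, :binary, :number). These node names are illustrative only and are not ParenScript's actual intermediate forms. Because such an IR is plain list data, the standard Lisp printer and reader can round-trip it, which is one way saved compiler output could be compared in unit tests.

```lisp
;; Hypothetical IR node names (:var, :binary, :number); not ParenScript's own.
(defparameter *sample-ir*
  '(:var "x" (:binary "+" (:number 1) (:number 2)))
  "A small intermediate form standing for: var x = 1 + 2;")

(defun ir-to-string (ir)
  "Serialize IR with the standard Lisp printer."
  (with-standard-io-syntax
    (prin1-to-string ir)))

(defun string-to-ir (string)
  "Read IR back from STRING, with read-time evaluation disabled."
  (with-standard-io-syntax
    (let ((*read-eval* nil))            ; never run #. forms from stored IR
      (read-from-string string))))
```

Here `(string-to-ir (ir-to-string *sample-ir*))` yields a list EQUAL to `*sample-ir*`, so no custom print or read methods are needed.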
The other changes I made were removing the last of the namespace functions and the compilation environments. In place of the namespace code is a mechanism that associates Lisp packages with a string prefix: any symbol in such a package is printed with that prefix. Unless there are requests for further namespace functionality, I think that's really all that is necessary for most use cases (avoiding name clashes). I removed the compilation environment and toplevel code because I feel that the eval-when functionality can be provided better by just using Lisp code, and also to simplify the compiler interface (it's now down to one function from three) and implementation (which, although much simpler now, I think still has some room for improvement).
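The package-to-prefix mechanism can be pictured as a table lookup on a symbol's home package. This is only a sketch: `*package-prefixes*`, `register-package-prefix`, `symbol-to-js-name` and the `example-widgets` package are hypothetical names, not ParenScript's API, and the name conversion ignores ParenScript's other identifier translation rules.

```lisp
;; Hypothetical sketch; none of these names are ParenScript's actual API.
(defvar *package-prefixes* (make-hash-table :test #'eq)
  "Maps a package object to the string prefix used for its symbols.")

(defun register-package-prefix (package-designator prefix)
  "Print every symbol whose home package is PACKAGE-DESIGNATOR with PREFIX."
  (setf (gethash (find-package package-designator) *package-prefixes*)
        prefix))

(defun symbol-to-js-name (symbol)
  "Return SYMBOL's JavaScript name, prefixed if its home package has a prefix.
Only downcases the name; real ParenScript name translation does more."
  (let ((prefix (gethash (symbol-package symbol) *package-prefixes*))
        (name (string-downcase (symbol-name symbol))))
    (if prefix
        (concatenate 'string prefix name)
        name)))

;; Usage with a hypothetical example package.
(defpackage :example-widgets (:use :cl))
(register-package-prefix :example-widgets "widgets_")
```

With that registration, a symbol BUTTON interned in `example-widgets` prints as `widgets_button`, while `car` (home package COMMON-LISP, no prefix) prints as `car`. Anything richer, such as a full package system, would have to be layered on top of this lookup.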
On a more (or less?) controversial note, I have renamed all the foo-script-blah functions to foopsblah (except for compile-script-form, which is now compile-parenscript-form). I've come to a pretty definite conclusion that the new naming is not only clearer but also saves considerable typing.
The next course of action I'm planning is to rewrite the printer to simplify things and to correct the currently wonky indentation (part of the decision about how to indent blocks used to be made by the compiler code, and of course that code is now gone). Once the rewrite is done, I'm planning to update the documentation, fix the deprecated interface, and then make a release.
Happy hacking, Vladimir
Vladimir Sedach wrote:
> Hello everybody,
> I just finished pushing the compiler refactoring patch that I mentioned earlier. The big change is that the intermediate ParenScript representation is now just s-expressions instead of CLOS objects, which provides several benefits: the intermediate code is much easier to manipulate and to write code walkers for, which should make things like optimization passes easier to write; and it is easily inspectable and serializable, which makes debugging and unit testing easier (I plan to add unit tests for the compiler and the printer now that they have been decoupled).
I do not see the benefits of avoiding an object representation of the syntax tree in favor of a flat SEXP representation. What sort of code walkers do you mean? Can we just expose the macro-expanded Parenscript, rather than the "internal representation"?
Optimization passes often require additional information about nodes of the syntax tree. With classes/structs, this information can be added as additional slots. Doesn't a transition from objects to SEXPs hinder the ability to attach information to nodes?
SLIME makes inspecting CLOS objects straightforward, and a print function can be defined to display syntax nodes in an informative fashion.
SBCL uses structs as an internal representation and some developers would like more OO functionality:
"[aside by WHN: The data representation of IR1 [(Intermediate Representation 1)] was set up before OO design was commonplace and before CLOS was part of the standard, and it shows it. On the other hand, a lot of things in it show good taste and anticipate OO design. On the third hand, a lot of things are done by mutating data structures which could be done much more cleanly by other methods, often simply by initializing something completely at constructor time. So the system could really benefit from some refactoring to take advantage of more modern design ideas (OO, invariants..) and the existence of CLOS. However, since we can't use CLOS to implement the target compiler until we restructure the system so that CLOS is built by the cross-compiler [...] most of those refactorings, even the obvious ones, can't be done as of sbcl-0.pre7.x.]" [1]
We, of course, do not have to worry about cross-compilation and bootstrapping, and we have full CLOS at our disposal. I do not see much reason to avoid it for something like a syntax tree, which has an obvious class/object representation.
On the other hand, SBCL has to deal with the full Common Lisp specification, a multi-stage compilation/optimization pipeline, and multiple architecture and OS targets. For our (currently) very simple compiler, a SEXP representation is clearly sufficient to accomplish Parenscript's modest goals. Optimization and semantic analysis phases, my next concern, are free to build up a different representation from the primitive list format.
> The other big change that accompanies this is that the decision whether to produce expression or statement code has been pushed from the printing code to the actual compiling code, which makes the aforementioned code walkers even more practical. This has also simplified the printing interface considerably (from three functions down to one).
Great. The decision to compile to statement/expression is best made earlier in compilation.
> The other changes I made were removing the last of the namespace functions and the compilation environments. In place of the namespace code is a mechanism that associates Lisp packages with a string prefix: any symbol in such a package is printed with that prefix. Unless there are requests for further namespace functionality, I think that's really all that is necessary for most use cases (avoiding name clashes). I removed the compilation environment and toplevel code because I feel that the eval-when functionality can be provided better by just using Lisp code, and also to simplify the compiler interface (it's now down to one function from three) and implementation (which, although much simpler now, I think still has some room for improvement).
I'll check out these changes and see what I think. Ideally we should still be able to layer a package system on top of the simple symbol-package-to-prefix mapping.
> On a more (or less?) controversial note, I have renamed all the foo-script-blah functions to foopsblah (except for compile-script-form, which is now compile-parenscript-form). I've come to a pretty definite conclusion that the new naming is not only clearer but also saves considerable typing.
That's fine with me.
> The next course of action I'm planning is to rewrite the printer to simplify things and to correct the currently wonky indentation (part of the decision about how to indent blocks used to be made by the compiler code, and of course that code is now gone). Once the rewrite is done, I'm planning to update the documentation, fix the deprecated interface, and then make a release.
How are you planning to change the way printing is done? As far as I remember, DWIM-JOIN is one of the slower parts of the compiler, though I have not profiled in a while.
> Happy hacking, Vladimir
Thanks, Red
Hello Red,
> I do not see the benefits of avoiding an object representation of the syntax tree in favor of a flat SEXP representation.
I never said the s-exp representation was flat. It too is a tree.
> What sort of code walkers do you mean? Can we just expose the macro-expanded Parenscript, rather than the "internal representation"?
I mean it is much easier to build recursive s-exp walkers than it is to build ones for a bunch of arbitrary CLOS objects, since in the latter case you have to know about their slots. If we want to build post-processing stages for ParenScript code, that will make it easier for us. The thing about the CLOS representation of ParenScript is that the different classes were really arbitrary, in the sense that inheritance could not capture the different slots they needed to have.
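To make the walker argument concrete, here is a sketch of a generic recursive s-expression walker plus a tiny constant-folding pass built on it. The IR node names (:binary, :number, :var) and the functions `map-ir` and `fold-additions` are hypothetical illustrations, not ParenScript code. `map-ir` knows nothing about node types; only the pass inspects the nodes it cares about. It also builds a new tree instead of mutating the input.

```lisp
;; Hypothetical IR: lists headed by keywords such as :binary and :number.
(defun map-ir (fn form)
  "Rebuild FORM bottom-up: walk the elements of a list first, then call FN
on the rebuilt node. Atoms are passed to FN directly."
  (funcall fn (if (consp form)
                  (mapcar (lambda (sub) (map-ir fn sub)) form)
                  form)))

(defun number-node-p (x)
  "True for a hypothetical (:number N) node."
  (and (consp x) (eq (first x) :number)))

(defun fold-additions (form)
  "Replace (:binary \"+\" (:number A) (:number B)) with (:number A+B)."
  (map-ir (lambda (node)
            (if (and (consp node)
                     (eq (first node) :binary)
                     (equal (second node) "+")
                     (number-node-p (third node))
                     (number-node-p (fourth node)))
                (list :number (+ (second (third node))
                                 (second (fourth node))))
                node))
          form))
```

Because children are rewritten before their parent, nested additions collapse in one pass: `(:binary "+" (:binary "+" (:number 1) (:number 2)) (:number 3))` becomes `(:number 6)`.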
> Optimization passes often require additional information about nodes of the syntax tree. With classes/structs, this information can be added as additional slots. Doesn't a transition from objects to SEXPs hinder the ability to attach information to nodes?
Actually it makes it easier. Now the format of the ParenScript intermediate representation is specified in two places (the special form definitions and the printer), whereas before it was specified three times: explicitly in the class definitions, and then implicitly, by usage, in the special form definitions and the printer. Since the printer interface now destructures the s-expressions with destructuring-bind (another big win for conciseness), adding extra information that the printer doesn't care about is as simple as declaring &allow-other-keys.
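A minimal sketch of the destructuring-bind approach, assuming a hypothetical node shape `(:var NAME &key VALUE ...)` in which any extra keyword/value pairs are annotations added by some other pass. `print-var-node` and the `:inferred-type` key are illustrative names, not ParenScript's printer.

```lisp
;; Hypothetical node shape: (:var NAME &key VALUE ...). Extra keyword/value
;; pairs are annotations from other passes that this printer ignores.
(defun print-var-node (node stream)
  "Print a :var node as a JavaScript declaration."
  (destructuring-bind (name &key value &allow-other-keys) (rest node)
    (format stream "var ~A" name)
    (when value
      (format stream " = ~A" value))
    (write-char #\; stream)))
```

Printing `(:var "x" :value 42 :inferred-type :integer)` this way gives `var x = 42;`. Without `&allow-other-keys`, destructuring-bind would signal an error on the unknown `:inferred-type` key, so the annotation would have to be stripped before printing.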
> SLIME makes inspecting CLOS objects straightforward, and a print function can be defined to display syntax nodes in an informative fashion.
I don't think the SLIME inspector is any good for working with deeply-nested objects, which is what the ParenScript intermediate representation was. Yes, it's possible to define our own readers and printers for the CLOS representation, but given that it didn't provide any benefits (in reality, it bloated code size and introduced bugs, since inheritance allowed some operations to take place which didn't make sense), why bother?
> SBCL uses structs as an internal representation and some developers would like more OO functionality:
I would really like to avoid making the ParenScript compiler as complicated and hairy as Python (SBCL's compiler).
> ...a lot of things are done by mutating data structures which could be done much more cleanly by other methods...
...such as not mutating data structures, for which s-expressions happen to be an excellent fit. I really don't see any reason why the ParenScript compiler should be built in any way except for purely applicative (sans the macro definition state, of course).
> I do not see much reason to avoid it for something like a syntax tree, which has an obvious class/object representation.
How about the fact that syntax trees have an even more obvious s-expression representation? Come on, we're programming in Lisp here! :)
> For our (currently) very simple compiler, a SEXP representation is clearly sufficient to accomplish Parenscript's modest goals. Optimization and semantic analysis phases, my next concern, are free to build up a different representation from the primitive list format.
I would love to keep our compiler very simple and our formats very primitive.
> Great. The decision to compile to statement/expression is best made earlier in compilation.
In particular, it's now up to the special forms themselves.
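One way to picture a special form making the expression/statement decision itself is to pass the context in and let the form choose its output shape, so the printer only renders whichever node it receives. Everything here is hypothetical: the function `compile-if`, the context keywords, and the node names (:ternary, :if, :block) are illustrations, not ParenScript's actual special form definitions.

```lisp
;; Hypothetical sketch: the caller says which context it needs, and the
;; special form picks the matching IR. Node names are illustrative only.
(defun compile-if (test then else context)
  "Compile an IF into a ternary expression or an if statement."
  (ecase context
    (:expression (list :ternary test then else))
    (:statement  (list :if test (list :block then) (list :block else)))))
```

For example, `(compile-if "a" "b" "c" :expression)` returns `(:ternary "a" "b" "c")`, while the `:statement` context yields `(:if "a" (:block "b") (:block "c"))`.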
> I'll check out these changes and see what I think. Ideally we should still be able to layer a package system on top of the simple symbol-package-to-prefix mapping.
Can we keep these changes orthogonal to the compiler? Ideally, I would like the compiler to provide an interface that any namespace or code management systems would use in a modular way.
> How are you planning to change the way printing is done? As far as I remember, DWIM-JOIN is one of the slower parts of the compiler, though I have not profiled in a while.
Ideally, I want to get rid of dwim-join and print directly to a stream instead of doing a ton of string consing.
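A sketch of what printing directly to a stream could look like, with indentation tracked by the printer through a dynamically bound special variable instead of being decided by the compiler. The statement forms `(:block STMT...)` and `(:expr-stmt STRING)` (where STRING is an already-rendered expression) and the names `*indent-level*`, `print-indent` and `print-statement` are assumptions made for this illustration, not ParenScript's printer.

```lisp
;; Hypothetical statement forms: (:block STMT...) and (:expr-stmt STRING).
;; Output goes straight to STREAM; no intermediate strings are joined.
(defvar *indent-level* 0
  "Current block nesting depth; rebound dynamically inside each block.")

(defun print-indent (stream)
  "Write two spaces per nesting level."
  (loop repeat (* 2 *indent-level*)
        do (write-char #\Space stream)))

(defun print-statement (node stream)
  "Write NODE to STREAM as indented JavaScript-like text."
  (ecase (first node)
    (:block
     (print-indent stream)
     (write-line "{" stream)
     (let ((*indent-level* (1+ *indent-level*)))
       (dolist (child (rest node))
         (print-statement child stream)))
     (print-indent stream)
     (write-line "}" stream))
    (:expr-stmt
     (print-indent stream)
     (write-string (second node) stream)
     (write-line ";" stream))))
```

Printing `(:block (:expr-stmt "f()") (:block (:expr-stmt "g()")))` produces `f();` indented by two spaces and `g();` by four inside nested braces. Because the binding of `*indent-level*` is undone when each `let` exits, a closing brace always lines up with its opening one.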
Happy hacking, Vladimir
Thanks, Red
[1] http://sbcl-internals.cliki.net/IR1
_______________________________________________
parenscript-devel mailing list
parenscript-devel@common-lisp.net
http://common-lisp.net/cgi-bin/mailman/listinfo/parenscript-devel
> I mean it is much easier to build recursive s-exp walkers than it is to build ones for a bunch of arbitrary CLOS objects, since in the latter case you have to know about their slots. If we want to build post-processing stages for ParenScript code, that will make it easier for us. The thing about the CLOS representation of ParenScript is that
It's very rare that you want to walk an AST visiting all its nodes without caring about their types and/or the structure above them. And when you do care, you end up doing the same thing with SEXPs as with CLOS objects, except that you deal with positions in a list instead of slot names, and slot names are much cleaner/more readable imho.
But yes, the class hierarchy of the ParenScript AST is far from ideal, and the dwim-join function is slow and frightening.
It's a completely different story, but I think many people overuse lists in Lisp. It's very rare that a tree/list of cons cells is what you really need (think of the common push/nreverse idiom and such things). Most of the time vectors would at least be better, apart from the fact that the syntax of standard CL favours lists and its definition of sequences is half-assed.
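To illustrate the push/nreverse point: the first function below is the idiom mentioned, building a list backwards and then reversing it; the second collects the same elements in order into an adjustable vector with a fill pointer. The function names are made up for this example, and no performance claim is implied; which is faster depends on the implementation and the workload.

```lisp
(defun squares-list (n)
  "The push/nreverse idiom: cons the list up backwards, then reverse it."
  (let ((result '()))
    (dotimes (i n)
      (push (* i i) result))
    (nreverse result)))

(defun squares-vector (n)
  "The same elements appended in order to an adjustable vector."
  (let ((result (make-array n :adjustable t :fill-pointer 0)))
    (dotimes (i n)
      (vector-push-extend (* i i) result))
    result))
```

Both return the squares 0, 1, 4, 9 for n = 4; the vector version never needs the reversal step and keeps its elements contiguous.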
But I only care about the speed of the compiler, and I'm mostly a user...
so these are just my $0.02.