dear list,
some of you may already know about Maru, Ian Piumarta's tiny lisp that can self-host in about 2000 LoC through x86 asm. it was developed as part of Alan Kay's VPRI-FoNC project, and then was abandoned in a state that is more of a proof of concept than anything meant to be production ready.
https://github.com/attila-lendvai/maru
i started hacking on it mostly for fun, and made some substantial improvements, driven by the goal of turning it into something that is easier to read and understand, and also more aesthetically pleasing to me.
the reason i'm writing this mail is to get some feedback on whether discussion of lisp-related linguistic features and implementation strategies is welcome here on this list... so, is it?
and if it's not, then what would be a good forum for in-depth discussion of lisp-related implementation strategies?
How hard would it be to do this in LLVM, I wonder?
—Scott
On Oct 5, 2020, at 6:06 AM, Attila Lendvai attila@lendvai.name wrote:
-- • attila lendvai • PGP: 963F 5D5F 45C7 DFCD 0A39 --
Good idea, as the world shifts away from x86.
On Oct 5, 2020, at 6:38 AM, Scott McKay swmckay@gmail.com wrote:
How hard would it be to do this in LLVM, I wonder?
—Scott
How hard would it be to do this in LLVM, I wonder?
i've done that already, and with LLVM the self-hosting is about 3000 LoC, although i'm looking into new features that would hopefully allow shrinking/simplifying the codebase back a little, mostly by separating the code that is strictly needed for the bootstrap process from the ever-growing set of libs and utils that get added for other tests and actual uses of the language.
most of the extra lines are needed because the LLVM IR has a strict type system that had to be accommodated when compiling e.g. FFI calls and such.
Nice work. It's worth the extra length in order to buy correctness and portability that you yourself don't have to maintain.
Lisp is the simplest language to bootstrap for sure. Would be nice to get everything out of the bootstrapper that isn't the core Lisp language. Of course, it's probably worth having handcrafted LLVM code for the primitives that greatly affect performance.
—S
On Mon, Oct 5, 2020 at 7:40 AM Attila Lendvai attila@lendvai.name wrote:
How hard would it be to do this in LLVM, I wonder?
i've done that already, and with LLVM the self-hosting is about 3000 LoC, although i'm looking into new features that would hopefully allow shrinking/simplifying the codebase back a little, mostly by separating the code that is strictly needed for the bootstrap process from the ever-growing set of libs and utils that get added for other tests and actual uses of the language.
most of the extra lines are needed because the LLVM IR has a strict type system that had to be accommodated when compiling e.g. FFI calls and such.
Lisp is the simplest language to bootstrap for sure.
i think stack languages are even simpler to bootstrap than lisps, and modern stack languages are not that much less comfortable.
a friend of mine is actually exploring that here: https://github.com/nagydani/seedling/
and by simple i mean that he's doing it using a ZX spectrum emulator as an artificial size/speed constraint.
we are exchanging ideas as we make progress, and once seedling is ready i'll also bootstrap Maru off of that low-level stack machine.
you can think of (the bottom language of) seedling as the new C in the sense that it can function as the first layer, or the lowest common denominator on top of the hw to build upon, and can trivially reproduce itself onto a new hw.
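To make the "stack languages are simple to bootstrap" point concrete, here is a minimal sketch of a stack-language evaluator. It is not seedling's design: the word set, the `: name ... ;` definition syntax, and the names `make_interpreter` and `run` are all assumptions chosen for illustration. The point is only that the whole evaluator is a token loop plus a dictionary of words.

```python
# Hypothetical toy stack language; not taken from seedling or Maru.
def make_interpreter():
    # Built-in words operate directly on the data stack (a Python list).
    words = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "dup": lambda s: s.append(s[-1]),
        "swap": lambda s: s.extend([s.pop(), s.pop()]),
        "drop": lambda s: s.pop(),
    }

    def run(source, stack=None):
        """Evaluate whitespace-separated words and return the data stack."""
        stack = [] if stack is None else stack
        tokens = iter(source.split())
        for tok in tokens:
            if tok == ":":
                # ": name body ;" defines a new word in terms of existing ones.
                name = next(tokens)
                body = []
                for t in tokens:
                    if t == ";":
                        break
                    body.append(t)
                text = " ".join(body)
                words[name] = lambda s, text=text: run(text, s)
            elif tok in words:
                words[tok](stack)
            else:
                # Anything that is not a known word is read as an integer literal.
                stack.append(int(tok))
        return stack

    return run


run = make_interpreter()
print(run(": square dup * ; 7 square 1 +"))
```

There is no parser beyond splitting on whitespace and no environment beyond one dictionary, which is roughly why such a language can fit under tight size constraints.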
Would be nice to get everything out of the bootstrapper that isn't the core Lisp language. Of course, it's probably worth having handcrafted LLVM code for the primitives that greatly affect performance.
i admittedly haven't spent much time optimizing stuff, but i also haven't found myself needing to write LLVM/asm code to optimize anything. i'm much more progressing towards handling everything from lisp, from the handling of tagged immediate values to the nifty details of the GC. that code hardly ever needs more than pointer dereference and arithmetic operations, which already map pretty directly to LLVM/asm instructions. i have macros after all that can expand to the low-level primitives.
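As an illustration of why tagged immediates need nothing beyond arithmetic, here is a minimal sketch of one common scheme: a 64-bit word whose lowest bit is 1 for small integers and 0 for aligned pointers. This layout is an assumption made for the example and is not claimed to be Maru's actual representation; all names (`box_fixnum`, `unbox_fixnum`, `add_fixnums`, ...) are hypothetical.

```python
# Assumed layout for illustration only: 64-bit words, low bit 1 = fixnum.
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
TAG_BITS = 1
TAG_MASK = (1 << TAG_BITS) - 1
FIXNUM_TAG = 1


def box_fixnum(n):
    """Shift the integer left and set the tag bit, wrapped to one machine word."""
    return ((n << TAG_BITS) | FIXNUM_TAG) & WORD_MASK


def is_fixnum(word):
    """Aligned pointers have a zero low bit, so one mask-and-compare is enough."""
    return (word & TAG_MASK) == FIXNUM_TAG


def unbox_fixnum(word):
    """Reinterpret the word as signed, then shift the tag out arithmetically."""
    if word & (1 << (WORD_BITS - 1)):
        word -= 1 << WORD_BITS
    return word >> TAG_BITS


def add_fixnums(a, b):
    """Add two boxed fixnums without unboxing: (2x+1) + (2y+1) - 1 = 2(x+y) + 1."""
    return (a + b - FIXNUM_TAG) & WORD_MASK


print(unbox_fixnum(add_fixnums(box_fixnum(20), box_fixnum(22))))
```

Each function is a shift, a mask, or an add, which is the kind of operation that maps onto a single machine instruction; overflow checking is left out of this sketch.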
a big question of mine currently is hygienic macros and modules, and their interaction. more specifically: whether modules should have separate symbol tables, or only separate bindings together with a global symbol table. i'm leaning towards the latter because it makes many things simpler and independent (e.g. the reader doesn't need to know about modules)... and i think the implementation of hygienic macros should be a user library anyway that doesn't rely on isolated symbol packages, but rather on parsing the code templates and implementing the isolation on the AST level (as opposed to the lower level of symbol equality).
my implementation of modules is informed by this paper:
Submodules in Racket - You Want it When, Again? by Matthew Flatt https://www.cs.utah.edu/plt/publications/gpce13-f-color.pdf
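The "global symbol table, separate bindings per module" option discussed above can be sketched roughly as follows. This is only an illustration of the idea under assumed names (`Symbol`, `intern_symbol`, `read`, `Module`); it is not Maru's implementation and it ignores hygiene entirely.

```python
# Hypothetical sketch: one global symbol table, per-module binding tables.
class Symbol:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


_symbols = {}  # the single, global symbol table


def intern_symbol(name):
    """Return the unique Symbol object for this name, creating it if needed."""
    sym = _symbols.get(name)
    if sym is None:
        sym = _symbols[name] = Symbol(name)
    return sym


def read(text):
    """A toy reader: it interns symbols and never needs to know about modules."""
    return [intern_symbol(tok) for tok in text.split()]


class Module:
    def __init__(self, name, imports=()):
        self.name = name
        self.bindings = {}  # Symbol -> value, private to this module
        self.imports = list(imports)

    def define(self, sym, value):
        self.bindings[sym] = value

    def lookup(self, sym):
        """Check this module's own bindings first, then its imports in order."""
        if sym in self.bindings:
            return self.bindings[sym]
        for module in self.imports:
            if sym in module.bindings:
                return module.bindings[sym]
        raise NameError(f"{sym} is unbound in module {self.name}")


core = Module("core")
app = Module("app", imports=[core])
x = intern_symbol("x")
core.define(x, 1)
print(app.lookup(x))  # found through the import
app.define(x, 2)
print(app.lookup(x), core.lookup(x))  # same symbol, different bindings
```

The same symbol object is shared everywhere, so symbol equality stays a pointer comparison, while what the symbol means depends on the module doing the lookup.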