How hard would it be to do this in LLVM, I wonder?

i've done that already, and with LLVM the self-hosting is about 3000 kLoC, although i'm looking into new features that would hopefully allow shrinking/simplifying the codebase back a little. mostly by separating the code that is strictly needed for the bootstrap process from the ever-growing set of libs and utils that get added for other tests and actual uses of the language.

most of the extra lines are needed because the LLVM IR has a strict type system that had to be accommodated when compiling e.g. FFI calls and such.
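
to give a rough idea of what "accommodating the type system" means for an FFI call, here is a minimal sketch in Python that emits textual LLVM IR. it is not the actual compiler code: the helper names `coerce_int` and `emit_ffi_call` are made up for illustration, it only handles integer types, and it assumes the caller already knows the parameter types the foreign function declares. the point is just that every argument whose type differs from the declared one needs an explicit `sext`/`zext`/`trunc` before the `call`, since LLVM IR does no implicit conversions.

```python
# Hypothetical sketch, not taken from any real compiler: emit textual LLVM IR
# for a call to an external function, inserting the explicit integer casts
# that LLVM IR requires when argument and parameter types differ.

INT_BITS = {"i1": 1, "i8": 8, "i16": 16, "i32": 32, "i64": 64}


def coerce_int(value, from_ty, to_ty, tmp, signed=True):
    """Return (ir_lines, result_name) converting `value` from from_ty to to_ty."""
    if from_ty == to_ty:
        return [], value  # types already match, no instruction needed
    if INT_BITS[to_ty] > INT_BITS[from_ty]:
        op = "sext" if signed else "zext"  # widening: sign- or zero-extend
    else:
        op = "trunc"  # narrowing: drop the high bits
    return [f"{tmp} = {op} {from_ty} {value} to {to_ty}"], tmp


def emit_ffi_call(name, ret_ty, param_tys, args):
    """Emit IR lines for calling @name.

    param_tys: the types the foreign function declares.
    args: list of (value, type) pairs as the caller currently has them.
    """
    lines, operands = [], []
    for i, ((value, ty), want) in enumerate(zip(args, param_tys)):
        conv, result = coerce_int(value, ty, want, f"%arg{i}")
        lines.extend(conv)
        operands.append(f"{want} {result}")
    call = f"call {ret_ty} @{name}({', '.join(operands)})"
    # a void call produces no value, so it must not be assigned to a name
    lines.append(call if ret_ty == "void" else f"%ret = {call}")
    return lines


if __name__ == "__main__":
    # assumes an LP64 target where C `long` is 64 bits, so labs is i64 -> i64
    for line in emit_ffi_call("labs", "i64", ["i64"], [("%x", "i32")]):
        print(line)
```

even in this toy version, one source-level call turns into a small per-argument conversion pass, and a real one also has to deal with pointers, floats, structs and calling conventions, which is where the extra lines go.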