Hello Dima,
Some bottlenecks are known, and I'm working on them. The worst offenders are:
- generic function dispatch is slow
I have implemented the computation part of the fast generic function dispatch proposed by Robert Strandh, but it still needs to be integrated with the C compiler.
- there is no type inference (only type propagation)
ECL can go really fast when it knows the types of the values involved. I have plans for that, but some extra work needs to be done first.
Both depend on refactoring the compiler (that task is pending; see the cmpc-separation branch), which will make it easier to work with the intermediate representation and to experiment with backends. There are other motivations for this refactor as well.
ECL /compilation/ is also very slow. Currently there is not much we can do about this, because most of the time is spent in GCC (so there is nothing for us to optimize).
Another problem is FASL loading: when ECL loads a fasl, it replays the necessary side effects, and that is time-consuming (you may notice this, for example, when you REQUIRE ASDF). This is not much of a problem in itself, but those side effects need to be replayed even when we build an executable, so startup time suffers. Other implementations hide that startup time by dumping images in which all side effects are already present.
Also, if you are not using the C compiler (i.e. only the bytecodes compiler), the result is not optimized at all: the bytecodes compiler performs only minimal compilation.
All that said, once both fast gf dispatch and type inference are implemented, I will try to identify further bottlenecks if things still don't look good.
None of these possible improvements will be part of the upcoming release. We are currently in the testing phase (no thanks to me; I'm disappointingly inactive on this front at the moment, sorry Marius!).
Here are a few hints that will help you produce better-optimized code:
- avoid generic functions
- declare types wherever feasible
- lower safety to 1 and raise speed to 3 (don't use safety 0; it has known bugs)
There are also more mundane ways to improve ECL's performance:
- inline partial dispatch tables for arithmetic operators
- work harder on optimizing the IR (using SSA and adding more passes is pending™)
Best regards,
Daniel
P.S. We should also introduce more immediate types on 64-bit platforms: we currently use only two bits for tagging, while we could use three. In that case single-float could be unboxed. I'm not working on that at the moment, though.
--
Daniel Kochmański ;; aka jackdaniel | Przemyśl, Poland
TurtleWare - Daniel Kochmański | www.turtleware.eu
"Be the change that you wish to see in the world." - Mahatma Gandhi
------- Original Message -------
On Friday, July 14th, 2023 at 2:19 PM, Dima Pasechnik <dimpase+ecl(a)gmail.com> wrote:
> It's well-known that ECL-compiled CL projects are considerably slower
> than ones where SBCL is used. Examples are Maxima and FriCAS; there,
> speed might be a few times (sic!) slower.
>
> Is there an effort to find out bottlenecks, or is it known where these
> bottlenecks are?
>
> Best,
> Dima