Hello Dima,
some bottlenecks are known and I'm working on them. The worst offenders:
- the generic function dispatch is slow
I have implemented the computation part of the fast generic function dispatch as proposed by Robert Strandh, but it still needs to be integrated with the C compiler.
- there is no type inference (only type propagation)
ECL can go really fast when it knows the types it is working with. I have plans for that, but some extra work needs to be done first.
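To illustrate how much type information matters already today, here is a minimal sketch (the function names are made up for illustration; the exact C code ECL emits depends on your build):

```lisp
;; A sketch: with full type declarations ECL's C backend can emit
;; unboxed fixnum arithmetic instead of a call to the generic addition
;; routine.
(defun add-untyped (a b)
  (+ a b))                      ; generic addition, works on any numbers

(defun add-typed (a b)
  (declare (type fixnum a b)
           (optimize (speed 3) (safety 1)))
  (the fixnum (+ a b)))         ; can become a plain C "+" on machine words

;; To see the difference, you can keep the intermediate C file around when
;; compiling (if memory serves, ECL's COMPILE-FILE accepts :C-FILE for this):
;; (compile-file "add.lisp" :c-file t)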
Both things hinge on refactoring the compiler (that task is pending; see the cmpc-separation branch), which will make it easier to work with the intermediate representation and to experiment with backends. There are also other motivations for this refactor.
There is also the fact that ECL's /compilation/ time is very slow. Currently there is not much we can do about this, because most of the time is spent in GCC (so there is nothing for us to optimize).
Another problem is FASL loading: when ECL loads a fasl, it replays the necessary side effects, and that is time-consuming (you may notice this, for example, when you REQUIRE ASDF). This is not much of a problem in itself, but said side effects need to be replayed even when we build an executable, so startup suffers. Other implementations hide that startup time by dumping images in which all side effects are already present.
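The replay cost is easy to observe from a fresh ECL session with nothing but standard TIME and REQUIRE:

```lisp
;; Measuring the side-effect replay when a fasl is loaded. In a fresh
;; ECL image, the bulk of the reported time is spent replaying the
;; top-level side effects recorded in the fasl, not reading the file.
(time (require :asdf))
```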
Also, if you are not using the C compiler (i.e. only the bytecodes compiler), then the result is not optimized at all: the bytecodes compiler performs only minimal compilation.
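For instance (a sketch; the EXT symbols are from memory of ECL's manual and may differ between versions):

```lisp
;; Functions entered at the REPL are handled by the bytecodes compiler.
(defun square (x) (* x x))

;; COMPILE routes the definition through the C compiler (GCC), replacing
;; the bytecoded version with native code:
(compile 'square)

;; ECL also lets you select the default compiler explicitly; if memory
;; serves, the switches are:
;; (ext:install-c-compiler)         ; use the C backend
;; (ext:install-bytecodes-compiler) ; fall back to bytecodes
(square 7)
```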
All that said, once both fast generic function dispatch and type inference are implemented, I will try to identify further bottlenecks if things still don't look good.
None of these possible improvements will be part of the upcoming release. We are currently in the testing phase (no thanks to me; I'm disappointingly not very active on this front at the moment - sorry Marius!).
Here are a few hints that will help you to produce better optimized code:
- avoid generic functions
- declare types wherever feasible
- lower safety to 1, raise speed to 3 (don't use safety 0, there are known bugs)
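Put together, the hints above look roughly like this in practice (a minimal sketch; the function is made up for illustration):

```lisp
;; Global settings matching the hints: speed 3, safety 1 (not 0).
(declaim (optimize (speed 3) (safety 1) (debug 0)))

;; A plain function (no generic dispatch) with types declared throughout,
;; so the compiler can use unboxed fixnum arithmetic.
(defun dot-product (a b)
  (declare (type (simple-array fixnum (*)) a b))
  (let ((acc 0))
    (declare (type fixnum acc))
    (dotimes (i (length a) acc)
      (incf acc (* (aref a i) (aref b i))))))
```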
There are also more mundane ways to improve the performance:
- inline partial dispatch tables for arithmetic operators
- work harder on the IR to optimize it (using SSA and adding more passes is pending™)
Best regards, Daniel
P.S. We should also introduce more immediate types on 64-bit platforms. We are currently using only two bits for tagging while we could use three; single-floats could then be unboxed. I'm not working on that at the moment, though.
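The trade-off is visible from the Lisp side: every tag bit taken from a 64-bit word is one bit less for immediate values (the exact widths depend on the build, so they are not hard-coded below):

```lisp
;; With an N-bit word and T tag bits, fixnums get N-T bits (sign included).
;; On a 64-bit ECL with 2 tag bits that leaves 62-bit fixnums; a third tag
;; bit would shrink them by one bit but free a tag that could make
;; single-floats immediate instead of heap-allocated (boxed).
(integer-length most-positive-fixnum) ; value bits available for fixnums
(type-of 1.0f0)                       ; SINGLE-FLOAT - boxed in ECL today
```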
--
Daniel Kochmański ;; aka jackdaniel | Przemyśl, Poland
TurtleWare - Daniel Kochmański | www.turtleware.eu
"Be the change that you wish to see in the world." - Mahatma Gandhi
------- Original Message ------- On Friday, July 14th, 2023 at 2:19 PM, Dima Pasechnik dimpase+ecl@gmail.com wrote:
It's well-known that ECL-compiled CL projects are considerably slower than ones built with SBCL. Examples are e.g. Maxima and FriCAS, where the speed might be a few times (sic!) slower.
Is there an effort to find out bottlenecks, or is it known where these bottlenecks are?
Best, Dima
As a side note (I have not attempted this, and it would require further investigation), we could incorporate the concept of sealed domains as presented by Marco Heisig in the library fast-generic-functions. We already have the concept of sealing classes, but FGF allows one to partially seal a generic function. That could do wonders for inlining effective methods specialized on system classes.
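From memory of the fast-generic-functions README (so treat the exact symbols as assumptions, not a verified API), partially sealing a generic function looks roughly like this:

```lisp
;; Sketch based on Marco Heisig's fast-generic-functions library;
;; assumed loadable via Quicklisp:
;; (ql:quickload :fast-generic-functions)

(defgeneric generic-+ (x y)
  (:generic-function-class fast-generic-functions:fast-generic-function))

(defmethod generic-+ ((x number) (y number))
  (+ x y))

;; Sealing the NUMBER x NUMBER domain promises that no methods will be
;; added or removed there, so calls whose arguments are known to be
;; numbers can inline the effective method instead of doing full dispatch.
(fast-generic-functions:seal-domain #'generic-+ '(number number))
```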
Dear Dima,
it is also worth noting that the next release will already bring some improvements. Maxima's test suite runs about 10% faster on the release candidate than on the last release.
But I also have to add that, in my experience of using sagemath (in particular the manifolds package), the largest opportunities for speed improvements usually come from sagemath itself. The fastest code is the code you don't have to run at all, and in my experience there is still a lot of improvement possible in sagemath through smarter algorithms. I am less familiar with Maxima or FriCAS, but I would not be surprised if the same were true for them as well. Although speeding up ECL is surely important too, and there is some ongoing work doing that, it is probably not the most efficient way to get sagemath to run faster.
Best regards,
Marius