My apologies for taking so long to answer. I have verified that the new gref* can now be used without consing and comes relatively close to the inconvenient cffi:mem-aref solution in terms of computation time. Many thanks for that! I also tried the functions on matrices, which still results in some consing, but this doesn't seem to be time-critical here. In my experiments, most of that consing could be eliminated by storing the linearized index in an auxiliary variable, but this actually increased the runtime slightly. For my original application, the GSLL solution is now comparable in speed to Mathematica for small dimensions (dim below about 50); for a fair comparison of the two solutions at larger dim, I'll have to rework my very naive computation of the Hessian matrix.
Judging from the git commits, you have also changed something related to complex-valued arrays. Did you implement optimizations similar to those for the real-valued case? I'll try that once I have fixed my Hessian computation.
Best regards, Sebastian