OK, this turned out to be a lot harder than I thought. I have done three things:

1) I have defined gref* and (setf gref*) methods specific to each of the foreign-array types that call cffi:mem-aref. This gives about a 20x speedup because I am now passing the literal type to cffi:mem-aref, so it can work that fact in at compile time.
2) There is now a compiler macro to turn a grid:gref into a grid:gref* if there's only one index. This gives about a 2x speedup when gref is used.
3) There is another compiler macro that turns a grid:gref* into a cffi:mem-aref directly if the foreign array is declared. This gives about a 400x speedup overall, similar to your "hardwired" result. It is a bit slower because I'm not able to precompute the pointer; it has to be recomputed on each gref* call.
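To illustrate the idea behind 3), here is a minimal sketch of a compiler macro that rewrites a call only when the argument is wrapped in a (the ...) form. This is not the actual foreign-array code: slow-ref and fast-ref are hypothetical stand-ins for a generic accessor and a type-specific one, and an ordinary Lisp array takes the place of a foreign array.

```lisp
;; Hypothetical stand-ins, not part of any library:
;; SLOW-REF plays the role of a generic accessor,
;; FAST-REF the role of a type-specific one.
(defun slow-ref (vec i)
  (elt vec i))

(declaim (inline fast-ref))
(defun fast-ref (vec i)
  (declare (type (simple-array double-float (*)) vec)
           (type fixnum i))
  (aref vec i))

(define-compiler-macro slow-ref (&whole form vec i)
  ;; Rewrite only when the caller wrote
  ;;   (the (simple-array double-float (*)) x)
  ;; as the first argument; otherwise return FORM unchanged so that
  ;; the ordinary function call is compiled.
  (if (and (consp vec)
           (eq (first vec) 'the)
           (equal (second vec) '(simple-array double-float (*))))
      `(fast-ref ,vec ,i)
      form))
```

You can inspect the rewrite without compiling anything by calling the expander directly, e.g. (funcall (compiler-macro-function 'slow-ref) '(slow-ref (the (simple-array double-float (*)) v) 3) nil). A call without the (the ...) wrapper comes back unchanged.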
On the last point, there is a caveat. I tried to make it work when the foreign array has been declared with a standard (declare ...) form. This has a chance of working on SBCL because of its support for the CLtL2 function variable-information, which was removed from CL before it was sent to ANSI standardization. However, it did not work for me; I will continue to try to get this working. In the meantime, the only way to make a declaration that takes advantage of 3) is with a (the ...) form, e.g. (grid:gref (the vector-double-float zvector) i). This is kind of annoying, but it is portable, and it allows you to avoid going to lower-level functions (i.e., cffi:mem-aref).
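For reference, this is roughly how a macro can ask SBCL for a variable's declared type through variable-information. It is a sketch under assumptions, not what foreign-array does: it is SBCL-only, it relies on the sb-cltl2 contrib module, and declared-type-of is a hypothetical name. I can't promise it reports what you would hope in every context.

```lisp
;; SBCL-only sketch; DECLARED-TYPE-OF is a hypothetical helper.
#+sbcl
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-cltl2))

#+sbcl
(defmacro declared-type-of (var &environment env)
  "Expand into the quoted type recorded for VAR in ENV, or T if none is found."
  ;; VARIABLE-INFORMATION returns three values: the kind of binding,
  ;; whether the binding is local, and an alist of declarations.
  (multiple-value-bind (kind local-p decls)
      (sb-cltl2:variable-information var env)
    (declare (ignore kind local-p))
    `',(or (cdr (assoc 'type decls)) t)))
```

A form such as (let ((x 1.0d0)) (declare (type double-float x)) (declared-type-of x)) is where you would expect the declaration to show up. A compiler macro could make the same query on its own &environment argument.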
So for example see my rewrite of your function (in foreign-array/tests/fast-array-access.lisp)

    (defun gref-access (dim)
      "Given an integer dim, this constructs a function that, when supplied with a
    N-dimensional vector Z and some output vector (-> pointer?), yields the
    corresponding forces"
      (let ((temp-values (make-array 2 :element-type 'double-float :initial-element 0.0d0)))
        (lambda (zvector output)
          (declare (fixnum dim)
                   (optimize (speed 3) (safety 0) (debug 0))
                   (type vector-double-float zvector)) ;;; <--- this is useless, but ought not to be!
          (do ((i 0 (1+ i)))
              ((= i dim))
            (declare (fixnum i))
            (setf (aref temp-values 0) 0.0d0)
            (do ((m 0 (1+ m)))
                ((> m i))
              (declare (fixnum m))
              (do ((n i (1+ n)))
                  ((= n dim))
                (declare (fixnum n))
                (setf (aref temp-values 1) 0.0d0)
                (do ((k m (1+ k)))
                    ((> k n))
                  (declare (fixnum k))
                  (incf (aref temp-values 1)
                        (grid:gref (the vector-double-float zvector) k))) ; This declaration does the work!
                (incf (aref temp-values 0)
                      (expt (aref temp-values 1) -2))))
            (setf (grid:gref output i)
                  (- (grid:gref (the vector-double-float zvector) i) ; This one does too!
                     (aref temp-values 0)))))))

is now (almost) as fast as your cffi-access.
There is still a bunch of stuff to be done: the optimizations only work for vectors, not higher-dimensional arrays, and I haven't yet defined a compiler macro for the setf case of 3). But it's a start; try it in your problem and let me know how it performs.
Liam