Hi Sumant,
Thanks for this interesting analysis. It is encouraging to me that the FFT itself is only a small fraction of the total time taken, and therefore there is opportunity for real speed improvement. Here is my breakdown:
1. FFT only (Checks 2 and 3): 10%
2. make-urand-vector: 90%
   1. making and setting the vector (Check 4) (make-and-init-vector, (setf grid:aref)): 40% of the overall
   2. pointless coercing of (complex (urand) (urand)) to (complex double-float), which it already is (Check 5): 20% of the overall
I don't understand why the coerce costs so much, as it's essentially a no-op: (complex (urand) (urand)) is already a (complex double-float). Regardless, it should be an easy win of 20% to just remove the coerce.
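To illustrate the no-op claim, here is a minimal check in plain Common Lisp. The function name coerce-is-no-op-p is made up for this example and is not part of GSLL or Antik. It only shows that coerce returns its argument unchanged when the argument already has the target type; it says nothing about why the call is slow in the benchmark.

```lisp
;; Hypothetical helper, for illustration only.
(defun coerce-is-no-op-p (z)
  "True if Z is already a (complex double-float) and COERCE returns it unchanged."
  (and (typep z '(complex double-float))
       ;; COERCE returns the object itself when it is already of the
       ;; requested type, so EQL holds here.
       (eql z (coerce z '(complex double-float)))))

;; A complex built from two double-floats is already (complex double-float).
(coerce-is-no-op-p (complex 0.25d0 0.75d0)) ; => T
```

So in make-urand-vector the element can presumably be written as (complex (urand) (urand)) with no coerce around it.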
With some care, I bet make-urand-vector could be made to take much less than the remaining 70%. One thing to try is inlining #'urand in make-urand-vector and declaring as much as possible, for example declaring urand-seed a fixnum. Another is to call make-foreign-array with :initial-contents instead of setting each element with setf. I'm not sure whether that's faster or slower, but it would be interesting to find out, because if it is slower it might lead us to rethink how we do that. (BTW, on SBCL there might be a price paid here in that we are making a static vector.)
I don't think your declaration (type grid:vector-complex-double-float vec) does anything, because the compiler won't use that information and we don't read the environment. (To do that we would need variable-information, which is in CLtL2 but was removed from the CL standard; it is used in e.g. grid::declared-type-dimension.)
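As a rough sketch of the inlining and :initial-contents ideas, here is a version in plain Common Lisp. Everything in it is an assumption made for illustration: the generator is a stand-in linear congruential generator rather than the real urand, the seed is kept in a special variable *urand-seed* so that the inline declaration can take effect, and ordinary specialized arrays stand in for foreign arrays. Timings from it will not carry over to make-foreign-array or to SBCL static vectors; it only shows the shape of the two variants.

```lisp
;; Stand-in generator (assumption: NOT the real urand). The seed is
;; declared (unsigned-byte 31) rather than fixnum so that the declaration
;; is also valid where fixnums are narrower than on 64-bit SBCL.
(declaim (type (unsigned-byte 31) *urand-seed*))
(defvar *urand-seed* 1)

(declaim (inline urand))
(defun urand ()
  "Return a pseudo-random double-float in [0, 1)."
  (setf *urand-seed*
        (logand #x7fffffff (+ 12345 (* *urand-seed* 1103515245))))
  (/ *urand-seed* 2147483648d0))

;; Variant 1: allocate, then set each element with SETF.
(defun make-urand-vector-setf (n)
  (declare (type fixnum n))
  (let ((vec (make-array n :element-type '(complex double-float))))
    (dotimes (i n vec)
      ;; No COERCE: the value is already a (complex double-float).
      (setf (aref vec i) (complex (urand) (urand))))))

;; Variant 2: build the contents first and pass them as :INITIAL-CONTENTS.
(defun make-urand-vector-contents (n)
  (declare (type fixnum n))
  (make-array n
              :element-type '(complex double-float)
              :initial-contents (loop repeat n
                                      collect (complex (urand) (urand)))))
```

Resetting *urand-seed* to 1 before each call should make the two variants produce the same vector, so wrapping each call in (time ...) gives a like-for-like comparison of the two construction strategies, at least for ordinary arrays.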
Liam