Hi Liam,
Thanks for the tips. If you remember, though, I previously reported that using :initial-contents (from a list) is slow; in fact, it's slower than setf-ing each element by hand (at least on my system). I don't know how :initial-contents behaves with anything other than a list, and I haven't tried it. For example, when I read a file containing ASCII data, I tried two ways of getting it into an array: read the whole table into a list and pass that as :initial-contents, or make the array first and read value by value into it. The latter was quite a bit faster.
Regarding the coerce, I know it should be a no-op. It's there because the tests call for real and complex values in both single and double float, but in the previous e-mail I only included the tests where the coerce *should* be a no-op. I guess the coerce could be avoided by making separate functions for each type.
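A minimal sketch of the two approaches, using plain CL arrays rather than grid foreign arrays, so it only illustrates the pattern and says nothing about grid's own timings. The function names are made up for illustration:

```lisp
;; Approach 1: read everything into a list, then build the vector
;; from it with :initial-contents.
(defun read-via-list (stream n)
  (let ((items (loop repeat n
                     collect (coerce (read stream) 'double-float))))
    (make-array n :element-type 'double-float :initial-contents items)))

;; Approach 2: allocate the vector first, then store each value
;; as it is read, using (setf aref).
(defun read-via-setf (stream n)
  (let ((vec (make-array n :element-type 'double-float)))
    (dotimes (i n vec)
      (setf (aref vec i) (coerce (read stream) 'double-float)))))
```

Wrapping each call in (time ...) on the same input should show which one is faster on a given system.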
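Roughly what I have in mind with separate functions, as a sketch only: it uses CL's built-in random as a stand-in for urand, and the function names are hypothetical. Each function builds the value directly in the target type, so no coerce is needed:

```lisp
;; One generator per element type; RANDOM returns a float of the
;; same type as its argument, so no COERCE is needed afterwards.
(defun random-complex-double ()
  (complex (random 1d0) (random 1d0)))

(defun random-complex-single ()
  (complex (random 1f0) (random 1f0)))
```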
I don't really understand why the declarations don't do anything; I thought that I had copy-pasted that from the antik manual tips on speed. Apparently I did something wrong.
To me, these tests show that the speed problem lies basically in how CL handles the grids/foreign arrays, and not in the FFI, as witnessed by the speedy FFTs themselves. A significant part of my data processing relies on GSL algorithms for only one or two operations; the biggest part consists of simple data manipulations that I would do directly in CL (reading and writing data, cutting, pasting, transforming rows and columns).
As such, I think the problem goes beyond the FFT tests, and aside from tinkering with the code here and there and trying some things out, I'm not sure how much use I can be. I will keep looking into it, but I don't have much time to spend. So if anyone can help with this issue or provide more insight, that would be a great help.
-Sumant
On Sat, Nov 26, 2011 at 10:24:18AM -0500, Liam Healy wrote:
Hi Sumant,
Thanks for this interesting analysis. It is encouraging to me that the FFT itself is only a small fraction of the total time taken, and therefore there is opportunity for real speed improvement. Here is my breakdown:
- FFT only (Checks 2 and 3): 10%
- make-urand-vector: 90%
     1. making and setting the vector (Check 4) (make-and-init-vector, (setf grid:aref)): 40% of the overall
     2. pointless coercing of (complex (urand) (urand)) to (complex double-float), which it already is (Check 5): 20% of the overall
I don't understand why the coerce costs so much, as it's essentially a no-op ((complex (urand) (urand)) is already a (complex double-float)), but regardless, just removing the coerce should be an easy win of 20%.
With some care, I bet make-urand-vector could be made much faster than the remaining 70%. One thing is inlining #'urand in make-urand-vector and declaring as much as possible, such as declaring urand-seed a fixnum. Another is to call make-foreign-array with :initial-contents instead of setf-ing each element. I'm not sure if that's faster or slower, but it would be interesting to find out, because if it is slower it might lead us to rethink how we do that. (BTW, on SBCL there might be a price paid here in that we are making a static vector.) I don't think your declaration (type grid:vector-complex-double-float vec) does anything, because the compiler won't use that information and we don't read the environment (we would need variable-information, which is in CLtL2 but was removed from the CL standard; it is used in e.g. grid::declared-type-dimension).
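As a rough sketch of the inlining and declaration idea: the generator below is a toy linear congruential generator standing in for the real urand, the vector is a plain CL array rather than a foreign array, and the names *urand-seed* and make-urand-vector-sketch are hypothetical.

```lisp
(defvar *urand-seed* 1)
(declaim (type fixnum *urand-seed*))

;; The INLINE proclamation must come before the DEFUN so the
;; compiler can expand calls to URAND in place.
(declaim (inline urand))
(defun urand ()
  "Toy generator: returns a double-float in [0, 1)."
  (setf *urand-seed* (mod (+ (* 75 *urand-seed*) 74) 65537))
  (/ *urand-seed* 65537d0))

(defun make-urand-vector-sketch (n)
  (declare (type fixnum n) (optimize (speed 3)))
  (let ((vec (make-array n :element-type '(complex double-float))))
    (dotimes (i n vec)
      ;; Both parts are already double-floats, so no COERCE is needed.
      (setf (aref vec i) (complex (urand) (urand))))))
```

Timing this against the version with the coerce and without the inlining would show how much of the 70% is recoverable this way.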
Liam