Hello all.
I've been trying to speed up BINOMIAL-COEFFICIENT for combinatorial calculations and ended up writing my own functions which were originally based off of Alexandria's design. For computing the binomial coefficient of (10^11, 10^5), Alexandria's takes about 9.990 seconds on my 64-bit Windows machine with a relatively recent SBCL with (optimize speed (safety 0) (debug 0)).
I've ended up writing my own version of %MULTIPLY-RANGE, which I ended up calling RANGE-PRODUCT, and it's definition is:
(declaim (ftype (function (integer integer) integer) range-product)) (defun range-product (lower upper) "Compute LOWER * (LOWER+1) * ... * (UPPER-1) * UPPER." (assert (<= lower upper)) (case (- upper lower) ((0) lower) ((1) (* lower upper)) (otherwise (let ((mid (floor (+ lower upper) 2))) (* (range-product lower mid) (range-product (1+ mid) upper))))))
[ from QTILITY: https://bitbucket.org/tarballs_are_good/qtility/src/65eff10fdece/arithmetic.... ]
From that, the definition of FACTORIAL follows:
(declaim (ftype (function ((integer 0)) (integer 1)) factorial)) (defun factorial (n) "Compute the factorial of N, N! = 1 * 2 * ... * N." (if (zerop n) 1 (range-product 1 n)))
[ from QTILITY: https://bitbucket.org/tarballs_are_good/qtility/src/65eff10fdece/arithmetic.... ]
Then, I wrote BINOMIAL-COEFFICIENT that uses a closure to direct the swapping of the K and N-K, instead of mutating them. This seemed to at least help with SBCL's flow analysis. I also chose not to use CHECK-TYPE in favor of declarations (which I understand are not the same, but I think they should attempt to act the same given a sufficient compiler policy).
(declaim (ftype (function ((integer 0) (integer 0)) (integer 0)) binomial-coefficient)) (defun binomial-coefficient (n k) "Binomial coefficient of N and K." (assert (>= n k)) (labels ((core (k n-k) (if (= 1 n-k) n (nth-value 0 (floor (range-product (+ k 1) n) (factorial n-k)))))) (declare (inline core) (ftype (function ((integer 0) (integer 0)) (integer 0)) core)) (if (or (zerop k) (= n k)) 1 (let ((n-k (- n k))) (declare (type (integer 0) n-k)) (if (< k n-k) (core n-k k) (core k n-k))))))
[ from QTILITY: https://bitbucket.org/tarballs_are_good/qtility/src/65eff10fdece/arithmetic.... ]
With the code above, I can compute the same binomial coefficient in about 7.503 seconds, so about a 2.5 second speed up.
I'd like to submit a patch using the code above, but I could see how these speed ups are both implementation-dependent and platform-dependent. A particular unfortunate consequence is that we don't have those tweakable limits which (in theory; as the comments say) could be computed at compile time.
Any thoughts?
-Robert