Matrix Multiplication in Clojure vs Numpy


The Python version compiles down to a loop in C, while the Clojure version builds a new intermediate lazy sequence for each call to map in this code. The performance difference you see most likely comes from that difference in data structures.
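The original code isn't reproduced here, but a typical sequence-based multiply looks roughly like the following sketch (mmul-seq is a name made up for illustration); note how every map and reduce builds a fresh sequence rather than running a tight primitive loop:

(defn mmul-seq [a b]
  ;; transpose b so its columns can be iterated directly (allocates a new seq)
  (let [bt (apply map vector b)]
    (map (fn [row]
           (map (fn [col]
                  ;; dot product: three more sequence traversals and allocations
                  (reduce + (map * row col)))
                bt))
         a)))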

To do better than this, you could use a library like Incanter, or write your own version as explained in this SO question (see also this one, Neanderthal, or nd4j). If you really want to stay with sequences to keep the lazy-evaluation properties and so on, then you may get a real boost by looking into transients for the internal matrix calculations.
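As a rough illustration of that last idea (mmul-transient is a made-up name, and this eagerly realizes each result row, trading laziness inside the computation for fewer allocations):

(defn mmul-transient [a b]
  (let [bt (apply mapv vector b)]           ; columns of b, as vectors
    (mapv (fn [row]
            ;; accumulate one result row into a transient vector,
            ;; then freeze it back into a persistent vector
            (persistent!
             (reduce (fn [acc col]
                       (conj! acc (reduce + (map * row col))))
                     (transient [])
                     bt)))
          a)))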

EDIT: I forgot to add the first step in tuning Clojure: turn on "warn on reflection".
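*warn-on-reflection* is an ordinary dynamic var, so it can be enabled at the REPL or, with Leiningen for example, for a whole project:

;; at the REPL or at the top of a namespace
(set! *warn-on-reflection* true)

;; or, with Leiningen, in project.clj:
;; :global-vars {*warn-on-reflection* true}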


Numpy links to BLAS/LAPACK routines that have been optimized for decades at the level of machine architecture, while the Clojure code implements the multiplication in the most straightforward and naive manner.

Any time you have non-trivial matrix/vector operations to perform, you should probably link to BLAS/LAPACK.
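From Clojure, the simplest route is a JVM wrapper such as jBLAS, as in this minimal sketch (the matrix sizes are arbitrary):

(import '[org.jblas DoubleMatrix])

(let [a (DoubleMatrix/rand 500 500)
      b (DoubleMatrix/rand 500 500)]
  ;; .mmul dispatches to the native BLAS GEMM routine
  (.mmul a b))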

The only time this won't be faster is for small matrices in languages where the overhead of translating the data representation between the language runtime and LAPACK exceeds the time spent doing the calculation.


I've just staged a small shootout between Incanter 1.3 and jBLAS 1.2.1. Here's the code:

(ns ml-class.experiments.mmult
  [:use [incanter core]]
  [:import [org.jblas DoubleMatrix]])

(defn -main [m]
  (let [n 23
        m (Integer/parseInt m)
        ai (matrix (vec (double-array (* m n) (repeatedly rand))) n)
        ab (DoubleMatrix/rand m n)
        ti (copy (trans ai))
        tb (.transpose ab)]
    (dotimes [i 20]
      (print "Incanter: ") (time (mmult ti ai))
      (print "   jBLAS: ") (time (.mmul tb ab)))))
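With Leiningen, this should be runnable as something like lein run 400000, passing the column count m on the command line.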

In my test, Incanter is consistently slower than jBLAS by about 45% in plain matrix multiplication. However, Incanter's trans function does not create a new copy of the matrix, so (.mmul (.transpose ab) ab) in jBLAS takes twice as much memory and is only 15% faster than (mmult (trans ai) ai) in Incanter.

Given Incanter's rich feature set (especially its plotting library), I don't think I'll switch to jBLAS any time soon. Still, I would love to see another shootout between jBLAS and Parallel Colt, and maybe it's worth considering replacing Parallel Colt with jBLAS in Incanter? :-)


EDIT: Here are absolute numbers (in msec.) I got on my (rather slow) PC:

Incanter: 665.362452   jBLAS: 459.311598   numpy: 353.777885

For each library, I've picked the best time out of 20 runs, matrix size 23x400000.

PS. Haskell hmatrix results are close to numpy, but I am not sure how to benchmark it correctly.