How to speed up matrix multiplication in C++?


Speaking of speed-up, your function will be more cache-friendly if you swap the order of the k and j loop iterations:

matrix mult_std(matrix a, matrix b) {
    matrix c(a.dim(), false, false);
    for (int i = 0; i < a.dim(); i++)
        for (int k = 0; k < a.dim(); k++)
            for (int j = 0; j < a.dim(); j++)   // swapped order
                c(i,j) += a(i,k) * b(k,j);
    return c;
}

That's because with k as the innermost index, every iteration of the inner loop jumps to a new row of b and takes a cache miss. With j as the innermost index, both c and b are accessed contiguously, while the element of a stays put.


Make sure that the members dim() and operator()() are declared inline, and that compiler optimization is turned on. Then play with options like -funroll-loops (on gcc).
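For reference, here's a minimal sketch of what such a matrix class might look like, assuming row-major storage in one contiguous buffer. The two extra bool constructor arguments are just placeholders to match the call in the code above; the real class may well differ:

#include <vector>

class matrix {
    int n;
    std::vector<double> data;
public:
    // The two bool parameters are only here to mirror matrix(a.dim(), false, false).
    explicit matrix(int dim, bool = false, bool = false)
        : n(dim), data(static_cast<size_t>(dim) * dim, 0.0) {}

    // Defined inside the class body, so these are implicitly inline.
    int dim() const { return n; }
    double& operator()(int i, int j)       { return data[i * n + j]; }
    double  operator()(int i, int j) const { return data[i * n + j]; }
};

With row-major storage like this, c(i,j) and b(k,j) touch consecutive memory locations as j increments, which is exactly why the swapped loop order above is cache-friendly.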

How big is a.dim() anyway? If a row of the matrix doesn't fit in just a couple of cache lines, you'd be better off with a block access pattern instead of working a full row at a time.
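If it comes to that, a blocked (tiled) version could look roughly like this. This is only a sketch built on the same assumed matrix interface; the tile size B is a guess and needs tuning so that three B x B tiles fit comfortably in cache:

#include <algorithm>

matrix mult_block(const matrix& a, const matrix& b) {
    const int n = a.dim();
    const int B = 64;                          // tile size: tune per machine
    matrix c(n, false, false);
    for (int ii = 0; ii < n; ii += B)
        for (int kk = 0; kk < n; kk += B)
            for (int jj = 0; jj < n; jj += B)
                // Multiply one B x B tile of a with one B x B tile of b,
                // accumulating into the corresponding tile of c.
                for (int i = ii; i < std::min(ii + B, n); i++)
                    for (int k = kk; k < std::min(kk + B, n); k++)
                        for (int j = jj; j < std::min(jj + B, n); j++)
                            c(i, j) += a(i, k) * b(k, j);
    return c;
}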


You say you don't want to modify the algorithm, but what does that mean exactly?

Does unrolling the loop count as "modifying the algorithm"? What about using SSE/VMX, or whichever SIMD instructions are available on your CPU? What about employing some form of blocking to improve cache locality?
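For illustration, here's roughly what an SSE2 version of the swapped-order inner loop could look like. It assumes an x86 CPU and raw pointers into row-major double data (which the matrix class in the question may not expose), and it ignores alignment, so treat it as a sketch rather than drop-in code:

#include <emmintrin.h>

// ca points at row i of c, bk at row k of b, aik is a(i,k).
void inner_sse(double* ca, const double* bk, double aik, int n) {
    __m128d va = _mm_set1_pd(aik);               // broadcast a(i,k)
    int j = 0;
    for (; j + 2 <= n; j += 2) {
        __m128d vb = _mm_loadu_pd(bk + j);       // b(k, j..j+1)
        __m128d vc = _mm_loadu_pd(ca + j);       // c(i, j..j+1)
        vc = _mm_add_pd(vc, _mm_mul_pd(va, vb)); // c += a(i,k) * b(k, j..j+1)
        _mm_storeu_pd(ca + j, vc);
    }
    for (; j < n; j++)                           // scalar tail for odd n
        ca[j] += aik * bk[j];
}

In practice a decent compiler may already auto-vectorize the swapped-order loop at -O3, so check before hand-writing intrinsics.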

If you don't want to restructure your code at all, I doubt there's more you can do than the changes you've already made. Everything else becomes a trade-off of minor changes to the algorithm to achieve a performance boost.

Of course, you should still take a look at the asm generated by the compiler. That'll tell you much more about what can be done to speed up the code.
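On gcc, for example, something along these lines dumps the generated assembly so you can check whether the inner loop was actually vectorized and unrolled (the file name is just a placeholder):

g++ -O3 -funroll-loops -march=native -S -fverbose-asm matrix.cpp -o matrix.s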