
Get GNU Octave to work with a multicore processor. (Multithreading)


Solution

Octave itself is a single-threaded application that runs on one core. However, you can get Octave to use libraries such as ATLAS that utilize multiple cores internally. So while Octave itself only uses one core, when it hits a heavy operation it calls functions in ATLAS that spread the work across several CPU cores.

I was able to do this. First compile ATLAS from source and make it available on your system so that Octave can find and use its library functions. ATLAS tunes itself to your system and its number of cores. When you build Octave from source and point it at ATLAS, Octave links against it, so when Octave performs a heavy operation such as a huge matrix multiplication, ATLAS decides how many CPUs to use.

I was unable to get this working on Fedora, but it worked on Gentoo.

I used these two links:

ftp://ftp.gnu.org/gnu/octave/

http://math-atlas.sourceforge.net/

I ran the following Octave code before and after installing ATLAS:

tic
bigMatrixA = rand(3000000, 80);
bigMatrixB = rand(80, 30);
bigMatrixC = bigMatrixA * bigMatrixB;
toc
disp("done");

The matrix multiplication goes much faster using multiple processors; in this run it was roughly six times faster than before on a single core:

Without ATLAS: Elapsed time is 3.22819 seconds.
With ATLAS:    Elapsed time is 0.529 seconds.

The three libraries I am using that speed things up are blas-atlas, cblas-atlas, and lapack-atlas.

If Octave uses these instead of the default BLAS and LAPACK libraries, it will utilize multiple cores.
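One way to confirm which BLAS/LAPACK Octave is actually using is to look at the shared libraries loaded into the running process. This is a minimal check, assuming a Linux system where /proc is available, run from inside Octave:

% List the BLAS/LAPACK-related shared libraries mapped into this
% Octave process (Linux only; relies on /proc/<pid>/maps).
cmd = sprintf ("grep -iE 'blas|lapack|atlas' /proc/%d/maps | awk '{print $6}' | sort -u", getpid ());
system (cmd);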

It is not easy, and it takes some programming skill, to get Octave to compile from source with ATLAS.

Drawbacks to using ATLAS:

ATLAS adds a fair amount of overhead to split a computation across multiple threads. It is much faster if all you are doing is huge matrix multiplications, but most operations cannot be multi-threaded by ATLAS. If extracting every bit of processing power and speed from your cores is the top priority, you will have much better luck writing your program to run in parallel with itself: split it into, say, 8 equivalent programs that each work on 1/8th of the problem, run them all simultaneously, and reassemble the results when they are all done.

ATLAS helps a single-threaded Octave program behave a little more like a multi-threaded application, but it is no silver bullet. ATLAS will not make a single-threaded Octave program max out a 2-, 4-, 6-, or 8-core processor. You will notice a performance boost, but the boost will leave you searching for a better way to use the whole processor. The answer is writing your program to run in parallel with itself, and that takes a lot of programming skill.
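For illustration, here is a minimal sketch of the "split the work into N independent programs" approach described above. It assumes a hypothetical worker.m script that takes a chunk index and chunk count on its command line, processes its share of the data, and saves a variable named result to result_<i>.mat; the synchronization by waiting for output files is deliberately crude.

N = 8;                                  % number of chunks / worker processes
for i = 1:N
  % launch each worker as its own background Octave process
  system (sprintf ("octave -q worker.m %d %d &", i, N));
end
% crude synchronization: wait until every partial result file exists
for i = 1:N
  while (! exist (sprintf ("result_%d.mat", i), "file"))
    pause (1);
  end
end
% reassemble the partial results
total = [];
for i = 1:N
  partial = load (sprintf ("result_%d.mat", i));
  total = [total; partial.result];
end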

Suggestion

Put your energy into vectorizing your heaviest operations and distributing the work over n simultaneously running threads. If you are waiting too long for a process to run, the lowest-hanging fruit for speeding it up is most likely a more efficient algorithm or data structure.
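As a small, self-contained illustration of that point (not from the original answer), compare a plain loop against the equivalent vectorized expression:

x = rand (1, 1e6);

tic
y1 = zeros (size (x));
for i = 1:numel (x)
  y1(i) = x(i) ^ 2;      % interpreted scalar work, one element at a time
end
toc

tic
y2 = x .^ 2;             % one vectorized call into compiled code
toc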


Octave-Forge also offers two packages dealing with parallel computing.

It is also possible to spawn subprocesses using the fork() function.
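For completeness, here is a minimal sketch of that idea, assuming a POSIX system and a hypothetical do_chunk() function. Note that fork() in Octave is mainly intended to be followed by exec(), and each child is a separate process, so results have to be passed back through files rather than returned in memory:

nchunks = 4;
pids = zeros (1, nchunks);
for i = 1:nchunks
  [pid, msg] = fork ();
  if (pid == 0)
    do_chunk (i);        % child: process one chunk (hypothetical function)
    exit (0);            % child must exit explicitly
  elseif (pid > 0)
    pids(i) = pid;       % parent: remember the child's pid
  else
    error ("fork failed: %s", msg);
  end
end
for i = 1:nchunks
  waitpid (pids(i));     % parent: wait for each child to finish
end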


As suggested by Eric, I tried using ATLAS and it improved my performance 3x (in a neural-network learning application whose main cost is matrix multiplication). Surprisingly, it still seemed to use only one core. After further research I stumbled upon OpenBLAS, which used multiple cores out of the box and improved performance by another 2x (though I only had 2 cores). If you want to squeeze out more you can also try MKL, but it is heavy on disk space due to its dependencies.
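One detail worth knowing: OpenBLAS picks its thread count at startup from environment variables such as OPENBLAS_NUM_THREADS (or OMP_NUM_THREADS), so set them before launching Octave. From inside Octave you can only inspect what the process inherited and how many CPUs it sees:

% Inspect the thread-count settings this Octave process was started with.
getenv ("OPENBLAS_NUM_THREADS")
getenv ("OMP_NUM_THREADS")
nproc ()                 % number of logical CPUs visible to Octave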

I was using Arch Linux with the packages community/atlas-lapack-base and aur/openblas-lapack. Installing either of them switched the default BLAS/LAPACK implementation used by Octave.

Here is a nice benchmark comparing those libraries: http://www.tcm.phy.cam.ac.uk/~mjr/linpack/