Improving image processing speed


The easiest way, I think, would be to pipeline the frame operations.

You could work with a thread pool: hand each frame buffer, in sequence, to the first available thread, and return that thread to the pool once the algorithm step on its frame has completed.

This could leave your current (debugged :) algorithm practically unchanged, but it will require substantially more memory for buffering intermediate results.

Of course, without details about your task, it's hard to say if this is appropriate...
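
To make the idea concrete, here is a minimal sketch in standard C++ only. `Frame`, `processFrame`, and `processAll` are hypothetical names, and the pixel inversion is just a stand-in for your real per-frame step. A fixed set of worker threads repeatedly claims the next unprocessed frame, so each frame goes to whichever thread is free first.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical frame type: a flat buffer of 8-bit pixels.
using Frame = std::vector<std::uint8_t>;

// Placeholder for the existing, already-debugged per-frame step.
// Here it simply inverts every pixel.
inline void processFrame(Frame& frame) {
    for (auto& px : frame) {
        px = static_cast<std::uint8_t>(255 - px);
    }
}

// Processes every frame in place using numWorkers threads.
// Each worker claims the next unprocessed frame index until none are left,
// so no two threads ever touch the same frame.
inline void processAll(std::vector<Frame>& frames, unsigned numWorkers) {
    assert(numWorkers > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&frames, &next] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= frames.size()) {
                return;  // nothing left to claim
            }
            processFrame(frames[i]);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < numWorkers; ++w) {
        pool.emplace_back(worker);
    }
    for (auto& t : pool) {
        t.join();
    }
}
```

Calling `processAll(frames, std::thread::hardware_concurrency())` would use one worker per reported core (that function may return 0, so check it first). Note that all frames are held in memory at once here, which is the extra buffering cost mentioned above; a live video source would need a bounded queue instead.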


One important way to increase speed in OpenCV has nothing to do with the processor or the algorithm: avoid extra copying when dealing with matrices. Here is an example taken from the documentation:

"...by constructing a header for a part of another matrix. It can be a single row, single column, several rows, several columns, rectangular region in the matrix (called a minor in algebra) or a diagonal. Such operations are also O(1), because the new header will reference the same data. You can actually modify a part of the matrix using this feature, e.g."

    // add 5-th row, multiplied by 3 to the 3rd row
    M.row(3) = M.row(3) + M.row(5)*3;
    // now copy 7-th column to the 1-st column
    // M.col(1) = M.col(7); // this will not work
    Mat M1 = M.col(1);
    M.col(7).copyTo(M1);

You may already know this, but I think it is worth highlighting matrix headers in OpenCV as an important and efficient coding tool.


Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up?

Yes, although it very heavily depends on the particular algorithm being used, as well as your skill in writing threaded code to handle things like synchronization. You didn't really provide enough detail to make a better assessment than that.

Some algorithms are extremely easy to parallelize, like ones that have the form:

    for (i = 0; i < DATA_SIZE; i++) {
        output[i] = f(input[i]);
    }

for some function f. These are known as embarrassingly parallel; you can simply split the data into N blocks and have N threads process each block individually. Libraries like OpenMP make this kind of threading extremely simple.
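
As a rough illustration of that block split, here is a sketch using plain `std::thread` rather than OpenMP. `parallelMap` and the squaring function `f` are made-up names for this example. Each thread writes only to its own slice of the output, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-element function standing in for f.
inline int f(int x) { return x * x; }

// Applies f to every element of input, splitting the index range into
// up to numThreads contiguous blocks, one thread per block.
inline std::vector<int> parallelMap(const std::vector<int>& input,
                                    unsigned numThreads) {
    assert(numThreads > 0);
    std::vector<int> output(input.size());

    // Block size rounded up so that every element is covered.
    const std::size_t block = (input.size() + numThreads - 1) / numThreads;

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(input.size(), t * block);
        const std::size_t end = std::min(input.size(), begin + block);
        if (begin == end) {
            break;  // no data left for further threads
        }
        threads.emplace_back([&input, &output, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                output[i] = f(input[i]);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return output;
}
```

With OpenMP enabled in the compiler, putting `#pragma omp parallel for` directly above the original loop asks for a similar split without any of this bookkeeping.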