Improving image processing speed

c++ multithreading image-processing opencv

The easier way, I think, could be pipelining frame operations.

You could work with a thread pool, allocating sequentially a frame memory buffer to the first available thread, to be released to pool when the algorithm step on the associated frame has completed.

This could leave practically unchanged your current (debugged :) algorithm, but will require substantially more memory for buffering intermediate results.

Of course, without details about your task, it's hard to say if this is appropriate...

c++ multithreading image-processing opencv

There is one important thing about increasing speed in OpenCV not related to processor nor algorithm and it is avoiding extra copying when dealing with matrices. I will give you an example taken from the documentation:

"...by constructing a header for a part of another matrix. It can be a single row, single column, several rows, several columns, rectangular region in the matrix (called a minor in algebra) or a diagonal. Such operations are also O(1), because the new header will reference the same data. You can actually modify a part of the matrix using this feature, e.g."

// add 5-th row, multiplied by 3 to the 3rd rowM.row(3) = M.row(3) + M.row(5)*3;// now copy 7-th column to the 1-st column// M.col(1) = M.col(7); // this will not workMat M1 = M.col(1);M.col(7).copyTo(M1);

Maybe you already knew this issue but I think it is important to highlight headers in openCV as an important and efficient coding tool.

c++ multithreading image-processing opencv

Assuming I have an x-core processor, does splitting the processing into x threads actually speed things up?

Yes, although it very heavily depends on the particular algorithm being used, as well as your skill in writing threaded code to handle things like synchronization. You didn't really provide enough detail to make a better assessment than that.

Some algorithms are extremely easy to parallelize, like ones that have the form:

for (i=0; i < DATA_SIZE; i++){   output[i] = f(input[i]);}

for some function f. These are known as embarassingly parallelizable; you can simply split the data into N blocks and have N threads process each block individually. Libraries like OpenMP make this kind of threading extremely simple.

CodeHunter

Improving image processing speed

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last