In-depth analysis of the difference between the CPU and GPU [closed]

multithreading


GPUs are basically massively parallel computers. They work well on problems that can use large scale data decomposition and they offer orders of magnitude speedups on those problems.
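
For concreteness, here's roughly what that data decomposition looks like in CUDA (a minimal sketch; the kernel name and launch sizes are made up): every array element gets its own thread, all executing the same tiny operation.

    // Minimal data-decomposition sketch: one GPU thread per array element.
    __global__ void scale_array(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)               // the grid may be larger than the data
            data[i] *= factor;   // same simple operation, every element
    }

    // Launch enough 256-thread blocks to cover all n elements:
    // scale_array<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);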

However, the individual processing units in a GPU cannot match a CPU for general-purpose performance. They are much simpler and lack optimizations like long pipelines, out-of-order execution, and instruction-level parallelism.

They also have other drawbacks. Firstly, your users have to have one, which you cannot rely on unless you control the hardware. Secondly, there are overheads in transferring data from main memory to GPU memory and back.
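
To make that transfer overhead concrete, here is the round trip every GPU computation pays, using the standard CUDA runtime calls (a sketch only; h_buf and n stand for a host buffer and its length):

    float *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(float));  // allocate GPU memory
    cudaMemcpy(d_buf, h_buf, n * sizeof(float),
               cudaMemcpyHostToDevice);              // main memory -> GPU
    // ... launch kernels that work on d_buf ...
    cudaMemcpy(h_buf, d_buf, n * sizeof(float),
               cudaMemcpyDeviceToHost);              // GPU -> main memory
    cudaFree(d_buf);
    // Both copies cross the PCIe bus; for small workloads they can cost
    // more than the computation itself.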

So it depends on your requirements: in some cases GPUs or dedicated processing units like NVIDIA's Tesla cards are the clear winners, but in other cases your work cannot be decomposed to make full use of a GPU, and the overheads then make CPUs the better choice.


First watch this demonstration:

http://www.nvidia.com/object/nvision08_gpu_v_cpu.html

That was fun!

So what's important here is that a CPU can be directed to perform basically any calculation on command. For calculations that are unrelated to each other, or where each computation depends strongly on its neighbors (rather than merely being the same operation), you usually need a full CPU. As an example, take compiling a large C/C++ project: the compiler has to read each token of each source file in sequence before it can understand the meaning of the next. Even though there are lots of source files to process, they all have different structure, so the same calculations don't apply across the source files.

You could speed that up by having several independent CPUs, each working on a separate file. But improving the speed by a factor of X means you need X CPUs, which will cost X times as much as one CPU.
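
That approach is just ordinary host-side threading; a sketch in C++ (compile_file is a hypothetical stand-in for the real per-file work):

    #include <string>
    #include <thread>
    #include <vector>

    // Hypothetical stand-in for the sequential work of compiling one file.
    void compile_file(const std::string &path) { /* parse, compile, emit */ }

    void compile_all(const std::vector<std::string> &files)
    {
        std::vector<std::thread> workers;
        for (const auto &f : files)
            workers.emplace_back(compile_file, f);  // one independent task per thread
        for (auto &w : workers)
            w.join();
    }
    // Throughput scales with the number of cores, but so does the hardware
    // cost: each thread needs a complete general-purpose core to itself.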


Some kinds of tasks involve doing exactly the same calculation on every item in a dataset. Some physics simulations look like this: in each step, each 'element' in the simulation moves a little bit, by the 'sum' of the forces applied to it by its immediate neighbors.
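
That kind of step maps directly onto a GPU kernel; here is a hedged 1D sketch (the names and the 0.1f step factor are invented), where every element applies the identical update using only its immediate neighbors:

    // One simulation step as a 1D stencil: the same rule for every element.
    __global__ void step(const float *pos_in, float *pos_out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > 0 && i < n - 1) {
            float force = (pos_in[i - 1] - pos_in[i])
                        + (pos_in[i + 1] - pos_in[i]);  // neighbor forces
            pos_out[i] = pos_in[i] + 0.1f * force;      // identical update, every i
        }
    }
    // Reading from pos_in and writing to pos_out keeps each step race-free.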

Since you're doing the same calculation on a big set of data, you can duplicate some parts of a CPU and share the others (in the linked demonstration, the air system, valves, and aiming are shared; only the barrels are duplicated for each paintball). Doing X calculations in parallel therefore requires less than X times the cost in hardware.

The obvious disadvantage is that the shared hardware means you can't tell one subset of the parallel processor to do one thing while another subset does something unrelated. The extra parallel capacity goes to waste while the GPU performs first one task and then the other.
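
In CUDA terms this shows up as warp divergence: threads that are scheduled together but disagree at a branch execute the two paths one after the other. A hedged illustration:

    __global__ void divergent(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (i % 2 == 0)
            data[i] = data[i] * 2.0f;  // even-indexed threads run first...
        else
            data[i] = data[i] + 1.0f;  // ...then the odd ones; the two halves
                                       // take turns instead of running in parallel
    }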