
Which is more Efficient? More Cores or More CPUs


That's not an easy question to answer. Computer architecture is unsurprisingly rather complicated. Below are some guidelines but even these are simplifications. A lot of this will come down to your application and what constraints you're working within (both business and technical).

CPUs have several (generally 2-3) levels of on-chip cache. Some modern CPUs also have the memory controller on the die, which can greatly improve the speed of moving memory between cores. Memory I/O between CPUs has to go over an external bus, which tends to be slower.
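To make the cache hierarchy concrete, here's a minimal pointer-chasing sketch (the sizes, step count and results are illustrative and depend entirely on your hardware): it times dependent loads over growing working sets, and the nanoseconds-per-access figure jumps each time the working set spills out of a cache level and, eventually, off-die to RAM.

    // Sketch: average memory-access latency as the working set grows.
    // Once the set no longer fits in L1/L2/L3, each access goes off-die
    // to RAM and the time per access jumps. All sizes are illustrative.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937 rng(42);
        // Working sets from 16 KiB (fits in L1) up to 64 MiB (spills to RAM).
        for (size_t bytes = 16ull << 10; bytes <= 64ull << 20; bytes <<= 2) {
            size_t n = bytes / sizeof(size_t);
            // Link the elements into one random cycle so every load depends
            // on the previous one and the prefetcher cannot hide the latency.
            std::vector<size_t> order(n), next(n);
            std::iota(order.begin(), order.end(), 0);
            std::shuffle(order.begin(), order.end(), rng);
            for (size_t i = 0; i + 1 < n; ++i) next[order[i]] = order[i + 1];
            next[order[n - 1]] = order[0];

            const size_t steps = 10'000'000;
            size_t p = 0;
            auto t0 = std::chrono::steady_clock::now();
            for (size_t i = 0; i < steps; ++i) p = next[p];  // dependent loads
            auto t1 = std::chrono::steady_clock::now();
            double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
            std::printf("%8zu KiB: %.1f ns/access (checksum %zu)\n",
                        bytes >> 10, ns / steps, p);
        }
    }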

Complicating all this, however, is the bus architecture. Intel's Core 2 Duo/Quad systems use a shared bus. Think of this like Ethernet or cable internet, where there is only so much bandwidth to go around and every new participant just takes another share of the whole. AMD/ATI chips use HyperTransport, which is a point-to-point protocol, and Core i7 and newer Xeons use QuickPath, which is quite similar to HyperTransport.

More cores will occupy less space, use less power and cost less (unless you're using really low-powered CPUs), both in per-core terms and in the cost of other hardware (eg motherboards).

Generally speaking, one CPU will be the cheapest (both in terms of hardware AND software). Commodity hardware can be used for this. Once you go to a second socket you tend to need different chipsets, more expensive motherboards and often more expensive RAM (eg ECC fully buffered RAM), so you take a massive cost hit going from one CPU to two. It's one reason so many large sites (including Flickr, Google and others) use thousands of commodity servers (although Google's servers are somewhat customized to include things like a 9V battery, but the principle is the same).

Your edits don't really change much. "Performance" is a highly subjective concept. Performance at what? Bear in mind, though, that if your application isn't sufficiently multithreaded (or multiprocess) to take advantage of extra cores, adding more cores can actually decrease performance.
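As a minimal sketch of that point (array size and thread counts are illustrative; compile with -pthread): a trivially parallel sum split across worker threads should get faster as threads are added on a machine with idle cores, while a program stuck on one thread gains nothing from those cores at all.

    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Sum `data` using `nthreads` workers, each handling a contiguous slice.
    static long long parallel_sum(const std::vector<int>& data, unsigned nthreads) {
        std::vector<long long> partial(nthreads, 0);
        std::vector<std::thread> workers;
        size_t chunk = data.size() / nthreads;
        for (unsigned t = 0; t < nthreads; ++t) {
            size_t begin = t * chunk;
            size_t end = (t + 1 == nthreads) ? data.size() : begin + chunk;
            workers.emplace_back([&, t, begin, end] {
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0LL);
            });
        }
        for (auto& w : workers) w.join();
        return std::accumulate(partial.begin(), partial.end(), 0LL);
    }

    int main() {
        std::vector<int> data(100'000'000, 1);  // size is illustrative
        for (unsigned n : {1u, 2u, 4u, 8u}) {
            auto t0 = std::chrono::steady_clock::now();
            long long s = parallel_sum(data, n);
            auto t1 = std::chrono::steady_clock::now();
            std::printf("%u thread(s): sum=%lld in %.0f ms\n", n, s,
                        std::chrono::duration<double, std::milli>(t1 - t0).count());
        }
    }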

I/O-bound applications probably won't prefer one over the other. They are, after all, bound by I/O, not CPU.

For compute-bound applications, it depends on the nature of the computation. If you're doing lots of floating-point work, you may benefit far more from using a GPU to offload calculations (eg using Nvidia CUDA). You can get a huge performance benefit from this; take a look at the GPU client for Folding@Home for an example.
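For a flavour of what that offload pattern looks like, here's a minimal CUDA sketch of the classic SAXPY operation (y = a*x + y); the array size is illustrative and error checking is omitted. Each element gets its own GPU thread, which is exactly the kind of wide floating-point parallelism GPUs are built for.

    #include <cstdio>
    #include <vector>

    // y = a*x + y, with each GPU thread handling one element.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;  // 1M elements; size is illustrative
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

        float *dx, *dy;
        cudaMalloc(&dx, n * sizeof(float));
        cudaMalloc(&dy, n * sizeof(float));
        cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);

        cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("y[0] = %f (expect 5.0)\n", hy[0]);  // 3*1 + 2
        cudaFree(dx);
        cudaFree(dy);
    }

Note that the payoff only comes when there's enough arithmetic per element to amortize the host-to-device copies, which are themselves off-chip transfers.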

In short, your question doesn't lend itself to a specific answer because the subject is complicated and there's just not enough information. Technical architecture is something that has to be designed for the specific application.


Well, the point is that all other factors can't really be equal.

The main problem with multiple CPUs is the latency and bandwidth hit when the two CPU sockets have to intercommunicate, and this has to happen constantly to make sure their local caches aren't out of sync. That traffic incurs latency and can sometimes be the bottleneck in your code. (Not always, of course.)
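You can see a small-scale version of this coherence cost even between cores on one die with a false-sharing sketch (the iteration count is illustrative, and alignas(64) assumes the usual 64-byte x86 cache line): two threads hammering counters that share a cache line force that line to ping-pong between caches, while padding the counters onto separate lines makes the effect disappear.

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Two counters in the same cache line: every increment by one thread
    // invalidates the line in the other thread's cache (coherence traffic).
    struct Shared { std::atomic<long> a{0}, b{0}; };

    // The same counters forced onto separate 64-byte cache lines.
    struct Padded {
        alignas(64) std::atomic<long> a{0};
        alignas(64) std::atomic<long> b{0};
    };

    template <class T>
    static double run() {
        T c;
        auto work = [](std::atomic<long>& x) {
            for (int i = 0; i < 50'000'000; ++i)
                x.fetch_add(1, std::memory_order_relaxed);
        };
        auto t0 = std::chrono::steady_clock::now();
        std::thread t1(work, std::ref(c.a)), t2(work, std::ref(c.b));
        t1.join(); t2.join();
        auto t3 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t3 - t0).count();
    }

    int main() {
        std::printf("same cache line:      %.0f ms\n", run<Shared>());
        std::printf("separate cache lines: %.0f ms\n", run<Padded>());
    }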


More cores on fewer CPUs is definitely faster, as SPWorley writes. His answer is now close to three years old, but the trends are there and I believe it needs some clarification. First, some history.

In the early eighties, the 80286 became the first microprocessor where virtual memory was feasible. Not that it hadn't been tried before, but Intel integrated the management of virtual memory onto the chip (on-die) instead of using an off-die solution. This made their memory management much faster than their competitors' because all of it (especially the translation of virtual to physical addresses) was designed into, and part of, the processor's ordinary operation.

Remember those big clunky P2 & P3 processors from Intel, and the early Athlons & Durons from AMD, that stood on edge in a big plastic package? The reason for this was to fit a cache chip next to the processor chip, since the fabrication processes of the time made it unfeasible to fit the cache onto the processor die itself. Voilà: an off-die, on-processor solution. These cache chips would, due to timing limitations, run at a fraction (50% or so) of the CPU's clock frequency. As soon as the manufacturing processes caught up, caches were moved on-die and began to run at the internal clock frequency.

A few years ago AMD moved the RAM memory controller from the Northbridge (off-die) onto the processor (on-die). Why? Because it makes memory operations more efficient (faster) by halving the external addressing wiring and eliminating the trip through the Northbridge (CPU-wiring-Northbridge-wiring-RAM became CPU-wiring-RAM). The change also made it possible to have several independent memory controllers, each with its own set of RAM, operating simultaneously on the same die, which increases the memory bandwidth of the processor.
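That bandwidth is easy to put a number on with a STREAM-style triad loop; a minimal single-threaded sketch (the array size is illustrative, and the 24-bytes-per-iteration accounting ignores write-allocate traffic):

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        // STREAM-style triad: a[i] = b[i] + s*c[i]. The arrays just need to
        // be much larger than the last-level cache; 64 MiB each here.
        const size_t n = 8 << 20;
        std::vector<double> a(n), b(n, 1.0), c(n, 2.0);
        const double s = 3.0;

        auto t0 = std::chrono::steady_clock::now();
        for (size_t i = 0; i < n; ++i) a[i] = b[i] + s * c[i];
        auto t1 = std::chrono::steady_clock::now();

        // Each iteration reads b[i] and c[i] and writes a[i]: 24 bytes moved.
        double secs = std::chrono::duration<double>(t1 - t0).count();
        std::printf("triad bandwidth: %.2f GB/s\n", 24.0 * n / secs / 1e9);
    }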

To get back to the clarification: we see a long-term trend toward moving performance-critical functionality from the motherboard onto the processor die. In addition to those mentioned, we have seen the integration of multiple cores onto the same die, and off-die L2 / on-die L1 caches became off-die L3 / on-die L1 and L2 caches, which are now all on-die L1, L2 and L3 caches. The caches have become larger and larger, to the extent that they take up more space than the cores themselves.

So, to sum up: any time you need to go off-die, things slow down dramatically. The answer: stay on-die as much as possible and streamline the design of anything that must go off-die.