Multicore + Hyperthreading - how are threads distributed?


Linux has quite a sophisticated thread scheduler which is hyperthreading (HT) aware. Some of its strategies include:

Passive load balancing: If a physical CPU is running more than one task, the scheduler will attempt to run any new tasks on a second physical processor.

Active load balancing: If there are 3 tasks, 2 on one physical CPU and 1 on the other, then when the second physical processor goes idle the scheduler will attempt to migrate one of the tasks to it.

It does this while attempting to preserve thread affinity, because when a thread migrates to another physical processor it has to refill all levels of cache from main memory, which stalls the task.

So to answer your question (on Linux at least): given 2 threads on a dual-core hyperthreaded machine, each thread will run on its own physical core.
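If you want to check which logical CPUs are hyperthread siblings of the same physical core on a particular Linux box, the kernel's sysfs topology files can be read directly. What follows is a minimal Python sketch and not something from the scheduler itself. The helper names `parse_cpu_list` and `sibling_groups` are made up for illustration, and it assumes the kernel exposes `thread_siblings_list` under `/sys/devices/system/cpu`.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,4" or "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return one tuple of logical CPU numbers per physical core.

    Returns an empty list if the topology files are not present.
    """
    groups = set()
    pattern = sysfs_root + "/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("physical core -> logical CPUs", group)
```

On a hyperthreaded machine each group should contain two logical CPUs. With 2 busy threads you would then expect them to land in different groups.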


A sane OS will try to schedule computationally intensive tasks on their own cores, but problems arise when you start context switching them. Modern OSes still tend to schedule a task on whichever core is idle at scheduling time, which can result in the processes of a parallel application getting moved from core to core fairly liberally. For parallel apps you do not want this, because the process loses whatever data it had in the caches of its previous core. People use processor affinity to control this, but on Linux the semantics of sched_setaffinity() can vary a lot between distros, kernels, and vendors.
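As a rough illustration of what pinning looks like on Linux, here is a small sketch using Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which wrap the corresponding system calls. `pin_to_cpu` is a hypothetical helper name, and these functions only exist on platforms that support them (Linux, in practice).

```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller to a single logical CPU and return the previous CPU set.

    Passing 0 as the pid means "the caller". Raises ValueError if the
    requested CPU is not in the set the caller is currently allowed to use.
    """
    previous = os.sched_getaffinity(0)
    if cpu not in previous:
        raise ValueError("CPU %d is not in the allowed set %s" % (cpu, sorted(previous)))
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = min(allowed)
    old = pin_to_cpu(target)
    print("pinned to", os.sched_getaffinity(0), "was", sorted(old))
    # Restore the original mask so later work is not confined to one CPU.
    os.sched_setaffinity(0, old)
```

Once pinned, the scheduler will no longer migrate the task, so whatever it has in that core's caches stays warm. The cost is that the task cannot move even if that core gets busy with something else.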

If you're on Linux, you can portably control processor affinity with the Portable Linux Processor Affinity Library (PLPA). This is what OpenMPI uses internally to make sure processes get scheduled to their own cores in multicore and multisocket systems; they've just spun off the module as a standalone project. OpenMPI is used at Los Alamos among a number of other places, so this is well-tested code. I'm not sure what the equivalent is under Windows.


I have been looking for some answers on thread scheduling on Windows, and have some empirical information that I'll post here for anyone who may stumble across this post in the future.

I wrote a simple C# program that launches two threads. On my quad core Windows 7 box, I saw some surprising results.

When I did not force affinity, Windows spread the workload of the two threads across all four cores. There are two lines of code that are commented out: one that binds a thread to a CPU, and one that suggests an ideal CPU. The suggestion seemed to have no effect, but setting thread affinity did cause Windows to run each thread on its own core.

To see the results best, compile this code using the freely available compiler csc.exe that comes with the .NET Framework 4.0 client, and run it on a machine with multiple cores. With the processor affinity line commented out, Task Manager showed the threads spread across all four cores, each running at about 50%. With affinity set, the two threads maxed out two cores at 100%, with the other two cores idling (which is what I expected to see before I ran this test).

EDIT: I initially found some differences in performance between these two configurations. However, I haven't been able to reproduce them, so I edited this post to reflect that. I still found the thread affinity behavior interesting, since it wasn't what I expected.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Runtime.InteropServices;
    using System.Threading.Tasks;

    class Program
    {
        [DllImport("kernel32")]
        static extern int GetCurrentThreadId();

        static void Main(string[] args)
        {
            // Start timing before the tasks are launched so their startup is included.
            Stopwatch time = Stopwatch.StartNew();
            Task task1 = Task.Factory.StartNew(() => ThreadFunc(1));
            Task task2 = Task.Factory.StartNew(() => ThreadFunc(2));
            Task.WaitAll(task1, task2);
            Console.WriteLine(time.Elapsed);
        }

        static void ThreadFunc(int cpu)
        {
            // Find the ProcessThread object for the OS thread running this task.
            int cur = GetCurrentThreadId();
            var me = Process.GetCurrentProcess().Threads.Cast<ProcessThread>().Single(t => t.Id == cur);

            // ProcessorAffinity is a bitmask: 1 selects logical CPU 0, 2 selects logical CPU 1.
            //me.ProcessorAffinity = (IntPtr)cpu;     // using this line of code binds a thread to each core
            //me.IdealProcessor = cpu;                // seems to have no effect

            // Do some CPU / memory bound work.
            List<int> ls = new List<int>();
            ls.Add(10);
            for (int j = 1; j != 30000; ++j)
            {
                ls.Add((int)ls.Average());
            }
        }
    }
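One detail worth spelling out about the affinity line in the program above: `ProcessorAffinity` takes a bitmask rather than a CPU index, so the values 1 and 2 passed to `ThreadFunc` happen to select logical CPUs 0 and 1, while a value of 3 would allow both. The arithmetic is language independent. Here is a small sketch in Python, where `affinity_mask` and `cpus_in_mask` are hypothetical helper names used only for illustration.

```python
def affinity_mask(cpus):
    """Build an affinity bitmask with one bit set per zero-based logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Return the zero-based logical CPU indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    for value in (1, 2, 3):
        print("mask", value, "-> logical CPUs", cpus_in_mask(value))
```

So to extend the test to a third thread on its own core, the mask to pass would be 4 (that is, `affinity_mask([2])`), not 3.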