
SwitchToThread vs Sleep(1)


There are two differences. The first is mentioned in the MSDN docs for SwitchToThread:

The yield of execution is limited to the processor of the calling thread. The operating system will not switch execution to another processor, even if that processor is idle or is running a thread of lower priority.

Sleep(0) will allow threads on other processors to run, as well.

SwitchToThread also yields only to a single thread scheduling context. Sleep, on the other hand, has multiple conditions for which it waits. The docs for SleepEx spell this out in detail:

* An I/O completion callback function is called
* An asynchronous procedure call (APC) is queued to the thread
* The time-out interval elapses

This will yield to multiple threads.

In general, Sleep(0) will be much more likely to yield a timeslice, and will ALWAYS yield to the OS, even if there are no other threads waiting. This is why adding a Sleep(0) in a loop will take the processor usage from 100% (per core) to near 0% in many cases. SwitchToThread will not, unless another thread is waiting for a time slice.


SwitchToThread() is a "smarter" version of Sleep(0). It is not well documented, but in my understanding, it works the following way:

  1. when there are other threads in the ready state (i.e. there are more threads wanting to run than there are logical processors available) and these threads are of the same or higher priority than the thread that calls SwitchToThread(), it behaves the same way as Sleep(0), i.e. it cedes the logical processor to one of these threads, at the expensive cost of a context switch;
  2. when there are threads in the ready state but only of lower priority, it just exits, i.e. the thread that has called SwitchToThread() continues execution without the expense of a context switch -- this is contrary to how Sleep(0) behaves, which always cedes control even to the lowest-priority threads;
  3. when there are no threads in the ready state, SwitchToThread() also just exits like Sleep(0), so if you do this in a loop, you just get 100% load on the current logical processor, i.e. you burn power.
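Point 3 can be observed through SwitchToThread's documented return value: nonzero if the call caused the OS to switch to another thread, zero if no other thread was ready to run on the current processor. The sketch below, assuming a Windows build, counts how many calls in a burst actually yielded. `count_reported_switches` is a hypothetical helper name, and on non-Windows builds it falls back to POSIX sched_yield(), which gives no such report.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#endif

/* Hypothetical helper: calls the platform's "yield" primitive n times and
   returns how many calls reported a switch to another thread.
   Returns -1 where the platform does not report this. */
static long count_reported_switches(int n)
{
#ifdef _WIN32
    long switched = 0;
    for (int i = 0; i < n; ++i) {
        /* Nonzero: another ready thread got the processor.
           Zero: nothing was ready here, so the call returned at once. */
        if (SwitchToThread())
            ++switched;
    }
    return switched;
#else
    for (int i = 0; i < n; ++i)
        sched_yield();  /* POSIX: returns 0 on success, no switch report */
    return -1;
#endif
}
```

On an otherwise idle machine the count will typically stay at or near zero, meaning the loop is simply spinning on the current logical processor.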

Sleep(1) is the same as Sleep(0) but with a 1 millisecond delay afterwards. This 1 millisecond delay frees the logical processor and doesn't burn any power. SwitchToThread, by contrast, never experiences any delay.

So it's better to compare SwitchToThread with Sleep(0), not with Sleep(1), because Sleep(1) is the same as Sleep(0) plus a delay of 1 millisecond.
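To see the Sleep(0) versus Sleep(1) difference directly, one can time many calls with a high-resolution clock. A rough sketch, assuming QueryPerformanceCounter on Windows; on POSIX builds it substitutes sched_yield() for Sleep(0) and nanosleep() for Sleep(1). The helper names are hypothetical.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <time.h>
#endif

/* Current time in milliseconds from a monotonic, high-resolution clock. */
static double now_ms(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t);
    return (double)t.QuadPart * 1000.0 / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
#endif
}

/* Sleep(ms) on Windows; on POSIX, sched_yield() stands in for Sleep(0)
   and nanosleep() for Sleep(ms) with ms > 0. */
static void sleep_ms(unsigned ms)
{
#ifdef _WIN32
    Sleep(ms);
#else
    if (ms == 0) {
        sched_yield();
    } else {
        struct timespec req = { .tv_sec = ms / 1000,
                                .tv_nsec = (long)(ms % 1000) * 1000000L };
        while (nanosleep(&req, &req) == -1 && errno == EINTR) { }  /* resume if interrupted */
    }
#endif
}

/* Average wall-clock duration of one sleep_ms(ms) call, in milliseconds. */
static double measure_sleep_ms(unsigned ms, int iterations)
{
    double start = now_ms();
    for (int i = 0; i < iterations; ++i)
        sleep_ms(ms);
    return (now_ms() - start) / iterations;
}
```

Typically Sleep(0) averages a few microseconds or less when nothing else is ready, while Sleep(1) averages at least about a millisecond; on Windows the Sleep(1) figure is often noticeably larger, because sleeps are tied to the system timer resolution.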

I've borrowed some ideas on this issue from the "Intel 64 and IA-32 Architectures Optimization Reference Manual" and the "Intel 64 and IA-32 Architectures Software Developer’s Manual", which favor executing the PAUSE CPU instruction (also available as the _mm_pause() intrinsic) over calling SwitchToThread() or Sleep(0) if your wait is very short. Please note that SwitchToThread() and Sleep(0) return almost immediately, while Sleep(1) lasts at least one millisecond.
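The PAUSE instruction recommended by the Intel manuals is exposed as the _mm_pause() intrinsic. A minimal bounded spin sketch; the flag, the spin limit and the name `spin_wait_pause` are assumptions for illustration, and on non-x86 targets the pause hint is simply omitted.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the x86 PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: no pause hint used on this target */
#endif

/* Hypothetical helper: spins on *flag for at most max_spins iterations,
   issuing PAUSE between reads. Returns 1 if the flag was seen non-zero,
   0 if it gave up. */
static int spin_wait_pause(atomic_int *flag, int max_spins)
{
    for (int i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return 1;
        cpu_relax();
    }
    return atomic_load_explicit(flag, memory_order_acquire) != 0;
}
```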

The following should also be taken into consideration:

  • Each call to Sleep() or SwitchToThread() that actually gives up the processor incurs the expensive cost of a context switch, which can be 10000+ cycles.
  • It also suffers the cost of ring 3 to ring 0 transitions, which can be 1000+ cycles.
  • SwitchToThread() or Sleep(0) may be of no use if no threads are in the ready state, but Sleep(1) waits for at least one millisecond regardless of whether there are other threads in the 'ready' state or not.

If your wait loop tends to be very short, please consider executing a few PAUSE CPU instructions first. By slowing down the “spin-wait” with PAUSE instructions before the SwitchToThread() or Sleep() call, multithreaded software gains:

  • Performance, by letting waiting tasks acquire resources more easily from a busy wait.
  • Power savings, by using fewer parts of the pipeline while spinning.
  • Elimination of the great majority of unnecessarily executed instructions caused by the overhead of a SwitchToThread(), Sleep(0) or Sleep(1) call.
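Putting these pieces together, one common shape for a short wait is: spin with PAUSE first, then yield the processor, and only then fall back to Sleep(1). The sketch below is one such backoff, not a method taken from the Intel manuals; the limits SPIN_LIMIT and YIELD_LIMIT and the name `backoff_wait` are hypothetical, and POSIX builds substitute sched_yield() and nanosleep() for SwitchToThread() and Sleep(1).

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <time.h>
#endif
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define cpu_relax() _mm_pause()
#else
#define cpu_relax() ((void)0)
#endif

/* Illustrative tuning constants (assumptions, not recommended values). */
enum { SPIN_LIMIT = 1000, YIELD_LIMIT = 10 };

static void yield_thread(void)
{
#ifdef _WIN32
    SwitchToThread();
#else
    sched_yield();
#endif
}

static void sleep_1ms(void)
{
#ifdef _WIN32
    Sleep(1);
#else
    struct timespec req = { .tv_sec = 0, .tv_nsec = 1000000L };
    while (nanosleep(&req, &req) == -1 && errno == EINTR) { }
#endif
}

/* Waits for *flag to become non-zero: PAUSE-spin, then yield, then Sleep(1).
   Gives up after max_sleeps sleeps. Returns 1 if the flag was seen set. */
static int backoff_wait(atomic_int *flag, unsigned max_sleeps)
{
    for (int i = 0; i < SPIN_LIMIT; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire)) return 1;
        cpu_relax();
    }
    for (int i = 0; i < YIELD_LIMIT; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire)) return 1;
        yield_thread();
    }
    for (unsigned i = 0; i < max_sleeps; ++i) {
        if (atomic_load_explicit(flag, memory_order_acquire)) return 1;
        sleep_1ms();
    }
    return atomic_load_explicit(flag, memory_order_acquire) != 0;
}
```

As the author argues next, a loop like this only makes sense when the wait is expected to be short; longer waits belong on an OS synchronization object.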

However, if you are going to call Sleep(1), which lasts at least one millisecond (very long in terms of CPU cycles), then you expect your wait to be very long, so the PAUSE instructions are futile in this case.

When the wait loop is expected to last long, it is preferable to yield to the operating system by calling one of the OS synchronization API functions, such as WaitForSingleObject on Windows, rather than SwitchToThread(), Sleep(0) or Sleep(1), which are very wasteful on long waits. Moreover, Sleep(1) is slow to react: OS synchronization functions like WaitForSingleObject or EnterCriticalSection react much faster and are more resource-friendly.

My conclusion: it is better not to use Sleep(0), Sleep(1) or SwitchToThread(). Avoid “spin-wait” loops at all costs. Use high-level synchronization functions like WaitForMultipleObjects(), SetEvent(), and so on; they are the best in terms of performance, efficiency and power saving. Although they also incur expensive context switches and ring 3 to ring 0 transitions, these costs are infrequent and more than reasonable compared to what you would spend in “spin-wait” loops with Sleep() or SwitchToThread().
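A minimal sketch of the blocking pattern recommended here, assuming a worker that waits for either a "work ready" or a "quit" signal. On Windows it uses CreateEventW, SetEvent and WaitForMultipleObjects; on POSIX builds it substitutes a pthread mutex and condition variable that mimic the same auto-reset/manual-reset semantics. The `event_pair` type and the `events_*` helpers are hypothetical names for this illustration.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>
#endif

/* Hypothetical pair of events: "work ready" (auto-reset), "quit" (manual-reset). */
enum { EV_WORK = 0, EV_QUIT = 1, EV_COUNT = 2 };

typedef struct {
#ifdef _WIN32
    HANDLE ev[EV_COUNT];
#else
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int signaled[EV_COUNT];
#endif
} event_pair;

static int events_init(event_pair *p)
{
#ifdef _WIN32
    p->ev[EV_WORK] = CreateEventW(NULL, FALSE, FALSE, NULL); /* auto-reset */
    p->ev[EV_QUIT] = CreateEventW(NULL, TRUE, FALSE, NULL);  /* manual-reset */
    return p->ev[EV_WORK] != NULL && p->ev[EV_QUIT] != NULL;
#else
    p->signaled[EV_WORK] = p->signaled[EV_QUIT] = 0;
    return pthread_mutex_init(&p->mu, NULL) == 0 &&
           pthread_cond_init(&p->cv, NULL) == 0;
#endif
}

static void events_destroy(event_pair *p)
{
#ifdef _WIN32
    CloseHandle(p->ev[EV_WORK]);
    CloseHandle(p->ev[EV_QUIT]);
#else
    pthread_cond_destroy(&p->cv);
    pthread_mutex_destroy(&p->mu);
#endif
}

/* Like SetEvent: marks the event signaled and wakes any blocked waiter. */
static void events_set(event_pair *p, int which)
{
#ifdef _WIN32
    SetEvent(p->ev[which]);
#else
    pthread_mutex_lock(&p->mu);
    p->signaled[which] = 1;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->mu);
#endif
}

/* Like WaitForMultipleObjects(EV_COUNT, ..., FALSE, timeout_ms): the caller
   blocks without spinning until an event is set or the timeout expires.
   Returns the lowest signaled index, or -1 on timeout. */
static int events_wait(event_pair *p, unsigned timeout_ms)
{
#ifdef _WIN32
    DWORD r = WaitForMultipleObjects(EV_COUNT, p->ev, FALSE, timeout_ms);
    return r < WAIT_OBJECT_0 + EV_COUNT ? (int)(r - WAIT_OBJECT_0) : -1;
#else
    struct timespec deadline;  /* absolute time on CLOCK_REALTIME, the condvar default */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int result = -1, timed_out = 0;
    pthread_mutex_lock(&p->mu);
    for (;;) {
        if (p->signaled[EV_WORK]) {   /* auto-reset: consume the signal */
            p->signaled[EV_WORK] = 0;
            result = EV_WORK;
            break;
        }
        if (p->signaled[EV_QUIT]) {   /* manual-reset: leave it set */
            result = EV_QUIT;
            break;
        }
        if (timed_out)
            break;
        timed_out = pthread_cond_timedwait(&p->cv, &p->mu, &deadline) == ETIMEDOUT;
    }
    pthread_mutex_unlock(&p->mu);
    return result;
#endif
}
```

A worker thread would call events_wait in a loop, handle EV_WORK, and exit on EV_QUIT; between signals it stays blocked in the OS instead of burning cycles in a Sleep(0) or SwitchToThread() loop.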

On a processor supporting HT Technology, spin-wait loops can consume a significant portion of the execution bandwidth of the processor. One logical processor executing a spin-wait loop can severely impact the performance of the other logical processor. That's why sometimes disabling HT may improve performance.

Constantly polling a device, a file or another data source for state changes can cause the computer to consume more power, put stress on memory and the system bus, and cause unnecessary page faults (use the Task Manager in Windows to see which applications produce the most page faults while idle; these are the most inefficient applications, since they rely on "polling"). Minimize polling whenever possible and write applications in an event-driven way. This is the best practice that I highly recommend. Your application should literally sleep all the time, waiting for multiple events set up in advance. A good example of an event-driven application is Nginx under Linux.

Take polling for power source changes as an example. If an operating system provides notification services (even a WM_ message) for various device state changes, such as the power source switching from AC to battery, use these notification services instead of polling for device state changes. Such an approach reduces the overhead of polling the status of the power source, because the code gets notified asynchronously when the status changes.

Contrary to what some people wrote, Sleep(0) does not reduce CPU consumption to near zero. It releases execution to other threads that are in the 'ready' state, but if there are no such threads, it just wastes thousands of CPU cycles and keeps the current thread at 100% CPU, as other Stack Overflow members have also demonstrated. I have just re-checked this: a Sleep(0) loop consumes 100% CPU of the current thread on Windows 10 64-bit.
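One way to repeat this check is to compare the thread's CPU time with elapsed wall time while it loops on Sleep(0) and on Sleep(1). A sketch, assuming GetThreadTimes on Windows and CLOCK_THREAD_CPUTIME_ID on POSIX, where sched_yield() stands in for Sleep(0); the helper names are hypothetical.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <time.h>
#endif

/* Monotonic wall-clock time in milliseconds. */
static double wall_ms(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, t;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t);
    return (double)t.QuadPart * 1000.0 / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
#endif
}

/* CPU time (user + kernel) used so far by the calling thread, in ms. */
static double thread_cpu_ms(void)
{
#ifdef _WIN32
    FILETIME created, exited, kernel, user;
    ULARGE_INTEGER k, u;
    GetThreadTimes(GetCurrentThread(), &created, &exited, &kernel, &user);
    k.LowPart = kernel.dwLowDateTime;
    k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;
    u.HighPart = user.dwHighDateTime;
    return (double)(k.QuadPart + u.QuadPart) / 10000.0;  /* 100 ns units */
#else
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1e6;
#endif
}

/* Sleep(ms); on POSIX, sched_yield() stands in for Sleep(0). */
static void os_sleep(unsigned ms)
{
#ifdef _WIN32
    Sleep(ms);
#else
    if (ms == 0) {
        sched_yield();
    } else {
        struct timespec req = { .tv_sec = ms / 1000,
                                .tv_nsec = (long)(ms % 1000) * 1000000L };
        while (nanosleep(&req, &req) == -1 && errno == EINTR) { }
    }
#endif
}

/* Loops on os_sleep(ms) for about duration_ms of wall time and returns
   thread CPU time divided by wall time (1.0 means one fully busy core). */
static double cpu_fraction_of_sleep_loop(unsigned ms, double duration_ms)
{
    double w0 = wall_ms();
    double c0 = thread_cpu_ms();
    while (wall_ms() - w0 < duration_ms)
        os_sleep(ms);
    double c1 = thread_cpu_ms();
    double w1 = wall_ms();
    return (c1 - c0) / (w1 - w0);
}
```

On an otherwise idle core, cpu_fraction_of_sleep_loop(0, 1000.0) typically comes out close to 1.0 and cpu_fraction_of_sleep_loop(1, 1000.0) close to 0, in line with the measurement described above.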