What are the common causes for high CPU usage?


Personally I'd be pretty annoyed if my threads had work to do and there were idle cores on my machine because the OS wasn't giving those threads CPU time. So I don't really see that there's a problem here [Edit: turns out your busy looping is a problem, but in principle there's nothing wrong with high CPU usage].

The OS/scheduler pretty much doesn't predict the amount of work a thread will do. A thread is (to over-simplify) in one of three states:

  1. blocked waiting for something (sleep, a mutex, I/O, etc)
  2. runnable, but not currently running because other things are
  3. running.

The scheduler will select as many things to run as it has cores (or hyperthreads, whatever), and run each one either until it blocks or until an arbitrary period of time called a "timeslice" expires. Then it will schedule something else if it can.

So, if a thread spends most of its time in computation rather than blocking, and if there's a core free, then it will occupy a lot of CPU time.

There's a lot of detail in how the scheduler chooses what to run, based on things like priority. But the basic idea is that a thread with a lot to do doesn't need to be predicted as compute-heavy; it will simply be runnable whenever something needs scheduling, and hence will tend to get scheduled.

For your example loop, your code doesn't actually do anything, so you'd need to check how it has been optimized before judging whether 5-7% CPU makes sense. Ideally, on a two-core machine a processing-heavy thread should occupy 50% CPU. On a 4 core machine, 25%. So unless you have at least 16 cores then your result is at first sight anomalous (and if you had 16 cores, then one thread occupying 35% would be even more anomalous!). In a standard desktop OS most cores are idle most of the time, so the higher the proportion of CPU that your actual programs occupy when they run, the better.
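The arithmetic behind those percentages can be written down as a rough sketch. The helper names below are made up for illustration, and the calculation assumes the OS reports CPU usage as a fraction of all logical cores, which is how typical task managers display it:

```cpp
#include <thread>

// Percentage of total machine CPU that one fully busy thread should show
// on a machine with `cores` logical cores (hypothetical helper).
double expected_share_percent(unsigned cores) {
    if (cores == 0) return 0.0;  // core count unknown
    return 100.0 / cores;
}

// Number of cores at which one busy thread would show `percent` of total CPU.
double cores_implied_by(double percent) {
    return 100.0 / percent;
}

// Same calculation for the machine running the code.
// std::thread::hardware_concurrency() may return 0 if the count is unknown.
double expected_share_on_this_machine() {
    return expected_share_percent(std::thread::hardware_concurrency());
}
```

With these, `expected_share_percent(2)` gives 50 and `expected_share_percent(16)` gives 6.25, which is why a reading of 5-7% only looks natural for a single busy thread on a machine with roughly 16 cores.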

On my machine I frequently hit one core's worth of CPU use when I run code that is mostly parsing text.

If exactly one thread enqueues items to the queue, is it safe if exactly one thread dequeues items from it?

No, that is not safe for std::queue with a standard container. std::queue is a thin wrapper on top of a sequence container (deque or list; vector cannot be used because it has no pop_front), and it doesn't add any thread-safety. The thread that adds items and the thread that removes items modify some data in common, for example the size field of the underlying container. You need either some synchronization, or else a safe lock-free queue structure that relies on atomic access to the common data. std::queue has neither.
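As a minimal sketch of the "some synchronization" option, assuming C++11: the wrapper below guards a std::queue with a mutex and uses a condition variable so the consumer blocks instead of spinning. The names SynchronizedQueue and produce_and_consume are hypothetical, and this is one possible design rather than the only correct one.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical wrapper: every access to the std::queue happens under one mutex.
template <typename T>
class SynchronizedQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake a consumer blocked in pop()
    }

    // Blocks (without spinning) until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> items_;
};

// One producer, one consumer: returns the sum of everything consumed.
long produce_and_consume(int count) {
    SynchronizedQueue<int> queue;
    long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) queue.push(i);
    });
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) sum += queue.pop();
    });
    producer.join();
    consumer.join();
    return sum;  // safe to read: both threads have been joined
}
```

Calling `produce_and_consume(1000)` pushes 1 through 1000 from one thread and pops them from another. The consumer sleeps inside `pop()` whenever the queue is empty, so it uses no CPU while waiting.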


Edit: OK, since you are using a busy spin to block on the queue, this is most likely the cause of the high CPU usage. The OS is under the impression that your threads are doing useful work when they actually are not, so they get full CPU time. There was an interesting discussion here: Which one is better for performance to check another threads boolean in java

I advise you to either switch to events or other blocking mechanisms or use some synchronized queue instead and see how it goes.

Also, that reasoning about the queue being thread-safe "because only two threads are using it" is very dangerous.

Assuming the queue is implemented as a linked list, imagine what can happen if it has only one or two elements remaining. Since you have no way of controlling the relative speeds of the producer and the consumer, this may well be the case and so you're in big trouble.


Before you can start thinking about how to optimize your threads to consume less CPU, you need to have an idea of where all that CPU time is spent. One way to obtain this information is by using a CPU profiler. If you don't have one, then give Very Sleepy a try. It's easy to use, and free.

The CPU profiler will monitor your running application and record where time is spent. As a result it will give you a list of functions sorted by how much CPU they used during the sampled period, how many times they were called, etc. Now you need to look at the profiling results, starting from the most CPU-intensive functions, and see what you can change in those to reduce the CPU usage.

The important thing is that once you have profiler results you have actual data that tells you what parts of your application you can optimize to obtain the biggest return.

Now let's consider the kinds of things you can find that are consuming a lot of CPU.

  • A worker thread is typically implemented as a loop. At the top of the loop a check is made to decide if there is work to do and any available work is executed. A new iteration of the loop begins the cycle again.

    You may find that with a setup like this most of the CPU time allocated to this thread is spent looping and checking, and very little is spent actually doing work. This is the so-called busy-wait problem. To partially address it you can add a sleep between loop iterations, but this isn't the best solution. The ideal way to address the problem is to put the thread to sleep when there is no work to do, and have whichever thread generates work for the sleeping thread send a signal to awaken it. This practically eliminates the looping overhead: the thread will only use CPU when there is work to do. I typically implement this mechanism with semaphores, but on Windows you can also use an Event object. Here is a sketch of an implementation:

    class MyThread {
    private:
        void thread_function() {
            while (!exit()) {
                if (there_is_work_to_do())
                    do_work();
                go_to_sleep();
            }
        }

        // this is called by the thread function when it
        // doesn't have any more work to do
        void go_to_sleep() {
            sem.wait();
        }

    public:
        // this is called by other threads after they add work to
        // the thread's queue
        void wake_up() {
            sem.signal();
        }
    };

    Note that in the above solution the thread function always tries to go to sleep after executing one task. If the thread's queue has more work items, then the wait on the semaphore will return immediately, since each time an item was added to the queue the originator must have called the wake_up() function.

  • The other thing you may see in the profiler output is that most of the CPU is spent in functions executed by the worker thread while it is doing work. This is actually not a bad thing: if most of the time is spent working, then the thread had work to do and there was CPU time available to do that work, so in principle there is nothing wrong here.

    But still, you may not be happy that your application uses so much CPU, so then you need to look at ways to optimize your code so that it does the work more efficiently.

    For example, you may find that some little auxiliary function was called millions of times, so while a single run of the function is quick, if you multiply that by a few million it becomes a bottleneck for the thread. At this point you should look at ways to reduce the CPU usage in this function, either by optimizing its code, or by optimizing the caller(s) to call the function fewer times.

    So the strategy here is to start from the most expensive function according to the profiling report and try to make a small optimization. Then you rerun the profiler to see how things changed. You may find that a small change to the most CPU intensive function moves it down to 2nd or 3rd place, and as a result the overall CPU usage was reduced. After you congratulate yourself for the improvement, you repeat the exercise with the new top function. You can continue this process until you are satisfied that your application is as efficient as it can be.
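Going back to the busy-wait bullet, here is one way the semaphore-based wake-up could be fleshed out into something compilable, assuming only C++11. Semaphore, SummingWorker, add_work and finish are hypothetical names, the "work" is just adding integers, and unlike the original sketch this worker waits on the semaphore before checking the queue. With C++20, std::counting_semaphore could replace the hand-rolled class.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Minimal counting semaphore built from a mutex and a condition variable.
class Semaphore {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    unsigned count_ = 0;
};

// Hypothetical worker whose "work" is summing the integers it is given.
class SummingWorker {
public:
    SummingWorker() : thread_([this] { thread_function(); }) {}
    ~SummingWorker() { finish(); }

    // Called by other threads: queue the work, then wake the worker.
    void add_work(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            work_.push(value);
        }
        sem_.signal();
    }

    // Ask the worker to exit once the queue is drained, then return the sum.
    long finish() {
        if (thread_.joinable()) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                exit_requested_ = true;
            }
            sem_.signal();
            thread_.join();
        }
        return total_;
    }

private:
    void thread_function() {
        for (;;) {
            sem_.wait();  // sleeps here; uses no CPU while idle
            int value;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (work_.empty()) {
                    if (exit_requested_) return;
                    continue;
                }
                value = work_.front();
                work_.pop();
            }
            total_ += value;  // the "work"
        }
    }

    Semaphore sem_;
    std::mutex mutex_;
    std::queue<int> work_;
    bool exit_requested_ = false;
    long total_ = 0;
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

A caller would create a `SummingWorker`, call `add_work()` for each item, and call `finish()` to collect the result. Between items the thread is parked inside `sem_.wait()`, so there is no polling loop for the profiler to find.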
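To make the "call the function fewer times" idea concrete, here is a small made-up illustration. The functions text_length, count_spaces_slow and count_spaces_fast are hypothetical, and the global counter merely stands in for the call count a profiler would report:

```cpp
#include <cstddef>

// Stand-in for the profiler's call count of the helper below.
static std::size_t g_length_calls = 0;

// Hypothetical "little auxiliary function": cheap once, costly in a hot loop.
std::size_t text_length(const char* text) {
    ++g_length_calls;
    std::size_t n = 0;
    while (text[n] != '\0') ++n;
    return n;
}

// Before: the helper is re-evaluated on every iteration of the loop.
std::size_t count_spaces_slow(const char* text) {
    std::size_t spaces = 0;
    for (std::size_t i = 0; i < text_length(text); ++i)
        if (text[i] == ' ') ++spaces;
    return spaces;
}

// After: the caller calls the helper once and reuses the result.
std::size_t count_spaces_fast(const char* text) {
    std::size_t spaces = 0;
    const std::size_t length = text_length(text);
    for (std::size_t i = 0; i < length; ++i)
        if (text[i] == ' ') ++spaces;
    return spaces;
}
```

Both versions return the same answer, but the first calls the helper once per loop-condition check while the second calls it exactly once per string. In a real application the profiler's call counts are what would point you at a pattern like this.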

Good luck.