Linux Kernel- task_h_load Linux Kernel- task_h_load linux linux

Linux Kernel- task_h_load


Load balancing is invoked if SMP is configured.Linux uses Completely Fair Scheduler(CFS) to schedule each task so that each one obtain “fair” share of the processor time.. CFS uses concept of red-black tree.

enter image description here

Scheduler records the current time when a task enters into the run queue. While the process waits for processor time, it's “wait” value gets incremented by an amount derived from the total number of tasks currently in the run queue and the process priority. When processor runs this task ,this "wait" value gets decremented. If this value drops below certain value then scheduler will preempt the task and other task come closer to processor to execute. CFS always try to keep situation ideal by keeping "wait" value zero.

There are two functions in linux load_balance and select_task_rq_fair() which perform the task of load balance.

In simple word CFS load balance mechanism offload the busy CPU to less busy or ideal one.

task_h_load is used to calculate weight of task.

What I don't understand is what/how the load is calculated in task_h_load. how it is calculated in the task_h_load function?

This weight factor depends on nice value of the process.

weighting factor = weight of process with nice value 0 / weight of current task;

where ‘weight’ is approximately equivalent to 1024 * (1.25)^(-nice)

For example : weight is 820 for nice value 1weight is 1277 for nice value -1

task_h_load
For more load-balance methods and basics refer kernel commenttask_h_load use update_cfs_rq_h_load to calculate hierarchical load of run queue for fair time scheduling and which use cfs_rq_load_avg return average load of runqueue load_avg.

Virtual run time is the weighted time a task has run on the CPU. CFS always try to keep this Red-black tree in balance.

enter image description here

<--smaller value -----------vruntime value--------larger value-->

Each runnable task is placed in self balancing Red Black tree based on vruntime. If task is ready to run (means task is not waiting for any resource), It is placed to tree. If task is waiting for some resource (i.e waiting for I/O )then it is removed. Tasks those have less processing time (means smaller vruntime ) are towards left side of tree and tasks having more processing time are right side of tree.

Left node has the smallest key value (For CFS ,it is task with higher priority). Self balancing red black tree need O(lgN) operation to navigate towards left node. Scheduler caches this value in rb_leftmost. By retrieving only this value ,scheduler determine which task to run next

This load balance is used for non real task only for real time task push-pull operations are used which is developed by Steven Rostedt and Gregory Haskins

One more thing about CFS is also helpful in Fair Group Scheduling .Consider below figure

enter image description here

move_task is simply try to move up imbalance weighted load (means after calculating load factor as shown above ) from busies to this_rq. It try to equal the load balance of both run queue so that both can spare "fair" processor time.

detach_task detach the task for the migration specified in env pointer of linux kernel.

detach_one_task tries to dequeue exactly one task from env->src_rq.

detach_tasks() tries to detach up to imbalance weighted load frombusiest_rq. And it returns number of detached tasks or else zero if fail to detach.

To attach this detached task to new rq(run queue) , attach_task, attach_one_task,attach_tasks is used according to situation.

New warning check lockdep_assert_held() is introduced in detach_tasks which was not present in move_tasks

On multi-processor it is not easy to move task so easily so cfs do domain specific load-balancing as below:enter image description here

For all this to understand I would like you to go through following reference

  1. Per-Entity-Load-Tracking metric for load balancing

  2. weight factor

  3. Open-suse

  4. Robert-love Chapter 3

  5. CPU-scheduling

I specially read all these documentation to answer your question I hope you don't mind to comment if something is missing.


CFS holds a tree of "scheduling entities". Each scheduling entity can have its own tree, and so on in a recursive way...(This is useful, for example, for grouping all the processes of a specific user into one scheduling entity; thus preventing a user that has many tasks from consuming more cpu time than users with fewer processes)

task_h_load - stands for "task hierarchical load"

Since a task can be nested in a few trees, then calculating its load is not so simple...

static unsigned long task_h_load(struct task_struct *p){    struct cfs_rq *cfs_rq = task_cfbs_rq(p);    update_cfs_rq_h_load(cfs_rq);    return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,                           cfs_rq_load_avg(cfs_rq) + 1);}

At the beginning cfs_rq points to the immediate tree at which p is found in. If we had only two nested trees then calculating the load of p would have been simple:

task_h_load = task_load_in_its_tree * ( load_of_immediate_tree / load_of_containing_tree);

(while immediate_tree refers to the tree that contains the task, and containing_tree refers to the tree that contains the tree that contains the task.)

But this is not the case. Our tree might be a nested tree inside a scheduling entity, which is itself just a leaf in another tree.

So, the first thing we do is to call update_cfs_rq_h_load(cfs_rq) which computes the hierarchical load factor for cfs_rq and all its ascendants (ancestors): this function goes up the tree hierarchy all the way to the root, and down from the root to our cfs_rq while computing the hierarchical load factor for each tree in the hierarchy.

The computation is done in a similar way:

cfs_rq_h_load = cfs_rq_load_in_its_tree * (load_of_immediate_tree / load_of_containing_tree)

So, finally we have the fractional load of cfs_rq and all we have to do is to calculate the h_load using the same formula.

task_h_load = task_load_in_its_tree * (load_of_immediate_tree / load_of_containing_tree)