
Ensuring task execution order in ThreadPool


Something like the following will allow serial and parallel tasks to be queued: serial tasks will be executed one after the other, and parallel tasks will be executed in any order, but in parallel. This gives you the ability to serialize tasks where necessary while still having parallel tasks, and it works as tasks are received, i.e. you do not need to know about the entire sequence up-front; execution order is maintained dynamically.

internal class TaskQueue
{
    private readonly object _syncObj = new object();
    private readonly Queue<QTask> _tasks = new Queue<QTask>();
    private int _runningTaskCount;

    public void Queue(bool isParallel, Action task)
    {
        lock (_syncObj)
        {
            _tasks.Enqueue(new QTask { IsParallel = isParallel, Task = task });
        }
        ProcessTaskQueue();
    }

    public int Count
    {
        get { lock (_syncObj) { return _tasks.Count; } }
    }

    private void ProcessTaskQueue()
    {
        lock (_syncObj)
        {
            if (_runningTaskCount != 0) return;

            // Start every parallel task at the head of the queue...
            while (_tasks.Count > 0 && _tasks.Peek().IsParallel)
            {
                QTask parallelTask = _tasks.Dequeue();
                QueueUserWorkItem(parallelTask);
            }

            // ...or, if none were parallel, start a single serial task,
            // which runs alone.
            if (_tasks.Count > 0 && _runningTaskCount == 0)
            {
                QTask serialTask = _tasks.Dequeue();
                QueueUserWorkItem(serialTask);
            }
        }
    }

    private void QueueUserWorkItem(QTask qTask)
    {
        Action completionTask = () =>
        {
            try
            {
                qTask.Task();
            }
            finally
            {
                // Always signal completion, even if the task throws;
                // otherwise the queue would stall forever.
                OnTaskCompleted();
            }
        };

        _runningTaskCount++;
        ThreadPool.QueueUserWorkItem(_ => completionTask());
    }

    private void OnTaskCompleted()
    {
        lock (_syncObj)
        {
            if (--_runningTaskCount == 0)
            {
                ProcessTaskQueue();
            }
        }
    }

    private class QTask
    {
        public Action Task { get; set; }
        public bool IsParallel { get; set; }
    }
}

Update

To handle task groups with serial and parallel task mixes, a GroupedTaskQueue can manage a TaskQueue for each group. Again, you do not need to know about the groups up-front; everything is managed dynamically as tasks are received.

internal class GroupedTaskQueue
{
    private readonly object _syncObj = new object();
    private readonly Dictionary<string, TaskQueue> _queues = new Dictionary<string, TaskQueue>();
    private readonly string _defaultGroup = Guid.NewGuid().ToString();

    public void Queue(bool isParallel, Action task)
    {
        Queue(_defaultGroup, isParallel, task);
    }

    public void Queue(string group, bool isParallel, Action task)
    {
        lock (_syncObj)
        {
            TaskQueue queue;
            if (!_queues.TryGetValue(group, out queue))
            {
                queue = new TaskQueue();
                _queues.Add(group, queue);
            }

            Action completionTask = () =>
            {
                try
                {
                    task();
                }
                finally
                {
                    OnTaskCompleted(group, queue);
                }
            };

            // Enqueue while still holding the lock, so the group's queue
            // cannot be removed between the lookup above and this call.
            queue.Queue(isParallel, completionTask);
        }
    }

    private void OnTaskCompleted(string group, TaskQueue queue)
    {
        lock (_syncObj)
        {
            // Drop the group's queue once it is empty; a new one will be
            // created if more tasks arrive for this group later.
            if (queue.Count == 0)
            {
                _queues.Remove(group);
            }
        }
    }
}


Thread pools are good for cases where the relative order of the tasks doesn't matter, provided they all get done. In particular, it must be OK for them all to be done in parallel.

If your tasks must be done in a specific order, then they are not suitable for parallelism, so a thread pool is not appropriate.

If you want to move these serial tasks off the main thread, then a single background thread with a task queue would be appropriate for those tasks. You can continue to use a thread pool for the remaining tasks which are suitable for parallelism.
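As a sketch of that arrangement, a dedicated worker thread draining a queue could look like this (SerialWorker is a made-up name for illustration; serial tasks go to it, everything else to the thread pool):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// One background thread drains a queue of serial tasks in FIFO order,
// while order-independent work still goes to the thread pool.
internal sealed class SerialWorker : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SerialWorker()
    {
        _thread = new Thread(() =>
        {
            // GetConsumingEnumerable blocks until a task arrives and
            // finishes once CompleteAdding has been called.
            foreach (Action task in _queue.GetConsumingEnumerable())
            {
                task();
            }
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    public void Queue(Action task)
    {
        _queue.Add(task); // executed strictly in the order added
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join(); // wait for the queue to drain
    }
}
```

Tasks submitted via Queue run strictly in submission order on the single thread; anything parallelizable can still be handed to ThreadPool.QueueUserWorkItem as usual.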

Yes, it means you have to decide where to submit the task depending on whether it is an in-order task or a "may be parallelized" task, but this is not a big deal.

If you have groups that must be serialized, but which can run in parallel with other tasks, then you have multiple choices:

  1. Create a single task for each group, which does the relevant group tasks in order, and post this task to the thread pool.
  2. Have each task in a group explicitly wait for the previous task in the group, and post them to the thread pool. This requires that your thread pool can handle the case where a thread is waiting for a not-yet-scheduled task without deadlocking.
  3. Have a dedicated thread for each group, and post group tasks on the appropriate message queue.
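Option 1 can be as simple as wrapping a group's ordered actions in one delegate and posting that single work item (the helper name and the blocking Run are illustrative):

```csharp
using System;
using System.Threading;

internal static class GroupSubmitExample
{
    // Runs one group's tasks in order on a single pool thread and blocks
    // until the group is complete. Other groups' work items may run in
    // parallel with this one.
    public static void RunGroup(Action[] groupTasks)
    {
        using (var done = new ManualResetEventSlim())
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                foreach (Action task in groupTasks)
                {
                    task(); // strictly in array order
                }
                done.Set();
            });
            done.Wait();
        }
    }
}
```

The drawback of this option is that one long group occupies a pool thread for the group's entire duration, even while most of its tasks are still pending.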


Basically, there are a number of pending tasks. Some of the tasks can only be performed when one or more other pending tasks have finished executing.

The pending tasks can be modeled in a dependency graph:

  • "task 1 -> task 2" means "task 2 can be executed only after task 1 is finished". The arrows point in the direction of execution order.
  • The indegree of a task (the number of tasks pointing to it) determines whether the task is ready for execution. If the indegree is 0, it can be executed.
  • If a task must wait for multiple tasks to finish, its indegree is greater than 1.
  • If a task no longer has to wait for other tasks to finish (its indegree is zero), it can be submitted to the thread pool of worker threads, or to the queue of tasks waiting to be picked up by a worker thread. You know the submitted task cannot cause deadlock, because it isn't waiting for anything. As an optimization, you can use a priority queue in which, for example, tasks that more tasks in the dependency graph depend on are executed first. This still cannot provoke deadlock, because every task submitted to the thread pool is ready to execute. It can provoke starvation, however.
  • When a task finishes execution, it is removed from the dependency graph, possibly reducing the indegree of other tasks, which can in turn be submitted to the pool of worker threads.

So there is (at least) one thread used to add/remove pending tasks, and there is a thread pool of working threads.

When a task is added to the dependency graph, you must check:

  • how the task is connected in the dependency graph: what tasks must it wait for to finish and what tasks must wait for it to finish? Draw connections from and to the new task accordingly.
  • once the connections are drawn: did the new connections cause any cycles in the dependency graph? If so, there is a deadlock situation.
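The bookkeeping described above can be sketched as follows (DependencyScheduler is a hypothetical name; here the graph is fixed once Run is called, and a cycle is detected indirectly, by tasks that never become runnable):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Indegree-based scheduling sketch: a task is submitted to the pool only
// once its indegree drops to zero, so a submitted task can never deadlock.
internal sealed class DependencyScheduler
{
    private readonly object _syncObj = new object();
    private readonly Dictionary<string, Action> _tasks = new Dictionary<string, Action>();
    private readonly Dictionary<string, List<string>> _dependents = new Dictionary<string, List<string>>();
    private readonly Dictionary<string, int> _indegree = new Dictionary<string, int>();
    private readonly CountdownEvent _remaining;

    public DependencyScheduler(int taskCount)
    {
        _remaining = new CountdownEvent(taskCount);
    }

    public void Add(string name, Action task, params string[] dependsOn)
    {
        lock (_syncObj)
        {
            _tasks[name] = task;
            _indegree[name] = dependsOn.Length;
            foreach (string dep in dependsOn)
            {
                List<string> list;
                if (!_dependents.TryGetValue(dep, out list))
                {
                    _dependents[dep] = list = new List<string>();
                }
                list.Add(name);
            }
        }
    }

    // Submits every ready task and waits for the whole graph. Returns
    // false if tasks remain un-runnable, which indicates a cycle.
    public bool Run(TimeSpan timeout)
    {
        lock (_syncObj)
        {
            foreach (string name in _indegree.Where(kv => kv.Value == 0).Select(kv => kv.Key).ToList())
            {
                Submit(name);
            }
        }
        return _remaining.Wait(timeout);
    }

    private void Submit(string name)
    {
        Action task = _tasks[name];
        ThreadPool.QueueUserWorkItem(_ =>
        {
            task();
            OnCompleted(name);
        });
    }

    private void OnCompleted(string name)
    {
        lock (_syncObj)
        {
            List<string> dependents;
            if (_dependents.TryGetValue(name, out dependents))
            {
                // Removing the finished task reduces the indegree of its
                // dependents; those that reach zero become runnable.
                foreach (string dependent in dependents)
                {
                    if (--_indegree[dependent] == 0)
                    {
                        Submit(dependent);
                    }
                }
            }
        }
        _remaining.Signal();
    }
}
```

A cycle shows up as Run timing out with tasks still pending, which corresponds to the deadlock situation mentioned above; a production version would instead reject a cycle-creating edge at Add time.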

Performance:

  • this pattern is slower than sequential execution if parallel execution is in fact rarely possible, because you need extra administration to do everything almost sequentially anyway.
  • this pattern is fast if many tasks can be performed simultaneously in practice.

Assumptions:

As you may have read between the lines, you must design the tasks so that they don't interfere with other tasks. Also, there must be a way to determine the priority of the tasks, and that priority should take into account the data each task handles. Two tasks may not alter the same object simultaneously; instead, one of the tasks should get priority over the other, or the operations performed on the object must be thread-safe.