# Waiting in a task_arena

For more details on waiting for work in a task arena, see
[the corresponding RFC proposal](../../proposed/task_arena_waiting/readme.md).
This document covers the parts that have been implemented in oneTBB.

## Motivation

Task arenas in oneTBB are the places for threads to share and execute tasks.
A `task_arena` instance represents a configurable execution context for parallel work.

There are two primary ways to submit work to an arena: the `execute` and `enqueue` functions.
Both take a callable object and run it in the context of the arena. The callable object
might start more parallel work in the arena by invoking a oneTBB algorithm, running a flow graph,
or submitting work into a task group.
`execute` is a blocking call: the calling thread does not return until the callable object
completes. `enqueue` is a "fire-and-forget" call: the calling thread submits the callable
object as a task and returns immediately, providing no way to synchronize with the completion
of the task.

Therefore, there was no convenient way to submit work for asynchronous execution **and** later wait
for completion of that work.

### Earlier solution: combining with a task group

In oneTBB, asynchronous execution is supported by `task_group` and the flow graph API; both allow
submitting a job and waiting for its completion later.
Notably, both require calling `wait`/`wait_for_all` to ensure that
the work will be done. `task_arena::enqueue`, on the other hand, being "fire-and-forget", enforces
the availability of another thread in the arena to execute the task (so-called *mandatory concurrency*).

So, a reasonable solution for the described use cases is to combine a `task_arena` with a `task_group`.
However, it was notoriously non-trivial to do right. For example, the following "naive" attempt is subtly
incorrect:
```cpp
tbb::task_arena ta{/*args*/};
tbb::task_group tg;
ta.enqueue([&tg]{ tg.run([]{ foo(); }); });
bar();
ta.execute([&tg]{ tg.wait(); });
```
The problem is that `enqueue` submits a task that calls `tg.run` to add `[]{ foo(); }` to the task group,
but it is unknown whether that task has actually been executed prior to `tg.wait`. Simply put,
the task group might still be empty, in which case `tg.wait` exits prematurely.

To avoid that, `execute` can be used instead of `enqueue`, but then the mentioned
thread availability guarantee is lost. The approach with `execute` is shown in the
[oneTBB Developer Guide](https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Guiding_Task_Scheduler_Execution.html)
as an example of splitting work across several NUMA domains. The example uses the fork-join
synchronization pattern to ensure that the work is complete in all the arenas.
It also illustrates that the problem stated in this proposal is relevant.

A better way of using these classes together, however, is the following:
```cpp
tbb::task_arena ta{/*args*/};
tbb::task_group tg;
ta.enqueue(tg.defer([]{ foo(); }));
bar();
ta.execute([&tg]{ tg.wait(); });
```
In this case, the task group "registers" a deferred task to run `foo()`, which is then enqueued
into the task arena. The task is added by the calling thread, so we can be sure that `tg.wait` will not
return until the task completes.

## Implemented improvements

To address the extra complexity and verbosity of combining `task_arena` and `task_group`, the `enqueue`
method of `task_arena` is overloaded to take a `task_group` as the second argument, and a new method
is added to wait for a task group:
```cpp
ta.enqueue([]{ foo(); }, tg); // corresponds to: ta.enqueue(tg.defer([]{ foo(); }));
ta.wait_for(tg);              // corresponds to: ta.execute([&tg]{ tg.wait(); });
```

This API has been implemented since oneTBB 2022.3.
See [Improve interoperability with task groups](task_group_interop.md) for more details.

The example code to split work across NUMA-bound task arenas can now look like this (assuming also
a special function that creates and initializes a vector of arenas):
```cpp
std::vector<tbb::task_arena> numa_arenas =
    initialize_constrained_arenas(/*some arguments*/);
std::vector<tbb::task_group> task_groups(numa_arenas.size());

for (unsigned j = 0; j < numa_arenas.size(); j++) {
    numa_arenas[j].enqueue( []{/*some parallel stuff*/}, task_groups[j] );
}

for (unsigned j = 0; j < numa_arenas.size(); j++) {
    numa_arenas[j].wait_for( task_groups[j] );
}
```