Commit 215a17d

author
pavelkumbrasev
committed
Initial commit
Signed-off-by: pavelkumbrasev <[email protected]>
1 parent 2a7e0db commit 215a17d

2 files changed

+108
-0
lines changed
@@ -0,0 +1,108 @@
# Adding API for parallel block to task_arena to warm-up/retain/release worker threads

## Introduction

In oneTBB, there has never been an API that allows users to block worker threads within an arena.
This design choice was made to preserve the composability of the application.<br>
Since oneTBB is a dynamic runtime based on task stealing, threads migrate from one arena to
another as long as they have tasks to execute.<br>
Before PR#1352, workers moved to the thread pool to sleep once no arena had active
demand. PR#1352 introduced a busy-wait block time that keeps a thread spinning for an
`implementation-defined` duration when no arena has active demand. This change significantly
improved performance for applications running on systems with high thread counts.<br>
The main idea is that, after one parallel computation ends, another usually starts
shortly afterwards. The default block time is a heuristic that exploits this pattern,
and its duration covers most such cases.
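
The busy-wait block time described above can be pictured with a small stand-alone sketch. It does not reproduce oneTBB internals: `wait_for_demand`, `wait_result`, and the `block_time` parameter are names invented for illustration, and the real duration is implementation-defined.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical outcome of an idle worker waiting for new work.
enum class wait_result { found_work, went_to_sleep };

// Polls `has_demand` for up to `block_time` before giving up.
// Returns found_work if demand is observed within the window; otherwise
// returns went_to_sleep, the point at which a real scheduler would park
// the thread in the thread pool.
inline wait_result wait_for_demand(const std::atomic<bool>& has_demand,
                                   std::chrono::microseconds block_time) {
    const auto deadline = std::chrono::steady_clock::now() + block_time;
    for (;;) {
        if (has_demand.load(std::memory_order_acquire)) {
            return wait_result::found_work;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return wait_result::went_to_sleep;
        }
        std::this_thread::yield();  // stay runnable instead of sleeping
    }
}
```

A worker that finds work inside the window avoids the cost of going to sleep and being woken again, which is the case the heuristic is meant to cover.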

With these changes, the default behavior of oneTBB does not negatively affect performance when
oneTBB is the only parallel runtime in use.<br>
However, applications that combine several runtimes might be affected. For example, if an
application builds a pipeline in which one stage uses oneTBB and a subsequent stage uses
OpenMP, there is a chance that oneTBB workers will interfere with OpenMP threads.
This interference might result in slight oversubscription,
which in turn might degrade performance.

This problem can be resolved with an API that indicates when parallel computation is done,
allowing worker threads to be released from the arena,
essentially overriding the default block time.<br>

This problem can be considered from another angle. Essentially, if the user can indicate where
parallel computation ends, they can also indicate where it starts.

<img src="parallel_block_introduction.png" width=800>

With this approach, the user not only releases threads when necessary
but also specifies a programmable block in which worker threads should stick to the
executing arena.

## Proposal

Let's consider the guarantees that an API for explicit parallel blocks can provide:
* Start of parallel block:
  * Indicates the point from which the scheduler can use a hint and stick threads to the arena.
  * Serves as a warm-up hint to the scheduler, making some worker threads immediately available
    at the start of the real computation.
* "Parallel block" itself:
  * The scheduler can implement different busy-wait policies to retain threads in the arena.
* End of parallel block:
  * Indicates the point from which the scheduler can drop the hint
    and unstick threads from the arena.
  * Indicates that worker threads should ignore
    the default block time (introduced by PR#1352) and leave.

Start of parallel block:<br>
The warm-up hint should provide guarantees similar to those of `task_arena::enqueue` from a
signaling standpoint.
Users should expect that the scheduler will do its best to make some threads available in the arena.

"Parallel block" itself:<br>
Retaining threads is only a hint to the scheduler;
thus, no real guarantee is provided. The scheduler can ignore the hint and
move threads to another arena, or put them to sleep, if the conditions for doing so are met.

End of parallel block:<br>
The end of the block can indicate that worker threads should ignore the default block time.
However, if work is submitted immediately after the end of the parallel block,
the default block time is restored.
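
The three descriptions above can be summarized as a tiny state model. This is only a sketch for reasoning about the semantics, not scheduler code: the class `parallel_block_model` and all of its member names are hypothetical, and the model ignores everything except the two hints.

```cpp
// Toy model of the hints described above; this is not oneTBB code and all
// names are hypothetical.
class parallel_block_model {
public:
    // Start of parallel block: the scheduler may begin retaining workers.
    void start_block() {
        inside_block_ = true;
        ignore_block_time_ = false;
    }
    // End of parallel block: drop the retention hint and let idle workers
    // skip the default block time.
    void end_block() {
        inside_block_ = false;
        ignore_block_time_ = true;
    }
    // New work arriving after the block restores the default block time.
    void submit_work() { ignore_block_time_ = false; }

    // True while the scheduler is allowed (not obliged) to retain workers.
    bool may_retain_workers() const { return inside_block_; }
    // True when idle workers would busy-wait for the default block time.
    bool default_block_time_applies() const { return !ignore_block_time_; }

private:
    bool inside_block_ = false;
    bool ignore_block_time_ = false;
};
```

Note that `may_retain_workers()` only reports permission: consistent with the text, a scheduler remains free to ignore the hint.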

But what if the user would like to disable the default block time entirely?<br>
This can be desirable because the extended block time heuristic is unsuitable for tasks
submitted in an unpredictable pattern and with unpredictable duration. In this case, there should
be an API to disable the default block time in the arena entirely.

```cpp
class task_arena {
public:
    // Marks the start of a parallel block; do_warmup requests the warm-up hint.
    void indicate_start_of_parallel_block(bool do_warmup = false);
    // Marks the end of a parallel block.
    void indicate_end_of_parallel_block(bool disable_default_block_time = false);
    // Disable or re-enable the default block time for this arena.
    void disable_default_block_time();
    void enable_default_block_time();
};

namespace this_task_arena {
    void indicate_start_of_parallel_block(bool do_warmup = false);
    void indicate_end_of_parallel_block(bool disable_default_block_time = false);
    void disable_default_block_time();
    void enable_default_block_time();
}
```
87+
88+
If the end of the parallel block is not indicated by the user, it will be done automatically when
89+
the last public reference is removed from the arena (i.e., task_arena is destroyed or a thread
90+
is joined for an implicit arena). This ensures correctness is
91+
preserved (threads will not stick forever).
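
Users who prefer not to rely on the automatic fallback could pair the two calls with a scope guard. The sketch below is an assumption layered on top of the proposal, not part of it: `parallel_block_guard` and `recording_arena` are hypothetical names, and the guard is templated on the arena type so that it compiles without oneTBB and without the proposed member functions existing in any released `task_arena`.

```cpp
// Hypothetical RAII helper. `Arena` is any type that exposes the two
// proposed member functions; nothing here is an existing oneTBB API.
template <typename Arena>
class parallel_block_guard {
public:
    explicit parallel_block_guard(Arena& arena, bool do_warmup = false,
                                  bool disable_default_block_time = false)
        : arena_(arena),
          disable_default_block_time_(disable_default_block_time) {
        arena_.indicate_start_of_parallel_block(do_warmup);
    }
    // The end of the block is indicated on every exit path from the scope.
    ~parallel_block_guard() {
        arena_.indicate_end_of_parallel_block(disable_default_block_time_);
    }
    parallel_block_guard(const parallel_block_guard&) = delete;
    parallel_block_guard& operator=(const parallel_block_guard&) = delete;

private:
    Arena& arena_;
    bool disable_default_block_time_;
};

// Stand-in arena that only records calls, used to exercise the guard.
struct recording_arena {
    int starts = 0;
    int ends = 0;
    bool last_warmup = false;
    bool last_disable = false;

    void indicate_start_of_parallel_block(bool do_warmup = false) {
        ++starts;
        last_warmup = do_warmup;
    }
    void indicate_end_of_parallel_block(bool disable_default_block_time = false) {
        ++ends;
        last_disable = disable_default_block_time;
    }
};
```

With such a guard, a series of parallel algorithms submitted inside one scope would form a single parallel block, and the end is indicated even if an exception leaves the scope early.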

## Considerations

The retaining of worker threads should be implemented with care because
it might introduce performance problems if:
* Threads cannot migrate to another arena because they
  stick to the current arena.
* Compute resources are not homogeneous, e.g., the CPU is hybrid.
  Heavier involvement of less performant core types might result in artificial work
  imbalance in the arena.


## Open Questions in Design

Some open questions that remain:
* Are the suggested APIs sufficient?
* Are there additional use cases that should be considered that we missed in our analysis?