Hi everyone,
I am currently investigating how best to integrate TBB into an application (a database) that already uses its own thread pool to distribute work. To combat the load imbalance that arises when data parallelism meets skewed data, we'd like to employ TBB as a task-based alternative in certain situations.
The worker threads of the application are either executing something or waiting on a queue (a condition variable) for more work to arrive (e.g. a query run by the user).
application_worker_thread() {
    while (true) {
        task = queue.next(); // blocking wait
        task.run();          // This may spawn TBB tasks
    }
}

The two main goals we'd like to address are:
- Reducing oversubscription: If all application worker threads are busy and one of them spawns a large number of TBB tasks, we oversubscribe the CPU, because both the entire application thread pool and the TBB thread pool will be busy. The goal is to reduce this oversubscription; in this scenario, for example, only the application thread that spawned the TBB tasks would execute them.
- Respecting resource group limits: The application also has a mechanism to limit resources depending on the workload (e.g. a query may only use 50% of all CPUs). Essentially, we'd like to enforce
#active application workers + #active tbb workers <= resource_limit
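To make the invariant above concrete, here is a minimal sketch of a shared budget that both kinds of threads would have to claim a slot from before becoming active. `ResourceBudget` is a hypothetical helper invented for illustration; it is not part of TBB, and how TBB workers could be made to consult it is exactly the open question.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper (not a TBB class): counts active threads of either
// kind and refuses to admit more than `limit` of them at once.
class ResourceBudget {
public:
    explicit ResourceBudget(int limit) : limit_(limit), active_(0) {
        assert(limit >= 0);
    }

    // Try to claim one slot. Returns false if the limit is already reached.
    bool try_acquire() {
        int current = active_.load(std::memory_order_relaxed);
        while (current < limit_) {
            // On failure, compare_exchange_weak reloads `current`,
            // so the loop condition re-checks the limit.
            if (active_.compare_exchange_weak(current, current + 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    // Give a previously claimed slot back.
    void release() {
        int previous = active_.fetch_sub(1, std::memory_order_release);
        assert(previous > 0);
        (void)previous;  // silence unused warning when NDEBUG is defined
    }

    int active() const { return active_.load(std::memory_order_relaxed); }
    int limit() const { return limit_; }

private:
    const int limit_;
    std::atomic<int> active_;
};
```

An application worker would call `try_acquire()` after `queue.next()` returns and `release()` after `task.run()`; the compare-and-swap loop ensures the counter never exceeds the limit, even when several threads race for the last slot.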
The directions I have already thought about are listed below, but all of them have drawbacks:
- No explicit task_arenas at all: This means all application workers have their implicit task_arena and this leads to oversubscription if one application task spawns many TBB tasks.
- Explicit task arenas for each resource group: First, it is a bit tricky to control when a thread enters these arenas, as application threads may switch between executing tasks from different resource groups. Second, with a resource limit of 50%, the application threads together with the TBB threads will use more than 50% of the CPUs.
- Essentially, what I'd like to achieve is that a TBB worker thread is active if and only if an application thread is currently waiting for a new task. I was thinking of regularly modifying the global_control in the application worker thread loop, but I doubt whether this is a good idea.
    gc = global_control(max_allowed_parallelism = hardware_concurrency)
    num_active_workers = 0
    application_worker_thread() {
        while (true) {
            task = queue.next(); // blocking wait
            num_active_workers += 1
            gc = global_control(max_allowed_parallelism = hardware_concurrency - num_active_workers)
            task.run(); // This may spawn TBB tasks
            num_active_workers -= 1
            gc = global_control(max_allowed_parallelism = hardware_concurrency - num_active_workers)
        }
    }
- Another approach is for an application thread to join a task_arena and execute TBB tasks until a new application task is available, but I didn't find a way to do this properly.
- Running the application tasks directly with TBB could also help, but some TLS variables must be set up correctly for them, so this is also a bit complicated.
Do you have any hints or recommendations on how to best integrate TBB into such an environment?