Conversation


@akukanov akukanov commented Jul 11, 2025

This PR "revives" the improvement originally proposed in #417, but takes a different approach to the problem.

As a reminder: when a thread looks for a task to take from a task pool, it might skip some tasks due to affinity or isolation restrictions. The task pool still contains pointers to the tasks, but the observable limits of the pool (the head, modified by thieves, and the tail, modified by the owning thread) might temporarily exclude the skipped tasks. Because of that, another thread that inspects the arena for work availability might find the task pool "empty" and potentially mark the whole arena empty, causing worker threads to leave prematurely. The current implementation mitigates that by issuing a "work advertisement" signal when the skipped tasks are "returned" to the observable pool.
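To make the race concrete, here is a minimal sketch (not the actual TBB code; the types and function names below are simplified assumptions) of how the observable head/tail limits can transiently show an "empty" pool while skipped tasks are still present:

```cpp
#include <atomic>

// Sketch: the pool holds tasks in [head, tail); thieves advance the head,
// the owner works at the tail end and may temporarily retract it past
// tasks it skips due to affinity/isolation restrictions.
struct pool_sketch {
    std::atomic<int> head{0};  // advanced by stealing threads
    std::atomic<int> tail{3};  // moved by the owning thread

    bool looks_empty() const {
        // what an inspecting thread sees when it compares the limits
        return head.load(std::memory_order_acquire)
               >= tail.load(std::memory_order_acquire);
    }
};

// The owner examines tasks from the tail down; skipping tasks
// temporarily retracts the tail below them.
void owner_skip_down_to(pool_sketch& p, int pos) {
    p.tail.store(pos, std::memory_order_release);
}

// Once the operation completes, the skipped tasks are "returned" to the
// observable pool by restoring the tail.
void owner_restore_tail(pool_sketch& p, int real_tail) {
    p.tail.store(real_tail, std::memory_order_release);
}
```

In the window between `owner_skip_down_to` and `owner_restore_tail`, an observer comparing the limits sees no tasks even though the pool still holds pointers to all of them.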

The PR #417 tried to improve the implementation by adding "shadow" head and tail indexes for the slot inspection, which are not changed until an operation on the task pool is complete, and so they should never exclude "skipped" tasks. In my opinion, however, it puts the burden on the wrong side and complicates the arbitration protocol between the pool owner and thieves. As implemented, it also does not achieve the goal, as in the case of pool "exhaustion" the shadow limits would be temporarily reset, similar to the real limits.

This PR takes a different approach and puts more burden on the inspecting thread, which has no tasks to execute anyway. If that thread suspects that the task pool is empty after comparing its head and tail, it locks the pool and re-reads its state. The lock prevents any temporary modifications by stealing threads. To coordinate the inspection with changes made by the owning thread, a new flag is added to arena_slot. The flag is set by the owning thread in get_task if it skips one or more tasks, and is reset once the pool limits are restored. The inspecting thread reads and tests the flag, and considers the slot empty only when the pool limits show no tasks and the skipping flag is not set.
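The protocol above can be sketched as follows (a hedged illustration, not the actual TBB internals: `tasks_omitted` and `pool_mutex` are hypothetical stand-ins for the new arena_slot flag and the task-pool lock):

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for an arena slot with the new skip flag.
struct slot_sketch {
    std::atomic<int> head{0};
    std::atomic<int> tail{0};
    std::atomic<bool> tasks_omitted{false}; // set by the owner while tasks are skipped
    std::mutex pool_mutex;                  // stands in for locking the task pool
};

// Called by a thread inspecting the arena for work availability.
bool slot_is_empty(slot_sketch& s) {
    // cheap unlocked check first: visible tasks mean the slot is not empty
    if (s.head.load(std::memory_order_acquire)
            < s.tail.load(std::memory_order_acquire))
        return false;
    // suspected empty: lock out transient changes by thieves and re-read;
    // the slot counts as empty only if the limits agree AND the owner is
    // not in the middle of skipping tasks
    std::lock_guard<std::mutex> lock(s.pool_mutex);
    return s.head.load(std::memory_order_relaxed)
               >= s.tail.load(std::memory_order_relaxed)
           && !s.tasks_omitted.load(std::memory_order_relaxed);
}
```

The cost of the locked re-read falls on the inspecting thread only on the suspected-empty path, which matches the PR's stated intent.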

Tests

  • not needed, existing tests should be sufficient

Documentation

  • not needed

Breaks backward compatibility?

  • No - the changes are not exposed in API or ABI

@akukanov akukanov changed the title Change get_task to not reset the task pool if some tasks were skipped Rework get_task and steal_task to better interact with out_of_work checks Jul 11, 2025
@akukanov akukanov force-pushed the dev/improve-task-omission-akukanov branch from 0ec492d to 0e277fe Compare July 11, 2025 23:02
@akukanov akukanov force-pushed the dev/improve-task-omission-akukanov branch from 0e277fe to 13d8ce9 Compare July 11, 2025 23:08
@akukanov akukanov force-pushed the dev/improve-task-omission-akukanov branch from abde43c to deda35d Compare July 14, 2025 17:42
@akukanov akukanov marked this pull request as ready for review July 14, 2025 19:22
@akukanov

@kboyarinov @isaevil @dnmokhov Please take a look.


```cpp
if ( tasks_skipped ) {
    __TBB_ASSERT( is_task_pool_published(), nullptr ); // the pool was not reset
    tail.store(T0, std::memory_order_release);
```

Do I understand correctly that we do not need to restore the head here, since that is a stealing thread's responsibility and is done in steal_task?


@akukanov akukanov Jul 23, 2025


That is correct.

Generally, H0 represents the state of the pool head as it was seen by the owner; it might get outdated at any time. The core principle therefore is that the owner only works with the tail and does not change the head.

Indeed, if there was no conflict for the last task, the owner has no idea what the proper value for the head should be. And in case of a conflict, the pool lock is taken and the head is re-read; then we can be sure that there are no skipped tasks beyond the head, so there is no need to change anything.

Prior to the patch, there was a store of H0 to the head - but it was done at a point where the pool was temporarily quiescent, and therefore it was safe. It "optimized" the case when the task at the head is taken while others were skipped. With the patch, the pool is not reset if tasks were skipped, as that would also mislead observers, so this optimization can no longer be performed safely.
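The owner-side epilogue discussed in this thread could be sketched like this (hypothetical names and a much-reduced shape; the real get_task is more involved): if tasks were skipped, the owner restores only the tail to T0 and clears the skip flag, never touching the head.

```cpp
#include <atomic>

// Hypothetical stand-in for the owner's view of its slot.
struct owner_slot {
    std::atomic<int> head{0};            // advanced by stealing threads
    std::atomic<int> tail{0};            // owned by this thread
    std::atomic<bool> tasks_omitted{false};
};

// If tasks were skipped, re-expose them by restoring the tail to T0 and
// clearing the skip flag. The head is deliberately never written by the
// owner: thieves may have advanced it concurrently, so the owner's cached
// H0 can be stale at any moment.
void finish_get_task(owner_slot& s, int T0, bool tasks_skipped) {
    if (tasks_skipped) {
        s.tail.store(T0, std::memory_order_release);
        s.tasks_omitted.store(false, std::memory_order_release);
    }
}
```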

akukanov and others added 2 commits July 23, 2025 20:18
@akukanov akukanov force-pushed the dev/improve-task-omission-akukanov branch from 1da1b25 to 796db43 Compare July 25, 2025 09:08

akukanov commented Jul 25, 2025

The commit 796db43 is code refactoring and is not strictly necessary. It significantly changes get_task; although the core logic remains the same, some code blocks and checks had to be reordered. It is likely better to review it separately, on top of the previous commits. I can also revert it if you prefer to keep refactoring separate from the substantial changes.

@akukanov akukanov force-pushed the dev/improve-task-omission-akukanov branch from 5469345 to 6f0d728 Compare July 28, 2025 21:50
@akukanov akukanov requested review from isaevil and kboyarinov July 30, 2025 18:49

@isaevil isaevil left a comment


The PR looks good to me, though I think we should probably do some performance evaluation prior to merging. Do we need to review #1805 first, or can we merge this one and rebase #1805 on the latest master?

I also have one more observation. The inspecting thread checks whether task pools are empty during the out_of_work call. If, during the thorough inspection in the skipped-tasks case, it determines that the arena in fact still has tasks (they will become available once the owning thread restores the tail), the thread won't leave the arena, since waiter.continue_execution(slot, t) will return true and the thread will enter the stealing loop again. But as I understand it, the tail might still not be restored, so waiter.pause(slot) will be invoked again. Since the backoff of the waiter object has already reached its threshold, out_of_work will be called instantly. Shouldn't the backoff be reset once the thread returns, or is that not a big deal?


akukanov commented Aug 7, 2025

... the tail might still not be restored, so waiter.pause(slot) will be invoked again. Since the backoff of the waiter object has already reached its threshold, out_of_work will be called instantly. Shouldn't the backoff be reset once the thread returns, or is that not a big deal?

I tend to think that it's not a big deal and will in practice happen rather rarely, as there is seemingly much more for the inspecting thread to do in receive_or_steal_task than for the owner thread, which at most has to check the last task in its pool. Also, since the inspection gives no clue where specifically a task was found, the thread will still do random victim selection after the inspection, likely missing the slot where the tasks are - so the problem you describe seems to exist already.

Do we need to review #1805 first or can we merge this one and rebase #1805 on latest master?

Unless this patch adds much overhead - which I do not think it does - merging it first seems better to me.

... I think we probably should do some performance evaluation prior merging.

Agreed.
