
Commit e3ab04b

Extend documentation wrt mixing with other threading packages (#1702)
Co-authored-by: Alexandra <[email protected]>
1 parent 326ab8f commit e3ab04b


4 files changed: +220 -52 lines changed

doc/GSG/next_steps.rst

Lines changed: 4 additions & 3 deletions
@@ -19,9 +19,10 @@ After installing |short_name|, set the environment variables:

 .. tip::

-   oneTBB can coordinate with Intel(R) OpenMP on CPU resources usage
-   to avoid excessive oversubscription when both runtimes are used within a process.
-   To enable this feature set up ``TCM_ENABLE`` environment variable to ``1``.
+   oneTBB can coordinate with Intel(R) OpenMP on CPU resources usage to avoid
+   excessive oversubscription when both runtimes are used within a process. To
+   enable this feature set ``TCM_ENABLE`` environment variable to ``1``. For
+   more details, see :ref:`avoid_cpu_overutilization`.


 Build and Run a Sample

doc/main/examples_testing/CMakeLists.txt

Lines changed: 8 additions & 0 deletions
@@ -39,3 +39,11 @@ file(GLOB_RECURSE DOC_EXAMPLES_LIST "${_reference_examples_path}/*.cpp" "${_user
 foreach(_doc_example_path IN LISTS DOC_EXAMPLES_LIST)
     add_doc_example(${_doc_example_path})
 endforeach()
+
+find_package(OpenMP)
+if (OpenMP_FOUND)
+    target_compile_options(tbb_mixing_other_runtimes_example PRIVATE "${OpenMP_CXX_FLAGS}")
+    target_link_options(tbb_mixing_other_runtimes_example PRIVATE "${OpenMP_CXX_FLAGS}")
+endif()
+
+

doc/main/tbb_userguide/appendix_B.rst

Lines changed: 85 additions & 49 deletions
@@ -4,71 +4,107 @@ Appendix B Mixing With Other Threading Packages
 ===============================================


-|full_name| can be mixed with other
-threading packages. No special effort is required to use any part of
-oneTBB with other threading packages.
+Correct Interoperability
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can use |short_name| with other threading packages. No additional
+effort is required.


 Here is an example that parallelizes an outer loop with OpenMP and an
-inner loop with oneTBB.
+inner loop with |short_name|.

+.. literalinclude:: ./examples/tbb_mixing_other_runtimes_example.cpp
+    :language: c++
+    :start-after: /*begin outer loop openmp with nested tbb*/
+    :end-before: /*end outer loop openmp with nested tbb*/

-::
+
+``#pragma omp parallel`` instructs OpenMP to create a team of
+threads. Each thread executes the code block statement associated with
+the directive.
+
+``#pragma omp for`` indicates that the compiler should distribute
+the iterations of the following loop among the threads in the existing
+thread team, enabling parallel execution of the loop body.
+
+
+See the similar example with the POSIX\* Threads:
+
+.. literalinclude:: ./examples/tbb_mixing_other_runtimes_example.cpp
+    :language: c++
+    :start-after: /*begin pthreads with tbb*/
+    :end-before: /*end pthreads with tbb*/
+
+
+.. _avoid_cpu_overutilization:
+
+Avoid CPU Overutilization
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+While you can safely use |short_name| with other threading packages
+without affecting the execution correctness, running a large number of
+threads from multiple thread pools concurrently can lead to
+oversubscription. This may significantly overutilize system resources,
+affecting the execution performance.


-   int M, N;
-
+Consider the previous example with nested parallelism, but with an
+OpenMP parallel region executed within the parallel loop:

-   struct InnerBody {
-       ...
-   };
-
+.. literalinclude:: ./examples/tbb_mixing_other_runtimes_example.cpp
+    :language: c++
+    :start-after: /*begin outer loop tbb with nested omp*/
+    :end-before: /*end outer loop tbb with nested omp*/

-   void TBB_NestedInOpenMP() {
-   #pragma omp parallel
-       {
-   #pragma omp for
-           for( int i=0; i<M; ++ ) {
-               parallel_for( blocked_range<int>(0,N,10), InnerBody(i) );
-           }
-       }
-   }

+Due to the semantics of the OpenMP parallel region, this composition of
+parallel runtimes may result in a quadratic number of simultaneously
+running threads. Such oversubscription can degrade the performance.

-The details of ``InnerBody`` are omitted for brevity. The
-``#pragma omp parallel`` causes the OpenMP to create a team of threads,
-and each thread executes the block statement associated with the pragma.
-The ``#pragma omp for`` indicates that the compiler should use the
-previously created thread team to execute the loop in parallel.

+|short_name| solves this issue with Thread Composability Manager (TCM).
+It is an experimental CPU resource coordination layer that enables
+better cooperation between different threading runtimes.

-Here is the same example written using POSIX\* Threads.

+By default, TCM is disabled. To enable it, set ``TCM_ENABLE``
+environment variable to ``1``. To make sure it works as intended set
+``TCM_VERSION`` environment variable to ``1`` before running your
+application and check the output for lines starting with ``TCM:``. The
+``TCM: TCM_ENABLE 1`` line confirms that Thread Composability Manager is
+active.
+
+
+Example output:

 ::

+    TCM: VERSION 1.3.0
+    <...>
+    TCM: TCM_ENABLE 1
+
+
+When used with the OpenMP implementation of Intel(R) DPC++/C++ Compiler,
+TCM allows to avoid simultaneous scheduling of excessive threads in the
+scenarios similar to the one above.
+
+
+Submit feedback or ask questions about Thread Composability
+Manager through |short_name| `GitHub Issues
+<https://github.com/uxlfoundation/oneTBB/issues>`_ or `Discussions
+<https://github.com/uxlfoundation/oneTBB/discussions>`_.
+
+
+.. note::
+    Coordination on the use of CPU resources requires support for Thread
+    Composability Manager. For optimal coordination, make sure that each
+    threading package in your application integrates with TCM.
+
+
+.. rubric:: See also

-   int M, N;
-
-
-   struct InnerBody {
-       ...
-   };
-
-
-   void* OuterLoopIteration( void* args ) {
-       int i = (int)args;
-       parallel_for( blocked_range<int>(0,N,10), InnerBody(i) );
-   }
-
-
-   void TBB_NestedInPThreads() {
-       std::vector<pthread_t> id( M );
-       // Create thread for each outer loop iteration
-       for( int i=0; i<M; ++i )
-           pthread_create( &id[i], NULL, OuterLoopIteration, NULL );
-       // Wait for outer loop threads to finish
-       for( int i=0; i<M; ++i )
-           pthread_join( &id[i], NULL );
-   }
+* `End Parallel Runtime Scheduling Conflicts with Thread Composability
+  Manager
+  <https://www.intel.com/content/www/us/en/developer/videos/threading-composability-manager-with-onetbb.html>`_

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+#include <cstdint>
+#include <vector>
+#include <omp.h>
+#ifndef _WIN32
+#include <pthread.h>
+#endif
+#include <oneapi/tbb/global_control.h>
+#include <oneapi/tbb/parallel_for.h>
+
+namespace nesting_tbb {
+
+/*begin outer loop openmp with nested tbb*/
+int M, N;
+
+struct InnerBody {
+    int i;
+    void operator()(tbb::blocked_range<int> const& r) const {
+        for (auto j = r.begin(); j != r.end(); ++j) {
+            // do the work for (i, j) element
+        }
+    }
+};
+
+void TBB_NestedInOpenMP() {
+#pragma omp parallel
+    {
+#pragma omp for
+        for(int i = 0; i < M; ++i) {
+            tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody(i));
+        }
+    }
+}
+/*end outer loop openmp with nested tbb*/
+
+void test() {
+    M = 2; N = 100;
+    TBB_NestedInOpenMP();
+}
+
+} // namespace nesting_tbb
+
+#ifndef _WIN32
+namespace pthreads_and_tbb {
+
+/*begin pthreads with tbb*/
+int M, N;
+
+struct InnerBody {
+    int i;
+    void operator()(tbb::blocked_range<int> const& r) const {
+        for (auto j = r.begin(); j != r.end(); ++j) {
+            // do the work for (i, j) element
+        }
+    }
+};
+
+void* OuterLoopIteration(void* args) {
+    int i = reinterpret_cast<intptr_t>(args);
+    tbb::parallel_for(tbb::blocked_range<int>(0, N, 10), InnerBody(i));
+    return nullptr;
+}
+
+void TBB_NestedInPThreads() {
+    std::vector<pthread_t> id(M);
+    // Create thread for each outer loop iteration
+    for(int i = 0; i < M; ++i) {
+        std::intptr_t arg = i;
+        pthread_create(&id[i], NULL, OuterLoopIteration, (void*)arg);
+    }
+    // Wait for outer loop threads to finish
+    for(int i = 0; i < M; ++i)
+        pthread_join(id[i], NULL);
+}
+/*end pthreads with tbb*/
+
+void test() {
+    M = 2; N = 100;
+    TBB_NestedInPThreads();
+}
+
+} // namespace pthreads_and_tbb
+#endif // _WIN32
+
+namespace nesting_omp {
+
+/*begin outer loop tbb with nested omp*/
+int M, N;
+
+void InnerBody(int i, int j) {
+    // do the work for (i, j) element
+}
+
+void OpenMP_NestedInTBB() {
+    tbb::parallel_for(0, M, [&](int i) {
+        #pragma omp parallel for
+        for(int j = 0; j < N; ++j) {
+            InnerBody(i, j);
+        }
+    });
+}
+/*end outer loop tbb with nested omp*/
+
+void test() {
+    M = 2; N = 100;
+    OpenMP_NestedInTBB();
+}
+
+} // namespace nesting_omp
+
+
+int main() {
+    // Setting maximum number of threads for both runtimes to avoid
+    // oversubscription issues
+    constexpr int max_threads = 2;
+    omp_set_num_threads(max_threads);
+    tbb::global_control gl(tbb::global_control::max_allowed_parallelism, max_threads);
+
+    nesting_tbb::test();
+#ifndef _WIN32
+    pthreads_and_tbb::test();
+#endif
+    nesting_omp::test();
+}
