@@ -4,71 +4,107 @@ Appendix B Mixing With Other Threading Packages
44===============================================
55
66
7- |full_name | can be mixed with other
8- threading packages. No special effort is required to use any part of
9- oneTBB with other threading packages.
7+ Correct Interoperability
8+ ^^^^^^^^^^^^^^^^^^^^^^^^
9+
10+ You can use |short_name| with other threading packages. No additional
11+ effort is required.
1012
1113
1214Here is an example that parallelizes an outer loop with OpenMP and an
13- inner loop with oneTBB .
15+ inner loop with |short_name|.
1416
17+ .. literalinclude:: ./examples/tbb_mixing_other_runtimes_example.cpp
18+     :language: c++
19+     :start-after: /*begin outer loop openmp with nested tbb */
20+     :end-before: /*end outer loop openmp with nested tbb */
1521
16- ::
22+
23+ ``#pragma omp parallel`` instructs OpenMP to create a team of
24+ threads. Each thread executes the block statement associated with
25+ the directive.
26+
27+ ``#pragma omp for`` instructs the compiler to distribute
28+ the iterations of the following loop among the threads in the existing
29+ thread team, enabling parallel execution of the loop body.
30+
31+
32+ Here is a similar example that uses POSIX\* Threads:
33+
34+ .. literalinclude:: ./examples/tbb_mixing_other_runtimes_example.cpp
35+     :language: c++
36+     :start-after: /*begin pthreads with tbb */
37+     :end-before: /*end pthreads with tbb */
38+
39+
40+ .. _avoid_cpu_overutilization:
41+
42+ Avoid CPU Overutilization
43+ ^^^^^^^^^^^^^^^^^^^^^^^^^
44+
45+ While you can safely use |short_name| with other threading packages
46+ without affecting execution correctness, running a large number of
47+ threads from multiple thread pools concurrently can lead to
48+ oversubscription. Oversubscription may significantly overutilize system
49+ resources and degrade execution performance.
1750
1851
19- int M, N;
20-
52+ Consider the previous example with nested parallelism, but with an
53+ OpenMP parallel region executed within the |short_name| parallel loop:
2154
22- struct InnerBody {
23- ...
24- };
25-
55+ .. literalinclude:: ./examples/tbb_mixing_other_runtimes_example.cpp
56+     :language: c++
57+     :start-after: /* begin outer loop tbb with nested omp */
58+     :end-before: /* end outer loop tbb with nested omp */
2659
27- void TBB_NestedInOpenMP() {
28- #pragma omp parallel
29- {
30- #pragma omp for
31- for( int i=0; i<M; ++ ) {
32- parallel_for( blocked_range<int>(0,N,10), InnerBody(i) );
33- }
34- }
35- }
3660
61+ Due to the semantics of the OpenMP parallel region, each thread that runs the
62+ outer loop may start its own OpenMP thread team, so this composition may result in a
63+ quadratic number of simultaneously running threads. Such oversubscription can degrade performance.
3764
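The quadratic growth can be illustrated with simple arithmetic. The sketch below is not part of the oneTBB examples. The helper names ``worst_case_nested_threads`` and ``hardware_threads`` are hypothetical, and the sketch assumes that both runtimes default to one thread per hardware thread.

```cpp
#include <cstdint>
#include <thread>

// Hypothetical helper: worst-case number of simultaneously running threads
// when each of `outer_threads` starts its own inner team of `inner_threads`.
std::uint64_t worst_case_nested_threads(unsigned outer_threads,
                                        unsigned inner_threads) {
    return static_cast<std::uint64_t>(outer_threads) * inner_threads;
}

// Hypothetical helper: number of hardware threads, falling back to 1 because
// std::thread::hardware_concurrency() may return 0 when the value is unknown.
unsigned hardware_threads() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

Under that assumption, a machine with 8 hardware threads gives ``worst_case_nested_threads(8, 8) == 64``, that is, up to 64 threads competing for 8 hardware threads.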
38- The details of ``InnerBody `` are omitted for brevity. The
39- ``#pragma omp parallel `` causes the OpenMP to create a team of threads,
40- and each thread executes the block statement associated with the pragma.
41- The ``#pragma omp for `` indicates that the compiler should use the
42- previously created thread team to execute the loop in parallel.
4365
66+ |short_name| addresses this issue with the Thread Composability Manager (TCM),
67+ an experimental CPU resource coordination layer that enables
68+ better cooperation between different threading runtimes.
4469
45- Here is the same example written using POSIX\* Threads.
4670
71+ By default, TCM is disabled. To enable it, set the ``TCM_ENABLE``
72+ environment variable to ``1``. To verify that it works as intended, set the
73+ ``TCM_VERSION`` environment variable to ``1`` before running your
74+ application and check the output for lines starting with ``TCM:``. The
75+ ``TCM: TCM_ENABLE 1`` line confirms that Thread Composability Manager is
76+ active.
77+
78+
79+ Example output:
4780
4881::
4982
83+ TCM: VERSION 1.3.0
84+ <...>
85+ TCM: TCM_ENABLE 1
86+
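If you capture the application output, for example in an automated test, the check for the confirmation line can be scripted. The following is a minimal sketch, not part of oneTBB or TCM. The function name ``tcm_reported_enabled`` is hypothetical, and the sketch assumes that the confirmation line appears exactly as shown in the example output.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the captured application output
// contains the exact line "TCM: TCM_ENABLE 1".
bool tcm_reported_enabled(const std::string& captured_output) {
    std::istringstream lines(captured_output);
    std::string line;
    while (std::getline(lines, line)) {
        // Drop a trailing carriage return left by Windows-style line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        // Assumption: the line matches the documented text exactly.
        if (line == "TCM: TCM_ENABLE 1") {
            return true;
        }
    }
    return false;
}
```

The exact-match comparison is deliberately strict. Relax it if your environment adds prefixes or trailing whitespace to the output.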
87+
88+ When used with the OpenMP implementation of Intel(R) DPC++/C++ Compiler,
89+ TCM helps avoid scheduling an excessive number of threads simultaneously in
90+ scenarios similar to the one above.
91+
92+
93+ Submit feedback or ask questions about Thread Composability
94+ Manager through |short_name| `GitHub Issues
95+ <https://github.com/uxlfoundation/oneTBB/issues>`_ or `Discussions
96+ <https://github.com/uxlfoundation/oneTBB/discussions>`_.
97+
98+
99+ .. note::
100+     Coordination on the use of CPU resources requires support for Thread
101+     Composability Manager. For optimal coordination, make sure that each
102+     threading package in your application integrates with TCM.
103+
104+
105+ .. rubric:: See also
50106
51- int M, N;
52-
53-
54- struct InnerBody {
55- ...
56- };
57-
58-
59- void* OuterLoopIteration( void* args ) {
60- int i = (int)args;
61- parallel_for( blocked_range<int>(0,N,10), InnerBody(i) );
62- }
63-
64-
65- void TBB_NestedInPThreads() {
66- std::vector<pthread_t> id( M );
67- // Create thread for each outer loop iteration
68- for( int i=0; i<M; ++i )
69- pthread_create( &id[i], NULL, OuterLoopIteration, NULL );
70- // Wait for outer loop threads to finish
71- for( int i=0; i<M; ++i )
72- pthread_join( &id[i], NULL );
73- }
107+ * `End Parallel Runtime Scheduling Conflicts with Thread Composability
108+   Manager
109+   <https://www.intel.com/content/www/us/en/developer/videos/threading-composability-manager-with-onetbb.html>`_
74110