How Many Threads Does It Take?
An OpenMP program can sometimes be seen using a different number of threads from one run to the next. Why does that happen?

For example, here is a program that appears in the OpenMP User's Guide to demonstrate nested parallelism. A team of more than one thread is executing a nested parallel region:

#include <omp.h>
#include <stdio.h>

/* Print the team size once per team: only one thread runs the single block. */
void report_num_threads(int level)
{
    #pragma omp single
    {
        printf("Level %d: number of threads in the team - %d\n",
               level, omp_get_num_threads());
    }
}

int main()
{
    omp_set_dynamic(0);   /* disable dynamic adjustment of team sizes */
    #pragma omp parallel num_threads(2)
    {
        report_num_threads(1);
        #pragma omp parallel num_threads(2)
        {
            report_num_threads(2);
            #pragma omp parallel num_threads(2)
            {
                report_num_threads(3);
            }
        }
    }
    return(0);
}

Compiling and running this program with nested parallelism enabled produces the following output:

% setenv OMP_NESTED TRUE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2

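At level 1 the initial thread forms a team of two threads. Each of those threads then forms its own team of two at level 2, and each level 2 thread does the same at level 3. Because report_num_threads prints inside a single construct, each team prints one line: one team at level 1, two at level 2, and four at level 3.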

Compare this with the result of running the same program with nested parallelism disabled:

% setenv OMP_NESTED FALSE
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
Level 2: number of threads in the team - 1
Level 3: number of threads in the team - 1
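
As a rough check on the two runs above, the number of report lines per level can be modeled in plain C without OpenMP. This is only a sketch: the function names are made up for illustration, and it assumes that with nesting disabled every region below level 1 runs with a team of one thread, which is what the output above shows.

```c
#include <assert.h>

/* Number of threads in each team at a given nesting level (1-based).
 * Assumption: with nesting disabled, every region below level 1 is
 * serialized and gets a team of one thread. */
int team_size_at_level(int requested, int level, int nested_enabled)
{
    assert(requested >= 1 && level >= 1);
    return (level == 1 || nested_enabled) ? requested : 1;
}

/* Number of teams at a level. Every thread of the level above starts one
 * team, so this equals the total thread count of the previous level.
 * Each team prints exactly one report line. */
int teams_at_level(int requested, int level, int nested_enabled)
{
    int teams = 1;  /* level 1: one team, started by the initial thread */
    for (int l = 1; l < level; l++)
        teams *= team_size_at_level(requested, l, nested_enabled);
    return teams;
}
```

With a requested team size of 2, teams_at_level(2, 3, 1) gives 4 and teams_at_level(2, 3, 0) gives 2, matching the four and two "Level 3" lines in the two runs.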

 

The User's Guide goes on to demonstrate how setting the SUNW_MP_MAX_POOL_THREADS environment variable controls the number of threads in the pool:

The thread pool consists of only non-user threads that the runtime library creates. It does not include the master thread or any thread created explicitly by the user's program. If this environment variable is set to zero, the thread pool will be empty and all parallel regions will be executed by one thread.

The following example shows that a parallel region can get fewer threads if there are not sufficient threads in the pool. The code is the same as above. The number of threads needed for all the parallel regions to be active at the same time is eight. The pool needs to contain at least seven idle threads. If we set SUNW_MP_MAX_POOL_THREADS to 5, two of the four inner-most parallel regions may not be able to get all the slave threads they ask for. One possible result is shown below.

% setenv OMP_NESTED TRUE
% setenv SUNW_MP_MAX_POOL_THREADS 5
% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 1
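
The pool arithmetic behind this example can be sketched in plain C. The function names below are hypothetical, and the model simply assumes that every region at every level requests the same team size.

```c
#include <assert.h>

/* Threads running when every region down to `depth` is active at once:
 * team_size multiplied by itself `depth` times. */
int threads_when_fully_active(int team_size, int depth)
{
    assert(team_size >= 1 && depth >= 1);
    int threads = 1;
    for (int l = 0; l < depth; l++)
        threads *= team_size;
    return threads;
}

/* Pool threads needed: everything except the master thread, which the
 * quoted text says is not part of the pool. */
int pool_threads_needed(int team_size, int depth)
{
    return threads_when_fully_active(team_size, depth) - 1;
}

/* Shortfall for a given pool limit (0 if the pool is large enough). */
int pool_shortfall(int team_size, int depth, int pool_limit)
{
    int needed = pool_threads_needed(team_size, depth);
    return needed > pool_limit ? needed - pool_limit : 0;
}
```

For the program above, pool_threads_needed(2, 3) is 7 and pool_shortfall(2, 3, 5) is 2, which is why two of the four innermost regions may end up with a team of one.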

 

But running the same program again, with the same settings, may produce the following output:

% a.out
Level 1: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 2: number of threads in the team - 2
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 1
Level 3: number of threads in the team - 2
Level 3: number of threads in the team - 2

Note here that there are seven level 3 threads, not the six shown in the first run. Is this a bug, or expected? And how can it be explained?

Well, note that the program can have at most eight (2x2x2) level 3 threads. Depending on how the operating system schedules the threads, a user may see six, seven, or eight level 3 threads.

At level 2, there are four threads: T1, T2, T3, and T4. Each wants to create a parallel region with a team of two threads. The maximum number of threads in this process is six (SUNW_MP_MAX_POOL_THREADS+1: the five pool threads plus the master thread). Four of the six are already running at level 2, so only two threads are left to serve as slave threads at level 3.

Suppose T1, T2, T3, and T4 try to acquire slave threads at the same time, and T1 and T2 each get one but T3 and T4 do not. Then there are 2+2+1+1=6 level 3 threads. Now suppose instead that T1 and T2 each get one, and T1 finishes its parallel region and returns its slave thread to the pool just as T3 tries to acquire one. T3 may then get the thread that T1 returned. If T4 does not get one, there are 2+2+2+1=7 level 3 threads. If T4 is also able to get the thread returned by T2, there are 2+2+2+2=8 level 3 threads. Any of these scenarios is possible, depending on the timing of the events and on how the operating system schedules the threads.
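
The three scenarios can be replayed with a small bookkeeping model in plain C. It is a sketch of the reasoning only, not of how the runtime library is implemented: the function name and the schedule encoding are invented for illustration, and the starting value of two idle pool threads is the figure derived above.

```c
#include <assert.h>
#include <stddef.h>

#define REGIONS 4  /* T1..T4, the four level 2 threads */

/* Replays a schedule and returns the total number of level 3 threads.
 * Hypothetical encoding: '1'..'4' means "Tn asks the pool for one thread",
 * 'a'..'d' means "Tn ends its region and returns its pool thread, if any". */
int level3_threads(const char *schedule, int idle_pool_threads)
{
    int holds[REGIONS] = {0};   /* 1 while Tn currently holds a pool thread */
    int total = 0;

    assert(schedule != NULL && idle_pool_threads >= 0);
    for (size_t i = 0; schedule[i] != '\0'; i++) {
        char ev = schedule[i];
        if (ev >= '1' && ev <= '4') {
            int t = ev - '1';
            total += 1;                    /* Tn itself is always in its team */
            if (idle_pool_threads > 0) {   /* one is free: team of two */
                idle_pool_threads--;
                holds[t] = 1;
                total += 1;
            }
        } else if (ev >= 'a' && ev <= 'd') {
            int t = ev - 'a';
            if (holds[t]) {                /* hand the thread back to the pool */
                holds[t] = 0;
                idle_pool_threads++;
            }
        }
    }
    return total;
}
```

Under this encoding, level3_threads("1234", 2) gives 6, level3_threads("12a34", 2) gives 7, and level3_threads("12a3b4", 2) gives 8.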

And that is why the User's Guide uses the phrase "one possible result".


(Page last updated May 3, 2005)
 