The Solaris Process Model: Managing Thread Execution and Wait Times in the System Clock Handler
By Jim Mauro, 2000

Abstract

This article examines the Solaris clock interrupt handler, in the context of dispatcher support functions.


Prioritizing and scheduling threads

Timing, as they say, is everything, and that is especially true of the prioritization and scheduling of kernel threads in Solaris. Specifically, how much time a thread has been running on a processor and how much time it has waited to run on a processor drive the priority recalculation of timeshare (TS) and interactive (IA) class threads. Realtime (RT) threads run at a fixed priority, so no priority adjustment is necessary, although execution time is still tracked because a time quantum is applied to RT threads. System (SYS) class threads are even simpler as far as the kernel is concerned: they execute until they voluntarily surrender the processor.

The Solaris kernel handles the updating and tracking of thread execution and wait times in the system clock handler, which runs at regular intervals. On all current UltraSPARC systems, a clock interrupt is generated 100 times a second, or every 10 milliseconds. The kernel clock handler does two passes through the list of CPUs on the system by walking the linked list of CPU structures. The first pass simply checks for wait I/O, or swap wait, by examining the per-processor status information maintained in the cpu_stat structures. The count of runnable threads is also determined by summing the disp_nrunnable values in each processor's dispatch structure, where a total count of runnable threads on all the dispatch queues for the processor is maintained.
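The tick interval and the runnable-thread count from the first pass can be illustrated with a small Python model. This is only a sketch of the arithmetic described above, not kernel code: the Cpu class, the count_runnable() helper, and the sample values are made up for illustration; only the 100 interrupts per second figure and the disp_nrunnable name come from the text.

```python
from dataclasses import dataclass

HZ = 100                # clock interrupts per second, as stated in the article
TICK_MS = 1000 // HZ    # interval between clock interrupts, in milliseconds


@dataclass
class Cpu:
    """Hypothetical stand-in for one entry in the kernel's list of CPU structures."""
    cpu_id: int
    disp_nrunnable: int  # runnable threads on this CPU's dispatch queues


def count_runnable(cpus):
    """Model of the first pass: sum disp_nrunnable over every CPU."""
    return sum(cpu.disp_nrunnable for cpu in cpus)


# Made-up sample system with three processors.
cpus = [Cpu(0, 3), Cpu(1, 0), Cpu(2, 5)]
total_runnable = count_runnable(cpus)
print(f"tick every {TICK_MS} ms, {total_runnable} runnable threads")
```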

The second loop through the processor list does a bit more work. This is where we gather data on system-wide processor utilization by determining if a processor is idle, running a thread in user mode, or running in kernel mode. For every processor running a thread that is not an interrupt thread, the system will do the necessary "tick" processing on the thread, which involves updating the appropriate fields in the thread structure to track execution time. There are two members of the thread structure that get checked and updated up front. The t_lbolt field stores the lbolt value from the last clock tick, and t_pctcpu stores the percentage of CPU time used by the thread since the last clock tick. The kernel maintains a timer called lbolt, which counts the number of clock ticks since boot time. It is incremented on every clock tick (clock interrupt). The code compares the current lbolt value with the thread's t_lbolt value, and if t_lbolt is less than lbolt, tick processing is needed for the thread. Before actually calling the kernel clock_tick() code, the thread's t_pctcpu value is recalculated, and the lbolt value is set in the thread structure (i.e., thread_structure.t_lbolt = lbolt).
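The t_lbolt comparison can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the KThread class, the needs_tick() and clock_pass() helpers, and the ticks_charged counter are hypothetical, and the t_pctcpu recalculation is left out. Only the t_lbolt and lbolt names and the "less than" test come from the text.

```python
from dataclasses import dataclass


@dataclass
class KThread:
    """Hypothetical, heavily reduced model of a kernel thread structure."""
    name: str
    t_lbolt: int = 0            # lbolt value recorded at the last processed tick
    is_interrupt: bool = False  # interrupt threads are skipped
    ticks_charged: int = 0      # stand-in for the work clock_tick() would do


def needs_tick(thread, lbolt):
    """Tick processing is needed when the thread's t_lbolt lags lbolt."""
    return not thread.is_interrupt and thread.t_lbolt < lbolt


def clock_pass(running, lbolt):
    """Model of the second pass over the threads currently on processors."""
    for t in running:
        if needs_tick(t, lbolt):
            # The kernel recalculates t_pctcpu here; omitted in this sketch.
            t.t_lbolt = lbolt
            t.ticks_charged += 1  # stand-in for calling clock_tick()


lbolt = 0
worker = KThread("worker")
intr = KThread("intr", is_interrupt=True)
for _ in range(3):
    lbolt += 1                      # one clock interrupt
    clock_pass([worker, intr], lbolt)
clock_pass([worker, intr], lbolt)   # same lbolt again: nothing is charged twice
```

Recording lbolt in the thread is what keeps a thread from being charged twice for the same tick in this model.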

Clock_tick() and other tick() functions

The clock_tick() function is a routine defined in the kernel clock handler, and is used to update the user or system time charged to the thread, depending on its mode. Before that happens, however, the scheduling class-specific clock tick handler is invoked via the CL_TICK(t) macro, where "t" is a pointer to the kernel thread. For threads in the TS or IA scheduling class, the macro resolves to the TS class ts_tick() routine; for RT class threads, it resolves to rt_tick(). There is no clock tick handler for SYS class threads.
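The class-based dispatch can be pictured as a lookup table keyed by scheduling class. The Python below is a loose model, not the kernel's mechanism: the dictionary, the cl_tick() helper, and the dict-based thread are all hypothetical, and the handlers only return their own names. The ordering (class handler first, then time charging) follows the description above.

```python
def ts_tick(thread):
    """Placeholder for the TS/IA class tick handler."""
    return "ts_tick"


def rt_tick(thread):
    """Placeholder for the RT class tick handler."""
    return "rt_tick"


# TS and IA share ts_tick(); SYS has no entry because it has no tick handler.
CLASS_TICK = {"TS": ts_tick, "IA": ts_tick, "RT": rt_tick}


def cl_tick(thread):
    """Model of CL_TICK(t): call the handler for the thread's class, if any."""
    handler = CLASS_TICK.get(thread["cls"])
    return handler(thread) if handler else None


def clock_tick(thread):
    """Class-specific handler first, then charge user or system time."""
    handled_by = cl_tick(thread)
    field = "utime" if thread["user_mode"] else "stime"
    thread[field] += 1
    return handled_by


ia_thread = {"cls": "IA", "user_mode": True, "utime": 0, "stime": 0}
sys_thread = {"cls": "SYS", "user_mode": False, "utime": 0, "stime": 0}
ia_result = clock_tick(ia_thread)
sys_result = clock_tick(sys_thread)
```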

The ts_tick() routine first checks whether or not the kernel thread is running in kernel mode (i.e., running at a SYS priority). If it is, the majority of ts_tick() processing is circumvented. For now, walk through the code path for threads not running in kernel mode. Recall that every kernel thread has a link (or pointer) to a class-specific data structure. The class data structure, ts_data, which is used for both TS and IA threads, maintains (among other things) a ts_timeleft field that tracks how much time the thread has left in its time quantum. If the thread has used up its time quantum (that is, after ts_timeleft is decremented, its value is <= 0), the kernel must prepare the thread to relinquish the processor and get context switched off. First, however, the operating system must determine if preemption control has been applied to this thread, and if so, whether the thread has been given enough extra CPU cycles.
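The quantum check amounts to a decrement and a comparison. The following Python sketch models only that step; the TsData class and quantum_expired() helper are hypothetical names, the quantum length of 3 ticks is made up, and the kernel-mode shortcut is reduced to a boolean argument.

```python
from dataclasses import dataclass


@dataclass
class TsData:
    """Hypothetical cut-down version of the class-specific ts_data structure."""
    ts_timeleft: int  # clock ticks left in the current time quantum


def quantum_expired(ts, in_kernel_mode=False):
    """Charge one tick; report whether the time quantum is now used up."""
    if in_kernel_mode:
        # Threads at a SYS priority bypass most of the tick processing.
        return False
    ts.ts_timeleft -= 1
    return ts.ts_timeleft <= 0


ts = TsData(ts_timeleft=3)  # made-up quantum of 3 ticks
history = [quantum_expired(ts) for _ in range(3)]
print(history)
```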

Preemption control is an implementation of Solaris scheduler activations, and is a fairly recent addition to Solaris (release 2.6). The interfaces for preemption control are now part of the standard Solaris APIs and, as such, are documented via man pages schedctl_init(3X), schedctl_start(3X), etc. Scheduler activations give an application the ability to "ask" the operating system not to remove a thread from a processor (context switch off) even if the thread's time quantum has expired. The purpose of such an interface is to provide a hint to the operating system for threads that are holding a critical resource, such as a semaphore or mutex lock. Such threads should execute until they have finished with the resource, so they can release it.

Consider this simple scenario: An application uses a mutex lock to synchronize access to a shared memory segment. The shared segment is a critical resource, since much of the application work requires that the threads can read/write data in the shared segment at some point during execution. A thread executes, grabs the mutex, and before finishing, gets context switched out because it has used up its time quantum. Other threads get scheduled to do work, they start running and attempt to get the mutex, but it's being held so they block. The thread holding the mutex is sitting on a dispatch queue, waiting for its turn to run again. Only when that happens will it (hopefully) have a chance to finish what it's doing with the shared segment and release the lock. With scheduler activations, you can issue a schedctl_start(3X) to notify the dispatcher that you're entering a critical code section and do a subsequent schedctl_stop(3X) when you leave the critical code segment. Note that there's some setup code to do prior to issuing a schedctl_start(3X); see the man pages for details.
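The start/stop bracket around a critical section can be modeled in Python with a context manager. To be clear, this does not call the Solaris interfaces; schedctl_start(3X) and schedctl_stop(3X) are C library calls, and the ThreadHints class, its nopreempt flag, and the preemption_hint() helper here are hypothetical stand-ins that only show the pattern of setting a hint on entry and always clearing it on exit.

```python
from contextlib import contextmanager


class ThreadHints:
    """Hypothetical per-thread hint state that a dispatcher could consult."""

    def __init__(self):
        self.nopreempt = False


@contextmanager
def preemption_hint(hints):
    """Set the hint on entry and clear it on exit, even if an error occurs."""
    hints.nopreempt = True       # plays the role of schedctl_start()
    try:
        yield hints
    finally:
        hints.nopreempt = False  # plays the role of schedctl_stop()


hints = ThreadHints()
shared_segment = []              # stand-in for the shared memory segment
with preemption_hint(hints):
    inside = hints.nopreempt     # the hint is visible while the lock is held
    shared_segment.append("update")
after = hints.nopreempt
```

Keeping the bracketed region short matters, since the hint is only a request and, as described next, the kernel does not honor it indefinitely.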

Going back to the ts_tick() code, the kernel checks to see if a scheduler activation has been issued for the thread, and if it has, it will not force the thread to be preempted even if it has used up its time quantum. As a fail-safe mechanism, this is not allowed to go on indefinitely. If the thread has not been preempted for a couple of clock ticks, the kernel will set it up for preemption. In the absence of any scheduler activations, the thread will get a new priority based on the tqexp field of the TS dispatch table, indexed via the thread's current priority. For example, if the thread is currently at priority 50 and the corresponding tqexp value in the table is 40, the thread's ts_cpupri field is set to 40, a worse priority, because the thread ran through its time quantum. That done, the thread's new ts_umdpri is determined based on any priocntl(2) hints that may have been set for the thread. The kernel code will adjust the thread's position on the run queue if the new priority requires it.
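The expired-quantum path can be sketched as a small decision function. This Python model rests on several assumptions: the TQEXP table holds only the 50 to 40 entry mentioned above plus made-up neighbors, MAX_GRACE_TICKS = 2 is a guess standing in for "a couple of clock ticks", and the TsThread fields other than ts_cpupri are hypothetical. The ts_umdpri calculation and run queue adjustment are not modeled.

```python
from dataclasses import dataclass

# Only the 50 -> 40 entry comes from the article; the rest are made-up values.
TQEXP = {50: 40, 40: 30, 30: 20}

MAX_GRACE_TICKS = 2  # assumption standing in for "a couple of clock ticks"


@dataclass
class TsThread:
    ts_cpupri: int
    nopreempt_hint: bool = False  # hypothetical flag: preemption control active
    grace_ticks_used: int = 0     # hypothetical count of extra ticks granted


def on_quantum_expired(t):
    """Return 'extended' if the hint buys another tick, else 'preempt'."""
    if t.nopreempt_hint and t.grace_ticks_used < MAX_GRACE_TICKS:
        t.grace_ticks_used += 1
        return "extended"
    # Fail-safe reached or no hint set: apply the tqexp priority and preempt.
    t.ts_cpupri = TQEXP[t.ts_cpupri]
    t.grace_ticks_used = 0
    return "preempt"


hinted = TsThread(ts_cpupri=50, nopreempt_hint=True)
hinted_outcomes = [on_quantum_expired(hinted) for _ in range(3)]

plain = TsThread(ts_cpupri=50)
plain_outcome = on_quantum_expired(plain)
```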

If, upon entering the ts_tick() routine, the thread has not yet used its time quantum, none of the above code is applied. Rather, the kernel will check to see if the thread's current priority is lower than that of the highest priority thread sitting on any of the processor's dispatch queues (the queue's disp_maxrunpri field), and if it is, the thread will be forced to surrender the CPU. This situation can arise if a higher priority thread has been placed on the dispatch queue between the time this thread started running on the processor and the time the clock interrupt came in.
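The check against disp_maxrunpri is a single comparison, shown here as a Python sketch. The should_surrender() helper and the priority numbers are hypothetical, and using -1 to mean "nothing is queued" is a convention chosen for this example.

```python
EMPTY_QUEUE = -1  # convention in this sketch: no runnable thread is queued


def should_surrender(thread_pri, disp_maxrunpri):
    """True when a queued thread has a better (higher) priority than ours."""
    return thread_pri < disp_maxrunpri


# Made-up priorities: a thread at 45 with a priority-50 thread now queued.
outranked = should_surrender(45, 50)
# Equal priority does not force the running thread off the processor here.
tie = should_surrender(50, 50)
# Nothing queued, so the running thread keeps the processor.
alone = should_surrender(45, EMPTY_QUEUE)
```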

At this point we return to the clock routine to do a little additional housekeeping. The user mode time or system mode time fields are incremented, according to the mode the thread is running in, both in the lwp (lwp_utime, lwp_stime) and at the process level (p_utime, p_stime). If interval timers have been enabled (setitimer(2) system call) for virtual time keeping or profiling, timer expiration is checked and the appropriate signal is sent (e.g., SIGVTALRM for virtual time expiration or SIGPROF for profiling timer expiration). Lastly, the CPU rlimit is enforced, and if the thread has exceeded its CPU limit, a SIGXCPU signal is sent, causing the process to exit and dump core. That is, a "CPU Limit Exceeded" core dump message will be sent to stderr.
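The time accounting and CPU limit check can be modeled as follows. This Python sketch makes simplifying assumptions: times are kept as tick counts, the Proc and Lwp classes and both helper functions are hypothetical, the limit is expressed in seconds and converted with the 100 ticks per second rate, and signal delivery is reduced to a boolean result. Interval timers are not modeled.

```python
from dataclasses import dataclass

HZ = 100  # clock ticks per second, as stated in the article


@dataclass
class Proc:
    p_utime: int = 0  # process-level user time, in ticks
    p_stime: int = 0  # process-level system time, in ticks


@dataclass
class Lwp:
    proc: Proc
    lwp_utime: int = 0
    lwp_stime: int = 0


def charge_tick(lwp, user_mode):
    """Charge one tick to the lwp and to its process, by execution mode."""
    if user_mode:
        lwp.lwp_utime += 1
        lwp.proc.p_utime += 1
    else:
        lwp.lwp_stime += 1
        lwp.proc.p_stime += 1


def over_cpu_limit(proc, limit_seconds):
    """True when total CPU time exceeds the limit (SIGXCPU would follow)."""
    return proc.p_utime + proc.p_stime > limit_seconds * HZ


proc = Proc()
lwp = Lwp(proc)
for _ in range(70):
    charge_tick(lwp, user_mode=True)
for _ in range(31):
    charge_tick(lwp, user_mode=False)
exceeded = over_cpu_limit(proc, limit_seconds=1)  # 101 ticks against 100
```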

For RT threads, the tick handler is quite simple. If an RT thread does not have an infinite time quantum, and it has used its allotted time quantum, it is forced to surrender the processor. Also, if a higher priority thread is now on the dispatch queue (i.e., disp_maxrunpri is greater than the current thread's priority), it will surrender the processor. If neither of these conditions is true, the thread will remain on the processor. The housekeeping described above will be done when rt_tick() returns to the clock handler. There's no priority recalculation done, as RT threads are fixed priority.
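The two RT conditions fit in a short function. This Python sketch is a model only: the rt_tick_model() name, the use of None as the "infinite quantum" marker, and the sample priorities are assumptions made for illustration.

```python
INFINITE = None  # marker for an infinite time quantum in this sketch


def rt_tick_model(timeleft, pri, disp_maxrunpri):
    """Return (new_timeleft, must_surrender) after one clock tick."""
    if timeleft is not INFINITE:
        timeleft -= 1
        if timeleft <= 0:
            return timeleft, True    # finite quantum used up
    # Quantum not exhausted: yield only to a higher priority queued thread.
    return timeleft, disp_maxrunpri > pri


# Made-up values: an RT thread at priority 120 with 2 ticks left.
first = rt_tick_model(2, 120, disp_maxrunpri=100)
second = rt_tick_model(first[0], 120, disp_maxrunpri=100)
# Infinite quantum, but a priority-130 thread is now queued.
bumped = rt_tick_model(INFINITE, 120, disp_maxrunpri=130)
# Infinite quantum and nothing better queued: stays on the processor.
stays = rt_tick_model(INFINITE, 120, disp_maxrunpri=100)
```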


About the author

Jim Mauro is a Senior Staff Engineer in the Performance and Availability Engineering group at Sun Microsystems, where he focuses on system availability and failure recovery. When not working or writing, Jim enjoys building Legos with his 2 sons, reading a wide variety of fiction and non-fiction, listening to music, and drooling over the next upgrade of his stereo system.



Reprinted with permission from the March 1999 edition of SunWorld magazine. Copyright Web Publishing Inc., an IDG Communications company.
