Solaris™ Resource Management
Peter Baer Galvin
With Solaris™ 9, Sun is bundling the previously unbundled Solaris
Resource Manager. How does it work, and how well does it work? This month, the
Solaris Companion takes it for a spin.
Before Solaris 9, some rudimentary resource management was included in Solaris.
For example, the psrset command controls processor sets. This command
has been available since Solaris 2.6. With this command, you can create and
delete processor sets (which are identified groups of processors), assign CPUs
to those sets, assign threads to those sets, and display information about the
system's processor sets. In this manner, certain tasks can be run on certain
CPUs, and those tasks can be prevented from using other CPUs. This base functionality
is useful in a small number of situations, but fine-grained management of the
work on a system also requires control over memory, I/O, and disk use, plus more
flexible CPU control. Enter Solaris Resource Manager.
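Before moving on, the psrset operations listed above can be sketched as a short session. This is a dry run that prints the commands rather than executing them; the CPU IDs (2 and 3), the set ID (1), and the process ID (2642) are assumptions chosen for illustration. On a real system, psrset -c reports the ID of the set it creates, and that ID should be used in the later commands.

```shell
#!/bin/sh
# Dry-run sketch of a psrset session. All IDs below are illustrative
# assumptions. Set DRY_RUN=0 to execute for real (root on Solaris only).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

psrset_demo() {
    run psrset -c 2 3          # create a new set containing CPUs 2 and 3
    run psrset -b 1 2642       # bind process 2642 to set 1
    run psrset -e 1 /bin/ksh   # run a command inside set 1
    run psrset -i              # display the sets and the CPUs in each
    run psrset -d 1            # delete set 1, releasing its CPUs
}

psrset_demo
```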
The Solaris Resource Manager (SRM) was an unbundled product before Solaris
9, and is now included at no cost in the Solaris 9 release from Sun. This review
covers that free version (as implemented in the 12/02 release). It is included
as part of the full operating system installation, so no extra effort is needed
to make it available on a Solaris 9 system.
Concepts
SRM consists of two rather disparate functions -- resource limitations and
fair share scheduling. Think of the first as an extension to the standard "limits"
that are settable within Solaris. The second is a new scheduler that manages
CPU scheduling based on allocated shares, rather than the usual use-the-most-CPU-cycles
kind of scheduling. The new scheduler will be described in next month's Solaris
Companion.
To clarify, SRM is in no way a replacement for domaining or other "pure" resource
use limiters. A crash of the operating system will take down all processes
on that system (or within that domain), including all SRM jobs. What SRM can do
is help optimize use of a system and allow programs that might usually be mutually
exclusive to live in harmony on that system.
So how would you choose between multiple domains and dynamic reconfiguration
(DR), and Solaris Resource Manager? Domaining provides absolute operating system
separation, so a task within one domain cannot affect other domains. DR allows
resources to move between domains, but testing and planning must occur, and
issues like memory allocation must be resolved (as an application suddenly has
more memory available to it). SRM is more flexible but does not provide that
wall between applications. It should be used when fine-grained resource control
is required, when resource use changes might be frequent, and on systems without
domaining available. Of course, it could be used in conjunction with domains
for the most complete set of solutions.
Resource Limitations Theory
There are several concepts to understand before making use of limit management
within SRM, which include processes, tasks, and projects. Processes, tasks,
and projects are units of resource allocation. A task consists of one or more
processes, and a project is one or more tasks. For example, a process, task,
or project may be limited in how much CPU time it can use. If a project is limited,
then all tasks in that project inherit that limit. Likewise, a task limit is
applied to the resource use of all processes in that task.
With SRM, processes are assigned to tasks or projects at login or through
newtask, at, batch, or cron commands. Once these
logical collections are made, you can use commands such as prctl and
newtask to manage resource use by those groups, and commands like ps,
id, prstat, and the accounting subsystem to view system activities
based on those groups.
New resources that can be managed in this way include use of CPU cycles, number
of threads, amount of CPU time, and maximum address space (virtual memory).
This list expands the previous limitable resources of number of open files,
maximum file size, core dump size, and data and stack virtual memory size. One
key resource not yet included is physical memory. Network use is manageable
by the separate IPQoS facility (not discussed here, but possibly a topic for
a future column).
These resources can be set to have threshold values, and when a threshold
is reached a local or global action can be triggered. For example, the process
could be killed, or the event could simply be logged. These thresholds have
three privilege levels, as UNIX administrators might expect. "Basic" means that
the owner of the calling process can modify it; "privileged" means that only
the superuser can modify it; and "system" is fixed at boot time by the kernel.
System thresholds are set to the maximum of the resource that the kernel is
capable of providing.
Resource Limitations Fact
The Sun documentation about SRM is very good, with quite a few examples. It
is weird that Sun mixes network management and resource management into one
document, though. The manual is available at docs.sun.com: "System Administration
Guide: Resource Management and Network Services".
The definition and management of projects is done via configuration files
and command-line functions. (It can also be done via the Solaris Management
Console.) /etc/project is much like /etc/passwd in its format
and function. It provides project information that coincides with processes
on the system. The file can be edited with vi, or the projadd, projmod, and
projdel commands can be used. A simple example:
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
testproject:11:For testing:pbg::
dontuse:12:Unused:::
The projects command lists the projects available to a user:
$ projects
default testproject
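The fourth colon-separated field of each entry, the user list, is what ties a user to a project. As a minimal sketch, the hypothetical user_projects helper below lists the projects whose user list names a given user. It deliberately ignores group-list membership and the rules that make "default" available, so it reports less than the projects command does.

```shell
#!/bin/sh
# user_projects USER
# Read project(4)-style lines on standard input and print the name of each
# project whose user-list (field 4) contains USER.
# Hypothetical helper. On Solaris, use nawk: /usr/bin/awk does not accept -v.
user_projects() {
    awk -F: -v user="$1" '
        {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) { print $1; break }
        }
    '
}

# Sample data: the file contents shown in this column.
user_projects pbg <<'EOF'
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
testproject:11:For testing:pbg::
dontuse:12:Unused:::
EOF
# prints: testproject
```

On a live system the same function would be fed from the real file, as in user_projects pbg < /etc/project.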
All but the last two lines of the configuration file were there from the system
installation. Thus, by default, root processes run in project "system", and most
others in "default". This can be seen in the abridged ps output:
$ ps -eo user,project,comm
USER PROJECT COMMAND
root system sched
root system /etc/init
root system pageout
root default /usr/dt/bin/dtlogin
root system /usr/openwin/bin/fbconsole
pbg default dtaction
pbg default /usr/openwin/bin/speckeysd
pbg default /bin/ksh
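Because ps can report the project as a column, per-project summaries are a short pipeline away. The sketch below counts processes per project. To stay self-contained it reads the abridged listing above from a here-document, and the function name is purely illustrative.

```shell
#!/bin/sh
# count_by_project: read "USER PROJECT COMMAND" lines on standard input,
# skip the header, and print "project count" pairs sorted by project name.
count_by_project() {
    awk 'NR > 1 { n[$2]++ } END { for (p in n) print p, n[p] }' | sort
}

count_by_project <<'EOF'
USER PROJECT COMMAND
root system sched
root system /etc/init
root system pageout
root default /usr/dt/bin/dtlogin
root system /usr/openwin/bin/fbconsole
pbg default dtaction
pbg default /usr/openwin/bin/speckeysd
pbg default /bin/ksh
EOF
# prints:
# default 4
# system 4
```

On a running system the input would instead come from ps -eo user,project,comm.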
For this example, "testproject" is used. If a user is listed as a valid member
of a project, he or she may execute tasks within that project. Only the superuser
can execute tasks within a project without being a project member.
A project can be further refined via this configuration file or commands.
The configuration file approach has the benefit of being resilient to reboots.
The file is read at boot time, or when SRM commands are executed. However, changes
made to the configuration file do not affect processes already running. The
following examples use commands to manage the project space.
First, let's create a task within the "testproject" project with the newtask
command:
$ id -p
uid=101(pbg) gid=14(sysadmin) projid=3(default)
$ newtask -p testproject csh
% id -p
uid=101(pbg) gid=14(sysadmin) projid=11(testproject)
Any new child processes of a project member are also members of that project.
Note the membership enforcement:
$ newtask -p dontuse
newtask: user "pbg" is not a member of project "dontuse"
The most important resource management command is prctl. It cannot create
a project, but once processes are running within a project, it can manage their
resources.
For example, let's limit the number of threads within a task (assuming a process
is running in project "testproject"). The first command sets the "basic" limit
at five threads, and the second line sets the privileged limit at eight (that
command must be run as root, although the first one needn't). The third command
confirms those operations:
# prctl -n task.max-lwps -v 5 -e deny -i project testproject
# prctl -n task.max-lwps -t privileged -v 8 -e deny -i project testproject
# prctl -n task.max-lwps -i project testproject
2642: sh
task.max-lwps
5 basic deny
8 privileged deny
2147483647 system deny [ max ]
#
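Scripts that audit limits often need a single number out of that listing. As a sketch, the hypothetical rctl_value helper below extracts the value recorded for one privilege level. It assumes the value, privilege, action column layout shown above; other Solaris releases may format prctl output differently.

```shell
#!/bin/sh
# rctl_value LEVEL
# Read a prctl listing on standard input and print the value recorded for
# LEVEL (basic, privileged, or system).
# Hypothetical helper. On Solaris, use nawk: /usr/bin/awk does not accept -v.
rctl_value() {
    awk -v level="$1" '$2 == level { print $1; exit }'
}

rctl_value privileged <<'EOF'
2642: sh
task.max-lwps
5 basic deny
8 privileged deny
2147483647 system deny [ max ]
EOF
# prints: 8
```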
Next we spawn some threads in that project, within a task, to see the results:
$ newtask -p testproject
$ csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
Vfork failed
Notice that the basic rule was not enforced, but that the privileged one was.
It is unclear what the basic privilege level is for, but privileged obviously
works. Of course, another task could have been spawned, and it too would be allowed
eight threads in this example.
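Limits set with prctl are not recorded anywhere permanent. The configuration-file route mentioned earlier stores the control in the sixth (attributes) field of /etc/project, in the form name=(privilege,value,action). The sketch below prints a modified copy of a project file rather than editing it in place; set_project_attr is a hypothetical helper, and the sample data is taken from the file shown in this column.

```shell
#!/bin/sh
# set_project_attr PROJECT ATTRIBUTE
# Read project(4)-style lines on standard input and print them back with
# field 6 of PROJECT replaced by ATTRIBUTE. Nothing is edited in place.
# Hypothetical helper. On Solaris, use nawk: /usr/bin/awk does not accept -v.
set_project_attr() {
    awk -F: -v OFS=: -v proj="$1" -v attr="$2" '
        $1 == proj { $6 = attr }
        { print }
    '
}

set_project_attr testproject 'task.max-lwps=(privileged,8,deny)' <<'EOF'
default:3::::
testproject:11:For testing:pbg::
dontuse:12:Unused:::
EOF
# prints:
# default:3::::
# testproject:11:For testing:pbg::task.max-lwps=(privileged,8,deny)
# dontuse:12:Unused:::
```

A control stored this way should apply to tasks that join the project afterward, for example through newtask or a fresh login; processes already running are not affected.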
What if monitoring is wanted, but not enforced limits? Enter the rctladm
command. But first, the action of our limit needs to change from deny to allow
(i.e., "none"):
# prctl -n task.max-lwps -t privileged -v 8 -d all -i project testproject
# prctl -n task.max-lwps -i project testproject
2847: sh
task.max-lwps
8 privileged none
2147483647 system deny [ max ]
# rctladm -e syslog task.max-lwps
# rctladm
process.max-address-space syslog=off [ lowerable deny no-local-action ]
process.max-file-descriptor syslog=off [ lowerable deny ]
process.max-core-size syslog=off [ lowerable deny no-local-action ]
process.max-stack-size syslog=off [ lowerable deny no-local-action ]
process.max-data-size syslog=off [ lowerable deny no-local-action ]
process.max-file-size syslog=off [ lowerable deny file-size ]
process.max-cpu-time syslog=off [ lowerable no-deny cpu-time inf ]
task.max-cpu-time syslog=off [ no-deny cpu-time no-obs inf ]
task.max-lwps syslog=notice
project.cpu-shares syslog=off [ no-basic no-local-action ]
The rctladm command tells the system to use syslog whenever the max-lwps
resource limit is reached. Note that for longer-term settings, /etc/rctladm.conf
is used. Now when the thread limit is exceeded, the offending command is allowed
but a syslog entry is made:
# tail -1 /var/adm/messages
Feb 9 20:55:07 sunny genunix: [ID 883052 kern.notice] privileged rctl task.max-lwps (value 8) exceeded by task 25
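Because the notice has a fixed shape, it is easy to pick apart for reporting. The following sketch, built around a hypothetical parse_rctl_notice helper, pulls out the control name, the threshold value, and the task ID. It assumes each notice sits on a single line in the log and follows the wording shown above, which may differ in other releases.

```shell
#!/bin/sh
# parse_rctl_notice: read syslog lines on standard input and, for each
# resource-control notice, print "control value taskid".
# Lines that do not match the expected wording are silently skipped.
parse_rctl_notice() {
    sed -n 's/.* rctl \([^ ]*\) (value \([0-9]*\)) exceeded by task \([0-9]*\).*/\1 \2 \3/p'
}

parse_rctl_notice <<'EOF'
Feb 9 20:55:07 sunny genunix: [ID 883052 kern.notice] privileged rctl task.max-lwps (value 8) exceeded by task 25
EOF
# prints: task.max-lwps 8 25
```

On a live system the input would come from the log itself, as in parse_rctl_notice < /var/adm/messages.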
So on the whole the facility works nicely, although it's rather limited in, well,
what can be limited.
Some other useful project-enabled commands include:
- prstat -J or -T -- Dynamically updated process list, including project
or task summaries
- pgrep -J or -T -- Display the process IDs of processes in the specified
project or task
- pkill -J or -T -- Kill only processes in the specified project or
task
Summary
Solaris Resource Manager is a welcome addition to the core operating system.
It allows control over processes and resources that was previously available
only via commercial tools. This kind of functionality continues the trend of
Solaris moving from a technical computing operating system to one that can accommodate
both technical and business uses, even within the same operating system instance.
This column described the concepts and showed some basic uses, but there is
quite a lot to this new Solaris facility. There are plenty of details that must
be considered as resource management is configured, initialized, and used. Many
were discussed here, but some that were not touched on include resource control
prioritization, and using global naming services such as LDAP and NIS+ for resource
management information. Also, extended accounting can be used to monitor resource
use on a project or task basis.
Overall, SRM is worth learning to allow systems managers and administrators
to gain more control over who is doing what on the computers they manage. Next
month, the Solaris Companion will look at the second half of SRM -- the Fair
Share Scheduler.
Peter Baer Galvin (http://www.petergalvin.info) is the Chief Technologist
for Corporate Technologies (www.cptech.com), a premier systems integrator
and VAR. Before that, Peter was the systems manager for Brown University's Computer
Science Department. He has written articles for Byte and other magazines,
and previously wrote Pete's Wicked World, the security column, and Pete's Super
Systems, the systems management column for Unix Insider (http://www.unixinsider.com).
Peter is coauthor of the Operating Systems Concepts and Applied Operating
Systems Concepts textbooks. As a consultant and trainer, Peter has taught
tutorials and given talks on security and systems administration worldwide.