C++ Compile-Time and Run-Time Performance
By Lawrence Crowl, Sun Studio Development Engineering, Revised March 14, 2006

The organization of your C++ software can have a significant impact on both compile-time and run-time performance. We present a number of techniques for improving performance. Topics include compilation, build environment, include files, inline functions, class design, function design, catching errors, memory allocation, and template design and instantiation. Because indiscriminate pursuit of performance can lead to unmaintainable software, we also discuss the engineering and maintenance implications of the techniques.

Introduction

The goal of any software development effort should be to produce a quality product in a timely manner. The run-time performance of the product contributes to its quality by delivering results faster. The compile-time performance of the product contributes to its timeliness by shortening the edit-compile-debug cycle. However, both run-time performance and compile-time performance are secondary factors in achieving timely quality. Therefore, you should consider run-time and compile-time performance improvements only when justified by improvements in overall product quality and timeliness.

This paper presents a number of techniques for improving the performance of C++ programs ["Programming languages -- C++", ISO/IEC 14882:2003] using the Sun C++ compiler. Most of these techniques are not specific to the Sun C++ compiler and may be more generally applied. However, these techniques are not a complete list, so also see the references. The techniques presented are tactical in nature, providing generally local and incremental improvements in performance. You can often gain significantly more performance by improvements in data structures, algorithms, abstractions, goals, and management.

Many of the techniques presented in this paper are not always helpful. For instance, some run-time performance improvements will degrade compile time and coding time. Likewise, some compile-time performance improvements will degrade run time. Fortunately, while all of a program contributes to compile time and coding time, usually only a small fraction of the program contributes most of the execution time. You can concentrate run-time performance improvements on the fraction of the program that contributes significantly to execution time, and apply compile-time performance improvements to the rest of the program.

The fraction of the program that contributes most of the execution time is usually not obvious. Furthermore, evidence indicates that developers are usually very poor judges of where the time is spent in programs. Therefore, you should apply performance improvements only after the performance is measured, and the hot spots are known, not suspected.

There are a number of tools available to measure performance. We recommend the Sun Studio Performance Analyzer [http://docs.sun.com/doc/819-3687], which is part of our product. For more system-level effects, truss and especially dtrace can also be useful tools.

Compilation

We are continually improving our compilers, both in their compile time and the run time of generated programs. Therefore, the most cost-effective technique for improving run-time and compile-time performance is often to simply use the newest compiler.

Turnaround

To achieve the fastest build times, compile without -g and without optimization.

Optimization

The compiler flag -O expands to -xO3, which offers a good balance between compile time and run time. For faster compilation, use -xO1 or -xO2. For faster execution, there are a number of options, including -xO4, -xO5, -xipo, -xprofile, -xlinkopt, and -xalias_level.

Finally, a note of caution about the -fast flag. This flag is carefully interpreted to provide maximum performance on a single machine. It is not a good choice for programs that are to run on a wide variety of machines.

Build Environment

The build environment can have a significant effect on product build times. Build times can easily be reduced by an order of magnitude by paying careful attention to system-level effects. The following rules are usually helpful, but their significance to overall build times can vary quite a bit.

Use file systems that are local to the process. NFS file access is significantly slower than local file access, and the compiler may need to read and write many files.

Use make, not scripts fired by make. Taking the time to write explicit dependencies on directory creation, etc., will generally yield faster build times.

Use flat make structures and efficient makefiles. The cost of yet another invocation of make can be quite large, particularly when the makefiles use macros that resolve to executing a program. [Miller 1997]

Use dmake, which is part of our product. Even on a single-processor machine, dmake can sometimes halve build times by overlapping computation and I/O.

Install a local compiler. For single compilations, the cost of loading the compiler from an NFS file system will never be amortized.

Libraries

Most C++ libraries are designed for good performance over a wide range of uses. For example, the C++ standard library embodies good algorithms with a significant number of services. However, their general applicability means that such libraries are not optimal for any given application. You can achieve better run-time performance, and often better compile-time performance, with your own custom library. However, designing, documenting, coding, and debugging a custom library is detrimental to timeliness. Make your own custom library only when you need product quality more than you need product timeliness.

Include Files

A significant cost in compilation of C++ programs is simply reading what can be a very large number of include files. Program organizations that reduce the number of files included can significantly reduce compile times.

Abstraction

The most effective technique for reducing file inclusion is to reduce interface complexity. Reducing interface complexity generally improves the understandability of the product and hence increases quality and timeliness, so good interfaces generally improve both compilation time and debugging time. You should understand abstractions that can help organize your program effectively [Gamma 1995].

However, unbridled abstraction can be detrimental to the product. For example, when a project group must use an external interface, the group often wraps the external interface in an internal interface. This internal interface is often just a simple renaming of the external interface, providing none of the decoupling benefits that abstraction normally provides, but incurring all of the costs in development time, compile time, and run time. So do not create abstractions that do not provide recognizable services.

As an example of how abstractions can improve compile times, consider the case of a service provided through an abstract base class and factory, rather than through full classes and the new operator. For example:

class service {
  public:
    virtual answer * query() = 0;
    virtual void statement( data * ) = 0;
    static service * generate();
  protected:
    service();
};
...
service * instance = service::generate();
instance->statement( data1 );
answer * result = instance->query();

In this example, the entire structure of the actual service is hidden from the clients of the service, and those clients need not spend the time to compile that structure, or include other structures implied by the internals of the service. Just as importantly, clients need not be recompiled when the internals of the service change, which can dramatically reduce incremental product build times.
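As a minimal sketch of how the implementation side of such a factory might be arranged, consider the following. The names local_service and the simplified answer and data types are hypothetical stand-ins that the example above does not define, and the virtual destructor is an addition so that clients can delete through the base pointer.

```cpp
// --- What would live in service.h: the interface only. ---
struct answer { int value; };   // hypothetical stand-in for the real class
struct data   { int value; };   // hypothetical stand-in for the real class

class service {
  public:
    virtual ~service() { }          // added so clients can delete safely
    virtual answer* query() = 0;
    virtual void statement( data* ) = 0;
    static service* generate();     // factory; hides the concrete type
  protected:
    service() { }
};

// --- What would live in service.cc: invisible to clients. ---
namespace {
class local_service : public service {   // hypothetical concrete class
  public:
    local_service() { last.value = 0; }
    virtual answer* query() { return &last; }
    virtual void statement( data* d ) { last.value += d->value; }
  private:
    answer last;   // internals can change without recompiling clients
};
}

service* service::generate() { return new local_service; }
```

Because clients see only the first half, adding or removing members of local_service changes only the one implementation file.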

Multiple Classes

Often several class definitions are needed to provide a service. These classes should all be defined in a single include file, rather than one class per include file. This technique reduces the number of files opened, which can have a significant effect on compilation time.

Incomplete Classes

Many classes only get used indirectly within an include file. For instance, the answer and data classes in the above example are used only through pointers. In these cases, the compiler does not need the full definition of the classes, and incomplete definitions suffice. So, instead of

#include "answer.h"

use

class answer;

Guarded Includes

Because managing the proper inclusion of class definitions can often become difficult, programs tend to rely on idempotent, or self-protecting, include files. For example,

#ifndef SERVICE_H
  #define SERVICE_H
  class service { ... };
#endif

If the file has not been previously read, the class is defined; otherwise, the file is essentially empty. We can avoid the unnecessary read of the file by testing the guard on the outside of the include as well.

#ifndef SERVICE_H
  #include "service.h"
#endif

This technique is most effective when the compiler is hitting the limits of available main memory and as a result the file cache is ineffective.

Many compilers, including Sun Studio 11, will recognize the typical include guards and never open the file a second time. Thus, the additional guard at the include is redundant. So, this technique is only useful with some compilers.

Inline Functions

Inline function expansion can significantly improve run-time performance, but the cost is significantly increased compilation time.

Do not use inline functions when you anticipate changes to the function definition and recompiling all callers is expensive. In particular, an inline function definition commits you to a much larger ABI. Do not commit yourself without due consideration.

Cost/Benefit

Calls to small and quick functions can be smaller and quicker when expanded inline than when called normally. Conversely, calls to large or slow functions can be larger and slower when expanded inline than when branched to. Furthermore, all calls to an inline function must be recompiled whenever the function definition changes. Consequently, the decision to use inline functions requires considerable care.

The general advice is to use inline functions when the inlined function results in fewer instructions than the call to the function, or when the application performs significantly faster with the function inline. Usually this advice means that inline functions are either trivial, or used extensively in the hot areas of the program.

Un-Inlinable Functions

The compiler cannot inline all function calls, so making the most effective use of function inlining may require some source changes. Use the +w option to learn when function inlining does not occur. The compiler currently will not inline a function below -xO4 under the following circumstances:

  • The function contains difficult control constructs, such as loops, switch statements, and try/catch statements. Many times these functions execute the difficult control constructs infrequently. To inline such a function, split the function into two parts, an inner part that contains the difficult control constructs and an outer part that decides whether or not to call the inner part. This technique of separating the infrequent part from the frequent part of a function can improve performance even when the compiler can inline the full function.
  • The inline function body is large or complicated. Apparently simple function bodies may be complicated because of calls to other inline functions within the body, or because of implicit constructor and destructor calls (as often occurs in constructors and destructors for derived classes). For such functions, inline expansion rarely provides significant performance improvement, and the function is best left un-inlined. As a general rule of thumb, constructors should be out-of-line until proven effective inline.
  • The arguments to an inline function call are large or complicated. The compiler is particularly sensitive when the object for an inline member function call is itself the result of an inline function call. To inline functions with complicated arguments, simply compute the function arguments into local variables and then pass the variables to the function. This rewrite is particularly effective when the parameters are references.
  • The function contains static local variables.

However, our compiler's inlining effectiveness is growing, so what is true this release may not be true next release.
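The split into an outer and an inner part can be sketched as follows. The buffer class and its members are hypothetical and exist only for illustration; the outer part, put, contains no loop and is defined in the class body, while the loop lives in the out-of-line inner part, flush.

```cpp
#include <cstddef>

class buffer {   // hypothetical class for illustration
  public:
    buffer() : used( 0 ), sum( 0 ) { }

    // Outer part: a test and a store, with no loop, so it is a
    // reasonable candidate for inline expansion.
    void put( char c ) {
        if ( used == capacity )
            flush();              // infrequent path, called out of line
        bytes[ used++ ] = c;
    }

    std::size_t pending() const { return used; }
    unsigned long checksum() const { return sum; }

  private:
    void flush();                 // inner part, holds the loop

    enum { capacity = 8 };
    char bytes[ capacity ];
    std::size_t used;
    unsigned long sum;
};

// Inner part: defined out of line because it contains a loop and
// runs only once per 'capacity' calls to put().
void buffer::flush() {
    for ( std::size_t i = 0; i < used; ++i )
        sum += static_cast<unsigned char>( bytes[ i ] );
    used = 0;
}
```

Whether a given compiler actually inlines put depends on the compiler and its options, so treat this as a starting point to confirm with measurement.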

Conditional Inlining

Inlining reduces run time, but increases compile time. Unfortunately, the value of run time versus compile time is not fixed throughout program development. During initial development and debugging, compile time is significantly more important than run time. Conversely, during final program build and benchmarking, run time is more important than compile time. You can control the amount of inlining by making inline definitions conditional on a preprocessor variable. The technique has the general structure:

service.h
#ifndef SERVICE_H
  #define SERVICE_H
  // Include files needed by the class definitions.
  // Declare incomplete classes needed by the class definitions.
  // Define the classes.
  // Declare functions that are never inlined.
  #ifdef INLINING
    #include "service.ii"
  #else // INLINING
    // Declare other functions when not inlined.
  #endif // INLINING
#endif // SERVICE_H
service.ii
// Include other files needed by inline function definitions.
// Declare incomplete classes needed by inline function definitions.
// For each function definition:
#ifdef INLINING
  inline
#endif // INLINING
// Provide the function header and body.
service.cc
#include "service.h"
// Include other files needed for _all_ function definitions.
#ifndef INLINING
  #include "service.ii"
#endif // INLINING
// Define the functions that are never inlined.

The following example illustrates the above technique, as well as some techniques described earlier. (The code itself is meaningless.)

store.h
#ifndef STORE_H
  #define STORE_H
  class store {
    public:
      store();
    private:
      long variable;
  };
#endif // STORE_H
store.cc
#include "store.h"
store::store() { }
type.h
#ifndef TYPE_H
  #define TYPE_H
  #include "store.h"
  class sym;
  class type : store {
    public:
      sym* mine();
      type* strip();
    private:
      sym* my_var;
  };
  #ifdef INLINING
    #include "type.ii"
  #endif // INLINING
#endif // TYPE_H
type.ii
#ifdef INLINING
  inline
#endif // INLINING
sym* type::mine() {
    return my_var;
}
type.cc
#include "type.h"
#include "sym.h"
#ifndef INLINING
  #include "type.ii"
#endif // INLINING
type* type::strip() {
    return mine()->mine();
}
sym.h
#ifndef SYM_H
  #define SYM_H
  #include "store.h"
  class type;
  class sym : store {
    public:
      sym* up();
      type* mine();
    private:
      sym* up_var;
      type* my_var;
  };
  extern sym* inner( sym* );
  #ifdef INLINING
    #include "sym.ii"
  #else
    extern sym* outer( sym* );
  #endif // INLINING
#endif // SYM_H
sym.ii
#include "type.h"
#ifdef INLINING
  inline
#endif // INLINING
type* sym::mine() {
    return my_var;
}
#ifdef INLINING
  inline
#endif // INLINING
sym* outer( sym* arg ) {
    return inner( arg->up() );
}
sym.cc
#include "sym.h"
#include "type.h"
#ifndef INLINING
  #include "sym.ii"
#endif // INLINING
sym* sym::up() {
    return up_var;
}
sym* inner( sym* arg ) {
    return arg->mine()->mine();
}
main.cc
#include "type.h"
#include "sym.h"
int main() {
    sym sv;
    sym* sp = sv.mine()->strip()->mine();
    return &sv == outer( sp );
}

This technique is particularly valuable during debugging because the compiler does not inline when the -g switch is present. As a consequence, during debugging, inline functions consume object space and compile time without any performance gain. Likewise, when inlining is disabled, headers need not include the files required only by the inline definitions, and the reduced number of file inclusions also improves compile time. When inlining is enabled, compilations will generally include more files than they would without the technique. However, because inlining is generally most useful during optimization, and optimization itself is slow, the relative cost in compile time due to the extra includes is low. So, for debuggable builds, use -g and do not define INLINING. For fast builds without debugging, do not use -g and do not define INLINING. For run-time optimized builds, use one of the optimization flags and define INLINING.

There is one strong note of caution in using this technique. All objects including files testing INLINING must be compiled with the same definition of INLINING. In practice, this restriction means that only application programmers can use the technique.

Class Design

The design of classes and class hierarchies can have a significant impact on run-time performance, through increased operation efficiency, reduced data size, and reduced code size.

Bit-Fields

Convert boolean and small integer values into bit-fields, and then place these fields adjacent to each other. This technique can substantially reduce data size, though it generally requires more instructions to implement. The cost of reading bit-fields depends primarily on how much instruction parallelism is available, while the cost of writing bit-fields is usually double the cost of writing a non-bit-field.
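A small sketch of the transformation, using hypothetical structure and field names. The exact sizes are implementation-defined, so the comments describe what is typical rather than guaranteed.

```cpp
// Before: each flag occupies at least one byte and the small
// integer occupies a full int.
struct options_plain {        // hypothetical example
    bool verbose;
    bool recursive;
    bool follow_links;
    int  depth;               // only values 0..15 are needed
};

// After: adjacent bit-fields share a single storage unit.
struct options_packed {
    unsigned verbose      : 1;
    unsigned recursive    : 1;
    unsigned follow_links : 1;
    unsigned depth        : 4;   // holds 0..15
};
```

On typical implementations the packed form is the size of one unsigned int, while the plain form is larger because of the separate int and its alignment padding. Reads and writes of the packed fields need extra masking and shifting instructions, which is the cost described above.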

Field Order

Order data fields to limit the size of unused padding between fields. In particular, order fields by their types in the order long double, double, long long int, pointers, long int, float, int, short, and char. This transformation may not be feasible or reliable when the program uses typedefs that change with the build environment.
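As an illustration, the two hypothetical structures below hold the same fields in different orders. Sizes and alignments are implementation-defined, so the figures in the note after the code are typical values, not guarantees.

```cpp
// Fields declared in an arbitrary order: the compiler must insert
// padding before each field whose alignment is stricter than the
// offset reached so far.
struct scattered {            // hypothetical example
    char   tag;
    double weight;
    short  count;
    void*  owner;
    char   flag;
};

// The same fields, ordered from the most strictly aligned types
// down to the least, following the ordering suggested above.
struct ordered {
    double weight;
    void*  owner;
    short  count;
    char   tag;
    char   flag;
};
```

On a typical 64-bit platform with 8-byte doubles and pointers, the first layout usually occupies 40 bytes and the second 24.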

Devirtualizing Methods

Convert a virtual method to a non-virtual method when the virtualness of the method is not used. This technique will improve the speed of calls, and enable inlining for further speed improvements. In general, this technique should be applied towards the end of program development, when the abstractions used by the program have become stable and program run time is more important than development flexibility.

Converter Methods

The standard mechanism for dynamic cast is very general, and consequently is more expensive than most specific needs warrant. You can achieve similar functionality by providing dynamic converter methods rather than using dynamic_cast. For example, instead of

car* car_ptr = dynamic_cast<car*>( vehicle_ptr );

use

car* car_ptr = vehicle_ptr->to_car();

where

class car;   // incomplete class; only a pointer to it is needed here
class vehicle {
  public:
    virtual car* to_car() { return 0; }
};
class car : public vehicle {
  public:
    virtual car* to_car() { return this; }
};

Duplicate Methods

Rather than duplicate methods in different derived classes, define them in their least common base class. This transformation reduces the working set, which improves run time, and reduces the number of functions, which improves compile time.

Default Operators

Use the default operators. If a class definition does not declare a parameterless constructor, a copy constructor, a copy assignment operator, or a destructor, the compiler will implicitly declare them. These are called default operators. A C-like struct has these default operators. When the compiler builds a default operator, it knows a great deal about the work that needs to be done and can produce very good code. This code is often much faster than user-written code because the compiler can take advantage of assembly-level facilities while the programmer usually cannot. So, when the default operators do what is needed, the program should not declare user-defined versions of these operators.

Default operators are inline functions, so do not use default operators when inline functions are inappropriate (see the previous section). Otherwise, default operators are appropriate when:

  • The user-written parameterless constructor would only call parameterless constructors for its base objects and member variables. Primitive types effectively have "do nothing" parameterless constructors.
  • The user-written copy constructor would simply copy all base objects and member variables.
  • The user-written copy assignment operator would simply copy all base objects and member variables.
  • The user-written destructor would be empty.

Some C++ programming texts suggest that class programmers always define all operators so that any reader of the code will know that the class programmer did not forget to consider the semantics of the default operators. Obviously, this advice interferes with the optimization discussed above. The resolution of the conflict is to place a comment in the code stating that the class is using the default operator.

Value Classes

C++ classes, including structures and unions, are passed and returned by value. For Plain-Old-Data (POD) classes, the C++ compiler is required to pass the struct as would the C compiler. Objects of these classes are passed directly. For objects of classes with user-defined copy constructors, the compiler is effectively required to construct a copy of the object, pass a pointer to the copy, and destruct the copy after the return. Objects of these classes are passed indirectly. For classes that fall between these two requirements, the compiler can choose. However, this choice affects binary compatibility, so the compiler must choose consistently for every class.

For most compilers, passing objects directly can result in faster execution. This execution improvement is particularly noticeable with small value classes, such as complex numbers or probability values. You can sometimes improve program efficiency by designing classes that are more likely to be passed directly than indirectly.

In compatibility mode (-compat[=4]), a class is passed indirectly if it has any one of the following:

  • A user-defined constructor
  • A virtual function
  • A virtual base class
  • A base that is passed indirectly
  • A non-static data member that is passed indirectly

Otherwise, the class is passed directly.

In standard mode (the default mode), a class is passed indirectly if it has any one of the following:
  • A user-defined copy constructor
  • A user-defined destructor
  • A base that is passed indirectly
  • A non-static data member that is passed indirectly

Otherwise, the class is passed directly.

To maximize the chance that a class will be passed directly:

  • Use default constructors, especially the default copy constructor, where possible.
  • Use the default destructor where possible. The default destructor is not virtual; therefore, a class with a default destructor should generally not be a base class.
  • Avoid virtual functions and virtual bases.
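The guidelines above might lead to a small value class like the following sketch. The probability class is hypothetical. The static_assert uses the C++11 trait std::is_trivially_copyable as a rough proxy only; it is an assumption of this sketch, not the compiler's actual rule for deciding how a class is passed.

```cpp
#include <type_traits>

class probability {   // hypothetical small value class
  public:
    explicit probability( double p ) : value( p ) { }

    // Taking and returning by value is reasonable here because the
    // class is small and relies on the default copy operations.
    probability operator*( probability other ) const {
        return probability( value * other.value );
    }

    double get() const { return value; }

    // Deliberately absent: user-defined copy constructor, copy
    // assignment operator, destructor, virtual functions, and
    // virtual bases. The default operators are being used.
  private:
    double value;
};

// Rough proxy (assumption): a trivially copyable class has no
// user-defined copy constructor or destructor.
static_assert( std::is_trivially_copyable<probability>::value,
               "probability should keep the default copy operations" );
```

If a later change adds a user-defined copy constructor or destructor, the static_assert fails at compile time, which gives an early warning that the class may no longer be passed directly.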

Classes (and unions) that are passed directly by the C++ compiler are passed exactly as the C compiler would pass a struct (or union). However, C++ structs and unions are passed differently on different architectures.

SPARC V8 (32-bit addresses)
Structs and unions are passed and returned by allocating storage within the caller and passing a pointer to that storage. (That is, all structs and unions are passed by reference.)
SPARC V9 (64-bit addresses)
Structs with a size no greater than 16 bytes (32 bytes) are passed (returned) in registers. Unions and all other structs are passed and returned by allocating storage within the caller and passing a pointer to that storage. (That is, small structs are passed in registers; unions and large structs are passed by reference.) As a consequence, small value classes are passed as efficiently as primitive types.
IA32 (32-bit addresses)
Structs and unions are passed by allocating space on the stack and copying the argument onto the stack. Structs and unions are returned by allocating a temporary object in the caller's frame and passing the address of the temporary object as an implicit first parameter. Because all parameters are passed on the stack, small value classes are passed as efficiently as primitive types.
AMD64 (64-bit addresses)
Structs with a size no greater than 16 bytes are passed and returned in registers. Unions and other structs are passed by allocating space on the stack and copying the argument onto the stack. Unions and other structs are returned by allocating a temporary object in the caller's frame and passing the address of the temporary object as an implicit first parameter. (That is, small structs are passed in registers; unions and large structs are passed on the stack.) As a consequence, small value classes are passed as efficiently as primitive types.

Function Design

Function efficiency is the core of good run-time performance, and C++ functions often have subtle inefficiencies because the language often hides significant work behind very terse syntax.

Reference Parameters

Programmers often code functions with reference parameters rather than value parameters because "it is more efficient", rather than because the semantics are appropriate. However, as we saw above, value parameters may be more efficient, and even when they are not directly more efficient, the compiler knows that a value parameter cannot be aliased, and so can better optimize access to the parameter.

However, switching between reference parameters and value parameters is not neutral with respect to semantics, so care must be taken that the switch is appropriate. In particular, if the class has virtual functions, passing the class as a value parameter is probably not appropriate.

Temporary Objects

C++ functions often produce implicit temporary objects, each of which must be created and destroyed. For non-trivial classes, the creation and destruction of temporary objects can be expensive in terms of processing time and memory usage. The C++ compiler does eliminate some temporary objects, but it cannot eliminate all of them.

Write functions to minimize the number of temporary objects as long as your programs remain comprehensible. Techniques include using explicit variables rather than implicit temporary objects and using reference parameters rather than value parameters. Another technique is to implement and use operations such as += rather than implementing and using only + and =. For example, the first line below introduces a temporary object for the result of a + b, while the second line does not.

T x = a + b;
T x( a ); x += b;
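One way to provide both forms is to implement += as the primitive operation and build + on top of it, as in this sketch. The vec2 type is hypothetical and trivially cheap to copy; the benefit grows with the cost of constructing and destroying the class.

```cpp
struct vec2 {   // hypothetical value type
    double x, y;
    vec2( double x_, double y_ ) : x( x_ ), y( y_ ) { }

    // Primitive operation: modifies the object in place and
    // creates no temporary.
    vec2& operator+=( const vec2& rhs ) {
        x += rhs.x;
        y += rhs.y;
        return *this;
    }
};

// operator+ is written in terms of operator+=, so the two cannot
// drift apart. The by-value parameter is the working copy.
inline vec2 operator+( vec2 lhs, const vec2& rhs ) {
    lhs += rhs;
    return lhs;
}
```

Callers on hot paths can then write the two-statement form shown above, while other code keeps the more readable a + b.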

Cache Member Variables

Accessing member variables is a common operation in C++ member functions.

The compiler must often load member variables from memory through the this pointer. Because values are being loaded through a pointer, the compiler sometimes cannot determine when a second load must be performed or whether the value loaded before is still valid. In these cases, the compiler must choose the safe, but slow, approach and reload the member variable each time it is accessed.

You can avoid unnecessary memory reloads by explicitly caching the values of member variables in local variables, as follows:

  • Declare a local variable and initialize it with the value of the member variable.
  • Use the local variable in place of the member variable throughout the function.
  • If the local variable changes, assign the final value of the local variable to the member variable. However, this optimization may yield undesired results if the member function calls another member function on that object.

This optimization is most productive when the values can reside in registers, as is the case with primitive types. The optimization may also be productive for memory-based values because the reduced aliasing gives the compiler more opportunity to optimize.

This optimization may be counter-productive if the member variable is often passed by reference, either explicitly or implicitly.
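The three steps listed above can be sketched as follows. The histogram class is hypothetical. The samples pointer has the same element type as the member, so without the local copy the compiler generally has to assume that a store to total might change an element of samples, or the reverse.

```cpp
#include <cstddef>

class histogram {   // hypothetical class for illustration
  public:
    histogram() : total( 0 ) { }
    void add_all( const long* samples, std::size_t n );
    long sum() const { return total; }
  private:
    long total;
};

void histogram::add_all( const long* samples, std::size_t n ) {
    long local_total = total;             // step 1: cache the member
    for ( std::size_t i = 0; i < n; ++i )
        local_total += samples[ i ];      // step 2: use the local copy
    total = local_total;                  // step 3: write back once
}
```

The loop body calls no other member function, so the caveat in the last step does not arise here.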

On occasion, the desired semantics of a class requires explicit caching of member variables, for instance when there is a potential alias between the current object and one of the member function's arguments. For example:

void complex::operator *= ( const complex& right ) {
    real = real * right.real - imag * right.imag;
    imag = real * right.imag + imag * right.real;
}

will yield unintended results when called with:

x *= x;

The cached version does not exhibit the problem.

void complex::operator *= ( const complex& right ) {
    double this_real = real, this_imag = imag;
    double that_real = right.real, that_imag = right.imag;

    real = this_real * that_real - this_imag * that_imag;
    imag = this_real * that_imag + this_imag * that_real;
}

Common Call Elimination

C++ programs usually rely heavily on procedure calls, and a function often calls the same function with the same data more than once. While compilers are generally effective at sharing the results of common subexpressions, they are not effective at sharing the results of common calls. For example,

person* grandfather = me->father()->father();
person* grandmother = me->father()->mother();

has a common call, and will generally be more efficient written as

person* dad = me->father();
person* grandfather = dad->father();
person* grandmother = dad->mother();

While this technique may seem obvious, and indeed it is, it often requires special attention because the extremely concise nature of C++ calls often provides insufficient reminders to consider efficiency.

Const Declarations

Many programmers use const reference parameters with the intent to inform the compiler that the parameter is read-only, and hence should be better optimized. Unfortunately, this intent is at odds with the C++ language definition. The const keyword says that the storage may not be modified through the given name. What it does not say is that the storage cannot be modified through some other name. With the exception of variables directly declared const, which means you can only initialize them, const is basically ineffective a improving run-time performance. It does, however, catch errors in the programming process.
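A short sketch shows why the compiler cannot treat a const reference parameter as unchanging. The function name add_twice is hypothetical.

```cpp
// 'increment' is a const reference, yet its value can change
// during the call if it refers to the same object as 'target'.
// The compiler must therefore reload it before the second addition.
int add_twice( const int& increment, int& target ) {
    target += increment;
    target += increment;   // may see a different value than above
    return target;
}
```

Called with two distinct variables that both hold 1, the function returns 3. Called as add_twice( x, x ) with x equal to 1, it returns 4, because the first addition changes the storage that increment refers to.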

Catching Run-Time Errors

While errors are generally uncommon, their very possibility can have a significant effect on program development time and on program performance.

Exit

In general, you should use assert, abort, or exit (or an application-specific variant) when program termination is acceptable. This technique yields the smallest intrusion on program performance.

Exceptions

When program termination is unacceptable and errors are rare, use exceptions. The C++ exception mechanism is designed for effective handling of rare events. The Sun C++ compiler has an extremely efficient implementation of exceptions when they are not thrown, and it is getting faster with each release. Furthermore, exceptions limit the coding needed to deal with the problem to the point of detection and the point of handling, and avoid handling by intermediates. However, exceptions have a high cost when actually thrown, and are thus not suitable when an error is common.

Return Codes

For common errors, the traditional function return codes provide the most efficient run-time solution.
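The choice between return codes and exceptions might look like the following sketch. Both functions are hypothetical, and which failures count as common or rare is an assumption that depends on the application.

```cpp
#include <cstddef>
#include <stdexcept>

// Common failure: malformed input is expected often, so the
// function reports it through its return value.
bool parse_digit( char c, int& out ) {
    if ( c < '0' || c > '9' )
        return false;
    out = c - '0';
    return true;
}

// Rare failure: an out-of-range index indicates a logic error
// elsewhere, so the function throws. Callers on the normal path
// need no error-checking code.
int checked_at( const int* table, std::size_t size, std::size_t index ) {
    if ( index >= size )
        throw std::out_of_range( "checked_at: index out of range" );
    return table[ index ];
}
```

Intermediate callers of checked_at need no error handling of their own; only the code that can actually recover needs a try block.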

Memory Allocation

Memory allocation can consume a very large amount of time. Reducing the amount of allocation is usually the most effective technique for improving performance.

Deallocating Memory

Once unnecessary allocations are eliminated, the next most effective technique for improving performance is to deallocate memory when it is no longer needed. This is best accomplished with explicit calls to delete. However, applications may lose control of their memory, and a conservative garbage collector can sometimes be a critical tool. The C++ compiler provides a conservative garbage collector with the option -library=gc.

Allocating Efficiently

Once memory is deallocated, you must turn to improving the efficiency of allocation. The option -library=gc provides a faster malloc and free. If that is not enough, use class-specific new and delete operators. These are commonly called pool allocators.
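A minimal sketch of class-specific new and delete operators built on a free list follows. The node class and the free_block helper are hypothetical. The sketch is not thread-safe and never returns pooled memory to the system, so treat it as an outline of the idea rather than a production allocator.

```cpp
#include <cstddef>
#include <new>

struct free_block { free_block* next; };   // hypothetical helper

class node {   // hypothetical class with its own allocator
  public:
    explicit node( int v ) : value( v ), next( 0 ) { }
    int   value;
    node* next;

    static void* operator new( std::size_t size ) {
        // Requests of another size (for example, from a derived
        // class) or an empty pool fall back to the global heap.
        if ( size != sizeof( node ) || free_list == 0 )
            return ::operator new( size );
        free_block* block = free_list;     // reuse a freed node
        free_list = block->next;
        return block;
    }

    static void operator delete( void* p, std::size_t size ) {
        if ( p == 0 )
            return;
        if ( size != sizeof( node ) ) {
            ::operator delete( p );
            return;
        }
        // Keep the storage on the free list instead of freeing it.
        // A node is at least as large as a free_block.
        free_block* block = new ( p ) free_block;
        block->next = free_list;
        free_list = block;
    }

  private:
    static free_block* free_list;
};

free_block* node::free_list = 0;
```

After the first few allocations, creating and deleting nodes becomes a handful of pointer operations with no call into the general-purpose allocator.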

Finally, when processing character strings, consider using realloc when adjusting string size rather than separate new and delete.

Templates

Programs that use templates heavily can benefit quite a bit from careful template design.

Factoring Common Code

The simple intent of templates is to adapt a general class or function design to a particular type. This is readily accomplished by a simple template design. Unfortunately, these simple designs often result in template instances with large amounts of identical code because the entire class or function is specialized, not just the part that must be specialized.

template< typename T >
struct list {
    T* chain;
    T* next() { return chain; }
    void method();
};

You can reduce the amount of code generated, and hence the working set size of the application, by defining a common non-template base class and a derived template class. The base class contains the common code, and the derived class contains the part that must be specialized.

struct list_base {
    void* chain;
    void method();
};
template< typename T >
struct list : list_base {
    T* next() { return static_cast<T*>(chain); }
};

However, there are some costs associated with this technique. First, debugging may be hindered when the base class uses void* rather than a pointer to the actual type. Second, use of a base class will introduce another layer of function calls, which can cost more than the lower working set size justifies. Third, some type-based optimizations may no longer be possible, which is a concern only at very high levels of optimization.
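A slightly fuller sketch of the same idea, using a hypothetical fixed-capacity stack of pointers, shows where the shared code and the per-type code end up. All names and the capacity of 16 are assumptions for illustration.

```cpp
#include <cstddef>

// Non-template base: this code exists once, however many types are used.
class ptr_stack_base {
public:
    std::size_t size() const { return count_; }
protected:
    ptr_stack_base() : count_(0) {}
    bool push_raw(void* p) {
        if (count_ == capacity) return false;   // full
        slots_[count_++] = p;
        return true;
    }
    void* pop_raw() { return count_ ? slots_[--count_] : 0; }
private:
    static const std::size_t capacity = 16;
    void*       slots_[capacity];
    std::size_t count_;
};

// Template layer: only the thin, type-safe wrappers are generated per T.
template< typename T >
class ptr_stack : public ptr_stack_base {
public:
    bool push(T* p) { return push_raw(p); }
    T*   pop()      { return static_cast<T*>(pop_raw()); }
};
```

Each instance such as ptr_stack<int> or ptr_stack<double> contributes only two trivial wrappers, which a compiler will typically inline away.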

Definitions

You can organize your template definitions in two ways: with definitions included and with definitions separated. The definitions-included organization places the template definitions within the template header file.

whatever.h
template< typename T >
int function( T ) { return 0; }

The definitions-separated organization places the template definitions in a separate file, which is read as-needed during template instantiation.

whatever.h
template< typename T >
int function( T );
whatever.cc
template< typename T >
int function( T ) { return 0; }

Note that the template definition file is essentially an automatic include file: the compiler reads it on its own when it needs the definitions. Do not compile this file separately.

The definitions-included organization requires the compiler to store template definitions, thus increasing compiler data size, even when no templates are instantiated, but it reads the least number of files. The definitions-separated organization avoids saving templates when none are instantiated, but requires searching directories and reading additional files when templates are instantiated. If a particular compilation usually does not instantiate a template, the definitions-separated organization is most efficient. Otherwise, the definitions-included organization is most efficient.

Instantiation

Template instantiation can have a major impact on compile time. The default instantiation mode of the compiler is -instances=global, which provides full support of the language with minimal programming effort, and good compilation time. However, it produces the largest object files.

The fastest template instantiation mode is -instances=explicit, but it requires manual instantiation of all templates needed, which can require a substantial amount of programmer effort to identify all the needed instantiations.
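Manual instantiation uses the standard explicit instantiation syntax. In the sketch below, the template largest is hypothetical and stands in for a template from one of your headers; the last two declarations are the explicit instantiations you would write by hand.

```cpp
// Hypothetical template, normally found in a header file.
template< typename T >
T largest( T a, T b ) { return a < b ? b : a; }

// Explicit instantiations: each forces the compiler to generate one instance
// in this translation unit.
template int    largest< int    >( int, int );
template double largest< double >( double, double );
```

A common arrangement is to collect such lines in one source file per library, so that each needed instance is generated exactly once.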

A slightly easier mode is -instances=semiexplicit, which requires manual instantiation of the directly used instances, but permits the compiler to automatically produce dependent instances. However, this mode restricts the kinds of templates one can use, and produces larger programs than -instances=explicit. See the compiler manual for a discussion of the restrictions.

In summary, use the default instantiation mode when object size and compile-time constraints are not severe. Use -instances=explicit or -instances=semiexplicit when both are important and you are willing to work for the benefits.

Summary

First and foremost, design your program well. Well-designed programs will often have good compile-time performance. If not, local changes will usually bring good compile-time performance. When the run-time performance is inadequate, explore compilation options and alternate libraries. When the run-time performance is still inadequate, reprogram the small portion of the code that contributes most to run time.

Further Reading

  • C++ User's Guide (Detailed information on the Sun Studio 11 C++ compiler, including command-line options.)
  • Performance Analyzer (This guide describes the performance analysis tools available with Sun Studio 11 software.)
  • Jon Louis Bentley, "Writing Efficient Programs", Prentice-Hall, 1982
  • Dov Bulka and David Mayhew, "Efficient C++: Performance Programming Techniques", Addison-Wesley, 2000
  • Tom Cargill, "C++ Programming Style", Addison-Wesley, 1992
  • Erich Gamma, et al., "Design Patterns: Elements of Reusable Object-Oriented Software", Addison-Wesley, 1995
  • "Programming languages -- C++", ISO/IEC 14882:2003
  • Scott Meyers, "Effective C++", Addison-Wesley, 1992
  • Scott Meyers, "More Effective C++", Addison-Wesley, 1996
  • Peter Miller, "Recursive Make Considered Harmful", AUUGN'97