$ cc -O -xprofile=collect:app.profile -xipo -o app *.c 
$ app < test_data 
$ cc -O -xprofile=use:app.profile -xipo -o app *.c 

Table 2 - Using profile feedback to optimise an application

Link-time optimisation

Mapfiles work at the routine level, and profile feedback works within routines; it would seem to be a simple progression to do both optimisations at the same time. This is possible with link-time optimisation (also called post-optimisation).

The principle of link-time optimisation is that the compiler has done its work, the code exists, and all that is necessary is to lay it out appropriately. In doing so, the link-time optimiser sorts the routines so that hot routines are placed together (in a similar way to mapfiles), and also lays out the code within those routines so that hot instructions are placed together. However, it is possible at link-time to go beyond this:

  • Since the hot code has been identified, it is possible to place all the hot code together, and then place all the cold code together. The idea is to remove all cold code from the hot region, placing hot code from different routines into the same region of memory.

  • It is also possible to do further optimisations since the addresses of variables and routines can be calculated exactly. Hence the link-time optimiser can simplify expressions which calculate the address of variables or routines -- this further reduces the instruction count.
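The hot/cold separation in the first point can be pictured as a simple partition. The following is only a sketch of the idea, not how the link-time optimiser is implemented: the `block` structure, the `split_hot_cold` function and the execution-count threshold are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one block of code inside a routine. */
struct block {
    const char *routine;       /* routine the block belongs to */
    unsigned long exec_count;  /* times executed in the training run */
};

/*
 * Copy `count` blocks from `in` to `out` so that every block executed at
 * least `threshold` times comes first, followed by the remaining (cold)
 * blocks. The original order is kept within each group.
 * Returns the number of hot blocks.
 */
size_t split_hot_cold(const struct block *in, struct block *out,
                      size_t count, unsigned long threshold)
{
    size_t next = 0;
    size_t hot;

    assert(in != NULL && out != NULL);

    /* First pass: hot blocks from all routines go into one region. */
    for (size_t i = 0; i < count; i++)
        if (in[i].exec_count >= threshold)
            out[next++] = in[i];
    hot = next;

    /* Second pass: cold blocks follow in a separate region. */
    for (size_t i = 0; i < count; i++)
        if (in[i].exec_count < threshold)
            out[next++] = in[i];

    return hot;
}
```

Note that hot blocks from different routines end up adjacent in `out`, which is the property described above; a real optimiser also has to fix up branches between the moved blocks, which this sketch ignores.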

Drawing 4 shows what an application will look like after it has been link-time optimised. The hot code will have been grouped together in one part of the binary, and the cold code in a separate part.

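[Drawing 4: the application after link-time optimisation, with hot code grouped in one part of the binary and cold code in a separate part]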

The link-time optimisation step requires profile feedback data to work, so the necessary steps are as follows:

  • Build the application with the flags -xprofile=collect -xipo

  • Run the application with one or more representative workloads

  • Rebuild the application with -xprofile=use -xipo -xlinkopt

$ cc -O -xprofile=collect:app.profile -xipo -o app *.c
$ app < test_data
$ cc -O -xprofile=use:app.profile -xipo -o app *.c -xlinkopt

Table 3 - Combining link-time optimisation with profile feedback

Concluding remarks

Using these techniques on larger applications can yield significant performance gains. There is a cost, however, in increased build times and increased build complexity, so each technique should be evaluated to decide whether the gain is worth the additional effort in the build. Not all builds of the application need to go through the process of optimising the code layout: development builds can be performed without it, and the process can be applied only to the final product build.


About the Author
Darryl Gove is a senior staff engineer in Compiler Performance Engineering at Sun Microsystems Inc., analyzing and optimizing the performance of applications on current and future UltraSPARC systems. Darryl has an M.Sc. and Ph.D. in Operational Research from the University of Southampton in the UK. Before joining Sun, Darryl held various software architecture and development roles in the UK.

(Last updated June 22, 2005)
Improving Code Layout Can Improve Application Performance
By Darryl Gove, Senior Performance Engineer, June 22, 2005
Large applications have a particular problem: they have a lot of instructions, and the processor does not have the capacity to hold the entire application on-chip at any one time. As a consequence, larger applications spend some of their run time stalled with the processor waiting to fetch new instructions from memory. This paper discusses several techniques that help the processor to hold more useful instructions on-chip, consequently reducing the time wasted fetching instructions from memory.
but before doing that, it is important to realise that this doesn't just happen at the level of instructions. Whole routines are often either heavily used, or rarely used. Similarly libraries might be full of frequently used routines, or might be required only because of a single library call which almost never happens.

Since the compiler has the ability to change the way the code is laid out in memory, it is possible for the compiler to use memory more efficiently, but it will need more information to do this. The remainder of this article covers three different approaches that can be taken to improve the layout of the application in memory.

Reordering routines using mapfiles

One approach to improve the situation is to use mapfiles. A mapfile tells the linker how to lay out routines in memory. To use mapfiles to improve the layout of the code, it is necessary to order the routines from the most frequently used to the least frequently used. Drawing 2 shows our original program from drawing 1 laid out from hot routines to cold using a mapfile.

[Drawing 2: the program from drawing 1 laid out from hot routines to cold using a mapfile]
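The ordering step amounts to sorting routines by how often they are used. As a minimal sketch, assuming each routine is described by a name and a count of profile samples (the `routine` structure and `order_hot_to_cold` function are hypothetical and are not part of any Sun tool), it could look like this in C:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record: a routine and the number of profile samples in it. */
struct routine {
    const char *name;
    unsigned long samples;
};

/* qsort comparator: routines with more samples sort first. */
static int hotter_first(const void *a, const void *b)
{
    const struct routine *ra = a;
    const struct routine *rb = b;

    if (ra->samples > rb->samples) return -1;
    if (ra->samples < rb->samples) return 1;
    return 0;
}

/* Reorder the table in place, from most to least frequently used. */
void order_hot_to_cold(struct routine *table, size_t count)
{
    assert(table != NULL);
    qsort(table, count, sizeof table[0], hotter_first);
}
```

Writing the sorted names out in this order is, in spirit, what a mapfile records for the linker; the mapfile syntax itself is not shown here.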

It is possible to manually generate mapfiles, but an easier approach is to use the Performance Analyzer:

  • Build the program using the flag -xF

  • Run the program with a representative workload under collect

  • Generate the mapfile using er_print -mapfile <app> <mapfilename>

  • Rebuild the application with the flags -xF -M <mapfile>

Once a mapfile is generated for an application, the same mapfile can be used on subsequent compiles until either the profile of the application changes, routines are renamed, or additional routines are added.

$ cc -O -xF -o app *.c
$ collect app < test_data
   Creating experiment test.1.er ...
$ er_print -mapfile app app.map test.1.er
$ cc -O -xF -M app.map -o app *.c

Table 1 - Creating a mapfile using the Performance Analyzer tools

Improving the layout of instructions using profile feedback

Mapfiles work very well at the routine level to separate frequently executed routines from infrequently executed routines. However, much of the time is spent at the instruction level, where the processor has to jump over blocks of unexecuted code. Profile feedback is a compiler technique for improving this situation.

The idea with profile feedback is to give the compiler information about how the code is typically run. Based on this information, it can do optimisations of the following types:

  • Arrange code so that the frequently executed code in a routine is grouped together.

  • Inline routines that are frequently called, to both remove the cost of calling the routine, and potentially to enable further optimisation of the inlined code.
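To make the first point concrete, consider a routine with a rarely taken branch. The example below is a hypothetical sketch (the routine `sum_readings` and the assumption that negative values are rare in the training workload are invented for illustration). Without profile data the compiler has to guess which side of the `if` is the common case; with it, the compiler is free to keep the hot path as straight-line code and move the cold block out of the way.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical routine: sums an array of readings and counts negative
 * values as errors. Returns the sum of the non-negative values.
 */
long sum_readings(const int *values, size_t count, size_t *errors)
{
    long total = 0;

    assert(values != NULL && errors != NULL);
    *errors = 0;

    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0) {
            /* Cold block: assumed to be rare in the training workload,
             * so a profile-guided layout can place it away from the loop. */
            (*errors)++;
            continue;
        }
        /* Hot block: executed on almost every iteration. */
        total += values[i];
    }
    return total;
}
```

The source code does not change between the collect and use builds; only the layout chosen by the compiler does.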

Profile feedback works best with crossfile optimisation (controlled by the flag -xipo), since this allows the compiler to look at potential optimisations across all source files.

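Drawing 3 shows how profile feedback can rearrange code within a routine to put the frequently executed code together.

[Drawing 3: code within a routine rearranged by profile feedback so that frequently executed code is together]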

Profile feedback is relatively straightforward to use:

  • Build the application with -xprofile=collect -xipo

  • Run the application with one or more representative workloads

  • Rebuild the application with -xprofile=use -xipo

Notice the inclusion of the -xipo flag to enable the compiler to do optimisations across the source files.
