Building a 3D Engine in Perl, Part 4
by Geoff Broadwell
|
dprofpp is Your (Obtuse) Friend
With benchmark mode enabled, the engine runs under dprofpp. The
first step is to collect the profile data:
dprofpp -Q -p step065
-p step065 tells dprofpp to profile the
program named step065, and -Q tells it to
quit after collecting the data. dprofpp ran
step065, collected the profile data, and stored it in a
specially formatted text file named
tmon.out in the current directory.
To turn the profile data into human-readable output, I used
dprofpp without any arguments. It crunched the collected data for
a while and finally produced this:
$ dprofpp
Exporter::Heavy::heavy_export_to_level has 4 unstacked calls in outer
Exporter::export_to_level has -4 unstacked calls in outer
Exporter::export has -12 unstacked calls in outer
Exporter::Heavy::heavy_export has 12 unstacked calls in outer
Total Elapsed Time = 4.838377 Seconds
User+System Time = 1.498377 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
88.1 1.320 1.320 1 1.3200 1.3200 SDL::SetVideoMode
38.1 0.571 0.774 294 0.0019 0.0026 main::draw_quad_face
16.0 0.240 0.341 8 0.0300 0.0426 SDL::OpenGL::BEGIN
13.0 0.195 0.195 64722 0.0000 0.0000 SDL::OpenGL::Vertex
11.3 0.170 0.170 1 0.1700 0.1700 DynaLoader::dl_load_file
9.34 0.140 0.020 12 0.0116 0.0017 Exporter::export
6.67 0.100 0.100 1001 0.0001 0.0001 SDL::in
4.00 0.060 0.060 1 0.0600 0.0600 SDL::Init
3.34 0.050 0.847 8 0.0062 0.1059 main::BEGIN
2.00 0.030 0.040 5 0.0060 0.0080 SDL::Event::BEGIN
1.80 0.027 0.801 49 0.0005 0.0163 main::draw_cube
1.47 0.022 0.022 2947 0.0000 0.0000 SDL::OpenGL::End
1.33 0.020 0.020 1 0.0200 0.0200 warnings::BEGIN
1.33 0.020 0.020 16 0.0012 0.0012 Exporter::as_heavy
1.33 0.020 0.209 5 0.0040 0.0418 SDL::BEGIN
There are several problems with this output. The numbers are clearly silly
(88 percent of its time spent in SDL::SetVideoMode?), the statistics for
the various BEGIN blocks are inconsequential to the task and in
the way, and the error messages at the top are rather disconcerting. To fix
these issues, dprofpp has the -g option, which tells
dprofpp to only display statistics for a particular routine and
its descendants:
$ dprofpp -g main::main_loop
Total Elapsed Time = 4.952042 Seconds
User+System Time = 0.812051 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
70.3 0.571 0.774 294 0.0019 0.0026 main::draw_quad_face
24.0 0.195 0.195 64722 0.0000 0.0000 SDL::OpenGL::Vertex
3.32 0.027 0.801 49 0.0005 0.0163 main::draw_cube
2.71 0.022 0.022 2947 0.0000 0.0000 SDL::OpenGL::End
1.23 0.010 0.010 49 0.0002 0.0002 SDL::OpenGL::Rotate
1.11 0.009 0.009 7 0.0013 0.0013 main::prep_frame
1.11 0.009 0.009 70 0.0001 0.0001 SDL::OpenGL::Color
0.25 0.002 0.002 2947 0.0000 0.0000 SDL::OpenGL::Begin
0.00 - -0.000 1 - - main::action_quit
0.00 - -0.000 2 - - SDL::EventType
0.00 - -0.000 2 - - SDL::Event::type
0.00 - -0.000 7 - - SDL::GetTicks
0.00 - -0.000 7 - - SDL::OpenGL::Clear
0.00 - -0.000 7 - - SDL::OpenGL::GL_NORMALIZE
0.00 - -0.000 7 - - SDL::OpenGL::GL_SPOT_EXPONENT
You may have noticed that I specified main::main_loop instead
of just main_loop. dprofpp always uses
fully qualified names and will give empty results if you use
main_loop without the main:: package qualifier.
In this exclusive times view, the percentages in the first column and the
row order depend only on the runtime of each routine, without respect to its
children. Using just this view, I might have tried to optimize
draw_quad_face somehow, as it appears to be the most expensive
routine by a large margin. That's not the best approach, however, as an
inclusive view (-I) shows:
$ dprofpp -I -g main::main_loop
Total Elapsed Time = 4.952042 Seconds
User+System Time = 0.812051 Seconds
Inclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
100. - 0.814 7 - 0.1163 main::do_frame
99.9 - 0.812 1 - 0.8121 main::main_loop
99.7 - 0.810 7 - 0.1158 main::draw_view
99.2 - 0.806 7 - 0.1151 main::draw_frame
98.6 0.027 0.801 49 0.0005 0.0163 main::draw_cube
95.3 0.571 0.774 294 0.0019 0.0026 main::draw_quad_face
24.0 0.195 0.195 64722 0.0000 0.0000 SDL::OpenGL::Vertex
2.71 0.022 0.022 2947 0.0000 0.0000 SDL::OpenGL::End
1.23 0.010 0.010 49 0.0002 0.0002 SDL::OpenGL::Rotate
1.11 0.009 0.009 70 0.0001 0.0001 SDL::OpenGL::Color
1.11 0.009 0.009 7 0.0013 0.0013 main::prep_frame
0.25 0.002 0.002 2947 0.0000 0.0000 SDL::OpenGL::Begin
0.00 - -0.000 1 - - main::action_quit
0.00 - -0.000 2 - - SDL::EventType
0.00 - -0.000 2 - - SDL::Event::type
In this view, draw_quad_face looks even worse, because the first
column now includes the time taken by all of the OpenGL calls inside of it,
including tens of thousands of glVertex calls. It seems that I
should do something to speed it up, but at this point it's not entirely clear
how to simplify it or reduce the number of OpenGL calls it makes (other than
reducing the subdivision level of each face, which would reduce rendering
quality).
Actually, there's a better option. The real problem is that
draw_cube dominates the execution time, and
draw_quad_face dominates that. How about not calling
draw_cube (and therefore draw_quad_face) at
all during normal rendering? It seems extremely wasteful to have to tell
OpenGL how to render a cube face dozens of times each frame. If only there were
a way to tell OpenGL to remember the cube definition once, and just refer to
that definition each time the engine needs to draw it.
Prev [1] [2] [3] [4] [5] [6] [7] [8] Next