Floating-Point Computing: A Comedy of Errors?
Floating-Point Computing: A Comedy of Errors?   By Gregory Tarsy and Neil Toda, Sun Microsystems, 20.Jan.04  
This article explores some of the intricacies inherent in using floating-point arithmetic for numerical computing. It is devoted to analyzing a straightforward example of summing the sequence of square-roots of the first 1,000,000,000 positive integers. Thanks are due to Paul Murphy whose Linuxworld article of 1/29/2003 was the inspiration for this note.
Contents
 
The Right Answer The AnalysisWhy are some results better than SPARC?IEEE-754 and the Results on Sun Platforms NotesConclusions  

Paul Murphy asked people to run the following code using default compilers and no special switches, and to report the result: 

% cc -o sqr -O sqr.c -lm
% time ./sqr

where sqr.c is:
#include <stdio.h>
#include <math.h>
main()
{
 register i;
 double f=0.0;
 double sqrt();

 for(i=0;i<=1000000000;i++)
 {
    f+=sqrt( (double) i);
 }
 printf("Finished %20.14f\n",f);
} 
 

The table at the end of the article summarizes some reported results. Studying these suggests the questions: why are the results different for different (IEEE754 compliant) computers and different compilers, what is the "correct" result, how is it found, why are some results apparently more accurate than others, and why are the results on Sun, IBM,  and other RISC CPU based computers about the same?

The Right Answer

Murphy discovered that one method of computing a "correct" result was the utilization of the Euler-Maclaurin formula and high precision computation. It gives 21081851083600.37596259382529338 as a good approximation to the correct result.

The Analysis

The main points to be made are:

  • The error as measured by the more suitable unit of ULPs (Units in the Last Place) is not really large and it is further controllable by improving the summation algorithm.
  • The SPARC and other RISC CPU results are uniform across systems while the IA32 results vary from somewhat better to much worse; uniformity is a virtue.
  • The variability of IA32 results stems from improper management of extended precision registers vis-a-vis memory precision, a subtlety that vitiates significantly the potential real advantage of extended precision intermediate computation. Note that IA32 does indeed comply with IEEE754; its differences with SPARC or other RISCs is within IEEE754 latitude.
  • The example is typical of kernels in the solution of differential equations, like the three-body problem, which will overwhelm eventually any finite bounded precision representation, so better numerical approaches must be used anyway.

Next is a more detailed discussion. For a good foundation in floating-point computing  see the ACM article by David Goldberg "What Every Computer Scientist Should Know about Floating-Point Computing" especially with the addendum by Douglas Priest that addresses IA32. Also of interest might be Joe Darcy's JavaOne talk on What Everybody Using the Java Programming Language Should Know About Floating-Point Arithmetic.

The Results: Correct vs IA32 versus SPARC

The table presents results from some of the  sources/runs in order of accuracy.

RUN-Name
Result
Double Prec Rep
1) Correct
21081851083600.37596259382529338
0x42b32c803ebb5060
2) SPARC-quad
21081851083600.37500000000000
0x42b32c803ebb5060
3) Correct+1ULP
21081851083600.3789
0x42b32c803ebb5061
4) IA32
21081851083600.38281250000000
0x42b32c803ebb5062
5) SPARC double
21081851083600.55859375000000
0x42b32c803ebb508f
 

Entry number 1 is the result from the Euler-Maclaurin formula as presented by Murphy. The third column of the table gives the IEEE-754 floating-point double-precision hexadecimal representation of the result.

Entry number 2, shows a result that in decimal representation, is in error by -.00096259382529338. However, observe that the hex representation shows an error of 0 ULPs, an exact match. The reason for this apparent contradiction is that IEEE-754 double precision represents only 16- or 17-decimal digits. Additional digits are used in rounding when converting from decimal to the double precision format. Entry number 2 presents the double-precision results when the sum is kept in IEEE-754 128-bit format.

Entry number 3 presents not the result of a run, but rather serves to illustrate the error in the decimal representation when the hexadecimal number varies by 1 bit, or unit in the last place, denoted ULP.

The best computed results are in error by 2 ULPs and are represented by entry number 4.

The SPARC results which are in error by 47 ULPs, are shown as entry number 5.

The Magnitude of the Errors

Consider now the errors encountered in the survey, based on the previous table. Consider #1 as perfect. Errors can be considered in terms of decimal representation, and ULPs.

When considering the absolute error alone, SPARC results appear very inaccurate. However, considering the ratio error/correct, the variance between results, in fact, is discovered to be very small.

RUN-Name
Decimal Error
ULP Error
err/correct
SPARC-quad
-0.00096259382529338
0
0
Correct+1ULP
 0.00293740617470662
1
1.39333E-16
BEST
 0.00684990617470662
2
3.24920E-16
SPARC-double
 0.18263115617470662
47
86.6296E-16
 

There are two sources of error in the algorithm as provided in Murphy's article. The first is due to the sqrt operation, and the second is due to the addition in summing the square roots.

Sqrt rounding error: the largest rounding errors for the sqrt operations should occur for the largest numbers. A one ULP error for sqrt(1,000,000,000) is on the order of 2*10^(-12). This rounding error is very small compared to the one that is possible due to the addition that follows the sqrt operations.

The sum just before sqrt(1000000000) is 21081851051977.5977. A 1 ULP error in the sum value results in an error of about .0029. Huge compared to the rounding error of the sqrt operation.

The SPARC-quad run shows that a lot more precision will yield the correct result. The cost as Murphy points out, is significantly more compute time.

The question is, can something be done to minimize the inaccuracy without pushing precision to quad?

Yes.

Compensated Summation: An Algorithm That Corrects for Errors

The following algorithm uses the notion of compensated summation, which deals with the rounding error that is associated with each add in the summation.

#include <stdio.h>
#include <math.h>
main()
{
  register i;
  double sum=0.0,newsum,f;
  double sumerr=0.0;
  double err;
  double sqrt();
  for(i=1;i<=1000000000;i++)
  {
       f=sqrt( (double) i);
       newsum = sum + f;
       err = (newsum - sum ) - f;
       sumerr += err;
       sum = newsum;
   }
   printf("Finished %20.14f\n",sum);
   printf("Finished %20.14f\n",sum-sumerr);
}        
 

This program was compiled/linked with the same command, but returned results matching Correct while only taking about 20% more compute time. The entry "SPARC double+e" in the following table shows the result using compensated summation. We offer a bit more detail about compensated summation in the Notes section at the end of this article.

RUN-Name
Result
Double Prec Rep
Correct
21081851083600.37596259382529338
0x42b32c803ebb5060
SPARC double+e
21081851083600.37500000000000
0x42b32c803ebb5060
SPARC-quad
21081851083600.37500000000000
0x42b32c803ebb5060
Correct+1ULP
21081851083600.3789
0x42b32c803ebb5061
BEST
21081851083600.38281250000000
0x42b32c803ebb5062
SPARC double
21081851083600.55859375000000
0x42b32c803ebb508f
 
Why are some results better than SPARC?

There is a clustering of systems in terms of errors, identified as RISC machines that adhere to IEEE-754 64-bit format for double precision work.

The best results were from systems based on Intel's Pentium model. Pentium systems have extended precision registers, 80 bits versus SPARC's 64-bits in double. In addition, Pentium systems allow intermediate results to be kept in extended format. An explicit call is required to force the Pentium processor to keep intermediate results in 64-bit format.

As the preceding section shows, the critical issue of concern here should be the precision used in the problem. As Murphy points out, the correct value was calculated using 200 digits of precision.

While the results from the Pentium class machines were better, the question arguably remains: Are any of these answers bad? We contend that the answer to this question is no. Given the magnitude of the number, and the size of the error, the results are all perfectly reasonable.

Finally, what should be learned from this example is that a precision level may work for one algorithm and data set, but fail if the data changes or if the algorithm is modified.

Another interpretation of this is that the compilers of C don't have strong guarantees about what "double" means. In some versions of gcc with some compiler options, "double" can mean "80-bits" instead of 64-bits. Some of the gcc compilers might be new enough to have some C99 support so it is conceivable that double_t is being used instead of double.

If one wants more accurate results without compensated summation, one should declare f as "long double" or "double_t". If one wants predictable results one can use Java, even on x86.

IEEE-754 and the Results on Sun Platforms
IEEE 754 aids understanding where errors arise in computations by enabling us to work out what error is committed in each operation just by knowing the operands and the destination format; we need not know such system-dependent details as whether the hardware uses a guard bit or whether it chops or rounds. But IEEE 754 also obscures understanding why a particular system computes the result that it does because it allows implementations to arbitrarily choose which operations are rounded to the format the programmer specifies and which are rounded to a wider format.

The one pertinent observation in Murphy's analysis of his example is that the results on SPARC systems dating back to 1989 are all the same, whereas the results on x86 systems vary in ways that are not at all obvious.

Notes

In the original algorithm:

for(i=0;i<=1000000000;i++)
   {
      f+=sqrt( (double) i);
   }
 

whenever the addition is done, there are two possibilities:

  1. The result is exact and fits into the 64-bits allotted, or
  2. the result will not exactly fit, and rounding needs to occur.

With each addition, the rounding error is ignored. After a billion additions, the results are noticeable in this problem.

The compensating algorithm does the following:

  • Do the add, but don't worry about rounding error: newsum = sum + f;
  • Determine the part of f that contributes to the new value: (newsum - sum )
  • f's contribution to the new sum minus f give the amount of f that is lost, i.e. the rounding error: err = (newsum - sum ) - f;
  • Keep track of the rounding errors separately: sumerr += err;
  • All that is left is to adjust things so that the loop can be used again.
OS
CPU
Mhz
Compiler Version/Compiler Command
Result
Error
Time (secs)
Linux
PII
400
GCC 2.95 with GMP 4.0
Precision set to 200 digits
21081851083600.37596259382529338
0.0
6 hours, 11 minutes, 45.9 sec
Solaris 2.7
Ultra-II
450
gcc 2.95
cc -o sqr -O2 sqr.c -lm
21081851083600.55859375
-0.18263116
79.0u 0.1s 0:15 96% 0+0k 0+0io 0pf+0w
SunOS 4.1.4 (Sun IPX)
SP 2+
40
K&R
cc -O -o sqr sqr.c -lm
21081851083600.55859375
-0.18263116
6 hours, 57 minutes, 18 seconds
Redhat7.1 Kernel 2.4.17
Intel PIII
933MHz
gcc version 2.96 20000731
(Red Hat Linux 7.1 2.96-81)
gcc -lm -o test test.c
21081851083598.38281250000000
1.99219
176.350u 0.340s 3:07.78 94.0%
Redhat7.1 Kernel 2.4.17
Intel PIII
933MHz
gcc version 2.96 20000731
(Red Hat Linux 7.1 2.96-81)
gcc -march=i686 -O3 -lm -o test test.c
21081851083600.38281250000000
-0.0078125
74.680u 0.110s 1:14.79 100.0% 0+0k 0+0io 109pf+0w
SUSE 7.1
Intel PI
Unknown
/usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
gcc version 2.95.2 19991024
gcc -o test test.c -lm
gcc -O3 -o test test.c -lm
21081851083598.38281250000000
1.99219
real 39m57.842s user 39m1.750s sys 0m1.210s
SUSE 7.1
Intel PI
Unknown
/usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
gcc version 2.95.2 19991024
gcc -O3 -o test test.c -lm
21081851083600.38281250000000
-0.0078125
real 11m56.810s user 11m48.850s sys 0m0.250s
NetBSD
PIV
1.7GHz
gcc version egcs-2.91.66 19990314
(egcs-1.1.2 release)
gcc -O sq.c -o sq -lm
21081851083600.55859375000000
-0.183594
112.383u 0.000s 1:52.98 99.4% 0+0k 2+75io 2pf+0w
NetBSD
PIV
1.7GHz
/usr/pkg/gcc-2.95.3/lib/gcc-lib/
i386--netbsdelf/2.95.3/specs
gcc -O sq.c -o sq -lm
21081851083600.55859375000000
-0.183594
110.656u 0.000s 1:51.64 99.1% 0+0k 2+110io 1pf+0w
Solaris 8
Ultra III
750
gcc version 2.95.3 20010315
(Not 64 bit as far as I can tell!)
gcc -o sqrO3 -O3 sqr.c -lm
21081851083600.55859375000000
-0.183594
58.40u 0.01s 1:00.76 96.1%
Darwin
G4
450
Apple Computer, Inc. version gcc-932.1,
based on gcc version 2.95.2
19991024 (release)
gcc -o sqrO -O sqr.c -lm
21081851083600.55859375000000
-0.183594
270.930u 1.040s 4:47.78 94.5% 0+0k 0+1io 0pf+0w
Linux 2.4.7 (redhat)
AMD
1.6GHz
gcc 2.96-98 (redhat)
cc -o sqr -O sqr.c -lm
21081851083600.38281250000000
-0.0078125
real 0m23.492s user 0m23.170s sys 0m0.000s
Linux 2.4.7
athlon
750
gcc 3.0
21081851083598.38281250000000
1.99219
121.730u 0.030s 2:22.15 85.6% 0+0k 0+0io 104pf+0w
Linux
G3
300
gcc 2.95.2
21081851083600.55859375000000
-0.183594
515.390u 0.030s 8:35.45 99.9% 0+0k 0+0io 154pf+0w
Linux
alpha/21164a
533
gcc 3.0
21081851083600.55859375000000
-0.183594
570.998u 0.049s 10:14.14 92.9% 0+0k 0+0io 0pf+0w
Linux
alpha/21164a
533
ccc here is DEC/Compaq C compiler V6.4, cpml is the DEC/Compaq Alpha-optimized math library
ccc -lcpml fptest.c -o fptest
21081851083600.55859375000000
-0.183594
151.426u 0.003s 2:31.44 99.9% 0+0k 0+0io 0pf+0w
Solaris 8
UIII
750
Sun Forte 6.2 Compiler
cc -o sqr -O sqr.c -lm
21081851083600.55859375000000
-0.183594
real 1:59.1 user 1:59.0 sys 0.0
Red hat 7.2
PIV
1.6GHz
gnu c compiler 2.96
cc -o sqr -O sqr.c -lm
21081851083600.38281250000000
-0.0078125
real 0m31.098s user 0m29.250s sys 0m0.040s
Red hat 7.1
PII
400
gcc 2.96
cc -o sqr -O sqr.c -lm
21081851083600.38281250000000
-0.0078125
real 2m54.350s user 2m53.180s sys 0m1.140s
Debian GNU/Linux 2.2.19
Athlon
1.2GH
gcc 2.95
gcc test.c -o test -lm
21081851083598.38281250000000
1.99219
real 2m7.523s user 2m3.160s sys 0m0.020s
Solaris 8
Ultra IIi
440
using Forte 6.1, "-O1" optimization
cc -o sqr -xO2 ~/src/sqr.c -lm
21081851083600.55859375000000
-0.183594
real 174.59 user 168.24 sys 0.03
Solaris 8
Ultra IIi
440
using gcc-2.95.3
gcc -o sqr -O2 ~/src/sqr.c -lm
21081851083600.55859375000000
-0.183594
real 88.69 user 87.06 sys 0.00
Solaris 8
U III
750
gcc-2.95.3, "-O2" optimization
gcc -o sqr -O2 ~/src/sqr.c -lm
21081851083600.55859375000000
-0.183594
real 57.60 user 57.47 sys 0.00
Solaris 8
U III
750
using Forte 6.1, "-O2" optimization
cc -o sqr -xO2 ~/src/sqr.c -lm
21081851083600.5585937500000
-0.183594
real 119.15 user 118.89 sys 0.00
HP-UX 11
PA-8600
550
HP Ansi C B.11.11.02,
"-O1" optimization
cc -o sqr -O1 ~/src/sqr.c -lm
21081851083600.55859375000000
-0.183594
real 3:20.8 user 3:20.7 sys 0.1
AIX 4.3.3.0
RS-64-III
500
IBM C (ibmcxx.cmp) 3.6.6.4,
"-O2" optimization
cc -o sqr -O2 ~/src/sqr.c -lm
21081851083600.55859375000000
-0.183594
Real 341.19 User 170.95 System 0.01
 
Conclusions

Even the simple example of square-root summation affords the opportunity for appreciation of the complexity of floating-point computation. The complexity is inherent in the attempt to model arithmetic with real numbers, which in "true" arithmetic have infinite precision, by a finite precision approximation. Almost every arithmetic operation, therefore, results in an error and, in fact, it is in addition/subtraction that the largest errors can occur. Thus floating-point computation is epitomized by the need to understand and estimate the error to ensure the validity of what one computes.

About the Authors
Gregory Tarsy is the manager of the Floating Point and Numerical Computing group at Sun Microsystems. Prior to joining Sun he worked in scientific applications on supercomputers and was a professor of mathematics at the University of California and City University of New York.

Neil Toda is a member of Sun's Floating Point and Numerical Computing group. Prior to joining Sun, he worked in Operating Systems Engineering, Database Engineering, and Software Performance Engineering.

 
Rate and Review Tell us what you think of the content of this page. Excellent   Good   Fair   Poor   Comments:
If you would like a reply to your comment, please submit your email address:
Note: We may not respond to all submitted comments.
Close    To Top
  • Prev Article-OS:
  • Next Article-OS:
  • Now: Tutorial for Web and Software Design > OS > Solaris > OS Content
    Photoshop Tutorial
     

    Special Effect

      3D Effect
      Photoshop Articles
    Programming Tutorial
     

    C/C++ Tutorial

      Visual Basic
      C# Tutorial
    Database Tutorial
     

    MySQL Tutorial

      MS SQL Tutorial
      Oracle Tutorial
    Geek Tutorial
     

    Blogging Tutorial

      RSS Tutorial
      Podcasting Tutorial
    Graphic Design Tutorial
      Coreldraw Tutorial
      Illustrator Tutorial
      3D Tutorials
    Webmaster Articles
     

    Domain Service

      Web Hosting
      Site Promotion
    Java Tutorial/ Articles
     

    Java Servlets

      JavaEE Tutorial
     

    JavaBeans Tutorial

    XML Tutorial/ Articles
     

    XML Style

      AJAX Tutorial
      XML Mobile
    Flash Tutorial/ Articles
     

    Flash Video

      Action Script
      Flash Articles
    OS Tutorial/ Articles
      Linux Tutorial
      Symbian Tutorial
      MacOS Tutorial
    Personal Tech
      Hardware Tutorial
      Software Tutorial
      Online Auction