Java Performance: Efficiently Formatting Doubles
Formatting with internationalization
One interesting point is that for printing formatted
numbers, my algorithm could run even quicker. Normally you want to
print fewer than 15 decimal places, and my algorithm runs faster the
fewer digits it needs to output. This contrasts with the SDK number
formatting, which always takes longer than the plain SDK conversion
of a double to a string.
The SDK uses the java.text.DecimalFormat class to print
formatted floating point numbers, and the conversion algorithm first
uses the default SDK double-to-string conversion, then
parses and formats the resulting string characters to create the
formatted string. For example, to format a double with
four digits after the decimal point and thousands separators, you
could use the following SDK code:
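import java.text.DecimalFormat;
import java.text.FieldPosition;

// Four digits after the decimal point, with thousands separators
DecimalFormat format = new DecimalFormat("#,##0.0000");
FieldPosition f = new FieldPosition(0);
StringBuffer s = new StringBuffer();
format.format(myDouble, s, f); // appends the formatted value to s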
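For reference, here is a minimal, self-contained sketch of that SDK call. The class name SdkFormatExample and the sample values are my own, chosen for illustration, and the format is pinned to US symbols (an assumption) so that the output does not depend on the default locale of the machine running it.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.FieldPosition;
import java.util.Locale;

// Hypothetical wrapper class, used only to make the SDK snippet runnable.
class SdkFormatExample {
    // Formats a double with four decimal places and thousands separators,
    // using US symbols so the result is the same on every machine.
    static String format(double value) {
        DecimalFormat format =
            new DecimalFormat("#,##0.0000", new DecimalFormatSymbols(Locale.US));
        StringBuffer s = new StringBuffer();
        format.format(value, s, new FieldPosition(0));
        return s.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(1234567.5)); // prints 1,234,567.5000
        System.out.println(format(-0.25));     // prints -0.2500
    }
}
```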
java.text.DecimalFormat also supports
internationalized formatting. This internationalization support turns
out to be remarkably easy to manage for the most frequently used
formatting, which needs internationalization of only a few
elements:
- the decimal point character
- the thousands separator character
- the number of digits separated by the thousands separator
(normally three, but sometimes four)
- the prefix and suffix character for negative numbers (normally a
minus sign before or after the number, or the number surrounded by
parentheses).
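As a rough sketch, the four elements in this list can be read from the SDK's own locale data. The class name LocaleSymbolsExample and the describe() helper are hypothetical. The sketch also assumes that NumberFormat.getNumberInstance() returns a DecimalFormat for the locale in question, which is usual but not guaranteed, so the code checks before casting.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper that reports the locale-dependent formatting elements.
class LocaleSymbolsExample {
    static String describe(Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        // The SDK normally returns a DecimalFormat here, but does not promise to.
        if (!(nf instanceof DecimalFormat)) {
            throw new IllegalStateException("No DecimalFormat for " + locale);
        }
        DecimalFormat df = (DecimalFormat) nf;
        DecimalFormatSymbols symbols = df.getDecimalFormatSymbols();
        return "decimal='" + symbols.getDecimalSeparator() + "'"
            + " grouping='" + symbols.getGroupingSeparator() + "'"
            + " groupSize=" + df.getGroupingSize()
            + " negPrefix='" + df.getNegativePrefix() + "'"
            + " negSuffix='" + df.getNegativeSuffix() + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe(Locale.US));
        System.out.println(describe(Locale.GERMANY));
    }
}
```

The values only need to be looked up once per locale and can then be passed to the conversion routine as plain chars and ints.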
Accessing these values for a particular locale can be managed
through the DecimalFormat class. For the conversion
algorithm, changing the decimal character and the prefix and suffix
characters is straightforward. Adding in the thousands
separator is slightly more challenging. You need to know the distance
from the decimal point of the current digit as you are printing.
This distance is given by the magnitude of the current digit being
printed, and it is simple to keep track of the magnitude: you
determine the magnitude of the full double as part of the
printing algorithm, and you simply decrement the magnitude by one
for each digit printed. The decision to print a thousands-separator
character is then straightforward:
if (d_magnitude % numDigitsSeparated == (numDigitsSeparated-1))
    s.append(thousandsSeparator);
To avoid having any thousands separator at all, you could write
another identical method without the above logic, or you could simply
use a large value for numDigitsSeparated, e.g.,
Integer.MAX_VALUE.
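The following is a minimal sketch of the magnitude bookkeeping described above, not the code from DoubleToString.java. It works on a string of integer digits that has already been produced. The class and method names are hypothetical, and it assumes the separator test runs after the magnitude has been decremented for the digit just printed.

```java
// Hypothetical sketch: inserts separators into a run of integer digits
// by tracking the magnitude (power of ten) of each digit.
class GroupingSketch {
    static String group(String digits, int numDigitsSeparated, char thousandsSeparator) {
        StringBuilder s = new StringBuilder();
        // Magnitude of the first digit: 6 for "1234567".
        int d_magnitude = digits.length() - 1;
        for (int i = 0; i < digits.length(); i++) {
            s.append(digits.charAt(i));
            d_magnitude--; // now the magnitude of the next digit to print
            // The next digit starts a new group when its magnitude is one
            // less than a multiple of the group size. In Java, -1 % n is -1,
            // so nothing is appended after the final digit.
            if (d_magnitude % numDigitsSeparated == (numDigitsSeparated - 1))
                s.append(thousandsSeparator);
        }
        return s.toString();
    }

    public static void main(String[] args) {
        System.out.println(group("1234567", 3, ','));                 // prints 1,234,567
        System.out.println(group("12345678", 4, ','));                // prints 1234,5678
        System.out.println(group("1234567", Integer.MAX_VALUE, ',')); // prints 1234567
    }
}
```

The last call shows the Integer.MAX_VALUE trick: no realistic magnitude ever equals Integer.MAX_VALUE - 1, so the separator is never appended.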
Testing
The proof of the pudding is in the eating, so let's test out this
effort. For the following table, I've run four tests on several
Sun VMs:
- Test 1: the original conversion algorithm from my book
- Test 2: the adapted conversion algorithm including formatting
- Test 3: the SDK StringBuffer.append(double) method
(which calls Double.toString())
- Test 4: the SDK java.text.DecimalFormat.format() method
I've normalized all measured times to the SDK 1.2 VM with the
Just-In-Time (JIT) compiler enabled, running test 1. (That is, all measured
times are divided by the measured time for the 1.2 VM running test 1.)
Times are the averages over several test runs. HotSpot times are shown
for a second run of tests without exiting the VM, so that the
server-tuned VM has time for its optimizations to kick in.
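To make the measurement procedure concrete, here is a rough sketch of such a timing harness. It is not the harness used for Table 1: the class name, the REPEATS constant, and the sample data are all hypothetical, and because the proprietary conversion code is not reproduced here, the sketch normalizes against StringBuffer.append(double) rather than test 1.

```java
import java.text.DecimalFormat;
import java.text.FieldPosition;

// Hypothetical timing harness for the two SDK conversions (tests 3 and 4).
class ConversionTimingSketch {
    // Assumed workload size; not the value used for the published table.
    static final int REPEATS = 200;

    // Times StringBuffer.append(double) over all values, in nanoseconds.
    static long timeAppend(double[] values) {
        StringBuffer s = new StringBuffer();
        long start = System.nanoTime();
        for (int r = 0; r < REPEATS; r++) {
            for (int i = 0; i < values.length; i++) {
                s.setLength(0); // reuse the buffer between conversions
                s.append(values[i]);
            }
        }
        return System.nanoTime() - start;
    }

    // Times DecimalFormat.format() over all values, in nanoseconds.
    static long timeDecimalFormat(double[] values) {
        DecimalFormat format = new DecimalFormat("#,##0.0000");
        FieldPosition f = new FieldPosition(0);
        StringBuffer s = new StringBuffer();
        long start = System.nanoTime();
        for (int r = 0; r < REPEATS; r++) {
            for (int i = 0; i < values.length; i++) {
                s.setLength(0);
                format.format(values[i], s, f);
            }
        }
        return System.nanoTime() - start;
    }

    // Expresses a measured time as a percentage of the baseline time.
    static double normalize(long measured, long baseline) {
        return 100.0 * measured / baseline;
    }

    public static void main(String[] args) {
        double[] values = new double[1000];
        for (int i = 0; i < values.length; i++) values[i] = i * 1234.56789 + 0.001;
        // Run twice in the same VM so the second run sees optimized code.
        for (int run = 1; run <= 2; run++) {
            long baseline = timeAppend(values);
            long formatted = timeDecimalFormat(values);
            System.out.println("run " + run + ": DecimalFormat takes "
                + normalize(formatted, baseline) + "% of the append time");
        }
    }
}
```

The percentages it prints will vary from machine to machine and from VM to VM, so they should not be compared directly with the figures in the table.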
Table 1: Times for converting doubles to strings using various methods and VMs.
| Test | 1.2 VM | 1.2 no-JIT VM | 1.3 VM | HotSpot 2.0 VM (2nd run) |
|---|---|---|---|---|
| test 1: proprietary printing | 100.0% | 420.1% | 114.2% | 82.0% |
| test 2: proprietary with formatting | 115.1% | 414.4% | 85.4% | 93.8% |
| test 3: StringBuffer.append(double) | 282.2% | 926.1% | 265.1% | 199.8% |
| test 4: java.text.DecimalFormat.format() | 456.1% | 1690.2% | 409.6% | 303.7% |
The test results show several interesting things. Firstly, the two
tests using my algorithms produced relatively close timings in each
VM, but which test was the faster depended on the VM being used. Even
the two HotSpot VMs (the standard client-tuned 1.3 VM and the
server-tuned HotSpot 2.0) produced a different order for the test
timings. To me, this indicates that there are further possible
optimizations in both sets of code (test 1 and test 2), and that the two
HotSpot VMs are managing to apply two different (overlapping) sets of
optimizations. Looking at the code, I would not be at all surprised to
be able to tease out a 10% improvement by some refactoring of nested
tests. The time taken to format numbers also depends on the number of
digits being printed. I have used a format with four decimal places,
but a separate test formatting to two decimal places showed test 2
running faster than test 1 on all VMs.
Secondly, all the tests clearly show my algorithms outperforming
the SDK conversion methods by a factor of roughly two to four. I had
expected the proprietary formatting algorithm to be consistently
faster than the proprietary non-formatting algorithm, and the tests
did not show that. Nevertheless, they do show that
both proprietary algorithms are always significantly faster than
the algorithms provided by the SDK.
Related Files: DoubleToString.java, DoubleToString.class
Finally, it is worth noting that to convert floats to
strings, you should not simply use the double
methods. Although that is technically possible, the smaller
float data structure is sufficiently different from
double that the methods should be re-implemented for
floats. The re-implementation can take advantage of the smaller
range by using ints to hold the scaled values.
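One visible symptom of reusing the double methods for floats, which may or may not be the one the author had in mind, is that widening a float to a double exposes digits that were never part of the float's shortest printable form. The sketch below uses only standard SDK calls. The class and method names are hypothetical. It also shows that a float's mantissa bits fit in an int, whereas a double's need a long.

```java
// Hypothetical demonstration of why float deserves its own conversion code.
class FloatVersusDouble {
    // Converts a float by first widening it to a double.
    static String viaDouble(float f) {
        return Double.toString((double) f);
    }

    // Converts a float with the float-specific SDK method.
    static String viaFloat(float f) {
        return Float.toString(f);
    }

    // The 23 stored mantissa bits of a float fit comfortably in an int.
    static int floatMantissa(float f) {
        return Float.floatToIntBits(f) & 0x007FFFFF;
    }

    // The 52 stored mantissa bits of a double need a long.
    static long doubleMantissa(double d) {
        return Double.doubleToLongBits(d) & 0x000FFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println(viaFloat(0.1f));  // prints 0.1
        System.out.println(viaDouble(0.1f)); // prints 0.10000000149011612
        System.out.println(Integer.toBinaryString(floatMantissa(0.1f)).length() <= 23);
        System.out.println(Long.toBinaryString(doubleMantissa(0.1)).length() <= 52);
    }
}
```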
Jack Shirazi is the author of Java Performance Tuning. He was an early adopter of Java, and for the last few years has consulted mainly for the financial sector, focusing on Java performance.
Return to ONJava.com.