Optimizing Hash Functions For a Perfect Map

Optimizing Hash Functions For a Perfect Map

Comparing speeds

What is the speed benefit of this perfect hashtable compared to HashMap? I'll test the two by comparing PerfectHashMaps to HashMaps which have the benefit of the same amount of memory, as well as measurements against a HashMap sized to just contain the set. I've optimized the Key.hashCode() and key.equals() methods so that the HashMaps are not at a disadvantage from these methods. And I've included a test against a KeyHashMap implementation, a specialized implementation of HashMap, which is optimized for Key object keys (see "Optimizing a query on a Map" in the resource section below).



It is worth noting that the HashMap.put() method performs at different speeds depending on whether or not the put() is overwriting an existing entry. With no existing entry for a key, the HashMap needs to create a node element before inserting the value into the collection, whereas the node already exists when the entry is being overwritten. So I test both these scenarios too. All measured times have been scaled by the same value for simpler comparison.

Table 5: Comparing the performance of various Maps using the put() method on an empty table

 

put(), no previous element at key

1.2 JVM

1.3 JVM

HotSpot 2.0

1

HashMap, size of key set

100%

49.5%

57.6%

2

HashMap, same size as perfect hash map

66.5%

35.8%

27.8%

3

KeyHashMap, size of key set

73.3%

46.1%

44.0%

4

KeyHashMap, same size as perfect hash map

42.4%

34.2%

26.6%

5

PerfectKeyHashMap

15.1%

15.7%

12.2%

6

MinimalPerfectKeyHashMap (size of key set)

32.2%

35.0%

27.7%



Table 6: Comparing the performance of various Maps using the put() method on a full table

 

put() overwriting previous element at key

1.2 JVM

1.3 JVM

HotSpot 2.0

1

HashMap, size of key set

17.4%

21.9%

27.7%

2

HashMap, same size as perfect hash map

18.3%

24.1%

18.6%

3

KeyHashMap, size of key set

13.4%

17.0%

15.4%

4

KeyHashMap, same size as perfect hash map

14.6%

16.1%

15.5%

5

PerfectKeyHashMap

15.15

14.0%

11.4%

6

MinimalPerfectKeyHashMap (size of key set)

32.4%

34.3%

27.2%



Table 7: Comparing the performance of various Maps using the get() method

 

get()

1.2 JVM

1.3 JVM

HotSpot 2.0

1

HashMap, size of key set

16.6%

25.1%

20.6%

2

HashMap, same size as perfect hash map

17.1%

20.4%

16.8%

3

KeyHashMap, size of key set

13.4%

11.7%

12.6%

4

KeyHashMap, same size as perfect hash map

13.2%

15.1%

11.7%

5

PerfectKeyHashMap

7.3%

8.7%

6.6%

6

MinimalPerfectKeyHashMap (size of key set)

16.0%

18.2%

14.9%

The comparisons show that PerfectKeyHashMap can be significantly faster than any of the other implementations except in the case where a put() is overwriting an existing entry. The Maps based on the HashMap function have a severely decreased worst case performance, which occurs when many of the keys map to the same index (not tested here), whereas the PerfectKeyHashMap has constant performance independent of the key set.

Reducing excessive memory requirements

Our hash function may require a large table in the worst case. A useful concept is the memory factor requirement of the perfect maps, which I define as the number of times larger the memory needs to be compared to the number of elements being stored in the maps. For the examples previously listed, the set {keys 1} using the hash function %7 produced a memory factor requirement of 1.75 (four elements requiring seven indexes); {keys 2} using %3 produced a memory factor requirement of 1; {keys 3} using %14 produced a memory factor requirement of 3.

I ran a test of thousands of randomly generated keysets of between 5 and 500 elements where each key value was retricted to the range 1 to 10,000. This test produced an average memory factor requirement of 12 and a maximum factor of 20. Whether these memory requirements are reasonable for your application depends on how many large perfect maps you need to create and on the importance of speed compared to memory. Can we gain the benefits of the perfect map without the large memory overhead?

For some datasets it may be possible to produce a perfect hash function that is fast and has a small memory factor requirement (for example the final map implementation in the article "Optimizing a query on a Map"). In general, however, reducing the memory overhead of hash functions usually requires the use of a table, thus reducing the speed of the hash function.

For the hash functions considered in this article, we have the required tables available already; the perfect maps provide us with tables that can produce minimal perfect maps. Since any minimal perfect map requires a perfect map as well, you may think we have increased the memory requirements rather than reduced them. But multiple minimal perfect map instances for the same set of keys can all point to the same perfect map for their memory map table; therefore, depending on the application, this might be a feasible solution. The optimized implementation of the MinimalPerfectKeyHashMap looks like

class MinimalPerfectKeyHashMap

{



  Key[] keys;

  int[] map;

  Object[] values;

  int hfRemainder;



  MinimalPerfectKeyHashMap(int size, Key[] keys, int[] map,

                                     int remainder)

  {

    this.keys = keys;

    values = new Object[size];

    this.map = map;

    this.hfRemainder = remainder;

  }



  public void put(Key key, Object value)

  {

    int index = key.id % hfRemainder;

    keys[index] = key;

    values[map[index]] = value;

  }



  public Object get(Key key)

  {

//    int index = key.id % hfRemainder;

    return values[map[key.id % hfRemainder]];

  }

}

Examples of minimal perfect map usage can be found in the associated source code with this article. However, the speed tests applied to the minimal perfect map (the rows numbered 6, in tables 5 through 7) show that there may be no advantage to using this implementation of MinimalPerfectKeyHashMaps. The table lookup produces enough of an overhead that an optimized KeyHashMap using the HashMap function can produce superior performance.

Further resources

  • The source code is available here.
  • The Java Performance Tuning website
  • Optimizing a query on a Map by Jack Shirazi, JavaWorld November 2000
  • Minimal Perfect Hashing by Carlo Pescio, Dr. Dobb's Journal, July 1996, p101

Jack Shirazi is the author of Java Performance Tuning. He was an early adopter of Java, and for the last few years has consulted mainly for the financial sector, focusing on Java performance.


Return to ONJava.com.

Prev  [1] [2] 

Close    To Top
  • Prev Article-Java:
  • Next Article-Java:
  • Now: Tutorial for Web and Software Design > Java > Java Basic > Java Content
    Photoshop Tutorial
     

    Special Effect

      3D Effect
      Photoshop Articles
    Programming Tutorial
     

    C/C++ Tutorial

      Visual Basic
      C# Tutorial
    Database Tutorial
     

    MySQL Tutorial

      MS SQL Tutorial
      Oracle Tutorial
    Geek Tutorial
     

    Blogging Tutorial

      RSS Tutorial
      Podcasting Tutorial
    Graphic Design Tutorial
      Coreldraw Tutorial
      Illustrator Tutorial
      3D Tutorials
    Webmaster Articles
     

    Domain Service

      Web Hosting
      Site Promotion
    Java Tutorial/ Articles
     

    Java Servlets

      JavaEE Tutorial
     

    JavaBeans Tutorial

    XML Tutorial/ Articles
     

    XML Style

      AJAX Tutorial
      XML Mobile
    Flash Tutorial/ Articles
     

    Flash Video

      Action Script
      Flash Articles
    OS Tutorial/ Articles
      Linux Tutorial
      Symbian Tutorial
      MacOS Tutorial
    Personal Tech
      Hardware Tutorial
      Software Tutorial
      Online Auction