Optimizing Hash Functions For a Perfect Map

Optimizing Hash Functions For a Perfect Map

by Jack Shirazi
01/25/2001

Maps that are hash tables are normally implemented using a hash function which maps a data item (the key) into an indexing table. The hash function takes the key and uses some algorithm to convert it to an index value into an array. For example, the hash function from the Java SDK HashMap class:

int index = (key.hashCode() & 0x7FFFFFFF) % table.length;

This hash function extracts some bits from the hashcode of the key object (to quickly convert the hashcode to some positive number), and then uses the remainder, after division by the table length, to get an index that fits into the HashMap internal array. A hash function cannot generally guarantee every data item maps to a different index, so the structure of a hash table is complicated by needing to maintain a list at every index, allowing multiple objects to be held at any index. Having multiple objects at the same index is inefficient since the collection at that index must be searched every time an element from that collection is required.

A perfect hash function is one that maps every key to a different index. Map implementations with perfect hash functions are much simpler than using generic hash functions; they consist of two arrays, one for the keys and one to hold all corresponding values. The array elements do not need to hold collections since only one object is ever located at each index, making processing easier and faster. (Note that the hash function does not have to map a key into every index, i.e. some indexes can be empty. A hash function which maps each key to a unique index and leaves no empty entries in the table is called a minimal perfect hash function). But how can you get a perfect hash function?

An interesting problem

Recently I came across an interesting performance problem. The application used a number of objects, which were at core Maps with known key objects (not Strings). There were several kinds of Maps, and each kind was allowed to contain only one specific set of key objects. For example, suppose the key objects were defined simply as

public class Key {

  int id;

  public Key(int id) {this.id = id;}

}

Then the first kind of Map might be allowed to have keys with ids of 5, 10, 11, and 27, while another kind of Map must contain only keys with ids 10, 11 and 15, etc. There could be multiple instances of each kind of map, but every instance could only contain its defined keys.

In this application these Map objects were accessed and updated often. The key lookup needed to be as fast as possible. Note the information we have about the Map data: the key set was restricted and known before construction for every Map, and the keys had associated numerical values which were unique to each key. Because the key data was restricted and known, my first thought was that the these Maps were ideal for optimization using perfect hash functions.

Optimizing the Maps

Creating optimized classes for these Maps is straightforward except for the hash function. To keep the code clear and optimized for Key object keys, I won't implement the Map interface precisely -- converting the following class to a standard implementation of Map is easy enough.

public class PerfectKeyHashMap

{

  Key[] keys;

  Object[] values;



  public void put(Key key, Object value)

  {

    //The following line is the currently unknown hash function

    int index = someHashFunctionOn(key);

    keys[index] = key;

    values[index] = values;

  }



  public Object get(Key key)

  {

    //The following line is the currently unknown hash function

    int index = someHashFunctionOn(key);

    //keys assumed to be entered on map creation with null values.

    //no validation done on key, but if needed, that could be

    //if(keys[index].id != key.id) throw new Exception

	 //("invalid key"); 

         return values[index];

  }



  ...

}

Now how do we get our perfect hash function? We want it to be quick so we should use as few simple operations as possible. The HashMap function manipulates bits and uses the remainder operator. Remainder looks like it could be quite a useful operation for our hash function. It will help map arbitrary numbers into a small table, and it can be used with a variable argument. So let's try it.

Generating a perfect hash function

We have a known maximum number of keys in the hashtable. We also know that we want each of those keys to map to a different index. So clearly our remainder operator argument must be a number greater than or equal to the number of keys we have. Let's try out some tests to get a feel for the data. Using some examples, including the ones we had earlier, what would work using the remainder operator? We'll start with the list of key values and apply the remainder operator to each value. Then we use the argument that starts with the size of the key set, which increments by one each time until we get a result set of unique keys (the repeated values in each column are marked with a *):

Table 1: Finding the hash function for the set {5,10,11,27}

keys 1

%4

%5

%6

%7

5

1

0*

5*

5

10

2

0*

4

3

11

3*

1

5*

4

27

3*

2

3

6



Table 2: Finding the hash function for the set {10,11,15}

keys 2

%3

10

1

11

2

15

0



Table 3: Finding the hash function for the set {1,55,300,1095,1111}

keys 3

%5

%6

%7

%8

%9

%10

%11

%12

%13

%14

300

0

0

6*

4

3

0

3

0

1

6

1095

0

3

3

7*

6

5*

6

3

3*

3

55

0

1*

6*

5

1*

5*

0*

7*

3*

13

1

1*

1*

1

1

1*

1

1

1

1

1

1111

1*

1*

5

7*

4

1

0*

7*

6

5

For each of these three examples, we eventually gained a set of indexes which were unique and would map to a fairly small table. This method of obtaining indexes by successively increasing the remainder argument will definitely lead to a set of unique value for any set of non-repeated keys. This is because the value of one plus the largest key in the set for the remainder operator argument is guaranteed to produce a unique set of integers. For the last example this argument would be 1112 as shown in this table:

Table 4: Showing that a hash function producing a unique set is always obtainable

keys 3

%1112

300

300

1095

1095

55

55

1

1

1111

1111

So our perfect hash map needs an extra instance variable to hold the remainder appropriate for its set of keys. The hash function is just

  int index = key.id % remainder;

[1] [2] Next

Close    To Top
  • Prev Article-Java:
  • Next Article-Java:
  • Now: Tutorial for Web and Software Design > Java > Java Basic > Java Content
    Photoshop Tutorial
     

    Special Effect

      3D Effect
      Photoshop Articles
    Programming Tutorial
     

    C/C++ Tutorial

      Visual Basic
      C# Tutorial
    Database Tutorial
     

    MySQL Tutorial

      MS SQL Tutorial
      Oracle Tutorial
    Geek Tutorial
     

    Blogging Tutorial

      RSS Tutorial
      Podcasting Tutorial
    Graphic Design Tutorial
      Coreldraw Tutorial
      Illustrator Tutorial
      3D Tutorials
    Webmaster Articles
     

    Domain Service

      Web Hosting
      Site Promotion
    Java Tutorial/ Articles
     

    Java Servlets

      JavaEE Tutorial
     

    JavaBeans Tutorial

    XML Tutorial/ Articles
     

    XML Style

      AJAX Tutorial
      XML Mobile
    Flash Tutorial/ Articles
     

    Flash Video

      Action Script
      Flash Articles
    OS Tutorial/ Articles
      Linux Tutorial
      Symbian Tutorial
      MacOS Tutorial
    Personal Tech
      Hardware Tutorial
      Software Tutorial
      Online Auction