The Hashbelt Data Structure

The Hashbelt Data Structure

The first change we're going to make is simple: instead of session objects that contain an expiration time, we're going to allow any type of object to be stored. The reason is simple: a caching system for time-sensitive information shouldn't require the objects it stores to understand timestamps and know when they are out of date. Tomcat gets away with using session objects (which store their own timestamps) because it's a single application. We need to be more general, because we're building a framework that can be reused to expire all sorts of data. Later on, when we discuss the code, we're going to define an interface, named ExpirationSystem, that encapsulates exactly what our data expiration algorithms are allowed to assume about the data being stored; for now, it suffices to say that programmers must be able to store any type of object.



Since the objects stored in an instance of ExpirationSystem no longer contain any information about when they expire, we need to store the expiration times for the objects somewhere within our framework. The obvious way to do this is to have an instance of Hashtable, which maps objects to some sort of timestamp (for example, to instances of Date). Instead of doing this, however, we're going to break apart the global cache and store objects in separate containers, based on how long they have to live. More precisely, we're going to do the following:

  • Define a container interface called HashbeltContainer. The idea behind the interface is that we'll have different implementations of our basic container, optimized for different uses.

  • Use many instances of HashbeltContainer, each one of which will only contain objects of approximately the same age. Each object stored in the global cache will be stored in exactly one container.

Once nice consequence of doing things this way is that we'll be able to tell how old an object is, or how close it is to being expired, simply by looking at what container it's in.

Pictorially, we've just performed the following transformation:

Diagram.
Figure 1. The cache is a set of containers based on how much time objects have before they expire. This structure is not reflected in the public interface

Of course, using multiple containers instead of a single instance of Hashtable means that the hashbelt's lookup algorithm is going to be slightly different from Tomcat's. In particular, it's going to iterate over all of the instances of HashbeltContainer. Attempting to fetch an object from the expiration system will still look the same to client code; it will involve a call to a get method. But inside the expiration system, instead of being a single lookup, the implementation of get will iterate over all of the containers, starting with the most recent and stopping when the object has been found. This algorithm is illustrated in figure 2.

Diagram - click for full-size view.
Figure 2. Implementing a query (click for full size view).

Now that we've changed how objects are stored, we also need to change the way objects are expired. Hashbelt still uses a background thread for object expiration. However, the actual expiration algorithm is quite different from Tomcat's. In a hashbelt system, the background thread performs the following tasks:

  1. The thread wakes up.
  2. The thread creates a new container.
  3. The thread locks the main data structure and then "rotates" the set of containers. The newly-created container goes into the first position, and the last container is removed from the data structure entirely. This "rotation" is the "conveyor belt" I mentioned earlier.
  4. The thread unlocks the main data structure and then iterates over the removed container, performing some sort of expiration operation for each object it finds.
  5. The thread goes to sleep for a brief period of time.

This sequence of operations is illustrated in figure 3.

Diagram.
Figure 3. How containers are expired.

At this point, we've covered how a hashbelt stores objects (in multiple containers, sorted by expiration time) and we've covered the basic "rotate and expire" algorithm. However, in the hashbelt algorithm as defined so far, there are some very compelling questions that we don't know the answer for. Namely:

  • What do we do with an object once it's been retrieved?
  • What do we do with an object when it's been removed from the data structure?
  • How do we expire an arbitrary object anyway?

Unfortunately, there are no universal answers to these questions. For example, consider session keys in a servlet engine. Session keys answer these questions in the following way:

  • What do we do with a session key once it's been retrieved? We remove it from the container we found it in, and stick it in the container with the most recent timestamp (this effectively postpones the expiration of the session).
  • What do we do with an object when it's been removed from the data structure? We actively expire it.
  • How do we expire a session? We remove any resources associated to the session that may have been allocated. In addition, we probably record the end of the session in some persistent storage mechanism.

On the other hand, consider weather reports from Half Moon Bay. The answers are completely different:

  • What do we do with a weather report once it's been retrieved? We leave it in the container in which we found it. Under no circumstances should we postpone the expiration. The whole point is that once weather information gets old, it ought to replaced regardless of whether someone has accessed it recently.

  • What do we do with an object when it's been removed from the data structure? We don't do anything. Garbage collection handles getting rid of the instance.

  • How do we expire a weather report? We don't. If a user requests a weather report that isn't in the ExpirationSystem, we'll fetch it at the time it's requested. That is, the appropriate re-fetch is when the information is requested, not when the old information has expired.

Because these sets of answers are so different, our implementation of the Hashbelt data structure will have to be flexible and easily customized. We'll achieve the flexibility by using strategy objects to handle application-specific behavior.

Prev  [1] [2] [3] [4] [5] Next

Close    To Top
  • Prev Article-Java:
  • Next Article-Java:
  • Now: Tutorial for Web and Software Design > Java > JDOnJDBCnSQLJ > Java Content
    Photoshop Tutorial
     

    Special Effect

      3D Effect
      Photoshop Articles
    Programming Tutorial
     

    C/C++ Tutorial

      Visual Basic
      C# Tutorial
    Database Tutorial
     

    MySQL Tutorial

      MS SQL Tutorial
      Oracle Tutorial
    Geek Tutorial
     

    Blogging Tutorial

      RSS Tutorial
      Podcasting Tutorial
    Graphic Design Tutorial
      Coreldraw Tutorial
      Illustrator Tutorial
      3D Tutorials
    Webmaster Articles
     

    Domain Service

      Web Hosting
      Site Promotion
    Java Tutorial/ Articles
     

    Java Servlets

      JavaEE Tutorial
     

    JavaBeans Tutorial

    XML Tutorial/ Articles
     

    XML Style

      AJAX Tutorial
      XML Mobile
    Flash Tutorial/ Articles
     

    Flash Video

      Action Script
      Flash Articles
    OS Tutorial/ Articles
      Linux Tutorial
      Symbian Tutorial
      MacOS Tutorial
    Personal Tech
      Hardware Tutorial
      Software Tutorial
      Online Auction