Managing Rich Data Structures

by Dave Baker
February 16, 2006

If you're like me, you've written plenty of scripts that use simple text files to store snippets of data. Those scripts might have evolved over time into using several snippets of data for each item, which translates into lots and lots of little text files in a data directory somewhere.

I had read that Linux doesn't like more than a hundred or so text files per directory. I also thought about the disk space wasted because each snippet is tiny compared to the size of a sector, and about the hassle of backing up all those little files. So I decided to move from snippets to a single database. Here's how I did it.

I didn't go all the way to a relational database, in part because I'm not a very proficient data-slinger. Plus I wanted to try the apparently simpler technique described in Chapter 14 of Perl Cookbook, 2nd Edition, namely the use of one of the DBM libraries. No SQL required.

Also, I didn't want to use a one-line-per-item type of plain text file, although I've had quite a bit of luck with them in other projects. That's where each line holds a list of values separated by a pipe character or some other special symbol (rather than commas). Each item might have a unique identification number in the first field, for example. Then you can read through the lines in the text file until you find the line whose first field holds the ID number you want.
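As a minimal sketch of that kind of lookup (not code from the original project), the following assumes pipe-delimited lines with the ID in the first field. The ID numbers, the field layout, and the find_item name are hypothetical, and an in-memory list stands in for lines read from a file.

```perl
use strict;
use warnings;

# Hypothetical sample data: one item per line, fields separated by pipes,
# with the ID number in the first field.
my @lines = (
    "1001|Acme|http://acme.com/index.html\n",
    "1002|Roadrunners R Us|http://roadrunners-r-us.com/index.html\n",
);

# Return the fields of the first line whose ID matches, or an empty list.
sub find_item {
    my ( $wanted_id, @records ) = @_;
    for my $line (@records) {
        chomp( my $copy = $line );
        my @fields = split /\|/, $copy, -1;    # -1 keeps trailing empty fields
        return @fields if @fields && $fields[0] eq $wanted_id;
    }
    return;
}

my @item = find_item( '1002', @lines );
print "$item[1]\n";    # prints "Roadrunners R Us"
```

The lookup depends on every item occupying exactly one line, which is the assumption that breaks down once a field can contain a line break.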

The reason the one-line-per-item delimited text file didn't seem ideal for my project is that some of the items of data consist of text that includes line breaks. If someone inserts literal line breaks into a field, you lose the ability to easily search for particular fields by position. The example data here doesn't include such data; I've omitted the multi-line data for the sake of compactness in demonstrating the MLDBM solution. Happily, there is nothing that prevents you from storing multiline data with MLDBM.

My three kinds of data were:

  • A text file that stored the target URL of an advertiser. When a reader clicks on the banner, a mod_perl script takes the reader there.
  • A text file that stored the URL of the .gif or .jpg file to use as the banner.
  • A text file that stored a one-line headline to display above the banner.

Each file's name indicated the date of its associated banner (the banners appear in a daily newsletter published on the Web and via email) and the type of data stored in the file. For example, url_2005_12_09.txt, gif_2005_12_09.txt, and headline_2005_12_09.txt are the three data files for the December 9, 2005 newsletter's banner.
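To make the naming convention concrete, here is a small sketch that splits such a filename back into its two parts. The parse_snippet_name helper is hypothetical, and the pattern assumes the names always look exactly like the three examples above.

```perl
use strict;
use warnings;

# Split a snippet filename such as "gif_2005_12_09.txt" into its kind
# ("url", "gif", or "headline") and its date key ("2005_12_09").
# Returns an empty list when the name does not follow the convention.
sub parse_snippet_name {
    my ($filename) = @_;
    if ( $filename =~ /\A(url|gif|headline)_(\d{4}_\d{2}_\d{2})\.txt\z/ ) {
        return ( $1, $2 );
    }
    return;
}

my ( $kind, $date ) = parse_snippet_name('headline_2005_12_09.txt');
print "$kind for $date\n";    # prints "headline for 2005_12_09"
```

The date part extracted this way has the same form as the 2005_12_09 style of key used for the hash described in the next section.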

Here's how I turned those three data files (multiplied by the number of banner slots sold to date and the number of banner slots sold for upcoming newsletters) into a single file.

A Hash of Hashes

First, I thought about what kind of data structure I would create. I looked at the relationships between the various text files. It became clear that I basically had a lot of hashes: each banner's data consists of a set of keys and values. I had been creating a separate text file that essentially contained the key in its name and the value in its data. Each banner's data would fit nicely into a hash having three keys and three associated values.

I thought about storing this bunch of hashes in an array, but then realized that an array of hashes would not let me access a particular day's data easily--the hashes would be in the order in which I saved them into the array, but that wouldn't translate easily to particular newsletter dates. Would $array[8] be the hash for December 8, 2005's banner? What happens in January of next year? Should I put the New Year's Day banner into $array[32]?

Hmmm. What kind of data structure associates a unique key, such as the date of a particular banner, with its value, such as the three different kinds of data and their values per banner? A hash, of course! I would create a hash of hashes.

The name I chose for the "parent" hash is %data_for_ad_on. The keys will be the dates of the ads, so the use of an ending preposition in the name of the hash leads to a more natural-reading and meaningful variable name. The key for the data for the December 8, 2005 banner will be 2005_12_08, for example, and the way to access the value associated with that key is $data_for_ad_on{'2005_12_08'}.

How could I store each day's hash--the three named kinds of data and their values--into the mother hash as the value for a particular banner's key (date)?

It's not possible to store a hash directly as the value of a key in another hash, because a hash value must be a scalar and a hash isn't one. Instead, I turned each day's hash of data keys and values into a reference to that hash; a reference is a scalar, so it can be stored as a value. Making such references seemed a bit intimidating at first, but it turned out to be fairly easy once I felt comfortable with some new syntax rules.
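The two usual ways to get a hash reference in Perl can be sketched as follows. The %banner variable is a name introduced only for this illustration, and the second entry is deliberately trimmed to a single key.

```perl
use strict;
use warnings;

# One day's banner data as an ordinary named hash (illustrative variable).
my %banner = (
    url      => 'http://acme.com/index.html',
    gif      => 'http://myserver.com/banners/acme_banner.gif',
    headline => 'Looking for quality, inexpensive widgets? Acme\'s got \'em!',
);

my %data_for_ad_on;

# A backslash takes a reference to the named hash. The reference is a
# scalar, so it can be stored as a hash value.
$data_for_ad_on{'2005_12_09'} = \%banner;

# Curly braces build an anonymous hash and return a reference to it.
$data_for_ad_on{'2005_12_08'} = {
    url => 'http://roadrunners-r-us.com/index.html',
};

# The arrow operator follows the reference to reach an inner value.
print $data_for_ad_on{'2005_12_09'}->{url}, "\n";   # prints "http://acme.com/index.html"
```

The anonymous form avoids inventing a named variable for every day's data.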

In Perl, this is how to create the hash of hashes (here I show only two newsletters' worth of data):

my %data_for_ad_on = (

    # Each key is a newsletter date; each value is a reference to an
    # anonymous hash holding that day's banner data.
    '2005_12_09' => {
        'url'      => 'http://acme.com/index.html',
        'gif'      => 'http://myserver.com/banners/acme_banner.gif',
        'headline' => 'Looking for quality, inexpensive widgets? Acme\'s got \'em!',
    },

    '2005_12_08' => {
        'url'      => 'http://roadrunners-r-us.com/index.html',
        'gif'      => 'http://myserver.com/banners/roadrunners_banner.gif',
        'headline' => 'Looking for inexpensive deliveries? Roadrunners R Us has \'em!',
    },

);
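Reading values back out of such a structure might look like the sketch below. It uses a trimmed copy of the data (the headline entries are left out for brevity), and banner_field is a hypothetical helper, not something from the original script.

```perl
use strict;
use warnings;

# Trimmed copy of the hash of hashes, without the headline entries.
my %data_for_ad_on = (
    '2005_12_09' => {
        url => 'http://acme.com/index.html',
        gif => 'http://myserver.com/banners/acme_banner.gif',
    },
    '2005_12_08' => {
        url => 'http://roadrunners-r-us.com/index.html',
        gif => 'http://myserver.com/banners/roadrunners_banner.gif',
    },
);

# Look up one field for one date. Returns nothing (undef in scalar
# context) when there is no entry for the date.
sub banner_field {
    my ( $data, $date, $field ) = @_;
    return unless exists $data->{$date};
    return $data->{$date}{$field};
}

print banner_field( \%data_for_ad_on, '2005_12_08', 'url' ), "\n";

# Zero-padded YYYY_MM_DD keys sort chronologically as plain strings.
for my $date ( sort keys %data_for_ad_on ) {
    print "$date: $data_for_ad_on{$date}{gif}\n";
}
```

Checking with exists before descending keeps the lookup from silently creating an empty entry for a date that has no banner.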
