Managing Rich Data Structures
by Dave Baker
A Persistent Hash of Hashes
Now I had a hash of hashes, but could I store it in a database? And how could I use that database later to pull a particular banner's data from the stored hash? Simple text files don't suit a nested hash the way they suit scalars (one value per file) or lists (one value per line, or several values per line separated by a pipe character or other unusual delimiter). By using another kind of data storage instead, I hoped to avoid having to "open, slurp, close" three times for each banner, and to avoid having my script perform directory listings to see whether a particular day's newsletter had a banner.
My solution largely came from Recipe 14.6 of Perl Cookbook, 2nd Edition. That recipe was not as specific in its example as I needed, though, so I wrote this article to share the details I had to fill in myself. (I later learned that Recipe 11.14 provides most of the missing details.)
Here's how to store the hash of hashes using a DBM file:
#!/usr/local/bin/perl
use strict;
use warnings;
use MLDBM qw( DB_File Storable );
use Fcntl;
my $db = '/www/cgi-bin/databases/ad_data.db';
my %data_for_ad_on;
# Bind the hash to the DBM file, creating the file if necessary
tie %data_for_ad_on, 'MLDBM', $db, O_CREAT|O_RDWR, 0644
    or die "Trouble opening $db, stopped: $!";
# [here, paste the multi-line %data_for_ad_on statement listed earlier]
And that's it! When the script completes, it will have assigned the value of %data_for_ad_on and then saved it to a new DBM file named /www/cgi-bin/databases/ad_data.db (or any file you like, if your script has permission to write to the specified directory).
The secret is in the tie function. It associates a particular hash (%data_for_ad_on) with a class and a file. The class that works for complex data--data that contains references--is the MLDBM module available from the CPAN.
The Fcntl module supplies the O_CREAT and O_RDWR constants. Combined with the | operator and passed to tie, they tell the script to create the database file if it doesn't yet exist, or to open it for reading and writing if it exists already.
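To see what those two flags do on their own, here is a minimal sketch that passes the same combination to Perl's built-in sysopen instead of to tie. The temporary directory and the file name example.db are assumptions made for illustration; nothing here touches a real database.

```perl
use strict;
use warnings;
use Fcntl;                        # exports O_CREAT, O_RDWR, and friends
use File::Temp qw( tempdir );

# Hypothetical scratch location, removed when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/example.db";

# The constants are plain integers; | merges them into one bitmask.
my $flags = O_CREAT | O_RDWR;

# First call: the file doesn't exist yet, so O_CREAT creates it.
sysopen my $out, $path, $flags, 0644
    or die "Can't open $path: $!";
print {$out} "hello\n";
close $out;

# Second call: the file exists, so it is simply opened read/write.
sysopen my $in, $path, $flags, 0644
    or die "Can't reopen $path: $!";
my $line = <$in>;
close $in;

print "Read back: $line";
```

The same flags work for both the first run and every later run, which is why the tie call never needs to check whether the database file is already there.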
The DB_File and Storable parameters passed to the MLDBM module indicate how to store the data on disk: DB_File is the underlying DBM library that manages the file, and Storable handles the behind-the-scenes conversion of the references into strings.
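The conversion of references into strings can be tried in isolation. The following is only a rough sketch of the kind of work the serializer does, using Storable's freeze and thaw functions directly on an ordinary hash; the banner data is made up for illustration.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# Hypothetical record, shaped like one inner hash of %data_for_ad_on.
my %banner = (
    url      => 'http://www.example.com/',
    gif_URL  => 'http://www.example.com/banner.gif',
    headline => 'Example headline',
);

# A DBM file can hold only flat strings, so the reference is
# serialized to a string before it is written ...
my $frozen = freeze( \%banner );

# ... and rebuilt as a brand-new data structure when it is read.
my $copy = thaw( $frozen );

print "Headline: $copy->{headline}\n";
print "Same reference? ", ( $copy == \%banner ? "yes" : "no" ), "\n";
```

The thawed structure has the same contents as the original but is a separate copy, a detail that matters whenever you want to change part of a stored record.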
If you have hundreds of text files as I did, you undoubtedly don't look forward to hand-coding that information into a hash as shown above. Fortunately, it's not hard to write a short script to convert your data into the needed hash (and then MLDBM) format. Here's how I did it:
#!/usr/local/bin/perl -T
use warnings;
use strict;
use Fcntl qw( :flock O_CREAT O_RDWR );
use MLDBM qw( DB_File Storable );
my %data_for_ad_on;
my $dbm_filename = '/www/cgi-bin/databases/data_for_ad_on.db';
tie %data_for_ad_on, 'MLDBM', $dbm_filename, O_CREAT|O_RDWR, 0644
    or die "Can't open $dbm_filename: $!";
my $data_dir = '/www/cgi-bin/databases/ad';
opendir my $dh, $data_dir
    or die "Can't open $data_dir, stopped: $!";
# Keep only names containing a YYYY_MM_DD date and ending in .txt
my @files = grep { /\d\d\d\d_\d\d_\d\d\.txt$/ } readdir $dh;
closedir $dh;
# Because Perl's tie mechanism doesn't let us modify parts of an MLDBM value
# directly, we have to get, change and set pieces of the stored structure
# through a temporary variable ($entry).
foreach my $file (@files) {
    if ($file =~ /^url_(\d\d\d\d_\d\d_\d\d)\.txt$/ ) {
        my $date  = $1;  # Save now: a successful s/// below would clobber $1
        my $entry = $data_for_ad_on{$date};       # Get
        open my $fh, '<', "$data_dir/$file"
            or die "Couldn't open $data_dir/$file: $!";
        flock $fh, LOCK_SH
            or die "Can't flock $data_dir/$file: $!";
        my $url = do { local $/; <$fh> };
        $url =~ s/^\s+//;  # So long, leading whitespace
        $url =~ s/\s+$//;  # So long, trailing whitespace
        close $fh;
        $entry->{url} = $url;                     # Change
        $data_for_ad_on{$date} = $entry;          # Set
        print "Just set target URL for $date\n";
    }
    elsif ($file =~ /^gif_(\d\d\d\d_\d\d_\d\d)\.txt$/ ) {
        my $date  = $1;  # Save now: a successful s/// below would clobber $1
        my $entry = $data_for_ad_on{$date};       # Get
        open my $fh, '<', "$data_dir/$file"
            or die "Couldn't open $data_dir/$file: $!";
        flock $fh, LOCK_SH
            or die "Can't flock $data_dir/$file: $!";
        my $gif = do { local $/; <$fh> };
        $gif =~ s/^\s+//;  # So long, leading whitespace
        $gif =~ s/\s+$//;  # So long, trailing whitespace
        close $fh;
        $entry->{gif_URL} = $gif;                 # Change
        $data_for_ad_on{$date} = $entry;          # Set
        print "Just set location URL of banner for $date\n";
    }
    elsif ($file =~ /^headline_(\d\d\d\d_\d\d_\d\d)\.txt$/ ) {
        my $date  = $1;  # Save now: a successful s/// below would clobber $1
        my $entry = $data_for_ad_on{$date};       # Get
        open my $fh, '<', "$data_dir/$file"
            or die "Couldn't open $data_dir/$file: $!";
        flock $fh, LOCK_SH
            or die "Can't flock $data_dir/$file: $!";
        my $headline = do { local $/; <$fh> };
        $headline =~ s/^\s+//;  # So long, leading whitespace
        $headline =~ s/\s+$//;  # So long, trailing whitespace
        close $fh;
        $entry->{headline} = $headline;           # Change
        $data_for_ad_on{$date} = $entry;          # Set
        print "Just set headline for $date\n";
    }
}
After the script runs, it will have converted all of the data into entries in the main hash that is tied to our database. Each assignment to the tied hash hands that entry to the database, and when the script exits, Perl unties the hash and closes the file automatically.
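The comment in the script about getting, changing, and setting through a temporary variable deserves a closer look. The sketch below imitates the situation with a small, hypothetical tied class named FrozenHash, which keeps Storable-frozen strings in memory rather than in a DBM file. It is a stand-in written for this illustration, not MLDBM's actual code, but it shows the same pitfall.

```perl
use strict;
use warnings;

# Hypothetical stand-in for MLDBM: values are frozen on the way in
# and thawed into a fresh copy on the way out.
package FrozenHash;
use Storable qw( freeze thaw );

sub TIEHASH { my ($class) = @_; return bless {}, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{$key} = freeze($value) }
sub FETCH   {
    my ($self, $key) = @_;
    return exists $self->{$key} ? thaw( $self->{$key} ) : undef;
}

package main;

tie my %ads, 'FrozenHash';
my $date = '2005_12_09';
$ads{$date} = { headline => 'Old headline' };

# Direct nested assignment changes only the temporary copy that FETCH
# returned; STORE is never called, so the change is lost.
$ads{$date}{headline} = 'New headline';
print "After direct assignment: $ads{$date}{headline}\n";

# Get, change, set: the final assignment calls STORE.
my $entry = $ads{$date};                # Get
$entry->{headline} = 'New headline';    # Change
$ads{$date} = $entry;                   # Set
print "After get/change/set:    $ads{$date}{headline}\n";
```

The first print still shows the old headline, and only the second shows the new one. That is why each branch of the conversion script ends by assigning $entry back into the tied hash.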
First, the code opens the directory where I've stored all of my small text files. It then puts into the @files array all those filenames that contain a particular date-type sequence and end in ".txt". This picks up my url_2005_12_09.txt, gif_2005_12_09.txt, and headline_2005_12_09.txt files for December 9, 2005, for example, and the three similar files for every other date that has data files in the directory.
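To check which names survive that filter without touching a real directory, the same grep pattern can be run against a hand-written list. The listing below is made up for illustration and stands in for what readdir would return.

```perl
use strict;
use warnings;

# Hypothetical directory listing, standing in for readdir's output.
my @listing = qw(
    .
    ..
    url_2005_12_09.txt
    gif_2005_12_09.txt
    headline_2005_12_09.txt
    notes.txt
    url_2005_12_09.txt.bak
);

# Same test the script uses: a YYYY_MM_DD-style run of digits
# followed by ".txt" at the very end of the name.
my @files = grep { /\d\d\d\d_\d\d_\d\d\.txt$/ } @listing;

print "$_\n" for @files;
```

Only the three dated .txt names pass. The pattern is not anchored at the start of the name, so it does not care about the url_, gif_, or headline_ prefix; sorting the files by prefix is left to the loop that follows.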