The Evolution of Perl Email Handling

by Simon Cozens
June 10, 2004

I spend the vast majority of my time at a computer working with email, whether it's working through the ones I send and receive each day, or working on my interest in analyzing, indexing, organizing, and mining email content. Naturally, Perl helps out with this.

There are many modules on the CPAN for slicing and dicing email, and we're going to take a whistle-stop tour of the major ones. We'll also concentrate on the Perl Email Project, an effort started by Richard Clamp, Simon Wistow, me, and others to produce simple, efficient, and accurate mail handling modules.

Message Handling

We'll begin with those modules that represent an individual message, giving you access to the headers and body, and usually allowing you to modify these.

The granddaddy of these modules is Mail::Internet, originally created by Graham Barr and now maintained by Mark Overmeer. This module offers a constructor that takes either an array of lines or a filehandle, reads a message, and returns a Mail::Internet object representing the message. Throughout these examples, we'll use the variable $rfc2822 to represent a mail message as a string.

    my $obj = Mail::Internet->new( [ split /\n/, $rfc2822 ] );

Mail::Internet splits a message into a header object in the Mail::Header class, plus a body. You can get and set individual headers through this object:

    my $subject = $obj->head->get("Subject");

    $obj->head->replace("Subject", "New subject");

Reading and editing the body is done through the body method:

    my $old_body = $obj->body;

    $obj->body("Wasn't worth reading anyway.");

I've not said anything about MIME yet. Mail::Internet is reasonably handy for simple tasks, but it doesn't handle MIME at all. Thankfully, MIME::Entity is a MIME-aware subclass of Mail::Internet; it allows you to read individual parts of a MIME message:

    my $num_parts = $obj->parts;

    for (0 .. $num_parts - 1) {
        my $part = $obj->parts($_);
        ...
    }

If Mail::Internet and MIME::Entity don't cut it for you, you can try Mark Overmeer's own Mail::Message, part of the impressive Mail::Box suite. Mail::Message is extremely featureful and comprehensive, but that is not always meant as a compliment.

Mail::Message objects are usually constructed by Mail::Box as part of reading in an email folder, but can also be generated from an email using the read method:

    $obj = Mail::Message->read($rfc2822);

Like Mail::Internet, messages are split into headers and bodies; unlike Mail::Internet, the body of a Mail::Message object is also an object. We read headers like so:

    $obj->head->get("Subject");

Or, for Subject and other common headers:

    $obj->subject;

I couldn't find a way to set headers directly, and ended up doing this:

    $obj->head->delete($header);

    $obj->head->add($header, $_) for @data;

Reading the body as a string is only marginally more difficult:

    $obj->decoded->string;

Setting the body, however, is an absolute nightmare: we have to create a new Mail::Message::Body object and replace our current one with it.

    $obj->body(Mail::Message::Body->new(data => [split /\n/, $body]));

Mail::Message may be slow, but it's certainly hard to use. It's also rather complex; the operations we've looked at so far involved the use of 16 classes (Mail::Address, Mail::Box::Parser, Mail::Box::Parser::Perl, Mail::Message, Mail::Message::Body, Mail::Message::Body::File, Mail::Message::Body::Lines, Mail::Message::Body::Multipart, Mail::Message::Body::Nested, Mail::Message::Construct, Mail::Message::Field, Mail::Message::Field::Fast, Mail::Message::Head, Mail::Message::Head::Complete, Mail::Message::Part, and Mail::Reporter) and 4400 lines of code. It does have a lot of features, though.

Foolishly, I thought that email parsing shouldn't be so complex, and so I sat down to write the simplest possible functional mail handling library. The result is Email::Simple, and its interface looks like this:

    my $obj = Email::Simple->new($rfc2822);

    my $subject = $obj->header("Subject");

    $obj->header_set("Subject", "A new subject");

    my $old_body = $obj->body;

    $obj->body_set("A new body\n");

    print $obj->as_string;

It doesn't do a lot, but it does it simply and efficiently. If you need MIME handling, there's a subclass called Email::MIME, which adds the parts method.
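To give a feel for how little a "simplest possible" parser has to do, here is a minimal sketch in core Perl. It is not Email::Simple's actual implementation: the split_message helper and the sample message are hypothetical, and the sketch assumes a well-formed message whose headers end at the first blank line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any CPAN module: split an RFC 2822
# message into a list of [name, value] header pairs and a body string.
sub split_message {
    my ($text) = @_;

    # Headers end at the first blank line; everything after it is the body.
    my ($head, $body) = split /\r?\n\r?\n/, $text, 2;
    $body = '' unless defined $body;

    # Unfold continuation lines (a line break followed by whitespace).
    $head =~ s/\r?\n[ \t]+/ /g;

    my @headers;
    for my $line (split /\r?\n/, $head) {
        # Skip anything that does not look like "Name: value".
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        push @headers, [ $name, $value ];
    }
    return (\@headers, $body);
}

# A made-up message with a folded Subject header.
my $rfc2822 = join "\n",
    'From: alice@example.com',
    'Subject: A rather long',
    ' folded subject',
    '',
    "Hello\n";

my ($headers, $body) = split_message($rfc2822);

# Header names are case-insensitive, so compare in lower case.
my ($subject) = map { $_->[1] } grep { lc $_->[0] eq 'subject' } @$headers;

print "Subject: $subject\n";
print "Body: $body";
```

A real library also has to cope with details this sketch ignores, such as repeated headers and preserving the original line endings when the message is written back out.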

Realistically, the choice of which mail handling library to use ought to be up to you, the end user, but this isn't always true. Auxiliary modules, which mess about with email at a higher level, can ask for the mail to be presented in a particular representation. For instance, until recently, the wonderful Mail::ListDetector module, which we'll examine later, required mails passed into it to be Mail::Internet objects, since this gave it a known API to work with the objects. I don't want to work with Mail::Internet objects, but I want to use Mail::ListDetector's functionality. What can I do?

In order to enable the user to have the choice again, I wrote an abstraction layer across all of the above modules, called Email::Abstract. Given any of the above objects, we can say:

     my $subject = Email::Abstract->get_header($obj, "Subject");

     Email::Abstract->set_header($obj, "Subject", "My new subject");

     my $body = Email::Abstract->get_body($obj);

     Email::Abstract->set_body($obj, "Hello\nTest message\n");

     $rfc2822 = Email::Abstract->as_string($obj);

Email::Abstract knows how to perform these operations on the major types of mail representation objects. It also abstracts out the process of constructing a message, and allows you to change the interface of a message using the cast class method:

    my $obj = Email::Abstract->cast($rfc2822, "Mail::Internet");

    my $mm = Email::Abstract->cast($obj, "Mail::Message");

This allows module authors to write their mail handling libraries in an interface-agnostic way, and I'm grateful to Michael Stevens for taking up Email::Abstract in Mail::ListDetector so quickly. Now I can pass in Email::Simple objects to Mail::ListDetector and it will work fine.
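One way to picture how such an abstraction layer can work is a table of per-class adapters, selected by the class of the object passed in. The sketch below is only an illustration of that idea and makes no claim about Email::Abstract's internals; the Toy::Flat, Toy::Nested, and Toy::Abstract classes are all hypothetical.

```perl
use strict;
use warnings;

# Two toy message classes with incompatible interfaces (both hypothetical).
{
    package Toy::Flat;
    sub new    { my ($class, %h) = @_; return bless {%h}, $class }
    sub header { my ($self, $name) = @_; return $self->{$name} }
}
{
    package Toy::Nested;
    sub new      { my ($class, %h) = @_; return bless { head => {%h} }, $class }
    sub head_get { my ($self, $name) = @_; return $self->{head}{$name} }
}

# The abstraction layer: one adapter per supported class, chosen by ref().
{
    package Toy::Abstract;

    my %adapter = (
        'Toy::Flat'   => { get_header => sub { $_[0]->header($_[1]) } },
        'Toy::Nested' => { get_header => sub { $_[0]->head_get($_[1]) } },
    );

    sub get_header {
        my ($class, $obj, $name) = @_;
        my $impl = $adapter{ ref $obj }
            or die "No adapter for " . (ref($obj) || 'plain scalar') . "\n";
        return $impl->{get_header}->($obj, $name);
    }
}

# The caller uses one interface, whatever the underlying class.
for my $msg (Toy::Flat->new(Subject => 'Hi'), Toy::Nested->new(Subject => 'Hi')) {
    print ref($msg), ': ', Toy::Abstract->get_header($msg, 'Subject'), "\n";
}
```

Supporting another representation then means adding one more entry to the adapter table, without touching the code that calls get_header.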

Email::Abstract also gives us the opportunity to create some benchmarks for all of the above modules. Here was the benchmarking code I used:

    use Email::Abstract;

    my $message = do { local $/; <DATA>; };
    my @classes =
        qw(Email::MIME Email::Simple MIME::Entity Mail::Internet Mail::Message);

    eval "require $_" or die $@ for @classes;

    use Benchmark;
    my %h;
    for my $class (@classes) {
        $h{$class} = sub {
            my $obj = Email::Abstract->cast($message, $class);
            Email::Abstract->get_header($obj, "Subject");
            Email::Abstract->get_body($obj);
            Email::Abstract->set_header($obj, "Subject", "New Subject");
            Email::Abstract->set_body($obj, "A completely new body");
            Email::Abstract->as_string($obj);
        };
    }
    timethese(1000, \%h);

    __DATA__
    ...

I put a short email in the DATA section and ran the same simple operations a thousand times: construct a message, read a header, read the body, set the header, set the body, and return the message as a string.

    Benchmark: timing 1000 iterations of Email::MIME, Email::Simple, MIME::Entity, Mail::Internet, Mail::Message...
    Email::MIME: 10 wallclock secs ( 7.97 usr +  0.24 sys =  8.21 CPU) @ 121.80/s (n=1000)
    Email::Simple:  9 wallclock secs ( 7.49 usr +  0.05 sys =  7.54 CPU) @ 132.63/s (n=1000)
    MIME::Entity: 33 wallclock secs (23.76 usr +  0.35 sys = 24.11 CPU) @ 41.48/s (n=1000)
    Mail::Internet: 24 wallclock secs (17.34 usr +  0.30 sys = 17.64 CPU) @ 56.69/s (n=1000)
    Mail::Message: 20 wallclock secs (17.12 usr +  0.27 sys = 17.39 CPU) @ 57.50/s (n=1000)
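To put the timings above in relative terms, the short script below recomputes each module's rate from its CPU time and expresses it as a multiple of the fastest module's time. The CPU figures are copied from the benchmark output; the variable names are just illustrative.

```perl
use strict;
use warnings;

# CPU seconds for 1000 iterations, copied from the benchmark output above.
my %cpu = (
    'Email::MIME'    => 8.21,
    'Email::Simple'  => 7.54,
    'MIME::Entity'   => 24.11,
    'Mail::Internet' => 17.64,
    'Mail::Message'  => 17.39,
);
my $iterations = 1000;

# Order the modules from least to most CPU time.
my @by_speed = sort { $cpu{$a} <=> $cpu{$b} } keys %cpu;
my $fastest  = $by_speed[0];

for my $class (@by_speed) {
    printf "%-15s %7.2f/s  %.2fx the CPU time of %s\n",
        $class,
        $iterations / $cpu{$class},         # iterations per CPU second
        $cpu{$class} / $cpu{$fastest},      # slowdown relative to the fastest
        $fastest;
}
```

On these figures Mail::Message, the quickest of the older modules, needs roughly 2.3 times the CPU time of Email::Simple.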

The Perl Email Project was a success: Email::MIME and Email::Simple were more than twice as fast as their nearest competitors. However, it should be stressed that they're both very low-level; if you're doing anything more complex than the operations we've seen, you might consider one of the older Mail:: modules.
