Where Wizards Fear To Tread

Where Wizards Fear To Tread

Hacking The Op Tree For Fun And...

by Simon Cozens
May 07, 2002

So you're a Perl master. You've got XS sorted. You know how the internals work. Hey, there's nothing we can teach you on perl.com that you don't already know. You think? Where Wizards Fear To Tread brings you the information you won't find anywhere else concerning the very top level of Perl hackery.

Putting Down Your Roots

This month, we look at the Perl op tree. Every Perl program is compiled into an internal representation before it is executed. Functions, subroutine calls, variable accesses, control structures, and all that makes up a Perl program, are converted into a series of different fundamental operations (ops) and these ops are strung together into a tree data structure.

For more on the different types of ops available, how they fit together, and how to manipulate them with the B compiler module, look at the Perl 5 internals tutorial. Right now, though, we're going to take things a step further.

B and Beyond With B::Utils

The B module allows us to get at a wealth of information about an op, but it can become incredibly frustrating to know which op you want to deal with, and to perform simple manipulation on a range of ops. It also offers limited functionality for navigating around the op tree, meaning that you need to hold onto a load of additional state about which op is where. This gets complicated quickly. Finally, it's not easy to get at the op trees for particular subroutines, or indeed, all subroutines both named and anonymous.

B::Utils was created at the request of Michael Schwern to address these issues. It offers much more high-level functionality for navigating through the tree, such as the ability to move ``upward'' or ``backward,'' to return the old name of an op that has currently been optimized away, to get a list of the op's children, and so on. It can return arrays of anonymous subroutines, and hashes of subroutine op roots and starts. It also contains functions for walking through the op tree from various starting points in various orders, optionally filtering out ops that don't match certain conditions; while performing actions on ops, B::Utils provides carp and croak routines which perform error reporting from the point of view of the original source code.

But one of the most useful functions provided by B::Utils is the opgrep routine. This allows you to filter a series of ops based on a pattern that represents their attributes and their position in a tree. The major advantage over doing it yourself is that opgrep takes care of making sure that the attributes are present before testing them - the seasoned B user is likely to be accustomed to the carnage that results from accidentally trying to call name on a B::NULL object.

For instance, we can find all the subroutine calls in a program with



    walkallops_filtered (

        sub { opgrep( { name => "entersub" }, @_) },

        sub { print "Found one: $_[0]\n"; }

    );

C<opgrep> supports alternation and negation of attribute queries. For instance, here are all the scalar variable accesses, whether to globals or lexicals:



    @svs = opgrep ( { name => ["padsv", "gvsv"] }, @ops)

And as for checking an op's position in the tree, here are all the exec ops followed by a nextstate and then followed by something other than exit, warn or die:



  walkallops_filtered(

      sub { opgrep( {

                        name => "exec",

                        next => {

                           name    => "nextstate",

                           sibling => { 

                                         name => [qw(! exit warn die)] 

                                      }

                                }

                    }, @_)},

      sub {

            carp("Statement unlikely to be reached");

            carp("\t(Maybe you meant system() when you said exec()?)\n");

      }

  )

Don't Do That, Do This

So, what can we do with all this? The answer is, of course, ``anything we want.'' If you can mess about with the op tree, then you have complete control over Perl's operation. Let's take an example.

Damian Conway recently released the Acme::Don't module, which doesn't do anything:



    don't { print "Something\n" }

doesn't print anything. Very clever. But not clever enough. You see, I like double negatives:



    my $x = 1;

    don't { print "Something\n" } unless $x;

doesn't print anything either, and if you like double negatives, then you might agree that it should print something. But how on earth are we going to get Perl to do something when a test proves false? By messing about with the op tree, of course.

The way to solve any problem like this is to think about the op tree that we've currently got, work out what we'd rather do instead, and work out the differences between the op trees. Then, we write something that looks for a given pattern in a program's op tree and modifies it to be what we want.

There are several ways of achieving what we actually want to get but the simplest one is this: add a second parameter to don't which, if set, actually does do the code. This allows us to replace any occurrence of


    don't { print "Something\n" } if (condition);

with



    don't(sub { print "Something\n" }, 1) unless (condition);

Let's now look at this in terms of op trees. Here's the relevant part of the op tree for don't { ... } if $x, produced by running perl -MO=Terse and then using sed to trim out to unsightly hex addresses:



    UNOP  null

        LOGOP  and

            UNOP  null [15]

                SVOP  *x

            UNOP  entersub [2]

                UNOP  null [141]

                    OP  pushmark

                    UNOP  refgen

                        UNOP  null [141]

                            OP  pushmark

                            SVOP  anoncode  SPECIAL #0 Nullsv

                    UNOP  null [17]

                        SVOP  *don::t

As we can see, the if is represented as an and op internally, which makes sense if you think about it. The two ``legs'' of the and, called ``first'' and ``other,'' are a call to fetch the value of $c, and a subroutine call. Look at the subroutine call closely: the ops ``inside'' this set up a mark to say where the parameters start, push a reference to anonymous code (that's our { ... }) onto the stack, and then push the glob for *don::t on there.

So, we need to do two things: We need to insert another parameter between refgen and the null attached to *don::t, and we need to invert the sense of the test.

Now we know what we've got to do, let's start doing it - remember our solution: stage one, write code to find the pattern.

This is actually pretty simple: We're looking for either an and or an or op, and the ``other'' leg of the op is going to be a call to *don::t. However, we have to be a bit clever here, since Perl internally performs a few optimizations on the op tree that even the B::* reporting modules don't tell you about. When Perl threads the next pointers around an op tree, it does something special for a short-circuiting binary op like and or or - it sets the other pointer to be not the first sibling in the tree, but the first op in execution order. In this case, that's pushmark, as we can see from running B::Terse,exec:



    LOGOP (0x80fa008) and

    AND => {

        OP (0x80f9f88) pushmark

        OP (0x80f9f20) pushmark

        SVOP (0x80f9ec0) anoncode  SPECIAL #0 Nullsv

        ...

With this knowledge, we can create a pattern to pass to opgrep:



    {

        name => ["and", "or"],

        other => {

            name => "pushmark",

            sibling => { next => { name => "gv" }}

        }

    }

Unfortunately, this doesn't tell us the whole story, since we actually need to check that the subroutine call is to don't, rather than to any other given subroutine that might be called conditionally. Hence, our filter looks like this:



    sub {

        my $op = shift;

        opgrep(

            {

                name => ["and", "or"],

                other => {

                    name => "pushmark",

                    sibling => { next => { name => "gv" }}

                }

            }, $op) or return;

        my $gv = $op->other->sibling->next->gv;

        return unless $gv->STASH->NAME eq "don" and $gv->NAME eq "t";

        return 1;

    }

We grab the GV (we know exactly where it's going to be because of our pattern!) and test that it's in the don stash and is called t.

Part one done - we have located the ops that we want to change. Now how on earth do we change ops in an op tree?

[1] [2] Next

Close    To Top
  • Prev Article-Programming:
  • Next Article-Programming:
  • Now: Tutorial for Web and Software Design > Programming > Perl > Programming Content
    Photoshop Tutorial
     

    Special Effect

      3D Effect
      Photoshop Articles
    Programming Tutorial
     

    C/C++ Tutorial

      Visual Basic
      C# Tutorial
    Database Tutorial
     

    MySQL Tutorial

      MS SQL Tutorial
      Oracle Tutorial
    Geek Tutorial
     

    Blogging Tutorial

      RSS Tutorial
      Podcasting Tutorial
    Graphic Design Tutorial
      Coreldraw Tutorial
      Illustrator Tutorial
      3D Tutorials
    Webmaster Articles
     

    Domain Service

      Web Hosting
      Site Promotion
    Java Tutorial/ Articles
     

    Java Servlets

      JavaEE Tutorial
     

    JavaBeans Tutorial

    XML Tutorial/ Articles
     

    XML Style

      AJAX Tutorial
      XML Mobile
    Flash Tutorial/ Articles
     

    Flash Video

      Action Script
      Flash Articles
    OS Tutorial/ Articles
      Linux Tutorial
      Symbian Tutorial
      MacOS Tutorial
    Personal Tech
      Hardware Tutorial
      Software Tutorial
      Online Auction