
Push, Pull, Next!
by Bob DuCharme
July 06, 2005
In a
recent weblog
post, XML.com's "Python and XML" columnist Uche Ogbuji provided a
nice collection of links to discussions about the push vs. pull styles
of XSLT stylesheet development. What do we mean by "push" and "pull"?
As a short example of each, let's look at two approaches to converting
the following DocBook document to XHTML:
<book>
<title>Beneath the Underdog</title>
<para>In other words, I am three.</para>
<para>"Which one is real?"</para>
<para>"They're all real."</para>
</book>
The first stylesheet below takes a push approach. The XSLT
processor "pushes" the source tree nodes through the stylesheet, which
has template rules to handle various kinds of nodes as they come
through:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
version="1.0">
<xsl:template match="book">
<html>
<head>
<title><xsl:value-of select="title"/></title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="para">
<p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="title">
<h4><xsl:apply-templates/></h4>
</xsl:template>
</xsl:stylesheet>
Each xsl:apply-templates instruction is the
stylesheet's way of telling the XSLT processor to send along the
context node's child nodes to the stylesheet's relevant template
rules. (Or, to quote Curtis Mayfield,
"Keep
On Pushing.")
A pull-style stylesheet minimizes the use
of xsl:apply-templates instructions. It uses instructions such
as xsl:value-of and xsl:for-each to retrieve the
nodes it wants and then puts them where it needs them, like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title><xsl:value-of select="book/title"/></title>
</head>
<body>
<h4><xsl:value-of select="book/title"/></h4>
<xsl:for-each select="book/para">
<p><xsl:value-of select="."/></p>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Few stylesheets rely strictly on push or pull
processing. For example, while my first stylesheet above includes a
template rule to convert the title element into
an h4 element when it comes along, it needs to explicitly go
get the title value to plug it into the XHTML head
element's title element. The use of such a simple source
document also made the pull example a little too clean and simple; in
the real third paragraph of the
book Beneath
the Underdog, the word "all" is emphasized, and a pure pull style
approach to handling in-line content would require contortions that
doubled the length of the stylesheet.
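As a minimal sketch of why, assume the third paragraph marks the word up with DocBook's emphasis element (this markup is an assumption; the sample document above leaves it out):
<para>"They're <emphasis>all</emphasis> real."</para>
In the push stylesheet, one more template rule handles it wherever it turns up, because the para rule's xsl:apply-templates instruction sends the emphasis child along like any other node:
<xsl:template match="emphasis">
  <em><xsl:apply-templates/></em>
</xsl:template>
The pull stylesheet's <xsl:value-of select="."/> instruction, by contrast, returns only the string value of each para element, so the emphasis markup would silently disappear from the output.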
The pull style can feel more natural to developers
intimidated by XSLT's roots in the functional programming style used
by its ancestors DSSSL, Scheme, and LISP. A pure pull stylesheet like
the one above has one template rule that tells the XSLT processor,
"When you find the root node of the source document, do this, then do
this, then do this, then do this..." It's a series of steps to
perform, as with a typical procedural programming language. Other
template rules in such a stylesheet are usually named template
rules—that is, template rules with name attributes that
get explicitly called with xsl:call-template instructions
instead of being called when the XSLT processor finds a node matching
the condition described in the template rule's match
attribute. (An xsl:template element can have both
a match attribute and a name attribute, but typical
template rules have one or the other.) These named template rules play
the role of the subroutines or procedures of a procedural programming
language, adding modularity to a growing program the old-fashioned
way.
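A minimal sketch of that subroutine style follows. The template name makeHeading and its text parameter are hypothetical names chosen for illustration, and the source document is assumed to be the book sample shown earlier:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">
  <!-- The "main routine": runs once, when the root node is found. -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Explicit call by name, passing the title as an argument. -->
        <xsl:call-template name="makeHeading">
          <xsl:with-param name="text" select="book/title"/>
        </xsl:call-template>
      </body>
    </html>
  </xsl:template>
  <!-- Hypothetical named template: it has a name attribute and no
       match attribute, so it only runs when called. -->
  <xsl:template name="makeHeading">
    <xsl:param name="text"/>
    <h4><xsl:value-of select="$text"/></h4>
  </xsl:template>
</xsl:stylesheet>
Nothing in this stylesheet reacts to nodes as they arrive. The single match="/" rule decides what gets fetched and in what order.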
A Matter of Style
Considering XSLT's functional roots, though, I find the
pull approach to be unnatural, and it scales up badly. I have minimal
experience with functional languages—I never got
beyond toy examples
with DSSSL and
I struggled a
bit with Scheme and LISP in school. I do have a theory why I never
had such problems with the structure of XSLT stylesheets: those of us
who began our document processing careers in the SGML days don't see
XSLT as a successor to DSSSL (which few people used in production
applications) but as a successor
to Omnimark,
which is what most developers used to turn SGML into something
else. Omnimark is a pattern matching language that uses a streaming
model, like today's SAX interfaces, and an Omnimark script is
structured almost like a series of event handlers: when
a title element comes along, do this with it; when
a para element comes along, do that with it, and so
forth.
Thinking of XSLT as an event-driven environment has served
me pretty well if I consider the XSLT processor's discovery of various
kinds of nodes as the events to write handlers for. I won't push the
analogy to event-driven development too far, but I will say that it
works much better than attempts to shoehorn XSLT into the procedural,
"do this, then this, then this," style of a purist pull approach.
Stylesheets that use the push approach also make debugging
easier. Usually, when I see people ask for help with a stylesheet,
they're hoping that a one- or two-line change will fix their
problem. I often look at one of these stylesheets, which typically
have a minimal number of template rules each trying to execute too
much program logic, and I think, "If they just rewrote it with more
template rules to handle the different source node types, this would
be easy to fix." Of course, telling people to revise the whole
architecture of their stylesheet is not what they want to hear, so
I'll rewrite part of their stylesheet using a push approach to
demonstrate "one approach to the problem."
In a panel discussion on XSLT, I once
asked Michael
Kay what aspect of XSLT was most underused and
underappreciated. I expected him to name some little-known
instruction, function, or xsl:output attribute, and he
surprised me with his reply that template rules—the most
fundamental unit of an XSLT stylesheet—weren't used enough. A
comparison of my two stylesheets above, though, demonstrates his
point: a set of template rules can usually express the logic necessary
to handle a source document's elements and attributes better than a
single template rule with lots of xsl:if
and xsl:choose instructions inside of it to express the
processing logic for that application. This is especially true with
publishing-oriented (or "document-oriented") XML documents, with their
irregular structure and in-line elements, because a pull stylesheet
can have a difficult time finding the specific pieces of
information it needs in such documents.
Pull Advantages?
Keeping the program logic for multiple classes of nodes in
one template rule can be an advantage if you want to perform some
specific steps on each node type, as well as some other steps on all
those nodes. For example, let's say I want to wrap
every member element from the following sample document in
a p element.
<members>
<member joinDate="2003-10-03">Jimmy Osterberg</member>
<member joinDate="2005-03-07">Declan McManus</member>
<member joinDate="2003-10-03">Richard Starkey</member>
<member joinDate="2004-08-23">Vincent Furnier</member>
</members>
I want to precede each with a p element that says
"(founding member)" if the joinDate attribute equals "2003-10-03",
and with a p element of "(new member)" if
the joinDate attribute begins with "2005". The following does
this easily in a single template rule:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="member">
<xsl:if test="@joinDate='2003-10-03'">
<p>(founding member)</p>
</xsl:if>
<xsl:if test="substring(@joinDate,1,4) = '2005'">
<p>(new member)</p>
</xsl:if>
<p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
XSLT 2.0: New Options
XSLT 2.0 offers another
approach. The xsl:next-match
element tells the XSLT processor to find the next most applicable
template rule for the context node being processed and apply it,
letting you apply multiple template rules to a node while still using
a push approach. Normally, when multiple template rules all have match
conditions that can describe the same element (for example, rules with
match conditions of "*", "member", and "member[@joinDate='2003-10-03']"
can all apply to the first member element shown above),
the XSLT processor applies the one with the most specific description
to the node: in this case, the one with a match condition of
"member[@joinDate='2003-10-03']". (The choice is actually made based
on
a priority
number to help judge how specific the description is. You can
override this by explicitly setting a priority attribute
value in the template rule.)
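As a sketch of how this plays out, the three match conditions mentioned above get default priorities of -0.5, 0, and 0.5 under the XSLT specification's rules. The template bodies here are placeholders for illustration:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Default priority -0.5: a bare wildcard. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Default priority 0: a plain element name. -->
  <xsl:template match="member">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <!-- Default priority 0.5: a name with a predicate, so this rule
       wins for members whose joinDate is 2003-10-03. -->
  <xsl:template match="member[@joinDate='2003-10-03']">
    <p>(founding member) <xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
Adding priority="1" to the plain "member" rule would reverse the outcome: that rule would then be chosen for every member element, including the founding members.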
While an XSLT processor processes a particular node in a
template rule, the xsl:next-match instruction tells it, "Go
find the next most appropriate template rule after this one, execute
all of its instructions, and then resume in this template rule." This
lets you rewrite the stylesheet above like this, with the same
effect:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="2.0">
<xsl:template match="member[@joinDate='2003-10-03']">
<p>(founding member)</p>
<xsl:next-match/>
</xsl:template>
<xsl:template match="member[substring(@joinDate,1,4) = '2005']">
<p>(new member)</p>
<xsl:next-match/>
</xsl:template>
<xsl:template match="member">
<p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
When either the first or second template rule here is
triggered, it outputs the p element shown and then triggers
the third template rule.
It's best to use short examples in this kind of article,
and the examples above are so short that the difference between the
last two stylesheets seems trivial. You'll find that the usefulness of
the xsl:next-match instruction becomes clearer as the amount
of program logic to execute scales up. When you have different
combinations of large blocks of instructions to execute on a set of
nodes, putting these blocks inside of xsl:if instructions or
the xsl:when children of xsl:choose elements makes a
stylesheet increasingly difficult to read. When you combine the
conditional processing made possible by carefully chosen match
conditions with the template rule chaining allowed
by xsl:next-match, you can have a much more elegant, readable
solution. For even greater control over the relationship between the
calling and the called templates, you can add xsl:with-param
children to the xsl:next-match element to pass parameters,
just like you can with named templates. (See my earlier
column Setting and
Using Variables and Parameters for an introduction to this.)
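Here is a minimal sketch of parameter passing with xsl:next-match, assuming the members document shown earlier. The label parameter is a hypothetical name used only for illustration:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <!-- The more specific rule does no output of its own; it hands a
       value to the next matching rule. -->
  <xsl:template match="member[@joinDate='2003-10-03']">
    <xsl:next-match>
      <xsl:with-param name="label" select="'founding member'"/>
    </xsl:next-match>
  </xsl:template>
  <!-- The general rule uses the passed value, or falls back to its
       default when it is matched directly. -->
  <xsl:template match="member">
    <xsl:param name="label" select="'member'"/>
    <p><xsl:apply-templates/> (<xsl:value-of select="$label"/>)</p>
  </xsl:template>
</xsl:stylesheet>
With this arrangement the markup for a member lives in one place, and the more specific rule changes only the detail that differs.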
The same
section of the XSLT 2.0 specification that
covers xsl:next-match covers a related
instruction: xsl:apply-imports. To understand its value,
let's first review xsl:include and xsl:import
instructions: both tell an XSLT processor to treat the identified file
as part of the stylesheet with the xsl:include
or xsl:import instruction. The latter lets you override
template rules from the imported stylesheet, making it great for
creating personal customizations of large, complex stylesheets. For
example, you can import Norm
Walsh's Docbook
stylesheets and then, after the xsl:import instruction,
add revised versions of the template rules that you've customized for
yourself. (See my earlier
column Combining
Stylesheets with Include and Import for further review with
examples.)
If the template rule that you overrode was long and
complex and you just wanted to override one or two details, one
option is to copy the whole thing into your importing
stylesheet and then change those
details. The xsl:apply-imports instruction, which has been available
since XSLT 1.0, gives you a better
option: it lets the overriding template rule call the imported
stylesheet's overridden one. If your overriding template rule only
needs to add a few things to the result of the overridden one, you can
add them before and after an xsl:apply-imports instruction
and let the imported template rule do the rest of the work. In XSLT 2.0, this
instruction also lets you add xsl:with-param children, giving
the overriding template rule even greater control over the behavior of
the overridden one.
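A minimal sketch of the idea follows. It assumes a hypothetical imported stylesheet named base.xsl that already has a template rule for para elements. The file name and the class value are illustrative only:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <!-- xsl:import elements must come before all other top-level
       elements. -->
  <xsl:import href="base.xsl"/>
  <!-- Overrides the imported para rule, but reuses its output
       instead of copying its contents. -->
  <xsl:template match="para">
    <div class="customized">
      <xsl:apply-imports/>
    </div>
  </xsl:template>
</xsl:stylesheet>
Whatever base.xsl produces for a para element ends up inside the div element, so later changes to the imported rule carry through without any edits to the customization.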
Controlling the Flow
The pull approach to XSLT stylesheet development may give
the illusion of greater control because of its resemblance to a
procedural programming style, but it often results in some quirky
surprises that frustrate many stylesheet developers. The push approach
offers several tools to navigate the natural flow of an XSLT
processor's handling of a source tree, and XSLT
2.0's xsl:next-match and xsl:apply-imports
instructions are two tools that should make the push approach more
attractive.