Copying, Deleting, and Renaming Elements
by Bob DuCharme
June 07, 2000
Welcome to "Transforming XML." Each column will explain how to handle
two or three basic document manipulation tasks using the W3C Standard that was
spun off from the Extensible Stylesheet Language (XSL): the XSL
Transformations Language, or XSLT. In this first column, we'll start with the
basics -- the use of style sheets, the role of the
xsl:stylesheet element, and how to copy, delete, and rename
elements. (For other material on XSLT that's appeared in XML.com and
elsewhere, see the XML.com Resource Guide.)
XSL Style Sheets
XSLT, according to the W3C
Recommendation that specifies it, is "a language for transforming XML
documents into other XML documents." As XML becomes more popular, and as the
dream of shared DTDs often proves unrealistic, a quick and easy way to
convert documents that conform to your DTD into documents that conform to my
DTD becomes very valuable. This is especially so if you and I want to do
business together without going to the trouble of authoring a
DTD that we can both agree on.
An XSLT style sheet is an XML document that uses specialized element
types from the http://www.w3.org/1999/XSL/Transform
namespace to specify how to transform a set of elements. Technically, it's not
transforming elements into elements, but a source tree into a result tree.
This is good news: because an XSLT processor reads the document into a tree
structure in memory before carrying out the style sheet's transformations, it
can use information from anywhere in the tree when transforming a
particular element (or rather, a particular tree node).
An XSLT processor is a program that applies an XSLT style sheet to a
tree representation of an input document, and creates a result tree based upon
the style sheet's instructions. Most processors read an XML document into the
input tree first, and output the result tree as another document after
finishing the transformation, with a net effect of converting one document into
another.
Currently, the most popular implementations are James Clark's XT, the Apache
XML Project's Xalan, and Michael Kay's SAXON. (A recent
XSL-List posting from Clark about having no plans for further XT
development is bound to hurt its long-term popularity.) Internet Explorer
also implements some of XSLT, but its support of the W3C XSLT standard is
still a bit idiosyncratic; see Microsoft's XSL
Developer's Guide for details. Check each processor's
documentation for information on how to tell it to "use this XSL style sheet to
turn this XML input document into this output document."
The document (root) element of an XSLT style sheet is usually an
xsl:stylesheet element, but it doesn't have to be that exact
element:
- A style sheet can use xsl:transform as a synonym for xsl:stylesheet.
- You don't have to use xsl as the namespace
  prefix to point to the namespace mentioned above, but it is a common convention.
- There are ways to incorporate XSLT instructions directly into a
  document that doesn't use or refer to an xsl:stylesheet or
  xsl:transform element, but a serious transformation usually
  uses one of these in its own file.
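As a minimal sketch of the first two points, the following style sheet uses xsl:transform as its document element and binds the XSLT namespace to the prefix t instead of xsl. The prefix t is an arbitrary choice made for this illustration; a conforming processor should treat the result just as it would an xsl:stylesheet element that uses the xsl prefix.

```xml
<t:transform xmlns:t="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Same meaning as xsl:template; only the prefix differs -->
  <t:template match="title">
    <t:copy>
      <t:apply-templates/>
    </t:copy>
  </t:template>
</t:transform>
```

For the third point, the XSLT 1.0 Recommendation describes a simplified form in which an ordinary result element carries an xsl:version attribute and acts as the whole style sheet. In this sketch, the article and title element names are hypothetical:

```xml
<html xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Pulls the text of a hypothetical /article/title element into the result -->
  <h1><xsl:value-of select="/article/title"/></h1>
</html>
```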
XSLT offers various element types as potential children of this
xsl:stylesheet element, each providing different style sheet
instructions to the XSLT processor. The most important is
xsl:template, which specifies a template rule.
Copying Elements to the Output
A template rule essentially says "when you find an input tree node that corresponds to the value of my match attribute, output text with the structure described by the template in my contents." The value of the match attribute can be a simple element type name,
or a more complex pattern describing the element, attribute, comment, or processing instruction nodes that the template applies to.
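To make the range of possible match values more concrete, here is a short sketch of template rules with different kinds of patterns. The element and attribute names (title, chapter, id) are assumptions chosen for illustration, and the template bodies are left as placeholder comments.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Any title element -->
  <xsl:template match="title"><!-- template body --></xsl:template>
  <!-- Only title elements that are children of a chapter element -->
  <xsl:template match="chapter/title"><!-- template body --></xsl:template>
  <!-- Any attribute named id -->
  <xsl:template match="@id"><!-- template body --></xsl:template>
  <!-- Any comment node -->
  <xsl:template match="comment()"><!-- template body --></xsl:template>
  <!-- Any processing instruction node -->
  <xsl:template match="processing-instruction()"><!-- template body --></xsl:template>
</xsl:stylesheet>
```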
Two popular XSLT elements to include in a template rule's contents
are xsl:copy, which copies the current node, and
xsl:apply-templates, which processes the children of the
current node. For example, the single template in the following style sheet
will copy the start-tags, end-tags, and contents of all
title elements to the output. (Because of XSLT's default
transformation rules, the contents of other elements will also be output
without their tags.)
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="title">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Note that this template rule only acts on nodes representing
title elements. Any
attributes of title elements have
their own nodes in the input tree and require their own template rule or rules
if the XSLT processor is supposed to copy them to the output.
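A small worked example may help here. Assume the following hypothetical input document, whose element names, id attribute, and text are invented for illustration:

```xml
<chapter><title id="ch1">Getting Started</title><para>First paragraph.</para></chapter>
```

Apart from any XML declaration the processor adds, a style sheet that contains only the title template shown earlier should produce <title>Getting Started</title>First paragraph. with the id attribute dropped. One way to keep the attribute, sketched below, is to have the title template process attribute nodes as well and to add a second template rule that copies them:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="title">
    <xsl:copy>
      <!-- Process the attributes first, then the child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Copy each attribute of a title element to the result -->
  <xsl:template match="title/@*">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```

With this version, the same input should yield <title id="ch1">Getting Started</title>First paragraph.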
The xsl:copy-of element, on the other hand, can
copy the entire subtree of each node that the template selects. This includes
attributes, if the xsl:copy-of element's
select attribute has the appropriate value. In the following example,
the template copies title element nodes and
all of their descendant nodes -- in other words, the complete
title elements, including their tags, subelements, and
attributes:
<xsl:template match="title">
  <xsl:copy-of select="."/>
</xsl:template>
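What xsl:copy-of copies depends entirely on its select expression. The sketch below uses a hypothetical title element (the id attribute and emph child are assumptions for illustration) and notes in comments what a few common select values would copy.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Hypothetical input: <title id="t1">XSLT <emph>Basics</emph></title> -->
  <xsl:template match="title">
    <!-- select="."      copies the whole element, with its tags and id attribute -->
    <!-- select="*"      copies only the child elements: <emph>Basics</emph> -->
    <!-- select="node()" copies all the children: XSLT <emph>Basics</emph> -->
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```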
Deleting Elements
If a template rule says "output my contents when you find an
input tree node that corresponds to the value of my match attribute," what happens if
there is no content, as with the following two templates?
<xsl:template match="nickname">
</xsl:template>
<xsl:template match="project[@status='canceled']">
</xsl:template>
They'll output nothing, essentially deleting the matched nodes
from the output. The first template rule says "when you find a
nickname element, output nothing." The second takes
advantage of the flexibility allowed in the patterns that are legal values for
the template element's match attribute.
While a match value of "project" would delete
all the project elements from the
output, the match value shown will only delete
project elements whose status attributes
have the string "canceled" as their value.
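On their own, these two empty templates leave every other element to XSLT's default rules, which output text content without tags. A common idiom, sketched here as an addition to the templates above, pairs them with an identity template that copies everything else unchanged, so that the result is the input document minus the deleted elements.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Identity rule: copy every node and attribute as it is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- More specific patterns take precedence over the identity rule,
       so the nodes they match are dropped along with their contents -->
  <xsl:template match="nickname"/>
  <xsl:template match="project[@status='canceled']"/>
</xsl:stylesheet>
```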
Changing Element Names
We saw above that
xsl:apply-templates processes only the children of the
current node. For an element, this means everything between the
tags, but nothing in the tags themselves.
If your template outputs an input element's content but not its
tags, you can surround that content with anything you want, as long as it
doesn't prevent the output document from being well-formed. For example, the
following template rule tells an XSLT processor to take any
article element fed to it as input, and output its
contents surrounded by html tags.
<xsl:template match="article">
  <html>
    <xsl:apply-templates/>
  </html>
</xsl:template>
The html tags add an actual
html element to the style sheet. Because this element has
no xsl: prefix (that is, it does not belong to the XSLT namespace), it
is known in XSLT as a "literal result element." The element isn't
some special XSLT instruction, so an XSLT processor will leave it alone and
pass its tags along to the output looking just like they do in the
style sheet.
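The same approach extends to several element types at once. The following sketch renames three elements in one style sheet; the title and para element names, and their mapping to h1 and p, are assumptions made for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- article becomes html -->
  <xsl:template match="article">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <!-- title becomes h1 -->
  <xsl:template match="title">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
  <!-- para becomes p -->
  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Because each template uses xsl:apply-templates rather than copying its input, attributes of the renamed elements are not carried over to the result.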
Instead of enclosing the article template
rule's xsl:apply-templates element with
html tags, another way to convert article elements to html elements would
be to enclose the xsl:apply-templates element with an
xsl:element element that has "html" specified as the value
of its name attribute. In this particular case, that would be
overkill -- the markup shown above is much simpler and gets the job
done -- but the xsl:element element's ability to provide
the element type name in an attribute value lets you use expressions that are
more complex than a simple string like "html" as that element type name. This
makes it possible to dynamically create the element name by concatenating
strings, calling functions, or by retrieving element content or
attribute values from elsewhere in the document to use in the element
name. We'll learn more about these tricks in future "Transforming XML" columns.
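As a brief preview, here is a minimal sketch of both uses of xsl:element. The first template is equivalent to the literal html tags shown earlier. The second builds the element name from an attribute value; the heading element and its level attribute are hypothetical names chosen for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Fixed name: same result as writing literal html tags -->
  <xsl:template match="article">
    <xsl:element name="html">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
  <!-- Computed name: the expression in curly braces is evaluated for each
       heading, so a heading with level="2" becomes an h2 element -->
  <xsl:template match="heading">
    <xsl:element name="h{@level}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```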