The Extensible Style Language - XSL
by Norman Walsh
January 19, 1999
Styling XML Documents
Reprinted from Web Techniques
January 1999
From the earliest days of the Web, we've been
using essentially the same set of tags in our documents. Web pages written
in HTML use HTML tags and the meaning of those tags is well understood: <H1>
makes a heading, <IMG> loads a graphic, <OL>
starts an ordered list, and so on. The number of tags has slowly grown, and
there have been numerous browser-compatibility issues, but the basic tag set
is still the same.
There's a significant benefit to a fixed tag set with fixed semantics: portability.
A Web page that uses the standard tags can be viewed by just about any browser,
anywhere in the world. However, HTML is very confining; Web designers want
more control over presentation and many processes would benefit from more
descriptive tagging.
Enter XML. With XML, we can use any tags we want. We can write documents
using our own tag names: names that are meaningful in the context of our
subject matter and offer the possibility of far greater control over presentation.
But this freedom comes at a price: XML tag names have no predefined semantics.
An <H1> might just as legitimately identify a tall hedge
as a first-level heading. Is <IMG> an image, or an imaginary
number? Who knows?
The style sheet knows. From the very beginning of the XML effort, it was
recognized that in order to successfully send XML documents over the Web,
it would be necessary to have a standard mechanism for describing how they
were to be presented. That's why we need style sheets.
The Extensible Style Language (XSL) is the style language for XML. At the
time of this writing (October 1998), XSL is under active development by the
W3C. On August 18, 1998, the XSL Working Group (WG) released its first Working
Draft. This article introduces XSL as described in that document. (Visit www.w3.org/TR/WD-xsl
to view the Working Draft for yourself.)
By the time this article is published, a second Working Draft may be available.
It doesn't seem likely that any of the topics covered here will change substantially
between the first and second Working Drafts, but it's always possible.
What Does a Style Sheet Do?
In simplest terms, a style sheet contains instructions that tell a processor
(such as a Web browser, print composition engine, or document reader) how
to translate the logical structure of a source document into a presentational
structure.
Style sheets typically contain instructions like these:
- Display hypertext links in blue.
- Start chapters on a new, left-hand page.
- Number figures sequentially throughout the document.
- Speak emphasized text in a slightly louder voice.
Many style-sheet languages augment the presentation of elements that have
a built-in semantic meaning. For example, a Microsoft Word paragraph style
can change the presentation of a paragraph, but even without the style, Word
knows that the object in question is a paragraph.
The challenge for XSL is slightly greater. Because XML has no underlying
semantics to augment, XSL must specify both how each element should be presented
and what the element is. For this reason, XSL defines not only a language
for expressing style sheets, but also a vocabulary of "formatting objects"
that have the necessary base semantics.
For the purpose of this article, we're going to consider a simple XML document,
shown in Example 1:
Example 1: A simple XML document.
<?xml version='1.0'?>
<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>
This document contains only a few elements:
- doc defines the document element;
- title defines titles;
- para defines paragraphs;
- em indicates emphasis;
- figure and graphic define external graphics.
How Does XSL Work?
Before discussing XSL in more detail, it's necessary to consider the XSL
processing model. An XSL processor begins with a style sheet and a "source
tree." The source tree is the tree representation of the parsed XML source
document. All XML documents can be represented as trees.
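To make the idea of a source tree concrete, here is a small Python sketch that parses Example 1 and prints one indented line per element. Python and its standard xml.etree.ElementTree module are our choice for illustration only; they are not part of XSL, and the outline function is a hypothetical helper. Text content is kept by the parser but not printed.

```python
import xml.etree.ElementTree as ET

# Example 1, copied from the article.
EXAMPLE_1 = """<?xml version='1.0'?>
<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>"""


def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(EXAMPLE_1)
print("\n".join(outline(root)))
```

The nested em inside the second para shows up as a grandchild of that para, which is exactly the kind of structure a style sheet has to walk.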
Conceptually, the XSL processor begins at the root node in the source tree
and processes it by finding the template in the style sheet that describes
how that node should be displayed. Each node is then processed in turn
until there are no more nodes left to be processed. (In fact, it's a little
more complicated than this because each template can specify which nodes to
process, so some nodes may be processed more than once and some may not be
processed at all. We'll examine this later.)
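The processing loop described above can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration, not an XSL processor: the dictionary of templates keyed by element name, the function names, and the fallback of processing every child when no template matches are all assumptions made for this sketch (real XSL templates select nodes with patterns). The figure template processes none of its children and the para template processes its children twice, mirroring the two cases in the parenthetical note.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of Example 1 (one paragraph instead of two).
SOURCE = """<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<figure><title>My Figure</title><graphic fileref="myfig.gif"/></figure>
</doc>"""

visited = []  # element names, in the order their nodes are processed


def process(node, templates):
    """Record the node, then hand it to its template (or the fallback)."""
    visited.append(node.tag)
    templates.get(node.tag, process_all_children)(node, templates)


def process_all_children(node, templates):
    # Assumed fallback for nodes without a template: process each child once.
    for child in node:
        process(child, templates)


def process_nothing(node, templates):
    # A template may decide that none of its children are processed...
    pass


def process_children_twice(node, templates):
    # ...or that some of them are processed more than once.
    for _ in range(2):
        process_all_children(node, templates)


# Hypothetical templates keyed by element name.
TEMPLATES = {"figure": process_nothing, "para": process_children_twice}

process(ET.fromstring(SOURCE), TEMPLATES)
print(visited)  # ['doc', 'title', 'para', 'em', 'em', 'figure']
```

The em element is visited twice, while the figure's own title and graphic are never visited at all.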
The product of all this processing is a "result tree." If the result tree
is composed of XSL formatting objects, then it describes how to present the
source document. It's a feature of XSL that the result tree doesn't have to
be composed of XSL formatting objects; it can be composed of any elements.
One common alternative to XSL formatting objects will be HTML element names.
When HTML is used in the result tree, XSL will transform an XML source document
into an XML document that looks very much like HTML. It's important to realize,
however, that the result is XML, not HTML. In particular, empty elements will
use the XML empty-element syntax, and it's impossible to produce documents
that are not well-formed XML.
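One way to see the "XML, not HTML" point is to serialize a small HTML-flavored tree with Python's standard xml.etree.ElementTree module. ElementTree is a tool chosen for illustration, not an XSL processor, and the element names and the myfig.gif reference below are hypothetical content.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-built result tree that uses HTML element names.
body = ET.Element("BODY")
ET.SubElement(body, "IMG", SRC="myfig.gif")
ET.SubElement(body, "BR")

# Serializing it as XML writes the empty elements with the XML
# empty-element syntax ("<BR />"), not HTML's bare "<BR>".
result = ET.tostring(body, encoding="unicode")
print(result)  # <BODY><IMG SRC="myfig.gif" /><BR /></BODY>

# Because the output is XML, it parses back as a well-formed document.
assert ET.fromstring(result).tag == "BODY"
```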
What Does XSL Look Like?
XSL style sheets are XML documents. A short XSL style sheet can be seen in
Example 2. This style sheet transforms source
documents like the XML document in Example 1
into HTML. A style sheet is contained within an xsl:stylesheet element and contains
xsl:template elements. (Style sheets can contain a small handful of other elements in
addition to templates, but most style sheets consist mostly of templates.)
Example 2: A simple XSL style sheet that generates HTML from XML.
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template pattern="doc">
<HTML>
<HEAD>
<TITLE>A Document</TITLE>
</HEAD>
<BODY>
<xsl:process-children/>
</BODY>
</HTML>
</xsl:template>
<xsl:template pattern="title">
<H1>
<xsl:process-children/>
</H1>
</xsl:template>
<!-- this stylesheet handles only a
subset of the sample document -->
</xsl:stylesheet>
Don't worry if this looks a little confusing at first. There's a lot going
on. We'll revisit this style sheet in the "Understanding XSL" section.
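To see roughly what Example 2 does to Example 1, the following Python sketch mimics its two templates with ordinary functions and builds the result tree with the standard xml.etree.ElementTree module. It is not an XSL processor: the function names are hypothetical, process_children is only a rough stand-in for <xsl:process-children/>, and the behavior for elements with no template (keep processing their children and copy their text) is an assumption borrowed from XSLT 1.0's built-in rules, not something taken from the 1998 draft.

```python
import xml.etree.ElementTree as ET

# Example 1, copied from the article.
SOURCE = """<?xml version='1.0'?>
<doc><title>My Document</title>
<para>This is a <em>short</em> document.</para>
<para>It only exists to <em>demonstrate a <em>simple</em>
XML document</em>.</para>
<figure><title>My Figure</title>
<graphic fileref="myfig.gif"/>
</figure>
</doc>"""


def append_text(out, text):
    """Copy source text into the result tree at the current position."""
    if not text:
        return
    if len(out):
        out[-1].tail = (out[-1].tail or "") + text
    else:
        out.text = (out.text or "") + text


def process(node, out):
    """Use the node's template if there is one."""
    template = TEMPLATES.get(node.tag)
    if template is not None:
        template(node, out)
    else:
        # Assumption (XSLT 1.0-style built-in rule; the 1998 draft may
        # differ): with no matching template, just process the children.
        process_children(node, out)


def process_children(node, out):
    """Rough stand-in for <xsl:process-children/>: text is copied."""
    append_text(out, node.text)
    for child in node:
        process(child, out)
        append_text(out, child.tail)


def doc_template(node, out):
    # Mimics <xsl:template pattern="doc">: an HTML skeleton around the children.
    html = ET.SubElement(out, "HTML")
    head = ET.SubElement(html, "HEAD")
    ET.SubElement(head, "TITLE").text = "A Document"
    body = ET.SubElement(html, "BODY")
    process_children(node, body)


def title_template(node, out):
    # Mimics <xsl:template pattern="title">: every title becomes an H1.
    h1 = ET.SubElement(out, "H1")
    process_children(node, h1)


TEMPLATES = {"doc": doc_template, "title": title_template}

container = ET.Element("result")  # scratch parent for the result tree
process(ET.fromstring(SOURCE), container)
html = container[0]
print(ET.tostring(html, encoding="unicode"))
```

Because the pattern title matches any title element, the figure's title also becomes an H1. The para, em, figure, and graphic elements have no templates here, so under the assumed fallback only their text survives in the BODY.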
One thing that stands out in an XSL style sheet is the use of namespaces.
Namespaces (covered in two articles in this issue of XML.com) are what all
the colon-delimited prefixes are about.
In XSL, there can be no reserved element names, so it's necessary to use
some other mechanism to distinguish between elements that have XSL semantics
and other elements. This is the problem that namespaces were designed to solve.
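As a small illustration of how the namespace separates the two kinds of elements, this Python sketch parses a trimmed version of Example 2 with the standard xml.etree.ElementTree module and sorts its elements by namespace. The use of ElementTree and the variable names are our own choices for the example.

```python
import xml.etree.ElementTree as ET

XSL_URI = "http://www.w3.org/TR/WD-xsl"

# A trimmed version of Example 2.
STYLESHEET = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
<xsl:template pattern="title">
<H1><xsl:process-children/></H1>
</xsl:template>
</xsl:stylesheet>"""

xsl_elements, other_elements = [], []
for elem in ET.fromstring(STYLESHEET).iter():
    # ElementTree reports a namespaced name as "{URI}local-name".
    if elem.tag.startswith("{" + XSL_URI + "}"):
        xsl_elements.append(elem.tag.split("}", 1)[1])
    else:
        other_elements.append(elem.tag)

print(xsl_elements)    # ['stylesheet', 'template', 'process-children']
print(other_elements)  # ['H1']
```

Nothing about the name H1 is reserved; it is treated as ordinary output only because it is not in the XSL namespace.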
If you're not familiar with namespaces, here are some simple guidelines:
- The prefix is significant when comparing element names; therefore xsl:template and template are different.
- The prefix string is arbitrary. What's important is the association of a prefix string with a URI. That's the function of the "xmlns:" attribute on the stylesheet.
- The attribute xmlns:xsl="http://www.w3.org/TR/WD-xsl" associates the namespace prefix "xsl" with the URI that follows it ("http://www.w3.org/TR/WD-xsl"). If it were instead xmlns:xyzzy="http://www.w3.org/TR/WD-xsl", then the prefix xyzzy: would replace every instance of xsl: in the example, and the style sheet would be exactly the same.
- From the preceding points, it follows that xsl:template and xyz:template are different (unless the two namespace prefixes are associated with the same URI).
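These rules can be checked with any namespace-aware parser. The Python sketch below uses the standard xml.etree.ElementTree module, which reports each element name as "{URI}local-name"; comparing those expanded names shows that only the URI matters. The http://example.com/not-xsl URI is a hypothetical stand-in for "some other namespace".

```python
import xml.etree.ElementTree as ET

# The same local name, written four ways.
with_xsl = ET.fromstring('<xsl:template xmlns:xsl="http://www.w3.org/TR/WD-xsl"/>')
with_xyzzy = ET.fromstring('<xyzzy:template xmlns:xyzzy="http://www.w3.org/TR/WD-xsl"/>')
with_xyz = ET.fromstring('<xyz:template xmlns:xyz="http://example.com/not-xsl"/>')
unprefixed = ET.fromstring('<template/>')

print(with_xsl.tag)                    # {http://www.w3.org/TR/WD-xsl}template
print(with_xsl.tag == with_xyzzy.tag)  # True: same URI, different prefix
print(with_xsl.tag == with_xyz.tag)    # False: different URI
print(with_xsl.tag == unprefixed.tag)  # False: no namespace at all
```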