XSLT UK 2001 Report
by Jeni Tennison
April 25, 2001
April 8th and 9th 2001 saw the first conference dedicated to XSLT
take place at Keble College in Oxford. While the basis of the
conference was XSLT, this didn't stop people talking about the XSL
effort in general or about other vocabularies and technologies that
work with or against XSLT.
Opening Address
The conference was opened by Norm Walsh from Sun Microsystems,
member of the XSL Working Group and maintainer of one of the more
complex XSL applications -- the DocBook XSL family, which he talked
about later in the day. Norm set the scene for the conference,
reminding us of the origins of XSLT and outlining four requirements
that will make XSLT and XPath as ubiquitous as XML has become:
- interoperable tools,
- cooperative specs,
- optimizations or compilations of stylesheets, and
- information set pipelines.
XSLT and the Art of Motorcycle Maintenance
Next up was David Carlisle, from NAG Ltd., one of the editors of
MathML and an XSL-List regular. David gave another view of XSLT's
heritage, as a functional programming language fitting into the same
development path as Scheme or DSSSL. He outlined the benefits of
taking a functional approach to presenting information, especially
with web-based content, where random access means that you need
something that allows you to process only parts of the content and
still work reliably (for example, in numbering pages without having to
process each page to construct the number). David had the title for
his talk thrust upon him, but he still managed to bring in a reference
to the seminal book "Zen and the Art of Motorcycle Maintenance" with a
quote:
After a while he says, "Can I have a motorcycle when I get old
enough?"
"If you take care of it."
"What do you have to do?"
"Lots of things. You've been watching me."
"Will you show me all of them?"
"Sure."
"Is it hard?"
"Not if you have the right attitudes. It's having the right
attitudes that's hard."
"Oh."
After a while I see he is sitting down again. Then he says,
"Dad?"
"What?"
"Will I have the right attitudes?"
"I think so," I say. "I don't think that will be any problem at
all."
And so we ride on and on, down through Ukiah, and Hopland, and
Cloverdale, down into the wine country...
Beginners can find XSLT difficult to deal with, especially when
they come from a procedural-language background. But XSLT isn't hard
if you have the right attitude.
XSLT Design Patterns
I spoke next, representing only myself and drawing on my experience
answering questions on XSL-List. I outlined some of the design patterns that
have emerged in the use of XSLT. Drawing examples from an application I
worked on for Xi advise bv, I spoke about four levels of design
patterns:
- application level: combining stylesheets and using XSLT within a wider
context -- I specifically talked about getting multiple views of the same
data using XSLT
- stylesheet level: the flow of processing within the application -- I
talked about the differences between push and pull, and how to combine
them, and about grouping by position, in hierarchies and by value (using
the Muenchian Method)
- template level: patterns in instructions such as Wendell Piez's method
for repetition and David Allouche's method for normalizing strings
- XPath level: expressions for getting unique nodes, for set manipulation
and for conditional XPaths, such as Oliver Becker's method
Throughout, I talked about the way that identifying these methods
can help us to identify the areas where XSLT and XPath need to be
developed.
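To illustrate grouping by value, here is a minimal Python sketch of the idea behind key-based grouping: index the nodes by a key once, then read each group straight from the index. It is an analogue written outside XSLT, not the Muenchian Method itself; the sample document and the function name group_by_key are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this sketch.
SAMPLE = """
<items>
  <item category="fruit">apple</item>
  <item category="veg">leek</item>
  <item category="fruit">pear</item>
</items>
"""

def group_by_key(nodes, key_of):
    """Index nodes by key in one pass, keeping groups in first-seen order."""
    index = {}  # plays a role similar to a key index in a stylesheet
    for node in nodes:
        index.setdefault(key_of(node), []).append(node)
    return index

root = ET.fromstring(SAMPLE)
groups = group_by_key(root.findall("item"), lambda item: item.get("category"))

# One entry per distinct category, each listing its members in document order.
summary = {key: [node.text for node in members] for key, members in groups.items()}
print(summary)
```

Each node is visited once to build the index, which is what makes this style of grouping cheaper than rescanning the document for every candidate group.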
XSLT Performance
We were then treated to a talk by Mike Kay that highlighted the
experiences of implementers. Now at Software AG, he is a member of
the XSL Working Group and another regular contributor on XSL-List, but
he's probably best known as the implementer of the Saxon XSLT
processor and the author of the XSLT Programmer's Reference.
Mike spoke about XSLT performance. He advised that you only need
to worry about the performance of XSLT processors or stylesheets if
you have business requirements that demand a certain throughput or
response time, although you might also be concerned about the
predictability, tuneability, or scalability of a particular
stylesheet.
While he didn't specifically talk about Saxon, Mike showed the
basic way an XSLT processor works: taking the XML stylesheet, turning
it into a tree, 'compiling' that tree, similarly taking the XML source
and turning that into a tree, and then constructing the result tree
(theoretically in memory, but often practically outputting it
immediately).
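The flow described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not how Saxon or any real processor is built: the "stylesheet" is a plain dictionary of format strings rather than an XML document, and the names compile_rules and apply_templates are invented here.

```python
import xml.etree.ElementTree as ET

def compile_rules(rules):
    """'Compile' a {element name: format string} mapping into callables."""
    return {name: template.format for name, template in rules.items()}

def apply_templates(element, compiled, emit):
    """Walk the source tree, emitting output as soon as it is produced."""
    rule = compiled.get(element.tag)
    if rule is not None:
        emit(rule(text=(element.text or "").strip()))
    for child in element:
        apply_templates(child, compiled, emit)

# Source document turned into a tree.
source = ET.fromstring(
    "<report><title>XSLT UK</title><venue>Oxford</venue></report>"
)

# Stand-in stylesheet, compiled once before any source is processed.
compiled = compile_rules({"title": "# {text}", "venue": "at {text}"})

# The result is handed to `emit` immediately instead of being kept as a tree.
output = []
apply_templates(source, compiled, output.append)
print(output)
```

Passing an output callback rather than returning a tree mirrors the point that the result tree exists in theory but is often written out as it is built.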
Mike described the most important things for XSLT processor
efficiency: tight code, name management, XPath queries, XSLT pattern
matching, pipelining, and the storage of node sets. He discussed the
issues involved in constructing a node tree for XPath/XSLT processing,
especially given its differences from the DOM. (XPath node trees don't
include CDATA or entity nodes, and there is different handling of
whitespace.) He also outlined the Tiny Tree Model that he now uses in
Saxon (after seeing a similar technique in Xalan), where transient
objects are created from arrays as required. This gives real
advantages, allowing run-time decisions about the kinds of access
paths that should be stored (for example, you only need to store
information about what a node's parent is if you need to access a
node's parent).
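The general idea of holding a tree in arrays and creating node objects only when they are asked for can be sketched in Python as follows. This is not Saxon's actual data layout: the class names ArrayTree and Node, the choice of arrays, and the need_parents flag are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

class ArrayTree:
    """Element tree stored as parallel arrays in document order (illustrative)."""

    def __init__(self, xml_text, need_parents=False):
        self.names = []    # element name of node i
        self.depths = []   # nesting depth of node i
        # The parent array is an optional access path, built only on request.
        self.parents = [] if need_parents else None
        self._fill(ET.fromstring(xml_text), depth=0, parent=-1)

    def _fill(self, element, depth, parent):
        index = len(self.names)
        self.names.append(element.tag)
        self.depths.append(depth)
        if self.parents is not None:
            self.parents.append(parent)
        for child in element:
            self._fill(child, depth + 1, index)

    def node(self, index):
        """Create a transient wrapper object for node `index`."""
        return Node(self, index)


class Node:
    """Short-lived view onto one row of the arrays."""

    def __init__(self, tree, index):
        self.tree, self.index = tree, index

    @property
    def name(self):
        return self.tree.names[self.index]

    def children(self):
        # Children are the following nodes exactly one level deeper,
        # up to the first node that is no longer a descendant.
        tree, depth = self.tree, self.tree.depths[self.index]
        i = self.index + 1
        while i < len(tree.names) and tree.depths[i] > depth:
            if tree.depths[i] == depth + 1:
                yield Node(tree, i)
            i += 1

    def parent(self):
        if self.tree.parents is None:
            raise LookupError("tree was built without parent pointers")
        p = self.tree.parents[self.index]
        return Node(self.tree, p) if p >= 0 else None


tree = ArrayTree("<doc><sec><p/><p/></sec><sec/></doc>", need_parents=True)
child_names = [n.name for n in tree.node(0).children()]
grandparent = tree.node(2).parent().parent().name
```

A tree built without need_parents simply has no parent array, so the cost of that access path is paid only by transformations that navigate upwards.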
The areas for future optimization that implementers have barely
touched yet are:
- parallel execution, which should be possible as XSLT is
side-effect free
- compilation of stylesheets into byte code, something picked up by
Morten Jørgensen in the next talk
- global optimization of processing flow, as opposed to local
optimization of XPaths
- serial transformations, if it's possible to detect those (parts of)
transformations that don't require access to the entire tree
- exploiting XML schemas
There were some tips for users too:
- follow good performance engineering practice: record the time a
stylesheet takes before and after making each change, and change it back
if it doesn't improve
- use small documents rather than large ones
- don't assume that the processor makes a particular optimization
- minimize the number of visits to each node
- use variables
- use temporary trees (result tree fragments in XSLT 1.0)
- use keys
- don't use xsl:number
- don't bother with changes that can give less than a 10%
improvement
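The first and last tips can be combined into a small measuring harness. The Python sketch below is one possible way to do it, not something shown in the talk: the names measure and worth_keeping are invented, and the transform argument stands for whatever callable runs your stylesheet.

```python
import time

def measure(transform, repeats=5):
    """Return the best wall-clock time, in seconds, over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        transform()
        best = min(best, time.perf_counter() - start)
    return best

def worth_keeping(before, after, threshold=0.10):
    """Keep a change only if it cuts the run time by at least `threshold`."""
    return (before - after) / before >= threshold

# Hypothetical stand-in for running a stylesheet over a test document.
baseline = measure(lambda: sum(range(10_000)))
print(f"baseline: {baseline:.6f}s")
```

Measure once before the change and once after, then pass both timings to worth_keeping; if it returns False, change the stylesheet back.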
The XSLT Compiler for JVM
Morten Jørgensen, from Sun Microsystems, introduced the XSLT
Compiler (XSLTC). XSLTC creates "translets": Java classes that run
about 30-200% faster than interpretive XSLT processors and are usually
about a quarter of the size of an XSLT processor and
stylesheet. Because of their size and platform independence, these
translets can run on virtually anything, including handheld
machines.
With XSLTC, stylesheets can be compiled into translet bundles, each
one of which contains a main class and a set of auxiliary classes for
elements that require special handling. These are shipped with an XSLT
runtime library, containing a tailored DOM with SAX interfaces for
input and output.
For authors using XSLTC, Morten outlined a few tips. The main body
of a translet is a switch statement, with each case being a
particular match pattern. Authors should therefore keep match patterns
simple and, in particular, avoid unioned match patterns. At an
application level, developers should take advantage of the
cacheability of the DOMs used by XSLTC as XML parsing can take as much
as 50% of the total processing time.
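The benefit of caching a parsed document can be shown with a small Python analogue. It does not use XSLTC or its API: the functions parsed and transform are invented for this sketch, and transform is only a stand-in for a real transformation.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

parse_count = 0  # counts how often the parser actually runs

@lru_cache(maxsize=32)
def parsed(xml_text):
    """Parse a document once and reuse the tree for later requests."""
    global parse_count
    parse_count += 1
    return ET.fromstring(xml_text)

def transform(xml_text, tag):
    """Stand-in for a transformation: list the text of every `tag` element."""
    return [element.text for element in parsed(xml_text).iter(tag)]

SOURCE = "<talks><talk>Performance</talk><talk>XSLTC</talk></talks>"
first = transform(SOURCE, "talk")
second = transform(SOURCE, "talk")
print(first, parse_count)
```

The second call skips parsing entirely. The cached tree is shared between calls, so this approach only suits transformations that treat the source as read-only.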
XSLTC is still alpha software, but the only outstanding features
needed for conformance with XSLT 1.0 are support for simplified
stylesheets (where the document element of the stylesheet is not
xsl:stylesheet), the namespace axis, and the id() and key() functions
within match patterns.