
An Interview with Michael Kay
by Bob DuCharme
July 07, 2004
Michael Kay is the author of Wrox Press's "XSLT Programmer's
Reference," the standard reference work on XSLT, and the editor of the
W3C's XSLT 2.0
specification, which is currently in Working Draft status. His
Java-based Saxon XSLT
processor is one of the most successful and popular XSLT processors in
the language's history. The branch of Saxon supporting XSLT 1.0 is
currently at release 6.5.3, and regular
readers of this column will know that the 7.x branch of Saxon has
been implementing more and more support for XSLT 2.0.
Michael has
recently upgraded the 7.x branch to version 8.0, which is split into
two versions: the free, open-source basic version known as Saxon-B and the
commercial, schema-aware version known as Saxon-SA.
Michael has also recently founded his own company, Saxonica, to develop and market
Saxon-SA. I discussed his new venture with him via email.
Bob DuCharme: As of today, are Saxon 7 and 8 still the only XSLT processors
with any XSLT 2.0 support?
Michael Kay: Essentially, yes. Oracle has a beta release with support for a
few XSLT 2.0 features, but it's very far from complete. A few other
people have said, either officially or unofficially, that they are
working on it, but they've not shown anything in public yet.
BD: Do you know of any use of Saxon 7 in production environments
yet, even though XSLT 2.0 is still a Working Draft?
MK: One of the oddities of the open-source world is that I don't
know very much about what my users are doing. I know that there are
around 250 downloads of Saxon a day, of which around half are the 7.x
version, but I have very little idea who is downloading it and what
they are doing with it (if anything). Most of the feedback I get is
either from a small group of experts who know the technology inside
out and are stretching its boundaries, or from beginners who don't
know where to start. There's a silent majority in between that I
never hear from.
I think you need to distinguish two kinds of production environments
for XSLT. There's the continuously running mission-critical web site,
and there's the publishing shop that does a lot of ad-hoc one-off
jobs. The impression I get is that a lot of people are using Saxon 7.x
extensively for the second kind of production workload, but that most
people with the first kind of environment are (quite rightly) sticking
to Saxon 6.x (and XSLT 1.0) for the time being.
BD: What does the Saxonica version of Saxon offer above and beyond
the features of the free version?
MK: For the moment, there is one difference: Saxon-SA, the commercial
version, is schema-aware. The XSLT 2.0 specification itself identifies
two conformance levels, a basic processor and a schema-aware
processor, and I'm aiming to align the open-source product with the
basic conformance level, and the commercial product with the
schema-aware level. I expect there will be a similar distinction in
XQuery as well, although the current working draft doesn't define
conformance levels.
Being schema-aware means that a stylesheet (or query) can declare
what type of input document it is designed to process, and what type
of output document it is designed to produce. The main result is that
you get better diagnostics when you get your code wrong.
Another
benefit, which you start to realize when you are dealing with the more
complex XML vocabularies, is that it becomes easier to write generic
(or reusable) code that can process different elements with the same
characteristics: as a very simple example, you can write a single
template rule to process all date-valued attributes in the same
way.
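To illustrate the date-valued attribute example (this sketch is not from the interview): in a schema-aware XSLT 2.0 stylesheet, such a rule might look roughly like the following. It assumes the source document has been validated against an imported schema, so that attributes carry the type annotation xs:date. The schema file name invoice.xsd and the output date format are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical schema; a schema-aware processor is required -->
  <xsl:import-schema schema-location="invoice.xsd"/>

  <!-- Matches every attribute whose schema type is xs:date (or derived
       from it), whatever the attribute or its parent element is called -->
  <xsl:template match="attribute(*, xs:date)">
    <xsl:attribute name="{name()}"
                   select="format-date(., '[D] [MNn] [Y]')"/>
  </xsl:template>

</xsl:stylesheet>
```

The rule only fires for attributes that are actually processed, for example through an xsl:apply-templates instruction that selects "@*" on the parent element.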
Part of the rationale for schema-aware XSLT and XQuery processing
is that it should be possible to do more powerful optimizations, and
therefore to get improved performance. For Saxon though, that's future
potential rather than a reality today.
BD: Because XSLT processors need an XML parser to read the
stylesheet and source document, versions of Saxon that support the
XSLT 1.0 Recommendation had the Ælfred XML parser included as the
default parser. To enable schema-aware XSLT processing, what parser
does Saxon-SA use?
MK: Saxon continues to work with any SAX2 parser. It doesn't rely
on the XML parser to do schema validation -- it does that itself.
A free bonus that comes with Saxon-SA is that it includes a brand-new schema processor. It's an unfortunate fact that the XML Schema
specification is extremely complex (and buggy). As a result there
aren't very many implementations, and they don't always give the same
answers in edge cases.
Many users have taken to validating documents
(and schemas) with more than one processor, to give added confidence
when the document is valid, or to get better diagnostics when it
isn't. I think that increasing the choice of schema processors that's
available is something the community will find valuable in itself.
BD: Are there specific XSLT and XPath functions just for use with
schema-related processing?
MK: When an XML document gets validated against a schema, the result isn't
just a pass or fail: every element and attribute gets labeled with the
schema-defined type that it validated against.
So you will have
elements and attributes labeled as strings, integers, or dates, or as
instances of user-defined types such as geographic coordinates, postal
addresses, or taxpayer reference numbers. In a schema-aware stylesheet
or query, you can write functions to process objects of a particular
type, just as you would in Java or C#: the schema becomes the type
system of the language. And you basically get the same benefits -- many
programming errors are picked up sooner, which gives you a faster
debugging turnaround, which means you can deliver working code more
quickly.
At the coding level, you can declare the argument types of your
variables, templates, and functions, and you can write path expressions
and match patterns that select nodes according to their schema type.
That means, for example, that you can select "all inline elements" in
an XHTML document, without having to list all the elements that are
classified as inline elements. Apart from anything else, that makes
your code more resilient to changes in the schema.
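As a rough sketch of what typed declarations and type-based matching can look like in XSLT 2.0 (not taken from the interview): the example below assumes a hypothetical vocabulary in the namespace http://example.com/ns/doc whose schema, doc.xsd, defines a type named InlineType from which every inline element is derived. The function name f:char-count is also invented for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:doc="http://example.com/ns/doc"
    xmlns:f="http://example.com/ns/functions"
    exclude-result-prefixes="xs doc f">

  <!-- Hypothetical schema that defines doc:InlineType -->
  <xsl:import-schema namespace="http://example.com/ns/doc"
                     schema-location="doc.xsd"/>

  <!-- The argument and result types are declared, so passing any other
       kind of node is reported as a type error -->
  <xsl:function name="f:char-count" as="xs:integer">
    <xsl:param name="e" as="element(*, doc:InlineType)"/>
    <xsl:sequence select="string-length(string($e))"/>
  </xsl:function>

  <!-- Matches any element whose schema type is doc:InlineType or a type
       derived from it; no element names are listed -->
  <xsl:template match="element(*, doc:InlineType)">
    <span class="{local-name()}" title="{f:char-count(.)} characters">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

If the schema later adds another element of the same type, the template rule picks it up without any change to the stylesheet.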
The other important feature is that you can ask for your
result documents to be validated against a schema. In Saxon, the
validation is done on the fly, so instead of getting an error message
at the end of the run that says the transformation or query was
successful but the output wasn't valid against the schema, you get a
failure as soon as you try to write an invalid element or attribute to
the output. The error message points straight to the offending place
in the stylesheet or query. I've been quite startled myself to see how
effectively this works.
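For illustration only (again, not from the interview): XSLT 2.0 lets a stylesheet request validation of its output with a validation attribute. The sketch below assumes a hypothetical output vocabulary in the namespace http://example.com/ns/report, described by a schema report.xsd that declares a rep:report element with a date-typed created attribute.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rep="http://example.com/ns/report">

  <!-- Hypothetical schema for the result vocabulary -->
  <xsl:import-schema namespace="http://example.com/ns/report"
                     schema-location="report.xsd"/>

  <xsl:template match="/">
    <!-- validation="strict" asks a schema-aware processor to check the
         result tree against the imported schema -->
    <xsl:result-document href="report.xml" validation="strict">
      <rep:report created="{current-date()}">
        <xsl:apply-templates/>
      </rep:report>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

When an invalid element or attribute is reported, and how precisely the message points back into the stylesheet, depends on the processor.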
BD: Where have you seen early interest in using XSLT and W3C
schemas together (for example, specific industries or development
communities)?
MK: I can't quantify the level of interest. But I've certainly heard
from quite a few individuals who are excited by the prospect. I don't
think that the community as a whole will really catch on to the
benefits, or discover what a different experience it is to write
schema-aware queries and transforms, until they actually try it out
and see for themselves.
BD: Will the schema-awareness help people who have been using the
free version of Saxon 7 for XQuery work?
MK: The schema-aware features work equally well from XSLT or XQuery. At the
moment, I don't get the impression anyone is using XQuery in anger -- people are playing with it to learn about it, not to do real work. But
a lot of people coming to XML from the data side rather than the
document side see XQuery as the future, so there's an important
community to be served there.
BD: What are your plans for a Saxon-SA beta program?
MK: One of the challenges ahead is to see how much I can adapt the
things that work well in an open-source world to a more conventional, commercial software model. I've never much liked the concept of beta
releases. I work by producing new releases every two or three months,
each of which aims to be fit for production use, and if it falls short
of that then I follow it up with a maintenance release after two or
three weeks. So long as the W3C specs themselves are still moving,
users will want the product to keep moving too. Once the specs have
stabilized, I shall probably do what I did with 6.5.x, and freeze a
version for people who want stability.
BD: When do you foresee Saxon-SA 1.0 being ready?
MK: The code is finished, tested, documented, and sitting on the shelf
waiting to go out: I just have to sort out a few details of the
logistics and the commercial side (the bankers have to approve my
licensing terms, for example). With luck, you'll be able to get an
evaluation copy by the time this interview is published.
BD: Where should people go to find out more?
MK: Saxonica's home page, www.saxonica.com, has recently gone
live. And there will continue to be information about the open-source
product at http://saxon.sourceforge.net.
Information about subscribing to the saxon-announce mailing
list, where you can find out about new developments in the free and
commercial versions of Saxon, and the saxon-help list, where you can
address questions about your use of Saxon to Michael and other members
of the Saxon community, is at http://sourceforge.net/mail/?group_id=29872.