Bleeding-Edge XML: XLink and Apache
by Edd Dumbill
February 28, 2000
Tutorial sessions at XTech 2000 are a great place to find out about
the bleeding edge of XML development. To give you a taste of the current
work on XML specifications and technologies, we attended two sessions: XLink and Apache: XML Publication Techniques.
XLink represents the cutting edge of W3C specification development, while
Apache XML represents the cutting edge of XML web publishing tools. Both sessions
demonstrated incredible potential for the future application of XML.
XLink
Eve Maler, co-editor of the XLink spec, presented a tutorial on the XML
linking technology,
covering XLink and XPointer. She was able to report on changes and
decisions made just last week by the W3C working group. XLink has just
entered the "last call" working draft stage, so the next published spec
should be stable enough to base implementations on.
So what is the motivation for creating an XML linking language? Why
won't HTML links suffice? Maler presented four reasons why XLink is
required:
Hard-coded markup and behavior: In HTML, only certain elements
(the A and IMG elements and a few others) can
have linking behavior. If you want to create a link, it has to be a
refinement of these elements, which is unnecessarily restrictive.
Links are inseparable from the document: In HTML, you can't add
links to a document you don't own. This limits the capability for facilities
such as annotation.
Link target offers little granularity: Commonly the target of
an HTML link is an entire document. You may be able to link to an anchor
defined in a document, but only if the author has actually added it. There
is no facility for linking to an arbitrary part of a document.
Links are one way: HTML only supports linking outbound
from your document; there is no way you can create links inbound to
your document from external documents.
These deficiencies of HTML links provided the basic motivation for
XLink. Originally, XML linking was part of the XML 1.0 activity,
but it was later split off. Since then it has become a three-part activity in its
own right, comprising XLink, XPointer, and XML Base. While XLink is a
vocabulary for expressing links, XPointer is an extension of the XPath
language (found in XSLT) that allows you to pinpoint a location within a remote resource.
XML Base offers facilities in XML similar to those of the HTML element
BASE.
XLink achieves its ends by specifying attributes that can turn any
element into a link. That is, you are free to use any element in
your XML DTD as a link by adding the XLink attributes to it. While this may
seem confusing to those used to HTML links, it allows maximum
flexibility.
Two kinds of link are possible: simple and extended. The
simple link offers basically the same kind of functionality available with
the HTML A and IMG tags. Here's an example of a
simple link:
<myLink xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="myTarget.xml"
xlink:show="replace"
xlink:actuate="onRequest"
>click here for the next file</myLink>
This link would, when activated in the user agent, replace the current
document with the myTarget.xml document.
Extended links are more syntactically complex, but allow
links with multiple endpoints. It is the XPointer specification that allows
the endpoint of a link to be at a finer granularity than document-level.
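To make the "any element can be a link" idea concrete, here is a minimal Python sketch of how a processor might discover simple links. It is only an illustration, not part of the tutorial: the notes, para, and figure elements and the diagram.png target are invented for the example, and the code only reads the XLink attributes without acting on them.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical document: the element names are made up for illustration.
# Only the xlink:* attributes decide whether an element is a link.
DOC = """\
<notes xmlns:xlink="http://www.w3.org/1999/xlink">
  <myLink xlink:type="simple"
          xlink:href="myTarget.xml"
          xlink:show="replace"
          xlink:actuate="onRequest">click here for the next file</myLink>
  <para>Plain text, not a link.</para>
  <figure xlink:type="simple" xlink:href="diagram.png"
          xlink:show="embed" xlink:actuate="onLoad"/>
</notes>
"""


def xlink_attr(element, name):
    """Return the value of the xlink:<name> attribute, or None if absent."""
    return element.get(f"{{{XLINK_NS}}}{name}")


def find_simple_links(root):
    """Collect every element, whatever its name, marked xlink:type="simple"."""
    links = []
    for element in root.iter():
        if xlink_attr(element, "type") == "simple":
            links.append({
                "element": element.tag,
                "href": xlink_attr(element, "href"),
                "show": xlink_attr(element, "show"),
                "actuate": xlink_attr(element, "actuate"),
            })
    return links


links = find_simple_links(ET.fromstring(DOC))
for link in links:
    print(link)
```

The lookup never tests the element name, which is the point: linking behavior comes from the attributes, so a DTD author is free to choose whatever element names suit the document.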
Participants in the XLink tutorial were clearly very excited about the
functionality offered, and Maler did a good job of presenting a lucid
picture of the XLink specification. While tool support is currently
limited, it is expected that the stabilization of the specification will
lead to many implementations. In particular, it is possible to experiment
with XLink interpretation, and to implement a reasonable amount of it,
using an XSLT style sheet.
For more information on XLink, see:
- XLink Requirements Document
- XLink Specification
Apache: XML Publishing Techniques
An energetic Pierpaolo Fumagalli, a developer with the Apache Cocoon
project, gave an interesting overview of the challenges that arise when developing
web sites with XML. With two years' experience developing and documenting
the Cocoon server (among other projects), Fumagalli has developed his own unique methods and techniques.
Fumagalli shared some of the particular problems encountered with Cocoon 1.
Cocoon is a server for parsing, transforming, and styling
XML. Typically the output files are HTML, although PDF and graphic formats
can also be produced. Many of the lessons learnt by Fumagalli and colleagues
have applicability over the spectrum of XML processing and publication
applications.
The first version of Cocoon had two major difficulties that limited its
usefulness. Foremost among these was its use of the DOM (a parsed tree model
in memory) to pass processed XML data from stage to stage. This led to
memory bloat and resultant
inefficiency. The second problem was inflexibility due to a one-to-one
association between the source XML and its transformation process. For
example, this meant it was impossible to render the same XML file as both
HTML and PDF.
The DOM problem is being countered by the use of two alternative techniques. The
first of these is the conversion to using SAX, a processing model that allows
streaming of events inside the server, rather than waiting for the parsing
of an XML document to complete. Unfortunately, SAX raised problems for Cocoon
where XSLT transformations were required. So, in addition, the developers are
working on a special DOM that can be read from while it is still being
built. Fumagalli demonstrated that careful construction of style sheets can
lead to better performance from the server with this method.
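The difference between the two models is easier to see in code. The following is a rough sketch using Python's standard xml.sax module, not Cocoon's actual implementation: the site, page, and title element names, the chunk boundaries, and the on_title callback are all assumptions made for the example. The parser is fed the document in pieces, and each title is handed to the next stage when its end tag is parsed, without a tree for the whole document ever being built.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Passes the text of each <title> element to a callback as it completes."""

    def __init__(self, on_title):
        super().__init__()
        self.on_title = on_title
        self._parts = None  # None means "not currently inside <title>"

    def startElement(self, name, attrs):
        if name == "title":
            self._parts = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._parts is not None:
            self._parts.append(content)

    def endElement(self, name):
        if name == "title" and self._parts is not None:
            self.on_title("".join(self._parts))
            self._parts = None


def stream_titles(chunks, on_title):
    """Feed the document to the parser piece by piece."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(TitleHandler(on_title))
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()


# Hypothetical input, split at arbitrary points as if read from a network.
CHUNKS = [
    b"<site><page><title>Home</title>",
    b"<body>Welcome</body></page><page><title>Do",
    b"cs</title></page></site>",
]

titles = []
stream_titles(CHUNKS, titles.append)
print(titles)
```

The handler only ever holds the text of the title it is currently inside, which is why this style avoids the memory growth of a full DOM. The cost is that a stage needing random access to the whole document, as an XSLT transformation generally does, cannot work from events alone.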
The second problem with the first version of Cocoon
was solved by the introduction of a mapping file,
which instructs Cocoon how to process and style the source XML in order to
produce the target HTML. The technology to do this is called "Stylebook," and
forms part of the Apache XML project.
Of perhaps more immediate use to those trying to build web sites with XML
today were Fumagalli's experiences with creating the Apache XML web site. He
has developed techniques to allow maximum flexibility in every stage: from
DTD design through to styling. By the application of two XSLT
transformations he is able to isolate DTD changes from style sheet changes.
Fumagalli has invented an intermediate DTD, which he calls "Graphic
Metalanguage," to sit between the original DTD and the output format (HTML
or PDF). One XSLT sheet is applied to transform the source XML into the
metalanguage, and another is then applied to transform it into the target format.
This means that if the original DTD changes (not an infrequent
occurrence in rapid development cycles), only the transform to the
metalanguage needs to be altered, rather than the style sheet for every
desired output format.
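The shape of this two-stage pipeline can be sketched in a few lines of Python. This is a loose illustration only: plain functions stand in for the two XSLT sheets, the article, headline, and para source vocabulary and the page, heading, and block intermediate vocabulary are invented here and are not the actual Graphic Metalanguage, and plain text stands in for a second output format such as PDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in an invented vocabulary.
SOURCE = "<article><headline>XLink</headline><para>Links for XML.</para></article>"


def source_to_intermediate(source_root):
    """Stage 1: the only code that knows the source vocabulary."""
    page = ET.Element("page")
    ET.SubElement(page, "heading").text = source_root.findtext("headline")
    for para in source_root.iter("para"):
        ET.SubElement(page, "block").text = para.text
    return page


def intermediate_to_html(page):
    """Stage 2a: renders the intermediate form as HTML."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = page.findtext("heading")
    for block in page.iter("block"):
        ET.SubElement(body, "p").text = block.text
    return ET.tostring(html, encoding="unicode")


def intermediate_to_text(page):
    """Stage 2b: renders the same intermediate form as plain text."""
    lines = [page.findtext("heading").upper()]
    lines += [block.text for block in page.iter("block")]
    return "\n".join(lines)


page = source_to_intermediate(ET.fromstring(SOURCE))
html_output = intermediate_to_html(page)
text_output = intermediate_to_text(page)
print(html_output)
print(text_output)
```

If the source vocabulary changed, say headline became title, only source_to_intermediate would need editing; both renderers would keep working untouched, which is the isolation the intermediate DTD is meant to buy.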
Pierpaolo Fumagalli's presentation was full of the energy that
characterizes the Apache XML developers, and it was fascinating to hear the
account of problems encountered and how they were solved.
For more information, see:
- The Apache XML Project
- Cocoon