When you need to store and display a modest amount of structured or
semistructured data, it's tempting to store it directly in an HTML
file. I've used this strategy many times; undoubtedly you have too. The
advantages and disadvantages of working directly with a presentation
format are pretty
clear. It's handy that the "database" is a self-contained package that
can be updated using any text editor, emailed, read directly from a file
system, or served by any web server. But it's awkward to share the work
of updating with other people or to isolate and edit parts of the file
as it grows. When we convert to a database-backed web application in
order to solve these problems, we trade away the convenience of the
file-oriented approach. Can we have our cake and eat it too? This
month's column explores the idea that a complete web application can be
wrapped around an XHTML document, using XSLT for search, insert, and
update functions.
I've been developing the idea in the context of the Zope application
server, so the first order of business was to come up with an XSLT wrapper
for Zope. Since Zope is written in Python, my first inclination was to
use a Python binding to an XSLT library. But which one? libxslt is a
popular choice, but on one particular FreeBSD system -- where I lack the
root privileges needed to install libxslt -- I use Sablotron instead. On
Windows, meanwhile, MSXML is the incumbent. So I settled on the more basic
strategy of wrapping a command-line XSLT processor -- such as libxslt's
xsltproc, Sablotron's sabcmd, or MSXML's
msxsl.exe -- in a Zope method. Because this method calls OS
functions to create and remove temporary files, it has to be deployed in
Zope as an External Method rather than a Python Script. To do that, I put
the code in a file called xslt.py, put the file in Zope's
Extensions directory, added an External Method called 'xslt' to a folder,
and used 'xslt' as both its module name and function name.
The method has three required arguments and an optional fourth, but
the first argument, self, is supplied automatically by Zope. It's the
folder from which the method is called and, through the magic of Zope
acquisition, it can be any folder below the one containing the External
Method. The second argument is the XSLT data which, as we'll see, is
produced by other scripts that interpolate values into templates. The
third argument is the name (in Zope lingo, the id) of the Zope File object
containing the XML data to be transformed. In this case, that File has an
.html extension, contains XHTML, and is served with a
text/html content type. The optional fourth argument, update,
defaults to false, but when true causes the XSLT transformation to
overwrite the XML data.
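The shape of such a wrapper can be sketched in modern Python. This is a minimal sketch, not the column's xslt.py: the function name run_xslt and its command parameter are assumptions for illustration, and the Zope-specific parts (self, the File lookup, and the update flag) are left out. It also assumes a processor such as xsltproc that takes the stylesheet path followed by the document path and writes the result to standard output.

```python
import os
import subprocess
import tempfile


def run_xslt(xsl_text, xml_text, command=("xsltproc",)):
    """Run a command-line XSLT processor over two strings.

    Hypothetical helper: `command` is the processor invocation, to which
    the stylesheet path and then the document path are appended.
    """
    paths = []
    try:
        # Write the stylesheet and the document to temporary files,
        # because command-line processors read their input from disk.
        for text, suffix in ((xsl_text, ".xsl"), (xml_text, ".xml")):
            fd, path = tempfile.mkstemp(suffix=suffix)
            paths.append(path)
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(text)
        result = subprocess.run(
            [*command, *paths],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit
        )
        return result.stdout
    finally:
        # Remove the temporary files whether or not the processor succeeded.
        for path in paths:
            os.remove(path)
```

Other processors may expect a different argument order, so the command construction would need adjusting for them. An update step would then write the returned string back to the File object.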
To set the stage, let's suppose we're collecting and displaying data
about speakers at a conference. Here's the shell of our XHTML data:
And let's assume that we're dealing with multiple conferences, so the
Zope namespace looks like this:
/Conferences/OSCON
/Conferences/ETech
Our xslt External Method, installed in the /Conferences
folder, can be acquired by any subfolder, as can the other scripts we'll
use to add, find, and update speaker data. If the data are stored in a
file called speakers.html, there can be multiple instances of it -- for
example, /Conferences/OSCON/speakers.html and
/Conferences/ETech/speakers.html.
Now let's add a speaker to /Conferences/OSCON/speakers.html. This
script, called add, kicks off the process:
The add script is a Python Script, not an External Method,
which means that it's subject to security restrictions but is more
convenient to update. It's located in /Conferences, but when called as
/Conferences/OSCON/add it sets up a context that will cause
/Conferences/OSCON/speakers.html to be updated. The script simply
displays a form that collects the speaker's email address -- which will
serve as the key into our XHTML database -- and passes it (by way of
JavaScript) to another Python Script, insert:
In a Zope Python Script, all the interesting stuff hangs off the context
variable. In this case, we'll use it to get to the HTTP request with the
caller's form data, to locate some convenience scripts that supply XSLT
boilerplate, and to locate our xslt External Method.
The XSLT script that's created is a filter for speakers.html. It
locates the <speakers> node in that file. If no <speaker> node with
the given email address exists, it inserts one. The XSLT identity
transform, a template that copies every node and attribute it matches,
passes the rest of the XML data through the filter unchanged. When the
insert script makes this call:
context.xslt(xsl, 'speakers.html', update=1)
the xslt External Method receives an implicit first argument, self,
which represents the context folder, in this case /Conferences/OSCON. It
uses that handle three times:
self.findFileInFolder
Convert the name (e.g. speakers.html) to a ZODB object reference. The
findFileInFolder function is:
# Return the File object in this folder whose id matches fname, or None.
for f in context.objectValues(['File']):
    if f.getId() == fname:
        return f
return None
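The filter that insert hands to xslt could be generated along these lines. This is a sketch under assumptions, not the column's script: the function name build_insert_stylesheet, the @@EMAIL@@ placeholder, and the empty speakerName, speakerTitle, and speakerBio stubs are illustrative. It interpolates the email address into a template that contains the identity transform plus one template that overrides it for the speakers node.

```python
from xml.sax.saxutils import escape

# Hypothetical template; @@EMAIL@@ is replaced with the escaped key.
INSERT_TEMPLATE = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="email">@@EMAIL@@</xsl:variable>

  <!-- Identity transform: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override for speakers: add a stub first unless the key exists. -->
  <xsl:template match="speakers">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="not(div[@class='speaker'][@email=$email])">
        <div class="speaker" email="{$email}">
          <div class="speakerName"/>
          <div class="speakerTitle"/>
          <div class="speakerBio"/>
        </div>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
"""


def build_insert_stylesheet(email):
    """Return the insert filter for one speaker, keyed by email address."""
    # Escape &, <, and > so the address is safe as XML text content.
    return INSERT_TEMPLATE.replace("@@EMAIL@@", escape(email))
```

Passing the address through xsl:variable means it is escaped once, as text, rather than spliced into an XPath expression. Placing the stub before the existing children matches the behavior in which new speakers push down the older ones.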
After the update, speakers.html might look like this:
<?xml version="1.0"?>
<body>
  <style>
    .speaker { margin-bottom: 10px }
    .speakerName { font-weight: bold }
    .speakerTitle { font-style: italic }
  </style>
  <speakers>
    <div class="speaker" email="dj.adams@pobox.com">
      <div class="speakerName">
        DJ Adams
      </div>
      <div class="speakerTitle">
        SAP hacker
      </div>
      <div class="speakerBio">
        <p>
          DJ Adams is an old SAP hacker who still thinks JCL and S/370 assembler
          is pretty cool. In recent years he's been successfully combining Open
          Source software with R/3 to produce hybrid systems that show off the
          power of free software.
        </p>
        <p>
          He is the author of O'Reilly's <a
          href="http://www.oreilly.com/catalog/jabber/"><i>Programming
          Jabber</i></a>, contributes <a
          href="http://www.oreillynet.com/pub/au/139">articles</a> to
          O'ReillyNet's P2P site, and has to own up to being responsible for the
          Jabber::Connection, Jabber::RPC and Jabber::Component::Proxy modules
          on CPAN.
        </p>
      </div>
    </div>
  </speakers>
</body>
As new speaker nodes are added to the file, they push down the older
ones.
In this naive implementation, there's no effort to sort the nodes stored
in the XHTML file. But here's another script, find, that uses
XSLT to produce an HTML SELECT element sorted by speakers' email
addresses. The selected item is fed to the select script for
updating.
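A stylesheet for the find step might look like the following. It is a sketch under assumptions: the function name build_find_stylesheet, the @@FIELD@@ placeholder, and the default form field name email are illustrative, not the column's code. The essential piece is xsl:sort applied to each speaker's email attribute.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical template; "@@FIELD@@" is replaced with the quoted field name.
FIND_TEMPLATE = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Emit one option per speaker, ordered by the email attribute. -->
  <xsl:template match="/">
    <select name="@@FIELD@@">
      <xsl:for-each select="//speakers/div[@class='speaker']">
        <xsl:sort select="@email"/>
        <option value="{@email}">
          <xsl:value-of select="@email"/>
        </option>
      </xsl:for-each>
    </select>
  </xsl:template>

</xsl:stylesheet>
"""


def build_find_stylesheet(field_name="email"):
    """Return a stylesheet that lists speakers as a sorted SELECT element."""
    # quoteattr() supplies the surrounding quotes and escapes the value.
    return FIND_TEMPLATE.replace('"@@FIELD@@"', quoteattr(field_name))
```

Because this stylesheet only reads the data, it would be run without the update flag, and its output embedded in a form.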
As speakers are added and updated, the speakers.html file remains
immediately viewable in the browser. The file can also be searched in a
structured way, using the technique I explored last month. Here, for
example, is a query that finds speakers whose
biographies contain 'JCL':
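The same kind of search can also be expressed outside XSLT. What follows is a stand-in written with Python's standard ElementTree, offered as a sketch under assumptions: the function name speakers_mentioning is hypothetical, and the element and class names follow the sample speakers.html shown earlier.

```python
import xml.etree.ElementTree as ET


def speakers_mentioning(xhtml_text, term):
    """Return the email keys of speakers whose bio text contains term."""
    root = ET.fromstring(xhtml_text)
    matches = []
    for speaker in root.iter("div"):
        if speaker.get("class") != "speaker":
            continue
        for bio in speaker.findall("div"):
            if bio.get("class") != "speakerBio":
                continue
            # itertext() gathers text from nested p, a, and i elements too.
            if term in "".join(bio.itertext()):
                matches.append(speaker.get("email"))
                break
    return matches
```

In XPath 1.0 terms this corresponds roughly to //div[@class='speaker'][contains(div[@class='speakerBio'], 'JCL')].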
Is this really a practical way to manage a collection of
semistructured data? Frankly, I'm undecided. But it's an interesting
preview of how things will be when native XML storage, and node-level
update capability, are standard features of all databases. Meanwhile, the
ability to use Python to generate and run XSLT transformations, in a Zope
context, seems like a useful pattern.