
The Social Life of XML
by Jon Udell
December 23, 2003

[Photo: Jon Udell, Tim Bray, Norbert Mikula, Peter Chen, David Orchard,
Bob Sutor]
I recently found a picture of the panelists at the XML DevCon 2001
session entitled "The Importance of XML." My body language told the
story: I wasn't a happy camper. Of course I agreed with all the reasons
the panel thought XML was important: for web services, for interprocess
communication, and for business process automation. But I also thought
XML was important for a whole different set of reasons that weren't on
the conference's agenda. I thought XML was important for end-user
applications, for human communication, and for personal productivity. I
believed then, and I believe more strongly today, that it's a bad idea
to separate those two ways of using XML.
When you get right down to it, what's really so special about web
services? Is it distributed computing? Is it serialization and transfer
of complex data? We've been there and done those things, though it's
true that we didn't use to do them using cheap and ubiquitous XML
technologies. So, is service-oriented architecture the real
game-changer? Clearly a lot of us think so, and maybe we're right. But
I want to focus on something much more basic.
The really important thing, it seems to me, is the way the XML document
can become a shared construct, a tangible thing that processes and
people can pass around and interact with. On the one hand, an XML
document is the payload of a SOAP message that gets routed around on
the Web services network -- a payload that represents, for example, a
purchase order. On the other hand, an XML document is the form that
somebody uses to submit, or approve, or audit that purchase order. Now,
all of a sudden, these two documents are not only made of the same XML
stuff, they can literally be the same XML document.
When Tim Bray talks about the tribal history of XML, he says the
current focus on XML data wasn't foreseen by what he calls
"publishing-technology geeks" who thought they were building what he
calls the "smart-document format of the future." Maybe not, but I've
never been able to make much of a useful distinction between documents
and databases. For me every document is a database, and every database
is an assembly of documents. The "publishing-technology geeks" and the
"Database Divas" that Tim writes about may cling to their tribal
allegiances for a while longer, but the interbreeding experiment is
already a success. I can query any XML document, including the
slideshow that accompanies this talk, as if it were a database. And I
can absorb XML documents into relational databases in increasingly
granular and flexible ways. We're heading toward an extraordinary
convergence of documents and databases. But I'm not sure we're always
as clear as we could be about why this convergence is happening, or
what opportunities it presents. I don't think the fact that XML has its
roots in publishing is an accident -- or if it was, then it was a happy
accident.
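
As a small illustration of treating a document as a database, here is a minimal Python sketch that queries a slideshow with the XPath subset in the standard library's xml.etree.ElementTree. The markup is hypothetical: the slideshow, slide, title, and topic names are invented for this example and are not the format of the slides that accompany this talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical slideshow markup; every element and attribute name here
# is an assumption made for illustration.
SLIDES = """
<slideshow>
  <slide id="1"><title>The Social Life of XML</title><topic>intro</topic></slide>
  <slide id="2"><title>Documents and databases</title><topic>convergence</topic></slide>
  <slide id="3"><title>Querying documents</title><topic>convergence</topic></slide>
</slideshow>
"""

def titles_for_topic(xml_text, topic):
    """Return the titles of all slides whose <topic> child equals `topic`."""
    root = ET.fromstring(xml_text)
    # The predicate [topic='...'] plays the role of a WHERE clause.
    matches = root.findall(f".//slide[topic='{topic}']")
    return [slide.findtext("title") for slide in matches]

print(titles_for_topic(SLIDES, "convergence"))
```

The predicate does the job a WHERE clause would do in SQL, which is the sense in which the document behaves like a database.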
Let's imagine a purchase order flowing through a web services pipeline,
sometime in the near future. It's an XML document, perhaps created with
a tool such as InfoPath. The document carries core data elements -- an
item number, a department code. But it also carries contextual metadata
-- for example, a threaded discussion involving the requester, the
reviewer, and the approver. This context is the key to understanding
how the data got there and what it means.
Let's suppose Kathy, the department administrator, reminds Frank, the
CIO, that Paul, the marketing guy, is way overdue for a PC upgrade.
Frank pushes back: the budget is tight and something's got to give. So
Paul negotiates a deal: he'll give up the DVD burner if he can still have
the flatscreen he asked for. But since Paul is in marketing, and he
does sometimes have to burn DVDs, Frank tacks a DVD burner on to the
upgrade order for Marcia, who's also in marketing. But the deal is that
Marcia will have to share that DVD burner with Paul.
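
Here is a rough sketch, in Python, of what such a purchase order might look like if the negotiation traveled with the data. Everything about the markup is an assumption: the element names, the attribute names, the item number, and the department code are invented for illustration and are not what InfoPath or any real schema would produce.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order. Element names, attribute names, the item
# number, and the department code are all invented for this sketch.
PURCHASE_ORDER = """
<purchaseOrder>
  <item number="PC-0042" department="MKT">PC upgrade for Paul</item>
  <discussion>
    <message id="1" from="Kathy">Paul is way overdue for a PC upgrade.</message>
    <message id="2" from="Frank" inReplyTo="1">The budget is tight; something has to give.</message>
    <message id="3" from="Paul" inReplyTo="2">I'll give up the DVD burner if I can keep the flatscreen.</message>
    <message id="4" from="Frank" inReplyTo="3">Agreed. Marcia's order gets a DVD burner to share with Paul.</message>
  </discussion>
</purchaseOrder>
"""

def core_data(order):
    """The part a workflow engine cares about: item number and department."""
    item = order.find("item")
    return {"number": item.get("number"), "department": item.get("department")}

def thread(order):
    """The part a person cares about: who said what, in reply to which message."""
    return [(m.get("from"), m.get("inReplyTo"), m.text)
            for m in order.findall("discussion/message")]

order = ET.fromstring(PURCHASE_ORDER)
print(core_data(order))
for sender, reply_to, text in thread(order):
    print(sender, reply_to, text)
```

One document serves both audiences: a process reads core_data, and a person reads thread.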
Today this contextual narrative is mostly scattered across a bunch of
different email inboxes. It never finds its way into the operational
database, although it would be great if it did. That way, the next CIO
might have a shot at sorting out the environment that she inherits from
Frank. But there's more than archaeology at stake here. Documents,
including the purchase order and the messages related to it, aren't
just passive carriers of information. They're the warp on which we
weave a socially constructed reality. Somehow, we need to find ways to
connect that reality to the workflow and process orchestration systems
now being invented.
When I read the specs that define how these systems will work, I'm
struck most of all by their treatment of exceptions. Here's how the
BPEL 1.1 spec puts it: "The ability to specify exceptional conditions
and their consequences," it says, "is at least as important for
business protocols as the ability to define the behavior in the 'all
goes well' case." I agree. But when I read these computer-sciency
descriptions of compensation scopes and upward-chaining exception
handlers, I worry that we've left something important out of the
picture. In our example, the exception was thrown by Frank, who
asserted a veto for budgetary reasons. And it was handled by Paul, who
agreed to a negotiated compromise that enabled the transaction to go
forward.
This kind of scenario isn't an exception, if you'll pardon the pun.
It's the rule. Everyone has an agenda; every transaction is a
negotiation; and every outcome is a compromise. But the documents that
help us to articulate agendas, conduct negotiations, and assess
compromises don't exploit the contextual power of XML, and they aren't
being woven into the web services fabric. I think that's a problem. I
also think we can solve it without inventing huge amounts of new
technology. Common sense, basic tools, and some elbow grease can take
us a long way.
Of the various Microsoft slogans that have come and gone over the
years, two in particular have stuck with me. The first, from 1990, was
"information at your fingertips." In his Comdex speech that year, Bill
Gates laid out a vision that's still, frankly, pretty elusive. It
wasn't just about finding the information we're looking for, though
that did require a leap of imagination back before Internet search came
along and made it look easy. The premise of "information at your
fingertips" was also that we would empower knowledge workers to
interact with that information. These folks -- who we're now supposed
to call information workers, by the way, because knowledge
evidently sounds too elitist -- anyway, these folks aren't just passive
consumers of information, they're active creators of it. They need
tools to produce, combine, transform, analyze, and exchange lots of
different kinds of data, without tripping over differences in the
underlying formats or editing tools.
The solution proposed at the time was compound documents with embedded
active parts. Microsoft called this OLE; Apple, IBM, Novell, and Sun
called it OpenDoc. You don't hear much about OLE and OpenDoc any more,
and that's a shame because the problems they were meant to solve are
still very much with us. I'm glad to see that WSRP (Web Services for
Remote Portlets) is now tackling the problem from a web services
perspective. It's a really good idea to work out how markup fragments
-- and the machinery for interacting with those fragments -- can be
packaged up for use on the web services network.
Back in the last century, of course, the assumption was that
applications like Word and Excel were still going to control the data,
and retain their own proprietary ways of representing it. The OLE
interfaces would wake up chunks of that proprietary data for editing,
and then tuck them into bed again in a binary
file-system-within-a-filesystem. This wasn't exactly a recipe for
free-flowing data integration, but it sold some big fat programming
books.
A decade after the 1990 Comdex speech, the .NET platform was rolled out
with much celebration of XML as a universal data store, and with a new
slogan -- "the universal canvas" -- that I absolutely love. It's an
idea that makes intuitive sense to everyone. Science fiction writers
have always imagined what this would be like. The best demonstration of
the concept I've seen is a 1987 concept video produced by Apple, called
Knowledge Navigator. When I mentioned it on my weblog last month and
posted a link to the video, it attracted a huge amount of interest. We
all have a deep conviction that networked computers are supposed to
help us create and inhabit shared collaborative spaces where we can
fluidly manage relationships, create and reuse information, and conduct
business transactions.
Those transactions are governed by business protocols that we're
working hard to formalize and automate. I don't want to trivialize the
effort that's going to require. It's a deep problem and there's a lot
we still don't know. Take, for example, the question of schemas. Some
really smart people, including Jon Bosak, think we'll need a Universal
Business Language to connect business protocols across different
vertical-industry domains. Some other really smart people, including
Jean Paoli, are tackling the problem from the bottom up, on the
assumption that schemas need to emerge from specific practices before
they can be codified in the large. I'm sure there's no simple answer,
and I expect that both approaches will usefully coexist. But no matter
how this plays out, the schemas and protocols are just the skeletal
outlines of business processes. The flesh on the bones is the context
that we create as we participate in these processes.
Weblogs are arguably the best examples we have of XML connecting people
to other people in information-rich contexts. But while the glue that
holds the weblog universe together is an XML application called RSS,
it's really only a thin wrapper of metadata around otherwise opaque
content. The RSS payload typically isn't XML, it's escaped HTML -- a
practice that Norm Walsh calls an
abomination. I think Norm is right to say that. So my own RSS
payload, like a few others out there, includes namespaced XHTML. But
the gymnastics that I have to perform, in order to create that payload,
are another kind of abomination.
We've waited a long time for XML-aware authoring tools that fit easily
and naturally into the flow of the Web. Although this was the year in
which Microsoft shipped an XML-aware version of Office, the sad truth
is that it was still easier for me to create my presentation in Emacs,
rather than in PowerPoint, or Word, or InfoPath.
Having said that, InfoPath, in particular, does get a number of things
very right. It enables a relatively non-technical person to invent a
schema, create a form that captures information that's valid with
respect to that schema, and distribute the form to completely
non-technical people who can fill it with data. What's more, the form,
or document -- it's hard to know just what to call it -- has exactly
the dual nature I've been talking about. Its information payload can be
detached from a web services pipeline, edited offline by Kathy, emailed
to Frank, edited offline by Frank, and injected back into the web
services pipeline using email, or an HTTP postback, or a WSDL call.
Since InfoPath only runs on Windows, and isn't part of the basic Office
2003 kit, it's not on a path to ubiquity. But it's based on the same
standard technologies that I can use in Mozilla Firebird on Mac OS X:
XML, XPath, XSLT, CSS, JavaScript. I'm not suggesting that the browser
is the right hammer for every nail. Rather, it's one way to package a
set of standard XML-aware components. I'd love to see, among other
things, an InfoPath-like application built on the Mozilla platform.
The email client is another way of packaging those components. And
unless spam completely kills it, email is going to keep on being a
primary lubricant of our business processes. Email is where most of our
contextual information is created and exchanged, but where none of
XML's contextual power is brought to bear. Here, by the way, Microsoft
completely dropped the ball. The only Office 2003 application in which
users can't create and use XML content is Outlook. But that's precisely
where the need is greatest. Every day we ask questions about who said
what, to whom, in reference to what, in response to whom. Because none
of our routine written communication is well-formed, we fall back on
decades-old search and navigation strategies in order to find things.
And what we find is typically a mess. It's amazing to watch a
highly-paid professional spending billable time trying to untangle what
we like to call an "email thread," but what's really just a patchwork
quilt of mangled fragments with no discernible order, structure, or
attribution.
The problem with routine and casual use of well-formed content, of
course, is that the XML parser is designed to keep the barbarians at
bay. If the parser smells even a whiff of trouble, it slams the gate
shut. As well it should. We wouldn't be having a web services
revolution, right now, if we encouraged the kind of sloppiness that's
rampant on the Web. But we do need to find ways to make it easier for
the barbarians to become respectable citizens. We have these liberal
parsers that browser developers have spent untold effort creating,
parsers that can slurp up the worst kind of tag soup that comes pouring
out of HTML composers, or is written by hand. Maybe we can get more
mileage out of them.
It's easy to just dismiss the barbarians, but there are an awful lot of
them. They're creating and sharing tons of content that isn't
well-formed, but in many cases we could squint and pretend that it is,
just as browsers do. If we did that, we might be able to make the
information they create and exchange more useful to them, as they
perform the business scenarios we script for them. And we might also
be able to make the information more useful to us, as we try to manage
and debug those scenarios.
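
To show what "squinting" might mean in practice, here is a minimal Python sketch that uses the standard library's forgiving html.parser to turn tag soup into something a strict XML parser will accept. It is a crude stand-in for a browser's liberal parser, not an equivalent of one: the SoupToXml class, the tidy function, and the short VOID list are all assumptions of this example, and it makes no attempt to handle duplicate attributes or attribute names that aren't legal XML names.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Elements that never have content (a deliberately short, assumed list).
VOID = {"br", "hr", "img", "input", "meta", "link"}

class SoupToXml(HTMLParser):
    """Re-emit lenient HTML events as well-formed XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the well-formed result
        self.stack = []  # tags that are currently open

    def handle_starttag(self, tag, attrs):
        # A valueless attribute such as <input disabled> gets its own name as value.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: ignore it
        # Close everything opened since the matching start tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # bare & and < become entities

    def close(self):
        super().close()
        while self.stack:  # close whatever the author left open
            self.out.append(f"</{self.stack.pop()}>")

def tidy(soup):
    """Return a well-formed XML string wrapping the repaired fragment."""
    parser = SoupToXml()
    parser.feed(soup)
    parser.close()
    return "<div>" + "".join(parser.out) + "</div>"

soup = "<p>Budget is <b>tight<p>DVD burner & flatscreen</b>"
root = ET.fromstring(tidy(soup))  # the strict parser now accepts it
print("".join(root.itertext()))
```

The repaired nesting is not what a browser would produce, but it is well-formed, so the strict parser and everything built on it can go to work.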
I think that the combination of XHTML, CSS, and XPath adds up to a
fruitful opportunity, even at this late date. Back in 2001, at that
other convention I mentioned, somebody asked Tim Bray when XML would
replace HTML on the Web. Here was his answer:
"Nobody thought for a microsecond that HTML would be replaced, and I
don't think HTML will be replaced in my lifetime. It is the single most
successful document format ever devised for delivering information
electronically to humans. The population of computer users has voted
for it overwhelmingly. I like it, I use it, I can't see why you'd want
to stop using it."
I completely agree. And since we are going to keep on using HTML, it
behooves us to find smarter and better ways to use it. XHTML is one of
those smarter and better ways.
CSS is another. It strikes me as a really interesting opportunity to
smuggle metadata into documents. People who don't know or care about
metadata will nevertheless spend a lot of time fiddling with styles
because they care a lot about how their documents look. A friend of
mine, who's a teacher, told me that it takes her much longer to make
presentations now, in PowerPoint, than it used to when she wrote them
by hand on overhead-projector transparencies. There's a powerful human
urge to achieve the right style. So let's exploit that. Let's promote
packages of style tags that people will use just because they want to
look cool. That's the immediate payoff. They don't need to know that
those style tags are also hooks that make it easier to search for and
manipulate content. Then, let's give them XPath-enhanced document
viewers that do useful things with those hooks -- that cut down on the
hassle and frustration of finding and reusing stuff. There's nothing
earth-shattering here. It's just a modest proposal that aims to make
better use of the tools and technologies already in place. Given the
amount of hassle and frustration that's experienced by everyone on a
daily basis, though, it's the kind of thing that could add up to a big
payoff.
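
Here is a minimal Python sketch of that proposal: style hooks in an XHTML fragment doing double duty as search hooks. The class names (request, objection, decision, highlight) are hypothetical, chosen only for this example. Since the XPath subset in xml.etree.ElementTree has no contains() function, the sketch uses XPath to find elements that carry a class attribute and then checks the space-separated class tokens in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML note; the class names are invented for illustration.
NOTE = """
<div xmlns="http://www.w3.org/1999/xhtml">
  <p class="request">Paul is overdue for a PC upgrade.</p>
  <p class="objection">The budget is tight.</p>
  <p class="decision highlight">Paul gives up the DVD burner and keeps the flatscreen.</p>
  <p class="decision">Marcia's order gets a DVD burner, shared with Paul.</p>
</div>
"""

def text_with_class(xhtml, class_name):
    """Return the text of every element whose class list includes `class_name`."""
    root = ET.fromstring(xhtml)
    hits = []
    # XPath finds the styled elements; Python checks the individual class tokens.
    for element in root.findall(".//*[@class]"):
        if class_name in element.get("class").split():
            hits.append("".join(element.itertext()).strip())
    return hits

for line in text_with_class(NOTE, "decision"):
    print(line)
```

The author only ever chose a style; the viewer gets a way to pull every decision out of a pile of documents.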
It's also time to get serious about using XML to capture and represent
real-world context. The XML and web services communities are doing a
good job of reducing friction at the interface between processes and
data. I'm pretty sure we can solve that eventually because it's the
kind of problem that we, as technologists, are good at solving. We like
to think about protocols and formats.
I'm not sure we'll do such a good job of reducing friction at the
interface between people and data-driven processes. Success there
will require serious attention to how people connect with one another,
and with data, in information-dense, event-driven, networked
environments. That means thinking about "human factors" and the "user
experience" -- a couple of awkward phrases for things that we, as
technologists, are not very good at dealing with. We don't like to
think about habits, or agendas, or ways of thinking, or modes of
communicating.
Fortunately, there's all that publishing DNA floating around in the XML
community's gene pool. We've only got a few decades of experience with
networked information systems. But we've got a few millennia of
experience with documents. Let's use that to our advantage as we build
out service-oriented architectures in which documents are both payloads
and user interfaces. From a publishing perspective, we know a lot about
how to build documents that capture and hold attention, establish
historical and current contexts, and tell stories that help people
understand themselves in relation to those contexts. We need to draw on
all that publishing knowledge as we work out how to connect people to
data-driven processes.
Here's another idea. The emerging web services network is radically
open -- not only because the messages exchanged on that network are
XML, but also because the services are connected using pipelines. We
can inject intermediaries into those pipelines; the intermediaries can
observe and act on the messages. So we can acquire a lot of useful
context, and can implement useful policy, by reading and writing what
goes by on the wire. Things don't tend to work the same way on the
desktop, but maybe they could. Our personal productivity tools are in a
position to learn a lot about how we interact with remote services,
communicate with other people, and manage our data. And they're in a
position to help us do those things more effectively. But the messages
and events flowing on our local machines have nothing in common with
the messages and events flowing in the cloud.
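
The intermediary idea can be sketched in a few lines of Python. This is a toy, in-process model under stated assumptions, not SOAP or any real web services stack: the message format, the audit_log and spending_policy intermediaries, the status attribute, and the spending limit are all invented for illustration.

```python
import xml.etree.ElementTree as ET

def audit_log(log):
    """Intermediary that observes: record each message's tag and id."""
    def intermediary(message):
        log.append((message.tag, message.get("id")))
        return message
    return intermediary

def spending_policy(limit):
    """Intermediary that acts: flag orders whose total exceeds `limit`."""
    def intermediary(message):
        total = float(message.findtext("total", default="0"))
        if total > limit:
            message.set("status", "needs-approval")
        return message
    return intermediary

def run_pipeline(xml_text, intermediaries):
    """Parse a message and pass it through each intermediary in order."""
    message = ET.fromstring(xml_text)
    for step in intermediaries:
        message = step(message)
    return message

log = []
order = '<purchaseOrder id="po-17"><total>1800</total></purchaseOrder>'
result = run_pipeline(order, [audit_log(log), spending_policy(1500)])
print(log)
print(ET.tostring(result, encoding="unicode"))
```

Because every step reads and writes the same XML, a new observer or a new policy slots in without the endpoints knowing about it.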
For a long time I've thought that if we could bring these two worlds
closer together, we could achieve powerful synergies. The idea got a
boost recently when Microsoft revealed its plans for Indigo, the
communication subsystem in Longhorn. Indigo aims, among other things, to make
XML web services efficient for use across -- or maybe even within --
local applications. I invite you to think about what that could mean,
not only for Longhorn but for all platforms, and not only in three
years but also right now.