XML on the Move
by Edd Dumbill
February 21, 2001
On the first day of XML DevCon Europe in London, England, speakers
highlighted the growth of XML in its three years of existence. Henry
Thompson from the University of Edinburgh (and zealous editor of the
W3C's XML Schema specification) noted in his opening keynote that XML
had grown from one specification to a family of technologies. He
focused on the emerging centrality of the XML Infoset and XML
Schema. David Orchard of Jamcracker taught a session on web services,
XML, and UDDI. Despite XML's growth in the area of program-to-program
communication, there's still much to build.
Infoset Pipelining
In surveying the growth of XML specifications such as XPath and
DOM, Henry Thompson observed that they each had to supply what XML 1.0
was missing: a data model. But these separate data models were
different, hence the development of the XML Information Set
specification, which specifies a data model for XML documents and is
meant to provide a reference model for further XML specifications.
The Infoset is essentially a distillation of the vital parts of an
XML document after it's been parsed. According to Thompson, the
Infoset leaves out the "uninteresting" parts of a document, such as
whether attribute values use double or single quotes, the amount of
whitespace outside of elements, and whether empty elements are written
with one tag or two. Of course, Thompson noted, XML editing
applications need to know these things, but XML processing
applications don't.
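To make the distinction concrete, here is a minimal sketch in Python. It uses the standard library's ElementTree as a rough stand-in for an infoset (it is not a conforming Infoset implementation), and the `summarize` helper and the sample documents are invented for illustration. Two serializations that differ only in quote style, empty-element syntax, and whitespace outside the document element reduce to the same structure.

```python
import xml.etree.ElementTree as ET


def summarize(elem):
    """Reduce an element to the parts a processing application cares about."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),   # attribute order and quoting are gone
        elem.text or "",               # character data inside the element
        elem.tail or "",               # character data following the element
        [summarize(child) for child in elem],
    )


# Same information, different surface syntax (hypothetical sample data).
doc_a = '<order id="42"><item sku="A1"/><note>rush</note></order>'
doc_b = "\n<order id='42' ><item sku='A1'></item><note>rush</note></order>\n\n"
# Genuinely different content, for contrast.
doc_c = '<order id="42"><item sku="A1"/><note>slow</note></order>'

same_ab = summarize(ET.fromstring(doc_a)) == summarize(ET.fromstring(doc_b))
same_ac = summarize(ET.fromstring(doc_a)) == summarize(ET.fromstring(doc_c))
```

Note that whitespace inside element content is still data and would show up in the comparison; only the surface details Thompson listed disappear.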
Systems that operate on XML documents can be thought of as
processing pipelines for infosets. When a document is parsed, an
infoset is created, which may then be validated against a schema,
after which the infoset is augmented with type information. The
resulting infoset is called the "post schema validation infoset" or
PSVI. The infoset may then have an XSLT transform applied to it,
finally being serialized back to XML. In this world, the XML documents
we are all used to, angle brackets and all, become merely hosts for
the propagation of infosets.
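The pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Thompson's design: the `TYPES` table is a hand-written stand-in for an XML Schema, recording types as attributes is only a crude imitation of the PSVI, and `transform` is a plain function standing in for an XSLT stylesheet. All stage names are invented here.

```python
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical stand-in for a schema: element name -> expected Python type.
TYPES = {"quantity": int, "price": float}


def parse(text):
    """Characters in, tree out."""
    return ET.fromstring(text)


def annotate_types(root):
    """Stand-in for schema validation: check typed elements, record the type."""
    for elem in root.iter():
        expected = TYPES.get(elem.tag)
        if expected is not None:
            expected(elem.text)                  # raises ValueError if invalid
            elem.set("type", expected.__name__)  # crude imitation of the PSVI
    return root


def transform(root):
    """Stand-in for an XSLT step: append an order total."""
    total = sum(
        int(item.findtext("quantity")) * float(item.findtext("price"))
        for item in root.iter("item")
    )
    ET.SubElement(root, "total").text = f"{total:.2f}"
    return root


def serialize(root):
    """Tree in, characters out."""
    return ET.tostring(root, encoding="unicode")


def run_pipeline(text, stages=(parse, annotate_types, transform, serialize)):
    """Feed the output of each stage into the next."""
    return reduce(lambda value, stage: stage(value), stages, text)


result = run_pipeline(
    "<order><item><quantity>2</quantity><price>1.50</price></item></order>"
)
```

Only the first and last stages touch angle brackets; everything in between works on the parsed structure.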
Thompson emphasized the usefulness of pipelines as originally
implemented in Unix. Large systems can be composed from simple,
modular subsystems in this fashion. Whereas Unix pipelines are "thin",
passing only character streams, XML Infoset pipelines are "fat", as
they pass structured data from process to process.
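The difference between "thin" and "fat" pipes can be shown with a small Python sketch. The record format, element names, and stage functions are all invented for illustration. The thin stage has to rediscover the structure from characters on every pass, while the fat stage is handed the structure directly.

```python
import xml.etree.ElementTree as ET

RECORDS = "alice:30\nbob:25\n"  # hypothetical character-stream input


def thin_stage(text):
    """Unix-style stage: split lines and fields again, then re-emit text."""
    out = []
    for line in text.splitlines():
        name, age = line.split(":")
        out.append(f"{name}:{int(age) + 1}")
    return "\n".join(out) + "\n"


def fat_stage(root):
    """Infoset-style stage: operate on already-parsed elements."""
    for person in root.iter("person"):
        person.set("age", str(int(person.get("age")) + 1))
    return root


# Two thin stages in a row: the text is parsed and re-serialized twice.
thin_result = thin_stage(thin_stage(RECORDS))

# Two fat stages in a row: the document is parsed once, up front.
people = ET.fromstring(
    '<people><person name="alice" age="30"/><person name="bob" age="25"/></people>'
)
fat_result = fat_stage(fat_stage(people))
```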
Concluding his talk, Thompson encouraged us to think of XML
applications in terms of infosets and pipelines of infosets. He
stressed the importance of using XML Schemas to facilitate mapping
between data structures and XML documents, referring to efforts like
the Schema Adjunct Framework.
Web Services -- Not As Mature As You'd Think
David Orchard from Jamcracker gave a talk on the world of web
services, XML, and UDDI. In a change from the usual breathless hype
about the future of web services, Orchard began by promising to
include some nay-saying as well as hype. Explaining that "web
services" was simply the "sexy" name for XML over HTTP, he outlined
the reasons why web services were taking off. Apart from the
well-known advantages of XML, Orchard outlined the benefits of HTTP: a
robust operational architecture; a simple, mature infrastructure;
reliability and scalability; standards-based; offering "good enough"
performance; and well-defined APIs for application programming
(e.g. Java servlets).
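Taken at face value, "XML over HTTP" needs very little machinery. The following Python sketch builds such a request with the standard library's `urllib.request` but does not send it. The endpoint URL and the `getQuote` vocabulary are made up for illustration and do not refer to any real service.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the .invalid domain is reserved and never resolves.
ENDPOINT = "http://services.example.invalid/quotes"


def build_request(symbol):
    """Wrap a small XML document in an HTTP POST request."""
    root = ET.Element("getQuote")              # invented request vocabulary
    ET.SubElement(root, "symbol").text = symbol
    body = ET.tostring(root, encoding="utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_request("XYZ")
# Sending it would be urllib.request.urlopen(request); omitted here because
# the endpoint is fictional.
```

Everything else Orchard credits to HTTP (proxies, load balancing, servlet-style APIs) comes from existing infrastructure rather than from anything in the request itself.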
Orchard described the general features of web service protocols and
then moved on to UDDI, the new kid on the block since he last gave
this talk, about a year ago in New York. UDDI (Universal Description, Discovery
and Integration) provides directory services for businesses offering
web services. UDDI is unusual in the current environment of XML
specification development: It's managed by a closed, independent
consortium. The intention is ultimately to give the UDDI
specifications to a standards body, but Orchard suggested that the
consortium has stayed closed so far because its work is considered too
changeable or unstable to invite public participation.
He also raised doubts about the current maturity of UDDI, and its
ability to perform in real-world deployment situations. He highlighted
its lack of distributed queries and a generic query syntax; the
uncertainty about whether replication would scale; the lack of support
for versioning; and, in particular, the weakness of the descriptive
power of UDDI's tModel metadata structure. In summary, Orchard said
that UDDI was an "interesting experiment" and had about a "50% chance
of success."
Though a confirmed supporter of web services, Orchard also
presented an honest view of the current web services world. He
concluded by identifying some areas that need improvement.
- Server programming model: whereas more
established server technologies have defined server-side programming
models (for instance, the Java Servlet interface), web services
currently lack a standard API for binding language implementations
to service requests. This interface also needs to solve problems
such as storing session information or specifying whether the XML
will be passed as DOM or a character stream.
- Request in one document: Orchard said that
mixing content and header/routing information in one document is
problematic. When a header is mixed into the document, it makes it
difficult to mix in custom content -- for instance, the content DTD
might preclude the presence of header elements. Although he noted
the work on SOAP attachments, Orchard commented that this needs
further attention.
- XML not intended for B2B: noting that XML was
originally designed with "write once, view anywhere" in mind,
Orchard complained that insufficient attention has been given to the
use of XML on the server side. Only now, with the XML Protocols
Activity, is the W3C starting to give attention to it. One comment
he made was that XML needs something similar to Sun's J2EE label, a
designation of a "Unified XML" which denotes support for a certain
set of XML specifications. To say only that a product is XML-enabled
is largely meaningless.
- Security: Little work has yet been done on
security with technologies like SOAP. One problem in this space is
how to pass a message body securely while retaining routing
information. SOAP through port 80 also makes the firewall
administrator's life more difficult. Orchard observed, however, that
this seems to be inevitable.
- Missing functionality: Finally, Orchard
observed that there were many features from more mature
infrastructures still missing. From the world of distributed
objects he identified type-safety, discovery, versioning, service
metadata, and object activation. From message-oriented middleware he
noted security, transactions, guaranteed delivery and asynchronous
operation. Work is underway on some of these but isn't yet at the
deployable stage.
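The "request in one document" concern from the list above can be illustrated with an envelope that keeps routing data apart from the payload, so the payload's own vocabulary never has to allow header elements. This Python sketch uses the SOAP 1.1 envelope namespace; the routing namespace, the `to` element, and the `wrap` and `unwrap` helpers are assumptions made for illustration, not part of SOAP or of anything Orchard proposed.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
ROUTING = "urn:example:routing"                         # hypothetical namespace


def wrap(payload, destination):
    """Put routing data in the Header and the untouched payload in the Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    ET.SubElement(header, f"{{{ROUTING}}}to").text = destination
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return envelope


def unwrap(envelope):
    """Return (destination, payload) so each can be handled separately."""
    destination = envelope.findtext(f"{{{SOAP_ENV}}}Header/{{{ROUTING}}}to")
    payload = envelope.find(f"{{{SOAP_ENV}}}Body")[0]
    return destination, payload


invoice = ET.fromstring("<invoice><amount>10.00</amount></invoice>")
message = wrap(invoice, "billing")
destination, payload = unwrap(message)
```

The header and body still travel in a single document here, which is exactly the arrangement Orchard felt deserved further attention.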
The take-home message was that web services are indeed an exciting
area of development, but that if you want to deploy a significant
web service right now, there is quite a lot of infrastructure work
you'll need to do for yourself.