XML-related Activities at the W3C
by C.M. Sperberg-McQueen
January 03, 2001
Introduction
Anyone contemplating the intensity of XML-related work in the World
Wide Web Consortium (W3C) nowadays might be forgiven for finding it
hard to believe that XML 1.0 became a W3C Recommendation less than three
years ago. W3C Notes, Submissions from Member organizations, and draft
specifications in varying stages of completion define applications of
XML, rules for using XML in particular contexts, extensions to XML,
languages for processing XML documents, languages for declaring
XML-based languages, languages for querying collections of XML
documents -- all in ever-greater speed and profusion.
This article provides a brief survey of recent work and current
efforts on XML-related topics at W3C, as a sort of abbreviated annual
report to the XML community of what's going on.
The original goal of the XML Activity in W3C was, from one point of
view, merely to add a little more structure to documents on the Web
and provide a little more flexibility than HTML, or any single tag
set, could provide. From another point of view, the goal was to make
SGML (the Standard Generalized Markup Language defined by ISO) usable
on the Web by making a lighter-weight version of SGML (defined in 25
pages or less). This lightweight SGML would be easier for software
developers to implement and to embed in software whose main task was
something other than being an SGML processor. To achieve these goals,
the XML Working Group planned to define three languages: XML itself, a
linking language called XLink, and a stylesheet language called XSL,
roughly analogous to the ISO standards SGML, HyTime, and DSSSL.
The most important thing that has happened is that, with the wide
adoption of XML, the set of goals has expanded. XML is used not only
for natural-language documents, but for all kinds of information.
Database owners (and vendors) want to be able to expose arbitrary
relational and object-relational databases on the Web in XML form.
Application developers want to use XML for all kinds of information
interchange. Both would like better methods of defining specific XML
languages for use in applications. For all sorts of reasons it would
be useful to run queries against collections of XML-encoded
information. If XML is going to be used in electronic commerce or
similar applications, digital signatures are essential; to make the
signatures more robust, some explicit method of transforming an XML
document into a canonical form is desirable. And so on. Every new
application of XML leads to new requirements for standardization.
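Why a canonical form helps signatures can be shown with Python's standard library, which (from Python 3.8) includes xml.etree.ElementTree.canonicalize(). Note that it implements Canonical XML 2.0, a W3C specification considerably later than this article, so this is a sketch of the idea rather than of any method under discussion at the time. The element and attribute names are made up.

```python
import xml.etree.ElementTree as ET

# Two serializations of the "same" element that differ only in attribute
# order, quoting style, whitespace inside the tag, and empty-element syntax.
signed = '<order id="42"  currency="USD"/>'
received = "<order currency='USD' id='42'></order>"

# With no output stream given, canonicalize() returns the canonical form as a
# string: attributes sorted, double quotes, explicit start and end tags.
canon_signed = ET.canonicalize(signed)
canon_received = ET.canonicalize(received)

print(canon_signed)                     # <order currency="USD" id="42"></order>
print(canon_signed == canon_received)   # True
```

A signature computed over the canonical form survives these harmless re-serializations, whereas a signature over the raw bytes would not.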
So it's not surprising that XML-related work in the W3C now
occupies not merely one Working Group but several. It's even hard to
count the W3C Working Groups involved in XML. Any W3C specification
that defines a data format or tag set is likely, nowadays, to be using
XML. The most visible example to users is the reformulation of HTML as
an application of XML, which began with the release of the XHTML 1.0 specification as a
W3C Recommendation in January 2000. That reformulation continues with
work on a more modular formulation of (X)HTML, parts of which are now
complete (XHTML Basic
is now a Recommendation) or nearing completion (a spec on Modularization of
XHTML is now a Candidate Recommendation).
Other W3C specs that apply XML to specific application areas
include RDF (the Resource Description Framework), SMIL (the Synchronized
Multimedia Integration Language), SVG (Scalable Vector Graphics),
and P3P (the Platform for
Privacy Preferences).
Working Group Reports
W3C Process
A quick word
about W3C specifications: they begin as Working Drafts, and then are
published for public review as Last Call Working Drafts (think of the
"last call" shortly before the pub closes). When last-call issues have
been resolved, the spec is published as a Candidate Recommendation,
and software developers are invited to implement it. After
implementation experience has been gathered, the spec becomes a
Proposed Recommendation, and, if all goes well, after a period of
review by the membership, a W3C Recommendation. At any point, specs
may go back to an earlier stage of the process for further
modification and review.
Further information: W3C Process; W3C and the Web Community.
The basic framework within which XML applications can be built is
the responsibility of several Working Groups, most but not all of them
in the W3C's XML Activity (the W3C organizes technical work into
Activities, each of which may involve one or more Working Groups
developing specifications). The XML Activity is organizationally the
heir of the original single XML Working Group; it now comprises the
XML Core, XML Linking, XML Schema, and XML Query Working Groups.
Other Working Groups crucially involved with XML are the DOM (Document
Object Model), the XSL (Extensible Stylesheet Language), and the XML
Protocols Working Groups.
The XML Core Working Group published a Second Edition of the XML
1.0 spec in 2000, edited by Eve Maler of Sun Microsystems, and every
reader will be grateful for its clarifications and corrections. The
Core WG also released working drafts of XML Inclusions (XInclude) and
the XML information set ("infoset"). The first of these provides a way
of using XML element markup to embed objects or portions of a document
into a larger context. The second fills a gap left in the original
XML 1.0 specification by its failure to specify exactly what
information an XML processor is responsible for passing to a
downstream application. The XML Information Set specification provides
a concrete inventory of so-called information items and their
properties in an XML document, thus creating a somewhat cleaner formal
description of what counts as information in an XML document and what
doesn't. A possible future revision of the XML Namespaces
Recommendation and a document proposing a classification scheme for
describing XML processors are currently on a Core Working Group back
burner.
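To make the XInclude idea concrete, here is a minimal sketch using Python's xml.etree.ElementInclude module. It assumes the namespace http://www.w3.org/2001/XInclude, which is what Python supports and may not match the working draft discussed above; the file names and the in-memory PARTS store are hypothetical stand-ins for resources a real processor would fetch by URI.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory "files"; a real loader would retrieve these by URI.
PARTS = {
    "chapter1.xml": "<chapter><title>Getting started</title></chapter>",
    "notice.txt": "Draft; do not cite.",
}

def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", or plain text for parse="text"."""
    data = PARTS[href]
    return ET.fromstring(data) if parse == "xml" else data

book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter1.xml"/>'
    '<note><xi:include href="notice.txt" parse="text"/></note>'
    "</book>"
)

# Replace each xi:include element (at any depth) with what it points to.
ElementInclude.include(book, loader=loader)
print(ET.tostring(book, encoding="unicode"))
```

After inclusion, the chapter element sits where the first xi:include was, and the note element carries the included text as ordinary character data.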
The XML Linking Working Group recently achieved a hat trick by
releasing all three of their specifications as Candidate
Recommendations at the same time (and two of them, XLink and XML Base,
moved to Proposed Recommendation status on 22 December 2000). The XML
Linking Language (XLink) was one of the deliverables of the original
XML Working Group; it defines standard ways of linking among resources
which go well beyond the simple in-place two-ended unidirectional
links familiar from HTML. It allows the expression of more complex
links with arbitrary numbers of link ends and arbitrary locations.
These facilities have been part of working hypertext systems since the
1960s; now they can be part of the Web. XML Base generalizes the
HTML BASE facility to make it possible to specify a base URI for
interpreting relative URIs in any part of an XML document. The XML Pointer Language
(XPointer) defines a powerful notation for use in linking to XML
documents; it's an extension of the XPath notation familiar from XSLT.
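The XML Base rule (an xml:base attribute is resolved against the base URI inherited from its context, and then governs relative URIs on that element and its descendants) can be sketched in a few lines of Python. The ref elements and their unprefixed href attribute are hypothetical; an XLink-aware document would use xlink:href instead.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# xml:base lives in the predeclared XML namespace; ElementTree reports it
# under this expanded name.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolve_links(root, document_uri):
    """Return the absolute URI of every href, honoring nested xml:base."""
    resolved = []

    def walk(elem, base):
        # An xml:base here is itself resolved against the inherited base,
        # then applies to this element and everything inside it.
        if XML_BASE in elem.attrib:
            base = urljoin(base, elem.attrib[XML_BASE])
        href = elem.get("href")
        if href is not None:
            resolved.append(urljoin(base, href))
        for child in elem:
            walk(child, base)

    walk(root, document_uri)
    return resolved

doc = ET.fromstring(
    '<doc xml:base="http://example.org/manual/">'
    '<section xml:base="chapter2/"><ref href="figures/fig1.png"/></section>'
    '<ref href="index.html"/>'
    "</doc>"
)
for uri in resolve_links(doc, "http://example.com/start.xml"):
    print(uri)
# http://example.org/manual/chapter2/figures/fig1.png
# http://example.org/manual/index.html
```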
The XML Schema Working Group is currently dealing with comments
received on the Candidate Recommendation draft of the XML Schema
language. XML Schema is a metalanguage for defining XML tag sets and
applications. It provides functionality similar to that of XML 1.0
document type definitions (DTDs). It adds the ability to assign
datatypes (e.g. integer, calendar date) to elements and attributes,
explicit support for namespaces, and more powerful content models than
DTDs. XML Schema also uses XML, rather than an ad hoc notation, for
declarations; this means XML Schema documents, unlike DTDs, can be
processed by ordinary XML software instead of requiring special-purpose
tools.
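What datatype assignment adds over a DTD can be illustrated without a schema processor (Python's standard library has none). In the sketch below, a hand-written table, DATATYPES (a hypothetical name), stands in for two schema declarations; Python's int() and date.fromisoformat() (Python 3.7+) only approximate the lexical rules of xs:integer and xs:date, so treat this as an illustration, not a validator.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical stand-in for schema declarations such as
#   <xs:element name="quantity" type="xs:integer"/>
#   <xs:element name="shipDate" type="xs:date"/>
# int() and date.fromisoformat() only approximate xs:integer and xs:date
# (for example, time-zone suffixes on dates are not handled here).
DATATYPES = {"quantity": int, "shipDate": date.fromisoformat}

def typed_values(record):
    """Convert the text of each declared child element to a typed value."""
    values = {}
    for child in record:
        convert = DATATYPES.get(child.tag)
        if convert is not None:
            # Raises ValueError for content outside the datatype's lexical
            # space; a DTD declaring the element as (#PCDATA) would accept it.
            values[child.tag] = convert((child.text or "").strip())
    return values

good = ET.fromstring(
    "<item><quantity>12</quantity><shipDate>2001-01-03</shipDate></item>")
bad = ET.fromstring(
    "<item><quantity>twelve</quantity><shipDate>2001-01-03</shipDate></item>")

print(typed_values(good))
try:
    typed_values(bad)
except ValueError as err:
    print("rejected:", err)
```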
The XML Query Working Group is nearing the end of its systematic
assault on the problem of defining a query language for XML documents.
Having published a requirements document, specified a formal data
model, and then formulated a query algebra on top of the data model,
the Working Group will next turn its attention to syntax design for
the actual query language.
The requirements and data model have been out for some
time; the query algebra was published at the end of 2000. Public
comment is invited.
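Since the query language had no syntax yet, the sketch below uses plain Python and ElementTree to show the kind of operation it must express: iterating over elements in a collection, filtering them, and constructing new XML from what survives. It is neither XML Query syntax nor the published query algebra, and the bibliography data is made up.

```python
import xml.etree.ElementTree as ET

# A small, made-up bibliography standing in for a collection of XML documents.
bib = ET.fromstring("""
<bib>
  <book year="1994"><title>Book A</title></book>
  <book year="1999"><title>Book B</title></book>
  <book year="2000"><title>Book C</title></book>
</bib>
""")

def recent_titles(bib, after):
    """Iterate over books, keep those published after `after`,
    and construct a new element holding their titles."""
    result = ET.Element("recent")
    for book in bib.iter("book"):                  # iterate
        if int(book.get("year")) > after:          # filter
            title = ET.SubElement(result, "title")  # construct
            title.text = book.findtext("title")
    return result

print(ET.tostring(recent_titles(bib, 1995), encoding="unicode"))
# <recent><title>Book B</title><title>Book C</title></recent>
```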
The XSL Working Group is responsible for defining a stylesheet
language for XML documents. The first major part of their work became
available as a Recommendation about a year ago in the form of the XSL
Transformations (XSLT) language, which has rapidly become the
preferred means of transforming XML
documents. The second part, specifying a library of XSL formatting
objects, was published as a Candidate Recommendation in November;
the comment period runs through February 2001. XSL formatting objects
share many formatting properties with Cascading Style Sheets but provide a
richer set of typographic semantics.
The Document Object Model (DOM) Working Group also passed a major
milestone in 2000, issuing five Recommendations which together
define Level 2 of the DOM for XML documents. The Level 2 Core
defines a set of interfaces for creating and manipulating the
contents of a document; Level 2 Views allow software to dynamically
manipulate the representation of the document.
Other specifications define an event system, access to stylesheets,
and ways to define and traverse ranges of content in a document.
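For a feel of what "creating and manipulating the contents of a document" through DOM Core interfaces looks like, here is a sketch using Python's xml.dom.minidom, a minimal DOM implementation in the standard library (it does not provide the Events, Traversal, or Range modules). The memo, to, and body element names are hypothetical.

```python
from xml.dom.minidom import getDOMImplementation

# createDocument(namespaceURI, qualifiedName, doctype) is a DOM Level 2
# Core method; minidom accepts None for "no namespace" and "no doctype".
impl = getDOMImplementation()
doc = impl.createDocument(None, "memo", None)
memo = doc.documentElement
memo.setAttribute("date", "2001-01-03")

# Build content node by node instead of writing markup.
to = doc.createElement("to")
to.appendChild(doc.createTextNode("XML community"))
memo.appendChild(to)

body = doc.createElement("body")
body.appendChild(doc.createTextNode("Comments on drafts are welcome."))
memo.appendChild(body)

# Manipulate existing content: change the text inside <to>.
to.firstChild.data = "W3C members"

print(doc.toxml())
```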
The newest XML-related Working Group in the W3C is the XML
Protocols Working Group, chartered to create a simple foundation for
program-to-program communication using XML. They are currently
engaged in developing a requirements document and in surveying the
existing work in the field: SOAP, XML-RPC, WDDX, XMI, Jabber, ebXML,
and others.
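As a taste of XML-based program-to-program communication, the sketch below encodes and decodes an XML-RPC message, one of the existing formats being surveyed, not the Working Group's own output. It uses Python's xmlrpc.client.dumps() and loads(), which marshal values to and from XML without any network traffic; the method name inventory.check and its arguments are hypothetical.

```python
import xmlrpc.client

# Encode a remote procedure call as an XML-RPC request body.
request = xmlrpc.client.dumps(("widget-42", 3), methodname="inventory.check")
print(request)

# The receiving program decodes the XML back into native values.
params, method = xmlrpc.client.loads(request)
print(method, params)

# A response is encoded the same way, flagged with methodresponse=True;
# it must carry exactly one value.
response = xmlrpc.client.dumps((True,), methodresponse=True)
print(xmlrpc.client.loads(response))
```

Both sides need only agree on the XML vocabulary, not on programming language or platform, which is the property a common protocol foundation aims to preserve.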
This quick survey of XML-related work at the World Wide Web
Consortium has scarcely done more than list the various kinds of work
going on. For more information, as always, consult the W3C Web site, the W3C XML home page, and in particular
the W3C Technical Reports page. W3C
specifications are published early and often for public review. By
participating actively and commenting on drafts, you can have an
influence on the future of XML, and help W3C lead the Web to its full
potential.