Table of Contents
The Search for Intelligent Life
Approaches and Applications
The Free-For-All in Montreal
The Next Round
GCA's Extreme Markup Languages 2000, held in August in Montreal, was
billed as "not frisbee, not skateboarding, not lounging -- just
no-holds-barred tech talk," and so it was. It was the kind of
conference where the code and ideas presented in the sessions were
light years ahead of any commercial widgets that one can actually buy.
Extreme Markup offered the best mix of geekdom and academia, sort of
"geeks in tweed." One might be amused by attendees' quaint ways and
love of abstraction, if they were not the same people who built,
promoted, and implemented SGML, XML, XSLT, and the DOM.
While it might seem a world away from business architectures to
ponder whether "The Descriptive/Procedural Distinction is Flawed" [1], keep in mind that
such digging has fueled the creation of markup language technology
thus far. If we are to move further toward a truly interoperable,
semantic Web, these tweedy techies are the "extremists" who will take
us there. While there were many areas explored at the conference --
including general n-tiered XML architectures, end-user friendly tools
for writers and designers, neat stuff to do with DTDs, groves, and the
DOM, and XML-izing Eiffel -- the preponderance of papers and
discussions zeroed in on one mission: the search for intelligence
and meaning in markup.
The Search for Intelligent Life
XML has to date achieved a degree of syntactic, but not semantic,
interoperability. On the Web, you still can't find what you need
easily, and when you do, you may not know what it means. Grammars,
that is, DTDs and schemas, don't supply meaning any more than the
Elements of Style can tell you the size, shape, and location of
a certain white whale. (The draft W3C schemas do include a type
system, a necessary but not sufficient component of "meaning." W3C
schemas figured remarkably little in discussion, although a
pre-conference tutorial was well attended.)
As Jeff Heflin and James Hendler put it, "To achieve semantic
interoperability, systems must be able to exchange data in such a way
that the precise meaning of the data is readily accessible and the
data itself can be translated by any system into a form that it
understands." [2] The
problem is that XML itself has, by design, no semantics. Or, as John
Heintz and W. Eliot Kimber put it, DTDs constrain syntax, not data
models: they don't capture abstraction across models, but are simply
an implementation view of a higher abstraction [3].
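To make the point concrete, here is a minimal Python sketch. The dictionary `CONTENT_MODEL` is a hypothetical stand-in for a DTD content model, and the two `offer` documents are invented. Both pass the same structural check, yet nothing in the grammar records whether a price is stated in dollars or in cents.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD content model: element name -> required child sequence.
# Like a DTD, it constrains which elements appear where, and says nothing about meaning.
CONTENT_MODEL = {"offer": ["item", "price"]}

def structurally_valid(xml_text: str) -> bool:
    """True if every element named in CONTENT_MODEL has exactly its required children, in order."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        required = CONTENT_MODEL.get(elem.tag)
        if required is not None and [child.tag for child in elem] != required:
            return False
    return True

# Two invented documents: one sender means dollars, the other means cents.
dollars_offer = "<offer><item>harpoon</item><price>10.00</price></offer>"
cents_offer = "<offer><item>harpoon</item><price>1000</price></offer>"

for doc in (dollars_offer, cents_offer):
    # Both are accepted; the unit exists only in the senders' heads, not in the markup.
    print(structurally_valid(doc), ET.fromstring(doc).findtext("price"))
```

A receiver that validates both documents learns only that each is a well-formed offer; agreeing on what `price` means requires something beyond the grammar.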
In fact, the structural definition that XML can supply isn't even
a universally adequate representation of the structure of text [4].
The ability of current markup applications to convey meaning was
systematically shredded in "The Meaning and Interpretation of Markup"
[5], which claimed that the ordered hierarchy of content objects
(OHCO) is not only an insufficient description of structure but also
an inadequate rack on which to hang semantic vestments.
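One frequently discussed limitation of a single ordered hierarchy is overlap: verse lines and sentences, for example, each form a perfectly good hierarchy, but the two often cross. The Python sketch below uses an invented two-line text, records each structure as character offsets, and reports the pairs that could not both be elements in one XML tree. It is a simplified illustration of the overlap problem, not a summary of the argument in [5].

```python
from itertools import combinations

def properly_nested(a, b):
    """Spans are (start, end) offsets, end exclusive. Nested or disjoint spans fit in one tree."""
    (a0, a1), (b0, b1) = a, b
    disjoint = a1 <= b0 or b1 <= a0
    contains = (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1)
    return disjoint or contains

def overlaps(spans):
    """Return the label pairs whose spans cross, i.e. cannot both be elements of one XML tree."""
    return [(la, lb) for (la, a), (lb, b) in combinations(spans.items(), 2)
            if not properly_nested(a, b)]

# Invented two-line text in which the second sentence runs across the line break.
text = "The ship sailed. At dawn\nthe whale rose."

def span(fragment):
    start = text.index(fragment)
    return (start, start + len(fragment))

spans = {
    "line1": span("The ship sailed. At dawn"),
    "line2": span("the whale rose."),
    "sentence1": span("The ship sailed."),
    "sentence2": span("At dawn\nthe whale rose."),
}
print(overlaps(spans))  # [('line1', 'sentence2')]
```

Each hierarchy is fine on its own; it is their combination that no single tree can hold.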
So what is an implementor to do?
Approaches and Applications
The conference program was rich in reports of real-world,
large-scale implementations actively engaged in the search for
meaning, and they were not all focused on Topic Maps or RDF --
although these specs (ISO and W3C respectively) were the most
prevalent forms of semantic representation addressed. One paper
described the mapping to abstract data structures in the Perseus
Project, where the semantic layer is used to manage a large digital
library [6]. Another paper described an XML-based AI language for
the Web developed at the University of Maryland [2].
Kimber and Heintz described in detail their use of UML as the
semantic and structural constraint language for XML instances. The
model does double duty: it is an ideal way to communicate with users
and implementors, and it integrates the document model into overall
system design. UML is not only a great GUI for a document model; its
packaging mechanism also allows document types to be modularized.
Tools and standard design methods are readily available as well.
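As a rough illustration of the idea, and not Kimber and Heintz's actual mapping, the sketch below encodes a small, hypothetical class model of the kind one might draw in UML, with a multiplicity on each association end, and checks an XML instance against it. Encoding the model as Python data, and the class names `Book`, `Chapter`, `Title`, and `Para`, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class End:
    """One association end of a UML-style class: target class and multiplicity lower..upper."""
    target: str
    lower: int
    upper: Optional[int]  # None stands for UML's '*' (unbounded)

# Hypothetical class model for a document type: class name -> association ends it owns.
MODEL = {
    "Book": [End("Chapter", 1, None)],
    "Chapter": [End("Title", 1, 1), End("Para", 0, None)],
    "Title": [],
    "Para": [],
}

def check(elem, errors=None):
    """Walk an element tree and return messages for anything the class model forbids."""
    errors = [] if errors is None else errors
    ends = MODEL.get(elem.tag)
    if ends is None:
        errors.append(f"{elem.tag}: not a class in the model")
        return errors
    allowed = {end.target for end in ends}
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"{elem.tag}: unexpected {child.tag}")
    for end in ends:
        n = sum(1 for child in elem if child.tag == end.target)
        upper = "*" if end.upper is None else end.upper
        if n < end.lower or (end.upper is not None and n > end.upper):
            errors.append(f"{elem.tag}: {n} {end.target}, expected {end.lower}..{upper}")
    for child in elem:
        check(child, errors)
    return errors

good = "<Book><Chapter><Title>Loomings</Title><Para>Call me Ishmael.</Para></Chapter></Book>"
bad = "<Book><Chapter><Para>No title here.</Para></Chapter></Book>"
print(check(ET.fromstring(good)))  # []
print(check(ET.fromstring(bad)))   # ['Chapter: 0 Title, expected 1..1']
```

The same model that drives the check could be shown to users as a class diagram, which is the double duty described above.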
Topic Maps (ISO/IEC 13250; see our
report on Topic Maps from XML Europe earlier this year) remain a
hot topic on the GCA conference circuit, with several papers
describing Topic Map implementations.
Topic Maps separate semantic structure from data, so that a single
map can apply to multiple resources, and multiple maps can be layered
over a single resource. Maps can be contextually scoped and can, but
need not, be associated with existing taxonomies through the 'facet'
facility.
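A minimal sketch of that separation, with hypothetical class and resource names and a heavily simplified model (no facets, no subject identifiers): occurrences only point at resources by address, so one map can cover several resources, and two independent maps can both point into the same one. An empty scope is treated here as valid in every context.

```python
from dataclasses import dataclass, field

@dataclass
class Occurrence:
    resource: str                   # address of a resource; the resource itself stays outside the map
    scope: frozenset = frozenset()  # themes in which it is valid; empty = valid everywhere

@dataclass
class Topic:
    id: str
    name: str
    occurrences: list = field(default_factory=list)

@dataclass
class Association:
    type: str
    roles: dict                     # role type -> topic id

@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add(self, topic: Topic) -> None:
        self.topics[topic.id] = topic

    def resources(self, theme=None) -> set:
        """Addresses this map points at, optionally only those valid in the given theme."""
        return {o.resource
                for t in self.topics.values()
                for o in t.occurrences
                if theme is None or not o.scope or theme in o.scope}

# One map spanning several resources...
zoology = TopicMap()
zoology.add(Topic("sperm-whale", "Sperm whale", [
    Occurrence("moby-dick.xml"),
    Occurrence("encyclopedia.xml", frozenset({"science"})),
]))

# ...and a second, independent map layered over one of the same resources.
literature = TopicMap()
literature.add(Topic("ahab", "Captain Ahab", [Occurrence("moby-dick.xml")]))
literature.add(Topic("novel", "Moby-Dick", [Occurrence("moby-dick.xml")]))
literature.associations.append(Association("character-in", {"character": "ahab", "work": "novel"}))

print(sorted(zoology.resources()))                   # ['encyclopedia.xml', 'moby-dick.xml']
print(sorted(zoology.resources("fiction")))          # ['moby-dick.xml']
print(zoology.resources() & literature.resources())  # {'moby-dick.xml'}
```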
A paper by Helka Folch and Benoit Habert [7] explained how textual
data analysis software operates on a corpus of 8 million words
including book extracts, press releases, meeting minutes, and
transcripts. The documents are first tagged down to the paragraph
level and then analyzed for topics that text mining software can use
for navigation. The application creates classes of
information "in opposition to the rest, not in terms of an absolute
criteria." Hence, it can discover topics not previously
classified. They then use Topic Maps to semantically tag the documents
using labels chosen by humans from a "bag of words" supplied by the
analysis.
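The "in opposition to the rest" idea can be approximated with a simple contrastive score. The sketch below is an assumption-laden stand-in, not the software Folch and Habert used: it ranks the words of each group of texts by a smoothed log-ratio of their relative frequency inside the group versus in all other groups. The sample groups are invented.

```python
import math
from collections import Counter

def contrastive_terms(groups, top=3, alpha=1.0):
    """Rank each group's words by how over-represented they are relative to all other groups.

    Score = log(smoothed relative frequency inside the group)
          - log(smoothed relative frequency in the rest of the corpus).
    A simple stand-in for the 'in opposition to the rest' idea, not the method used in [7].
    """
    counts = {g: Counter(w for text in texts for w in text.lower().split())
              for g, texts in groups.items()}
    vocab = set().union(*counts.values())
    result = {}
    for g, inside in counts.items():
        outside = Counter()
        for other, c in counts.items():
            if other != g:
                outside.update(c)
        n_in = sum(inside.values()) + alpha * len(vocab)
        n_out = sum(outside.values()) + alpha * len(vocab)
        score = {w: math.log((inside[w] + alpha) / n_in) - math.log((outside[w] + alpha) / n_out)
                 for w in inside}
        # Highest score first; ties broken alphabetically so the output is deterministic.
        result[g] = sorted(score, key=lambda w: (-score[w], w))[:top]
    return result

groups = {  # invented sample "corpus" of two document classes
    "minutes": ["the committee approved the budget", "the committee will meet again"],
    "press": ["the company announced record profits", "the company announced new products"],
}
print(contrastive_terms(groups))
# {'minutes': ['committee', 'again', 'approved'], 'press': ['announced', 'company', 'new']}
```

Note that "the", common to both groups, drops out because it does not distinguish either class from the rest. In a real corpus, such ranked lists would be the "bag of words" from which humans pick labels.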
Nikita Ogievetsky described how to build a dynamic web site with
Topic Maps and XSLT, where the Topic Maps function as the site map [8]. In his
implementation, every topic becomes a page, TM occurrences supply
metadata, text, objects and images; occurrence role types determine
XSLT rendering for referenced resources. Topic names become titles and
links, and properties are used for natural language generation. The TM
associations build the site map with recursive references and subject
reuse. Sites built on such Topic Maps can be merged.
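The mapping can be sketched in a few lines. Ogievetsky's implementation used XSLT; the Python below is only an analogous illustration with a hypothetical, stripped-down map format in which associations are tuples of topic ids. Merging here treats identical ids as the same subject, a simplification of real Topic Map merging, which relies on subject identity. Each topic becomes a page, its name becomes the title and link text, its occurrences become content, and its associations become links.

```python
from html import escape

def merge_maps(a, b):
    """Merge two simplified maps; equal topic ids stand in for equal subject identity."""
    topics = {tid: {"name": t["name"], "occurrences": list(t["occurrences"])}
              for tid, t in a["topics"].items()}
    for tid, t in b["topics"].items():
        if tid in topics:
            merged = topics[tid]["occurrences"]
            for o in t["occurrences"]:
                if o not in merged:
                    merged.append(o)
        else:
            topics[tid] = {"name": t["name"], "occurrences": list(t["occurrences"])}
    # dict.fromkeys drops duplicate associations while keeping their order.
    associations = list(dict.fromkeys(a["associations"] + b["associations"]))
    return {"topics": topics, "associations": associations}

def build_site(tm):
    """One page per topic: name -> <title>/<h1>, occurrences -> content, associations -> links."""
    pages = {}
    for tid, t in tm["topics"].items():
        related = sorted({other for assoc in tm["associations"] if tid in assoc
                          for other in assoc if other != tid})
        links = "".join(f'<li><a href="{o}.html">{escape(tm["topics"][o]["name"])}</a></li>'
                        for o in related)
        content = "".join(f"<p>{escape(o)}</p>" for o in t["occurrences"])
        pages[f"{tid}.html"] = (f"<html><head><title>{escape(t['name'])}</title></head>"
                                f"<body><h1>{escape(t['name'])}</h1>{content}<ul>{links}</ul></body></html>")
    return pages

site_a = {"topics": {"ahab": {"name": "Captain Ahab", "occurrences": ["Commands the Pequod."]},
                     "pequod": {"name": "Pequod", "occurrences": ["A whaling ship."]}},
          "associations": [("ahab", "pequod")]}
site_b = {"topics": {"pequod": {"name": "Pequod", "occurrences": ["Sails from Nantucket."]},
                     "ishmael": {"name": "Ishmael", "occurrences": ["The narrator."]}},
          "associations": [("ishmael", "pequod")]}

pages = build_site(merge_maps(site_a, site_b))
print(sorted(pages))  # ['ahab.html', 'ishmael.html', 'pequod.html']
```

After the merge, the single Pequod page carries content from both sites and links to both Ahab and Ishmael.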
Hans Holger Rath laid out a proposal for Topic Map templates, type
hierarchies, association properties, inference rules, and consistency
constraints that, in aggregate, create a schema for Topic Maps [9]. The proposal would
provide a further link to existing semantics and hooks for text
retrieval.
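To give a flavor of what such schema-level rules could look like, here is a hypothetical Python sketch: association templates that fix the role types and the topic type each player must have, a consistency check against them, and one inference rule (`part-of` is transitive). The names and rule format are invented for illustration and are not the syntax of Rath's proposal.

```python
# Hypothetical templates: association type -> {role type: topic type its player must have}.
TEMPLATES = {
    "part-of": {"part": "component", "whole": "component"},
    "written-by": {"work": "document", "author": "person"},
}
# Hypothetical typing of the topics in one map.
TOPIC_TYPES = {"piston": "component", "engine": "component", "car": "component",
               "manual": "document", "smith": "person"}

def violations(associations):
    """Consistency check: report associations whose roles or players don't fit their template."""
    errors = []
    for atype, roles in associations:
        template = TEMPLATES.get(atype)
        if template is None:
            errors.append(f"{atype}: no template")
        elif set(roles) != set(template):
            errors.append(f"{atype}: roles {sorted(roles)}, expected {sorted(template)}")
        else:
            for role, player in roles.items():
                if TOPIC_TYPES.get(player) != template[role]:
                    errors.append(f"{atype}: {player} cannot play {role}")
    return errors

def infer_part_of(associations):
    """Inference rule: part-of is transitive. Returns only the newly derived (part, whole) pairs."""
    stated = {(r["part"], r["whole"]) for atype, r in associations if atype == "part-of"}
    closure = set(stated)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure - stated

assocs = [
    ("part-of", {"part": "piston", "whole": "engine"}),
    ("part-of", {"part": "engine", "whole": "car"}),
    ("written-by", {"work": "manual", "author": "car"}),  # deliberately inconsistent
]
print(violations(assocs))     # ['written-by: car cannot play author']
print(infer_part_of(assocs))  # {('piston', 'car')}
```

Derived associations like `('piston', 'car')` are the kind of extra links such rules could feed to text retrieval.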