Putting RDF to Work
by Edd Dumbill
August 09, 2000
Over recent months, members of the www-rdf-interest mailing
list have been working at creating practical applications of RDF
technology. Notable among these efforts have been Dan Connolly's
work with using XSLT to generate RDF from web pages, and
R.V. Guha's lightweight RDF database project.
RDF has always had the appeal of a Grand Unification Theory of
the Internet, promising to create an information backbone into
which many diverse information sources can be connected. With
every source representing information in the same way, the
prospect is that structured queries over the whole Web become
possible.
That's the promise, anyway. The reality has been somewhat
more frustrating. RDF has been ostracized by many for a complex and
confusing syntax, which more often than not obscures the real
value of the platform. One also gets the feeling, RDF being
the inaugural application of namespaces, that there's a certain
contingent who will never forgive it for that!
As an RDF advocate, I am dismayed when some emerging Web metadata
applications reject RDF -- the reason given usually being "it's too
hard." I tend to think that a rather weak reason, especially as many of
the same people are attempting deployment of XML schemas!
However, I can't dispute that the current RDF syntax isn't the best, and
as long as there is metadata on the Web that can be converted to RDF by
means of a simple transform, we retain our hope of a "semantic web" of
information.
Some of the most important factors in XML's success have been
the ready availability of tools (in particular, parsers) and
ubiquitous APIs (SAX, DOM). RDF has
not matched that level of support, with the
consequence that it has felt a lot more like a research project
than an immediately applicable technology. However, some folks
have been committing their time to developing tools for
manipulating RDF, and also moving toward standardized APIs.
One tool in particular, R.V. Guha's RDFDB, caught my
attention. It's an RDF database server, based on top of the Sleepycat Berkeley
Database. The source code is in C, but more importantly, it
supports interrogation via TCP/IP sockets, meaning integration is
possible with any programming language. For me, this is an
advantage over previous RDF libraries in Java and Perl, neither of
which is my platform of choice.
Breaking Out of Hierarchies
After spending a little while with RDFDB, I began to see that
it offers what I'm looking for in an RDF store. Let me explain a
little further about my criteria. My personal RDF
dream centers around the integration of all my information. I
want to be able to traverse the relationships between
my surfing, e-mail, schedule, and document data. The hierarchy of
e-mail folders and the file system just doesn't reflect the way I
work.
The reality of this dawned on me as I found myself using the
Unix tools find, grep, and locate to connect and
cross-reference documents and e-mail. The way I was traversing my
data was task-centric. If I'm working on a particular topic, I
want to see all previous correspondence on that issue. If I
visit someone's web page, I might want to see all the mail that
person has sent me recently.
So began my dream of integrating all my metadata. Somewhere
there would be a large database into which my e-mail, web
browser, file system, and so on would enter metadata. I'd then be
able to, with relative ease, query the database to make
connections between data items on my computer. On top of that
database, graphical clients could be written to maintain and
annotate it, and hooks written back into the browser, file
manager, and e-mail client to allow the use of this extra
information.
RDFDB appears to be the first stage of this plan, a database
tuned for storing and querying descriptions of resources. (Note that there
are existing approaches to RDF storage
using a relational database, but RDFDB takes a specialist
approach to storing RDF data).
What RDFDB Offers
Although an early-stage project, RDFDB offers enough
functionality to do useful work immediately. Guha has tried to
keep the interface similar to that of SQL, in order to make the
learning curve easier. With RDFDB, you can:
create database testdb </>
insert into testdb (editor http://xml.com/ http://edd.oreillynet.com/) </>
select ?p from testdb where (editor ?p http://edd.oreillynet.com/) </>
(Note the </> line terminator). Facilities also exist for loading entire modules of data in
from an RDF file, and for assigning prefixes to namespaces.
Getting Started
Both binaries for Linux and source code are available. You'll
need Sleepycat's Berkeley DB 3.1 installed in order to compile RDFDB. The
RDFDB server itself runs as a TCP/IP server, and just
sits there waiting for connections. You can use telnet as
a trivial command-line interface to the server -- this is one of
Guha's design goals, that RDFDB access should be as easy as HTTP
access.
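Because the server speaks plain text over a socket, a client can be very small. The following Python sketch illustrates the idea under stated assumptions; it is not RDFDB's documented protocol. The host and port are whatever your server was started with, the newline sent after the `</>` terminator is a guess, and the `?x = value` reply format is inferred from the sessions shown later in this article. All function names are hypothetical.

```python
import socket

TERMINATOR = " </>"


def frame(command):
    """Append the </> terminator to a command and encode it for sending.

    Assumption: a trailing newline is harmless; the article only shows
    that commands end with </>.
    """
    return (command.strip() + TERMINATOR + "\n").encode("utf-8")


def parse_bindings(reply):
    """Turn reply lines of the form '?x = value' into (name, value) pairs."""
    bindings = []
    for line in reply.splitlines():
        name, sep, value = line.partition(" = ")
        if sep and name.startswith("?"):
            bindings.append((name.strip(), value.strip()))
    return bindings


def send_command(host, port, command, timeout=5.0):
    """Send one command and return the text received before close or timeout.

    Assumption: the reply framing is unknown, so this simply reads until
    the server closes the connection or goes quiet for `timeout` seconds.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(frame(command))
        try:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            pass
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `parse_bindings(send_command(host, port, "select ?p from testdb where (editor ?p http://edd.oreillynet.com/)"))` would then give a list of variable bindings, provided the assumptions above hold for your server.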
Once a couple of environment variables are set and the server
is running, it's easy to
start working with RDFDB, by inserting simple relationships into
the database and querying them. RDFDB also offers a facility to
perform batch import of RDF data, via the load ... file
construct.
A good source of example data can always be found in one's
mailbox.
To take the first step toward my dream of
integration, I need to invent a vocabulary for describing
the data. This is where the prototype nature of my project
becomes apparent: a well-designed vocabulary is probably 80 percent of
the work in an effort like this. Particularly when integration
with disparate sources is required, a common vocabulary is
essential, and using standards such as Dublin Core becomes a very good idea.
Here are the properties I settled on:
- realName: a person's name
- author: the author of a message
- subject: the subject of a message
- timestamp: the timestamp of a message
A basic use of these properties would be simply to scrape all
the names and addresses from my in-box in order to create an
address book. With a small bit of Python, I generated a document
looking something like this:
<?xml version='1.0'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:m='http://edd.oreillynet.com/mailbox/'>
<rdf:Description about='mailto:edd@usefulinc.com'>
<m:realName>Edd Dumbill</m:realName>
</rdf:Description>
<rdf:Description about='mailto:liora@the-word-electric.com'>
<m:realName>Liora Alschuler</m:realName>
</rdf:Description>
<rdf:Description about='mailto:lisarein@finetuning.com'>
<m:realName>Lisa Rein</m:realName>
</rdf:Description>
</rdf:RDF>
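A document like the one above can be produced with very little code. The sketch below is one way to do it, not the script actually used for this article: it assumes the names and addresses have already been collected as (address, name) pairs, and the function name `address_book_rdf` is hypothetical. It escapes names and addresses so that characters such as `&` do not break the XML.

```python
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MAILBOX_NS = "http://edd.oreillynet.com/mailbox/"


def address_book_rdf(people):
    """Serialize (email_address, real_name) pairs as an RDF/XML document."""
    lines = [
        "<?xml version='1.0'?>",
        f"<rdf:RDF xmlns:rdf='{RDF_NS}'",
        f"         xmlns:m='{MAILBOX_NS}'>",
    ]
    for address, name in people:
        # One rdf:Description per person, identified by a mailto: URI.
        lines.append(f"<rdf:Description about={quoteattr('mailto:' + address)}>")
        lines.append(f"  <m:realName>{escape(name)}</m:realName>")
        lines.append("</rdf:Description>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(address_book_rdf([
        ("edd@usefulinc.com", "Edd Dumbill"),
        ("lisarein@finetuning.com", "Lisa Rein"),
    ]))
```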
The file was placed in a suitable place on my machine and
imported into a database:
create database mailstore </>
load XML_RDF file http://localhost/addrs.rdf into mailstore </>
A few simple queries follow (lines beginning with ?x = are the server's
responses; queries are broken across lines here for convenience, but
should be written all on one line):
select ?x from mailstore where
(http://edd.oreillynet.com/mailbox/|realName ?x 'Lisa Rein') </>
?x = mailto:lisarein@finetuning.com
select ?x from mailstore where
(http://edd.oreillynet.com/mailbox/|realName
mailto:simonstl@simonstl.com ?x) </>
?x = Simon St.Laurent
select ?x from mailstore where
(?x mailto:edd@usefulinc.com 'Edd Dumbill') </>
?x = http://edd.oreillynet.com/mailbox/|realName
You can see that the property, subject, and object (the components
of an RDF description) can all be queried
by the database server. Although trivial in this example, querying
properties could have some very useful purposes, such as
determining the relationship between two people.
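To make the idea of querying any position concrete, here is a small in-memory illustration in Python. It is not how RDFDB is implemented; it only mimics the behavior seen in the session above, treating each statement as a (property, subject, object) tuple and any term beginning with `?` as a variable. The function name `match` and the sample data are assumptions made for the example.

```python
REALNAME = "http://edd.oreillynet.com/mailbox/|realName"

TRIPLES = [
    (REALNAME, "mailto:edd@usefulinc.com", "Edd Dumbill"),
    (REALNAME, "mailto:lisarein@finetuning.com", "Lisa Rein"),
    (REALNAME, "mailto:simonstl@simonstl.com", "Simon St.Laurent"),
]


def match(pattern, triples):
    """Yield a dict of variable bindings for each triple matching pattern.

    Terms starting with '?' are variables; any other term must be equal
    to the value in the same position of the triple.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


if __name__ == "__main__":
    # The variable can stand in the subject, object, or property position.
    print(list(match((REALNAME, "?x", "Lisa Rein"), TRIPLES)))
    print(list(match((REALNAME, "mailto:simonstl@simonstl.com", "?x"), TRIPLES)))
    print(list(match(("?x", "mailto:edd@usefulinc.com", "Edd Dumbill"), TRIPLES)))
```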
Writing out the
qualified names of the properties each time is a little
cumbersome, so RDFDB allows you to do this instead:
enter namespace xmlns:m http://edd.oreillynet.com/mailbox/ </>
select ?x from mailstore where
(m:realName mailto:simonstl@simonstl.com ?x) </>
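The effect of a namespace prefix can be pictured as a simple string expansion. The Python sketch below is an illustration only: the `|` separator is copied from the server output shown earlier, whether RDFDB expands prefixes this way internally is an assumption, and the function name `expand` is hypothetical.

```python
def expand(term, namespaces):
    """Expand 'prefix:local' to 'namespace|local' for registered prefixes.

    Terms whose prefix is not registered (for example mailto: URIs or
    ?variables) are returned unchanged.
    """
    prefix, sep, local = term.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + "|" + local
    return term


if __name__ == "__main__":
    ns = {"m": "http://edd.oreillynet.com/mailbox/"}
    for term in ("m:realName", "mailto:simonstl@simonstl.com", "?x"):
        print(term, "->", expand(term, ns))
```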
Adding More
Now let's take things a little further and include some more
data from the mailbox. I wrote a simple
Python script to parse my mailbox and extract some message
data. In addition to the addressbook entries above, I also
include a description for each message:
<rdf:Description about='mid:3990A5D9.C13AACDE@finetuning.com'>
<m:subject>Re: Submitting an article to xml.com</m:subject>
<m:timestamp>Tue, 08 Aug 2000 17:29:13 -0700</m:timestamp>
<m:author rdf:resource='mailto:lisarein@finetuning.com' />
</rdf:Description>
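A description like this can be built from a message's headers using Python's standard `email` package. The following is a minimal sketch rather than the script used for this article: the function name `describe_message` is hypothetical, and it assumes every message carries Message-ID and From headers.

```python
import email
from email.utils import parseaddr
from xml.sax.saxutils import escape, quoteattr


def describe_message(msg):
    """Return an rdf:Description element (as text) for one parsed message."""
    # Strip the angle brackets from the Message-ID to form a mid: URI.
    message_id = msg["Message-ID"].strip().strip("<>")
    # parseaddr splits 'Real Name <addr>' into (real name, address).
    _name, address = parseaddr(msg["From"])
    return "\n".join([
        f"<rdf:Description about={quoteattr('mid:' + message_id)}>",
        f"  <m:subject>{escape(msg['Subject'] or '')}</m:subject>",
        f"  <m:timestamp>{escape(msg['Date'] or '')}</m:timestamp>",
        f"  <m:author rdf:resource={quoteattr('mailto:' + address)} />",
        "</rdf:Description>",
    ])


if __name__ == "__main__":
    raw = (
        "From: Lisa Rein <lisarein@finetuning.com>\n"
        "Subject: Re: Submitting an article to xml.com\n"
        "Date: Tue, 08 Aug 2000 17:29:13 -0700\n"
        "Message-ID: <3990A5D9.C13AACDE@finetuning.com>\n"
        "\n"
        "Body text.\n"
    )
    print(describe_message(email.message_from_string(raw)))
```

To process a whole mailbox, the same function could be applied to each message yielded by iterating over `mailbox.mbox(path)` from the standard library, with the results wrapped in an `rdf:RDF` element.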
Note that for the e-mail message identification itself, I'm
using the mid: URI scheme. Having imported the RDF again into my database,
I can now answer questions like "Which e-mail messages were written by
Simon St.Laurent?":
select ?a from mailstore where
(m:realName ?a 'Simon St.Laurent') </>
?a = mailto:simonstl@simonstl.com
select ?i from mailstore where
(m:author ?i mailto:simonstl@simonstl.com) </>
?i = mid:200008022049.QAA05551@hesketh.net
?i = mid:200008030436.AAA29373@hesketh.net
?i = mid:200008030444.AAA29581@hesketh.net
?i = mid:200008061638.MAA09311@hesketh.net
?i = mid:200008081347.JAA04357@hesketh.net
?i = mid:200008090020.UAA06745@hesketh.net
I tried a few more exotic queries, using conjunctions, but
RDFDB currently seems a little flaky in its processing of
these. Using a conjunction, the two queries above could be reduced to one:
select ?i from mailstore where
(m:realName ?a 'Simon St.Laurent')
(m:author ?i ?a) </>
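What a conjunction asks for is a join: the value bound to ?a by the first pattern must also satisfy the second. The Python sketch below shows that logic over an in-memory list of (property, subject, object) tuples. It illustrates the semantics only and says nothing about how RDFDB evaluates such queries; the function name `solve` and the sample data (including the example.org address) are made up for the example.

```python
TRIPLES = [
    ("m:realName", "mailto:simonstl@simonstl.com", "Simon St.Laurent"),
    ("m:realName", "mailto:someone@example.org", "Someone Else"),
    ("m:author", "mid:200008022049.QAA05551@hesketh.net", "mailto:simonstl@simonstl.com"),
    ("m:author", "mid:1@example.org", "mailto:someone@example.org"),
    ("m:author", "mid:200008030436.AAA29373@hesketh.net", "mailto:simonstl@simonstl.com"),
]


def solve(patterns, triples, bindings=None):
    """Yield every set of bindings that satisfies all patterns at once."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(bindings)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Reuse an earlier binding, or create one; reject mismatches.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # This triple fits; carry its bindings into the remaining patterns.
            yield from solve(rest, triples, candidate)


if __name__ == "__main__":
    query = [
        ("m:realName", "?a", "Simon St.Laurent"),
        ("m:author", "?i", "?a"),
    ]
    for result in solve(query, TRIPLES):
        print("?i =", result["?i"])
```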
Where Next?
RDFDB offers a great backbone -- storage and query
facilities -- for integrating diverse information sources. In its
early stages now, it's a project that deserves to get more
mindshare. The SQL-like syntax brings a familiarity to querying
that other, more Prolog-like, mechanisms don't.
Architecturally, I find the implementation of RDFDB as a database server
a great advantage. It immediately makes multiple data sources
and clients a reality, and makes cross-platform
implementation easy
(writing a language client to RDFDB is pretty trivial: I managed
a workable first cut in 10 lines of Perl).
RDF is slowly getting more use in the field, but it needs
more ubiquitous, easy-to-use technology and APIs to
be an obvious first stop for metadata and resource
discovery applications. RDFDB can
make an important contribution in this area.