
RDF, What's It Good For?
by Kendall Grant Clark
November 13, 2002
The Family Eccentric
RDF is like my eccentric old uncle. I don't know him as well as I'd
like, which is partly his fault, since his eccentricities can be
off-putting. Of course they're what make him so interesting and are
the reason I want to get to know him better in the first place. Yeah,
RDF is just like that.
The Resource Description Framework is still among the most
interesting of W3C technologies. But it's got persistent troubles,
including having had its reputation beaten up unfairly as a result of
the many and often nasty fights about RSS. But, just like my eccentric
old uncle, RDF is not entirely blameless. In a previous XML-Deviant
article ("Go Tell It
On the Mountain") I argued that RDF's trouble might have something
to do with it having been the victim of poor technical evangelism.
In some sense that's still true. Recently I googled for a
comprehensive, up-to-date RDF tutorial, which proved as elusive as
finding Uncle's dentures the morning after nickel beer night at the
bingo hall. In fact, I was hard pressed to find an RDF tutorial which
looked like it had been updated this year. And one which I did find
simply listed 13 different ways to express the same basic set of
assertions, which not only makes a terrible tutorial, but also
exemplifies another of RDF's persistent troubles: its XML
serialization.
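To make the "many ways to say one thing" complaint concrete, here is a toy Python sketch using only the standard library. It handles just one narrow corner of RDF/XML (rdf:Description elements with rdf:about and literal-valued properties). It shows that the long form, with a property element, and the abbreviated form, with a property attribute, yield the same triple. The example.org URIs and the `toy_triples` and `name_to_uri` helpers are hypothetical; this is an illustration, not an RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def name_to_uri(clark_name):
    # ElementTree spells qualified names as "{namespace}local"; RDF/XML builds
    # the property URI by concatenating the namespace and the local name.
    namespace, _, local = clark_name[1:].partition("}")
    return namespace + local


def toy_triples(rdfxml):
    """Collect (subject, property, literal) triples from a tiny RDF/XML subset."""
    root = ET.fromstring(rdfxml)
    triples = set()
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        # Abbreviated form: literal properties written as attributes.
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF}}}"):
                triples.add((subject, name_to_uri(name), value))
        # Long form: literal properties written as child elements.
        for prop in desc:
            triples.add((subject, name_to_uri(prop.tag), (prop.text or "").strip()))
    return triples


LONG = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/uncle">
    <ex:name>Uncle</ex:name>
  </rdf:Description>
</rdf:RDF>"""

SHORT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/uncle" ex:name="Uncle"/>
</rdf:RDF>"""

print(toy_triples(LONG) == toy_triples(SHORT))  # True
```

Two spellings, one assertion; multiply that by the other syntactic options RDF/XML allows and a tutorial listing 13 variants becomes easy to imagine.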
During the time I've tracked RDF in the XML community, I can't
recall running across even one enthusiastic defender of RDF's XML
serialization. Everyone, or so it seems, thinks it's a
nasty kludge at best. Now, I've been using RDF in some of my recent
Python programming, via Daniel Krech's excellent rdflib (which, as Andrew Kuchling
reminded me, now does fully distributed storage, thanks to its new ZODB/ZEO
storage layer). One virtue of rdflib is that it shields
me, the carefree application hacker, from having to deal with RDF's
XML serialization. I never think about it or about its warts. I rarely
even see it. Which is perfect. As long as, when I send the
XML-serialized dump of my RDF triple store to someone else, they end
up with the same graph of assertions, I'm happy.
But not everyone's needs are as easy to satisfy, I suspect. For my
recent apps (a "knowledge base" of persistent news URLs used to
generate a static Web site, and a URL-annotating IRC bot) RDF is the
thing: the XML serialization is simply a way I can share the RDF
assertions with very little pain (perhaps with more pain than
shipping N-Triples around, though that's unclear and likely moot).
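For comparison, N-Triples puts one statement per line: subject, predicate, object, and a terminating period. Below is a minimal sketch of a writer and reader for a small subset of it, assuming URI subjects and predicates, objects that are either URIs or plain string literals (no blank nodes, language tags, or datatypes), and exactly one space between terms. The `Literal` class and the helper names are illustrative, not any library's API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain string literal, kept distinct from a URI with the same spelling."""
    value: str


def _escape(text):
    # Backslash first, so the escapes added afterwards are not doubled.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def _unescape(text):
    return re.sub(r"\\(.)",
                  lambda m: {"n": "\n", "r": "\r"}.get(m.group(1), m.group(1)),
                  text)


def to_ntriples(triples):
    lines = []
    for s, p, o in sorted(triples, key=repr):  # sorted only for stable output
        obj = f'"{_escape(o.value)}"' if isinstance(o, Literal) else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") \.$')


def from_ntriples(text):
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"outside the supported subset: {line!r}")
        s, p, uri, lit = m.groups()
        triples.add((s, p, uri if uri is not None else Literal(_unescape(lit))))
    return triples


graph = {
    ("http://example.org/uncle", "http://example.org/ns#name", Literal('Uncle "Ernie"')),
    ("http://example.org/uncle", "http://example.org/ns#knows", "http://example.org/me"),
}
print(to_ntriples(graph), end="")
# The round trip preserves the set of triples, whatever the line order.
assert from_ntriples(to_ntriples(graph)) == graph
```

Because a graph is a set of triples, a round trip only has to preserve the set, not the order of lines; that is the "same graph of assertions" test in miniature.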
However, some applications require that RDF be
embedded in XML, often in an extant XML language. This is one
reason why in these pages two weeks ago John Cowan and Bob DuCharme,
in an article called "Make Your XML RDF-Friendly", offered some tips
for arranging your XML so that it's more, rather than less, likely that
RDF processors will be able to make sense of it.
RDF: Mundane Metadata and a Relational Model Alternative
I won't discuss Cowan and DuCharme's suggestions in detail, but I will
review XML-DEV's reaction to them. Their article offered eight ideas, which I
paraphrase thus:
- Every element should belong to a namespace
- Use rdf:ID, not ID attributes
- Put the URI of a described resource in a rdf:about attribute
- Put the URI of a referenced resource in an empty element's
rdf:resource attribute
- Use URIs from existing ontologies
- Take care with containers
- Avoid mixed content
- Check assertions with an RDF processor
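Some of these tips can be checked mechanically. The hypothetical checker below is a sketch built on Python's xml.etree.ElementTree. It flags violations of three of them: elements outside any namespace, plain `ID`/`id` attributes where rdf:ID is suggested, and mixed content (an element holding both child elements and non-whitespace text). The sample documents are invented, and the remaining tips, especially the last, still call for a real RDF processor.

```python
import xml.etree.ElementTree as ET


def rdf_friendliness_warnings(xml_text):
    """Flag elements that break three of the Cowan/DuCharme tips."""
    warnings = []
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag
        # Tip: every element should belong to a namespace.
        if not tag.startswith("{"):
            warnings.append(f"{tag}: element is in no namespace")
        # Tip: use rdf:ID rather than a plain ID attribute.
        for attr in elem.attrib:
            if attr.lower() == "id":
                warnings.append(f"{tag}: plain '{attr}' attribute; consider rdf:ID")
        # Tip: avoid mixed content (child elements plus non-whitespace text).
        has_text = bool(elem.text and elem.text.strip())
        has_tail_text = any(child.tail and child.tail.strip() for child in elem)
        if len(elem) > 0 and (has_text or has_tail_text):
            warnings.append(f"{tag}: mixed content")
    return warnings


messy = """<note xmlns:ex="http://example.org/ns#" id="n1">
  <ex:body>Call <ex:who>Uncle</ex:who> on Sunday.</ex:body>
</note>"""

tidy = """<ex:note xmlns:ex="http://example.org/ns#">
  <ex:body>Call Uncle on Sunday.</ex:body>
</ex:note>"""

for warning in rdf_friendliness_warnings(messy):
    print(warning)
print(rdf_friendliness_warnings(tidy))  # []
```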
As is often the case on XML-DEV, a post by Simon St.Laurent
initiated a wide-ranging and often helpful conversation -- this time
St.Laurent
suggested that Cowan and DuCharme's rendering of XML amenable
to RDF was, among other things, overly intrusive: "I can't imagine,"
St.Laurent confessed, "telling XML vocabulary developers to do those things
while keeping a straight face". He further suggested that an approach
that extracted RDF triples from XML by way of, presumably, XSLT
transformations might make more sense: "At some point it seems that it
makes a lot more sense for ordinary mortals to work in XML and let
geniuses write transformations if they want to reuse the information
in RDF processing. Creating markup in a straitjacket can be a lot of
fun, but only if you're genuinely fond of the straitjacket".
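St.Laurent's alternative keeps the vocabulary ordinary and moves the RDF mapping into a separate transformation. He mentions XSLT; the sketch below swaps in Python's ElementTree to show the same idea. The `<links>` vocabulary, the `http://example.org/links#` namespace for the derived predicates, and the `extract_triples` helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

LINKS = "http://example.org/links#"  # hypothetical namespace for derived predicates


def extract_triples(xml_text):
    """Map a plain, RDF-unaware <links> document onto (subject, predicate, value) triples."""
    triples = []
    for link in ET.fromstring(xml_text).findall("link"):
        subject = link.get("href")  # the link's URL becomes the RDF subject
        for field in link:
            # Each child element becomes a property named after its tag.
            triples.append((subject, LINKS + field.tag, (field.text or "").strip()))
    return triples


doc = """<links>
  <link href="http://example.org/story1">
    <title>RDF News</title>
    <posted>2002-11-13</posted>
  </link>
</links>"""

for triple in extract_triples(doc):
    print(triple)
```

The XML authors never see RDF; if the mapping needs to change, only `extract_triples` changes.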
St.Laurent seemed especially critical of Cowan and DuCharme's
No. 7, about avoiding mixed content. As St.Laurent
said, "'Eschew mixed content' seemed the most ridiculous (and
memorable) at the time, and I'd been having particular annoyances with
general failures to appreciate mixed content at that point...Looking
at the whole project in more detail and with examples, I find the
whole thing repulsive, at least when taken as an approach to creating
XML generally. On the other hand, as a human-readable syntax for RDF,
it's far better than anything else I've seen".
Bob DuCharme responded to St.Laurent's comments by pointing out that
sometimes the expressiveness of mixed content is outweighed by the pain of
processing it. DuCharme added that "[w]ith all the additional
constraints of RDF-conformant XML, it's even less expressive, and
often even easier to process, so it's well-suited to certain
applications". One of those applications is metadata, where RDF is
having considerable success, particularly among the library and
information science communities (understandable, since metadata was
one of its first intended uses).
Metadata work fails to count as a "real world application" only for
those blinded by corporate IT, and only insofar as they haven't had to
implement a heterogeneous document repository or knowledge management
application.
"I still find it a little ironic," DuCharme said, "that while RDF
has gotten so much publicity as a technology for warm and fuzzy AI
pie-in-the-sky technology, it's gotten most of its traction in the
mundane world of metadata".
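The "mundane" metadata case is easy to picture as triples. The sketch below describes two articles with properties from the Dublin Core element set (`http://purl.org/dc/elements/1.1/`) and answers a simple question with a wildcard pattern match. The example.org article URIs and the `match` and `titles_by` helpers are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
A1 = "http://example.org/articles/rdf-good-for"          # hypothetical URIs
A2 = "http://example.org/articles/make-xml-rdf-friendly"

catalog = {
    (A1, DC + "title", "RDF, What's It Good For?"),
    (A1, DC + "creator", "Kendall Grant Clark"),
    (A1, DC + "date", "2002-11-13"),
    (A2, DC + "title", "Make Your XML RDF-Friendly"),
    (A2, DC + "creator", "John Cowan"),
    (A2, DC + "creator", "Bob DuCharme"),  # a second creator is just another triple
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def titles_by(creator):
    docs = {s for s, _, _ in match(catalog, p=DC + "creator", o=creator)}
    return sorted(title for d in docs
                  for _, _, title in match(catalog, s=d, p=DC + "title"))


print(titles_by("Bob DuCharme"))  # ['Make Your XML RDF-Friendly']
```

Nothing had to be widened or re-keyed to hold two creators for one article, which is one reason repeatable, loosely structured properties suit catalog metadata.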
Adam Turoff's reaction was the opposite of St.Laurent's. Rather than
condemning the specifics of Cowan and DuCharme's suggestions, he praised
their practicality: "I like this article because it...discuss
markup design issues for people who want to make their vocabularies
RDF-friendly". And, Turoff added, there is very little of that sort of
practical design advice around. "Instead, we are left with," he said,
"a hodgepodge of vocabularies where the primary design goal is, for
example, mimicking a particular database structure, not
vocabularies where the primary design goals are to be used as XML
files per se".
Stepping back from the presuppositions of Cowan and DuCharme's
article, Mike Champion cast doubt on RDF itself,
suggesting that he didn't see much interest in it and that resistance
to it during the RSS debates evinced a general lack of interest. "I
would really like to understand," Champion said, "what benefit one
might really get from using an RDF-friendly XML syntax...I'm not
hostile to RDF...just skeptical that it's worth a significant
investment of my time".
Danny Ayers tried to answer some of Champion's questions,
pointing out that "RDF allows extensibility with minimal extra work". Ayers
cited several projects where RDF is being used, including dmoz.org,
the Mozilla browser, Adobe's RDF metadata initiative, the unfairly
maligned RSS 1.0, the Stanford TAP project, MusicBrainz, and Mitch
Kapor's vaporware open source PIM. I'll add two other projects, off the
top of my head: MIT's DSpace digital repository and, since I mentioned
him already, Andrew Kuchling's Biographical RDF.
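Ayers's point about extensibility shows up most plainly when two independently produced graphs describe the same resource. In the sketch below, assuming every node is a URI or a literal, merging is plain set union; real RDF merging must also keep blank nodes from different graphs apart, which this toy ignores. The `http://example.org/ircbot#` annotation vocabulary, the story URI, and the `describe` helper are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
BOT = "http://example.org/ircbot#"  # hypothetical vocabulary invented by a second tool
STORY = "http://example.org/story1"

# Produced by one application (say, a site generator)...
site_graph = {(STORY, DC + "title", "RDF News")}
# ...and by another (say, an annotating IRC bot) that needed a new property.
bot_graph = {(STORY, BOT + "comment", "worth a read")}

# With no blank nodes involved, merging two RDF graphs is set union:
# no shared schema had to change for the bot's new property to fit.
merged = site_graph | bot_graph


def describe(graph, subject):
    """Group a subject's properties as {predicate: [values...]}."""
    props = {}
    for s, p, o in sorted(graph):
        if s == subject:
            props.setdefault(p, []).append(o)
    return props


print(describe(merged, STORY))
```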
Responding both to St.Laurent's claim about straitjackets and to
Champion's plea for a demonstration of RDF's utility, Eric van der
Vlist said that lots of things -- like RDBMS and XML -- are straitjackets, that
every storage or representation technology has advantages and
disadvantages, including RDF. "RDF and its triples," van der Vlist
claimed, are "really lightweight when you have the right tools to
manipulate them. I like to think of them as a RDBMS with a variable
geometry: each 'row'...can have a variable number of columns..."
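Van der Vlist's "variable geometry" image can be sketched by pivoting a set of triples into one row per subject. Each row gets only the columns it actually uses, and a column can hold several values. The data is made up, and short strings stand in for full predicate URIs; `rows` is a hypothetical helper, not part of any library.

```python
EX = "http://example.org/people/"  # hypothetical subjects

people = {
    (EX + "alice", "name", "Alice"),
    (EX + "alice", "email", "alice@example.org"),
    (EX + "bob", "name", "Bob"),
    (EX + "bob", "nick", "bobby"),
    (EX + "bob", "nick", "rdfbob"),
}


def rows(triples):
    """Pivot triples into one row per subject, with only the columns it uses."""
    table = {}
    for s, p, o in sorted(triples):
        # A "column" may hold several values, so each cell is a list.
        table.setdefault(s, {}).setdefault(p, []).append(o)
    return table


for subject, row in rows(people).items():
    print(subject, row)
```

Alice's row has an email column and Bob's does not; Bob's nick column holds two values. A fixed relational table would need NULLs and a side table to say the same thing.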
Van der Vlist nicely makes the point I made earlier about Python's
rdflib. Being able to use RDF as a loose storage system,
without having to worry about outgrowing (or even fully specifying, in
advance) an RDBMS schema, can be very helpful in at least two
situations: first, when you don't yet know what the data schema really
is, owing either to problem-domain constraints or to an extended
prototype stage; and, second, when the application's storage
schema needs to stay flexible and extensible for the lifetime of
the project. Or, as van der Vlist said, RDF is "like a RDBMS which you
could populate before having written any schema, that's really very
flexible..."
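The "populate before having written any schema" style might look like the following sketch: triples go into a plain Python set as they arrive, and a rough schema (which predicates each rdf:type actually uses) is derived afterwards rather than declared up front. The `http://example.org/kb#` vocabulary and the `observed_schema` helper are hypothetical; in practice a real triple store such as rdflib would stand in for the set.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/kb#"  # hypothetical vocabulary

store = set()
# Facts go in as they arrive; nothing was declared beforehand.
store.add((EX + "story1", RDF_TYPE, EX + "NewsItem"))
store.add((EX + "story1", EX + "url", "http://example.org/story1"))
# Later a new kind of fact shows up; there is no table to alter.
store.add((EX + "story1", EX + "mentionedOnIrc", "true"))


def observed_schema(triples):
    """Derive, after the fact, which predicates each rdf:type actually uses."""
    types_of = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_of.setdefault(s, set()).add(o)
    schema = {}
    for s, p, _ in triples:
        if p != RDF_TYPE:
            # Subjects with no rdf:type are grouped under a placeholder label.
            for t in types_of.get(s, {"(untyped)"}):
                schema.setdefault(t, set()).add(p)
    return {t: sorted(preds) for t, preds in schema.items()}


print(observed_schema(store))
```

The schema here is a report on the data rather than a gate in front of it, which is the flexibility van der Vlist describes.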
In next week's XML-Deviant column I'll continue to look at RDF, its
beauty marks and its warts, what it's good and bad at. In particular
I'll describe Tim Bray's proposal for a new, simplified XML
serialization of RDF graphs. Just like everyone's eccentric old uncle,
we may discover, once we get past all the blemishes and oddities, that
RDF has more going for it than it often seems.