Make Your XML RDF-Friendly
by Bob DuCharme, John Cowan
October 30, 2002
Suppose you're designing an XML application or maybe just
writing a DTD or schema. You've followed various best practices about
element and attribute names, when to use elements versus attributes,
and other design issues, because you want your XML to be useful in the
widest variety of situations.
As RDF interest and application development grow, there's
an increasing payoff in keeping RDF concerns in mind along with the
other best practices as you design document types. Your documents
store information, and small tweaks to their structure can allow an
RDF processor to see that information as subject-predicate-object
triples, which it can make good use of. (For an introduction to RDF,
see Tim Bray's article "What is RDF?")
Making your documents more "RDF-friendly" -- that is, more easily
digestible by RDF applications -- broadens the range of applications
that can use your documents, thereby increasing their value.
A lot of XML RDF documents look like they were designed purely for
RDF applications, but that's not always the case. The frequent
verbosity of RDF XML, which often intimidates RDF beginners, is a
by-product of the flexibility that makes RDF easy to incorporate into
your existing XML. By observing eight guidelines when designing a DTD
or schema, you can use this flexibility to help your documents work
with RDF applications as well as non-RDF applications. Some of the
guidelines are easy, while some involve making choices based on
trade-offs. But knowing what the issues are gives you a better
perspective on the best ways to model your data.
1. Make sure that every element comes from a specific
namespace.
This doesn't mean that all your elements need a namespace
prefix. For convenience, many documents declare the most frequently
used namespace as the default one so that elements from that namespace
need no prefix. For example, the article, body,
title, and para elements in the following belong to
the http://www.snee.com/ns/dummy namespace because the
article element's first xmlns attribute declares
that as the default namespace. None of those elements need a namespace
prefix, and an RDF processor will have no problem with them. (The RDF
namespace, http://www.w3.org/1999/02/22-rdf-syntax-ns#, must
obviously be declared if an RDF parser is going to find the RDF
elements and know what each is for.)
<article xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:ID="a1003"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.snee.com/ns/dummy">
  <rdf:RDF>
    <rdf:Description rdf:about="#a1003">
      <dc:creator>Herman Melville</dc:creator>
      <dc:date>1851</dc:date>
    </rdf:Description>
  </rdf:RDF>
  <body>
    <title>Moby Dick</title>
    <para>Call me Ishmael.</para>
    <para>Just <emph>don't</emph> call me late for supper.</para>
  </body>
</article>
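To see that the default namespace is only a convenience, here is a
sketch of the same body written with an explicit prefix instead. The
prefix d is an arbitrary choice made for this illustration. A
namespace-aware processor sees the same element names either way,
because it compares the namespace URI plus the local name, not the
prefix.

```xml
<d:article xmlns:d="http://www.snee.com/ns/dummy">
  <!-- Every element carries the d prefix, which is bound to the same
       namespace URI that the first example declared as the default. -->
  <d:body>
    <d:title>Moby Dick</d:title>
    <d:para>Call me Ishmael.</d:para>
  </d:body>
</d:article>
```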
2. Use rdf:ID attributes instead of ID attributes.
When you want an RDF processor to know a property of something in a
document -- for example, that the article element in the
example above has a dc:creator value of "Herman Melville" --
you need a way to identify the subject that has the property. XML DTDs
let you declare that a particular attribute is used as an ID value,
but RDF doesn't care about DTDs. The only way to be sure that an RDF
processor can find the thing you're referring to is to give it a
unique value in an rdf:ID attribute.
You're certainly not limited to using the rdf:ID value in
RDF applications. A unique ID value is a unique ID value, and useful
in all kinds of applications. In fact, if you declare this attribute
in a DTD as having a type of ID, you'll get the benefit of both RDF
applications and XML 1.0 applications treating rdf:ID as an
ID value that is unique within each document.
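As a minimal sketch of that declaration, assuming the article element
from the first example is the only one that carries the attribute, an
internal DTD subset could type rdf:ID as an ID. DTDs know nothing about
namespaces, so the declaration has to use the literal name rdf:ID,
prefix included. A complete DTD would also declare the elements and the
xmlns attributes; they are omitted here, so this document is
well-formed but would not validate as it stands.

```xml
<?xml version="1.0"?>
<!DOCTYPE article [
  <!-- Type rdf:ID as an XML 1.0 ID. The attribute name must match the
       prefixed form used in the document exactly. -->
  <!ATTLIST article rdf:ID ID #IMPLIED>
]>
<article rdf:ID="a1003"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.snee.com/ns/dummy">
  <body>
    <title>Moby Dick</title>
  </body>
</article>
```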
3. When describing a resource that has an existing URI, put the
URI in an rdf:about attribute.
While rdf:ID identifies a resource in your document, which
you can then describe with an RDF statement, rdf:about lets
you create an RDF statement about anything that can be
referenced with a URI, whether it's in your document or not. The name
of the element with the rdf:about attribute identifies the
type of the subject. For example, the following tells us this fact
"about" Bridget Fonda: that her father is Peter Fonda. The
rdf:about attribute's presence in an Entertainer
element tells us that Bridget Fonda is a resource of the type
"Entertainer."
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:imdb="http://us.imdb.com/Name?"
         xmlns="http://www.cyc.com/2002/04/08/cyc.daml#"
         xmlns:gc="http://www.daml.org/2001/01/gedcom/gedcom#">
  <Entertainer rdf:about="http://us.imdb.com/Name?Fonda,%20Bridget">
    <gc:father>
      <Entertainer rdf:about="http://us.imdb.com/Name?Fonda,%20Peter"/>
    </gc:father>
  </Entertainer>
</rdf:RDF>
4. When referencing something by its URI, put the URI in an
rdf:resource attribute in an empty element.
In our first example, the creator of the article Moby Dick -- or,
more correctly, the creator of the work identified as "#a1003" -- is
named with the string "Herman Melville." If, instead of a string, it
identified the author using a URI in an rdf:resource
attribute, the RDF assertion about who created resource a1003 would
have more value, because it could then link to other RDF statements
that use the same URI.
For example, no RDF statement that tells you that Herman Melville
was born in New York City would refer to the author using the string
"Herman Melville," because an RDF statement's subject must be a
URI. Instead, it might say that the subject
http://www.online-literature.com/melville has the property
bornIn with a value of "New York City." An inference engine
could look at that assertion and the following revision of the first
RDF statement from the first example above, put the two together, and
tell you that the creator of a1003 was born in New York City.
<rdf:Description rdf:about="#a1003">
  <dc:creator rdf:resource="http://www.online-literature.com/melville"/>
</rdf:Description>
While this element with the rdf:resource attribute isn't
absolutely required to be empty, any content that it has must follow
certain rules, so it's simplest to make it an empty element whose
rdf:resource attribute names the URI that is the value of the property
named by the element name -- in this case, dc:creator.
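The second statement described above, the one about where Melville was
born, might look something like the following sketch. The ex prefix,
its namespace URI, and the bornIn property are all hypothetical,
invented here only for illustration. What matters is that the
rdf:about value is the same URI that the dc:creator element names, so
that a processor can connect the two statements.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://www.example.org/ns/bio#">
  <!-- ex:bornIn is a made-up property, not part of any real ontology. -->
  <rdf:Description rdf:about="http://www.online-literature.com/melville">
    <ex:bornIn>New York City</ex:bornIn>
  </rdf:Description>
</rdf:RDF>
```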
5. If existing ontologies cover any of your element names, use
those instead of making up your own URIs.
Most of the power of RDF comes from the network effect of combining
RDF triples that reference the same resources. If one set of triples
says something about a particular resource and another set says more
about the same resource, they can be combined, making it a more
valuable collection. For example, guideline 4 above described two RDF
statements that could be linked this way; one used the URI
http://www.online-literature.com/melville to represent Herman
Melville as the creator of article a1003, and the other used the same
URI to show where the author was born.
To be honest, http://www.online-literature.com/melville
was just the result of some brief web searching. The odds that two
different people creating RDF about Melville will both use this URI
are pretty small. It's not really an ontology name, but just a URL for
a brief biography of Melville at a literary dot-com.
But what is an ontology? In software development, as distinct from
its meaning in philosophy, it generally means a set of terms with
defined relationships. There are plenty of real ontologies out there,
but in a pinch, you can use a recognized URL for a well-known web page
that identifies your resource -- as we saw above, any URI is better
than a simple string.
The better known an ontology is, the more likely others are to
use it, and the more useful your RDF statements will be when combined
with theirs. For example, the Dublin Core ontology used for the
dc:creator and dc:date elements in the "Moby Dick"
example is one of the most widely used ontologies.
The DAML Ontology Library is a good place to start looking for
ontologies. It's where I found the GEDCOM and CYC ontologies used in
the example about the Fondas. The people who created the Internet
Movie Database never considered their work to be an ontology, but
because it lets you refer to specific actors with URIs, it passes the
first test for use in RDF statements.