Using the Jena API to Process RDF
by Joe Verzulli
May 23, 2001
There has been growing interest in the Resource Description
Framework (RDF), and a number of tools and libraries have been
developed for processing it. This article describes one such library,
Jena, a Java API for processing RDF. Jena is also the name of an open
source implementation of that API.
What is RDF?
XML is very flexible and allows information to be encoded in many
different ways. If meaningful tag names are used it is relatively easy
for a person to determine the intended interpretation of an XML
string. However, it is difficult for programs to determine the
intended interpretation, since programs don't understand English tag
names. DTDs and XML Schemas don't help in this regard; they
only allow a program to verify that XML strings conform to some set of
rules.
RDF (RDFMS, Bray, Ogbuji, SWARDF) is a model
and XML syntax for representing information in a way that allows
programs to understand the intended meaning. It's built on the concept
of a statement, a triple of the form {predicate, subject, object}. The
interpretation of a triple is that <subject> has a property
<predicate> whose value is <object>. Examples of
statements are {numberOfHits, http://www.foo.com/index.html,
3000} and {title, http://bookstore.com/book12, "The
Connoisseur's Guide to the Mind"}. In RDF a <subject> is always
a resource named by a URI with an optional anchor id. The
<predicate> is a property of the resource, and the
<object> is the value of the property for the resource.
Consider the following triples (where the dc prefix is for the Dublin
Core).
{dc:Publisher, http://www.w3.org, "World Wide Web Consortium"}
{dc:Title, http://www.w3.org, "W3C Home Page"}
These triples can be represented graphically as follows.

In this graph the arcs are labeled with predicates. Each arc originates at
a node representing a subject and terminates at a node representing an
object. The triples and the graph are two different representations of the
same RDF data model.
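The equivalence between the triple list and the graph can be sketched in plain Java. This is a minimal illustration and not part of Jena: the class names TripleGraphSketch and Triple and the method arcsBySubject are hypothetical, and every node and arc label is held as a plain string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these classes are not part of Jena.
final class TripleGraphSketch {

    // One statement, stored in the article's {predicate, subject, object} order.
    static final class Triple {
        final String predicate;
        final String subject;
        final String object;

        Triple(String predicate, String subject, String object) {
            this.predicate = predicate;
            this.subject = subject;
            this.object = object;
        }

        @Override
        public String toString() {
            return "{" + predicate + ", " + subject + ", " + object + "}";
        }
    }

    // The two example triples from the text.
    static List<Triple> exampleTriples() {
        return Arrays.asList(
            new Triple("dc:Publisher", "http://www.w3.org", "World Wide Web Consortium"),
            new Triple("dc:Title", "http://www.w3.org", "W3C Home Page"));
    }

    // Graph view: each subject node is mapped to the arcs that leave it.
    static Map<String, List<Triple>> arcsBySubject(List<Triple> triples) {
        Map<String, List<Triple>> graph = new LinkedHashMap<String, List<Triple>>();
        for (Triple t : triples) {
            List<Triple> arcs = graph.get(t.subject);
            if (arcs == null) {
                arcs = new ArrayList<Triple>();
                graph.put(t.subject, arcs);
            }
            arcs.add(t);
        }
        return graph;
    }

    public static void main(String[] args) {
        // Print every arc as: subject --predicate--> object
        for (Map.Entry<String, List<Triple>> node : arcsBySubject(exampleTriples()).entrySet()) {
            for (Triple arc : node.getValue()) {
                System.out.println(node.getKey() + " --" + arc.predicate + "--> " + arc.object);
            }
        }
    }
}
```

Both triples share the subject http://www.w3.org, so the graph view has a single subject node with two outgoing arcs.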
There is also an XML representation of the model. RDF requires that
different kinds of semantic information (e.g., subjects, properties,
and values) be placed in prescribed locations in XML. Programs that
read an XML encoding of RDF can then tell whether a particular element
or attribute refers to a subject, a property, or the value of a
property.
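To make the idea of prescribed locations concrete, here is a rough sketch that reads one possible XML encoding of the two W3C triples using only the JDK's DOM parser. It is not how Jena parses RDF. The class RdfXmlSketch, the sample document, and the Dublin Core namespace URI are assumptions made for illustration, and the code handles only the simplest form: an rdf:Description element with an rdf:about attribute and literal-valued child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; a real RDF parser covers far more syntax.
final class RdfXmlSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Assumed encoding of the two example triples (the dc namespace URI is an assumption).
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf='" + RDF_NS + "'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<rdf:Description rdf:about='http://www.w3.org'>"
        + "<dc:Publisher>World Wide Web Consortium</dc:Publisher>"
        + "<dc:Title>W3C Home Page</dc:Title>"
        + "</rdf:Description>"
        + "</rdf:RDF>";

    // Returns each statement as a "{predicate, subject, "object"}" string.
    static List<String> extractTriples(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> triples = new ArrayList<String>();
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                // The subject sits in the rdf:about attribute.
                String subject = description.getAttributeNS(RDF_NS, "about");
                NodeList children = description.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue;
                    }
                    // The child element's name is the property; its text is the value.
                    String predicate = child.getNamespaceURI() + child.getLocalName();
                    String object = child.getTextContent();
                    triples.add("{" + predicate + ", " + subject + ", \"" + object + "\"}");
                }
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML", e);
        }
    }

    public static void main(String[] args) {
        for (String triple : extractTriples(SAMPLE)) {
            System.out.println(triple);
        }
    }
}
```

The program never has to interpret the tag names. The position of each piece of information in the XML tells it which one is the subject, which the property, and which the value.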
The Jena API
Jena was developed by Brian McBride of Hewlett-Packard and is
derived from earlier work on the SiRPAC API. Jena allows one to parse,
create, and search RDF models.
Jena defines a number of interfaces for accessing and manipulating
RDF statements as shown in the figure below.

The RDFNode interface provides a common base for all elements that
can be parts of RDF triples. The Literal interface represents literals
such as "red fish" or 225 that can be used as the <object> in
{predicate, subject, object} triples. The Literal interface provides
accessor methods to convert literals to various Java types such as
String, int, and double.
Objects implementing the Property interface can be the
<predicate> in {predicate, subject, object} triples.
The Statement interface represents a {predicate, subject, object}
triple. It can also be used as the <object> in a triple since
RDF allows statements to be nested.
Objects implementing the Container, Alt, Bag, or Seq interface can be the
<object> in {predicate, subject, object} triples.
Parsing RDF With Jena
One area where RDF can be useful is for embedding metadata in web
pages. Such metadata might contain information about the author and
subject of the page. This RDF can be encoded as XML embedded within
the XHTML page.
An RDF-aware search engine can use this metadata to give more
relevant results than a search engine that relies on keyword
matching. An RDF-aware search engine crawler can use Jena to parse the
RDF. Jena can take an XHTML page that contains embedded RDF and
extract and parse the RDF. This is done with the read()
method in the Model interface as shown in the following code snippet
(exception handling has been omitted for clarity).
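File f;
FileReader fr;
Model model;
// Open the XHTML file that contains the embedded RDF.
f = new File("C:\\test1.html");
fr = new FileReader(f);
// Create an empty in-memory model.
model = new ModelMem();
// Parse the RDF into the model; the second argument is the base URI.
model.read(fr, RDFS.getURI());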
In this example C:\test1.html is an XHTML file that has
RDF in the <head>. Jena automatically extracts the
RDF and ignores the rest of the XHTML. The result of parsing is an RDF
model containing the triples from the file. This model can then be
queried.
The first two statements after the declarations in the code
fragment above set fr to a FileReader associated with
C:\test1.html. Then model is set to an instance of
the ModelMem class. ModelMem is a class
provided with Jena that implements the Model interface using main
memory as the storage for the model. Other implementations are
possible; for example, one could create an implementation based on a
transactional database.
Getting All Statements from a Model
Once a search engine crawler has created an RDF model containing
the metadata for a web page it needs to add each triple in the model
to its index so that later searches can find the pages. This can be
done with the listStatements() method in the Model
interface. listStatements() returns a StmtIterator that
iterates over each statement in the model. It can be used as
follows.
Model model;
StmtIterator iter;
Statement stmt;
.
.
.
iter = model.listStatements();
while (iter.hasNext())
{
    stmt = iter.next();
    // Now use <stmt>
}
The Statement interface provides methods to access the
predicate, subject, and object of the statement as shown below.
Property predicate;
Resource subject;
RDFNode obj;
Statement stmt;
.
.
.
subject = stmt.getSubject();
System.out.println("Subject = " + subject.getURI());
predicate = stmt.getPredicate();
System.out.println("Predicate = " + predicate.getLocalName());
obj = stmt.getObject();
System.out.println("Object = " + obj.toString());
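What a crawler does with each statement is outside Jena's scope, but one possibility can be sketched in plain Java. Everything here is hypothetical: the class StatementIndexSketch, its nested Stmt stand-in, the "predicate|object" key format, and the example.org pages are assumptions for illustration. Only the hasNext()/next() loop shape mirrors the iteration described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical crawler-side index; none of these names come from Jena.
final class StatementIndexSketch {

    // Minimal stand-in for a statement, in {predicate, subject, object} order.
    static final class Stmt {
        final String predicate;
        final String subject;
        final String object;

        Stmt(String predicate, String subject, String object) {
            this.predicate = predicate;
            this.subject = subject;
            this.object = object;
        }
    }

    // Maps "predicate|object" to the pages (subjects) that carry that metadata.
    private final Map<String, List<String>> pagesByKey = new LinkedHashMap<String, List<String>>();

    // Visits every statement once and records its subject under the predicate/object key.
    void addAll(Iterator<Stmt> iter) {
        while (iter.hasNext()) {
            Stmt stmt = iter.next();
            String key = stmt.predicate + "|" + stmt.object;
            List<String> pages = pagesByKey.get(key);
            if (pages == null) {
                pages = new ArrayList<String>();
                pagesByKey.put(key, pages);
            }
            if (!pages.contains(stmt.subject)) {
                pages.add(stmt.subject);
            }
        }
    }

    // Returns the pages indexed under the given predicate and object, or an empty list.
    List<String> lookup(String predicate, String object) {
        List<String> pages = pagesByKey.get(predicate + "|" + object);
        return pages == null ? new ArrayList<String>() : pages;
    }

    public static void main(String[] args) {
        // Made-up metadata for two pages.
        List<Stmt> statements = Arrays.asList(
            new Stmt("dc:Creator", "http://example.org/a.html", "Jane Doe"),
            new Stmt("dc:Creator", "http://example.org/b.html", "Jane Doe"),
            new Stmt("dc:Subject", "http://example.org/a.html", "sailing"));

        StatementIndexSketch index = new StatementIndexSketch();
        index.addAll(statements.iterator());
        System.out.println(index.lookup("dc:Creator", "Jane Doe"));
    }
}
```

A production index would more likely tokenize literal values and persist its data; the single in-memory map just keeps the sketch short.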
Adding Statements to a Model
Not all applications will read RDF from XML or XHTML files. Many
will need to create RDF statements based on user input or other data.
Consider an RDF personal information manager which maintains a
searchable archive of email messages, browser bookmarks, and calendar
entries. When the program receives a new email message it can extract
the sender and title and create RDF triples for them. It can also
allow the user to enter information about the topics discussed in the
message and create RDF triples containing the topic information.
The following code illustrates how Jena can be used to create triples in a
model.
Model model;
String namespace = "http://www.test.com/";
.
.
.
// Each addProperty() call adds one statement about the sailboat resource.
model.createResource("http://www.foo.com/boats#sailboat")
    .addProperty(model.createProperty(namespace, "length"), 25)
    .addProperty(model.createProperty(namespace, "color"), "teal");
This adds the following statements to model:
{x:length, http://www.foo.com/boats#sailboat, 25}
{x:color, http://www.foo.com/boats#sailboat, "teal"}
where x is a namespace prefix corresponding to the namespace
URI http://www.test.com/.
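The chained calls work because each addProperty() call hands back the resource it was invoked on. The following plain-Java sketch imitates that pattern without using Jena. The class FluentResourceSketch is hypothetical, and it assumes a property's full URI is the namespace followed directly by the local name, which is why the namespace used here ends in a slash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a resource; not a Jena class.
final class FluentResourceSketch {
    private final String uri;
    private final List<String> statements = new ArrayList<String>();

    FluentResourceSketch(String uri) {
        this.uri = uri;
    }

    // Records one triple and returns this resource so that calls can be chained.
    // Assumption: the property URI is the namespace followed directly by the local name.
    FluentResourceSketch addProperty(String namespace, String localName, Object value) {
        String object = (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
        statements.add("{" + namespace + localName + ", " + uri + ", " + object + "}");
        return this;
    }

    List<String> statements() {
        return statements;
    }

    public static void main(String[] args) {
        // The trailing slash keeps the namespace and local name visibly separate.
        String namespace = "http://www.test.com/";
        FluentResourceSketch boat = new FluentResourceSketch("http://www.foo.com/boats#sailboat")
            .addProperty(namespace, "length", 25)
            .addProperty(namespace, "color", "teal");
        for (String statement : boat.statements()) {
            System.out.println(statement);
        }
    }
}
```

Under that concatenation assumption, a namespace without a trailing "/" or "#" would produce a run-together URI such as http://www.test.comlength.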
Querying Models
Once an RDF model has been created we need a way to query it. For
example, consider a travel FAQ that contains RDF metadata. Suppose
we want to find all questions that relate to traveling to Africa. In
other words we wish to find all values for res for which
there is a triple of the form {destination, res, Africa} in the
model. The following code shows how this can be done. It arbitrarily
assumes that the property destination is in a namespace
http://foo.org/.
Model model;
Resource r;
ResIterator resourceIter;
.
.
.
resourceIter = model.listSubjectsWithProperty(
    model.createProperty("http://foo.org/destination"),
    "Africa");
while (resourceIter.hasNext())
{
    r = resourceIter.next();
    System.out.println("Resource " + r.toString() +
        " is about travel to Africa");
}
listSubjectsWithProperty(p, v) finds all triples of the form
{p, <subject>, v} for any subject <subject>. It returns an object
that iterates over the subjects of the matching triples.
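The matching rule behind this kind of query can be sketched as a linear scan in plain Java. This is an illustration of the semantics only, not Jena's implementation: the class SubjectQuerySketch, the method subjectsWithProperty, and the FAQ entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the query's semantics; not Jena code.
final class SubjectQuerySketch {

    // One statement, in {predicate, subject, object} order.
    static final class Triple {
        final String predicate;
        final String subject;
        final String object;

        Triple(String predicate, String subject, String object) {
            this.predicate = predicate;
            this.subject = subject;
            this.object = object;
        }
    }

    // Returns the subject of every triple whose predicate and object both match.
    static List<String> subjectsWithProperty(List<Triple> model, String predicate, String object) {
        List<String> subjects = new ArrayList<String>();
        for (Triple t : model) {
            if (t.predicate.equals(predicate) && t.object.equals(object)) {
                subjects.add(t.subject);
            }
        }
        return subjects;
    }

    // Made-up travel FAQ metadata.
    static List<Triple> faqModel() {
        String destination = "http://foo.org/destination";
        return Arrays.asList(
            new Triple(destination, "http://foo.org/faq#q1", "Africa"),
            new Triple(destination, "http://foo.org/faq#q2", "Asia"),
            new Triple(destination, "http://foo.org/faq#q3", "Africa"));
    }

    public static void main(String[] args) {
        for (String r : subjectsWithProperty(faqModel(), "http://foo.org/destination", "Africa")) {
            System.out.println("Resource " + r + " is about travel to Africa");
        }
    }
}
```

The result contains subjects rather than whole triples, because the predicate and object are already fixed by the query.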
Where to Get Jena
Jena can be downloaded from http://www.hpl.hp.co.uk/people/bwm/rdf/jena/download.htm.
The download includes several examples, JavaDoc, source code and jar
files.
Acknowledgement
Thanks to Brian McBride for his helpful comments on a draft of this article
and for creating Jena.