Table of Contents
Introduction to 4RDF
Basic Example
Manipulating the Model
A Taste of Knowledge
Summary
4Suite is a library and
collection of tools for XML and object database development using Python, with support for most UNIX
flavors and Win32. Fourthought,
Inc. develops 4Suite as open source software, and the package
(this article discusses the 0.9.1 release) includes a set of
sub-components:
- 4DOM: an XML/HTML library based on DOM Level 2;
- 4XPath: a complete XPath 1.0 engine;
- 4XSLT: a complete XSLT 1.0 processor;
- 4XPointer: a (so far) partial implementation of XPointer;
- 4ODS: an object persistence library based on ODMG 3.0, including a persistent DOM
engine;
- 4RDF: a library based on the W3C RDF specifications.
There are other technologies supported in 4Suite, such as SAX and
UUID generation, but the focus of this article is 4RDF. I shall assume
familiarity with RDF; the W3C's RDF
page links to many introductory and discussion resources.
4RDF is a full-featured library based on the abstract models
defined by the W3C in its RDF Model and Syntax
Recommendation 1.0 (RDF M&S) and RDF Schema
Candidate Recommendation 1.0 (RDF Schemas). It provides several
features beyond the RDF core, including multiple persistence
mechanisms and an experimental inference layer for RDF data. Note that
Fourthought is currently alpha-testing 4Suite Server, a distribution
of 4Suite with a built-in CORBA interface that allows it to be used as a black box
from other platforms and programming languages.
Introduction to 4RDF
Figure 1 shows a diagram of 4RDF's
architecture. The core component is the RDF model, which provides an
API for operations based on RDF M&S. The model is a thin layer;
for instance, it doesn't control how RDF data is stored, which is
deferred to the driver. The driver provides a uniform interface so
that many back-ends can be plugged in. 4RDF comes with the Memory
back-end, which, as its name implies, is very fast but provides no
persistence. There is also support for PostgreSQL and Oracle database
storage.
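The division of labor between model and driver can be sketched in a few lines of Python. This is a hypothetical illustration, not 4RDF's code. The class names `Model` and `MemoryDriver` and the methods `add()`, `remove()` and `contents()` are assumptions chosen for the example. The point is that the model only forwards requests, so a database-backed driver exposing the same methods could be swapped in without changing any model code.

```python
# Hypothetical sketch of a model/driver split; not 4RDF's actual API.
from typing import Iterable, List, Protocol, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Driver(Protocol):
    """Assumed storage interface that every back-end would implement."""

    def add(self, triples: Iterable[Triple]) -> None: ...

    def remove(self, triples: Iterable[Triple]) -> None: ...

    def contents(self) -> List[Triple]: ...


class MemoryDriver:
    """Fast but non-persistent back-end: triples live in a Python set."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def remove(self, triples: Iterable[Triple]) -> None:
        self._triples.difference_update(triples)

    def contents(self) -> List[Triple]:
        return sorted(self._triples)


class Model:
    """Thin RDF layer: exposes statement operations, delegates storage."""

    def __init__(self, driver: Driver) -> None:
        self.driver = driver

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.driver.add([(subject, predicate, obj)])

    def remove(self, subject: str, predicate: str, obj: str) -> None:
        self.driver.remove([(subject, predicate, obj)])

    def statements(self) -> List[Triple]:
        return self.driver.contents()


model = Model(MemoryDriver())
model.add("http://example.org/a", "http://example.org/p", "value")
print(model.statements())
```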

Figure 1: 4RDF architecture
There is also a pluggable interface for serialization and
deserialization of the RDF model. 4RDF comes with support (through
DOM) for the XML serialization specified in RDF M&S. The
SchemaHandler provides basic RDF Schema support. First, it can
prepopulate a model with all the RDF Schema classes and relationships from
the specification. Second, it can check model modifications against schema
constraints during processing.
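Here is a rough illustration of constraint checking. The hypothetical function below tests a new statement against `rdfs:domain`, which the RDF Schema Candidate Recommendation treats as a constraint on the subjects a property may describe. It is a simplified sketch, not SchemaHandler's logic. It does not follow `rdfs:subClassOf`, so a subject typed only as a subclass of the domain fails the check. It also represents the model as a plain set of (subject, predicate, object) tuples, and the `ex:` names are invented.

```python
# Hypothetical, simplified rdfs:domain check; not 4RDF's SchemaHandler.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"


def satisfies_domain(model, statement):
    """Return True if the statement's subject is typed as one of its predicate's domains.

    A predicate with no rdfs:domain may be used on any resource; with several,
    the subject must be an instance of at least one of them.
    """
    subject, predicate, _ = statement
    domains = {o for (s, p, o) in model if s == predicate and p == RDFS_DOMAIN}
    if not domains:
        return True
    subject_types = {o for (s, p, o) in model if s == subject and p == RDF_TYPE}
    return not domains.isdisjoint(subject_types)


model = {
    ("ex:title", RDFS_DOMAIN, "ex:Channel"),
    ("ex:feed", RDF_TYPE, "ex:Channel"),
}
print(satisfies_domain(model, ("ex:feed", "ex:title", "My feed")))  # True
print(satisfies_domain(model, ("ex:logo", "ex:title", "A logo")))   # False
```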
Finally, 4RDF comes with an experimental inference engine. It
defines an open XML vocabulary, the RDF Inference
Language (RIL), for expert-system-style processing of RDF data,
with standard mappings between RDF predicates and the formal-logic
predicates more common in inferencing systems.
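The mapping from RDF to logic can be pictured with a small forward-chaining sketch. This is not RIL, which is an XML vocabulary whose syntax is not shown here. It only illustrates the general idea: each statement (s, p, o) becomes a fact p(s, o), and a rule fires repeatedly until no new facts appear. The rule used here (transitivity of `rdfs:subClassOf`) and the `ex:` names are assumptions for the example.

```python
# Hypothetical illustration of RDF-to-logic mapping plus one inference rule; not RIL.
SUBCLASS = "rdfs:subClassOf"


def to_facts(statements):
    """Map each RDF statement (s, p, o) to the fact p(s, o), stored as (p, s, o)."""
    return {(p, s, o) for (s, p, o) in statements}


def forward_chain(facts):
    """Apply subClassOf(A, B), subClassOf(B, C) => subClassOf(A, C) to a fixed point."""
    facts = set(facts)
    while True:
        pairs = [(a, b) for (pred, a, b) in facts if pred == SUBCLASS]
        derived = {(SUBCLASS, a, c) for (a, b) in pairs for (b2, c) in pairs if b == b2}
        new = derived - facts
        if not new:
            return facts
        facts |= new


statements = [
    ("ex:NewsItem", SUBCLASS, "ex:Item"),
    ("ex:Item", SUBCLASS, "ex:Resource"),
]
closure = forward_chain(to_facts(statements))
print((SUBCLASS, "ex:NewsItem", "ex:Resource") in closure)  # True
```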
Basic Example
A small example will give you a flavor of 4RDF and its features. Listing 1 is a Python program
that reads in serialized RDF, performs some manipulations, and then
prints out a serialization of the result. To get it running, see the
packaging
info. If you use the source package, the INSTALL file in the
package explains how to set things up. You don't need to be very
familiar with Python to read and understand the example or to try out
4RDF yourself.
It will help to have Listing
1 available in another window as you read the next section.
The listing starts with a serialized RDF string. The RDF is
actually an instance of RSS,
describing an item from the OpenTechnology.org
site. (OpenTechnology.org is a site that Fourthought is working on as
a way to gather discussions, comments, and other resources of value to
the XML community as a dynamic knowledge base. There is a strong
emphasis on using XML tools such as XSLT and RDF, so that people
familiar with those technologies have a free hand in customizing
their view and use of the site. Please note that OpenTechnology.org's
RSS gateway is still in internal alpha, so treat this strictly as an
example for now.)
In brief, for anyone unfamiliar with RSS, the RSS document
describes a content channel: it first describes the basic channel,
then an image that can be associated with the channel, and finally an
item of content available on the channel. The descriptions give basic
content access data such as title and URL.
The code then sets up the driver for the model, which provides the
actual storage for the RDF data. In our example, we just use the
memory driver; using the database drivers is similar. I also use the
transaction features of 4RDF models. These aren't really meaningful
with the memory driver, but they illustrate the feature. With a
database back-end, 4RDF helps manage transactions for the
developer. 4Suite Server extends this with support for the CORBA Object Transaction
Service.
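For an in-memory store, the commit-or-roll-back behavior that transactions provide can be mimicked with a Python context manager. This is a hypothetical sketch, not 4RDF's transaction API. The `MemoryStore` class and the `transaction()` helper are invented for illustration. A real database driver would delegate to the database's own commit and rollback instead of copying data.

```python
# Hypothetical transaction sketch for an in-memory store; not 4RDF's API.
from contextlib import contextmanager


class MemoryStore:
    """Invented in-memory triple store used only for this example."""

    def __init__(self):
        self.triples = set()


@contextmanager
def transaction(store):
    """Keep changes if the block finishes; restore the snapshot if it raises."""
    snapshot = set(store.triples)
    try:
        yield store
    except Exception:
        store.triples = snapshot  # roll back to the pre-transaction state
        raise


store = MemoryStore()
with transaction(store) as s:
    s.triples.add(("ex:a", "ex:p", "kept"))

try:
    with transaction(store) as s:
        s.triples.add(("ex:a", "ex:p", "discarded"))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(sorted(store.triples))  # [('ex:a', 'ex:p', 'kept')]
```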
Next, the code creates an RDF model instance itself, using the
driver we created. Note that we give the model a base URI (the first
parameter). This value might be the URI where the serialized version
is available. It can also be an empty string.
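A base URI matters because serialized RDF may contain relative references, such as `rdf:about="#channel"`, which are resolved against it. The sketch below shows that resolution using the standard library's `urllib.parse.urljoin`. The base value is borrowed from the RSS example, and the article does not say exactly how 4RDF applies the base internally. With an empty base, relative references simply stay relative.

```python
# Generic URI resolution against a base; how 4RDF uses its base URI is not shown here.
from urllib.parse import urljoin

BASE = "http://opentechnology.org/rssgateway.rss"  # assumed document location

for ref in ["#channel", "images/openlogo.gif", "http://purl.org/rss/1.0/"]:
    print(urljoin(BASE, ref))
# http://opentechnology.org/rssgateway.rss#channel
# http://opentechnology.org/images/openlogo.gif
# http://purl.org/rss/1.0/

print(urljoin("", "#channel"))  # #channel
```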
Now we come to complete(), the heart of the 4RDF
query engine. The complete() method is a very basic
pattern-matching tool. It returns all the statements in the model
whose subject, predicate, and object exactly match the given values.
None is used as a wildcard, so our first print
statement, OUTPUT 1 in the listing, prints a list of all
statements in the model. Of course, since the model is brand new,
the list is empty.
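The matching rule behind complete() is easy to state in code. The function below is a hypothetical stand-in written for illustration. It operates on a plain list of (subject, predicate, object) tuples rather than a 4RDF model. Only the exact-match behavior and the None-as-wildcard convention come from the text. The final call is a predicate-only query, similar in spirit to the title query in the listing.

```python
# Hypothetical stand-in for complete(); works on plain tuples, not a 4RDF model.
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


def complete(statements: List[Triple],
             subject: Optional[str] = None,
             predicate: Optional[str] = None,
             obj: Optional[str] = None) -> List[Triple]:
    """Return statements whose parts equal the given values; None matches anything."""
    pattern = (subject, predicate, obj)
    return [st for st in statements
            if all(want is None or want == got for want, got in zip(pattern, st))]


RSS = "http://purl.org/rss/1.0/"
CHANNEL = "http://opentechnology.org/rssgateway.rss"
LOGO = "http://opentechnology.org/images/openlogo.gif"

model: List[Triple] = []
print(complete(model))  # [] because a brand-new model is empty

model += [
    (CHANNEL, RSS + "title", "OpenTechnology.org"),
    (LOGO, RSS + "title", "OpenTechnology.org Logo"),
    (LOGO, RSS + "url", LOGO),
]
print(len(complete(model)))  # 3: all wildcards
print(complete(model, predicate=RSS + "title"))  # only the two title statements
```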
Note that if we were using 4RDF's schema support (which is beyond
the scope of this article), the model would begin with statements
representing the basic RDF meta-model, such as statements
describing rdfs:Class or rdfs:domain.
Next the code illustrates 4RDF's ability to read serialized RDF
into a model. The XML serialization specified in RDF M&S is
supported, including all abbreviations, but excluding some
problematic features such as aboutEachPrefix. Now that
we have read in our sample RSS data, the model contains all the
corresponding statements, as we see when we print all the contents
again (OUTPUT 2). A portion of OUTPUT 2 follows, indented for clarity.
[<RDF Statement at 135860888:
    [Subject: http://opentechnology.org/rssgateway.rss,
     Predicate: http://purl.org/rss/1.0/#title,
     Object: "OpenTechnology.org"]>,
 <RDF Statement at 135829880:
    [Subject: http://opentechnology.org/rssgateway.rss,
     Predicate: http://purl.org/rss/1.0/#description,
     Object: "An XML community site for threaded discussion and
              knowledge management, using XML, DOM, XSLT, and RDF. "]>,
 <RDF Statement at 135182912:
    [Subject: http://opentechnology.org/rssgateway.rss,
     Predicate: http://www.w3.org/1999/02/22-rdf-syntax-ns#type,
     Object: "http://purl.org/rss/1.0/#channel"]>,
 ... ]
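To make the step from XML to statements concrete, here is a deliberately tiny RDF/XML reader built on `xml.etree.ElementTree`. It is a hypothetical sketch, not 4RDF's parser. It handles only a flat `rdf:RDF` element whose children are `rdf:Description` or typed node elements carrying an about attribute. Their property elements must hold either text or a resource attribute. It does not support the abbreviations mentioned above, nested descriptions or blank nodes, and it does not distinguish literals from resources.

```python
# Hypothetical, minimal RDF/XML reader; not 4RDF's parser.
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"


def uri_of(tag):
    """Turn ElementTree's '{namespace}local' into the URI namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def rdf_attr(elem, name):
    """Read rdf:name, falling back to the unqualified form used in older RDF/XML."""
    return elem.get("{%s}%s" % (RDF_NS, name), elem.get(name))


def parse_rdfxml(text):
    triples = []
    for node in ET.fromstring(text):
        subject = rdf_attr(node, "about")
        if node.tag != "{%s}Description" % RDF_NS:
            # Typed node element: its own name is the subject's rdf:type.
            triples.append((subject, RDF_TYPE, uri_of(node.tag)))
        for prop in node:
            resource = rdf_attr(prop, "resource")
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, uri_of(prop.tag), value))
    return triples


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://opentechnology.org/rssgateway.rss">
    <title>OpenTechnology.org</title>
    <image rdf:resource="http://opentechnology.org/images/openlogo.gif"/>
  </channel>
</rdf:RDF>"""

for triple in parse_rdfxml(SAMPLE):
    print(triple)
```

Because the sample's RSS namespace ends in "/", concatenating it with a local name yields URIs such as http://purl.org/rss/1.0/title, with no "#".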
Next we illustrate a more selective complete(). It
returns only the statements with a predicate of
"http://purl.org/rss/1.0/#title". OUTPUT 3 follows:
[<RDF Statement at 135182912:
    [Subject: http://opentechnology.org/rssgateway.rss,
     Predicate: http://purl.org/rss/1.0/#title,
     Object: "OpenTechnology.org"]>,
 <RDF Statement at 135862448:
    [Subject: http://opentechnology.org/images/openlogo.gif,
     Predicate: http://purl.org/rss/1.0/#title,
     Object: "OpenTechnology.org Logo"]>,
 <RDF Statement at 135831528:
    [Subject: http://www.opentechnology.org/talk/view.html?uri=urn:uuid:10a0b01-0-60b-a07-b090305f,
     Predicate: http://purl.org/rss/1.0/#title,
     Object: "RDF Inference Language (RIL)"]>]
Manipulating the Model
The contents of models can be manipulated directly from a program.
The next part of the code solves the problem: "I'd like to remove all
of the model that pertains to a particular RSS item for
OpenTechnology.org." It first calls complete() with the
offending item as the subject and wildcards for the other parameters. It
then iterates over the resulting statements to remove them.
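Here is the query-then-remove pattern in plain Python. The `remove_subject()` helper is hypothetical and works on a list of (subject, predicate, object) tuples rather than a 4RDF model. It collects the matches first and removes them afterward, so the collection is never modified while it is being iterated.

```python
# Hypothetical helper on plain tuples; not 4RDF's API.
def remove_subject(statements, subject):
    """Delete every statement whose subject matches; return how many were removed."""
    doomed = [st for st in statements if st[0] == subject]  # query first
    for st in doomed:                                        # then remove
        statements.remove(st)
    return len(doomed)


ITEM = "http://www.opentechnology.org/talk/view.html?uri=urn:uuid:10a0b01-0-60b-a07-b090305f"
CHANNEL = "http://opentechnology.org/rssgateway.rss"
statements = [
    (CHANNEL, "http://purl.org/rss/1.0/title", "OpenTechnology.org"),
    (ITEM, "http://purl.org/rss/1.0/title", "RDF Inference Language (RIL)"),
    (ITEM, "http://purl.org/rss/1.0/link", ITEM),
]
print(remove_subject(statements, ITEM))  # 2
print(statements)  # only the channel statement remains
```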
Finally, the code writes what's left of the model back into
serialized form. Technically, it creates a DOM (4DOM, to be exact)
node representing the serialization. The code then uses 4DOM
features to convert the resulting node to an XML string, print it out
(OUTPUT 4), and clean up. (Note that the
ReleaseNode clean-up is only required with Python 1.x;
Python 2.0 is in beta, so this code will be unnecessary before
long.) OUTPUT 4 follows.
<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:ns1='http://purl.org/rss/1.0/#'>
  <rdf:Description
    about='http://opentechnology.org/images/openlogo.gif'>
    <ns1:link resource='http://opentechnology.org'/>
    <ns1:title>OpenTechnology.org Logo</ns1:title>
    <rdf:type resource='http://purl.org/rss/1.0/#image'/>
    <ns1:inchannel
      resource='http://opentechnology.org/rssgateway.rss'/>
    <ns1:url resource='http://opentechnology.org/images/openlogo.gif'/>
  </rdf:Description>
  <rdf:Description about='http://opentechnology.org/rssgateway.rss'>
    <rdf:type resource='http://purl.org/rss/1.0/#channel'/>
    <ns1:title>OpenTechnology.org</ns1:title>
    <ns1:description>
      An XML community site for threaded discussion and knowledge
      management, using XML, DOM, XSLT, and RDF.
    </ns1:description>
  </rdf:Description>
</rdf:RDF>
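The serialization step can be reproduced with the standard library's `xml.dom.minidom`. This is a hypothetical sketch, not 4RDF's serializer. The `serialize()` function, its `resources` argument and the ns1, ns2, ... prefix scheme are all assumptions. The `resources` argument is needed because plain tuples don't record whether an object is a literal or a resource. Descriptions are grouped by subject in the order subjects first occur among the statements, not in the original document order. This is one way the order of descriptions can change on output. Each predicate URI is split after its last "#" or "/", so no separator is added to the namespace.

```python
# Hypothetical RDF/XML writer using minidom; not 4RDF's serializer.
from collections import defaultdict
from xml.dom.minidom import getDOMImplementation

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_uri(uri):
    """Split after the last '#' or '/', so no separator is added or lost."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def serialize(triples, resources=frozenset()):
    """Write (s, p, o) triples as RDF/XML; objects in `resources` become resource refs."""
    doc = getDOMImplementation().createDocument(RDF_NS, "rdf:RDF", None)
    root = doc.documentElement
    root.setAttribute("xmlns:rdf", RDF_NS)  # minidom does not declare namespaces itself
    prefixes = {RDF_NS: "rdf"}
    by_subject = defaultdict(list)  # groups statements, in first-seen subject order
    for s, p, o in triples:
        by_subject[s].append((p, o))
    for subject, props in by_subject.items():
        desc = doc.createElementNS(RDF_NS, "rdf:Description")
        desc.setAttribute("about", subject)
        for predicate, obj in props:
            ns, local = split_uri(predicate)
            if ns not in prefixes:  # generate ns1, ns2, ... on first use
                prefixes[ns] = "ns%d" % len(prefixes)
                root.setAttribute("xmlns:" + prefixes[ns], ns)
            prop = doc.createElementNS(ns, prefixes[ns] + ":" + local)
            if obj in resources:
                prop.setAttribute("resource", obj)
            else:
                prop.appendChild(doc.createTextNode(obj))
            desc.appendChild(prop)
        root.appendChild(desc)
    return doc.toxml()


RSS = "http://purl.org/rss/1.0/"
LOGO = "http://opentechnology.org/images/openlogo.gif"
out = serialize([(LOGO, RSS + "title", "OpenTechnology.org Logo"),
                 (LOGO, RDF_NS + "type", RSS + "image")],
                resources={RSS + "image"})
print(out)
```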
You can see that the description of the RSS item is gone: we
removed it from the model. Also note that 4RDF will not satisfy
demands for strict round-tripping of RDF. First, the image and
channel descriptions are transposed. Second, 4RDF generates
automatic prefixes (such as ns1) for some output namespaces. This is correct and
justifiable, but it might be annoying to some. Less
justifiable, however, is some mangling of output URIs such as
http://purl.org/rss/1.0/#image (notice the introduced
"#"). This is a recently discovered bug that will hopefully have been
fixed by the time you read this.