An Introduction to Prolog and RDF
by Bijan Parsia
April 25, 2001
Introduction: SW is AI
Many Semantic Web
advocates have gone out of their way to disassociate their visions and
projects from the Artificial Intelligence moniker. No surprise, since
the AI label has been the kiss of, if not death, at least scorn, since
Lisp machines were frozen out of the marketplace during the great "AI
winter" of the mid-1980s. Lisp still suffers from its
association with the AI label, though it does well by being connected
with the actual technologies.
However, it is a curious phenomenon that the AI label tends to get
dropped once the problem AI researchers were studying becomes
tractable to some degree and yields practical systems. Voice
recognition and text-to-speech, expert systems, machine vision, text
summarizers, and theorem provers are just a few examples of classic AI
tech that has become part of the standard bag of tricks. The AI label
tends to mark things which aren't yet implemented in a generally
useful manner, often because hardware or general practices haven't yet
caught up.
That seems to describe the Semantic Web pretty well.
An aside -- one interesting phenomenon is
that a lot of AI ends up, after fleeing the CS department, in
Information and Library Sciences. And, of course, librarians, even the
non-techie ones, are really into cataloging, searching, sharing,
correlating, using metadata, intelligent agents... to wit, all the
elements of the Semantic Web. AI folks don't end up in library
departments because librarians are pushovers (as my overdue fines
attest), but because there's a pretty good fit between what (some)
AI-ers like to do, what the library folks want, and between what the
librarians want and what the Semantic Web requires.
So the Semantic Web is an AI project, and we should be
proud of that fact. Not only is it more honest, but it means
that we can be clearer about what constitutes prior art, relevant
research and literature, similar projects, and available
technology. As I've written
before, narrowness of understanding is a pernicious barrier to
sensible progress. Reinventing the wheel isn't nearly as bad as
having to continually reconceptualize it: "not thought here" generally
causes more systematic problems than "not invented here".
In these articles, I'm going to do a little down-to-earth
exploration of RDF, a core Semantic Web technology, using a classic AI
programming language, Prolog, plus some standard AI techniques and
technologies.
A Gentle Prolog Primer
Prolog was the first logic programming language, and it's still
popular in
industry and in the classroom. There are many implementations,
most of rather good quality. Interestingly, Prolog implementations are
often used as logic servers or drop-in inference engines for larger
programs, so the implementations have gotten fairly good at
integrating with other programs (for example, there are several
Prolog-style inference engines for the JVM, and some truly
fine ones built on Common Lisp).
Prolog is an excellent prototyping language. It's quite easy to
pull together programs with interesting and sometimes surprising
properties. There is a large, high-quality corpus of
Prolog literature and
code, much of which is easily adaptable to one's ad hoc
needs. For example, a simple backward-chaining expert system is
perhaps a page or two of sample code in just about any Prolog
textbook. While not production quality, such toys are ideal for
getting a concrete sense of the problems and possibilities of an
idea.
Syntax and Simple Semantics
There's not room in this article to give a reasonable Prolog
tutorial, but a few preliminaries will be useful for getting a grip on
RDF and how Prolog can deal with it.
It's helpful to contrast Prolog programs with invocations
of them. A typical Prolog program will form a knowledge base
-- a database of facts and rules which is used as a basis for
inferences. To initiate computations, you query the knowledge
base. Here's a very simple Prolog program which forms a small
knowledge base about the readers of some popular web sites.
reads(john, 'XML.com').
reads(mary, 'XML.com').
reads(mary, xmlhack).
reads(cristina, xmlhack).
Each line in this program asserts a fact. The first line claims
that john reads 'XML.com'; the second that
mary reads 'XML.com', and so on. reads,
john, mary, 'XML.com',
cristina, and xmlhack are all Prolog
atoms (a.k.a. "symbols"). The atom is the most basic and
prevalent datatype in Prolog. If an atom begins with an uppercase
letter, or contains certain special characters (like the full stop,
which is also the statement terminator), then one encloses it in
single quotes (hence, 'XML.com'; while standard, you may
find Prolog systems with alternative syntax for atom literals).
Given the types of characters that tend to show up in URIs, they
almost always need to be enclosed in single quotes to produce their
eponymous atoms. RDF makes heavy use of URIs, which basically means
that, worst case, when processing RDF with Prolog you'll be
writing 'http://purl.org/yadda/yadda/yadda/' a lot (for
some reasonable value of "yadda").
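To make the quoting rules concrete, here is a small sketch that can be consulted and run. It assumes a Prolog that follows the ISO standard built-ins (atom/1, ==/2, atom_length/2); the predicate name quoting_demo is my own invention for this illustration.

```prolog
% quoting_demo succeeds if every claim below about atoms holds.
quoting_demo :-
    atom(xmlhack),                               % plain lowercase atom, no quotes needed
    atom('XML.com'),                             % uppercase start and a full stop: quotes required
    xmlhack == 'xmlhack',                        % quoting an atom that doesn't need it changes nothing
    atom('http://purl.org/yadda/yadda/yadda/'),  % a URI is just one (long) quoted atom
    atom_length('XML.com', 7).                   % the quotes are not part of the atom's name
```

The query ?- quoting_demo. should simply succeed, which is Prolog's way of agreeing with each line.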
Now that we have our knowledge base, we can interrogate our Prolog
system. After loading the program into my Prolog ("consulting" it, in
Prolog lingo), I can enter questions and receive answers at the
"query" prompt.
?-reads(john, 'XML.com').
yes
"John reads 'XML.com'?"
Prolog says, "Sure does."
?-reads(mary, X).
X = 'XML.com'
yes
"mary reads what?"
X is a variable. Prolog searched the knowledge base and
found that if X was bound to 'XML.com' we get a "true"
statement (i.e., one in the knowledge base).
?-reads(Person, 'XML.com').
Person = john;
Person = mary;
No
"What Person reads 'XML.com'?"
"john does!" (read "Who else?" for ";")
"And mary!"
"Anyone else?"
"Nope."
(Thus we see one standard Prolog development cycle: edit the
knowledge base in a text editor, load it into the system
(i.e., "consult" it), then interact with it from the read-query,
evaluate, print loop.)
(Note: an unquoted token that begins with an uppercase letter is a
variable, not an atom. Hence X is a variable, as is Person.)
Notice that in the second and third examples, there's more than one
answer that will satisfy the query: mary reads both
'XML.com' and xmlhack, and both
john and mary read
'XML.com'. In the last session, after Prolog told me that
john read 'XML.com', instead of hitting
"enter", I hit the semicolon, which told Prolog to look again for
other ways my query can be satisfied. I kept doing this until there
were no solutions that hadn't already been given. (While these
particular commands are quite common in Prolog read-query-print loops,
they are not universal.)
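If you'd rather not sit there tapping the semicolon, the standard findall/3 built-in will gather every solution into a list in one go. The following is a minimal sketch; readers_of is a hypothetical helper name, not something from the knowledge base above.

```prolog
reads(john, 'XML.com').
reads(mary, 'XML.com').
reads(mary, xmlhack).
reads(cristina, xmlhack).

% readers_of(+Site, -People): People is the list of everyone who reads Site,
% in the order the facts appear in the knowledge base.
readers_of(Site, People) :-
    findall(Person, reads(Person, Site), People).
```

With this loaded, ?- readers_of('XML.com', People). binds People to [john, mary], the same answers the semicolon walked through one at a time.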
Suppose we want to know whether any one person reads both
'XML.com' and xmlhack?
?-reads(Person, 'XML.com'), reads(Person, xmlhack).
Person = mary;
No
(The comma between the clauses is pronounced "and".)
Suppose we want to derive some targeted email marketing
lists. We will probably find, in those circumstances, that this last
query is quite a common one. It would be quite a drag to have to type
that query out every time we wanted to send some spam. More
importantly, the concept "a reader of both 'XML.com' and
xmlhack" has a special status for us: it defines the term
spam_target. We could add the statement
spam_target(mary) to our knowledge base, but that's both
redundant (as we can figure out that mary's a
spam_target from what we already know) and a pain to
maintain (e.g., if mary stops reading
xmlhack due to having to spend all her time deleting our
spam, we have to change two lines in the program which aren't
obviously connected). Fortunately, we can add a rule to our
knowledge base to define our new concept.
spam_target(Sucker) :-
reads(Sucker, 'XML.com'),
reads(Sucker, xmlhack).
"A Sucker is a spam_target if
That Sucker reads 'XML.com' and
That Sucker reads xmlhack."
Assuming that we don't alter our knowledge base any other way, the query
spam_target(Person) will return mary.
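The maintenance argument is easy to try out. In this sketch the reads/2 facts are declared dynamic so that one can be removed at run time; unsubscribe is a hypothetical helper I've added for the demonstration, and retract/1 is the standard built-in doing the work.

```prolog
:- dynamic reads/2.   % allow reads/2 facts to be removed while the program runs

reads(john, 'XML.com').
reads(mary, 'XML.com').
reads(mary, xmlhack).
reads(cristina, xmlhack).

spam_target(Sucker) :-
    reads(Sucker, 'XML.com'),
    reads(Sucker, xmlhack).

% unsubscribe(+Person, +Site): remove the single fact saying Person reads Site.
unsubscribe(Person, Site) :-
    retract(reads(Person, Site)).
```

Before anything changes, ?- spam_target(mary). succeeds. After ?- unsubscribe(mary, xmlhack). the same query fails, with no second line to remember to edit: the rule follows the facts.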
Moving to RDF
In the pre-rule knowledge base, each fact had three parts:
- the predicate, reads;
- the subject of the predicate, i.e., the reader (john, mary, and so forth);
- and the thing they read, i.e., the object of the predicate ('XML.com' and xmlhack).
By a striking and carefully planned coincidence, these are exactly
the components of an RDF triple
(hereafter, I'll use "RDF triple" and "triple" interchangeably). The
RDF triple is one of several formal
models offered by the core RDF spec, and it
consists of an ordered 3-tuple of URIs (with the exception that the
object position may take a string literal) with the first URI naming a
predicate, the second naming a subject, and the last item being either
a URI naming an object or a string literal. While the example Prolog
facts have the same slots as a triple, the symbols which fill those
slots aren't URIs. Happily, it's not that hard to convert our simple
knowledge base:
- The Objects: since all the objects currently in our
knowledge base are web sites, it seems natural to use their base
URL as their name, thus,
'http://www.xml.com/' and
'http://www.xmlhack.com/' (remember, to make URIs
into standard Prolog atoms, you typically need to single quote
them).
- The Predicate: the predicate atom (reads)
has no intrinsic, natural URI, but we can simply use the URL of this
article (which is unique and not particularly useful for anything else)
prepended to the atom, which yields:
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'.
- The Subjects: again, there are no natural, intrinsic URIs,
but it seems a little nasty to use that same long URL prefix that we used
for the predicate. To add a little visual difference, we'll invent
mailto: based URIs for each person:
'mailto:mary@prologarticle.xml.com',
'mailto:john@prologarticle.xml.com', etc.
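The naming scheme in the list above is mechanical enough to hand to Prolog itself. Here is a rough sketch using the standard atom_concat/3; base_uri, predicate_uri, and person_uri are names I've made up for the purpose, and the base URI is assumed to be this article's URL with a trailing slash.

```prolog
% Assumed base URI: the article's URL, ending in a slash.
base_uri('http://www.xml.com/pub/a/2001/04/25/prologrdf/').

% predicate_uri(+Name, -URI): prefix a short predicate name with the base URI.
predicate_uri(Name, URI) :-
    base_uri(Base),
    atom_concat(Base, Name, URI).

% person_uri(+Name, -URI): build the invented mailto: URI for a person.
person_uri(Name, URI) :-
    atom_concat('mailto:', Name, Prefix),
    atom_concat(Prefix, '@prologarticle.xml.com', URI).
```

For instance, ?- predicate_uri(reads, U). binds U to 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads', and ?- person_uri(mary, U). binds U to 'mailto:mary@prologarticle.xml.com'.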
We can now convert the example knowledge base to a collection of RDF
triples:
| Predicate | Subject | Object |
| --- | --- | --- |
| 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads' | 'mailto:john@prologarticle.xml.com' | 'http://www.xml.com/' |
| 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads' | 'mailto:mary@prologarticle.xml.com' | 'http://www.xml.com/' |
| 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads' | 'mailto:mary@prologarticle.xml.com' | 'http://www.xmlhack.com/' |
| 'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads' | 'mailto:cristina@prologarticle.xml.com' | 'http://www.xmlhack.com/' |
Of course, this table presentation of the triples is a bit hard to
query. It would be nice if we could encode these triples in a form
that Prolog understood. Fortunately, those URI atoms are just atoms,
and we can use them just as we did our original (more concise)
ones:
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
'mailto:john@prologarticle.xml.com',
'http://www.xml.com/').
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
'mailto:mary@prologarticle.xml.com',
'http://www.xml.com/').
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
'mailto:mary@prologarticle.xml.com',
'http://www.xmlhack.com/').
'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
'mailto:cristina@prologarticle.xml.com',
'http://www.xmlhack.com/').
?-'http://www.xml.com/pub/a/2001/04/25/prologrdf/reads'(
Person,'http://www.xml.com/').
Person = 'mailto:john@prologarticle.xml.com'
Yes
This is rather ugly as it stands (adding XML style namespaces will
help), but it gives us a nice, constructive demonstration of how RDF
triples are, or can be seen as, Prolog facts; and, hence, how a
collection of RDF triples (say, as serialized in an RSS document) can be a Prolog
program.
However, since Prolog knowledge bases can have facts with many
arguments, and can have rules, we might want to keep our RDF-based
facts somewhat distinct from the rest of the program. One way we might do
this is by explicitly saying that a triple of URIs is in the
RDF predicate-subject-object relation. We could call that predicate
rdf_triple, as in
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
'mailto:john@prologarticle.xml.com',
'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
'mailto:mary@prologarticle.xml.com',
'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
'mailto:mary@prologarticle.xml.com',
'http://www.xmlhack.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
'mailto:cristina@prologarticle.xml.com',
'http://www.xmlhack.com/').
We can recover our old, easier-to-type formulation by defining a
rule:
reads(Person, Website) :-
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
Person,
Website).
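Our spam_target rule will work with this new knowledge
base essentially as it did with the old one: its structure is
unchanged, though the site names in its body must now be the URI atoms
('http://www.xml.com/' and 'http://www.xmlhack.com/') that the triples
actually contain.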
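Here is the whole arrangement as one file you can consult, offered as a sketch rather than the last word. The only liberty taken is that the site constants inside spam_target are written as the URI atoms, so that they match what the rdf_triple facts store.

```prolog
% The RDF knowledge base: rdf_triple(Predicate, Subject, Object).
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:john@prologarticle.xml.com',
           'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com',
           'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com',
           'http://www.xmlhack.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:cristina@prologarticle.xml.com',
           'http://www.xmlhack.com/').

% The short, friendly predicate, defined on top of the triples.
reads(Person, Website) :-
    rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
               Person,
               Website).

% Same shape as before; only the site names have become URIs.
spam_target(Sucker) :-
    reads(Sucker, 'http://www.xml.com/'),
    reads(Sucker, 'http://www.xmlhack.com/').
```

Asking ?- spam_target(Person). now answers with Person = 'mailto:mary@prologarticle.xml.com', and a semicolon confirms there is nobody else.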
The definition of the rdf_triple predicate establishes
an RDF knowledge base. Our reads rule can be thought of as
an RDF application. In other words, our rules process the
RDF. The kind of processing we do is a form of inference. We
can use inferences to produce results similar to other forms of
processing (such as transformations or SQL queries) though often with
less work and more clarity.
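To give the SQL comparison some substance, consider what a database person would call a self-join: find pairs of people related to the same object by the same predicate. A possible rendering in Prolog follows; shares_site is a hypothetical name of my own, and the facts are the same four triples used throughout.

```prolog
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:john@prologarticle.xml.com',
           'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com',
           'http://www.xml.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:mary@prologarticle.xml.com',
           'http://www.xmlhack.com/').
rdf_triple('http://www.xml.com/pub/a/2001/04/25/prologrdf/reads',
           'mailto:cristina@prologarticle.xml.com',
           'http://www.xmlhack.com/').

% shares_site(?A, ?B, ?Site): A and B are different subjects that stand in
% the same predicate relation to Site (a join of the triples with themselves).
shares_site(A, B, Site) :-
    rdf_triple(Predicate, A, Site),
    rdf_triple(Predicate, B, Site),
    A \== B.
```

The query ?- shares_site('mailto:mary@prologarticle.xml.com', Who, Site). first pairs mary with john over 'http://www.xml.com/' and, after a semicolon, with cristina over 'http://www.xmlhack.com/'. The shared variables do the work that a join condition would do in SQL.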
Taking Stock
The root RDF data model is deliberately very minimal and, as with
XML, that minimalism is intended to make things easier for
programs. One consequence of that minimalism, when coupled with other
machine-friendly design tropes, is that though "human readable", RDF
is not generally very human writable (although the Notation3
syntax tries to improve things). Furthermore, while RDF's data
model is specified, the processing model isn't (deliberately), so one
should expect a wide variety of processors, each working in its own
way, depending on a variety of constraints and desiderata.
Standard Prolog provides a rich processing model which naturally
subsumes RDF data. As we saw above, deriving RDF triples from Prolog
predicates, and then the reverse, can deepen our understanding of
both. Furthermore, there is a lot of experience implementing a variety
of alternative processing models (both forward and backward chaining
systems, for example) in Prolog -- from the experimental toy, through
the serious research project, to the industrially deployed,
large-scale production system level. Finally, Prolog's roots in
symbolic processing and language manipulation support a wide array of
mechanisms for building expressive notations and languages for
knowledge management, which serve well for hiding the less friendly
aspects of RDF.
Some Useful Links
Here are a few more online Prolog tutorials:
- Adventure In Prolog
- Building Expert Systems in Prolog (read Adventure In Prolog first)
- Prolog Programming A First Course (an excellent starter)
- Quick Prolog (even better for a fast overview)
- Logic, Programming and Prolog (2ed) (the whole text in PDF)
And a few links to information about RDF and the Semantic Web:
- The W3C's Semantic Web Activity
- The RDF Model and Syntax Specification
- The RDF Interest Group and RDFIG IRC Scratchpad