Introduction to XFML
by Peter Van Dijck
January 22, 2003
XFML is a simple XML format for exchanging metadata in the form of
faceted hierarchies, sometimes called taxonomies. Its basic building
blocks are topics, also called categories. XFML won't solve all your
metadata needs. It's focused on interchanging faceted classification
and indexing data. XFML addresses the following problems with basic
hierarchical classification:
- Creating and maintaining a good topic hierarchy is a lot of
work; ask any librarian.
- Indexing (categorizing) large amounts of content consistently is
even harder. See Cory Doctorow's "Metacrap".
- Creating a centralized hierarchy to organize a large amount of
information doesn't scale. (If you think Yahoo's hierarchy scales,
ask yourself why you keep turning to Google.)
XFML provides a simple format to share classification and indexing
data. It also provides two ways to build connections between topics,
information that lets you write clever tools to automate the sharing
of indexing efforts. It's based on the principles of faceted
classification, addressing many of the scaling issues with simple
hierarchies.
What is Faceted Classification?
Facets sound scary and librarian-like, but they are really
just a common sense approach to classifying things. Instead of
building one huge tree of topics, a faceted classification uses
multiple smaller trees (each tree is called a facet) that can then be
combined by the user to find things more easily.
Say you're building a travel site about the USA. You could build a
hierarchy to browse it that looks something like this:
If you're going to New York and want to find a blues bar, browsing
this hierarchy will work just fine for you. That's because it's
organized by city first, type of place second, and type of music
third, which is exactly what you happen to need. But if you're about
to visit the USA and want to decide which city to go to based on its
blues bars, our classification breaks down. You first want to select
your type of music, not your city. Unless there's a good search, you
will have to browse every single city looking for blues bars, which is
neither elegant nor user friendly.
Combining different types of information (city, type of music, type
of place) in one big hierarchy can never address all possible
information needs. Faceted classification addresses this problem by
providing separate facets that can be combined in the user
interface. For example:
City (City is a facet)
- New York (New York is a topic within the facet City)
- L.A.
Type of place
Type of music
By combining these facets, a user could view all bars in New York,
all places that have Latin music throughout the country, or any other
combination. Things have suddenly become a lot more interesting. If
you want to know what an interface for this can look like, check out
Facetmap, a tool that automatically
generates four ways of browsing the same faceted classification. You
can even upload XFML files to it.
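To make the idea of combining facets concrete, here is a minimal Python sketch. It is not part of XFML or Facetmap; the place names, the PLACES list and the select function are hypothetical, and each place is assumed to carry exactly one topic per facet.

```python
# Hypothetical catalogue for illustration: every place is tagged with
# one topic in each of the three facets (city, place, music).
PLACES = [
    {"name": "Place A", "city": "New York", "place": "bar", "music": "blues"},
    {"name": "Place B", "city": "New York", "place": "restaurant", "music": "latin"},
    {"name": "Place C", "city": "Los Angeles", "place": "bar", "music": "blues"},
    {"name": "Place D", "city": "Los Angeles", "place": "bar", "music": "latin"},
]


def select(places, **wanted):
    """Return the places that match every requested facet=topic pair."""
    return [
        p for p in places
        if all(p.get(facet) == topic for facet, topic in wanted.items())
    ]


# All bars in New York.
bars_in_ny = select(PLACES, city="New York", place="bar")

# All places with Latin music, in any city.
latin_anywhere = select(PLACES, music="latin")

# Cities that have a blues bar: start from the music, not from the city.
blues_bar_cities = sorted({p["city"] for p in select(PLACES, music="blues", place="bar")})
```

Because no facet is privileged, the user can start from any of them. The single hierarchy shown earlier forces the city to come first.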
How XFML Works
The XFML core spec gives an introduction, defines the concepts, and
specifies the XML format. The spec is stable and frozen, which means
you can safely build applications that use it.
An empty XFML Core document looks like this:
<?xml version="1.0" ?>
<xfml version="1.0" url="http://domain.com/xfml/map1.xml"
language="en-us">
</xfml>
It's a valid XML document and conforms to the XFML Core DTD. The url attribute is
required; it's the URL where the original XFML document can be
found. To be nice, we add a comment pointing to the XFML Core spec:
<?xml version="1.0" ?>
<xfml version="1.0" url="http://domain.com/xfml/map1.xml"
language="en-us">
<!-- This document conforms to XFML Core. See
http://purl.oclc.org/NET/xfml/core/ -->
</xfml>
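A tool that consumes XFML will usually want to read these root attributes first. The following is a rough sketch using Python's standard xml.etree.ElementTree module; the function name read_header is made up for this example, and the check only looks for the url attribute described above. It does not validate the document against the DTD.

```python
import xml.etree.ElementTree as ET

# The empty XFML Core document from above, including the comment.
EMPTY_DOC = """<?xml version="1.0" ?>
<xfml version="1.0" url="http://domain.com/xfml/map1.xml"
language="en-us">
<!-- This document conforms to XFML Core. See
http://purl.oclc.org/NET/xfml/core/ -->
</xfml>"""


def read_header(xfml_text):
    """Return the root attributes of an XFML document as a dict.

    Raises ValueError if the root element is not <xfml> or if the
    url attribute is missing. This is a light sanity check only.
    """
    root = ET.fromstring(xfml_text)
    if root.tag != "xfml":
        raise ValueError("root element is not <xfml>")
    url = root.get("url")
    if not url:
        raise ValueError("the url attribute is missing")
    return {
        "version": root.get("version"),
        "url": url,
        "language": root.get("language"),
    }


header = read_header(EMPTY_DOC)
```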
Facets and Topics
The building blocks of a faceted hierarchy in XFML are facets and
topics. A facet is the top node of each tree. The nodes in the tree
are called topics. XFML can define multiple hierarchies, and each
hierarchy is a facet. Our hierarchy expressed in XFML looks like this:
<facet id="city">City</facet>
<facet id="place">Type of place</facet>
<facet id="music">Type of music</facet>
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="la" facetid="city"><name>Los Angeles</name></topic>
<topic id="bar" facetid="place"><name>bar</name></topic>
<topic id="restaurant" facetid="place"><name>restaurant</name></topic>
<topic id="blues" facetid="music"><name>blues</name></topic>
<topic id="latin" facetid="music"><name>latin</name></topic>
Topics have a child element called <name> and facets don't because
topics can have other child elements. We'll get to those later. Facet
and topic ids are declared in the DTD as XML IDs and therefore cannot
contain spaces or start with a number. The facetid attribute for
topics is required.
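As a rough illustration of how a program might read these elements, the sketch below groups topic names under their facet using Python's standard xml.etree.ElementTree module. The function name topics_by_facet is an invention for this example. The facet and topic elements are assumed to sit directly inside the xfml root element, as in the samples above.

```python
import xml.etree.ElementTree as ET

# The facets and topics from above, wrapped in the xfml root element.
XFML = """<?xml version="1.0" ?>
<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
<facet id="city">City</facet>
<facet id="place">Type of place</facet>
<facet id="music">Type of music</facet>
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="la" facetid="city"><name>Los Angeles</name></topic>
<topic id="bar" facetid="place"><name>bar</name></topic>
<topic id="restaurant" facetid="place"><name>restaurant</name></topic>
<topic id="blues" facetid="music"><name>blues</name></topic>
<topic id="latin" facetid="music"><name>latin</name></topic>
</xfml>"""


def topics_by_facet(xfml_text):
    """Map each facet's display name to the names of its topics."""
    root = ET.fromstring(xfml_text)
    # Facet id -> facet display name (the text content of <facet>).
    facets = {f.get("id"): (f.text or "").strip() for f in root.findall("facet")}
    grouped = {name: [] for name in facets.values()}
    for topic in root.findall("topic"):
        facet_name = facets[topic.get("facetid")]  # facetid is required
        grouped[facet_name].append(topic.findtext("name", default="").strip())
    return grouped


grouped = topics_by_facet(XFML)
```

Here grouped maps "City" to the two city topics, "Type of place" to bar and restaurant, and "Type of music" to blues and latin.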
You can nest topics to any depth within a facet using the
parentTopicid attribute:
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="brooklyn" facetid="city" parentTopicid="ny"><name>Brooklyn</name></topic>
<topic id="brooklyn_heights" facetid="city" parentTopicid="brooklyn"><name>Brooklyn Heights</name></topic>
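One way a tool could turn these parentTopicid links into a breadcrumb is to walk from a topic up to the top of its tree. The sketch below does that with Python's standard xml.etree.ElementTree module. The function name topic_path and the cycle check are additions for this example, not something taken from the spec.

```python
import xml.etree.ElementTree as ET

# The nested topics from above, wrapped in the xfml root element.
XFML = """<xfml version="1.0" url="http://domain.com/xfml/map1.xml" language="en-us">
<facet id="city">City</facet>
<topic id="ny" facetid="city"><name>New York</name></topic>
<topic id="brooklyn" facetid="city" parentTopicid="ny"><name>Brooklyn</name></topic>
<topic id="brooklyn_heights" facetid="city" parentTopicid="brooklyn"><name>Brooklyn Heights</name></topic>
</xfml>"""


def topic_path(xfml_text, topic_id):
    """Return the topic names from the top of the tree down to topic_id."""
    root = ET.fromstring(xfml_text)
    topics = {t.get("id"): t for t in root.findall("topic")}
    path = []
    seen = set()
    current = topic_id
    while current is not None:
        if current in seen:
            # Defensive check: a topic must not be its own ancestor.
            raise ValueError("cycle in parentTopicid links at " + current)
        seen.add(current)
        topic = topics[current]
        path.append(topic.findtext("name", default="").strip())
        # Top-level topics have no parentTopicid, so get() returns None.
        current = topic.get("parentTopicid")
    path.reverse()
    return path


breadcrumb = topic_path(XFML, "brooklyn_heights")
```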
So when should a hierarchy of topics become a facet of its own? The
spec says, when describing the facet concept, that "[f]acets are
mutually exclusive containers that contain hierarchies of topics.
Mutually exclusive means that a certain topic can only possibly belong
to one facet". The mutual exclusivity requirement is semantic: it
can't realistically be enforced by software. In practice, it means
that you should separate out a new facet when you are describing
topics that can be usefully combined. Type of music and city are
mutually exclusive facets because a topic in type of music (Latin) can
never be a topic in city (New York). Note that the mutual exclusivity
requirement does not mean that pages (see next section) can only have
occurrences in one facet.