Introduction to XFML
by Peter Van Dijck
Pages
Once you have some facets and topics defined, you will want to
classify (index) some web pages and add them to your XFML document so
your indexing efforts can be shared. You can only classify things that
have a URI. Each URI (we call them pages, but you can use other file
types as well) can be classified under multiple topics. For example,
the homepage of the B.B. King Blues Club and Grill in New York can be
classified under the topics NY, bar, and blues. We say these topics
occur on the page, and we call them topic occurrences:
<page url="http://bbkingblues.com/">
<title>B.B. King Blues Club and Grill</title>
<description>Conveniently located in the heart of Times
Square near Penn Station and Port Authority, The B.B. King
Blues Club and Grill offers music fans a unique experience.
Owned by the Bensusan Family, proprietors of the world renowned
Blue Note Jazz Club, the club features world-class musical
talent and consists of two distinct spaces: the Showcase Room
and Lucille’s Grill.</description>
<occurrence topicid="bar" />
<occurrence topicid="blues" />
<occurrence topicid="ny" />
</page>
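As a rough illustration of how software might read these occurrences, here is a minimal Python sketch using the standard library's ElementTree. It assumes the page element carries no XML namespace, and the helper name topic_ids_for_page is made up for this example; the description is left out to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the page element above (description omitted).
PAGE_XML = """
<page url="http://bbkingblues.com/">
  <title>B.B. King Blues Club and Grill</title>
  <occurrence topicid="bar" />
  <occurrence topicid="blues" />
  <occurrence topicid="ny" />
</page>
"""

def topic_ids_for_page(page):
    """Return the topic ids that occur on a <page> element, in document order."""
    return [occ.get("topicid") for occ in page.findall("occurrence")]

page = ET.fromstring(PAGE_XML)
print(page.get("url"), topic_ids_for_page(page))
```

Each occurrence only stores a topic id, so the topic's name and facet have to be looked up in the topic definitions elsewhere in the same map.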
The mapInfo Element
mapInfo is an optional element containing administrative metadata
about the map. Usage is simple; check the spec for details. For our
example, mapInfo could look something like this:
<mapInfo>
<managingEditor>
<name>Joe Blogs</name>
<email>feedback@joeblogs.com</email>
<url>http://joeblogs.com/</url>
</managingEditor>
<license>
<name>GNU Free Documentation License</name>
<url>http://www.gnu.org/licenses/fdl.html</url>
</license>
</mapInfo>
The mapInfo element can also contain child elements describing
additional editors, a technical contact, the owner of the map, and the
software used to generate the map.
Distributed Metadata
What we have so far (facets, topics, pages, and occurrences) lets
us build a file that provides some interesting metadata for others to
reuse. Typically you will write some code that regularly downloads an
updated XFML file from web sites with similar topics to yours, then
takes all the topic occurrences that are relevant to your topics and
copies those occurrences to your XFML document. That's how you can
automate the reuse of indexing efforts.
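The copying step might be sketched in Python as follows. This is only an illustration under several assumptions: both maps are already parsed (a real importer would first download the remote file), the root element is called xfml, the two maps use identical topic ids, and the function name import_occurrences is invented for this example.

```python
import xml.etree.ElementTree as ET

# Two tiny sample maps; facets are omitted to keep the sketch short.
LOCAL = """<xfml>
  <topic id="blues" facetid="music"><name>blues</name></topic>
</xfml>"""

REMOTE = """<xfml>
  <topic id="blues" facetid="music"><name>blues</name></topic>
  <topic id="latin" facetid="music"><name>latin</name></topic>
  <page url="http://bbkingblues.com/">
    <title>B.B. King Blues Club and Grill</title>
    <occurrence topicid="blues" />
    <occurrence topicid="latin" />
  </page>
</xfml>"""

def import_occurrences(local_root, remote_root):
    """Copy remote occurrences whose topic id also exists locally.

    Returns the number of occurrences added to the local map.
    """
    local_ids = {t.get("id") for t in local_root.findall("topic")}
    copied = 0
    for remote_page in remote_root.findall("page"):
        wanted = [o.get("topicid") for o in remote_page.findall("occurrence")
                  if o.get("topicid") in local_ids]
        if not wanted:
            continue
        url = remote_page.get("url")
        # Reuse the local page with the same URL, or create a new one.
        page = next((p for p in local_root.findall("page")
                     if p.get("url") == url), None)
        if page is None:
            page = ET.SubElement(local_root, "page", {"url": url})
            title = remote_page.find("title")
            if title is not None:
                ET.SubElement(page, "title").text = title.text
        existing = {o.get("topicid") for o in page.findall("occurrence")}
        for topic_id in wanted:
            if topic_id not in existing:
                ET.SubElement(page, "occurrence", {"topicid": topic_id})
                existing.add(topic_id)
                copied += 1
    return copied

local_root = ET.fromstring(LOCAL)
remote_root = ET.fromstring(REMOTE)
copied = import_occurrences(local_root, remote_root)
print(copied, "occurrence(s) copied")
```

Only the "blues" occurrence is copied here, because the local map has no "latin" topic. Matching on raw topic ids is the simplest possible rule and only works when both sites happen to use the same ids.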
There is a problem though. If site A wants to reuse the indexing
work of site B, they have to use exactly the same topics. That's not
how the world works. Site A might have topics "blues" and "latin", and
site B might have topics "blues & jazz" and "Latino". They
probably mean the same thing, and B might want to reuse the indexing
of A, but how can your code know which topic occurrences to reuse?
XFML provides two answers. You can create direct connections
between two topics in different maps, indicating that for example the
topic "latin" in map A is equal to the topic "Latino" in map B. You
can also create implicit connections by pointing a topic to a
web page that describes that topic, for example a page with the dictionary
definition for Latino. The software can then infer that any topics
it finds that point to that same page are really the same topic, no
matter what the topic is called.
These two approaches mean that you can create a web of loosely
distributed metadata, which is how XFML attempts to address the
problems with centralized hierarchies.
Connecting Topics
The first approach to reusing indexing efforts is to connect
individual topics between maps. The connect element is a
child of the topic element; its content is the concatenation of three
strings: the URL of another map, the "#" character, and the id of a
topic in that map:
<topic id="latin" facetid="music">
<name>latin</name>
<connect>http://domainb.com/mapb.xml#latino</connect>
</topic>
A topic can contain multiple connect elements.
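To make the structure of the connect content concrete, here is a small Python sketch that splits it back into its map URL and topic id. It assumes the first "#" in the string is the separator, and the function name connections is made up for this example.

```python
import xml.etree.ElementTree as ET

TOPIC_XML = """<topic id="latin" facetid="music">
<name>latin</name>
<connect>http://domainb.com/mapb.xml#latino</connect>
</topic>"""

def connections(topic):
    """Return a (map_url, topic_id) pair for each <connect> child of a topic."""
    pairs = []
    for connect in topic.findall("connect"):
        # The content is "<map URL>#<topic id>"; split on the first "#".
        map_url, _, topic_id = (connect.text or "").strip().partition("#")
        pairs.append((map_url, topic_id))
    return pairs

topic = ET.fromstring(TOPIC_XML)
print(connections(topic))
```

An importer could use the map URL to decide which file to download and the topic id to find the connected topic inside it.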
Published Subject Indicators
The second approach to reusing indexing efforts is to point a topic
to a resource on the web that describes it; in other words, to point
to a published subject indicator, represented by the psi element:
<topic id="latin" facetid="music">
<name>latin</name>
<psi>http://dictionary.reference.com/search?q=latino</psi>
</topic>
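The inference that two topics sharing a psi are the same topic could be sketched in Python like this. The sample maps, the xfml root element name, and the function names psis and match_by_psi are assumptions made for illustration; the comparison is a plain string match on the psi URL.

```python
import xml.etree.ElementTree as ET

MAP_A = """<xfml>
  <topic id="latin" facetid="music">
    <name>latin</name>
    <psi>http://dictionary.reference.com/search?q=latino</psi>
  </topic>
</xfml>"""

MAP_B = """<xfml>
  <topic id="latino" facetid="genre">
    <name>Latino</name>
    <psi>http://dictionary.reference.com/search?q=latino</psi>
  </topic>
  <topic id="blues-jazz" facetid="genre">
    <name>blues and jazz</name>
  </topic>
</xfml>"""

def psis(topic):
    """Return the set of psi URLs attached to a topic."""
    return {p.text.strip() for p in topic.findall("psi") if p.text}

def match_by_psi(local_root, remote_root):
    """Map each local topic id to the remote topic ids sharing a psi with it."""
    matches = {}
    for local in local_root.findall("topic"):
        local_psis = psis(local)
        if not local_psis:
            continue
        shared = [remote.get("id") for remote in remote_root.findall("topic")
                  if local_psis & psis(remote)]
        if shared:
            matches[local.get("id")] = shared
    return matches

matches = match_by_psi(ET.fromstring(MAP_A), ET.fromstring(MAP_B))
print(matches)
```

The topic names and ids differ between the two maps, yet "latin" and "latino" are matched because they point to the same page.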
A topic can have multiple psi elements. It can even combine multiple
connect and psi elements: the more psi or connect elements it has, the
higher the value of your XFML document. Also note that, once you have
established a connection with a topic in another map (through
<connect> or a common <psi>), your software can safely
copy all of the <psi> and <connect> elements from that topic to
your topic. Two topics in the same map are not allowed to have the
same <psi> or <connect> elements. Network effects can
cause contradictions when automatically copying <connect> or
<psi> elements, but those can be resolved by presenting a choice
to the administrator when that happens.
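One possible way to handle that copying step is sketched below in Python. It is an illustration, not the spec's processing rules: the sample data, the xfml root element name, and the function names identifiers and merge_identifiers are all assumptions. Identifiers already used by another local topic are reported as conflicts instead of being copied, so an administrator can decide.

```python
import xml.etree.ElementTree as ET

LOCAL = """<xfml>
  <topic id="latin" facetid="music">
    <name>latin</name>
    <psi>http://dictionary.reference.com/search?q=latino</psi>
  </topic>
  <topic id="salsa" facetid="music">
    <name>salsa</name>
    <psi>http://example.com/defs/salsa</psi>
  </topic>
</xfml>"""

REMOTE_TOPIC = """<topic id="latino" facetid="genre">
  <name>Latino</name>
  <psi>http://dictionary.reference.com/search?q=latino</psi>
  <psi>http://example.com/defs/salsa</psi>
  <connect>http://domainc.com/mapc.xml#latin-music</connect>
</topic>"""

def identifiers(topic):
    """Return (tag, text) pairs for a topic's <psi> and <connect> children."""
    return {(child.tag, child.text.strip()) for child in topic
            if child.tag in ("psi", "connect") and child.text}

def merge_identifiers(local_root, local_topic, remote_topic):
    """Copy new identifiers to local_topic; return (added, conflicts)."""
    # Identifiers held by other local topics must not be duplicated.
    taken = {}
    for other in local_root.findall("topic"):
        if other is not local_topic:
            for ident in identifiers(other):
                taken[ident] = other.get("id")
    added, conflicts = [], []
    for ident in sorted(identifiers(remote_topic) - identifiers(local_topic)):
        if ident in taken:
            conflicts.append((ident, taken[ident]))  # leave for the administrator
        else:
            ET.SubElement(local_topic, ident[0]).text = ident[1]
            added.append(ident)
    return added, conflicts

local_root = ET.fromstring(LOCAL)
latin = local_root.find("topic[@id='latin']")
added, conflicts = merge_identifiers(local_root, latin, ET.fromstring(REMOTE_TOPIC))
print("added:", added)
print("conflicts:", conflicts)
```

In this sample the connect element is copied to "latin", while the salsa psi is flagged because the local "salsa" topic already uses it.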
Using XFML
Don't try to fit all your internal metadata into the XFML format.
It's an export format like RSS, and your database will surely have
more fields than XFML can handle. That's okay. If you want a format
that can handle (almost) all your metadata, check out Topic Maps or
RDF. When programming XFML support into your system, check the
processing instructions in the spec. They are just recommendations,
however; you may come up with better ways of doing things.
Exporting XFML is easy; often you can just add a template to your
content management system and leave it at that. A (somewhat rough)
example template for Movable Type took about half an hour to hack
together. Most
content management systems don't support faceted classification
internally, so you are limited in the richness of metadata you can
export. However, you can automatically generate data for facets like
date of publication, length of entry, number of comments, and so on;
or, if you have categories that don't change often, hardcode the
facets and just generate occurrences.
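The "hardcode the facets and just generate occurrences" approach might look like the following Python sketch. The entry data, the example.com URLs, and the function name entries_to_pages are hypothetical; it assumes each category name is already a valid topic id in your hand-written facet and topic definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical entries as they might come out of a CMS database.
ENTRIES = [
    {"url": "http://example.com/2002/12/blues-night.html",
     "title": "Blues night",
     "categories": ["blues", "ny"]},
    {"url": "http://example.com/2002/12/new-bar.html",
     "title": "A new bar",
     "categories": ["bar"]},
]

def entries_to_pages(entries):
    """Build one <page> element per entry, with one <occurrence> per category."""
    pages = []
    for entry in entries:
        page = ET.Element("page", {"url": entry["url"]})
        ET.SubElement(page, "title").text = entry["title"]
        for topic_id in entry["categories"]:
            ET.SubElement(page, "occurrence", {"topicid": topic_id})
        pages.append(page)
    return pages

pages = entries_to_pages(ENTRIES)
for page in pages:
    print(ET.tostring(page, encoding="unicode"))
```

The generated page elements would then be placed after the hardcoded facets and topics in the exported file.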
When you make XFML feeds available on your site, indicate them with
an XFML button and add a link element in your HTML as described here
for auto discovery purposes.
Expect some experimentation when importing XFML and automating
indexing work: you'll be traveling in unknown territory. Taxomita is
currently the only tool under development that does advanced importing
of XFML. However,
importing is the cutting edge. This is where you take advantage of the
real strength of XFML, namely, distributed metadata. Importing will
allow you to use the information in the <connect> and
<psi> elements to automatically expand your metadata without
resorting to a central list of metadata. We expect exciting things to
happen in this area in 2003.
The XFML.org website has a page with tools that support the
standard. Livetopics (a plug-in for Radio UserLand) and Drupal (a
content management system) export XFML. Facetmap lets you import and
browse XFML files, and Taxomita is an upcoming authoring tool built
around XFML. Templates and code libraries are being developed for a
variety of environments.
XFML Core (XFML version 1.0) is the first version of XFML. Work is
being done on XFML 2.0, but that version won't be finished for at
least another year. It may feature elements to describe controlled
vocabularies and more ways to distribute metadata. Check the XFML
mailing list for the latest developments.
Conclusion
XFML is a simple standard to exchange faceted, hierarchical
metadata. What makes it different is the way it addresses specific
problems with metadata authoring by allowing for distributed metadata
through the <connect> and <psi> elements. It is designed to be easy to
code for and is already supported by a number of tools.
To get started with XFML, I recommend writing an XFML file by hand
and uploading it to Facetmap. There's nothing like seeing this
in action to get your head around the possibilities. After that, try
exporting your existing data (if you have a site with some existing
metadata) as XFML or play around with some of the available tools.
The XFML site has a page with relevant links to learn more about XFML
and faceted classification. Let me just highlight the Faceted
Classification mailing list, an excellent (non-techie) list about
faceted classification, as well as Mark Pilgrim's Really
Understandable Introduction to XFML.