
Never Mind the Namespaces: An XSLT RSS Client
by Bob DuCharme
January 02, 2003
RSS is an XML-based format for summarizing and providing links to news
stories. If you collect RSS feed URIs from your favorite news sites, you
can easily build dynamic, customized collections of news stories. In a
recent XML.com article, Mark Pilgrim explained the history and formats used
for RSS. He also showed a simple Python program that can read RSS files
conforming to the three RSS formats still in popular use: 0.91, 1.0, and
2.0. While reading Mark's article, I couldn't help but think that the
same task would be really easy to do in XSLT.
Easy, that is, if you're familiar with the XPath local-name() function.
In a past column I showed how this function retrieves the part of an
element name that
identifies it within its namespace. For example, an element with a
qualified name of "blue:verse" has the local name "verse" (and not "blue",
as I wrote in a typo in that column and only just now caught; "blue" is
the namespace prefix).
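To see the distinction for yourself, here is a minimal sketch of a stylesheet that prints the qualified name, local name, and namespace URI of every element in its input. It is only an illustration, not something from the earlier column, and the namespace URI http://example.com/blue mentioned below is made up.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- For every element, print its qualified name, its local name,
       and its namespace URI, then visit its child elements. -->
  <xsl:template match="*">
    <xsl:value-of select="name()"/>
    <xsl:text> -> local name: </xsl:text>
    <xsl:value-of select="local-name()"/>
    <xsl:text>, namespace URI: </xsl:text>
    <xsl:value-of select="namespace-uri()"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against a document such as <blue:verse xmlns:blue="http://example.com/blue"/>, it should report a local name of "verse" and the made-up URI as the namespace.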
Typical XSLT stylesheets care a great deal about an element's
namespace. If a channel element in an RSS 1.0 file comes from the
http://purl.org/rss/1.0/ namespace and a channel element
from an RSS 2.0 file is in no namespace at all, then an XSLT
processor considers these two element types to be as different as a
title element from a book publishing namespace and a
title element from a human resources namespace. However, by
basing match conditions (and, as we'll see later, select tests in
xsl:apply-templates instructions) on the local name of source
tree elements, we can explicitly tell the XSLT processor to ignore the
namespace of certain elements. For example, we can have a template rule
that applies to all elements with a local name of "channel," regardless of
their namespace.
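To make the contrast concrete, here is a rough sketch that runs both kinds of match pattern over the same source tree. The rss1 prefix and the two mode names are my own choices for illustration; any prefix bound to the RSS 1.0 namespace URI would do.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:rss1="http://purl.org/rss/1.0/" version="1.0">
  <xsl:output method="text"/>

  <!-- Walk the same source tree twice, once per approach. -->
  <xsl:template match="/">
    <xsl:apply-templates mode="by-qname"/>
    <xsl:apply-templates mode="by-local-name"/>
  </xsl:template>

  <!-- Namespace-aware: each namespace has to be listed explicitly.
       rss1:channel covers RSS 1.0; the unprefixed name covers
       channel elements that are in no namespace. -->
  <xsl:template match="rss1:channel | channel" mode="by-qname">
    <xsl:text>matched by qualified name&#10;</xsl:text>
  </xsl:template>

  <!-- Namespace-blind: any element whose local name is "channel". -->
  <xsl:template match="*[local-name()='channel']" mode="by-local-name">
    <xsl:text>matched by local name&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress stray text in both modes. -->
  <xsl:template match="text()" mode="by-qname"/>
  <xsl:template match="text()" mode="by-local-name"/>
</xsl:stylesheet>
```

A channel element in some third namespace would trigger only the second rule, which is exactly the convenience (and the risk) of ignoring namespaces.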
The following stylesheet mimics the behavior of the rss1.py Python
program in Mark's article:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">
<xsl:output method="text"/>
<xsl:template match="*[local-name()='title']">
<xsl:text>title: </xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*[local-name()='link']">
<xsl:text>link: </xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*[local-name()='description']">
<xsl:text>description: </xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="dc:creator">
<xsl:text>author: </xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="dc:date">
<xsl:text>date: </xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="language"/> <!-- suppress -->
</xsl:stylesheet>
There is one slight difference from the Python program: the stylesheet
doesn't print the "date:" and "author:" headers for news items that have
no dc:creator or dc:date children. RSS 0.91 doesn't use these two Dublin Core
elements. The first template rule in this stylesheet has an asterisk and a
predicate inside of square brackets to specify that the XSLT engine should
apply that rule to any element meeting the predicate condition: its local
name is "title." The second and third template rules use a similar format
to handle the RSS link and description elements.
I won't show the input and output for this stylesheet: they're
essentially the same as the input and output in Mark's article. Instead,
I'd rather take the stylesheet a few steps further to create a standalone
news aggregator that requires no special software other than a web browser
and an XSLT processor.
Three basic XSLT techniques make this possible:
- Reading remote documents with XSLT's document() function, which most
XSLT processors support; our stylesheet will use it to retrieve the news
feeds from their servers.
- Converting the RSS elements and attributes to HTML for display by the
browser.
- Using the local-name() function to create template rules that
don't care about the namespace of RSS elements such as channel,
item, and link.
There are plenty of RSS-based news aggregating clients around:
Amphetadesk, NewzCrawler, and NetNewsWire, among many others. The
advantage of using one written in XSLT is that you don't have to install
new software on your machine or log in to a
server-based aggregator that needs to look up a list of your favorite
feeds. You can also more easily integrate the XSLT-based one into other
applications -- for example, to add customized news feeds to your
company's intranet site without relying on any software more expensive or
exotic than an XSLT processor.
Our stylesheet will transform the following XML document, which links
to summaries of several news feeds and blogs:
<?xml-stylesheet href="getRSS.xsl" type="text/xsl"?>
<RSSChannels>
<!-- RSS 0.91 feeds -->
<RSSChannel src="http://www.xml.com/cs/xml/query/q/19"/>
<RSSChannel src="http://xml.coverpages.org/covernews.xml"/>
<RSSChannel src="http://www.bbc.co.uk/syndication/feeds/news/ukfs_news/world/rss091.xml"/>
<!-- RSS 1.0 feeds -->
<RSSChannel src="http://www.ilrt.bristol.ac.uk/discovery/rdf/resources/rss.rdf"/>
<RSSChannel src="http://www.smartmobs.com/index.rdf"/>
<RSSChannel src="http://www.infoworld.com/rss/news.rdf"/>
<!-- RSS 2.0 feeds -->
<RSSChannel src="http://www.panix.com/~jbm/snappy/index.xml"/>
<RSSChannel src="http://www.antipixel.com/blog/index.xml"/>
<RSSChannel src="http://revjim.net/index.xml"/>
</RSSChannels>
As the document's comments tell us, it includes feeds from the three
currently popular RSS formats. For now, most feeds using RSS 2.0 come from
webloggers interested in playing with the latest technology, but I'm sure
we'll see more commercial sites take advantage of the richer metadata
possibilities offered by the post-0.91 releases.
The processing instruction in the document's first line identifies the
stylesheet to use for dynamic rendering in a web browser. Before looking
at how the stylesheet works, first watch it in action: unzip this file
onto your hard disk and use a recent release of Internet Explorer to open
RSSChannels.xml. There are a few
caveats to remember:
- This doesn't work with Mozilla, which, as of release 1.2.1, still has
some kinks in its implementation of the document() function.
- I'd hoped to put the XML file and its stylesheet on a public server so
that you could just link to it from this article to see it in action, but
I got an "Access denied" message when the stylesheet tried to use the
document() function to retrieve a document from a different
server. This could be a security precaution in IE's XSLT
implementation.
Using IE to open up local copies of RSSChannels.xml and its
accompanying getRSS.xsl stylesheet should work fine. A batch file or shell
script can also use Xalan or Saxon and these two files to create an HTML
file that any web browser can read. So, these caveats won't stand in the
way of anyone developing their own XSLT RSS client -- they just get in the
way of the flashy demo that I had originally planned.
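Because document() support is the thing most likely to trip you up, it can help to test it in isolation first. The following sketch, which is not part of the downloadable files, prints each feed URI from RSSChannels.xml along with the number of item elements found in that feed. The XSLT 1.0 Recommendation lets a processor either signal an error or return an empty node-set when a resource can't be retrieved, so an unreachable feed may show up as an error or as a count of 0, depending on the processor.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- For each listed feed, print its URI and the number of item
       elements it contains, whatever namespace they are in. -->
  <xsl:template match="RSSChannel">
    <xsl:value-of select="@src"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="count(document(@src)//*[local-name()='item'])"/>
    <xsl:text> items&#10;</xsl:text>
  </xsl:template>

  <!-- Ignore the whitespace between RSSChannel elements. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```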
Let's look at the getRSS.xsl stylesheet.
<!-- getRSS.xsl: retrieve RSS feed(s) and convert to HTML. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">
<xsl:output method="html"/>
<xsl:template match="RSSChannels">
<html><head><title>Today's Headlines</title>
<style><xsl:comment>
p { font-size: 8pt;
font-family: arial,helvetica; }
h4 { font-size: 12pt;
font-family: arial,helvetica;
font-weight: bold; }
a:link { color:blue;
font-weight: bold;
text-decoration: none; }
a:visited { font-weight: bold;
color: darkblue;
text-decoration: none; }
</xsl:comment></style></head>
<body>
<xsl:apply-templates/>
</body></html>
</xsl:template>
<xsl:template match="RSSChannel">
<xsl:apply-templates select="document(@src)"/>
</xsl:template>
<!-- Named template outputs HTML a element with href link and RSS
description as title to show up in mouseOver message. -->
<xsl:template name="a-element">
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:apply-templates select="*[local-name()='link']"/>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="*[local-name()='description']"/>
</xsl:attribute>
<xsl:value-of select="*[local-name()='title']"/>
</xsl:element>
</xsl:template>
<!-- Output RSS channel name as HTML a link inside of h4 element. -->
<xsl:template match="*[local-name()='channel']">
<xsl:element name="h4">
<xsl:call-template name="a-element"/>
</xsl:element>
<!-- Following line for RSS 0.91 and 2.0, where item is a child of channel -->
<xsl:apply-templates select="*[local-name()='item']"/>
</xsl:template>
<!-- Output RSS item as HTML a link inside of p element. -->
<xsl:template match="*[local-name()='item']">
<xsl:element name="p">
<xsl:call-template name="a-element"/>
<xsl:text> </xsl:text>
<xsl:if test="dc:date"> <!-- Show date if available -->
<xsl:text>( </xsl:text>
<xsl:value-of select="dc:date"/>
<xsl:text>) </xsl:text>
</xsl:if>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Even with whitespace and comments, the whole thing is less than 80
lines. It has five template rules:
- The first is for the root RSSChannels element of the main
document that holds the RSS feed URIs. It does the basic setup of the
result HTML document, including the addition of a CSS stylesheet.
- The short second template rule acts on an RSSChannel element,
using the XSLT document() function to read in the document named
by the element's src attribute. The stylesheet assumes that the
document being read is an RSS document, and the stylesheet uses the
remaining three template rules to transform the elements of the RSS
document read in by the document() function into HTML.
- The third template rule's xsl:template element has a
name attribute instead of a match attribute, making it a
named template rule that must be explicitly called from a template rule.
Because the fourth and fifth template rules surround their result contents
with an HTML a element of a similar structure, the common code is
stored in this named template. Note how the xsl:apply-templates
instructions use the local-name() function to selectively
identify which element types to use for attribute values in the
result.
- The fourth template rule outputs the name of an RSS channel --
typically, the title of the news channel such as "XML.com" or "InfoWorld:
Top News" -- as an HTML h4 element. The h4 element wraps
an a element that links back to the main page of the site using
the URI named in the channel element's link child
element. The a element includes the description of the channel in
a title attribute so that when the resulting HTML is displayed
using recent releases of Internet Explorer, Mozilla, or Opera, a
mouseOver event displays that description in a pop-up
box. The actual a element is output with a call to the
"a-element" named template.
- The last template rule outputs an HTML p element containing a
link to a particular news item. It uses the RSS item element's
link and description child elements the same way that
the preceding template rule does, which is why the creation of the
a element with these attributes was moved to a separate template
rule that these two both call. This final template rule adds one more bit
of information: if a dc:date element is supplied with the news
item, the template rule adds that to the result tree as plain
text.
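As an aside, the xsl:element and xsl:attribute instructions in the named template could also be written as a literal result element with attribute value templates. The following is a sketch of that alternative, not the code in getRSS.xsl. The item template here is only a minimal caller, and the text() rule is there only so that the sketch produces tidy output when run on its own against a single RSS file.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <!-- Same job as the "a-element" named template, written with a
       literal result element and attribute value templates. -->
  <xsl:template name="a-element">
    <a href="{*[local-name()='link']}"
       title="{*[local-name()='description']}">
      <xsl:value-of select="*[local-name()='title']"/>
    </a>
  </xsl:template>

  <!-- Minimal caller so that the sketch runs on its own. -->
  <xsl:template match="*[local-name()='item']">
    <p><xsl:call-template name="a-element"/></p>
  </xsl:template>

  <!-- Drop any text not picked up by the rules above. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

One difference to keep in mind: an attribute value template takes the string value of only the first matching child, while xsl:apply-templates processes every matching child. For feeds where each channel or item has a single link and a single description, the result should be the same.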
On December 31st I used Saxon to apply this stylesheet to the
RSSChannels document shown above and created an HTML result version that
you can see here. (Don't forget to try the mouseOvers...) If I applied the
same stylesheet to
the same XML document at a later date, the result would be different, with
more up-to-date news. That's the beauty of RSS.
The actual HTML and CSS that I used create a pretty stark layout. Some
simple additions to the stylesheet could add some glitz to the resulting
appearance, but despite its visual simplicity, this stylesheet still does
a great deal: it retrieves a customized set of news feeds listed in a
simple, easily customizable file, and then displays a menu of the news
items where you can see their titles, read their descriptions, and then
follow the links to the actual stories. You could modify the layout to
make it fancier, or you could modify it to make it simpler -- slight
modifications will let you convert the RSS to WML, plain text
delivery, or some new markup language being developed for new output
devices. XSLT helps you grab these RSS feeds; what you do with them is up
to you.
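As one example of the simpler direction, here is a sketch of a plain text variant. It assumes the same RSSChannels input document and the same local-name() approach as getRSS.xsl; the layout (a channel title followed by indented item lines) is just one arbitrary choice.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Read in each feed named by a src attribute. -->
  <xsl:template match="RSSChannel">
    <xsl:apply-templates select="document(@src)"/>
  </xsl:template>

  <!-- Channel title on its own line, then any items nested inside
       the channel (as in RSS 0.91 and 2.0). -->
  <xsl:template match="*[local-name()='channel']">
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="normalize-space(*[local-name()='title'])"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="*[local-name()='item']"/>
  </xsl:template>

  <!-- One indented line per item: the title, then the link in
       angle brackets. -->
  <xsl:template match="*[local-name()='item']">
    <xsl:text>  - </xsl:text>
    <xsl:value-of select="normalize-space(*[local-name()='title'])"/>
    <xsl:text> &lt;</xsl:text>
    <xsl:value-of select="normalize-space(*[local-name()='link'])"/>
    <xsl:text>&gt;&#10;</xsl:text>
  </xsl:template>

  <!-- Drop all other text, such as stray whitespace. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

RSS 1.0 item elements, which are siblings of channel and not children of it, are still reached through XSLT's built-in template rules, so no extra rule is needed for them.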
Modify the stylesheet to your heart's content and change the URIs in
the RSSChannels document as well. You can find a wide selection of feeds
at WebReference.com, Alternative News on the Web, Yahoo's RSS News
Aggregators category, and the massive news4sites list. Happy aggregating!