
Lady and the Tramp
by Edd Dumbill
September 29, 2004
This week's XML Deviant is a tale of two specifications. One,
a scruffy ragtag affair that barely seems to do the job. The
other, a well-heeled, aristocratic sort of document, groomed for
longevity. You know what's coming next, of course. The tramp of
which I speak, the unlikely RSS, goes from strength to strength.
But the latest offspring in the noble line of XML, version 1.1,
is doomed.
Getting Our RSS into Gear
I must confess that these days urgent debates between RSS
fanatics hold even less fascination for me than the grimmest
recesses of the post-schema-validation infoset. However, recent
noise suggests that there are developments worthy of
attention.
RSS, and I include Atom under this umbrella term,
has become the ultimate reinvention of what was once falsely
called "push technology." Push meaning, of course, scheduled
polling. What was originally intended as a metadata format has
become an envelope for the entire contents of web sites,
advertising and all, pushed into a coherent user interface on a
user's desktop.
In various ways, we who applaud open development models and
scorn over-engineered standards should be very happy with what
happened with RSS. Something quick and nasty has blossomed and
burgeoned. Through pruning and training it is developing and
emerging into something more refined. I can't help feeling that
we might have been able to avoid at least one or two of the
issues along the way by accepting a little more upfront design,
but of course, I would say that.
The latest growing pain for RSS is the problem of distribution.
As anyone who observes web server logs regularly will see,
the pounding a site gets from dumb feed readers polling RSS feeds
can be considerable. Despite the original inclusion of
recommended polling schedules in RSS files, little or no
attention is paid to them now.
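For the curious, here is roughly what honoring those hints could look like in a reader. This is a minimal sketch, assuming the RSS 2.0 ttl element (minutes a feed may be cached) and skipHours element (GMT hours during which readers are asked not to poll); the sample feed and the may_poll function name are made up for illustration, not taken from any real aggregator.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Hypothetical feed carrying the polling hints (assumed RSS 2.0 elements).
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example</title>
  <ttl>60</ttl>
  <skipHours><hour>2</hour><hour>3</hour></skipHours>
</channel></rss>"""


def may_poll(feed_xml, last_fetch, now):
    """Return True if a polite reader may fetch the feed again at `now`.

    `last_fetch` and `now` are timezone-aware datetimes.
    """
    channel = ET.fromstring(feed_xml).find("channel")

    # ttl: minutes the previous copy may be cached before refreshing.
    ttl_text = (channel.findtext("ttl") or "").strip()
    if ttl_text.isdigit():
        if now - last_fetch < timedelta(minutes=int(ttl_text)):
            return False

    # skipHours: GMT hours (0-23) in which the publisher asks not to be polled.
    skip_hours = set()
    for hour in channel.findall("skipHours/hour"):
        text = (hour.text or "").strip()
        if text.isdigit():
            skip_hours.add(int(text))
    return now.astimezone(timezone.utc).hour not in skip_hours
```

A reader would call may_poll with the last copy of the feed it fetched, and simply go back to sleep when the answer is False.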
When you start putting the full text of a site into an RSS
feed, not just the metadata, the bandwidth cost of all that
polling multiplies. The issue came to a head for Microsoft's
MSDN, which had to disable some RSS feeds for this very reason.
Such an eventuality has gotten the RSS world talking about
alternative distribution strategies and ways to keep the
bandwidth bill down. At Tim O'Reilly's recent geekfest, foo
camp, various RSS wonks including Tim Bray, Robert Scoble, and
Sam Ruby started to work out a potential way of reducing bandwidth
use, which they're calling Vary: ETag.
The new proposal appears to advocate building smarts into web
servers so that, when an RSS file changes, feed readers retrieve
only the new items rather than all of the last ten or so entries
posted to a web site. This means a web server needs to
understand the syndication format. If it can indeed be packaged
as an easy Apache drop-in, it sounds like a hopeful step forward.
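To make the general idea concrete, here is a minimal sketch of a server that answers a reader's If-None-Match header with only the entries the reader has not yet seen. It is my own illustration of the principle, not the Vary: ETag proposal itself: the respond and etag_for functions, the rule of deriving the ETag from the newest entry's ID, and the plain 200 and 304 status codes are all assumptions made for the example.

```python
import hashlib


def etag_for(entry_id):
    """Derive a quoted ETag from an entry ID (an assumption of this sketch)."""
    digest = hashlib.sha1(entry_id.encode("utf-8")).hexdigest()[:12]
    return '"' + digest + '"'


def respond(entries, if_none_match=None):
    """Decide what to send back to a feed reader.

    `entries` is a list of (entry_id, body) tuples, newest first.
    Returns (status, etag, entries_to_send).
    """
    if not entries:
        return 200, None, []

    current = etag_for(entries[0][0])

    # Reader already holds the newest entry: nothing to send.
    if if_none_match == current:
        return 304, current, []

    # Reader holds an older entry: send only what was posted after it.
    for index, (entry_id, _body) in enumerate(entries):
        if etag_for(entry_id) == if_none_match:
            return 200, current, entries[:index]

    # Unknown or missing ETag: fall back to the full feed.
    return 200, current, list(entries)
```

The point of the sketch is the middle branch: the server must know where one entry ends and the next begins, which is exactly why it needs to understand the syndication format.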
While encouraging, Vary: ETag doesn't feel to me
like a complete solution to the problem. I have been wondering
idly about other distribution mechanisms. The time-honored NNTP
news protocol would seem to make a world of sense, as RSS
directly fits the model of news. Unfortunately, in our
HTTP-or-nothing world, it doesn't seem like it will fly too far.
Another option would be to use peer-to-peer client techniques
to disseminate the content. This works very well for sharing
media content using programs such as BitTorrent and seems to
be working out well too for internet telephony company Skype.
One of the advantages of RSS is that many people are willing to
download, and regularly update, custom software in order to read
it. If even just a modicum of the installed base of desktop
feed readers acquired BitTorrent-like functionality for sharing
feeds, it seems to me that the RSS distribution problem could be
nailed quite quickly. Any takers?
If you're not so worried about decentralization, then an announcement
this week from RSS aggregator Bloglines could be a good solution to
the RSS distribution problem. Many people I know have switched to
Bloglines' web-based RSS reading service in order to be able to read
their RSS from multiple locations. Now Bloglines is offering to
redistribute its aggregated RSS via a web service. The authors of
RSS-reading applications FeedDemon, NetNewsWire, and blogbot have
already pledged to support the new API.
The attraction is, of course, less bandwidth consumption and also the
offer of Bloglines to "insulate developers from the current blog
syndication format wars." Of course, one person's insulator is another's
gatekeeper.
I note that Bloglines only outputs RSS 2.0, eschewing
both RSS 1.0 and Atom. Perhaps not so much an insulation
from the war, as an attempt to fire a winning salvo.
The Bloglines technology is RESTful, at least. It looks a bit like
NNTP re-implemented in HTTP. The service appears to be free. So where
is the gain for Bloglines? I'll watch with interest to see if this
service is embraced or treated with caution. (There's more about
Bloglines in Marc Hedlund's O'Reilly Network article.)
XML 1.1 Dead in the Water?
The XML 1.1 recommendation, published at the beginning of this year,
changes what is permitted in the names of elements and
attributes in order to accommodate the growth of Unicode. As
the list of changes says:
"Whereas XML 1.0 provided a rigid definition of names, wherein
everything that was not permitted was forbidden, XML 1.1 names
are designed so that everything that is not forbidden (for a
specific reason) is permitted."
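The "everything not forbidden is permitted" approach shows up as a short list of broad character ranges. The sketch below checks a string against the Name production as I understand it from the XML 1.1 recommendation; the function name is_xml11_name is hypothetical, and the range tables should be verified against the specification before being relied upon.

```python
# Code point ranges allowed as the first character of an XML 1.1 name
# (transcribed from the NameStartChar production; verify against the spec).
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Additional ranges allowed after the first character (NameChar):
# "-", ".", digits, middle dot, combining marks, and two tie characters.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_xml11_name(name):
    """Return True if `name` matches the XML 1.1 Name production."""
    if not name:
        return False
    if not _in_ranges(ord(name[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in name[1:]
    )
```

For instance, is_xml11_name("\u1200") accepts an Ethiopic letter, a script that, to the best of my knowledge, entered Unicode only after XML 1.0's name tables were fixed, while is_xml11_name("1abc") is still rejected because a digit may not start a name.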
This change substantially affects the definition of what is a
well-formed XML document, and so all things that depend upon the
base XML recommendation, whether software or specification, must
change too. That is the big issue looming over XML 1.1, of course:
does enough momentum exist to change the very large installed base
of XML 1.0?
Recent indications are that schema technologies won't be
changing in a hurry. A message from Norm Walsh to the
RELAX NG mailing list comments on the
decision by the ISO working group not to incorporate XML 1.1 names
into the current RELAX NG standard.
"Well, I suppose I should console myself that at least W3C XML
Schema and RELAX NG have consistent views on the matter. It
makes XML 1.1 a nearly pointless waste of time, but that's just
the way it is, I guess."
Walsh refers to the fact that the W3C XML Schema working group
will not amend WXS 1.0 to incorporate XML 1.1 names and will
instead wait until W3C XML Schema 1.1 is published. The situation as it
stands now for XML 1.1 is uncertain. It may be years before
even schema technologies support it and even longer before
tools do. There's a very real risk that XML 1.1 documents might
end up as mere interesting curiosities.
Births, Deaths, and Marriages
The latest announcements from the XML-DEV mailing list. Thin
pickings this week, I'm afraid.
- Examploforms 0.1
Start of an XForms authoring and modeling tool from Micah
Dubinko. Bringing Examplotron's design-by-example mentality
to forms.
- XML 2004 Deadlines Approaching -- Early Bird Registration, Late-Breaking/Product Submissions
Time to get your house in order for the main U.S. XML
event of the year, Nov. 15-19, in Washington, D.C.
- Saxon 8.1
New release of Michael Kay's XSLT and XQuery processor,
including "added goodies in Saxon-SA beyond schema-awareness,
most notably an extension to support higher-order functions."
Scrapings
Fear my magic firewall ... I was wrong about WS-*, apparently analysts
say it's all good ... 52 messages to XML-DEV last week, Len rating 10%
(bonus points for understatement of the week, "OWL is relatively
academic") ... taking a proactive approach to out of office
autoresponders ... revisiting history and whetting our appetites for
next week.