The Impact of Site Finder on Web Services
by Steve Loughran
October 28, 2003
This quotation is from a presentation by VeriSign to ICANN, stating
that their recent and temporarily suspended changes to the .com and .net DNS
servers have had no reported effect on automated HTTP tools; and, further,
that we shouldn't be automating HTTP access anyway.
Unfortunately, the entire web service protocol stack that IBM,
Microsoft, the W3C, Apache, and others have been busy working on for the
past few years is effectively "automated processes using HTTP over TCP
port 80". Thus the niche processes that are being so glibly inconvenienced
by these changes happen to include what many people believe is the future
of distributed systems.
This article shows how SOAP-based web service stacks do in fact suffer
from VeriSign's changes and discusses what can be done to fix them. The
simplest solution is to leave Site Finder turned off. If it comes back,
regardless of what changes we make to the SOAP stacks, the process of
identifying configuration defects will be made more complex.
Introduction
A few years ago, when we were bringing up an early web service, we got
a support call from the customers: our XML-RPC service was "sending back
bad XML". Their client stack, written for an appropriately large fee by
some consultancy group, was failing with SAX parser errors. Yet everything
was working perfectly on our tests, so the fault had to be somewhere on
their side. During the debugging session that ensued, we managed to get
hold of the XML content that was causing the trouble. It was the HTML 404
page automatically generated by IIS. This led to a highly memorable
conversation.
"We have found the problem: your client program is receiving an IIS
error page and failing to parse it."
"I knew it -- there is a problem on your site."
"We aren't running IIS."
You see, we were running a Java application server fronted by Apache 1.3.
The client-side configuration
file was wrong and the client system was pointing at some random server,
an IIS server sending back its error page. Their client software was
handing this page to an XML parser, with predictable consequences.
I learned a lot from that incident. I learned that a client-side XML
parser error is often caused by HTML coming down the wire. I learned that
home-rolled web service protocol stacks often neglect to test for HTTP
error codes. And I learned that the first thing to do with any problem
that you don't see yourself is to figure out which URL you are trying to
talk to.
This is a question that everyone building a SOAP, XML-RPC, or REST web
service should be prepared to ask more often as a result of the new Site
Finder service.
Site Finder
On September 15, 2003, VeriSign tweaked the .com and .net DNS
registries so that every lookup for an unknown host resolved to a search
service web site, Site Finder, rather than return the NXDOMAIN response
traditionally associated with DNS lookup failures.
This led many users' web browsers to the service, which VeriSign hoped
would lead to the users clicking through on the paid links in the search
service, thus bringing revenue to the company. Unfortunately, Site Finder
also happens to break many existing programs: all those that assume a
missing hostname maps to an immediate error. These programs will now get back
an IP address, but when they connect for a conversation, they will get back a
"connection refused" error, wrapped in the language- and toolkit-specific
exception, fault, or error code the client program expects. All such
programs are now going to have to have their documentation rewritten so that
people know that a connection refused error may mean the hostname is
wrong.
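One way a client could tell a wildcarded answer from a real one is to resolve a name that should not exist and compare the addresses. The following is a minimal sketch of that heuristic, not something any SOAP toolkit is known to ship; the class and method names (WildcardProbe, randomHostUnder, looksLikeWildcard) are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical helper, not part of any SOAP toolkit. */
final class WildcardProbe {

    /** Builds a hostname that is very unlikely to be registered under the given suffix, e.g. "com". */
    static String randomHostUnder(String suffix) {
        return "probe-" + UUID.randomUUID().toString().replace("-", "") + "." + suffix;
    }

    /** Resolves a hostname to its addresses; an empty set means the lookup failed. */
    static Set<String> resolve(String host) {
        Set<String> addresses = new HashSet<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Lookup failed: leave the set empty.
        }
        return addresses;
    }

    /** True if the endpoint shares an address with a name that should not exist. */
    static boolean looksLikeWildcard(Set<String> endpointAddresses, Set<String> probeAddresses) {
        return !Collections.disjoint(endpointAddresses, probeAddresses);
    }
}
```

A caller might evaluate looksLikeWildcard(resolve("nosuchhost.com"), resolve(randomHostUnder("com"))) before trusting an endpoint. It is only a heuristic: it costs an extra lookup, and it assumes the wildcard answers every unknown name with the same address.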
An interesting question is what impact the changes will have on web
services -- anything using XML over HTTP as the means of coupling
computers. One assumption of VeriSign's is mostly valid: such applications
do use HTTP, albeit often on a different port. The other assumption --
that whoever is making the request would be grateful to see a search page
-- is clearly false.
Theory
Here is what used to happen on a SOAP request to an invalid endpoint
hostname, such as http://nosuchhost.com/endpoint:
- Caller does DNS lookup.
- DNS returns an error.
- The protocol stack returns something like java.net.UnknownHostException.
- If the application is smart, it maps this to a meaningful error, such as "that may be an incorrect hostname".
- If the application is simple, it shows the framework's error and assumes the end user is smart enough to understand it.
- If a person is at the end of the application, they see the error and either fix their endpoint or phone up support.
- If it is an unattended operation, the machine ought to retry later. Applications aren't meant to cache failed lookups, but Java is naughty: some versions do exactly that unless told not to.
- If the host comes back later, all is well. If not, then the application should have a recovery policy.
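The "smart application" behaviour in the steps above can be sketched in Java as follows. This is an illustrative sketch only: EndpointDiagnostics and its methods are hypothetical names, and the wording of the messages is an assumption. The networkaddress.cache.negative.ttl security property is the standard JVM setting for how long failed lookups are cached; it is safest to set it before the first lookup, since the JVM may read it only once.

```java
import java.io.IOException;
import java.net.URI;
import java.net.UnknownHostException;
import java.security.Security;

/** Hypothetical helper: turns a lookup failure into a message a user can act on. */
final class EndpointDiagnostics {

    /** Asks the JVM not to cache failed lookups, so that a later retry queries DNS again. */
    static void disableNegativeDnsCaching() {
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    /** Maps a low-level failure to text that names the endpoint the client was trying to reach. */
    static String explain(String endpointUrl, IOException failure) {
        String host = URI.create(endpointUrl).getHost();
        if (failure instanceof UnknownHostException) {
            return "Unknown host '" + host + "': the endpoint " + endpointUrl
                    + " may contain an incorrect hostname";
        }
        // Anything else is reported as-is, but still with the URL attached.
        return "Could not reach " + endpointUrl + ": " + failure;
    }
}
```

Whatever the message says, including the full endpoint URL in it answers the first support question before it is asked.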
Now let's look at how things would be expected to change with
Site Finder intervening:
- Caller does DNS lookup.
- DNS returns the IP address of something.
- Caller opens a TCP connection to port 80 on that machine, then sends its SOAP request: usually a POST, although SOAP 1.2 adds GET.
- The endpoint returns 302, "moved temporarily", redirecting the caller to a URL under http://sitefinder.verisign.com.
- If the client handles 302 responses, then it resends the request to Site Finder.
- Site Finder returns 200, "OK", and an HTML search page.
A SOAP client would normally POST its SOAP request, expecting an
XML formatted SOAP response and a 200 code on success, 500 on a
fault. Only now it would get a 200 response with text/html
content. What is it going to do?
Either it is going to test the MIME type and bail out when that is not
XML; or, as in the example cited above, it will hand it off to the XML
parser, which will then break as the content is not valid XML. Even if it
were valid XHTML, as per the W3C, the parsing would quite probably fail
messily when the application tried to make sense of the data.
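The "test the MIME type and bail out" option might look like the sketch below. It assumes the client has already read the status code and Content-Type header from the response; SoapResponseCheck and its methods are hypothetical names. The accepted types are text/xml (SOAP 1.1) and application/soap+xml (SOAP 1.2).

```java
import java.io.IOException;
import java.util.Locale;

/** Hypothetical guard to run before handing a response body to the XML parser. */
final class SoapResponseCheck {

    /** True for the media types a SOAP response normally carries. */
    static boolean isXmlContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Strip parameters such as "; charset=utf-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/xml") || mime.equals("application/soap+xml");
    }

    /** Rejects anything that is not a 200 or 500 response with an XML body. */
    static void requireSoapResponse(String endpointUrl, int status, String contentType)
            throws IOException {
        if (status != 200 && status != 500) {
            throw new IOException("Unexpected HTTP status " + status + " from " + endpointUrl);
        }
        if (!isXmlContentType(contentType)) {
            throw new IOException("Expected XML from " + endpointUrl + " but received "
                    + contentType + "; is the endpoint URL correct?");
        }
    }
}
```

The point of the second check is the error text: "received text/html, is the endpoint URL correct?" is a far better start to a support call than a SAX parser exception.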
The result of this is that the VeriSign response does not parse. The
client application is going to give some kind of error, perhaps an XML
parser error, and that is going to lead to a support call.
The result of the change, therefore, is that if 302 redirects are
handled in the web service client, then you are going to get more support
calls. What about frameworks that don't follow redirects? Well, they will
report the 302 somehow. Again, it is a more subtle error than Unknown Host,
which means support gets a call.
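A client that chooses not to follow redirects can at least report them clearly. The sketch below uses the standard java.net.HttpURLConnection API to switch off automatic redirect handling; the class name NoRedirectClient, the explainRedirect helper, and the message wording are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical wrapper that treats any redirect of a SOAP POST as a configuration error. */
final class NoRedirectClient {

    /** Prepares a POST connection without opening it; redirects will not be followed. */
    static HttpURLConnection prepare(String endpointUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpointUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        return conn;
    }

    /** Returns a diagnostic for 3xx responses, or null if the status is not a redirect. */
    static String explainRedirect(String endpointUrl, int status, String location) {
        if (status < 300 || status >= 400) {
            return null;
        }
        return "Endpoint " + endpointUrl + " answered " + status + " and pointed to "
                + location + "; the hostname in the endpoint URL may be wrong";
    }
}
```

After sending the request, the caller would pass conn.getResponseCode() and conn.getHeaderField("Location") to explainRedirect and surface any non-null result to the user.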
Worse, the 302 or search page is not only going to result in meaningless
errors: because the response is only sent after the request has been sent, a
big request -- such as a POST of binary data or a SOAP with Attachments
message -- will only fail after the upload. This will waste time and
bandwidth. Requests made from a device that pays by the second or by the
byte -- such as a cellphone -- will cost the user even more money
than before.
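HTTP/1.1 has a mechanism aimed at exactly this waste: a client can send "Expect: 100-continue" and hold back the body until the server agrees to take it. The sketch below builds such a request with the java.net.http client from Java 11 onwards. Whether a given server honours the header is a separate question, so treat this as one possible mitigation; the class name CautiousUpload and the ten-second timeout are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper for large SOAP uploads. */
final class CautiousUpload {

    /** A client that never follows redirects, so a 302 surfaces as a response. */
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    /** Builds a POST that asks the server for "100 Continue" before the body is sent. */
    static HttpRequest buildRequest(String endpointUrl, byte[] soapEnvelope) {
        return HttpRequest.newBuilder(URI.create(endpointUrl))
                .expectContinue(true)
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofByteArray(soapEnvelope))
                .build();
    }
}
```

The request would be sent with newClient().send(request, HttpResponse.BodyHandlers.ofString()). If the server answers the headers with a 302 rather than 100, the client learns of the problem before paying for the upload.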