A Bright, Shiny Service: Sparklines
by Joe Gregorio
A Web Application for Sparklines
Not everybody is a web services hacker either, so let's put together a web application
that will allow anyone to create a sparkline interactively.
We'll build the web application with JavaScript to make it feel smoother.
We're not going to go whole hog on this little app (no sliding tiles à la Google Maps); we'll
just use JavaScript to reduce the number of round trips to the server.
The first step is to build our form, which has all the form controls that we need to specify a sparkline.
The first problem we run into is that the parameters are different if we are making a discrete sparkline as
opposed to a smooth sparkline. We only want to show the parameters that are relevant. That can be accomplished
by tweaking the CSS of the page on the fly via JavaScript. We'll enclose each section of type-specific
controls in a <div/>, and when that type of sparkline is selected we'll show that div by setting its display
property. Similarly, we'll hide the divs of the controls that are not relevant.
The JavaScript for this is heavily table-driven; there are only 38 lines of non-table code. Here are the tables:
// All the controls for the sparkline graphing, mapped
// to the events we use to track if they have changed,
// and the function to call when that event occurs.
var controls = {
'type_s': ['onclick', create_swapper('type_s')],
'type_d': ['onclick', create_swapper('type_d')],
'd': ['onchange', controlChanged],
'height': ['onchange', controlChanged],
'min': ['onclick', controlChanged],
'max': ['onclick', controlChanged],
'last': ['onclick', controlChanged],
'step': ['onchange', controlChanged],
'upper': ['onchange', controlChanged],
'above-color': ['onchange', controlChanged],
'below-color': ['onchange', controlChanged],
'min-color': ['onchange', controlChanged],
'max-color': ['onchange', controlChanged],
'last-color': ['onchange', controlChanged]
};
// Each type of curve takes a different set of parameters
var parameters_per_type = {
"smooth" : ['d', 'height', 'min', 'max', 'last',
'min-color', 'max-color', 'last-color', 'step'],
"discrete" : ['d', 'height', 'upper', 'above-color', 'below-color']
};
// Different controls have different ways of
// having their values accessed
var parameters_accessor = {
'd': 'value',
'height': 'value',
'min': 'checked',
'max': 'checked',
'last': 'checked',
'step': 'value',
'upper': 'value',
'above-color': 'value',
'below-color': 'value',
'min-color': 'value',
'max-color': 'value',
'last-color': 'value'
};
// Associates the type of sparkline with the div that
// contains the controls specific to it.
var shape_specific_divs = {
'type_s': 'smooth_specific',
'type_d': 'discrete_specific'
};
The "controls" table lists every control in the form and
maps each control's id to the event it fires when it changes and
to the function to call when that event occurs. This table
makes hooking up all the form control events to the right callback
function easy: we just loop over each entry in the table, find
the specified control, and attach the listed function to that
control's event handler.
The "parameters_per_type" table groups the controls by the
type of sparkline they apply to. This table makes it easy to
construct the URI of the sparkline, since only the parameters
relevant to the selected type are included.
The "parameters_accessor" table lists all the controls and
maps each id to the name of the property that you use to access the
value of the control. Yes, believe it or not, HTML has some
inconsistencies: text fields expose their contents through value,
while checkboxes expose their state through checked. Luckily we can
again use a table-driven design to hide those problems.
The "shape_specific_divs" table maps the type of sparkline,
either smooth or discrete, to the divs that contain the controls that
are specific to each type of sparkline. This is the table we use when
we hide and show controls based on the type of sparkline the user
wants to create.
Now for the code:
function controlChanged() {
    // Find which sparkline type is selected; default to discrete.
    var type = "discrete";
    for (var shape in shape_specific_divs) {
        if (document.getElementById(shape).checked) {
            type = document.getElementById(shape).value;
        }
    }
    // Build the URI from only the parameters relevant to this type.
    var output_uri = 'spark.cgi?type=' + type;
    var parameters = parameters_per_type[type];
    for (var i = 0; i < parameters.length; i++) {
        var control = document.getElementById(parameters[i]);
        var value = control[parameters_accessor[parameters[i]]];
        // Escape characters such as '#' or '&' that would break the query.
        output_uri = output_uri + "&" + parameters[i] + "="
            + encodeURIComponent(value);
    }
    // Show the full URI and point the preview image at it.
    document.getElementById('output_uri').value =
        'http://bitworking.org/projects/sparklines/' + output_uri;
    document.getElementById('output_img').src = output_uri;
    return true;
}
function create_swapper(choice) {
    // The returned function remembers choice: it is a closure.
    return function swap_specific() {
        for (var type in shape_specific_divs) {
            var s = document.getElementById(shape_specific_divs[type]);
            if (type == choice) {
                s.style.display = 'block';
            } else {
                s.style.display = 'none';
            }
        }
        controlChanged();
    };
}
function setup() {
    // Attach each handler, e.g. element.onclick = controlChanged.
    for (var id in controls) {
        document.getElementById(id)[controls[id][0]] = controls[id][1];
    }
    controlChanged();
}
The setup function is the first function called when the
page is loaded. It hooks up all the control events to the correct
callback function. I told you the controls table would make this
function easy.
The controlChanged function is called every time a control
is updated. It scoops up the values of the controls that
apply to the selected sparkline type, builds the URI of the
new sparkline, and then updates the DOM of the page to use
the new URI.
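The same table-driven URI construction is easy to reuse outside the browser, for example in a script that generates sparkline links. Here is a minimal Python sketch of the logic in controlChanged. It is an illustration, not part of the original application: the function name build_sparkline_uri and the form_values dictionary, which stands in for the DOM lookups, are hypothetical. Unlike the page, it escapes each value with urlencode, so the commas in d become %2C. Standard query-string parsers, including the ones CGI scripts use, decode these escapes again.

```python
from urllib.parse import urlencode

# Parameters that apply to each sparkline type (same table as the page).
PARAMETERS_PER_TYPE = {
    "smooth": ['d', 'height', 'min', 'max', 'last',
               'min-color', 'max-color', 'last-color', 'step'],
    "discrete": ['d', 'height', 'upper', 'above-color', 'below-color'],
}


def build_sparkline_uri(sparkline_type, form_values, base='spark.cgi'):
    """Build a sparkline URI from the controls relevant to sparkline_type.

    form_values is a hypothetical dict mapping control ids to their
    current values, standing in for the DOM lookups in controlChanged.
    Controls that do not apply to the chosen type are simply ignored.
    """
    params = [('type', sparkline_type)]
    for name in PARAMETERS_PER_TYPE[sparkline_type]:
        params.append((name, form_values[name]))
    # urlencode percent-escapes keys and values (e.g. ',' becomes %2C).
    return base + '?' + urlencode(params)


# Usage with made-up control values for a discrete sparkline:
print(build_sparkline_uri('discrete', {
    'd': '10,20,30', 'height': '15', 'upper': '50',
    'above-color': 'red', 'below-color': 'gray', 'step': '2'}))
# spark.cgi?type=discrete&d=10%2C20%2C30&height=15&upper=50&above-color=red&below-color=gray
```

Keeping the per-type parameter lists in one table means adding a new sparkline type only touches data, not the URI-building loop.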
You may have noticed that I've not really been telling the
whole story. Almost every control event calls
controlChanged, but there are two exceptions.
The exceptions are the "type_s" and "type_d"
controls. These are the radio buttons that are used to
select between discrete and smooth sparkline types. When
those radio buttons change, not only do we want to update
the sparkline image, but we also need to swap out the divs that
contain the type-specific controls. To do that we could
have created two functions, one for "type_s" and another
for "type_d", but the two would have been nearly
identical.
In each case the function would have just
looped through all the divs for type-specific controls,
displayed the div we needed, and hidden all the rest. We
avoid writing two functions by creating a function,
create_swapper, that returns functions. We pass
create_swapper the id of a radio button ("type_s" or
"type_d"), which shape_specific_divs maps to the div we
want displayed, and in turn it returns a function that,
when called, will display that div, hide all the rest of
the shape_specific_divs, and then call
controlChanged to update the sparkline image.
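The thing returned by create_swapper is not
just a function, since it also keeps around the value of
choice from the call that created it. A function bundled
with variables captured from the scope it was defined in
is called a closure, and each of the two swappers carries
its own binding of choice. If closures are new to you,
they are well worth learning, and they are also a good
stepping stone toward related ideas such as continuations.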
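Closures are not specific to JavaScript. As a hedged illustration, here is create_swapper rewritten in Python. A plain dictionary (hypothetical) stands in for each div's display style, and the call to controlChanged is left out. Each returned swapper remembers its own choice, and Python exposes that captured value through the function's __closure__ cells.

```python
# Hypothetical stand-ins for the page: each radio-button id maps to the id
# of its div, and `display` records each div's CSS display value.
SHAPE_SPECIFIC_DIVS = {'type_s': 'smooth_specific',
                       'type_d': 'discrete_specific'}
display = {div_id: 'none' for div_id in SHAPE_SPECIFIC_DIVS.values()}


def create_swapper(choice):
    """Return a function that shows the div for `choice`, hiding the rest."""
    def swap_specific():
        # `choice` is not a parameter here; it is captured from the
        # enclosing call to create_swapper, which makes this a closure.
        for radio_id, div_id in SHAPE_SPECIFIC_DIVS.items():
            display[div_id] = 'block' if radio_id == choice else 'none'
    return swap_specific


show_smooth = create_swapper('type_s')
show_discrete = create_swapper('type_d')

show_discrete()
print(display)  # {'smooth_specific': 'none', 'discrete_specific': 'block'}
print(show_smooth.__closure__[0].cell_contents)  # type_s
```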
Optimizing
Our application would be even faster if we could cut down
on the cost of the GETs we make for each image. How can HTTP
help us optimize?
- ETags
- The ETag: and If-None-Match: headers are used to change a regular GET into a conditional GET. The idea is that
when you do the first GET on a resource, an entity tag is returned in the ETag: header. That entity tag is
then sent in an If-None-Match: header on each following GET to the same resource. If the resource is unchanged,
the server can detect this by comparing the entity tag it receives with the current one. Instead of sending
the full response again, it returns a 304 (Not Modified) response.
Client                      Server
  | ----------GET-----------> |
  |                           |
  | <-- Response + ETag ----- |
If we then do a subsequent GET and the resource has changed, we will receive a full response:
Client                      Server
  | --GET+If-None-Match-----> |
  |                           |
  | <------ Response -------- |
The speed increase comes when we do a subsequent GET and the resource has not changed. In that case we get a
304 Not Modified response from the server, which contains no response entity.
Client                      Server
  | ----GET+If-None-Match---> |
  |                           |
  | <-----304 Not Modified--- |
That means that if our image is unchanged, the response body
is empty. A 100% reduction in size: now that's what I call a
good performance increase.
To implement conditional GET here we need to add two items
to our implementation. The first is the generation of the
ETag: header. For that we need to have a good algorithm
for generating an entity tag, the value of the ETag
header. We need a value that will be the same for images
that are the same, and different for images that are
different. Since all the information that defines an image
is in the query parameters, we should begin by trying
to generate a value from that. In Python:
print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'])
But there is one more thing that could cause the image to
change for the very same query string. That would be if
we upgrade our CGI script and modify how the images are
constructed. So to be perfectly safe we should also
include the version of our CGI application in the
hash:
print 'ETag: "%d"' % hash(os.environ['QUERY_STRING'] + __version__)
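A caution for anyone porting this script to Python 3: starting with Python 3.3, hash() of a string is randomized per process unless PYTHONHASHSEED is set. Every CGI invocation would then compute a different tag, and If-None-Match would never match. Python 2's string hash is deterministic between runs on the same build, which is why the snippets here work there. Below is a sketch that uses a hashlib digest instead. The make_etag helper and the version string are placeholders, not part of the original script.

```python
import hashlib

__version__ = "0.1"  # placeholder; use the CGI script's real version string


def make_etag(query_string, version=__version__):
    """Return a quoted entity tag for the image a query string produces.

    A SHA-1 digest is identical in every process, unlike Python 3's
    randomized hash(), so the tag survives across CGI invocations.
    A NUL separator keeps e.g. ('a', '0.1') and ('a0', '.1') distinct.
    """
    key = (query_string + '\0' + version).encode('utf-8')
    # Entity tags are quoted strings in the ETag header.
    return '"%s"' % hashlib.sha1(key).hexdigest()


# Under CGI this would be something like:
#   print('ETag: ' + make_etag(os.environ.get('QUERY_STRING', '')))
```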
The ok function can be modified to return the ETag header:
def ok():
    print "Content-type: image/gif"
    print "Status: 200 OK"
    # Entity tags are sent as quoted strings.
    print 'ETag: "%d"' % hash(os.environ.get('QUERY_STRING', '') + __version__)
    print ""
The second addition will be the check for a match if the
If-None-Match: header is included in the request. The client
sends back exactly the quoted tag it received, so we compare
it against the same quoted value:
etag = '"%d"' % hash(os.environ.get('QUERY_STRING', '') + __version__)
if_none_match = os.environ.get('HTTP_IF_NONE_MATCH', '')
if if_none_match and if_none_match == etag:
    not_modified()
And we introduce the not_modified() function, which just issues a 304 Not Modified and exits:
def not_modified():
    print "Status: 304 Not Modified"
    print ""
    sys.exit()
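The exact-string comparison above is enough for this script, because the client simply echoes back the one tag it was given. HTTP is more general, though. If-None-Match may carry "*" or a comma-separated list of tags, and it uses weak comparison, so a W/ prefix is ignored. Here is a sketch of a more forgiving check. The helper name is hypothetical, and the code assumes that no tag contains a comma. Note also that HTTP/1.1 expects a 304 response to repeat the ETag header that a 200 would have carried, which the not_modified() shown here omits.

```python
def if_none_match_matches(header_value, current_etag):
    """Return True if an If-None-Match header value matches current_etag.

    Handles "*" and comma-separated lists of tags, and uses weak
    comparison (a leading W/ is ignored), as HTTP specifies for
    If-None-Match. Assumes no tag contains a comma.
    """
    header_value = header_value.strip()
    if not header_value:
        return False
    if header_value == '*':
        return True

    def opaque(tag):
        # Drop surrounding whitespace and any weak-validator prefix.
        tag = tag.strip()
        return tag[2:] if tag.startswith('W/') else tag

    wanted = opaque(current_etag)
    return any(opaque(tag) == wanted for tag in header_value.split(','))


# Under CGI, the check would become something like:
#   if if_none_match_matches(os.environ.get('HTTP_IF_NONE_MATCH', ''), etag):
#       not_modified()
```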
So does this really save us anything? Here is an excerpt
from my log file showing two requests for the same sparkline
image used in the web application: the first GET returns the
full image; the second GET is conditional and gets a
response with no entity body and a status code of 304 Not
Modified, since the image hasn't changed since we last
requested it.
68.221.46.94 - - [12/Jun/2005:22:50:24 -0400]
"GET /projects/sparklines/spark.cg...tep=3
HTTP/1.1" 200 452 "-" "curl/7.11.1
(i686-pc-cygwin) libcurl/7.11.1 OpenSSL/0.9.7g zlib/1.2.2"
68.221.46.94 - - [12/Jun/2005:22:50:52 -0400]
"GET /projects/sparklines/spark.cg...tep=3
HTTP/1.1" 304 - "-" "curl/7.11.1
(i686-pc-cygwin) libcurl/7.11.1 OpenSSL/0.9.7g zlib/1.2.2"
That first response body is 452 bytes long; the log counts
just the image, not the headers. The next time
the same request comes along, the conditional GET
returns a 304 Not Modified and the response body
is 0 bytes long; that's what the dash means. Not
only does this save us bandwidth, it also saves
computation time, since we avoid replotting
the sparkline.
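To check the savings across a whole log instead of two entries, a small tally like the following could help. This sketch assumes Apache's combined log format with each entry on a single line. The sample entries in the usage are hypothetical and modeled on the excerpt above.

```python
import re

# Host, ident, user, [date], "request", status, bytes (Apache combined
# format; the referer and user-agent fields that follow are ignored).
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)')


def tally(lines):
    """Count 200 and 304 responses and sum body bytes over log lines."""
    counts = {'200': 0, '304': 0}
    body_bytes = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines in some other format
        if m.group('status') in counts:
            counts[m.group('status')] += 1
        # "-" in the bytes field means no response body was sent.
        if m.group('bytes') != '-':
            body_bytes += int(m.group('bytes'))
    return counts, body_bytes


# Hypothetical entries modeled on the excerpt above:
sample = [
    '192.0.2.1 - - [12/Jun/2005:22:50:24 -0400] "GET /spark.cgi?type=smooth '
    'HTTP/1.1" 200 452 "-" "curl/7.11.1"',
    '192.0.2.1 - - [12/Jun/2005:22:50:52 -0400] "GET /spark.cgi?type=smooth '
    'HTTP/1.1" 304 - "-" "curl/7.11.1"',
]
print(tally(sample))  # ({'200': 1, '304': 1}, 452)
```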
- gzip
- We might also see some performance improvements by
implementing gzip compression. HTTP allows the client
to indicate that it will accept a gzip'd response body
by sending an Accept-Encoding: header that includes the value
"gzip". If the server supports gzip encoding, it can
then compress the response body and return the
Content-Encoding: header with a value of "gzip" to
indicate that the body has been compressed.
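The article stops at describing gzip and does not implement it. A minimal sketch of the negotiation might look like the following. The helpers accepts_gzip and encode_body are hypothetical. The parsing honors q=0 (refusal) but ignores the * wildcard. The sketch also adds Vary: Accept-Encoding so that caches keep compressed and uncompressed copies apart. GIF images are already compressed, so expect little or no gain for the sparklines themselves; gzip pays off more on text such as HTML.

```python
import gzip


def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding value allows gzip (q=0 means refused)."""
    for item in accept_encoding.split(','):
        parts = [p.strip() for p in item.split(';')]
        if parts[0].lower() != 'gzip':
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def encode_body(body, accept_encoding):
    """Return (extra_headers, body), gzip-compressing when allowed.

    body is bytes; under CGI, accept_encoding would come from
    os.environ.get('HTTP_ACCEPT_ENCODING', '').
    """
    # Tell caches that the response depends on Accept-Encoding.
    headers = [('Vary', 'Accept-Encoding')]
    if accepts_gzip(accept_encoding):
        headers.append(('Content-Encoding', 'gzip'))
        return headers, gzip.compress(body)
    return headers, body
```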
Rip, Mix and Burn
Now having this shiny, new web service and web application
is fun, but the real power of Web 2.0 comes from combining
services in new ways. Let's combine our new sparkline web
service with data from Technorati. At the beginning of
this article I showed a sparkline that displayed the links
per month for a URI based on data from Technorati. This is
just a matter of combining the two services, taking the
output of a Technorati search, and pumping that data into the
sparkline web service. Here is the code for that
service:
import urllib
import libxml2
import time
LICENSE_KEY = "insert your technorati API key here"
def cosmos(uri, start=0):
    """Get a list of struct_time's for the creation time of a
    link to the given URI."""
    args = {'url': uri, 'type': 'link', 'start': start, 'format': 'xml',
            'key': LICENSE_KEY, 'limit': '100'}
    url = "http://api.technorati.com/cosmos?" + urllib.urlencode(args)
    doc = libxml2.parseDoc(urllib.urlopen(url).read())
    return [time.strptime(e.content.split(" ")[0], '%Y-%m-%d')
            for e in doc.xpathEval("//tapi/document/item/linkcreated")]

def diff_dates(oldest, newest):
    """Return the number of calendar months from oldest to newest."""
    return (newest[0] - oldest[0]) * 12 + newest[1] - oldest[1]

today = time.localtime()
URI = 'http://bitworking.org'
dates = cosmos(URI)
alldates = list(dates)
# We can only get 100 items at a time; keep looping until we
# get fewer than 100 items to ensure that we get them all.
start_from = 101
while len(dates) == 100:
    dates = cosmos(URI, start=start_from)
    alldates.extend(dates)
    start_from += 100

links_per_month = [0] * (diff_dates(alldates[-1], today) + 1)
for l in alldates:
    links_per_month[diff_dates(l, today)] += 1
# Since we indexed by counting the difference in time between
# today and the time of the link creation, we have our list in
# reverse order.
links_per_month.reverse()
max_links_per_month = max(links_per_month)
points = ",".join([str(int(float(d)/max_links_per_month * 100))
for d in links_per_month])
points_unscaled = ",".join([str(d) for d in links_per_month])
print """<html>
<body>
<div>
<p>
<img src="http://bitworking.org/projects/sparklines/spark.cgi?type=smooth&\
d=%s&height=15&min-m=true&max-m=true\
&min-color=red&max-color=blue&step=2" title="%s"/>
<span style="color:red">%d</span>
<span style="color:blue">%d</span>
</p>
</div>
</body>
</html>
""" % (points, points_unscaled, min(links_per_month), max_links_per_month)
The "cosmos" function uses the Technorati API to find the creation
date of every link to our target URI. Once we have the XML
representation, we use libxml2 to pick out the "linkcreated" element
from each item. We then parse each timestamp and return the complete
list of times. Since we only care about the month, we parse only the
date and ignore the time of day.
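The libxml2 Python bindings are not always available. As a hedged alternative, the same extraction can be done with the standard library's ElementTree. The sketch infers from the XPath expression in cosmos that the response's root element is <tapi>, and it assumes that each linkcreated value begins with a YYYY-MM-DD date followed by a space. The sample XML in the usage is made up.

```python
import time
import xml.etree.ElementTree as ET


def parse_link_dates(xml_text):
    """Return a struct_time for each //tapi/document/item/linkcreated.

    Assumes <tapi> is the root element and that each timestamp starts
    with a YYYY-MM-DD date followed by a space, as cosmos() does.
    """
    root = ET.fromstring(xml_text)
    return [time.strptime(e.text.split(" ")[0], '%Y-%m-%d')
            for e in root.findall('./document/item/linkcreated')]


# Made-up response fragment for illustration:
sample = """<tapi><document>
  <item><linkcreated>2005-06-12 22:50:24 GMT</linkcreated></item>
  <item><linkcreated>2005-05-01 08:00:00 GMT</linkcreated></item>
</document></tapi>"""
print([(d.tm_year, d.tm_mon) for d in parse_link_dates(sample)])
# [(2005, 6), (2005, 5)]
```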
Since the Technorati API returns at most 100 results per query, we
need to loop and keep fetching the next 100 results until we have
them all.
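The paging loop generalizes to any API that caps the page size. Here is a minimal sketch in which fetch_page is a hypothetical callable and offsets are assumed to be zero-based. The script above starts its second request at 101, which depends on how Technorati numbers its results.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every result from an API that caps each page at page_size.

    fetch_page(start) is a hypothetical callable returning at most
    page_size items beginning at zero-based offset start. A page shorter
    than page_size (possibly empty) signals that there is nothing more.
    """
    results = []
    start = 0
    while True:
        page = fetch_page(start)
        results.extend(page)
        if len(page) < page_size:
            return results
        start += page_size


# Usage with a fake data source of 250 items:
data = list(range(250))
print(len(fetch_all(lambda start: data[start:start + 100])))  # 250
```

As in the original loop, a total that is an exact multiple of the page size costs one extra request that comes back empty.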
Once we have all the dates, it's only a matter of creating a bin
for each month and incrementing a bin each time we find a link
that was created in that month. After that we use the sparkline web
service to plot the results. For an example of what this script
produces, here are the links per month for bitworking.org:
[Sparkline: links per month for bitworking.org; minimum 2 (red), maximum 43 (blue)]
Note that if you are using a capable browser, you can hover your mouse
over the sparkline and get a little pop-up window that shows the raw
data used to generate the plot.
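The binning and scaling steps can be written as small functions and checked in isolation. This Python 3 sketch mirrors the script's arithmetic with two hedged differences. It reads the year and month through struct_time's named fields. It also sizes the bins from the oldest date in the data instead of assuming that the last result is the oldest. The function names and sample dates are hypothetical.

```python
import time


def diff_months(oldest, newest):
    """Number of calendar months from oldest to newest (days ignored)."""
    return ((newest.tm_year - oldest.tm_year) * 12
            + newest.tm_mon - oldest.tm_mon)


def links_per_month(dates, today):
    """Count links per month, oldest month first, ending at today's month."""
    # Size the bins from the oldest date actually present.
    bins = [0] * (max(diff_months(d, today) for d in dates) + 1)
    for d in dates:
        bins[diff_months(d, today)] += 1
    bins.reverse()  # index 0 was the current month; put oldest first
    return bins


def scale(counts, top=100):
    """Scale counts to 0..top, as the script does before plotting."""
    peak = max(counts)
    return [int(float(c) / peak * top) for c in counts]


# Three hypothetical link dates:
today = time.strptime('2005-06-12', '%Y-%m-%d')
sample = [time.strptime(s, '%Y-%m-%d')
          for s in ('2005-06-01', '2005-06-15', '2005-04-20')]
counts = links_per_month(sample, today)
print(counts)         # [1, 0, 2]
print(scale(counts))  # [50, 0, 100]
```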
Lessons Learned
This was a fun project, and in the course of it I learned quite a few important lessons.
- JavaScript Is Nice
- With support for programming constructs like
closures, JavaScript surprised me with its
expressiveness and compactness.
- Web Service First
- If possible, build the web service first and
then the web application. This helps in several ways. First, it allows
you to better leverage the work you did on the web service by
utilizing it in the web application. Second, it forces you to be a
consumer of your web service, and that will give you ideas on how to
make it better. Third, if you've built a web service and can't find a
way to use it in your web application, then maybe you need to go back
to the drawing board.
- Leverage GET
- By
using GET to retrieve the sparklines we can use ETags and If-None-Match
headers to reduce the bandwidth our web service uses. In addition, those
changes make our web application that much quicker.
- Look Ma, No XML
- While most of the web services we talk about have XML somewhere in them, it's good to have a
reminder once in a while that XML isn't a prerequisite for a RESTful web service.