XML Technologies: A Success Story
by J. David Eisenberg
May 16, 2001
We've all heard stories of how new XML technologies have helped
build immense corporate databases and complex, dynamic web
sites. Well, this isn't one of those stories. This story is about how
the Apache Software Foundation's
XML tools helped improve this year's California Central Coast Section High School
wrestling tournament.
Background
I've been using a computer to do the scorekeeping for the CCS
wrestling tournament for the past ten years. The program, written in C
for DOS, displays the bracket on the screen. As you enter results of
matches, the winners are automatically advanced and the losers
automatically dropped to the consolation bracket or out of the
tournament. The program prints bracket sheets to an inkjet printer. A
portion of the output is shown in the image at the right. A dot matrix
printer produces mailing labels with the results of the matches. These
labels are affixed to large wall charts.
Other labels containing the information for upcoming matches are
affixed to pre-printed bout sheets such as the one you see at the
right. Up until about 1998, the wall charts and bout sheets were
designed for a label size of 2.5 inches by 15/16 inches. These labels
have to be ordered in advance, and are hideously expensive, as they
are a non-standard size. Recently, the wall charts were redesigned to
permit the standard 3.5 inches by 15/16 inches, but the bout sheets
weren't. Fate intervened when the CCS Events Coordinator asked me to
send him a blank bout sheet to copy. This was my opportunity to create
a new bout sheet template that would let me use the larger labels.
Since I'm using Linux and the CCS uses Windows, I needed a
cross-platform solution. Adobe PDF format was the answer, and this is
where Scalable Vector Graphics (SVG) and the Formatting Objects
Processor (FOP) enter the story.
Creating the Bout Sheet
The bout sheet is not a typical text document; it's mostly a set of
lines, empty boxes and a circle with minimal text labeling. Thus, I
decided to use Scalable Vector Graphics (SVG) to describe the form,
and use FOP as a wrapper to produce the desired PDF output. I took a
ruler and an old bout sheet, redrew the lines, and measured the widths
and locations of the boxes and text, and created the formatting
objects XML file by hand. You may download the
FO file and the resulting PDF file.
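As a rough illustration of the approach (not the actual downloadable file), an FO wrapper around an SVG drawing might look like the sketch below. The page size, margins, coordinates, and label text are all assumptions made for the example. It follows the XSL 1.0 Recommendation; early FOP releases such as 0.16 may have expected slightly different attribute names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Assumed US Letter page with half-inch margins -->
    <fo:simple-page-master master-name="bout-sheet"
        page-width="8.5in" page-height="11in" margin="0.5in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="bout-sheet">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>
        <!-- The SVG drawing is embedded directly in the FO document -->
        <fo:instream-foreign-object>
          <!-- viewBox makes one user unit equal one point (7.5in = 540pt) -->
          <svg xmlns="http://www.w3.org/2000/svg"
              width="7.5in" height="10in" viewBox="0 0 540 720">
            <!-- Hypothetical label area, 3.5in wide by 15/16in high -->
            <rect x="18" y="18" width="252" height="67.5"
                fill="none" stroke="black" stroke-width="1"/>
            <!-- A ruled line and a circle, as on the paper form -->
            <line x1="18" y1="120" x2="522" y2="120"
                stroke="black" stroke-width="0.5"/>
            <circle cx="450" cy="60" r="30"
                fill="none" stroke="black" stroke-width="1"/>
            <text x="290" y="40" font-family="Helvetica"
                font-size="10">Bout No.</text>
          </svg>
        </fo:instream-foreign-object>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Because the FO part is only a one-page shell, all of the measuring work ends up in the SVG coordinates, which is what makes converting ruler measurements to points the main chore.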
The advantages of the new bout sheet were small but
significant. I was able to use cheaper labels, and they didn't overlap
the area where the scorekeepers write the match statistics. This meant
that the results were easier to read. But the real payoff from XML
technologies came near the end of the tournament.
Printing the Results
A few years ago, I added code to the scorekeeping program to output the
brackets as a Rich Text Format (RTF) file. The file didn't describe the
bracket completely; once you loaded it into a word processor you had to set
the font to a small size so that the contents would fit on the page width, and
you had to use a monospace font since the RTF mirrored the screen display and
bracket print code, which was also monospaced. It worked, but it was ugly.
I thought that it would be nicer to use a proportional font with true
underlining and vertical rules, but I didn't know RTF well enough to achieve
this effect. However, I had been learning about XSL Formatting Objects (XSL
FO), and, at the beginning of the second day of the tournament, I realized
that formatting objects, in conjunction with FOP, would do exactly what I
wanted.
Data to XML
The first problem was converting the match data, stored in binary
data files, to XML. Luckily, there's a three-hour break between the
end of the consolation matches and the finals. I used part of this
time to write a Perl script that would produce an XML file by
reading the files that describe the bracket and match results. I
didn't want to go directly to formatting objects; doing that would
only complicate the Perl script. I decided to create a simple ad-hoc
XML notation that would be nearly a one-to-one correspondence to the
data file structure. The result looked like this:
<line num="1">
  <cell num="0" short="yes"></cell>
  <cell num="1" underline="yes">Chris Jaworski</cell>
</line>
<line num="2">
  <cell num="0" short="yes">Bout 1</cell>
  <cell num="1" vbar="yes">(St. Francis)</cell>
  <cell num="2" underline="yes">Chris Jaworski</cell>
  <cell num="5" text="yes">CCS Championships</cell>
</line>
I associated a num attribute with each line and cell; this
permitted me to skip empty cells.
XML to XSL FO
I could now use XSL Transformations (XSLT) to convert the ad hoc XML
to XSL FO. The plan was to make each cell of the bracket into a
<fo:block> inside an absolutely-positioned
<fo:block-container>. By turning on the bottom and
right borders of each block-container, I could construct the
underlines and vertical bars of the bracket. You may download the XSL
file that does the conversion. I used Xalan to do the transformation
and FOP (0.16.0) to convert
the formatting objects to PDF.
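The plan described above can be sketched as a single XSLT template. This is a hypothetical fragment, not the downloadable stylesheet: the column width, row height, border width, and font size are assumed values, and the short and text attributes are ignored. Only the element names and the num, underline, and vbar attributes come from the ad hoc XML shown earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumed grid dimensions, in points -->
  <xsl:variable name="col-width" select="90"/>
  <xsl:variable name="row-height" select="12"/>

  <!-- Each cell becomes an absolutely positioned container.
       The cell's num gives the column; the parent line's num gives the row. -->
  <xsl:template match="cell">
    <fo:block-container position="absolute"
        left="{@num * $col-width}pt"
        top="{../@num * $row-height}pt"
        width="{$col-width}pt"
        height="{$row-height}pt">
      <!-- Bottom border draws the underline under a name -->
      <xsl:if test="@underline = 'yes'">
        <xsl:attribute name="border-bottom-style">solid</xsl:attribute>
        <xsl:attribute name="border-bottom-width">0.5pt</xsl:attribute>
      </xsl:if>
      <!-- Right border draws the vertical bar joining two competitors -->
      <xsl:if test="@vbar = 'yes'">
        <xsl:attribute name="border-right-style">solid</xsl:attribute>
        <xsl:attribute name="border-right-width">0.5pt</xsl:attribute>
      </xsl:if>
      <fo:block font-size="8pt">
        <xsl:value-of select="."/>
      </fo:block>
    </fo:block-container>
  </xsl:template>

</xsl:stylesheet>
```

Since every position is computed from the num attributes, cells that are absent from the input simply produce no output, which is why skipping empty cells costs nothing in the stylesheet.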
If this were a press release, I'd rhapsodize about how successful
the whole process was and that would be the end of this article. That
would not, however, be the entire story -- a little “post-game
analysis” is in order.
The construction of the bout sheet, in fact, was entirely
straightforward. I had to correct some minor errors, and I had to
convert some of the units to points and do some minor adjustment to
get the text right where I wanted it, but the SVG and FO themselves
worked perfectly.
Converting the data files to XML was not a big problem; the biggest
annoyance was having to look at code I had written years ago to figure
out the internal data structure of the bracket.
Writing the XSLT file to convert to XSL FO was a bit more
problematic and required quite a bit of experimentation. I encountered
some problems during this process:
- After a failed attempt to place a
<fo:block break="page-after"> around a
<fo:block-container>, I tried putting the
break-after="page" in the
<fo:block-container> element. However, block containers
are not part of the text flow, so that doesn't work either. That's why the
template for page has an added
<fo:block break-after="page"> to force a new page to
occur.
- The names were too close to the vertical bars; I fixed it by adding a
start-indent to each cell.
- The names were too far above the underlines. I tried to use the
alignment-adjust attribute, but it wasn't implemented
in FOP 0.16.0. The vertical-align attribute didn't do
the trick either, so I ended up adding a space-before
to move the text closer to the line.
- The title at the upper right of the bracket was longer than the cell
width. When it word-wrapped, only the first word showed up. That's why I had to
add
wrap-option="no-wrap".
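Taken together, the workarounds in this list might produce output along the lines of the fragment below. This is a sketch with made-up positions and lengths, not an excerpt from the generated file. In particular, the vertical nudge is written as space-before, a standard XSL-FO property, which is an assumption about what the stylesheet actually used, and a given FOP version may honor these properties differently.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- One bracket cell; the bottom border serves as the underline -->
  <fo:block-container position="absolute"
      left="90pt" top="12pt" width="90pt" height="12pt"
      border-bottom-style="solid" border-bottom-width="0.5pt">
    <!-- start-indent keeps the name away from the vertical bar,
         space-before pushes it down toward the underline,
         wrap-option stops a long title from wrapping out of view -->
    <fo:block font-size="8pt"
        start-indent="2pt"
        space-before="3pt"
        wrap-option="no-wrap">Chris Jaworski</fo:block>
  </fo:block-container>

  <!-- An empty block in the normal flow: unlike the containers above,
       it takes part in the text flow, so its break-after ends the page -->
  <fo:block break-after="page"/>

</fo:flow>
```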
You can download a ZIP
file that contains the XML and XSL files if you'd like to take a
closer look. You can download the resulting PDF files from this directory.
Summary
So after all that work and trouble, was it worth it? Yes. It took
me less time to produce the bout sheet with SVG than it would have
taken to find a Windows machine, learn to use a drawing program, and
produce a file that would have been in a proprietary format.
The bracket printout was also worthwhile, mostly as a learning
exercise but also as a proof of concept. The PDF output looks
better than the RTF. Again, there was a time saving; it was easier
for me to learn the syntax for formatting objects than it would have
been to learn enough RTF to produce an equally good-looking result
in that format.
Finally, the fact that I was able to accomplish all of these tasks
with open source software is the icing on the cake.