
Comparing and Replacing Strings
by Bob DuCharme
June 05, 2002
In last month's column we looked at XSLT techniques for splitting up strings of
text, for checking whether strings had certain substrings, and for normalizing
white space out of an element. This month we'll learn more ways to gain control
over strings in your source document, as we see how to compare strings for
equality and what kind of search-and-replace operations are possible in
XSLT.
To see if two elements are the same, XSLT compares their string values using
the equals sign ("="). To demonstrate several variations on this, our
next stylesheet compares the a element in the following with its
sibling elements. (All stylesheets, input documents, and output documents shown
in this article are in this zip file.)
<poem>
  <a>full of Pomp and Gold</a>
  <b>full of Pomp and Gold</b>
  <c>full of pomp and gold</c>
  <d>
      full of Pomp and Gold
  </d>
</poem>
The stylesheet has a template rule for the a element with a series
of xsl:if instructions. Each of these instructions compares the
a element's content with something and reports whether the test is
true.
<!-- xq327.xsl: converts xq326.xml into xq328.txt -->
<xsl:template match="a">
  <xsl:if test=". = 'full of Pomp and Gold'">
    1. a = "full of Pomp and Gold"
  </xsl:if>
  <xsl:if test=". = ../b">
    2. a = ../b
  </xsl:if>
  <xsl:if test=". = ../c">
    3. a = ../c
  </xsl:if>
  <xsl:if test=". != ../c">
    4. a != ../c
  </xsl:if>
  <xsl:if
    test="translate(., 'abcdefghijklmnopqrstuvwxyz',
                       'ABCDEFGHIJKLMNOPQRSTUVWXYZ') =
          translate(../c, 'abcdefghijklmnopqrstuvwxyz',
                          'ABCDEFGHIJKLMNOPQRSTUVWXYZ')">
    5. a = ../c (ignoring case)
  </xsl:if>
  <xsl:if test=". = ../d">
    6. a = ../d
  </xsl:if>
  <xsl:if test=". = normalize-space(../d)">
    7. a = normalize-space(../d)
  </xsl:if>
</xsl:template>
As the result shows, xsl:if elements 1, 2,
4, 5, and 7 are true for the document above:
    1. a = "full of Pomp and Gold"
    2. a = ../b
    4. a != ../c
    5. a = ../c (ignoring case)
    7. a = normalize-space(../d)
Test number 1 in this stylesheet compares the a element (represented
by ".") with the literal string "full of Pomp and Gold". They're equal,
as the message added to the result tree tells us. Test 2 compares the a
element with its sibling b element, and as the result shows, they too
are equal. (If you're unfamiliar with the ../b notation to point to the
b sibling, see the "Transforming XML" column Finding Relatives.)
Test 3 compares element a with element c, and they're not
equal—two characters are in a different case. XML is very case-sensitive,
so this xsl:if instruction adds nothing to the result.
Test 4 compares element a and c again, but using the
!= comparison operator to check for inequality. This test is true, so a
message about Test 4 gets added to the result.
The fifth test uses the translate() function that we looked at last
month to map the a and c elements to upper-case versions and
compares those. Because upper-case versions of these two elements are the same,
Test 5 is true, and the appropriate message gets added to the result.
XSLT offers no built-in way to automatically convert a string's case because
the mapping is often dependent on the language being used, and sometimes
even on where it's being used. For example, the upper-case form of "é" at the
start of a word is "É" in France but "E" in Canada.
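The translate() call in Test 5 spells out both alphabets twice. As a minimal sketch of one way to tidy this, the alphabets can be declared once as top-level variables. The variable names lower and upper are illustrative and are not part of the original stylesheet, and the mapping still covers only the unaccented letters a through z, so it does nothing for characters like the accented one mentioned above.

```xml
<!-- Sketch only: the variable names "lower" and "upper" are illustrative. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="text"/>

  <!-- The two alphabets, declared once. Only unaccented a-z is mapped. -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="a">
    <!-- Same comparison as Test 5, using the variables. -->
    <xsl:if test="translate(., $lower, $upper) =
                  translate(../c, $lower, $upper)">
      a = ../c (ignoring case)
    </xsl:if>
  </xsl:template>

  <!-- Suppress the text of the other elements. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Keeping the alphabets in one place also makes it easier to extend them with whatever additional character pairs a particular document needs.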
Test 6 compares element a with element d, which has the
same text and some additional white space—a few carriage returns and
either spacebar spaces or tabs to indent the text. As the result document shows,
the two elements are not equal.
Test 7 compares a and d again, but it compares a
to a version of the d element returned by the
normalize-space() function. This time, the equality test is true.
The normalize-space() function has been the savior of many string
equality tests. XML's treatment of white space can be a complex topic, because
it's not always clear which white space it ignores and which it recognizes. Any
automated process that creates XML elements may put white space between elements
or it may not, so a way to say "get rid of extraneous white space before
comparing this string to something" is very useful in XSLT. In fact, the seventh
xsl:if instruction above would be even better if both sides of the
comparison in the xsl:if element's test attribute were passed
to this function, like this:
<!-- xq329.xsl -->
<xsl:if test="normalize-space(.) = normalize-space(../d)">
  7. a = normalize-space(../d)
</xsl:if>
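To see what normalize-space() actually hands to the comparison, it can help to print its result between visible delimiters. The following is a small sketch for that purpose only; the square brackets are an arbitrary choice made for this illustration.

```xml
<!-- Sketch only: prints the normalized d element between brackets. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="d">
    <xsl:text>[</xsl:text>
    <!-- Leading and trailing white space is stripped, and each internal
         run of white space is collapsed to a single space. -->
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>]</xsl:text>
  </xsl:template>

  <!-- Suppress all other text. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Run against the poem document above, this should produce [full of Pomp and Gold], with no carriage returns or indentation inside the brackets.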
Search and Replace
The translate() function can replace specific characters with other
characters, but XSLT offers no built-in method for globally replacing one string
of text with another.
Global replacement is a basic text transformation task and XSLT is a language
for transforming text (that is, a language for transforming XML documents, which
are text) so string replacement is closely related to the tasks that a
stylesheet developer often attacks with XSLT. Fortunately, existing XSLT
techniques can be combined to give a stylesheet a search-and-replace
capability. The most important technique is the use of parameters with recursive
named templates; see the "Transforming XML" column Getting Loopy
if you're unfamiliar with it.
As an example, we'll look at a stylesheet that converts the string "finish"
to "FINISH" throughout the following XML document.
<winelist>
  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1997</year>
    <desc>Well-textured flavors, good finish.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>
  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1996</year>
    <desc>Sturdy and generous flavors, long finish.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>
</winelist>
The stylesheet has three template rules. The third one just copies the source
tree's elements and attributes to the result tree.
The second template rule handles text nodes. It calls the first template, the
named "globalReplace" template, to add each text node's contents, with any
replacements made, to the result tree.
<!-- xq332.xsl: converts xq331.xml into xq333.xml -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template name="globalReplace">
    <xsl:param name="outputString"/>
    <xsl:param name="target"/>
    <xsl:param name="replacement"/>
    <xsl:choose>
      <xsl:when test="contains($outputString,$target)">
        <xsl:value-of select=
          "concat(substring-before($outputString,$target),
                  $replacement)"/>
        <xsl:call-template name="globalReplace">
          <xsl:with-param name="outputString"
               select="substring-after($outputString,$target)"/>
          <xsl:with-param name="target" select="$target"/>
          <xsl:with-param name="replacement"
               select="$replacement"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$outputString"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:call-template name="globalReplace">
      <xsl:with-param name="outputString" select="."/>
      <xsl:with-param name="target" select="'finish'"/>
      <xsl:with-param name="replacement" select="'FINISH'"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
The "globalReplace" named template is a general-purpose string replacement
template based on one posted to the XSL-List mailing list by Mike
J. Brown. As the example shows, it gets called with three parameters:
- outputString is the string on which it will perform the global
  replacement.
- target is the string that it will look for in outputString; this is the
  string that will be replaced.
- replacement is the new string that will be substituted for any occurrence
  of target in outputString.
The template must add outputString to the result tree unchanged if
it has no occurrence of the target string, so first it checks whether
the target string is there or not. An if-else construction would be great for
this, but XSLT offers no equivalent of an "else" condition to go with its
xsl:if instruction. However, an xsl:choose instruction can
perform the same logic with a single xsl:when element followed by an
xsl:otherwise element. In the template, the xsl:when condition
uses the contains() function to check whether outputString has
target in it. If it does, an xsl:value-of instruction uses a
concat() function to put together two strings for the result tree:
everything in outputString before the first target and then
the replacement string.
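The if-else pattern described above is not specific to string replacement. As a small illustration, a single xsl:when paired with an xsl:otherwise could label each wine in the wine list document by vintage. The year threshold and the messages are made up for this sketch.

```xml
<!-- Sketch only: the threshold and messages are invented for illustration. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="wine">
    <xsl:value-of select="winery"/>
    <xsl:choose>
      <!-- The "if" branch: the year's string value is compared as a number. -->
      <xsl:when test="year &lt; 1997">
        <xsl:text>: older vintage&#10;</xsl:text>
      </xsl:when>
      <!-- The "else" branch: chosen whenever the test above is false. -->
      <xsl:otherwise>
        <xsl:text>: 1997 or later&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Suppress all other text. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

For the wine list above, this should report Benziger as "1997 or later" and Duckpond as "older vintage".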
What about the rest of outputString, after the target that
got found and replaced by the replacement string? The "globalReplace"
named template makes a recursive call to itself to make any more substitutions
necessary in the remaining part of the string, passing
substring-after($outputString,$target) (that is, everything in
outputString after the found occurrence of target) as the
value of outputString for this new invocation of the template. If that
new invocation finds another occurrence of the target string, it will add
everything up to it and the replacement string to the result tree and then call
the template again for the remainder of that string if necessary. By making
recursive calls to handle the remainder of the string, it really is a global
replace, because multiple occurrences of the target all get
replaced.
If the xsl:when instruction's test attribute doesn't find
the target string in outputString, the xsl:otherwise
element's xsl:value-of instruction just adds the value of
outputString to the result tree. This is the crucial stopping condition
that any recursive template needs to ensure that it doesn't call itself
forever. Whether outputString has zero occurrences of target
or fifty of them, eventually this xsl:otherwise part of the
xsl:choose instruction will get chosen and the "globalReplace" named
template will not call itself again for this source tree text node.
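One case this stopping condition does not cover is an empty target string. XPath 1.0 processors generally treat every string as containing the empty string, so contains() would keep returning true and the recursion would likely never end. The following is a minimal sketch of a guarded variant, not a tested replacement. The template name globalReplaceGuarded and the string-length() check are additions made for illustration and are not part of the original template.

```xml
<!-- Sketch only: "globalReplaceGuarded" is a hypothetical name. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="globalReplaceGuarded">
    <xsl:param name="outputString"/>
    <xsl:param name="target"/>
    <xsl:param name="replacement"/>
    <xsl:choose>
      <!-- Recurse only when target is non-empty and still present. -->
      <xsl:when test="string-length($target) &gt; 0 and
                      contains($outputString, $target)">
        <xsl:value-of select="substring-before($outputString, $target)"/>
        <xsl:value-of select="$replacement"/>
        <xsl:call-template name="globalReplaceGuarded">
          <xsl:with-param name="outputString"
               select="substring-after($outputString, $target)"/>
          <xsl:with-param name="target" select="$target"/>
          <xsl:with-param name="replacement" select="$replacement"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Stopping condition: empty target, or no occurrence left. -->
      <xsl:otherwise>
        <xsl:value-of select="$outputString"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:call-template name="globalReplaceGuarded">
      <xsl:with-param name="outputString" select="."/>
      <xsl:with-param name="target" select="'finish'"/>
      <xsl:with-param name="replacement" select="'FINISH'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With a non-empty target, this variant is intended to behave the same as the unguarded template; with an empty target, it simply copies the string through unchanged.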
The result of calling this stylesheet with the document above
has both occurrences of the string "finish" replaced with "FINISH":
<winelist>
  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1997</year>
    <desc>Well-textured flavors, good FINISH.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>
  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1996</year>
    <desc>Sturdy and generous flavors, long FINISH.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>
</winelist>
One nice thing about this "globalReplace" named template is that it's
general-purpose: it still works when called in other situations. For example,
the following template also calls it, but note the template's match condition:
it replaces the one-character string "9" with "0" only in text nodes that are
child nodes of year elements, because those are the nodes specified by the
template rule's match condition.
<!-- xq334.xsl: converts xq331.xml into xq335.xml -->
<xsl:template match="year/text()">
  <xsl:call-template name="globalReplace">
    <xsl:with-param name="outputString" select="."/>
    <xsl:with-param name="target" select="'9'"/>
    <xsl:with-param name="replacement" select="'0'"/>
  </xsl:call-template>
</xsl:template>
When run with the same source document as the previous example, this template
replaces the nines in the year elements and leaves the nines in the
prices elements alone:
<?xml version="1.0" encoding="UTF-8"?>
<winelist>
  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1007</year>
    <desc>Well-textured flavors, good finish.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>
  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1006</year>
    <desc>Sturdy and generous flavors, long finish.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>
</winelist>
(If you really want to replace one character with another like this, the
translate() function would be more efficient.) This demonstrates how
customizing the stylesheet's use of the "globalReplace" template doesn't have to
mean tinkering with the template itself. Instead, being more selective about the
outputString value passed to the template allows the stylesheet to
focus the template's power. The named template can be used in multiple
situations exactly as it is.
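For comparison, here is a sketch of the single-character case handled with translate() instead of the recursive template. It assumes the same wine list input; the catch-all copying template is written with node() for brevity and is not taken from the original stylesheet.

```xml
<!-- Sketch only: single-character replacement with translate(). -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <!-- Replace every "9" with "0", but only in text inside year elements. -->
  <xsl:template match="year/text()">
    <xsl:value-of select="translate(., '9', '0')"/>
  </xsl:template>

  <!-- Copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because translate() maps characters one for one, this approach only fits cases where both the target and the replacement are single characters; anything longer still calls for something like "globalReplace".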
These two columns have provided a tour of XSLT 1.0's string manipulation
functions. XSLT 2.0 promises us some more, partly inspired by the string
manipulation extension functions available in some XSLT processors. Check out
your XSLT engine's documentation to see what else you may have available to
you; also see my book XSLT Quickly for descriptions of more functions that can
add power to your XSLT stylesheets.