Comparing and Replacing Strings

by Bob DuCharme
June 05, 2002

In last month's column we looked at XSLT techniques for splitting up strings of text, for checking whether strings had certain substrings, and for normalizing white space out of an element. This month we'll learn more ways to gain control over strings in your source document, as we see how to compare strings for equality and what kind of search-and-replace operations are possible in XSLT.

To see if two elements are the same, XSLT compares their string values using the equals sign ("="). To demonstrate several variations on this, our next stylesheet compares the a element in the following with its sibling elements. (All stylesheets, input documents, and output documents shown in this article are in this zip file.)

<poem>
  <a>full of Pomp and Gold</a>
  <b>full of Pomp and Gold</b>
  <c>full of pomp and gold</c>
  <d>
      full of Pomp and Gold

  </d>
</poem>

The stylesheet has a template rule for the a element with a series of xsl:if instructions. Each of these instructions compares the a element's content with something and reports whether the test is true.

<!-- xq327.xsl: converts xq326.xml into xq328.txt -->

<xsl:template match="a">

  <xsl:if test=". = 'full of Pomp and Gold'">
    1. a = "full of Pomp and Gold"
  </xsl:if>

  <xsl:if test=". = ../b">
    2. a = ../b
  </xsl:if>

  <xsl:if test=". = ../c">
    3. a = ../c
  </xsl:if>

  <xsl:if test=". != ../c">
    4. a != ../c
  </xsl:if>

  <xsl:if
    test="translate(.,'abcdefghijklmnopqrstuvwxyz',
                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ') =
          translate(../c,'abcdefghijklmnopqrstuvwxyz',
                         'ABCDEFGHIJKLMNOPQRSTUVWXYZ')">
    5. a = ../c (ignoring case)
  </xsl:if>

  <xsl:if test=". = ../d">
    6. a = ../d
  </xsl:if>

  <xsl:if test=". = normalize-space(../d)">
    7. a = normalize-space(../d)
  </xsl:if>

</xsl:template>



As the result shows, xsl:if elements 1, 2, 4, 5, and 7 are true for the document above:

1. a = "full of Pomp and Gold"

2. a = ../b

4. a != ../c

5. a = ../c (ignoring case)

7. a = normalize-space(../d)

Test number 1 in this stylesheet compares the a element (represented by ".") with the literal string "full of Pomp and Gold". They're equal, as the message added to the result tree tells us. Test 2 compares the a element with its sibling b element, and as the result shows, they too are equal. (If you're unfamiliar with the ../b notation to point to the b sibling, see the "Transforming XML" column Finding Relatives.)

Test 3 compares element a with element c, and they're not equal—two characters are in a different case. XML is very case-sensitive, so this xsl:if instruction adds nothing to the result.

Test 4 compares element a and c again, but using the != comparison operator to check for inequality. This test is true, so a message about Test 4 gets added to the result.

The fifth test uses the translate() function that we looked at last month to map the a and c elements to upper-case versions and compares those. Because upper-case versions of these two elements are the same, Test 5 is true, and the appropriate message gets added to the result.

XSLT offers no built-in way to automatically convert a string's case because the mapping is often dependent on the language being used and sometimes even on where it's being used. For example, an upper-case "é" at the start of a word is "É" in France but "E" in Canada.
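Because the two alphabets have to be spelled out by hand, the test attribute in Test 5 gets long quickly. Here is a minimal sketch of one way to tidy it up. It assumes top-level variables named lower and upper (names chosen here for illustration; they are not in this column's stylesheets) and assumes that unaccented ASCII letters are all you need to fold:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="text"/>

  <!-- Illustrative names: the two alphabets are stored once and reused. -->
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="a">
    <!-- Map both sides to upper case, then compare the results. -->
    <xsl:if test="translate(., $lower, $upper) =
                  translate(../c, $lower, $upper)">
      a = ../c (ignoring case)
    </xsl:if>
  </xsl:template>

  <!-- Keep the text of the other elements out of the result. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

Any accented letters in your documents would have to be added, in matching positions, to both variable values, and that is exactly where the language-dependent decisions described above come in.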

Test 6 compares element a with element d, which has the same text and some additional white space—a few carriage returns and either spacebar spaces or tabs to indent the text. As the result document shows, the two elements are not equal.

Test 7 compares a and d again, but it compares a to a version of the d element returned by the normalize-space() function. This time, the equality test is true.

The normalize-space() function has been the savior of many string equality tests. XML's treatment of white space can be a complex topic, because it's not always clear which white space it ignores and which it recognizes. Any automated process that creates XML elements may put white space between elements or it may not, so a way to say "get rid of extraneous white space before comparing this string to something" is very useful in XSLT. In fact, the seventh xsl:if instruction above would be even better if both sides of the comparison in the xsl:if element's test attribute were passed to this function, like this:

<!-- xq329.xsl -->

<xsl:if test="normalize-space(.) = normalize-space(../d)">
  7. a = normalize-space(../d)
</xsl:if>

Search and Replace

The translate() function can replace specific characters with other characters, but XSLT offers no built-in method for globally replacing one string of text with another.

Global replacement is a basic text transformation task, and XSLT is a language for transforming text (that is, for transforming XML documents, which are text), so string replacement is closely related to the tasks that a stylesheet developer often tackles with XSLT. Fortunately, existing XSLT techniques can be combined to give a stylesheet a search-and-replace capability. The most important technique is the use of parameters with recursive named templates; see the "Transforming XML" column Getting Loopy if you're unfamiliar with it.

As an example, we'll look at a stylesheet that converts the string "finish" to "FINISH" throughout the following XML document.

<winelist>

  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1997</year>
    <desc>Well-textured flavors, good finish.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>

  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1996</year>
    <desc>Sturdy and generous flavors, long finish.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>

</winelist>

The stylesheet has three template rules. The third one just copies all the source tree nodes except for text nodes to the result tree.

The second template rule handles text nodes. It calls the first template, the named "globalReplace" template, to add each text node's contents to the result tree.

<!-- xq332.xsl: converts xq331.xml into xq333.xml -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <xsl:template name="globalReplace">
    <xsl:param name="outputString"/>
    <xsl:param name="target"/>
    <xsl:param name="replacement"/>
    <xsl:choose>
      <xsl:when test="contains($outputString,$target)">

        <xsl:value-of select=
          "concat(substring-before($outputString,$target),
                  $replacement)"/>
        <xsl:call-template name="globalReplace">
          <xsl:with-param name="outputString"
               select="substring-after($outputString,$target)"/>
          <xsl:with-param name="target" select="$target"/>
          <xsl:with-param name="replacement"
               select="$replacement"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$outputString"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:call-template name="globalReplace">
      <xsl:with-param name="outputString" select="."/>
      <xsl:with-param name="target" select="'finish'"/>
      <xsl:with-param name="replacement" select="'FINISH'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The "globalReplace" named template is a general-purpose string replacement template based on one posted to the XSL-List mailing list by Mike J. Brown. As the example shows, it gets called with three parameters:

  • outputString is the string on which it will perform the global replacement.

  • target is the string that it will look for in outputString—the string that will be replaced.

  • replacement is the new string that will be substituted for any occurrence of target in outputString.

The template must add outputString to the result tree unchanged if it has no occurrence of the target string, so first it checks whether the target string is there or not. An if-else construction would be great for this, but XSLT offers no equivalent of an "else" condition to go with its xsl:if instruction. However, an xsl:choose instruction can perform the same logic with a single xsl:when element followed by an xsl:otherwise element. In the template, the xsl:when condition uses the contains() function to check whether outputString has target in it. If it does, an xsl:value-of instruction uses a concat() function to put together two strings for the result tree: everything in outputString before the first target and then the replacement string.

What about the rest of outputString, after the target that got found and replaced by the replacement string? The "globalReplace" named template makes a recursive call to itself to make any more substitutions necessary in the remaining part of the string, passing substring-after($outputString,$target) (that is, everything in outputString after the found occurrence of target) as the value of outputString for this new invocation of the template. If that new invocation finds another occurrence of the target string, it will add everything up to it and the replacement string to the result tree and then call the template again for the remainder of that string if necessary. By making recursive calls to handle the remainder of the string, it really is a global replace, because multiple occurrences of the target all get replaced.

If the xsl:when instruction's test attribute doesn't find the target string in outputString, the xsl:otherwise element's xsl:value-of instruction just adds the value of outputString to the result tree. This is the crucial stopping condition that any recursive template needs to ensure that it doesn't call itself forever. Whether outputString has zero occurrences of target or fifty of them, eventually this xsl:otherwise part of the xsl:choose instruction will get chosen and the "globalReplace" named template will not call itself again for this source tree text node.
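One case the stopping condition does not cover is an empty target string. XPath processors generally treat every string as containing the empty string, and substring-after() with an empty second argument hands back the whole first string, so a call with an empty target would keep calling itself with the same outputString. The following is a minimal sketch of a guarded variant, not part of the original template. The name "safeReplace" is hypothetical, and it assumes that passing the string through unchanged is the behavior you want when target is empty:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Hypothetical variant of globalReplace with the same three parameters. -->
  <xsl:template name="safeReplace">
    <xsl:param name="outputString"/>
    <xsl:param name="target"/>
    <xsl:param name="replacement"/>
    <xsl:choose>
      <!-- Only recurse when target is non-empty and actually present. -->
      <xsl:when test="string-length($target) &gt; 0 and
                      contains($outputString,$target)">
        <xsl:value-of select=
          "concat(substring-before($outputString,$target),
                  $replacement)"/>
        <xsl:call-template name="safeReplace">
          <xsl:with-param name="outputString"
               select="substring-after($outputString,$target)"/>
          <xsl:with-param name="target" select="$target"/>
          <xsl:with-param name="replacement"
               select="$replacement"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Empty target, or no more occurrences: output what is left. -->
      <xsl:otherwise>
        <xsl:value-of select="$outputString"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:call-template name="safeReplace">
      <xsl:with-param name="outputString" select="."/>
      <xsl:with-param name="target" select="'finish'"/>
      <xsl:with-param name="replacement" select="'FINISH'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The guard matters most when the target comes from a stylesheet parameter or from the source document rather than from a literal string, because then you can't be sure in advance that it will be non-empty.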

The result of calling this stylesheet with the document above has both occurrences of the string "finish" replaced with "FINISH":

<winelist>

  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1997</year>
    <desc>Well-textured flavors, good FINISH.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>

  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1996</year>
    <desc>Sturdy and generous flavors, long FINISH.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>

</winelist>

One nice thing about this "globalReplace" named template is that it's general-purpose: it still works when called in other situations. For example, the following template rule also calls it, but it only replaces the one-character string "9" with "0" in text nodes that are children of year elements, because those are the nodes specified by the template rule's match condition.

<!-- xq334.xsl: converts xq331.xml into xq335.xml -->
<xsl:template match="year/text()">
  <xsl:call-template name="globalReplace">
    <xsl:with-param name="outputString" select="."/>
    <xsl:with-param name="target" select="'9'"/>
    <xsl:with-param name="replacement" select="'0'"/>
  </xsl:call-template>
</xsl:template>

When run with the same source document as the previous example, this template replaces the nines in the year elements and leaves the nines in the prices elements alone:

<?xml version="1.0" encoding="UTF-8"?>
<winelist>

  <wine grape="Chardonnay">
    <winery>Benziger</winery>
    <product>Carneros</product>
    <year>1007</year>
    <desc>Well-textured flavors, good finish.</desc>
    <prices>
      <list>10.99</list>
      <discounted>9.50</discounted>
      <case>114.00</case>
    </prices>
  </wine>

  <wine grape="Cabernet">
    <winery>Duckpond</winery>
    <product>Merit Selection</product>
    <year>1006</year>
    <desc>Sturdy and generous flavors, long finish.</desc>
    <prices>
      <list>13.99</list>
      <discounted>11.99</discounted>
      <case>143.50</case>
    </prices>
  </wine>

</winelist>

(If you really want to replace one character with another like this, the translate() function would be more efficient.) This demonstrates how customizing the stylesheet's use of the "globalReplace" template doesn't have to mean tinkering with the template itself. Instead, being more selective about the outputString value passed to the template allows the stylesheet to focus the template's power. The named template can be used in multiple situations exactly as it is.
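For comparison, here is a minimal sketch of the translate() alternative mentioned in the parenthetical remark above. It assumes the same wine list source document and needs no named template at all, because translate() maps every "9" in the string to "0" in a single call:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <!-- Replace each "9" with "0", but only in text inside year elements. -->
  <xsl:template match="year/text()">
    <xsl:value-of select="translate(., '9', '0')"/>
  </xsl:template>

  <!-- Copy elements and attributes; other text nodes are copied
       by the built-in template rule for text. -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This only works because both the target and the replacement are single characters. As soon as either one is longer, translate() no longer does the job and a recursive template like "globalReplace" is needed.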

These two columns have provided a tour of XSLT 1.0's string manipulation functions. XSLT 2.0 promises us some more, partly inspired by the string manipulation extension functions available in some XSLT processors. Check your XSLT engine's documentation to see what else may be available to you; also see my book XSLT Quickly for descriptions of more functions that can add power to your XSLT stylesheets.
