Controlling Whitespace, Part 1

by Bob DuCharme
November 07, 2001

XML considers four characters to be whitespace: the carriage return, the linefeed, the tab, and the spacebar space. Microsoft operating systems put both a carriage return and a linefeed at the end of each line of a text file, and people usually refer to the combination as the "carriage return". XSLT stylesheet developers often get frustrated over the whitespace that shows up in their result documents -- sometimes there's more than they wanted, sometimes there's less, and sometimes it's in the wrong place. Over the next few columns, we'll discuss how XML and XSLT treat whitespace to gain a better understanding of what can happen, and we'll look at some techniques for controlling how an XSLT processor adds whitespace to the result document.

Before we start, however, it's important to remember two things if you get frustrated over a lack of control:

  • XSLT is an XML application that was originally designed to convert XML documents into XML documents.

  • XML applications often seem cavalier about whitespace because the rules identifying where whitespace doesn't matter in an XML document sometimes give them free rein to add or remove it in those places.

The moral of the story is that when you're using XSLT to create XML documents, you shouldn't worry too much about whitespace. When you're using it to create text documents and the whitespace isn't coming out the way you want, remember that XSLT is a transformation language, not a formatting language, and some other tool may be necessary to give you the control you need. Extension functions may also provide relief; string manipulation is one of the most popular reasons for writing these functions. See the September column "XSLT Extensions" for more detail.

xsl:strip-space and xsl:preserve-space

The xsl:strip-space instruction lets you specify source tree elements that should have whitespace text nodes (that is, text nodes composed entirely of whitespace characters) stripped.

Let's look at how this element can affect the following sample source document.

<colors>

<color>red</color>

<color>   yellow   </color>

<color>
blue
</color>

<!--
Next color element has whitespace content.
-->
<color>   </color>

</colors>

To establish a baseline, this first stylesheet has no xsl:strip-space element. It's just an identity stylesheet that copies the source tree to the result tree.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The result looks just like the source:

<colors>

<color>red</color>

<color>   yellow   </color>

<color>
blue
</color>

<!--
Next color element has whitespace content.
-->
<color>   </color>

</colors>

Now we add an xsl:strip-space element to have the stylesheet strip whitespace text nodes from the color elements.

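<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Strip whitespace-only text nodes that are children of color elements. -->
  <xsl:strip-space elements="color"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>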

When applied to the same source tree document, the result looks the same, except that the last color element is now an empty element. In the source tree, its only content was a text node of whitespace characters, and this node got stripped. While the yellow color element has plenty of whitespace, it's in a text node along with the string "yellow", so xsl:strip-space, which only affects nodes that are pure whitespace, leaves it alone.

<colors>

<color>red</color>

<color>   yellow   </color>

<color>
blue
</color>

<!--
Next color element has whitespace content.
-->
<color/>

</colors>
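
To check which text nodes xsl:strip-space would act on, a small diagnostic stylesheet can help. The following is only a sketch and not one of the column's own examples: it assumes plain text output is acceptable, and it uses normalize-space() to detect child text nodes made up entirely of whitespace, printing each element's name followed by a count of such nodes.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text"/>

  <!-- For each element, report how many of its child text nodes
       consist entirely of whitespace, then visit its child elements. -->
  <xsl:template match="*">
    <xsl:value-of select="name()"/>
    <xsl:text>: </xsl:text>
    <!-- normalize-space() returns an empty string for a
         whitespace-only text node. -->
    <xsl:value-of select="count(text()[normalize-space(.) = ''])"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="*"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample document, this should report several whitespace-only nodes for the colors element and one for the last color element. Adding an xsl:strip-space element to the diagnostic stylesheet should bring the count for any element it names down to zero, because the stripping is applied to the source tree before any template sees it.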

Now let's tell the XSLT processor to strip the whitespace nodes from the parent colors element instead of the color elements.

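<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Strip whitespace-only text nodes that are children of the colors element. -->
  <xsl:strip-space elements="colors"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>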

This has a more drastic effect, because the colors element had many more whitespace-only text nodes -- all those carriage returns between the color elements. The only carriage returns in the whole document that made it to the result document are the ones that were either inside a color element (before and after "blue") or inside of the comment.

<colors><color>red</color><color>   yellow   </color><color>
blue
</color><!--
Next color element has whitespace content.
--><color>   </color></colors>

You can list more than one element type name in the xsl:strip-space instruction's elements attribute, as long as their names are separated by whitespace. You can also use an asterisk as this attribute's value to tell the XSLT processor to strip whitespace text nodes from all the elements in the source tree.
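
As a minimal sketch of the multiple-name form, the stylesheet below combines the two earlier examples by naming both element types from the sample document in a single elements attribute. It is an illustration rather than one of the column's original listings.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Two element type names separated by a space: whitespace-only
       text nodes are stripped from both colors and color elements. -->
  <xsl:strip-space elements="colors color"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With the sample document, this should remove the whitespace between the children of colors and also empty out the last color element, while leaving the whitespace that shares a text node with "yellow" or "blue" untouched.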

    

The xsl:preserve-space instruction does the opposite of the xsl:strip-space instruction: for all elements listed in its elements attribute, the XSLT processor will leave whitespace text nodes alone. By default, the XSLT processor treats all elements as xsl:preserve-space elements, so you only need it to override an xsl:strip-space instruction. For example, if your source document has twenty different element types and you want to strip whitespace nodes in all of them except the codeListing and sampleOutput elements, you don't have to list the other eighteen in an xsl:strip-space element's elements attribute. Instead, use an asterisk for the xsl:strip-space element's elements attribute value and list the two exceptions as the xsl:preserve-space element's elements attribute value.

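<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Strip whitespace-only text nodes everywhere, except inside
       codeListing and sampleOutput elements. -->
  <xsl:strip-space elements="*"/>
  <xsl:preserve-space elements="codeListing sampleOutput"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>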
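
To see why an element might need this protection, consider the following hypothetical source document. The chapter, para, keyword, and variable element names are invented for illustration; only codeListing comes from the stylesheet above.

```xml
<chapter>
  <para>The listing below marks up two tokens.</para>
  <!-- The single space between the keyword and variable elements is a
       whitespace-only text node whose parent is codeListing. -->
  <codeListing><keyword>return</keyword> <variable>total</variable></codeListing>
</chapter>
```

With xsl:strip-space set to an asterisk and no xsl:preserve-space element, that lone space would be stripped and the two tokens would run together in the result. Listing codeListing in xsl:preserve-space keeps the space, while the line breaks and indentation between the children of chapter are still removed.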
