Controlling Whitespace, Part Three

by Bob DuCharme
January 02, 2002

In the first and second parts of this three-part series, we looked at techniques for stripping, processing, and adding whitespace when creating a result document from a source document. This month we'll see how to add tab characters to a result document, and how to automate the indenting of a result document according to the nesting of its elements.

Adding Tabs to your Output

A stylesheet can add tabs to output using the character reference "&#9;". For example, let's say we want to convert this source document into a text file that uses tabs to line up the columns of information.

<employees>

  <employee hireDate="04/23/1999">
    <last>Hill</last>
    <first>Phil</first>
    <salary>100000</salary>
  </employee>

  <employee hireDate="09/01/1998">
    <last>Herbert</last>
    <first>Johnny</first>
    <salary>95000</salary>
  </employee>

  <employee hireDate="08/20/2000">
    <last>Hill</last>
    <first>Graham</first>
    <salary>89000</salary>
  </employee>

</employees>

Ample use of this character reference in this stylesheet


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<xsl:strip-space elements="*"/>

<xsl:template match="employees">
Last&#9;First&#9;Salary&#9;HireDate
----&#9;-----&#9;------&#9;----------
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="employee">
<xsl:apply-templates select="last"/>
<xsl:text>&#9;</xsl:text>
<xsl:apply-templates select="first"/>
<xsl:text>&#9;</xsl:text>
<xsl:apply-templates select="salary"/>
<xsl:text>&#9;</xsl:text>
<xsl:apply-templates select="@hireDate"/><xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>



produces this result from that source document:


Last    First   Salary  HireDate
----    -----   ------  ----------
Hill    Phil    100000  04/23/1999
Herbert Johnny  95000   09/01/1998
Hill    Graham  89000   08/20/2000

When the stylesheet's first template sees an employees element, it adds a two-line header to the result tree before applying the appropriate templates to the children of the employees element: one line consisting of the field names separated by "&#9;" character references and another line with several groups of hyphens, with each group separated by the same character reference.

The only possible child of the employees element is the employee element, and its template rule individually applies templates (in this case, the default XSLT template that outputs an element's text content) to its children with the "&#9;" character reference between each one. This character reference doesn't always have to be inside of an xsl:text instruction (note that it's not in the stylesheet's first template), but if it had been added without this element in the second template, the XSLT processor would have ignored it -- remember, like carriage returns and the spacebar space, tab characters are considered whitespace, and an XSLT processor ignores whitespace characters between elements if they're the only characters there and not enclosed by an xsl:text instruction.
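
To make the difference concrete, here is a minimal sketch (not one of the column's own stylesheets) that places one tab directly between two instructions and another inside an xsl:text element. It assumes the employees source document shown earlier and uses xsl:value-of for brevity.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<xsl:template match="employee">
<!-- Stripped: this tab is the only text between two instructions,
     so it forms a whitespace-only text node in the stylesheet. -->
<xsl:value-of select="last"/>&#9;<xsl:value-of select="first"/>
<!-- Kept: xsl:text preserves whitespace-only content. -->
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="salary"/>
<xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

For the first employee, the last and first names should run together as "HillPhil", while a tab still separates them from the salary.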

Tip Although stylesheets are easier to read when elements are indented to show their levels of nesting, extraneous whitespace in a stylesheet can cause alignment problems in the output when you're trying to control that output's whitespace precisely. Thus, this section's examples are not always indented.

Defining a general entity for this "<xsl:text>&#9;</xsl:text>" string can make the stylesheet easier to read, especially if you call the entity "tab":

<!DOCTYPE stylesheet [
<!ENTITY tab "<xsl:text>&#9;</xsl:text>">
<!ENTITY cr "<xsl:text>
</xsl:text>">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<xsl:template match="employees">
Last&tab;First&tab;Salary&tab;HireDate
----&tab;-----&tab;------&tab;----------
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="employee">
<xsl:apply-templates select="last"/>&tab;
<xsl:apply-templates select="first"/>&tab;
<xsl:apply-templates select="salary"/>&tab;
<xsl:apply-templates select="@hireDate"/>&cr;
</xsl:template>

</xsl:stylesheet>

This stylesheet has the same effect as the previous one, but it's easier to read. As long as I was defining a "tab" entity, I defined a "cr" one as well for "carriage return," which also makes the stylesheet easier to read.
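
If you'd rather not add a DOCTYPE declaration to a stylesheet, one alternative (a sketch of a different technique, not the approach used in this column) is to store the characters in top-level variables. The variable names "tab" and "cr" below are chosen only to mirror the entity names above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<!-- Character references inside attribute values survive
     attribute-value normalization, so these hold a real tab and a
     real line feed. -->
<xsl:variable name="tab" select="'&#9;'"/>
<xsl:variable name="cr" select="'&#10;'"/>

<xsl:template match="employee">
  <!-- One row per employee: fields joined by tabs, ended by a newline. -->
  <xsl:value-of select="concat(last, $tab, first, $tab, salary, $tab,
                               @hireDate, $cr)"/>
</xsl:template>

</xsl:stylesheet>
```

Because the whitespace lives inside a select expression, this template can be indented freely without affecting the output. The header rows from the earlier stylesheets are omitted here to keep the sketch short.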

See the earlier column Entities and XSLT (or my book, XSLT Quickly) for more on defining and referencing entities in XSLT and XML.

Indenting

Setting the xsl:output element's indent attribute to a value of "yes" tells the XSLT processor that it may add additional whitespace to the result tree. The default value is "no".

Warning An indent value of "yes" means that an XSLT processor may add whitespace to the result. It doesn't have to, so if setting this doesn't have the desired effect when you use it, try it with a different XSLT processor. Or check the processor's documentation -- the Xalan C++ XSLT processor, for example, indents elements zero spaces as a default, but this figure can be reset with the -INDENT command line parameter.
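
As an illustration of processor-specific control, the sketch below sets an indent amount from within the stylesheet. It assumes the Xalan-Java processor and its indent-amount extension attribute in the http://xml.apache.org/xalan namespace; check your own processor's documentation before relying on it, because other processors may use a different mechanism or none at all.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xalan="http://xml.apache.org/xalan"
     version="1.0">

<!-- xalan:indent-amount is an extension attribute (assumed here to be
     handled by Xalan-Java). A processor that doesn't recognize the
     namespace should ignore the attribute rather than report an error. -->
<xsl:output method="xml" indent="yes" xalan:indent-amount="2"/>

<!-- Identity template: copy every node unchanged. -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```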

The following stylesheet is the identity stylesheet with the xsl:output element's indent value set to "yes". In other words, this stylesheet copies all the nodes of the source tree document to the result tree without changing any, except that the XSLT processor may add more whitespace.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

With an XSLT processor that does add whitespace, this stylesheet turns this source document


<chapter><title>My Chapter</title>
<para>This paragraph introduces the chapter's sections.</para>
<sect1><title>Section 1 of "My Chapter"</title>
<para>Here is the first section's first paragraph.</para>
<para>Here is the first section's second paragraph.</para>
</sect1>
<sect1><title>Section 2 of "My Chapter"</title>
<para>Here is the first section's first paragraph.</para>
<sect2><title>Section 2.2</title>
<para>This section has a subsection.</para>
</sect2>
</sect1>
</chapter>

into this:


<chapter>
  <title>My Chapter</title>
  <para>This paragraph introduces the chapter's sections.</para>
  <sect1>
    <title>Section 1 of "My Chapter"</title>
    <para>Here is the first section's first paragraph.</para>
    <para>Here is the first section's second paragraph.</para>
  </sect1>
  <sect1>
    <title>Section 2 of "My Chapter"</title>
    <para>Here is the first section's first paragraph.</para>
    <sect2>
      <title>Section 2.2</title>
      <para>This section has a subsection.</para>
    </sect2>
  </sect1>
</chapter>

The added indenting makes the parent-child and sibling relationships of the elements much clearer, because a child element's tags are indented further than a parent element's tags and siblings are all indented to the same level. When someone gives you an XML document with no DTD or schema, and you need to figure out its structure, a pass through this little stylesheet is a great first step. I use this stylesheet at least several times a week, even when I'm not engaged in XSLT-related work.
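
When only the structure matters and not the content, a related trick is to print an indented outline of element names. The following is a minimal sketch of that idea rather than a stylesheet from this column; it relies on nothing beyond standard XSLT 1.0.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<xsl:template match="*">
  <!-- Two spaces of indentation for each ancestor element. -->
  <xsl:for-each select="ancestor::*"><xsl:text>  </xsl:text></xsl:for-each>
  <xsl:value-of select="name()"/>
  <xsl:text>&#10;</xsl:text>
  <!-- Visit child elements only; text content is skipped. -->
  <xsl:apply-templates select="*"/>
</xsl:template>

</xsl:stylesheet>
```

Run against the chapter document above, this should list chapter at the left margin, with title, para, and sect1 indented one level beneath it, and so on down the tree.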

Section 16.1 of the XSLT Recommendation warns us that it's "usually not safe" to set indent to "yes" with documents that have elements that mix character data with child elements. For example, the first color child of the colors element in the following document has the string "red:" as character data followed by three shade elements that are children of that color element. The second color element only has character data content (the string "yellow"), and the third one has a structure similar to the first one.

<colors>
<color>red:
<shade>fire engine</shade>
<shade>candy apple</shade>
<shade>brick</shade>
</color>
<color>yellow</color>
<color>blue:
<shade>navy</shade>
<shade>robin's egg</shade>
<shade>cerulean</shade>
</color>
</colors>

The same stylesheet indents the elements of this document, but not the first shade element in the first and third color elements.

<colors>
  <color>red:
<shade>fire engine</shade>
    <shade>candy apple</shade>
    <shade>brick</shade>
  </color>
  <color>yellow</color>
  <color>blue:
<shade>navy</shade>
    <shade>robin's egg</shade>
    <shade>cerulean</shade>
  </color>
</colors>
    

It doesn't indent those two shade elements because that would add character data to the document. Adding whitespace between two elements (for example, between a </color> end-tag and a <color> start-tag in the example) doesn't affect a document's contents, but adding it within an element that has character data content adds text that an XML parser considers significant -- in other words, it changes the content of the document.

To summarize, an indent value of "yes" is useful if every element in your source document has either character data and no elements as content (like the shade elements above) or elements and no character data as content (like the colors element in the example); but it can lead to unpredictability if your source document has elements that mix child elements with character data like the color elements above. The spaces that indent the other shade elements are also inside of the "red" color element, but because this whitespace isn't being added to existing character data at those positions, the text nodes that they're in are pure whitespace, so the XML processor will ignore them. (It's a tricky concept; see the earlier column Controlling White Space with XSLT, Part 1 for more on this.)
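
Before turning on indenting for an unfamiliar document, it can help to find out whether it has any mixed content at all. The stylesheet below is a hedged sketch of one way to check: it lists every element that has both a child element and a text node containing non-whitespace characters. The report format is an arbitrary choice for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
  <!-- An element is "mixed" here if it has at least one child element
       and at least one text child that isn't pure whitespace. -->
  <xsl:for-each select="//*[* and text()[normalize-space()]]">
    <xsl:value-of select="name()"/>
    <xsl:text>: </xsl:text>
    <!-- Show the first non-whitespace text child as a hint. -->
    <xsl:value-of select="normalize-space(text()[normalize-space()][1])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For the colors document, this should report the "red:" and "blue:" color elements and nothing else. An empty report suggests that an indent value of "yes" is unlikely to change the document's content.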
