
Controlling Whitespace, Part Three
by Bob DuCharme
January 02, 2002
In the first and second parts of this three-part series, we looked at techniques
for stripping, processing, and adding whitespace when creating a
result document from a source document. This month we'll see how to
add tab characters to a result document, and how to automate the
indenting of a result document according to the nesting of its
elements.
Adding Tabs to your Output
A stylesheet can add tabs to output using the character reference
"&#9;". For example, let's say we want to convert this source
document into a text file that uses tabs to line up the columns of
information.
<employees>
<employee hireDate="04/23/1999">
<last>Hill</last>
<first>Phil</first>
<salary>100000</salary>
</employee>
<employee hireDate="09/01/1998">
<last>Herbert</last>
<first>Johnny</first>
<salary>95000</salary>
</employee>
<employee hireDate="08/20/2000">
<last>Hill</last>
<first>Graham</first>
<salary>89000</salary>
</employee>
</employees>
Ample use of this character reference in this stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="employees">
Last&#9;First&#9;Salary&#9;HireDate
----&#9;-----&#9;------&#9;----------
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="employee">
<xsl:apply-templates select="last"/>
<xsl:text>&#9;</xsl:text>
<xsl:apply-templates select="first"/>
<xsl:text>&#9;</xsl:text>
<xsl:apply-templates select="salary"/>
<xsl:text>&#9;</xsl:text>
<xsl:apply-templates select="@hireDate"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
produces this result from that source document:
Last    First   Salary  HireDate
----    -----   ------  ----------
Hill    Phil    100000  04/23/1999
Herbert Johnny  95000   09/01/1998
Hill    Graham  89000   08/20/2000
When the stylesheet's first template sees an employees
element, it adds a two-line header to the result tree before applying
the appropriate templates to the children of the employees
element: one line consisting of the field names separated by
"&#9;" character references and another line with several groups
of hyphens, with each group separated by the same character
reference.
The only possible child of the employees element is the
employee element, and its template rule individually applies
templates (in this case, the default XSLT template that outputs an
element's text content) to its children with the "&#9;" character
reference between each one. This character reference doesn't always
have to be inside an xsl:text instruction (note that it's
not in the stylesheet's first template), but if it had been added
without this element in the second template, the XSLT processor would
have ignored it. Remember, like carriage returns and the spacebar
space, tab characters are considered whitespace, and an XSLT processor
ignores whitespace characters between elements if they're the only
characters there and not enclosed by an xsl:text
instruction.
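To make the stripping rule concrete, here is a minimal sketch (not one of the column's own examples) that assumes the employees source document shown earlier. It puts a bare "&#9;" between two instructions without wrapping it in xsl:text:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<!-- The &#9; below is the only character between two instructions,
     so it forms a whitespace-only text node, and the XSLT processor
     strips it from the stylesheet before the template is used. -->
<xsl:template match="employee">
<xsl:apply-templates select="last"/>&#9;<xsl:apply-templates select="first"/>
<xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

With a conforming XSLT 1.0 processor, the last and first names should come out run together on each line, with no tab between them. Wrapping the reference as <xsl:text>&#9;</xsl:text> restores the tab.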
Tip: Although stylesheets are easier to read when elements are
indented to show their levels of nesting, extraneous whitespace in
your stylesheet can cause alignment problems in your output when
you're trying to control that whitespace. Thus, this section's
examples are not always indented.
Defining a general entity for this
"<xsl:text>&#9;</xsl:text>" string can make the
stylesheet easier to read, especially if you call the entity
"tab":
<!DOCTYPE stylesheet [
<!ENTITY tab "<xsl:text>&#9;</xsl:text>">
<!ENTITY cr "<xsl:text>
</xsl:text>">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:template match="employees">
Last&tab;First&tab;Salary&tab;HireDate
----&tab;-----&tab;------&tab;----------
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="employee">
<xsl:apply-templates select="last"/>&tab;
<xsl:apply-templates select="first"/>&tab;
<xsl:apply-templates select="salary"/>&tab;
<xsl:apply-templates select="@hireDate"/>&cr;
</xsl:template>
</xsl:stylesheet>
This stylesheet has the same effect as the previous one, but it's
easier to read. As long as I was defining a "tab" entity, I defined a
"cr" one as well for "carriage return," which also makes the
stylesheet easier to read.
See the earlier column Entities and
XSLT (or my book, XSLT
Quickly) for more on defining and referencing entities in XSLT and
XML.
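If you'd rather not add a DOCTYPE declaration to a stylesheet, top-level variables can play a similar role. The following is only a sketch of that alternative, not the entity technique described above. The variable names "tab" and "cr" are my own choices for illustration, and the header lines are left out to keep it short:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<!-- Hypothetical variable names. The xsl:text wrappers keep the
     whitespace characters from being stripped from the stylesheet. -->
<xsl:variable name="tab"><xsl:text>&#9;</xsl:text></xsl:variable>
<xsl:variable name="cr"><xsl:text>&#10;</xsl:text></xsl:variable>

<!-- concat() converts each argument to a string, so one value-of
     instruction writes a complete tab-delimited line. -->
<xsl:template match="employee">
<xsl:value-of select="concat(last, $tab, first, $tab, salary, $tab, @hireDate, $cr)"/>
</xsl:template>

</xsl:stylesheet>
```

The trade-off is that a variable can only be referenced inside an XPath expression, while an entity reference can go anywhere in the stylesheet's markup.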
Indenting
Setting the xsl:output element's indent attribute
to a value of "yes" tells the XSLT processor that it may add
additional whitespace to the result tree. The default value is
"no".
Warning: An indent value of "yes" means that an XSLT processor may
add whitespace to the result. It doesn't have to, so if this setting
doesn't have the desired effect, try a different XSLT processor. Or
check the processor's documentation: the Xalan C++ XSLT processor,
for example, indents elements zero spaces by default, but this figure
can be reset with the -INDENT command line parameter.
The following stylesheet is the identity stylesheet with the
xsl:output element's indent value set to "yes". In
other words, this stylesheet copies all the nodes of the source tree
to the result tree without changing any, except that the XSLT
processor may add more whitespace.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
With an XSLT processor that does add whitespace, this stylesheet
turns this source document
<chapter><title>My Chapter</title>
<para>This paragraph introduces the chapter's sections.</para>
<sect1><title>Section 1 of "My Chapter"</title>
<para>Here is the first section's first paragraph.</para>
<para>Here is the first section's second paragraph.</para>
</sect1>
<sect1><title>Section 2 of "My Chapter"</title>
<para>Here is the first section's first paragraph.</para>
<sect2><title>Section 2.2</title>
<para>This section has a subsection.</para>
</sect2>
</sect1>
</chapter>
into this:
<chapter>
  <title>My Chapter</title>
  <para>This paragraph introduces the chapter's sections.</para>
  <sect1>
    <title>Section 1 of "My Chapter"</title>
    <para>Here is the first section's first paragraph.</para>
    <para>Here is the first section's second paragraph.</para>
  </sect1>
  <sect1>
    <title>Section 2 of "My Chapter"</title>
    <para>Here is the first section's first paragraph.</para>
    <sect2>
      <title>Section 2.2</title>
      <para>This section has a subsection.</para>
    </sect2>
  </sect1>
</chapter>
The added indenting makes the parent-child and sibling
relationships of the elements much clearer, because a child element's
tags are indented further than a parent element's tags and siblings
are all indented to the same level. When someone gives you an XML
document with no DTD or schema, and you need to figure out its
structure, a pass through this little stylesheet is a great first
step. I use this stylesheet at least several times a week, even when
I'm not engaged in XSLT-related work.
Section 16.1 of the XSLT Recommendation warns us that it's
"usually not safe" to set indent to "yes" with documents that
have elements that mix character data with child elements. For
example, the first color child of the colors element
in the following document has the string "red:" as character data
followed by three shade elements that are children of that
color element. The second color element only has
character data content (the string "yellow"), and the third one has a
structure similar to the first one.
<colors>
<color>red:
<shade>fire engine</shade>
<shade>candy apple</shade>
<shade>brick</shade>
</color>
<color>yellow</color>
<color>blue:
<shade>navy</shade>
<shade>robin's egg</shade>
<shade>cerulean</shade>
</color>
</colors>
The same stylesheet indents the elements of this document, but not
the first shade element in the first and third color
elements.
<colors>
  <color>red:
<shade>fire engine</shade>
    <shade>candy apple</shade>
    <shade>brick</shade>
  </color>
  <color>yellow</color>
  <color>blue:
<shade>navy</shade>
    <shade>robin's egg</shade>
    <shade>cerulean</shade>
  </color>
</colors>
It doesn't indent those two shade elements because that
would add character data to the document. Adding whitespace
between two elements (for example, between a
</color> end-tag and a <color> start-tag
in the example) doesn't affect a document's contents, but adding it
within an element that has character data content adds text
that an XML parser considers significant -- in other words, it changes
the content of the document.
To summarize, an indent value of "yes" is useful if every
element in your source document has either character data and no
elements as content (like the shade elements above) or
elements and no character data as content (like the colors
element in the example); but it can lead to unpredictability if your
source document has elements that mix child elements with character
data like the color elements above. The spaces that indent
the other shade elements are also inside of the "red"
color element, but because this whitespace isn't being added
to existing character data at those positions, the text nodes that
they're in are pure whitespace, so the XML processor will ignore
them. (It's a tricky concept; see the earlier column Controlling White Space with XSLT, Part 1 for more on
this.)
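One way to find out in advance whether a document mixes character data with child elements is to ask a stylesheet to list the elements that do. The following is a rough sketch along those lines, not something from the XSLT Recommendation: it reports every element that has at least one child element and at least one text node containing non-whitespace characters.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">
<xsl:output method="text"/>

<xsl:template match="/">
<!-- [*] keeps elements that have at least one child element;
     [text()[normalize-space()]] keeps those that also have a text
     node that is more than whitespace. -->
<xsl:for-each select="//*[*][text()[normalize-space()]]">
<xsl:value-of select="name()"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Run against the colors document above, this should name the color element twice, once for the "red:" element and once for the "blue:" one. Empty output suggests that an indent value of "yes" is less likely to give surprising results for that document.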