
Controlling the DOCTYPE and XML Declaration
by Bob DuCharme
September 04, 2002
XSLT processors usually create result documents that are well-formed
XML with a simple XML declaration at the top. They don't have to add that
XML declaration, though; it's easy to suppress it. It's also easy to add
one and control exactly what it shows, such as an encoding declaration or
a declaration of the version of XML being used. Your result document can
also include a document type declaration that specifies the DTD to which
it conforms, which is necessary for your result document to be a valid XML
document. This month we'll see how to add these.
XML Declarations
The XML declaration at the beginning of an XML document is not
necessary, but it's the best way to say "this is definitely an XML
document and here's the release of XML it conforms to." The following is
typical:
<?xml version="1.0"?>
Note: Despite its beginning and ending question marks, an XML
declaration is not a processing instruction; it's a separate kind of
markup. In fact, the XML specification explicitly prohibits the
processing instruction target (the name right after a processing
instruction's opening question mark) from being "xml" in any combination
of upper- and lowercase letters, in order to prevent a processing
instruction from being confused with an XML declaration.
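One practical consequence, sketched here under stated assumptions: XSLT 1.0 treats it as an error for xsl:processing-instruction to use the name "xml", so that element can't be used to write an XML declaration. A target that merely begins with "xml", such as xml-stylesheet, is fine. The style.css filename below is a hypothetical placeholder, not something from this article.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <!-- Legal: the target is "xml-stylesheet", not the reserved name "xml".
         The href value style.css is a made-up example filename. -->
    <xsl:processing-instruction name="xml-stylesheet">type="text/css" href="style.css"</xsl:processing-instruction>
    <!-- Not allowed: a name of "xml" is an error, so the XML declaration
         has to be controlled through xsl:output instead:
         <xsl:processing-instruction name="xml">version="1.0"</xsl:processing-instruction>
    -->
    <!-- Copy the source document's content after the processing instruction -->
    <xsl:copy-of select="node()"/>
  </xsl:template>
</xsl:stylesheet>
```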
An XSLT processor's default behavior is to add an XML declaration to
the beginning of an XML document that it creates in the result tree. If
your stylesheet includes an xsl:output element with a
method value of "text" or "html" the XSLT processor doesn't
consider the result tree's document to be XML, so it won't add an XML
declaration. If method is "xml" or the stylesheet has no
xsl:output element (in which case the default value of "xml" is
assumed), the result is considered an XML document. To show the simplest
case, we'll apply the simplest possible stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"/>
to this little document:
<test>Dagon his Name, Sea Monster</test>
The result, thanks to XSLT's built-in template rules, shows the
element's character data with the XML declaration preceding it:
<?xml version="1.0" encoding="utf-8" ?>Dagon his Name, Sea Monster
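For contrast, here is a minimal sketch of the other case described above: the same empty stylesheet with an xsl:output element whose method is "text". Applied to the same little document, the result should hold only the character data, with no XML declaration in front of it.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- method="text": the result is not treated as XML,
       so the processor writes no XML declaration -->
  <xsl:output method="text"/>
</xsl:stylesheet>
```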
Although an XML declaration is optional, when it is included it must
have the version information. (As I write this, XML 1.1 is in Last Call
status, so we'll have to start worrying about whether XML processors are
aware of 1.1's new features soon.) In the example above, after the version
information, the XML declaration includes an encoding declaration to tell
us how the characters in the document are encoded. While the XML
specification considers an encoding declaration to be optional if the
document is encoded as UTF-8 or UTF-16, the XSLT specification says that
an XSLT processor should add one to the result document, using either
UTF-8 or UTF-16 if no other encoding value is specified.
You can specify one yourself or change the version value by
adding encoding and version attributes to an
xsl:output element in your stylesheet. The encoding
attribute actually does more than add an encoding declaration to the
result document; it tells the XSLT processor to write out the result using
that encoding. If you specify an encoding that it can't handle, the
processor may signal an error or fall back to UTF-8 or UTF-16.
The following stylesheet adds an encoding declaration and version
information to the result document.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" version="1.1" encoding="utf-16"/>
</xsl:stylesheet>
This produces the following using the same input as the previous
example (although it may not look right in text editors that can't handle
UTF-16):
<?xml version="1.1" encoding="utf-16" ?>Dagon his Name, Sea Monster
That's just a toy example. The following slightly longer program is
actually useful. It copies an XML document without changing anything,
except that it writes out the result as a UTF-16 document:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output encoding="utf-16"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
By changing the value of its encoding attribute, you can
create a general-purpose stylesheet to copy an XML document with the copy
being in any encoding that you want, as long as your XSLT processor
supports that encoding.
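As a sketch of that idea, the variant below asks for ISO-8859-1 instead of UTF-16. The choice of ISO-8859-1 is only an illustration, and it assumes your processor supports that encoding.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Assumption: the processor supports ISO-8859-1;
       substitute any encoding name that yours handles -->
  <xsl:output method="xml" encoding="iso-8859-1"/>
  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the xml output method, a character in text or an attribute value that the chosen encoding can't represent should be written as a numeric character reference, so the copy still carries the same content.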
What if you don't want an XML declaration in the result of your
transformation? For example, I rarely show them in the result of my
examples because I want the examples to be as concise as possible. I
suppress them by adding an omit-xml-declaration attribute to most
of the sample stylesheets' xsl:output elements, like this:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
</xsl:stylesheet>
The output of this stylesheet applied to the earlier XML document is
identical to the output created with the earlier stylesheet, minus the XML
declaration:
Dagon his Name, Sea Monster
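The same attribute can be combined with the copying stylesheet shown earlier. The following is a minimal sketch, not a recommendation for every case: it copies a document unchanged and leaves off the XML declaration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Write the result as UTF-8 and suppress the XML declaration -->
  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes"/>
  <!-- Identity template: copy every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Suppressing the declaration is safest when the result is UTF-8 or UTF-16. For any other encoding, a parser reading the result has no encoding declaration to go by and may misread the characters.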