EXSLT for MSXML
by Dimitre Novatchev
August 06, 2003
Introduction
The EXSLT project specifies a set of
standard extension functions for XSLT that, when implemented by all
vendors of XSLT processors, will allow writing portable XSLT
applications. Most major XSLT processors already provide some
form of support for EXSLT, with the notable exception of MSXML.
This article describes a third-party implementation of EXSLT for MSXML4 by
Dimitre Novatchev. The author had no access to internal product interfaces
and had to overcome some serious difficulties, which until now have
prevented the development of any such third-party implementation of EXSLT
for MSXML4.
How to extend an XSLT processor
There are different possible ways to implement extensions to an XSLT processor:
- Modify the implementation of the XSLT processor, recompile, rebuild, test and redeploy it.
- Implement one or more extension elements.
- Provide a library of inline extension functions.
- Provide external extension objects, whose methods will be referenced as extension functions.
The first and second options require either that the source code of
the XSLT processor (and the right to modify and extend it) be available, or
that documentation be provided explaining how to implement extension
elements. Neither is true in the case of MSXML.
The third option would require MSXML programmers to
include inline scripting code, written in a language such as
JavaScript, in their stylesheets. This is inconvenient and differs from the way
extension functions are implemented and used in other XSLT processors.
The decision to choose the last option reflects its advantages:
- A namespace prefix is associated with an extension object.
- The code of this object is not inlined in the transformation and in fact the XSLT programmer may not know anything about it.
- This is the way to reference and use extension functions in most XSLT processors.
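As a rough illustration of the last option, the stylesheet below declares a namespace prefix and calls an extension function through it. It is only a sketch under stated assumptions: the URI is the EXSLT Sets namespace, the item elements are hypothetical, and the host application is presumed to have associated an extension object with that URI before running the transformation. The standard XSLT 1.0 function function-available() guards the call so that the stylesheet fails cleanly on a processor without the extension.

```xml
<!-- Sketch only: assumes an extension object is bound to the Sets namespace URI -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets"
    exclude-result-prefixes="set">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:choose>
      <!-- The extension function is reached only through its prefix -->
      <xsl:when test="function-available('set:distinct')">
        <!-- "item" is a hypothetical element name in the source document -->
        <xsl:for-each select="set:distinct(//item)">
          <xsl:value-of select="."/>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">set:distinct() is not available</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Nothing in the stylesheet reveals how or in what language the extension object is written, which is the point of the second advantage listed above.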
What EXSLT modules to implement
There are eight different modules of functions specified by EXSLT:
- Common. Covers common, basic extension elements and functions.
- Dates and Times. Covers date and time-related extension elements and functions.
- Dynamic. Covers extension elements and functions that deal with the dynamic evaluation of strings containing XPath expressions.
- Functions. Extension elements and functions that allow users to define their own functions for use in expressions and patterns in XSLT.
- Math. Covers extension elements and functions that provide facilities to do with maths.
- Regular Expressions. Covers extension elements and functions that provide facilities to do with regular expressions.
- Sets. Covers those extension elements and functions that provide facilities to do with set manipulation.
- Strings. Covers extension elements and functions that provide facilities to do with string manipulation.
Dynamic and Functions require access to the internals
(code and data structures) of the XSLT processor, something not achievable
in the case of MSXML. Math is too trivial and already has an
implementation in both XSLT 1.0 and XSLT 2.0 (see The
FXSL Functional Programming Library for XSLT1 and The
FXSL Functional Programming Library for XSLT2). Functions,
Dates and Times, Regular Expressions and part of
Strings will be covered by the standard XSLT 2.0 (see XQuery 1.0 and XPath 2.0
Functions and Operators). There are also pure XSLT 1.0 libraries
covering dates and times (a date_time XSLT 1.0 template library is available as part of
XSelerator).
From the standpoint of immediate usability in XSLT 1.0, the most useful
EXSLT function is common:node-set(). Other necessary and useful
functions, which cannot be implemented in XSLT 1.0, are those from the
Sets module.
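To show why common:node-set() matters so much in XSLT 1.0, here is a minimal sketch. It assumes a processor that supports the function under the EXSLT Common namespace; the color data is hypothetical. In XSLT 1.0 a variable with content holds a result tree fragment, which cannot be navigated with a path expression until it is converted to a node set.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:common="http://exslt.org/common"
    exclude-result-prefixes="common">

  <xsl:output method="text"/>

  <!-- A result tree fragment built inside the stylesheet (hypothetical data) -->
  <xsl:variable name="vColors">
    <color>red</color>
    <color>green</color>
    <color>blue</color>
  </xsl:variable>

  <xsl:template match="/">
    <!-- $vColors/color alone is an error in XSLT 1.0;
         common:node-set() turns the fragment into a navigable node set -->
    <xsl:for-each select="common:node-set($vColors)/color">
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to any source document, this should output the three color names separated by commas.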
So I decided to implement the following EXSLT functions:
- common:node-set()
And all functions from the Sets module:
- set:intersection()
- set:difference()
- set:distinct()
- set:leading()
- set:trailing()
- set:has-same-node()
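For orientation, the following sketch shows how some of these Sets functions might be called from a stylesheet once an implementation is available. The list, item and flag names are hypothetical, as are the result element names. The comments paraphrase the EXSLT definitions of each function.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:set="http://exslt.org/sets"
    exclude-result-prefixes="set">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Hypothetical input: a list of item elements, some carrying a flag attribute -->
    <xsl:variable name="vAll" select="/list/item"/>
    <xsl:variable name="vFlagged" select="/list/item[@flag]"/>
    <report>
      <!-- Nodes of the first set that are not in the second -->
      <unflagged>
        <xsl:copy-of select="set:difference($vAll, $vFlagged)"/>
      </unflagged>
      <!-- Nodes of the first set that precede the first node of the second -->
      <before-first-flagged>
        <xsl:copy-of select="set:leading($vAll, $vFlagged)"/>
      </before-first-flagged>
      <!-- Nodes of the first set that follow the first node of the second -->
      <after-first-flagged>
        <xsl:copy-of select="set:trailing($vAll, $vFlagged)"/>
      </after-first-flagged>
      <!-- True when the two sets share at least one node -->
      <any-flagged>
        <xsl:value-of select="set:has-same-node($vAll, $vFlagged)"/>
      </any-flagged>
    </report>
  </xsl:template>
</xsl:stylesheet>
```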
The Big Problem
Have you ever wondered why for more than two years there has been no
attempt at a third-party implementation of EXSLT for MSXML? Try to produce
one and you'll discover that there is a big obstacle, which no one until now
has been able to remove.
In the object model of MSXML there is one method for obtaining a
node set as the result of evaluating an XPath expression. This method is
selectNodes(), a member of the IXMLDOMNode interface,
defined as follows:
HRESULT selectNodes( BSTR expression, IXMLDOMNodeList ** resultList);
As can be seen, in the MSXML object model a node set is represented by
an IXMLDOMNodeList object. A node is represented by an
IXMLDOMNode object.
selectNodes() can be issued only against a "current node" -- some IXMLDOMNode object.
The problem is that there is no documented way to create an
IXMLDOMNodeList, except as returned by
selectNodes(). This means that one cannot perform even such
simple tasks as getting the union of two IXMLDOMNode
nodes.
How then can we get a subset of a node set using only the MSXML object
model? Impossible. But this is exactly what is needed in order to
implement any of the six functions in the EXSLT Sets module.
I must confess that it was exactly the challenge of the impossible task
that attracted me. Someone, who didn't know better, told me that
implementing the Sets module was impossible without asking Microsoft to
provide a more powerful interface, containing methods that can create an
IXMLDOMNodeList from any collection of
IXMLDOMNode objects. It was also implied that an XSLT
specialist was inferior in attacking, solving and even understanding this
problem.
What happened next is described below.
The Solution: Steal an IXMLDOMNodeList
As explained above, using the MSXML4 object model it is impossible to
create an IXMLDOMNodeList other than one returned by the
selectNodes() method, and this limitation makes it
impossible to implement any of the EXSLT Sets functions directly.
The solution is to try to create an IXMLDOMNodeList
outside of the MSXML object model. In XSLT there are no such
limitations. It is straightforward to obtain the result of evaluating any
XPath expression. So why not perform an XSLT transformation, which will
evaluate any XPath expression we need and return its result to us?
A nice idea, but there is a major flaw in it: an XSLT transformation
always produces copies of the original nodes, not the nodes
themselves. This is probably the point at which everyone else gave up in
despair.
A transformation can evaluate any XPath expression internally and have
access to the resulting node set, but it cannot "pass it back", it can
only produce copies of the original nodes.
Can a transformation pass the result node set to any piece of code at
all? Yes, it can pass it to another template it calls or instantiates or
to an extension function.
This seems absolutely unusable in our case. We called the
transformation, so how can it call us? Even if this were possible, we
must still return, and would then lose the valuable node set that the
transformation passed to us.
The answer is simple: we store it in a property of our extension
object. When the transformation returns to our code that started it, the
node set will still be the value of this property.
This manipulation is reflected in the following picture:

Figure 1: How to create a desired, new IXMLDOMNodeList
Initial algorithms
Having solved the big problem, it's time for the complete
implementation. The XSLT solutions in the "Aux. Transform" box can be
really simple and compact. Thus for set:intersection() we can
have:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:caller="urn:my-object">

  <!-- Both parameters default to the empty node set -->
  <xsl:param name="ns1" select="/.."/>
  <xsl:param name="ns2" select="/.."/>
  <xsl:variable name="vCnt" select="count($ns2)"/>

  <xsl:template match="intersect">
    <!-- A node of $ns1 belongs to $ns2 when adding it to $ns2 leaves the count unchanged -->
    <xsl:value-of
        select="caller:storeXPathResult($ns1[count(. | $ns2) = $vCnt])"/>
  </xsl:template>
</xsl:stylesheet>
For set:distinct() we can have the following XSLT implementation:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:caller="urn:my-object">

  <xsl:param name="ns1" select="/.."/>

  <xsl:template match="makeDistinct" name="makeDistinct">
    <!-- Nodes accepted so far, and nodes still to be examined -->
    <xsl:param name="pDistinct" select="/.."/>
    <xsl:param name="pNodes" select="$ns1"/>
    <xsl:choose>
      <xsl:when test="$pNodes">
        <xsl:variable name="pnewDistinct"
            select="$pDistinct | $pNodes[1]"/>
        <!-- Recurse on the remaining nodes whose value has not been seen yet -->
        <xsl:call-template name="makeDistinct">
          <xsl:with-param name="pDistinct" select="$pnewDistinct"/>
          <xsl:with-param name="pNodes"
              select="$pNodes[position() > 1][not(. = $pnewDistinct)]"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="caller:storeXPathResult($pDistinct)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Finally, for set:leading() we could have:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:caller="urn:my-object">

  <xsl:param name="ns1" select="/.."/>
  <xsl:param name="ns2" select="/.."/>

  <xsl:template match="leading" name="leading">
    <xsl:param name="pNodes" select="$ns1"/>
    <xsl:param name="pANode" select="$ns2[1]"/>
    <xsl:param name="pLeading" select="/.."/>
    <xsl:choose>
      <!-- Empty first set, or empty second set: return the first set as is -->
      <xsl:when test="not($pNodes) or not($pANode)">
        <xsl:value-of select="caller:storeXPathResult($pNodes)"/>
      </xsl:when>
      <!-- Stop when the delimiting node is reached, or when it is not in $pNodes at all -->
      <xsl:when test="count($pANode | $pNodes[1]) = 1
                      or count($pANode | $pNodes) != count($pNodes)">
        <xsl:value-of select="caller:storeXPathResult($pLeading)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Accept the first node and continue with the rest -->
        <xsl:variable name="pnewLeading"
            select="$pLeading | $pNodes[1]"/>
        <xsl:call-template name="leading">
          <xsl:with-param name="pLeading"
              select="$pnewLeading"/>
          <xsl:with-param name="pNodes"
              select="$pNodes[position() > 1]"/>
          <xsl:with-param name="pANode" select="$pANode"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
The XSLT implementation of the other functions of the Sets
module is similar -- set:difference() is coded in a similar
way to set:intersection(), set:trailing() is
similar to set:leading(), and
set:has-same-node() returns true if
set:intersection() is non-empty.
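As an illustration of how close set:difference() is to set:intersection(), a possible auxiliary transformation is sketched below. This is not necessarily the code used in the actual implementation: the match name "difference" is a hypothetical trigger element, and caller:storeXPathResult() is assumed to behave as in the listings above. The only change from the intersection stylesheet is that the comparison is inverted.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:caller="urn:my-object">

  <xsl:param name="ns1" select="/.."/>
  <xsl:param name="ns2" select="/.."/>
  <xsl:variable name="vCnt" select="count($ns2)"/>

  <!-- "difference" is a hypothetical element name used to trigger this template -->
  <xsl:template match="difference">
    <!-- A node of $ns1 lies outside $ns2 exactly when adding it to $ns2 grows the union -->
    <xsl:value-of
        select="caller:storeXPathResult($ns1[count(. | $ns2) != $vCnt])"/>
  </xsl:template>
</xsl:stylesheet>
```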