XML Canonicalization, Part 2
by Bilal Siddiqui
Excluding the Ancestor Context
We have seen that the ancestor context is included while canonicalizing
XML document subsets. However, doing this may introduce problems under
certain circumstances. To illustrate the scenarios in which including the
ancestor context creates problems, we first need to discuss enveloping, a
concept of paramount importance in web services interoperability.
Enveloping
SOAP is fast becoming the de facto standard for XML messaging over the
Internet. SOAP defines a format for wrapping XML data inside envelopes.
Look at Listing 7,
which wraps the PackageBooking element inside the
SOAP:Body element. Listing 7 demonstrates a
simple enveloping mechanism, in which the message payload (i.e. the
message that needs to be sent across the Internet) is wrapped inside a
SOAP:Body element and the entire SOAP:Body is
wrapped by the SOAP:Envelope element.
The advantage of this simple enveloping lies in its ability to enable
vertical stacking of XML-based protocols. Vertical stacking means that
protocols and message formats can be defined for specific low-level tasks
(such as signing, encrypting, and routing), and higher protocol layers
use the services provided by lower layers. For example, WS-Security,
a high-level XML security protocol being developed by an OASIS Technical
Committee, uses the SOAP format to utilize the signing and encrypting
mechanisms provided by W3C's XML Digital Signature and XML Encryption
specifications respectively.
Listing 7 also
contains a SOAP:Header element in addition to the
SOAP:Body element. The SOAP:Header element is
optional and is meant to contain protocol-specific information. This
effectively means that the message payloads are contained inside
SOAP:Body elements and protocol headers are contained in the
SOAP:Header element. For instance, WS-Security uses the
SOAP:Header element to carry signature-related information.
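The enveloping described above can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration, not the article's code. The SOAP 1.1 envelope namespace URI is the standard one. The `envelope()` helper and the PackageBooking/booking payload are hypothetical stand-ins for Listing 7, whose full content is not reproduced here.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace; "SOAP" is the prefix used in the article's listings.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP", SOAP_NS)


def envelope(payload, header_entries=()):
    """Hypothetical helper: wrap a payload element in SOAP:Envelope/SOAP:Body.

    Protocol-specific entries (e.g. signature information) go in an optional
    SOAP:Header, which must come before SOAP:Body.
    """
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    if header_entries:
        header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
        header.extend(header_entries)
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return env


# Illustrative payload loosely modeled on Listing 7 (names are assumptions).
package = ET.Element("PackageBooking")
ET.SubElement(package, "booking", unitCharge="50")

print(ET.tostring(envelope(package), encoding="unicode"))
```

In this sketch, the payload knows nothing about the envelope. A higher-level protocol only adds its own entries to the header, which is the vertical stacking described earlier.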
Envelope Handling
The application that receives a SOAP message is likely to tear open the
envelope (wrapper) and extract the XML payload (the XML message) so that
it can process the message. This tearing open of a SOAP
envelope and extraction of the XML payload is referred to as
de-enveloping. Further, the receiving application might need to
re-envelope the received XML message in a new envelope.
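De-enveloping and re-enveloping can be sketched the same way with ElementTree. In this minimal example, the helper names `de_envelope()` and `re_envelope()` and the TourRequest payload are hypothetical. A real service would also validate the message and handle any SOAP:Header entries, which this sketch simply drops.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP", SOAP_NS)


def de_envelope(soap_xml):
    """Tear open a SOAP envelope and return the payload elements inside SOAP:Body."""
    root = ET.fromstring(soap_xml)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        raise ValueError("no SOAP:Body element found")
    return list(body)


def re_envelope(payloads):
    """Wrap already-extracted payload elements in a fresh SOAP envelope."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    body.extend(payloads)
    return env


# Hypothetical incoming request from the tourist's client application.
request = f"""<SOAP:Envelope xmlns:SOAP="{SOAP_NS}">
  <SOAP:Body><TourRequest><place>Paris</place></TourRequest></SOAP:Body>
</SOAP:Envelope>"""

payload = de_envelope(request)
forwarded = re_envelope(payload)
print(ET.tostring(forwarded, encoding="unicode"))
```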
The need for re-enveloping emerges in federated web services, which
rely on partner applications to do part of a job, thus integrating, for
example, an entire supply chain into interoperable and loosely coupled
systems.
As an example of federated applications, let's consider a tourism
industry B2B scenario. A tourist wants to know the details of a vacation
tour being offered by a tour operator's web service. The tourist sends an
XML message containing information about the places he would like to visit
and the dates on which he is planning to travel.
Naturally, the tourist's XML message will be authored by a
client-side XML- and SOAP-aware application, which will wrap
all information inside a SOAP envelope without requiring the tourist to
know anything about XML or SOAP.
Upon receipt of this SOAP message, the tour operator's web service will
extract the information related to time and place of travel from the SOAP
envelope. The tour operator's service will need to send pieces of the
travel information to different partner hotels and car rental
companies. Therefore, the tour operator's service will author fresh SOAP
envelopes containing the relevant pieces of information and forward them
to partner hotels and car rental companies.
In a similar fashion, upon receipt of the response from partner hotels
and car rental companies, the tour operator's service will re-envelope the
information received before sending the fresh envelope back to the
tourist.
Exclusive XML Canonicalization
With the above discussion in mind, have a look at Listing 10, which is a SOAP
response message that a fictitious partner hotel has just sent back to the
tour operator's web service.
The tour operator would have also received SOAP response messages from
other partner hotels and car rental companies. These messages need to be
combined to form a complete packaged vacation tour.
Now look again at Listing 7, which is
the packaged vacation tour that the tour operator's web service will
ultimately send back to the tourist. The first booking
element of Listing 7
(whose unitCharge attribute shows a value of "50" and which
we canonicalized in Listing
9) is the same as the booking element of Listing 10.
In order to demonstrate the role of canonicalization in federated web
service applications, let's assume that the partner hotel wants to sign
the booking element of Listing 10 when sending the
SOAP message to the tour operator, so that the tourist can later verify
that the booking is genuine.
Listing 11 shows the
canonical form of the booking element of Listing 10. This is the canonical
form that the partner hotel will use to create a message digest. On the
other hand, recall that Listing 9 was the canonical
form of the same booking element within Listing 7. Since the tourist
receives Listing 7, the tourist's client application will use Listing 9
to verify the partner hotel's message digest.
Compare Listing 11
with Listing 9 and you
will find that they are different from each other. The difference comes
from the fact that we preserved the ancestor contexts of two different
XML documents while canonicalizing the same booking element.
Therefore, message digest and signature verification will fail at the
tourist's client application end. This clearly establishes the need to
exclude the ancestor context when applying canonicalization
in federated web service applications. W3C has released the
Exclusive XML Canonicalization recommendation for this purpose.
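The digest mismatch can be reproduced with a toy model built on Python's standard xml.dom.minidom and hashlib. The sketch renders only the start tag of the apex booking element, treating its namespace axis the way inclusive canonicalization does: every namespace declaration in scope from the ancestors is copied onto it. It is a simplification, not a conforming Canonical XML implementation. It renders no children, performs no escaping, and sorts attributes by name only. The two documents are hypothetical stand-ins for Listings 10 and 7. The `tour` namespace URI is invented for illustration, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
from xml.dom import minidom


def in_scope_namespaces(element):
    """Map prefix -> URI for every namespace declaration in scope at element.

    Walks from the element up through its ancestors; the nearest declaration
    of a prefix wins. "" stands for the default namespace.
    """
    scope = {}
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            if name == "xmlns":
                scope.setdefault("", value or "")
            elif name.startswith("xmlns:"):
                scope.setdefault(name[len("xmlns:"):], value)
        node = node.parentNode
    return scope


def inclusive_start_tag(element):
    """Toy apex start tag: all in-scope declarations (an empty default is omitted)."""
    decls = "".join(
        f' xmlns="{uri}"' if prefix == "" else f' xmlns:{prefix}="{uri}"'
        for prefix, uri in sorted(in_scope_namespaces(element).items())
        if prefix or uri
    )
    attrs = "".join(
        f' {name}="{value}"'
        for name, value in sorted(element.attributes.items())
        if name != "xmlns" and not name.startswith("xmlns:")
    )
    return f"<{element.tagName}{decls}{attrs}>"


def booking_digest(xml_text):
    """Render the first booking element's start tag and hash it."""
    booking = minidom.parseString(xml_text).getElementsByTagName("booking")[0]
    tag = inclusive_start_tag(booking)
    return tag, hashlib.sha256(tag.encode("utf-8")).hexdigest()


# Hypothetical stand-ins: the same booking element in two ancestor contexts.
HOTEL_RESPONSE = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body><booking unitCharge="50"/></SOAP:Body></SOAP:Envelope>"""
PACKAGE_TOUR = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body><PackageBooking xmlns:tour="http://example.com/tour">
<booking unitCharge="50"/></PackageBooking></SOAP:Body></SOAP:Envelope>"""

for doc in (HOTEL_RESPONSE, PACKAGE_TOUR):
    print(booking_digest(doc))
```

Only the inherited namespace declarations differ between the two start tags. That alone is enough for the digests, and therefore the signature check, to disagree.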
Exclusive canonicalization is intended for canonicalizing document subsets
(fragments of XML documents) and differs from (inclusive) Canonical XML in
the following two ways:
- Attributes from the xml namespace (such as xml:lang and xml:space) are
not imported from ancestors into orphan nodes (elements whose parent is
not part of the document subset).
- A namespace declaration is rendered on an element in the exclusive
canonical form only if:
  - the declaration is used by the element's name or by any of its
    attributes, and
  - the same declaration is not already in effect from an ancestor
    element in the exclusive canonical form.
Note that the second point above also applies to empty default
namespace declarations (xmlns=""). This means that the
exclusive canonical form of an element will include the
xmlns="" declaration if the element belongs to the empty default
namespace, and the nearest ancestor in the exclusive canonical form that
declares a default namespace declares a non-empty one
(xmlns="http://someURI...").
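The first difference, the treatment of xml-namespace attributes, can be illustrated with a small toy function built on xml.dom.minidom. It returns the xml:* attributes that inclusive canonicalization would copy from ancestors onto an orphan element. Under exclusive canonicalization the result is always empty. This is a sketch, not a conforming implementation. The function name and the sample document, including its xml:lang and xml:space values, are hypothetical.

```python
from xml.dom import minidom


def imported_xml_attributes(element, exclusive=False):
    """xml:* attributes copied from ancestors onto an orphan element.

    Inclusive C14N: for each xml:* attribute the element does not set itself,
    the nearest ancestor's value is imported. Exclusive C14N imports nothing.
    """
    if exclusive:
        return {}
    own = {name for name, _ in element.attributes.items()}
    imported = {}
    node = element.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            if name.startswith("xml:") and name not in own:
                imported.setdefault(name, value)  # nearest ancestor wins
        node = node.parentNode
    return imported


DOC = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/" xml:lang="en">
<SOAP:Body xml:space="preserve"><booking unitCharge="50"/></SOAP:Body></SOAP:Envelope>"""

booking = minidom.parseString(DOC).getElementsByTagName("booking")[0]
print(imported_xml_attributes(booking))                  # inclusive rule
print(imported_xml_attributes(booking, exclusive=True))  # exclusive rule
```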
Applying these rules to the booking element of Listing 10 yields the exclusive
canonical form shown in Listing 12. You may notice
that the exclusive canonical form of the first booking element of Listing 7 (whose
unitCharge attribute has the value "50") is also exactly the
same as Listing 12.
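The namespace rules can be added to the same kind of toy model. This sketch is again a simplification, not a conforming implementation of the Exclusive XML Canonicalization recommendation. It ignores the optional InclusiveNamespaces prefix list, children, and escaping. A declaration is rendered only for prefixes that the element or its attributes visibly use, and only if an output ancestor has not already put the same binding in effect. The `rendered` parameter is a hypothetical way of passing in the bindings already in effect from output ancestors; it is empty for the apex of a subset. The documents are hypothetical stand-ins for Listings 10 and 7, and the `tour` URI is invented.

```python
import hashlib
from xml.dom import minidom


def in_scope_namespaces(element):
    """Map prefix -> URI for declarations in scope at element ("" = default)."""
    scope = {}
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for name, value in node.attributes.items():
            if name == "xmlns":
                scope.setdefault("", value or "")
            elif name.startswith("xmlns:"):
                scope.setdefault(name[len("xmlns:"):], value)
        node = node.parentNode
    return scope


def visibly_used_prefixes(element):
    """Prefixes used by the element's name or its attribute names ("" = default)."""
    used = {element.prefix or ""}
    for name, _ in element.attributes.items():
        if ":" in name and not name.startswith("xmlns:"):
            used.add(name.split(":", 1)[0])
    return used


def exclusive_ns_decls(element, rendered=None):
    """Namespace declarations this toy exclusive rule renders on element."""
    rendered = rendered or {}
    scope = in_scope_namespaces(element)
    decls = {}
    for prefix in visibly_used_prefixes(element):
        if prefix == "xml":
            continue  # the xml namespace is never declared
        uri = scope.get(prefix, "")
        # Render unless the same binding is already in effect. For the default
        # namespace this also yields xmlns="" when an output ancestor made a
        # non-empty default namespace current.
        if rendered.get(prefix, "") != uri:
            decls[prefix] = uri
    return decls


def exclusive_start_tag(element, rendered=None):
    """Toy start tag: declarations sorted by prefix, attributes sorted by name."""
    decls = "".join(
        f' xmlns="{uri}"' if prefix == "" else f' xmlns:{prefix}="{uri}"'
        for prefix, uri in sorted(exclusive_ns_decls(element, rendered).items())
    )
    attrs = "".join(
        f' {name}="{value}"'
        for name, value in sorted(element.attributes.items())
        if name != "xmlns" and not name.startswith("xmlns:")
    )
    return f"<{element.tagName}{decls}{attrs}>"


HOTEL_RESPONSE = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body><booking unitCharge="50"/></SOAP:Body></SOAP:Envelope>"""
PACKAGE_TOUR = """<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP:Body><PackageBooking xmlns:tour="http://example.com/tour">
<booking unitCharge="50"/></PackageBooking></SOAP:Body></SOAP:Envelope>"""

for doc in (HOTEL_RESPONSE, PACKAGE_TOUR):
    booking = minidom.parseString(doc).getElementsByTagName("booking")[0]
    tag = exclusive_start_tag(booking)
    print(tag, hashlib.sha256(tag.encode("utf-8")).hexdigest())
```

The unprefixed booking element uses no declared namespace, so neither envelope leaks into its start tag. Both contexts therefore produce the same digest.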
Problematic Scenarios
I should point out two important problems that may result from applying
the Canonical XML specification:
- If the XML file being canonicalized contains external parsed entity
references, the external entity references will be replaced with external
content during canonicalization, as discussed earlier under the
heading "External Entity References." However, if the external content
contains relative URIs, those URIs may become non-operational after the
replacement (since the DTD declaration will be removed during
canonicalization and there will be no way to reach the external content
after XML canonicalization).
Non-operational URIs may create problems in signature applications, as
there is no way to detect whether the original operational XML document or
its non-operational canonical form was intended to be signed. If an
application considers that this ambiguity could defeat the purpose of the
signature, such scenarios should be resolved prior to canonicalization
(e.g., by converting relative URIs to absolute URIs before
canonicalizing).
- There may be application-specific equivalence criteria that
cannot be covered by a generalized specification. For example, an XML file
carrying an invoice with all prices in French francs will not produce the
same canonical form as the same invoice with equivalent prices in
euros (although the two invoices are logically equivalent). Therefore,
such application-specific issues need to be resolved in an
application-specific manner.
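Both problems call for application-specific preprocessing before canonicalization. The sketch below shows one possible shape for each step, using only Python's standard library. The helper names, the choice of href and src as the URI-bearing attributes, and the price element with a currency attribute are all assumptions made for illustration. The conversion factor is the official fixed rate of 6.55957 French francs per euro. Rounding to cents with ROUND_HALF_UP is an application decision, so "equivalent" here means equal after that rounding.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, ROUND_HALF_UP
from urllib.parse import urljoin

# Irrevocable euro conversion rate for the French franc.
FRF_PER_EUR = Decimal("6.55957")
# Assumption: in this application only these attributes carry URIs.
URI_ATTRIBUTES = ("href", "src")


def absolutize_uris(root, base_uri):
    """Problem 1: turn relative URIs into absolute ones, resolved against the
    base URI the content came from, so they survive canonicalization."""
    for el in root.iter():
        for name in URI_ATTRIBUTES:
            value = el.get(name)
            if value is not None:
                # urljoin leaves URIs that already have a scheme unchanged.
                el.set(name, urljoin(base_uri, value))
    return root


def normalize_prices(root):
    """Problem 2: an application-specific equivalence rule. Express every
    (hypothetical) price element in euros, rounded to cents."""
    for price in root.iter("price"):
        if price.get("currency") == "FRF":
            euros = Decimal(price.text) / FRF_PER_EUR
            price.text = str(euros.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
            price.set("currency", "EUR")
    return root


francs = ET.fromstring('<invoice><price currency="FRF">655.96</price></invoice>')
euros = ET.fromstring('<invoice><price currency="EUR">100.00</price></invoice>')
normalize_prices(francs)
print(ET.tostring(francs) == ET.tostring(euros))
```

After this step, the two invoices serialize identically, so any canonicalization applied afterwards sees the same input.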