Modeling XML Vocabularies with UML: Part I
by Dave Carlson
August 22, 2001
The arrival of the W3C's XML Schema specification has evoked a
variety of responses from software developers, system integrators, XML
document analysts, authors, and designers of B2B vocabularies. Some
like the richer structure and semantics that can be expressed with
these new schemas as compared to DTDs, while others complain about
excessive complexity. Many find that the resulting schemas are
difficult to share with wider audiences of users and business
partners.
I look past many of these differences of opinion to view XML Schema
simply as implementation syntax for models of business
vocabularies. Other forms of model representation and presentation are
more effective than W3C XML Schema when specifying new vocabularies or
sharing definitions with users. In particular, I favor the Unified
Modeling Language (UML) as a widely adopted standard for system
specification and design. My goal in this article and in this series
is to share some thoughts about how these two standards are
complementary and to work through a simple example that makes the
ideas concrete.
Although this discussion is focused on the W3C XML Schema
specification, the same concepts are easily transferred to other XML
schema languages. Indeed, I have already applied the same techniques
to creating and reverse engineering DTDs and SOX schemas, as well as
RELAX, TREX, and RELAX NG. In general, I use the term "schema" when
referring to the family of XML schema languages.
The Role of Models in XML Applications
It can be difficult to understand the breadth of a large
multi-enterprise system. Most people need to divide and conquer the
problem as a set of alternate models and views. Each of these models
deliberately ignores aspects of the system that are not relevant to
its purpose. Building these kinds of models is fundamental to the way
we cope with the complexity of everyday life: we ignore unnecessary
details so that we can focus on the task at hand. Different
stakeholder groups have different needs with respect to abstraction
and focus.
In the context of B2B system integration, all business partners
must agree on the information models that define the vocabulary for
task-oriented communication. The models include both the data
structure for the XML documents that are exchanged and the process
models of the extended dialogs that are required to complete complex
business transactions.
Historically, in system analysis and design, a variety of
techniques, tools, and methodologies has existed for guiding and
supporting these alternative models of system structure and
behavior. In the absence of formal methods or tools, models are
created using PowerPoint, Visio, or paper and pencil to help
communicate a system's purpose and function. And when there are no
written models, system architects work from mental models as a way to
comprehend the whole and its parts. An XML schema is also a vocabulary
model written in the syntax of that specification language.
A high-level process for developing XML vocabularies is shown in
Figure 1 below. It includes three decision points that determine the
final vocabulary definition, regardless of which schema language is
used. Data-oriented versus text-oriented applications may have
different usage requirements. For example, a data-oriented vocabulary
can be optimized for serialization of objects or database query
results, and its constraints should be carefully aligned with the
data types and referential integrity constraints of its sources. These
data-oriented documents may never be viewed by humans, other than by
developers testing the application.
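As a rough sketch of what aligning a vocabulary with the data types of its sources can look like, the Python below maps the values of a query-result row to XML Schema built-in datatype names and serializes the row as child elements. The XSD_TYPES table, the helper functions, and the sample row are illustrative assumptions, not part of any vocabulary discussed in this article.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from Python value types to XML Schema built-in datatypes.
XSD_TYPES = {
    str: "xs:string",
    int: "xs:integer",
    float: "xs:double",
    bool: "xs:boolean",
}


def xsd_type_for(value):
    """Return the XML Schema datatype name assumed for a source value."""
    # Look up by exact type so that True/False are not treated as integers.
    return XSD_TYPES[type(value)]


def lexical(value):
    """Render a value in a form the matching XML Schema type accepts."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def row_to_element(tag, row):
    """Serialize one result row (column name -> value) as child elements."""
    element = ET.Element(tag)
    for column, value in row.items():
        ET.SubElement(element, column).text = lexical(value)
    return element


if __name__ == "__main__":
    # Hypothetical row, as it might come back from a product table query.
    row = {"sku": "A-100", "quantity": 3, "price": 19.95, "inStock": True}
    print({column: xsd_type_for(value) for column, value in row.items()})
    print(ET.tostring(row_to_element("product", row), encoding="unicode"))
```

Keeping the type table in one place makes it easier to check that the schema's declared types and the serializer's output stay in agreement with the source.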
A text-oriented vocabulary often has human users who need to edit
the XML documents, with or without the assistance of GUI editing
tools. Its structure must be easily understood by people who write
stylesheets that transform and present the documents' content. An XML
vocabulary design that works perfectly for data interchange might
cause human users unnecessary pain and distress. Don't forget the
needs of your users when creating the XML schema!

Figure 1: UML activity diagram for schema development process
The process diagram in Figure 1 is a UML activity diagram, which is
one of nine diagram types defined by that standard. This diagram was
created using Rational Rose, one of the most widely used UML modeling
tools. Most of our discussion, however, is focused on the UML class
diagram that is used to specify the static information structure of a
system's XML vocabulary in our application context.
What is UML?
The Unified Modeling Language (UML) defines a standard language and
graphical notation for creating models of business and technical
systems. Contrary to popular opinion, UML is not limited to use as a
tool for programmers. The UML defines model types that span a range
from functional requirements and activity workflow models to class
structure design and component diagrams. These models, and a
development process that uses them, improve and simplify communication
among an application's many diverse stakeholders.
A UML class diagram can be constructed to represent the elements,
relationships, and constraints of an XML vocabulary visually. With a
little initial coaching, class diagrams allow complex vocabularies to
be shared with non-technical business stakeholders. A very simple
subset of a product catalog vocabulary is shown as a class diagram in
Figure 2 [1].

Figure 2: A simple UML class diagram
The primary elements of a UML class diagram are as follows.
- Class -- this example defines two classes: CatalogItem and
Organization. A class represents an aggregation of structural features
and defines a namespace for those feature names. Thus, both classes
can contain an attribute named "name" but their class namespace scope
makes the two attributes distinct.
- Attribute -- each class may optionally define a set of
attributes. Each attribute has a type; in this example string, double,
and float refer to the built-in datatypes as defined by the XML Schema
specification. For those of you thinking ahead to XML schema design,
specifying a UML attribute does not limit the schema to an XML
attribute; the mapping to schema syntax allows either an XML attribute
or child element.
- Operation -- the computeTax() operation of CatalogItem
specifies part of the behavior for this class. In other words, what
does the class do, in addition to defining the structure of its data?
In object-oriented parlance, if you send a computeTax message to a
CatalogItem object, it will return a floating-point data value. This
operation does not expect any parameters, but they could be specified
between the parentheses. We will not use class operations in the
specification of XML vocabulary, but their definition would be
critical to Web Services, especially a WSDL specification of SOAP
messages.
- Association -- an association relates two or more classes
in a model. An arrow on one end means that the association is usually
navigated in that one direction, which gives a hint for the design and
implementation of this vocabulary.
- Role & Multiplicity -- the end of an association may
specify the role of the class; the Organization plays a supplier role
for a CatalogItem in this model. In addition, the "1..*"
multiplicity means that there must be one or more suppliers for each
catalog item.
- Generalization -- although Figure 2 does not include class
inheritance, this structure is fundamental to object-oriented models
and is included in the next expanded example.
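The elements listed above can be approximated in code. The Python below is a minimal sketch, not the UML-to-schema mapping itself: apart from the "name" attribute, the supplier role, and the computeTax operation, everything here is assumed for illustration, including the list_price and tax_rate attributes, the XML element names, and the to_xml helper. It shows the per-class scoping of "name", the "1..*" supplier multiplicity, and the choice between writing a UML attribute as an XML attribute or as a child element.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Organization:
    name: str  # distinct from CatalogItem.name; each class scopes its own names


@dataclass
class CatalogItem:
    name: str
    list_price: float              # hypothetical attribute, not read from Figure 2
    supplier: List[Organization]   # association end with role "supplier", 1..*
    tax_rate: float = 0.0          # hypothetical attribute used by compute_tax

    def __post_init__(self) -> None:
        # Enforce the "1..*" multiplicity: one or more suppliers are required.
        if len(self.supplier) < 1:
            raise ValueError("a CatalogItem needs one or more suppliers")

    def compute_tax(self) -> float:
        # Mirrors computeTax(): no parameters, floating-point result.
        return self.list_price * self.tax_rate


def to_xml(item: CatalogItem, attributes_as_elements: bool = True) -> ET.Element:
    """Serialize an item, writing UML attributes as child elements or XML attributes."""
    root = ET.Element("CatalogItem")
    values = {"name": item.name, "listPrice": str(item.list_price)}
    for key, value in values.items():
        if attributes_as_elements:
            ET.SubElement(root, key).text = value
        else:
            root.set(key, value)
    for org in item.supplier:
        # The association end is written as an element named after its role.
        supplier = ET.SubElement(root, "supplier")
        if attributes_as_elements:
            ET.SubElement(supplier, "name").text = org.name
        else:
            supplier.set("name", org.name)
    return root


if __name__ == "__main__":
    item = CatalogItem("Widget", 10.0, [Organization("Acme")], tax_rate=0.05)
    print(ET.tostring(to_xml(item, True), encoding="unicode"))
    print(ET.tostring(to_xml(item, False), encoding="unicode"))
```

Both serializations carry the same information from the same model; which one a schema should use is a design decision about the vocabulary, not about the UML class.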