About the Series ...
This is the tenth tutorial article of the series, MDX
in Analysis Services. The series is designed to provide hands-on
application of the fundamentals of MDX from the perspective of MS SQL Server
2000 Analysis Services ("Analysis Services,");
our primary focus is the manipulation of multidimensional data sources, using
MDX expressions in a variety of scenarios designed to meet real-world business
intelligence needs.
For more
information on the series, as well as the hardware / software requirements to
prepare for the tutorials we will undertake, please see the first lesson of
this series: MDX
Concepts and Navigation.
Note: At the time of writing, Service
Pack 3 updates
are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis
Services, and the related Books Online and Samples.
The screen shots that appear in this article were taken from a Windows 2003
Server, and may appear somewhat different from coinciding views in other
operating systems.
Introduction
With
this lesson, we begin a three-part mini-series, Optimizing MDX. In
these articles, we will explore methods for optimizing the performance we
obtain from our MDX queries. Throughout the three tutorials, we will discuss
the main types of intervention through which we can
enhance performance of MDX, and we will explore examples of the various types
in turn, in a hands-on manner, to provide a practical level of awareness that
we can apply in our daily work with MDX.
In this
lesson, we will:
-
Expose
general types of intervention we can use to optimize MDX queries;
-
Discuss
how Analysis Services processes queries, as a preparation for our focus on
optimization techniques;
-
Examine
the Large Level Threshold property, and its pervasive importance;
-
Begin
an exploration of optimization through the control of location of query
execution (to be concluded in our next lesson).
The Types of Intervention
As most of us know, from
having worked with MDX in the preceding articles of this series and elsewhere,
MDX syntax allows us to define and manipulate multidimensional objects and
data. As with an SQL query, we have found that an MDX query requires a data
request (the SELECT clause), a point from which to begin (the FROM
clause), and a filter (the WHERE clause). These and other keywords,
which may or may not actually be visible in our queries, (many assume the
values of defaults when left unspecified), provide the tools used to extract
specific portions of data from a cube for analysis. MDX also supplies a robust
set of functions for the manipulation of data, data definition language (DDL)
syntax for managing data structures, and support for extending itself with
user-defined functions. MDX commands abound for creating (and deleting) cubes,
dimensions, measures, and their subordinate objects.
Many approaches exist
in the rich MDX toolset to help us accomplish our business objectives; and, as
we might reasonably expect, some approaches are more efficient in operation than
others are, predominantly in terms of memory and CPU utilization. In this
article, we will discuss some of the types of intervention that
we might perform to achieve optimization of a query, discussing the points at
which adjustments and / or enhancements might give us the performance boosts we
desire. While we will touch upon enhancements that can be made outside
our queries to contribute to performance, our focus in this "triptych"
of articles will be largely upon items that can be considered within the
queries themselves to contribute to overall efficiency.
The types of
intervention that we might investigate as options for the optimization
of the MDX queries we construct consist of the following general groups:
-
Control
of the location of processing;
-
Optimization
of set operations and syntax arrangement;
-
Optimal
uses of caching;
-
Substitution
of external functions;
-
Leveraging
database structure.
We will examine each of
these types of intervention in the following tutorials, and look at examples of
actions we might take within the context of most, to illustrate possible
performance enhancements we might bring about.
Understanding How Queries are Processed
Before we delve into our examination of
MDX query optimization techniques, a general understanding of how Analysis
Services processes our queries will be useful. It is from the perspective of
each of the process points that we can seek to optimize a given query. From a
high level, these steps occur in the following general order:
-
An MDX statement or query is
issued by a client application.
-
The statement or query is
parsed by the PivotTable Service, which rearranges it into an expression tree
to facilitate ease of use in subsequent steps.
-
The metadata identification is
resolved, and the determination made to process the query on the server or on
the client.
-
Final preparations for
processing occur, including possible further optimization procedures.
-
Rows and columns are identified
(the axes are resolved) and, in some cases (isolation mode, predominantly),
cell values are pre-fetched and cached.
-
The combined results are
returned to the application from which the query originated.
Integral to our lesson will be the location
at which the calculations are performed. The location at which the
operations occur is highly relevant to performance optimization, as most of us
can readily see. The location under consideration can be either the Analysis
Server or the client PivotTable service.
The Location of Query Processing
Control of the location
of processing, as we shall see, can be managed from the Large Level
Threshold property setting. While the large level value is a critical
influencer of which location is selected for query processing, we can also
control the location using other means, the chief of which is the Execution
Location parameter. We will examine these settings in more depth as we
proceed, but before we undertake to control the location of processing, we
should understand the benefits and costs associated with the client / server
processing options from which we can select. Some of the more important of
these are summarized in Table 1.
|
|
|
Client vs. Server Processing
|
|
|
|
|
|
|
Costs and Benefits
|
|
|
|
|
|
|
|
|
|
|
|
Processing
|
|
|
|
|
|
|
Location
|
|
Disadvantages
|
|
Advantages
|
|
|
|
|
|
|
|
|
|
Server
|
|
Consumes
more CPU and memory resources
|
|
Results
can be cached on the server, so that requests for the same result sets by
other client applications are quickly available
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Client
|
|
Takes
longer to process
|
|
Placing
processing burden on the client relieves the server of virtually all resource
requirements
|
|
|
|
|
|
|
|
|
|
|
|
Potentially
much greater network traffic, as large amounts of data are being dispatched
from the server.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Client-based
processing means client-based caching. This results in a loss of rapid
fulfillment of recurring requests, by multiple clients, for the same
information
|
|
|
|
Table 1: Advantages and Disadvantages of Processing
Location Options
Let's take a look at
the first of these two prominent settings, Large Level Threshold property,
with which we will begin a hands-on examination of managing processing
location.
The Large Level Threshold Property
The Large Level
Threshold property fixes the point at which a dimension level is too large
to be sent to the client application as a single body. The property, by
default, is set at the server at 1,000. We can use the property to help us to
manage both CPU and memory utilization from the client perspective. Because
the property stipulates the point at which level is "too large for the
client," it also determines the level at which the number of members from
a given level force server-based execution of the PivotTable Service.
The property, therefore, is highly useful to us in determining query execution
location, which, as we are learning, is a significant point of intervention
in the optimization of our MDX queries.
If the number of
members in a level is below the threshold we set in this property, it is
sent to the client application in a single piece. The level is broken into
smaller groups, which are sent to the client application as needed, if the
number of members in the level under consideration is equal to or more than the
threshold setting. Should the level contain a number of members smaller than
this threshold value, the entire level will be sent to the client
application. If the level under consideration contains a number of members that
is greater than or equal to the value of the Large Level Threshold property,
the level is incrementally sent from the server to the client application.
While the default value
for the Large Level Threshold property is 1,000, the minimum value for
this property is 10. (If the value is set to less than 10, the value will reset
itself at 10, automatically preventing any errors in processing). The Large
Level Threshold property can be modified at the server, or can be
specified by the client upon establishment of a connection to the server. Let's
explore the process for setting the parameter in the former case, with the
following steps.
1.
Start Analysis Manager (Start
-> Programs -> Microsoft SQL
Server -> Analysis Services ->
Analysis Manager).
2.
Expand the Analysis Servers
folder by clicking the "+" sign to its left.
3.
Right-click
the Server (MOTHER1 in my case, as shown in Illustration 1),
typically named after the machine on which it resides.
4.
Select Properties
from the context menu that appears, as shown in Illustration 1.
Illustration 1: Select the Properties Option from
the Context Menu
The Properties
dialog appears.
5.
Select the Environment
tab.
6.
Type 750
in the Large Level Defined As box, as shown in Illustration 2.
Illustration 2: Modification: Large Level Threshold
Defined as 750 Members
We have now
set the number of members a dimension level must contain to be processed as a
large level. The range here can be from 1 - 10,000 (default is 1,000, as we
mentioned earlier), and it is meaningful in the context (the number of level
members) with which Analysis Services classifies dimension levels into large
and small categories only. Keep in mind, as we have already discovered, that
large levels are not sent to the client unless they are specifically
requested, but small levels are sent to the client even if the entire level
is not requested.
We are
greeted with a dialog that announces we must restart the Analysis server for
changes to take effect, as shown in Illustration 3.
Illustration 3: A Restart of Analysis Services is
Required for Changes to Take Effect
8.
Click OK.
9.
Click File
--> Exit from the console main menu to
close Analysis Services.
The
new threshold level will be enacted upon our next start of Analysis Services.
As we
have mentioned, we can set the Large
Level Threshold property via an MDX
query we generate from a client application. When we define the large-level
threshold at the client application, we do so by specifying the setpoint as a connection
property. Because the property value is set upon opening the connection,
this approach for setting the large-level limit does not provide a means of
controlling the location of query processing on a query-by-query basis, any
more than the same setpoint within Analysis Services provides for ad hoc
location management. The Large Level Threshold property therefore
allows for setting the property from the client, much as it is set from within
the Analysis Services properties that we explored above. A connection string
example is as follows:
Large Level = 750;
Thus, we see that the
control of the location of processing can be managed from the Large Level
Threshold property setpoint. While the Large Level value is a
critical influencer of which location is selected for query processing, we can
also control the location using other means, the chief of which is the Execution
Location parameter, which we will take up in our next section.
The Execution Location Parameter
We have learned that
large levels, as defined by the threshold, are processed solely at the server
level, and not sent to the client unless a specific request is made. Small
levels, in contrast, are sent to the client to be processed even if the
associated level is not requested in its entirety.
The Execution
Location connection parameter provides another, more direct means of
controlling location of processing of our MDX queries. To build a query that
will be executed on the server, we need only use the OLE DB property ExecutionLocation,
which specifies where the query is to be resolved. The setpoint options for ExecutionLocation
are displayed in Table 2.
|
|
|
|
|
|
|
|
Setpoint
|
|
Meaning
|
|
|
|
|
|
|
|
|
|
0
|
|
Default.
For compatibility with earlier versions, this means the same as setpoint 1,
and, thus, "Automatic." (Subject to change in future versions.)
|
|
|
|
|
|
|
|
|
|
1
|
|
The
PivotTable Service selects the query processing location (server or client
application) that it predicts will provide the best performance ("Automatic"
setting).
|
|
|
|
|
|
|
|
|
|
2
|
|
Queries
are processed on the client application.
|
|
|
|
|
|
|
|
|
|
3
|
|
Queries
are executed on the server. (Queries that contain session-scoped calculated
members, user-defined sets, or user-defined functions are exceptions.)
|
|
|
|
|
|
|
|
Table 2: Cube
Editor View of the Store Dimension Member Properties
The location
of query processing can be forced, using the Execution Location property,
to our choice of server or client. The default setting (Automatic),
allows the PivotTable Service to determine where the query should be processed,
based upon its prediction of which option will mean better performance.
A critical
factor in the determination of the execution location, when using the default
option (where the processing location is determined internally), is the Large
Level Threshold property we have already discussed. Say the level is set
at 1,000. The PivotTable Service forces "large level" treatment if
it determines that the query will require the aggregation of 1000 or more
members in a given dimension level. If the query does not involve a large
level, the processing of the query occurs at the client. If a large level is apparent,
the query processing occurs at the server.
Other means of controlling the location of
processing involve named sets, calculated members, and calculated cells, and
the circumstances under which they are created. We will delve into the
mechanics of these options further in our next session together, Optimizing
MDX: More on Location, and the Importance of Arrangement, where we will also move into arrangement
considerations in MDX query optimization.
Summary and Conclusion ...
In this lesson, we began the Optimizing
MDX group of articles. We established the objective,
within this mini-series, of exploring methods for optimizing the performance of
our MDX queries. We introduced what I like to call types of intervention,
or classifications of approaches that we can take, to enhance the performance
of MDX; we will revisit the types of intervention theme throughout the series.
Within
the context of each intervention type that we expose, we will perform practice
examples to reinforce an awareness that we can apply in our daily work with
MDX. In this lesson, we initially discussed the general steps Analysis
Services takes in processing queries, as a preparation for our overall focus on
optimization techniques. We then "drilled down" our focus to an examination
of the important Large Level Threshold property, followed by an
introduction to the first intervention type, to which the property contributes
its strongest influence, control of location of query execution.
»
See All Articles by Columnist William E. Pearson, III