About the Series ...
This is the fourteenth tutorial article of the series, MDX
in Analysis Services. The series is designed to provide hands-on
application of the fundamentals of MDX from the perspective of MS SQL Server
2000 Analysis Services ("MSAS"); our primary focus is the
manipulation of multidimensional data sources, using MDX expressions, in a
variety of scenarios designed to meet real-world business intelligence needs.
For more
information on the series, as well as the hardware / software requirements to
prepare for the tutorials we will undertake, please see the first lesson of
this series: MDX
Concepts and Navigation.
Note: At the time of writing, Service
Pack 3 updates
are assumed for MSSQL Server 2000, MSSQL Server 2000 Analysis
Services, and the related Books Online and Samples.
The screen shots that appear in this article were taken from a Windows 2003
Server, and may appear somewhat different from the corresponding views in other
operating systems.
Introduction
In our last tutorial, Named
Sets in MDX: An Introduction, we introduced named sets in MDX queries, focusing
on their creation through use of the WITH clause, to allow us to gain an
understanding of the general capabilities of static and dynamic
named sets. We introduced the concepts behind named sets, and then examined
the MDX syntax required to create them and to specify them for presentation in
our results. Next, we discussed the nature of static and dynamic named sets,
and then activated what we had learned through an illustrative practice example
for each of the two types. Finally, we discussed the results we obtained in
each hands-on example, illustrating the value that named sets can offer us.
In
this article, we introduce the concept of distinct counts, discussing why
they are useful - indeed, often required - in our organizational analysis
efforts. Throughout our session, we will describe some of the challenges that
are inherent in distinct counts, and then we will undertake practice
exercises to illustrate solutions to meet our business needs. As a part of the
practical exercises, built around a hypothetical business need, we will provide
an introduction to the approach afforded us by the MSAS user interface, and
then to an alternative approach we can take using MDX.
"The Need for Distinction"
As anyone in the realm of
business intelligence and general analysis has probably come to realize, we
often encounter the need to quantify precisely the members of various
sets of data. Those of us who have become familiar with MSAS are aware of its
capabilities when it comes to categorizing and aggregating data within the
hierarchical contexts of dimensions and levels. We can, for the most part,
readily tap these capabilities from the user interface that MSAS provides. Through
the exploitation of more advanced approaches, including the use of calculated
members / measures and MDX (multidimensional expressions), we can extend our
analysis even further, and leverage MSAS to reach far more specific objectives.
NOTE: For more information on calculated
members, see Calculated
Members: Introduction, Calculated
Members: Further Considerations and Perspectives, and Calculated
Members: Leveraging Member Properties among numerous general references
in the Database Journal MDX Essentials series.
One of the basic requirements
that come into play, at least in some form, in virtually any analysis scenario,
is the need to count the members of a set targeted for analysis. An example
might be the need to count the number of products we shipped from a given
warehouse, or group of warehouses, to a geographical location or group of
stores. This can be accomplished readily enough with the Count()
function (see Basic
Numeric Functions: The Count() Function for details about using the MDX Count()
function).
As many of us know, Count()
does a fine job of giving us a total count. This would mean that the results
we might achieve in using Count() with products, in the scenarios
above, would represent the total number of products shipped. What we would
not get, and what we might find far more useful in some situations, would be a
count of the different products that were shipped. Count(), in
providing a total number, would also be providing multiple counts of the
same products, because products will have been shipped multiple times, in
many instances. To reach our objective of counting different products,
then, we would need to count each different product shipped only once. To
count them multiple times not only misstates the number of different
products, but it also likely renders averages, and other metrics based upon
the count value, meaningless or misleading.
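The difference between a total count and a distinct count can be sketched outside of MSAS in a few lines of Python. This is only an illustration: the warehouse and product names below are invented and do not come from the FoodMart sample data.

```python
# Hypothetical shipment records as (warehouse, product) pairs; all names are invented.
shipments = [
    ("Warehouse A", "Milk"),
    ("Warehouse A", "Bread"),
    ("Warehouse A", "Milk"),
    ("Warehouse B", "Milk"),
    ("Warehouse B", "Eggs"),
]

# Total count: every shipment row is counted, so repeated products inflate the figure.
total_count = len(shipments)

# Distinct count: each product is counted once, however many times it shipped.
distinct_count = len({product for _, product in shipments})

# A metric built on the count changes meaning depending on which count we use.
shipments_per_product = total_count / distinct_count

print(total_count, distinct_count, round(shipments_per_product, 2))
```

Here the total count answers "how many shipments were there?", while the distinct count answers "how many different products shipped?", which is the question the rest of this article pursues.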
The word "different"
here is easily supplanted by "distinct." Moreover, as many of us
know, performing distinct counts has historically presented a challenge
in the OLAP world. Let's discuss an example that illustrates the challenge, and
then convert that challenge to an opportunity to meet a business need using the
distinct count capabilities found within MSAS.
Handling Distinct Counts via the MSAS User Interface
Let's take a look at a
scenario that illustrates a need for a distinct count, using a hypothetical
business need to add practical value. Let's say that a group of information
consumers within the FoodMart organization have approached us with a need that
they wish to meet within the Warehouse cube. The consumers want to be
able to report upon the number of products within various metrics
without having to be concerned with an issue they faced with a previous system
- a scenario of "double counting" in many inventory reports that
concerned product-related transactions between warehouses and stores.
We might initially
attempt to meet the needs of the consumers with somewhat advanced MDX, but let's
try to minimize complication, while heading off many of the issues, with a
straightforward approach from within the Cube Editor component of the
MSAS user interface, Analysis Manager, first. This provides all that we
need, in many cases. (We will examine an MDX approach in the next section of
this article.)
Let's start Analysis
Services and proceed with the following steps:
1. Open Analysis Manager.
2. Expand the Analysis Servers folder by clicking the "+" sign to its immediate left.
Our server(s) appear (my server, MOTHER1, is depicted in some of the illustrations).
3. Expand the desired server.
Our database(s) appear, in much the same manner as shown in Illustration 1.
Illustration 1: A Sample
Set of Databases Displayed within Analysis Manager
4. Expand the FoodMart2000 database.
5. Expand the Cubes folder.
The sample cubes appear, as shown in Illustration 2.
Illustration 2: The
Sample Cubes in the FoodMart2000 Database
NOTE: Your databases / cube tree may differ, depending upon
the activities you have performed since the installation of MSAS (and the
simultaneous creation of the original set of sample cubes). Should you want or
need to restore the cubes to their original state, simply restore the database under
consideration. For instructions, see the MSSQL Server 2000 Books Online.
6. Right-click the Warehouse sample cube.
7. Select Edit from the context menu that appears, as shown in Illustration 3.
Illustration 3: Select Edit
from the Context Menu
The Cube Editor opens. The Schema tab appears as depicted in Illustration 4.
Illustration 4: Cube
Editor - Schema Tab for the Warehouse Sample Cube
We will
be creating a measure in the Cube Editor to enable us to make our
distinct Product counts. Distinct Count can only exist as a
measure.
8. Right-click the Measures folder in the Tree View to the left of the Schema tab.
A single-line context menu appears, as shown in Illustration 5.
9. Select New Measure from the context menu.
Illustration 5: Select New
Measure from the Context Menu
The Insert Measure dialog appears.
10. Click-select product_id.
The Insert Measure dialog, selected measure circled in red, appears in Illustration 6.
Illustration 6: Select Product_Id
from the Insert Measure Dialog
11. Click OK to accept the selection.
The Insert Measure dialog closes, and we see the new measure appear (default name of Product_Id)
in the Measures folder, as depicted in Illustration 7.
Illustration 7: Product_Id
Appears in the Measures Folder (Circled)
12. Click-select product_id in the Measures folder, if required.
13. If necessary, click the downward arrow beneath the Cube Tree to open the Properties pane.
14. Click the Basic tab.
15. Modify the default Name of Product Id to the following:
Product Count
16. Type the following into the empty Description box, just below the Name box:
Distinct Count - Products
17. Click the box to the right of the Aggregate Function label, to enable the selector.
18. Select Distinct Count in the Aggregate Function selector.
The Basic tab of the Properties pane appears as shown in Illustration 8.
Illustration 8: Product
Count Measure - Properties Pane - Basic Tab
19. Click the Data tab as if going to the Data View to perform a routine browse.
A warning briefly appears, indicating that sample data is being generated, and that the
cube requires processing, as a result of our modifications. The sample data
then appears, along with a static warning below it, to ensure that we are aware
that the data is not what it might appear to be, and that the cube must be
processed to make updated, actual data available, as partially depicted in Illustration 9.
Illustration 9: Data
View (Partial and Compressed) - With "Staleness" Warning at its Foot
Let's
process the cube to activate our changes.
20. Select File --> Save to save the cube in its modified state.
21. Select Tools --> Process Cube to initialize the processing steps.
A message box appears, stating that the cube has no aggregations, and asking if we wish
to design them at this time, as shown in Illustration 10.
Illustration 10: Aggregations
Message Box - Just Say "No"
NOTE: The message box may not appear if the cube has been altered with regard
to aggregations since its installation as an MSAS sample. In that case, the
Select the Processing Method dialog appears directly, and the step that
dismisses the message box can be skipped.
22. Click No to skip designing aggregations at present.
The Select the Processing Method dialog appears, as depicted in Illustration 11.
Illustration 11: The Select
the Processing Method Dialog
Full Processing is the default, and only, option, as the Warehouse cube has not been processed
since the structural change we have made to it.
23. Leaving settings at default, click OK.
Processing begins, and runs rapidly, as
evidenced by the Process viewer's presentation of processing log events
in real time. The Processing cycle ends and the success of the
evolution is indicated by the appearance of the Processing Completed
Successfully message (in green letters) at the bottom of the viewer, as
shown in Illustration 12.
Illustration 12: Indication
of Successful Processing
We are
returned to the Cube Editor. We can now browse the data and see our new
Distinct Count measure in action.
25. Click the Data tab, if necessary.
The Data View refreshes and data appear in the default formation, ready for our
review. A portion of the Data View, depicting the Warehouse
Profit and new Product Count measures, appears in Illustration 13.
Illustration 13: Warehouse
Profit and Product Count Measures in the Data View
Now that
we have a credible result set with which to compare, let's take a look at
replicating the same results using MDX. We can leave the Data View as
it is, for easy referral against our next results dataset, which we will
generate independently within the MDX Sample Application.
Using MDX to Render Distinct Counts
We now have a set of "answers"
that we can attempt to replicate in direct MDX. Let's initialize the MDX Sample Application, as a platform from which to
perform our practice exercises, taking the following steps:
1. Start the MDX Sample Application.
We are initially greeted by the Connect dialog, shown in Illustration 14.
The
illustration above depicts the name of my server, MOTHER1, and properly
indicates that we will be connecting via the MSOLAP provider (the
default).
The MDX
Sample Application window appears.
A
blank Query pane appears.
4. Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.
5. Select the Warehouse cube in the Cube drop-down list box.
The MDX Sample Application window should resemble that depicted in Illustration 15,
complete with the information from the Warehouse cube displaying in the Metadata
tree (left section of the Metadata pane).
Illustration 15: The MDX Sample Application Window
(Compressed View)
We
will begin creating our query, with a focus on returning results in the same
general formation as the Data View we left in the Cube Editor.
We will retrieve the Warehouse Profit and Product Count measures,
as pictured in Illustration 13 above. Next, we will attempt to add a
calculated measure that we craft directly in MDX, to replicate the distinct
count information we obtained with the Product Count measure that we
created in Analysis Manager earlier.
1. Create the following new query:
-- MXAS14-1 Initial Attempt at Distinction
WITH MEMBER
   [MEASURES].[ProdCount]
AS
   'DISTINCTCOUNT({[Product].MEMBERS})'
SELECT
   { [MEASURES].[Warehouse Profit], [MEASURES].[Product Count],
     [MEASURES].[ProdCount] } ON COLUMNS,
   {[Product].CHILDREN} ON ROWS
FROM
   [Warehouse]
The above represents an
attempt to meet the information consumers' objectives with what appears to be
the straightforward use of the DISTINCTCOUNT() function. This might
represent an approach that seems intuitive to a practitioner who has given up
on the handful of non-working or nebulous examples that can be found on the
web (and which happen to be about all we seem to have as a basis for learning
MDX, in many instances). While it ultimately fails to provide the desired
solution, as we shall see, it should not be surprising that we might attempt
this, given the definition in the Books Online, not to mention the words
used in the name of the function itself. (Most will agree, also, that it is
better to attempt it now, than when under the gun of an employer or a hurried
client.)
The calculated member ProdCount
embodies the function. I named it ProdCount to distinguish it from Product
Count, the measure we created while within the user interface in the
earlier section, which I have also decided to present within the results
dataset for comparison purposes. Warehouse Profit is also presented to
align with our Data View as we left it in the last section.
2. Execute the query using the Run Query button.
The results dataset appears as shown in Illustration 16.
Illustration 16: The
Results Dataset - DISTINCTCOUNT() Approach
3. Save the query as MXAS14-1.
It does not require a huge leap of logic to conclude that the ProdCount
calculated measure is generating a transaction count, which is probably
correctly "distinct," within its own (actual) meaning, but not at all
what the information consumers have requested in our practice example.
Bruised and humiliated (albeit briefly), let's resort to another,
more cumbersome approach, one that at least yields the distinct product values.
4. Create the following new query:
-- MXAS14-2 Distinction at its Finest
WITH MEMBER
   [MEASURES].[CalcCount]
AS
   'COUNT(CROSSJOIN({[MEASURES].[Warehouse Profit]}, DESCENDANTS
   ([Product].CURRENTMEMBER, [Product].[Product Name])), EXCLUDEEMPTY)'
SELECT
   { [MEASURES].[Warehouse Profit], [MEASURES].[Product Count],
     [MEASURES].[CalcCount] } ON COLUMNS,
   [Product].CHILDREN ON ROWS
FROM
   [Warehouse]
The next attempt at
distinction is embodied by the calculated measure CalcCount, named,
again, simply as a means of distinguishing it from the measure we created in
the Cube Editor and which we include once again for comparison purposes.
The above approach may
not have been the initial impulse that many of us had in tackling what seemed
to be a straightforward replication of the Data View we saw earlier.
What we are doing, in short, with the CrossJoin() function is marrying
the Warehouse Profit values with the products, and returning
(thanks to EXCLUDEEMPTY) a count of the non-empty pairings.
The Descendants() function builds in flexibility, allowing us to apply
the logic equally well to a group of products as to the full set of products.
The key to this is the selection of the current member's descendants,
adding the "relativity" that so pointedly underscores the power of
the .CurrentMember function.
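The logic of the CalcCount expression can be mimicked outside of MDX to make the mechanics easier to follow. The Python sketch below is only an analogy, not how MSAS evaluates the query: the hierarchy, the profit values, and the helper names descendants() and calc_count() are all hypothetical, and None stands in for an empty cell.

```python
import itertools

# Hypothetical slice of a product hierarchy: family -> leaf-level product names.
hierarchy = {
    "Drink": ["Cola", "Juice", "Water"],
    "Food": ["Bread", "Eggs"],
}

# Hypothetical cell values for one measure; None marks an empty cell
# (a product with no warehouse activity).
warehouse_profit = {
    "Cola": 120.0,
    "Juice": None,
    "Water": 45.5,
    "Bread": 80.0,
    "Eggs": None,
}

def descendants(member):
    """Return the leaf products under a family, or every leaf for 'All'."""
    if member == "All":
        return [leaf for leaves in hierarchy.values() for leaf in leaves]
    return hierarchy[member]

def calc_count(current_member):
    """Pair the measure with each descendant and count the non-empty pairings."""
    pairs = itertools.product(["Warehouse Profit"], descendants(current_member))
    return sum(1 for _, leaf in pairs if warehouse_profit[leaf] is not None)

for member in ("Drink", "Food", "All"):
    print(member, calc_count(member))
```

Because the count is taken relative to whichever member is passed in, the same function serves a single product family or the full product set, which mirrors the role that .CurrentMember plays in the query.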
5. Execute the query using the Run Query button.
The results dataset appears as shown in Illustration 17.
Illustration 17:
The Results Dataset - Distinction Attained
6. Save the query as MXAS14-2.
The values for the new measure are in alignment with those of the
measure we created in the Cube Editor.
NOTE: For a detailed introduction to most of the above
functions, see the Database Journal MDX Essentials Series
index page.
7. Exit the MDX Sample Application and Analysis Manager when ready.
Summary and Conclusion ...
In
this lesson, we introduced the concept of distinct counts, discussing
why they are often a requirement in our analysis efforts and those of the
information consumers whom we support. In our introduction, and throughout our
examination of the MDX syntax we explored to achieve our illustrative ends, we
highlighted the challenges that are inherent in distinct counts. We
performed practice exercises, to illustrate solutions for hypothetical business
needs that called upon the use of distinct count capability, obtaining
exposure to the options afforded us by the MSAS user interface, as well as the MDX
syntax involved with using the alternative solutions that we proposed.
In
future articles, we will examine the performance considerations inherent in the
production of distinct counts, as well as options
that are available to tune our efforts for more efficient operation. The need
for distinct counts is a fact of business life, and mastery of the costs
and results of this vital capability represents a unique opportunity to add
another tool to our MSAS skill sets.
See All Articles by Columnist William E. Pearson, III