About the Series ...
This article is a member of the series, MDX Essentials.
The series is designed to provide hands-on application of the fundamentals of
the Multidimensional Expressions (MDX) language, with each tutorial
progressively adding features designed to meet specific real-world needs.
For more information about the series in general, as well as
the software and systems requirements for getting the most out of the lessons
included, please see my first article, MDX at
First Glance: Introduction to MDX Essentials.
Note: Current updates are assumed for MSSQL
Server, MSSQL Server Analysis Services, and the related Books
Online and Samples.
Overview
In this article, we will examine the TopCount() function,
whose highest value lies in its capability to enable us to isolate best
performers from among hundreds or thousands of fellow members. This ranking
capability is critical in data analysis and decision support scenarios: In many
business situations, we seek to report upon "best performers" for
various reasons. TopCount() facilitates our doing so, allowing us to
sort on a numeric value expression that we can provide. We can tell the
function how many "top" members we wish to retrieve (say, the "top
ten," or the "top twenty"), for a "custom-fit" approach,
that matches our needs precisely.
In
this article, we will introduce and overview the TopCount() function. We
will first comment upon the general operation of TopCount(), and then we
will:
-
Examine the syntax surrounding the function;
-
Undertake illustrative examples of the rudimentary uses of the
function in practice exercises;
-
Briefly discuss the results datasets we obtain in the practice
examples.
The TopCount() Function
According to the Analysis Services
Books Online, the TopCount() function
returns a specified number of items from the topmost members of a specified
set, optionally ordering the set first." The TopCount() function
stands out as an excellent general example of the potential power of MDX. We
specify three parameters, a set, a count, and a measure,
in most cases, and TopCount() returns the number of top performers (or "top
anything," in effect), based upon our input.
As we shall see, TopCount() sorts
the set we specify by another expression we provide within the function, thus
breaking the natural hierarchy of the set. The basis of sorting by TopCount()
closely resembles that used by the Order() function. We should
therefore keep in mind that, in the absence of a specified sort, the function
will institute a sort based upon the default member.
NOTE: For information surrounding the Order()
function, see my article Basic
Set Functions: The Order() Function,
a member of the MDX Essentials series at Database Journal.
We will examine the syntax for the
TopCount() function, and then look at its behavior based upon parameter input
we might provide. Next, we will undertake practice examples constructed to
support hypothetical business needs that illustrate uses for the function.
This will allow us to activate what we explore in the Discussion
and Syntax sections, by getting some hands-on exposure in
creating expressions that leverage the function. Our next article will build
upon the basics we expose here, and leverage TopCount() within more
sophisticated examples of business usage.
Discussion
To restate our initial explanation of its operation, the TopCount()
function sorts a set based upon a numerical expression we
provide, and then picks the top (whatever number we provide in the Count
parameter of the function) items in that set, based upon the rank order
established by the function. The frequent requirement to examine a subset of
members at the top of the general set is easily answered by the TopCount()
"sort, and then pick this number of members from the top down"
approach.
Let's look at some syntax illustrations to further clarify
the operation of TopCount().
Syntax
Syntactically, the set upon
which we seek to perform the TopCount() operation is specified within
the parentheses to the right of TopCount, a common arrangement within
MDX functions, as we have seen in our previous articles. The syntax is shown in
the following string.
TopCount(≪≪ Set >>, ≪≪ Count >> [,≪≪ Numeric Expression >>])
We follow <<Set>>, the set
specification, with a comma, which is followed by <<Count>>, the numeric
expression we provide to dictate the number of "top" members we wish for
the function to return. <<Count>> is, in turn, followed by a numeric
expression whereby we specify the "sort by" criteria. As we have
mentioned, TopCount() always breaks the natural hierarchy of the set
- due to the fact that the set is sorted upon the third argument we
provide as <<Numeric Expression>>, before returning the top
(specified number of) members from that sort.
The items returned by TopCount()
are limited to the << Count >> input that we provide, even
in the case of "ties" in the <<Numeric Expression>> value.
This may be important to consider when evaluating the data retrieved by the
function, in addition to default behavior of the function when no sort criteria
(in the form, again, of the <<Numeric Expression>>) is
provided.
The following example
expression illustrates the use of the TopCount() function, within a simple
business context. Let's say that a group of information consumers within the
FoodMart Planning organization wishes to see the top ten Warehouse
Cities for operating year 1997, based upon total Warehouse Profit
for each city.
The basic TopCount()
function specifies the "top ten Warehouse Cities" (with the
number "10" as the Count specification, and [Warehouse].[City].Members
as the Set specification, of the function). TopCount()
assembles the top ten members from the perspective of Warehouse Profit
(the Numeric Expression upon which the complete set of Warehouse
Cities will first be sorted by the function). The function with arguments is
represented in the ON ROWS line of the following query:
SELECT
{[Measures].[Warehouse Profit]} ON COLUMNS,
{TOPCOUNT([Warehouse].[City].Members, 10,
[Measures].[Warehouse Profit])} ON ROWS
FROM
[WAREHOUSE]
WHERE
([Time].[Year].[1997])
Assuming that we placed the TopCount() function within
the query as constructed above, our returned dataset would resemble that shown
in Table 1.
|
|
Warehouse Profit
|
|
Salem
|
17,041.57
|
|
Tacoma
|
16,725.63
|
|
Los Angeles
|
12,789.25
|
|
San Diego
|
12,536.28
|
|
Bremerton
|
12,063.17
|
|
Seattle
|
12,021.58
|
|
Spokane
|
5,835.46
|
|
Beverly Hills
|
5,674.85
|
|
Yakima
|
5,635.98
|
|
Portland
|
4,186.58
|
Table 1: Results Dataset, with TopCount() Defining Columns
As we expected, the top ten Warehouse Cities are
returned, having been sorted by Warehouse Profit, and having had the top
ten values in that sort isolated as we requested in the function.
As is the case with many MDX functions, TopCount()
can be used to deliver sophisticated analysis datasets, particularly when we
employ it along with other MDX functions. We will practice the use of TopCount()
in the section that follows, building from a simple example, similar to
that which we saw above, to a scenario where we apply the function, in
conjunction with a calculated member, to reveal an additional analytical
perspective. In our next article, we will examine even more sophisticated
applications of the function, combining its use with that of other functions we
have explored in articles of the series.
Practice
To reinforce our understanding of the basics we have covered
so far, we will first use the TopCount() function in a simple scenario
to illustrate its operation as the primary focus. We will do so in a backdrop
that places TopCount() within the context of meeting a business need for
a group of hypothetical information consumers.
Let's return to the MDX Sample Application as a
platform from which to construct and execute the MDX we examine, and to view
the results datasets we obtain.
1.
Start the MDX
Sample Application.
2.
Clear the top
area (the Query pane) of any queries or remnants that might appear.
3.
Ensure that FoodMart
2000 is selected as the database name in the DB box of the toolbar.
4.
Select the Warehouse
cube in the Cube drop-down list box.
Let's assume, for our practice example, that we have
received a call from the Finance department of the FoodMart organization,
requesting some information surrounding general profitability among the
organization's Warehouse Cities. The Finance information consumers
specifically wish to know the Warehouse Profit figures attributed to
each of the top ten performing cities for operating year 1998.
To rephrase, the objective will be to present a single
measure, Warehouse Profit, for 1998 within the context of the
FoodMart Warehouse cube. (For our exercise, the cube can be assumed to
represent the prior year's activity of the organization.) We wish to sort the Warehouse
Cities by the total Warehouse Profit of each, and then to return
data for the highest ten Warehouse Cities based upon the profit measure.
Let's construct a simple query, therefore, to return the top
ten performers, as requested.
5.
Type the
following query into the Query pane:
-- MDX031-1,Simple use of TopCount(): "Top Ten 1998 Profit Producers"
SELECT
{[Measures].[Warehouse Profit]} ON COLUMNS,
{ TOPCOUNT([Warehouse].[City].Members, 10,
([Measures].[Warehouse Profit]))} ON ROWS
FROM
[WAREHOUSE]
WHERE
([Time].[Year].[1998])
6.
Execute the
query by clicking the Run Query button in the toolbar.
The Results pane is populated by Analysis
Services, and the dataset shown in Illustration 1 appears.
Illustration 1: Result Dataset - Simple Use of the
TopCount() Function
In a manner similar to that in the Syntax
section illustration above, the top ten Warehouse Cities are returned
(this time specifically for operating year 1998). The operation of TopCount()
has again been to sort by Warehouse Profit, and then to return the top
ten values from that sort, as we requested in the function.
7.
Select File
-> Save As, name the file MDX031-1, and place it in a
meaningful location.
8.
Leave the
query open for the next section.
Next, let's say that the Finance information consumers are
provided with the simple "top ten" data we have generated. They state
that they need to add the Warehouse Sales and a Margin % to the
presentation, keeping the TopCount() based upon the Warehouse Profit
measure, as before. This, they feel, will "round out" the
presentation to provide more utility from an analysis perspective, presenting "more
information at a single glance."
9.
Within the
query we have saved as MDX031-1, replace the top comment line of the
query with the following:
-- MDX031-2, "Top Ten 1998 Producers": Add Sales and Margin Calc Member
10.
Save the query
as MDX031-2, to keep MDX031-1 intact as a working sample.
11.
Add the
following lines to the query, between the top comment line we just modified,
and the SELECT keyword that begins the query:
WITH
MEMBER
[Measures].[Margin %]
AS
' [Measures].[Warehouse Profit]/[Measures].[Warehouse Sales]'
This
will create the calculated member Margin %, which we will present
alongside the Warehouse Sales and Warehouse Profit measures in
the following steps.
12.
Modify the ON
COLUMNS line of the query to contain the Warehouse Sales measure,
together with the new Margin % calculated measure defined above, as
follows:
{[Measures].[Warehouse Sales], [Measures].[Warehouse Profit],
[Measures].[Margin %]} ON COLUMNS,
13.
Leave the
remainder of the query in its original state.
The
Query pane
appears as depicted in Illustration 2, with our modifications marked.
Illustration 2: The Query with Added Measure and Calculated
Measure
14.
Execute the
query by clicking the Run Query button in the toolbar.
The
Results pane is
populated, and the dataset shown in Illustration 3 appears.
Illustration 3: Result Dataset - With Our Modifications
We note that, although the order of the "top ten"
has not changed (the TopCount() function continues, after all, to be
based upon the Warehouse Profit measure), the Margin % calculated
measure is not sorted. (We will focus on this calculated measure as an "additional
perspective" next.)
15.
Re-save the
file as MDX031-2.
16. Leave the query open for the next step.
We present the results to the information consumers, who are quite happy with the outcome. At this point, we propose to extend the requirement once again, and to modify the query to return the top ten Warehouse Cities based upon the new Margin % calculated measure. This will provide another analytical perspective, we say; we also mention our confidence, based upon our experience within the realm of business intelligence, that this analytical "view" may act to broaden the perspective of Warehouse Profit alone.
We have only to make one simple change to the function to deliver this additional perspective. We will do so, and verify the operation of the modified TopCount() function, by taking the following steps:
17. Within the query we have saved as MDX031-2, replace the top comment line with the following:
-- MDX031-3, Top Ten 1998 Producers":
Calc Member is the TopCount() Basis
18. Save the query as MDX031-3, to keep MDX031-2 intact as a working sample.
19. Replace [Measures].[Warehouse Profit] within the Topcount() function (in the ON ROWS line) with [Measures].[Margn %], the new calculated member we added above.
The ON ROWS line appears as follows after the change:
{ TOPCOUNT([Warehouse].[City].Members, 10,
([Measures].[Margin %]))} ON ROWS
The Query pane appears as depicted in Illustration 4, with our modifications marked, once again.
Illustration 4: The Query with Our Modifications Marked
20. Execute the query by clicking the Run Query button in the toolbar.
The Results pane is populated, and the dataset shown in Illustration 5 appears.
Illustration 5: Result Dataset - Presenting a New View, Indeed
It quickly becomes obvious that we have derived a new view of the business, indeed, with this small modification. The operation of the TopCount() function, based in this instance upon the Margin % attained by each of the Warehouse Cities, results in a sort of the Warehouse Cities accordingly. The subsequent selection of the top ten from this sort presents Warehouse Cities that did not appear earlier, because their Profit levels were not among the highest in the organization. The substitution of the Margin %, however, as the basis for the TopCount() function has revealed a higher margin of return in Warehouse Cities that simply do not "register "in the former query, where their relative size obscures their performance.
This revelation is welcomed by the information consumers, who can now enhance their analysis capabilities by taking into consideration both perspectives (Warehouse Profit and Margin Percent) to isolate top performers. The study of the top groups from the perspective of Margin Percentage will likely add valuable lessons that can be extrapolated to other Warehouse operations, regardless of size. This is an excellent illustration of the power of multidimensional analysis, courtesy of the pairing of the TopCount() function with a meaningful calculated member to delve deeper in the analysis of top organizational performers.
21. Re-save the file as MDX031-3.
22. Exit the MDX Sample Application when ready.
Summary ...
This article served as the first of a pair of articles surrounding the potentially powerful TopCount() function. In this introductory session, we examined the TopCount() function, noting its obvious value in equipping us with a means of isolating high performers from among hundreds or thousands of fellow members. We noted that this ranking capability is often critical in data analysis and decision support scenarios, and then discussed how TopCount() facilitates our performing such ranking, first discussing the general operation of TopCount(), and then examining the syntax surrounding the function.
We then undertook exercises where we practiced using the function in meeting the business requirements of a hypothetical group of information consumers. We focused on a simple use of the function, to allow for minimal distraction while grounding ourselves in the basics, before providing a straightforward, yet meaningful, example of how we might leverage our core query with the addition of a calculated member to achieve a revealing additional perspective in our analysis of the performance of operational units. Throughout the practice examples, we briefly discussed the results datasets we obtained with regard to the TopCount() function, together with other surrounding considerations.
» See All Articles by Columnist William E. Pearson, III
Discuss this article in the MSSQL Server 2000 Analysis Services and MDX Topics Forum.