About the Series ...
This is the twenty-first article of the series, MDX
Essentials. The series is designed to provide hands-on application of the
fundamentals of the Multidimensional Expressions (MDX) language, with
each tutorial progressively adding features designed to meet specific
real-world needs.
For more information about the series in general, as well as
the software and systems requirements needed for getting the most out of the
lessons included, please see the first article, MDX at
First Glance: Introduction to MDX Essentials.
Note: Service Pack 3 updates are assumed for MSSQL
Server 2000, MSSQL Server 2000 Analysis Services, and the related Books
Online and Samples.
What We Accomplished in our Last Article
In the
last article of the series, Subset
Functions: The Tail() Function, we
continued our group of three articles surrounding functions whose primary
purpose is to perform operations on subsets. We introduced the Tail()
function, with which we can return a subset from the end of a set. We
commented upon the operation of the function, and then examined its syntax.
Next, we undertook practice examples with the function, based upon hypothetical
business requirements, following the approach we have used throughout the
series.
In our practice set, we intentionally replicated the
requirements we had simulated in working with the Head() function in the
article that preceded it, so as to compare the Tail() and Head()
functions, and to note their similarities in operation, as well as to contrast
the results datasets they returned. Throughout the practice examples, we
briefly discussed the results datasets we obtained with regard to the Tail()
function, together with other surrounding considerations.
Introduction
In
this lesson, we will conclude our "triptych" of articles exposing set
functions that deal specifically with subsets. As we have noted, each function
returns a subset of a larger set, as part of its operation. We began the
subset functions articles with an examination of the Head() function,
then explored Tail() in the last. As we mentioned in our last session, these
three functions have much in common in the context of usage and operation; covering
them in close proximity allows us to more finely distinguish among them, as
well as to become aware of their similarities, and to better exploit the attributes
we can leverage to meet specific business needs.
In
this article, we will introduce and overview the Subset() function. The
general purpose of the Subset() function is to return a subset of tuples
from a specified set. We will first comment upon the operation of Subset(),
and then we will:
- Examine the syntax surrounding the function;
- Undertake illustrative examples of the uses of the function in practice exercises;
- Briefly discuss the results datasets we obtain in the practice examples.
The Subset() Function
According to the Analysis Services
Books Online, the Subset() function "returns
«Count» tuples from «Set» as a set, starting at position «Start»."
Once we recover from the seemingly redundant explanation that is, in fact, a
pretty clear representation of the operation of the Subset() function, we
can see that Subset() works a little like the substring
functionality that appears in various programming environments, query languages
and other places. We are focusing on tuples and their positions relative to
each other, as opposed to characters, but the similarities in concept are
perhaps easy to recognize.
As we shall see, the order of the
set elements remains intact within the operation of the function. We control
the "range" of the function by providing a count, similar to
the way we control the "reach" we obtain in other MDX functions - and
similar to the way we use the numeric expression in the Head() and
Tail() functions that we explored in our previous two articles. The
difference is that our "starting point" is not fixed at either the
left/beginning or right/ending "side" of the set, as it is with the Head()
and Tail() functions, respectively (which behave a bit like the LEFT and RIGHT
string functions, we might note, in the string-based analogy we cited earlier).
We can tell Subset() at which exact position to begin its work, and the number
of elements to capture, by providing the associated «Start» and «Count» specifications.
We will examine the syntax for the Subset()
function, then look at its behavior based upon different «Start» and «Count»
input we might provide. Next, we will undertake practice examples constructed
to support hypothetical business needs that illustrate uses for the function. This
will allow us to activate what we explore in the Discussion and Syntax
sections, by getting some hands-on exposure in creating expressions that
leverage the function.
Discussion
To restate our initial explanation of its operation, the Subset()
function iterates through the elements of the specified set and constructs
a set by adding the members in the directed range to the new set. The Subset()
function starts at a point, or an index («Start» in the syntax
model we show in the Syntax section below) that we designate
within a set. The function acts to return a range of m tuples
from a specified set. We specify m via the «Count» input we provide.
The function "counts over" this number of members, "lassoing"
them into selection for the new set it creates.
In a manner dissimilar to what we saw for the Head() and
Tail() functions in the two immediately previous articles, Subset()
manages the absence of a specified numeric expression for «Count» by
"defaulting" to include all elements from the «Start» position
to the end of the set. (Recall that the Head() and Tail()
functions handled the absence of a specified numeric expression by substituting
"1" as the range of elements "over" from the beginning and
end of the specified set, respectively.)
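The difference in default behavior can be sketched with the same kind of set expression we use later in this article. This is an illustration only: it assumes that [1998].Children resolves to the four Quarters of 1998 (Q1 through Q4) in their natural order, and the comments describe what each expression would be expected to return under that assumption.

```mdx
-- Sketch only: assumes [1998].Children yields Q1, Q2, Q3, Q4 in that order
Subset([1998].Children, 1)   -- «Count» omitted: Q2, Q3, Q4 (position 1 through the end)
Head([1998].Children)        -- count omitted: defaults to 1, so Q1 only
Tail([1998].Children)        -- count omitted: defaults to 1, so Q4 only
```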
Let's look at some syntax illustrations to further clarify
the operation of Subset().
Syntax
Syntactically, the set upon
which we seek to perform the Subset operation is specified within the
parentheses to the right of Subset, just as we saw with the Head()
and Tail() functions in our previous articles. The syntax is shown in
the following string.
Subset(«Set», «Start»[, «Count»])
We follow «Set», the set
specification, with a comma, which is followed by «Start», the starting
position for the operation. «Start» is, in turn, followed by the optional «Count»,
the count of members in the selection range. As we have mentioned, the omission
of the count value means that the function simply selects all tuples
from «Start» to the end of the set. «Start» is zero-based: in specifying it,
"0" represents the first member in the set, "1" the second, and so forth.
Within a scenario where the
specified «Count» is greater than the number of tuples that remain in the set
from the «Start» position onward, every tuple from the «Start»
position to the end of the set is returned. Moreover, the input of a number less than 1 as
the «Count» results in an empty set (indicated, for example, by a
message in the MDX Sample Application that, because "the cellset ...
contains no positions," it is unable to display a results dataset).
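A few expressions may help to fix these boundary cases in mind. As before, this is a sketch that assumes [1998].Children yields Q1 through Q4 in order; the comments restate the behavior described above rather than captured output.

```mdx
-- Sketch only: assumes [1998].Children yields Q1, Q2, Q3, Q4 in that order
Subset([1998].Children, 0, 1)    -- position 0, one tuple: Q1
Subset([1998].Children, 2, 10)   -- «Count» exceeds what remains: Q3, Q4
Subset([1998].Children, 0, 0)    -- «Count» less than 1: empty set
```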
The following example
expression illustrates the use of the Subset() function, within a
context similar to that of an expression we used in discussing the syntax of
the Head() and Tail() functions in the immediately preceding two articles.
This will illustrate the similarities in the construction of the functions,
while exposing the differences in the datasets that they return.
Let's say, again, that a group
of corporate-level information consumers within the FoodMart
organization wish to see the total Profits by Warehouse-Country
for the last three Quarters of 1998. While we could easily
accomplish this with the Tail() function, whose specialty is, after all,
returning the "last of" anything, we can accomplish the same results
with the Subset() function.
The basic Subset()
function, which would specify the "last three Quarters" (the "children"
of year 1998) portion of the required result dataset, would be constructed
as follows:
Subset([1998].Children, 1, 3)
This expression would be equivalent to the expression from
our last article, Tail([1998].Children, 3), and would return an
identical result dataset. Assuming that we placed the Subset() function
above within the column axis definition of a query, and the Warehouse-Country
information defined the row axis, our returned dataset would resemble that shown
in Table 1.
|        | Q2        | Q3        | Q4        |
| Canada | 4,949.88  | 4,196.32  | 3,645.54  |
| Mexico | 19,625.45 | 16,477.01 | 14,509.69 |
| USA    | 26,093.90 | 24,912.75 | 29,348.79 |
Table 1: Results Dataset, with Subset() Defining Columns
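For readers who wish to see the expression in context, a complete query along the lines described might look like the sketch below. The cube name [Warehouse], the level [Warehouse].[Country] and the measure [Warehouse Profit] are assumptions about the FoodMart 2000 sample on our part; the discussion above speaks only of "Profits" and "Warehouse-Country," so the names may need adjusting to match the cube at hand.

```mdx
-- Sketch only: cube, level and measure names are assumed, not taken from the article
SELECT
{Subset([Time].[1998].Children, 1, 3)} ON COLUMNS,
{[Warehouse].[Country].Members} ON ROWS
FROM
[Warehouse]
WHERE ([Measures].[Warehouse Profit])
```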
Just as we saw with the Tail() function in our
previous session, Subset() has the effect of compactly expressing that
we wish to display the Quarters as shown. The set from which we select is
the Quarters of 1998, in their natural order. The "starting point"
is Q2 (position "1" under the zero-based «Start» value, as Q1 would be
position "0"), and the selection extends from there for a "distance" of three elements.
The primary difference in the two functions, as we can
readily see, is that the Subset() function can be used a bit more
flexibly. It allows us to specify a "starting point" in a given set,
together with a "range" of selection, as opposed to the same
selection capability, with fixed starting point at the beginning or end of the
set, that we obtain using the Head() and Tail() functions,
respectively.
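The relationship among the three functions can be sketched as a set of equivalent expressions. Again we assume that [1998].Children yields Q1 through Q4 in order; the last pair shows a "middle" range that Subset() expresses directly, while Head() and Tail() must be nested to reach it.

```mdx
-- Sketch only: assumes [1998].Children yields Q1, Q2, Q3, Q4 in that order
Head([1998].Children, 3)             -- Q1, Q2, Q3
Subset([1998].Children, 0, 3)        -- the same three Quarters

Tail([1998].Children, 3)             -- Q2, Q3, Q4
Subset([1998].Children, 1, 3)        -- the same three Quarters

Subset([1998].Children, 1, 2)        -- Q2, Q3
Head(Tail([1998].Children, 3), 2)    -- Q2, Q3, by nesting the other two functions
```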
As was the case with the Tail() and Head() functions,
Subset() can be particularly useful in working with the Time
dimension. Moreover, the same efficiencies we saw with the other subset
functions can be obtained when Subset() is used in conjunction with "family"
functions, as with the .Children function above. More compact, reusable
coding is often the result.
NOTE: For information surrounding the .Children
function, see MDX
Member Functions: The "Family" Functions.
We will practice the use of the Subset() function in
the section that follows.
Practice
Preparation
To reinforce our understanding of the basics we have covered
so far, we will use the Subset() function in a manner that illustrates
its operation. We will do so in a simple scenario that places Subset()
within the context of meeting a business need.
To begin, we will construct a SELECT query with a
clearly defined set, then put Subset() to use in limiting that set to
meet an illustrative need for a group of hypothetical information consumers.
The intent is, of course, to demonstrate the operation of the Subset()
function in a straightforward manner.
Let's return to the MDX Sample Application as a
platform from which to construct and execute the MDX we examine, and to view
the results datasets we obtain.
1. Start the MDX Sample Application.
2. Clear the top area (the Query pane) of any queries or remnants that might appear.
3. Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.
4. Select the Sales cube in the Cube drop-down list box.
Let's assume, for our practice example, that we have
received a call from the Marketing department of the FoodMart organization,
requesting some information surrounding sales promotions that have been
conducted. The Marketing information consumers specifically wish to know the Unit
Sales figures attributed to each of the promotions, broken out by gender
of the purchasers, from which to derive a recurring report that is more
filtered.
To rephrase, the objective will be to present a single
measure, Unit Sales, for "all time" within the context of the
FoodMart Sales cube. (For our exercise, the cube can be assumed to
represent the current year-plus activity of the organization.) We wish to
return data showing Unit Sales broken out by male and female purchasers,
for each of the promotions that we have conducted within the time frame
represented by the Sales cube. It is from the results dataset that is
returned that the consumers want to narrow their request, once they get a look
at overall figures, to a compact, recurring report.
Let's construct a simple query, therefore, to return the Unit
Sales information, presented by gender (as columns) and the promotion
name (as rows).
5. Type the following query into the Query pane:
-- MDX021-1, Preparation for Use of Subset() Function in a Basic Query
SELECT
{[Gender].Members} ON COLUMNS,
{[Promotions].[Promotion Name].Members} ON ROWS
FROM
[Sales]
WHERE ([Measures].[Unit Sales])
6. Execute the query by clicking the Run Query button in the toolbar.
The Results pane is populated by Analysis Services,
and the dataset shown in Illustration 1 appears.
Illustration 1: Result Dataset Preparation for Use of
Subset() Function
We see All Gender, F (female), and M (male) populating the columns
across, and the Promotion Name (from
the Promotions dimension) appearing on the row axis.
7. Select File -> Save As, name the file MDX021-1, and place it in a meaningful location.
8. Leave the query open for the next section.
Next, let's say that our information consumers are provided
with the somewhat raw Promotion-by-Gender metrics we have generated. They
state that they need the data in a slightly different presentation, before determining
the thresholds for the ultimate recurring report.
The department has recently decided to emphasize its focus
on the purchasing activities of female purchasers, while perusing the
corresponding activities of male purchasers, in an attempt to identify
patterns. More specifically, they want the same information that we have
provided, but sorted by Unit Sales values, from highest sales promotion
to lowest, from the perspective of female shoppers.
We can accomplish this re-sort using the Order()
function that we explored in Basic
Set Functions: The Order() Function, as we shall see in the
following steps.
9. Within the query we have saved as MDX021-1, replace the top comment line of the query with the following:
-- MDX021-2, Preparation for Use of Subset() Function -Ordered Query
10. Save the query as MDX021-2, to prevent damaging MDX021-1.
11. Change the following line of the query (the rows axis definition):
{[Promotions].[Promotion Name].Members} ON ROWS
to the following:
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)} ON ROWS
12. Remove the following line (the slicer at the bottom) from the MDX query:
WHERE ([Measures].[Unit Sales])
The Query pane appears as shown in Illustration 2.
Illustration 2: The Query with Ordering Enhancement
13. Execute the query by clicking the Run Query button in the toolbar.
The Results pane is populated, and the dataset depicted in Illustration 3 appears.
Illustration 3: Result Dataset - Ordered Core Query
14. Re-save the file as MDX021-2.
15. Leave the query open for the next step.
We have used the Order() function, with the BDESC keyword in place, to obtain the sorted core dataset that the Marketing department wants to see. This allows the information consumers to narrow even further their requirements for a recurring report on the promotions activity by gender. In our next section, we will use the Subset() function to provide for these narrowed, more informed requirements.
NOTE: For details concerning our use of the Order() function above, see my article Basic Set Functions: The Order() Function.
Limiting the Initial Dataset with the Subset() Function
Having provided the Marketing team with a "big picture" idea of promotions activity from the Sales cube, we have equipped them to ask for data within a narrower scope, to eliminate outliers such as promotions that fall below thresholds of interest for various reasons. For purposes of our practice example, we will say that the Marketing information consumers respond to our sorted results dataset within a short period, as we expected. They request that we provide the report, exactly as it currently appears, on a monthly basis, with two changes. First, the No Promotion group is to be excluded (it is of little value in the current context of specific promotion analysis). Second, only the top twelve promotions (on the basis of female patronage) are to be presented in the recurring report.
There are numerous ways to approach this with MDX functions, but we know that Subset() will handle the requirement, particularly in a scenario where we have a sort in place for the dimension member under examination, females.
Let's use the Subset() function to meet the business requirement with precision.
1. Within the query we have saved as MDX021-2, replace the top comment line of the query with the following:
-- MDX021-3, Use of Subset() Function within the Ordered Query
2. Save the query as MDX021-3.
3. Within the query, click to the far right of "ON COLUMNS," in the following line:
{[Gender].Members} ON COLUMNS,
4. Press the Enter key a couple of times to create space between the line and the line that follows it.
5. Type the following into the new line:
SUBSET(
6. Place the cursor to the immediate right of the right curly brace ("}") in the following line of the query:
[Measures].[Unit Sales]), BDESC)} ON ROWS
7. Type a comma (" , "), a space, and then the following:
1, 12)
then another space.
The Query pane appears as shown in Illustration 4.
Illustration 4: The Query with Subset() Function in Place
Note that we set "1" as «Start», because, conveniently enough, we wish to exclude the "0" position anyway: the descending sort places the No Promotion line item at the top of the list, and the Marketing consumers who have defined the business requirement have asked that it be left out. We set "12" as the «Count», because the same information consumers have requested that we provide the metrics for the range of the top twelve promotions in the final version of this recurring report.
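For reference, following the steps above should leave the Query pane containing text along these lines. This is our reconstruction from the numbered steps (line breaks and spacing may differ from the illustration), and it assumes that the slicer was removed as directed when we built MDX021-2.

```mdx
-- MDX021-3, Use of Subset() Function within the Ordered Query
SELECT
{[Gender].Members} ON COLUMNS,

SUBSET(
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)}, 1, 12) ON ROWS
FROM
[Sales]
```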
8. Execute the query by clicking the Run Query button in the toolbar.
The Results pane is populated, and the dataset shown in Illustration 5 appears.
Illustration 5: Result Dataset - The Subset() Function in Action
9. Re-save the file as MDX021-3.
We have thus provided the Marketing department with the requested analytical data. Because we have built in, via the Order() function, the automatic sorting on the criteria requested, we can be confident that any future generation of the data via this query will provide the appropriate selection, together with the order that reflects the sort of the core dataset. Should the consumers return with a request to change the number of promotions to which they want to narrow their focus, we can accomplish this with a simple adjustment to the «Count» specification within the Subset() function we have placed into our query.
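Two possible adjustments to the rows axis definition are sketched below. The first simply changes the «Count» (a hypothetical request for the top five promotions). The second is an alternative of our own, not part of the exercise: rather than relying on No Promotion sorting to position "0", it removes that member with the Except() function before ordering, and then starts Subset() at "0". The unique name [Promotions].[All Promotions].[No Promotion] is an assumption about the FoodMart 2000 sample and may need adjusting.

```mdx
-- Variant 1 (sketch): top five promotions instead of twelve; only «Count» changes
SUBSET(
{ORDER([Promotions].[Promotion Name].Members, ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)}, 1, 5) ON ROWS

-- Variant 2 (sketch, assumed member name): exclude No Promotion explicitly,
-- so the selection no longer depends on where that member lands in the sort
SUBSET(
{ORDER(EXCEPT([Promotions].[Promotion Name].Members,
{[Promotions].[All Promotions].[No Promotion]}), ([Gender].[All Gender].[F],
[Measures].[Unit Sales]), BDESC)}, 0, 12) ON ROWS
```

Each variant replaces the rows axis definition of MDX021-3; only one of them would appear in a given query.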
10. Close the Sample Application when ready.
Summary ...
This article served as the conclusion of a group of three articles surrounding subset-related functions. We introduced the Subset() function, whose general purpose is to return a specified number of elements in a set, beginning at a point in the set that we designate via the «Start» value, and extending for a range of «Count» tuples. We commented upon the operation of the function, and then examined its syntax.
We undertook a multi-step practice example whereby we created a core query, then limited the results that the query returned through the use of the Subset() function, within the context of meeting an illustrative business requirement. We demonstrated the manner in which the Subset() function uses the «Start» and «Count» values we input to generate the precise results that we wish to obtain. We briefly discussed the results dataset we obtained with the Subset() function, together with other surrounding considerations. Throughout our examination of the Subset() function, we compared and contrasted the Subset() and the Head() and Tail() functions, from the perspective of usage and operation, in order to finely distinguish among them for the particular characteristics we need to meet specific business needs.
» See All Articles by Columnist William E. Pearson, III