Using MDX with DeepSee
Working with Sets
|
|
This chapter discusses how to create and use sets. It discusses
the following topics:
A set contains zero or more elements, and there are three general
kinds of elements: members, scalar values, and tuples (for information
on tuples, see the next chapter,
Tuples and Cubes).
You use sets on the axes of MDX queries; you also use them to
build other sets. Note that although an MDX set can be empty, you
cannot use such an empty set as an axis.
{expression1, expression2, ...}
This list can include any number of elements. In DeepSee MDX,
if the list includes only one element, you can omit the curly braces.
Each set element can be one of the following:
-
-
An explicit reference to a single member by name.
-
An expression that uses an MDX function to return
a single member.
-
An expression that uses an MDX function, like
MEMBERS, to return a set.
-
-
A scalar value, which is one of the following:
-
A reference to a measure. The expression
MEASURES.[Avg Test Score] is a scalar value; it returns
either a number or null in all contexts.
-
A numeric or string constant such as
37 or
"label".
-
-
A percentage literal. For example:
10%
There must be no space between the number and the percent
sign.
-
An expression that uses an MDX function to return
a scalar value.
It is often useful to create a set and assign a name to
it, so that you can reuse that set in multiple ways. Also, the syntax
of a query is often easier to read when you use named sets.
You can create one or more named sets within a query, as follows:
WITH with_clause1 with_clause2 ... SELECT query_details
-
SET set_name AS 'set_expression'
-
Then your query can refer the named set by name in all the places
where you can use other set expressions.
WITH SET testset AS '{homed.city.members}'
SELECT MEASURES.[%COUNT] ON 0, testset ON 1 FROM demomdx
%COUNT
1 Cedar Falls 184
2 Centerville 194
3 Cypress 134
4 Elm Heights 146
5 Juniper 176
6 Magnolia 169
7 Pine 118
8 Redwood 182
9 Spruce 197
Every set has an inherent order (a first member, a second member,
and so on).
When you use an MDX function to return a set, that function
determines the order of the members in the set. Wherever possible,
MDX functions use the natural order of members as specified in the
cube definition.
For example, the MEMBERS function returns the members of a level
in the order specified in the cube definition (typically in alphabetic
order or date order, depending on the level).
When you construct a set as described in the previous section,
the order in which you list elements controls the order of the members
in the set. For example, suppose that you specify a set as follows:
{gend.gender.MEMBERS,allerd.allergies.MEMBERS}
This set consists of the members of the
Gender dimension, followed by the members of the
Allergies dimension.
SUBSET returns a set of members from a
given set, by position. You specify a set, the starting position,
and the number of members to return. The starting position is 0. For
example:
SELECT MEASURES.[%COUNT] ON 0,SUBSET(homed.city.MEMBERS,0,3) ON 1 FROM demomdx
%COUNT
1 Cedar Falls 110
2 Centerville 99
3 Cypress 112
The
EXCEPT function provides
another way to get a subset; see the next section.
This section describes ways to sort sets. It discusses the following
topics:
It is
often useful to sort members by the value of some measure. For example,
you might want to sort departments by response, so that you can look
at the departments with the slowest responses. Or you might sort products
by their sales revenues so that you can look at the top-ranked products.
To return a set in the order that you specify, use the
ORDER function. This function takes an
argument, typically a measure reference, that specifies the value
to use when determining the order of the set members. For example:
SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.city.MEMBERS,MEASURES.[avg test score],BDESC) ON 1 FROM demomdx
Avg Test Score
1 Juniper 75.08
2 Redwood 75.07
3 Cedar Falls 75.03
4 Elm Heights 74.96
5 Pine 74.76
6 Spruce 74.47
7 Magnolia 74.13
8 Cypress 73.96
9 Centerville 73.79
The optional third argument can be one of the following:
-
ASC (the default) Use this
to sort in ascending order, while preserving the hierarchy, if applicable.
-
DESC Use this to sort in
descending order, while preserving the hierarchy, if applicable.
-
BASC Use this to break
the hierarchy and sort all members in ascending order.
-
BDESC Use this to break
the hierarchy and sort all members in descending order.
For example, the following query breaks the hierarchy:
SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.MEMBERS,MEASURES.[avg testscore],BDESC) ON 1 FROM demomdx
Avg Test Score
1 Juniper 75.08
2 Redwood 75.07
3 32007 75.07
4 Cedar Falls 75.03
5 38928 75.00
6 Elm Heights 74.96
7 32006 74.78
8 Pine 74.76
9 Spruce 74.47
10 34577 74.28
11 Magnolia 74.13
12 Cypress 73.96
13 Centerville 73.79
14 36711 73.79
In contrast, the following preserves the hierarchy:
SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.MEMBERS,MEASURES.[avg testscore],DESC) ON 1 FROM demomdx
Avg Test Score
1 32007 75.07
2 Redwood 75.07
3 38928 75.00
4 Cedar Falls 75.03
5 Elm Heights 74.96
6 32006 74.78
7 Juniper 75.08
8 Spruce 74.47
9 34577 74.28
10 Pine 74.76
11 Magnolia 74.13
12 Cypress 73.96
13 36711 73.79
14 Centerville 73.79
It is useful to sort items in some way and then choose a subset
from the top or bottom, such as the top five. The following MDX functions
enable you to do so.
-
HEAD and
TAIL return the first part or the last part of the set, respectively,
given a member count. For example:
SELECT MEASURES.[%COUNT] ON 0,HEAD(homed.city.MEMBERS,3) ON 1 FROM demomdx
%COUNT
1 Cedar Falls 110
2 Centerville 99
3 Cypress 112
The members of the returned set have the same order as in the
original set.
-
TOPCOUNT and
BOTTOMCOUNT are similar to HEAD and
TAIL, respectively, but also include an optional argument to specify
how to sort the set before extracting the subset.
For
example, the following query returns the top-rated cities, by patient
count.
SELECT MEASURES.[%COUNT] ON 0,TOPCOUNT(homed.city.MEMBERS,4,MEASURES.[%COUNT]) ON 1 FROM demomdx
%COUNT
1 Juniper 122
2 Pine 121
3 Elm Heights 118
4 Magnolia 114
SELECT MEASURES.[avg test score] ON 0,TOPCOUNT(homed.city.MEMBERS,5,MEASURES.[avg test score]) ON 1 FROM demomdx
Avg Test Score
1 Juniper 75.08
2 Redwood 75.07
3 Cedar Falls 75.03
4 Elm Heights 74.96
5 Pine 74.76
-
TOPSUM and
BOTTOMSUM are similar to TOPCOUNT and
BOTTOMCOUNT, respectively. Instead of specifying the number of members
to return, however, you specify a cutoff value that is applied to
a total across the members. For example, you could retrieve the products
that account for the top $5 million in sales.
-
TOPPERCENT and
BOTTOMPERCENT are similar to TOPCOUNT
and BOTTOMCOUNT, respectively. Instead of specifying the number of
members to return, however, you specify a cutoff percentage that is
applied to a total across the members. For example, you could retrieve
the products that account for the top 10% of sales.
The
HIERARCHIZE function
accepts a set of members from the same dimension and returns a set
containing those members in hierarchical order, that is, the order
specified by the hierarchy. For example:
SELECT MEASURES.[%COUNT] ON 0,
HIERARCHIZE({homed.36711,homed.38928,homed.[elm heights],homed.Spruce}) ON 1 FROM demomdx
%COUNT
1 36711 99
2 Spruce 93
3 38928 228
4 Elm Heights 118
If the members belong to different hierarchies in the dimension,
the different hierarchies are returned in an arbitrary order.
Sets are the building blocks of MDX queries. When you write
an MDX query, you must specify a set to use on each axis. DeepSee
supports the following MDX functions that you can use to combine sets:
-
UNION combines two sets (optionally
discarding any duplicate members) and returns a set that contains
all the members of these sets. For example:
WITH SET set1 AS '{allerd.eggs,allerd.soy,allerd.wheat}'
SET set2 AS '{allerd.[dairy products],allerd.pollen,allerd.soy,allerd.wheat}'
SELECT MEASURES.[%COUNT] ON 0, UNION(set1,set2) ON 1 FROM demomdx
%COUNT
1 eggs 32
2 soy 36
3 wheat 52
Because the query does not use UNION with the ALL keyword, duplicates
are removed.
-
INTERSECT examines
two sets and returns a set that contains all the members that are
in both sets, optionally retaining duplicates. For example:
WITH SET set1 AS 'TOPCOUNT(homed.city.members,5,MEASURES.[avg allergy count])'
SET set2 AS 'TOPCOUNT(homed.city.members,5,MEASURES.[avg age])'
SELECT MEASURES.[%COUNT] ON 0, INTERSECT(set1,set2) ON 1 FROM demomdx
%COUNT
1 Magnolia 114
2 Redwood 111
3 Cypress 112
4 Cedar Falls 110
-
EXCEPT examines two
sets and removes the members in the first set that also exist in the
second set, optionally retaining duplicates. For example:
WITH SET set1 AS '{allerd.eggs,allerd.eggs,allerd.soy,allerd.wheat}'
SET set2 AS '{allerd.[diary products],allerd.pollen,allerd.wheat}'
SELECT MEASURES.[%COUNT] ON 0, EXCEPT(set1,set2) ON 1 FROM demomdx
%COUNT
1 eggs 32
2 soy 36
-
CROSSJOIN returns
a set that consists of the cross-product of two sets. Both sets can
consist of members. Or one set can consist of members and the other
set can consist of measures. If both sets contain measures, the DeepSee
engine issues the error
Two measures cannot be crossjoined. For example:
SELECT MEASURES.[%COUNT] ON 0, CROSSJOIN(diagd.diagnoses.MEMBERS,
aged.[age group].MEMBERS) ON 1 FROM demomdx
%COUNT
1 None->0 to 29 389
2 None->30 to 59 333
3 None->60+ 106
4 asthma->0 to 29 40
5 asthma->30 to 59 39
6 asthma->60+ 11
7 CHD->0 to 29 *
8 CHD->30 to 59 12
9 CHD->60+ 25
10 diabetes->0 to 29 1
11 diabetes->30 to 59 20
12 diabetes->60+ 24
13 osteoporosis->0 to 29 *
14 osteoporosis->30 to 59 *
15 osteoporosis->60+ 22
You
can also examine measure values for the members in a set and use those
values to filter the set. To do so, you use the
FILTER function.
The
FILTER function uses
a set and a logical expression. It examines a set and returns the
subset in which the given logical expression is true for each member.
The logical expression typically compares a measure value to a constant
or to another measure value. For example:
SELECT MEASURES.[%COUNT] ON 0, FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115) ON 1 FROM demomdx
%COUNT
1 Elm Heights 118
2 Juniper 122
3 Pine 121
It is important to understand that this filtering occurs at
an aggregate level: The measure value is computed for each possible
member in the query. The FILTER function considers those aggregate
values and removes members as appropriate.
You can use the same function with member properties as follows:
SELECT homed.h1.CURRENTMEMBER.PROPERTIES("Population") ON 0,
FILTER(homed.city.MEMBERS,homed.h1.CURRENTMEMBER.PROPERTIES("Population")>20000) ON 1 FROM demomdx
ZIP
1 Cedar Falls 90,000
2 Centerville 49,000
3 Elm Heights 33,194
4 Redwood 29,192
In some cases, a set might contain null elements. For example, the
CROSSJOIN function could potentially
return null elements (as is shown in the preceding section).
If you precede the set expression with the keyword NON EMPTY,
DeepSee suppresses the null elements. For example:
SELECT MEASURES.[%COUNT] ON 0,NON EMPTY CROSSJOIN(diagd.diagnoses.MEMBERS,
aged.[age group].MEMBERS) ON 1 FROM demomdx
%COUNT
1 None->0 to 29 389
2 None->30 to 59 333
3 None->60+ 106
4 asthma->0 to 29 40
5 asthma->30 to 59 39
6 asthma->60+ 11
7 CHD->30 to 59 12
8 CHD->60+ 25
9 diabetes->0 to 29 1
10 diabetes->30 to 59 20
11 diabetes->60+ 24
12 osteoporosis->60+ 22
When
you combine sets, you may want to remove duplicates. This is true
especially when you have created and combined sets in multiple steps.
To be certain that the resulting set has no duplicates, you use the
DISTINCT function.
For example, suppose that the query must return a specific city
as reference, which is needed for comparison to the other cities.
Consider the following query, which displays a reference city, followed
by a set of cities with a given patient count:
WITH SET refcity AS '{homed.juniper}' SELECT MEASURES.[%COUNT] ON 0,
{refcity,FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115)} ON 1 FROM demomdx
%COUNT
1 Juniper 122
2 Elm Heights 118
3 Juniper 122
4 Pine 121
Compare to the following query, which removes the duplicate
reference city:
WITH SET refcity AS '{homed.juniper}' SELECT MEASURES.[%COUNT] ON 0,
DISTINCT({refcity,FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115)}) ON 1 FROM demomdx
%COUNT
1 Juniper 122
2 Elm Heights 118
3 Pine 121
To count
the elements of a set, use the
COUNT function. For example:
SELECT COUNT(docd.doctor.MEMBERS) ON 0 FROM demomdx
COUNT
41
By default,
COUNT considers
any empty elements and counts them along with the non-empty elements.
If you use the EXCLUDEEMPTY keyword as the second argument, this function
returns the number of non-empty elements.
To see this, first consider the following query:
SELECT aged.[age group].MEMBERS ON 0, diagd.diagnoses.MEMBERS ON 1 FROM demomdx WHERE MEASURES.[%COUNT]
0 to 29 30 to 59 60+
1 None 389 333 106
2 asthma 40 39 11
3 CHD * 12 25
4 diabetes 1 20 24
5 osteoporosis * * 22
The following query counts the number of members of the
Diagnoses level and uses the
WHERE clause to get only patients in the age group 0 to 29:
WITH SET myset AS 'diagd.diagnoses.MEMBERS' SELECT COUNT(myset) ON 0 FROM demomdx WHERE aged.[0 to 29]
COUNT
5
In this query,
COUNT returns
5 because it considers empty members. In contrast:
WITH SET myset AS 'diagd.diagnoses.MEMBERS' SELECT COUNT(myset,EXCLUDEEMPTY) ON 0 FROM demomdx WHERE aged.[0 to 29]
COUNT
3