Using MDX with DeepSee
Tuples and Cubes
|
|
This chapter discusses two additional key concepts in MDX: tuples
and cubes. It discusses the following topics:
A tuple is a combination of members from different dimensions.
Each tuple has a single value (possibly null).
Every data cell in a result set is a tuple. For example, consider
the following query:
SELECT MEASURES.[%COUNT] ON 0, homed.city.members ON 1 FROM demomdx
%COUNT
1 Cedar Falls 110
2 Centerville 99
3 Cypress 112
4 Elm Heights 118
5 Juniper 122
6 Magnolia 114
7 Pine 121
8 Redwood 111
9 Spruce 93
This query returns a set of nine tuples. For example, the first
tuple is a combination of
Cedar Falls (from the
City dimension) and
%COUNT (from the
Measures dimension).
You can create a tuple directly, via the following syntax:
(member_expr1, member_expr2, member_expr3, ...)
In other implementations of MDX, each of these member expressions
must be associated with a different dimension. This means that a tuple
cannot include more than one member from the same dimension.
In DeepSee MDX, a tuple expression can include more than one
member expression from the same dimension. In most cases, the result
is null, because in most cases, a record belongs to only one member.
However, in DeepSee, a level can be based on a list value, which means
that a given record can belong to multiple members. For example, the
tuple
(allerd.soy,allerd.wheat) represents all
patients who are allergic to both soy and wheat.
A tuple is either fully qualified or partially qualified:
-
If the tuple expression refers to each dimension in
the cube, the tuple is
fully qualified. A fully
qualified tuple refers to a very small number of records and is too
granular to be commonly used.
-
If the tuple expression does not refer to each dimension
in the cube, the tuple is
partially qualified. A partially qualified tuple can be very useful, especially when
used to filter the data used by the query.
If a tuple
refers to only one member, the tuple is equivalent to that member.
For example, the following expressions both access the same data:
The expression
colord.red is a member expression
that refers to the
Red member of the
ColorD dimension.
Each expression accesses only the patients whose favorite color
is red.
You can create sets of tuples, by enclosing a comma-separated
list of tuple expressions within curly braces:
{tuple_expression1, tuple_expression2, ...}
(Note that in other implementations of MDX, for any tuple in
a set, you must construct each tuple in the same way. For example,
if the first tuple uses dimension A in its first list item, all the
other tuples must do so as well. DeepSee MDX does not have this restriction.)
SELECT MEASURES.[%COUNT] ON 0, CROSSJOIN(gend.gender.MEMBERS,homed.city.members) ON 1 FROM demomdx
%COUNT
1 Female->Cedar Falls 58
2 Female->Centerville 41
3 Female->Cypress 51
4 Female->Elm Heights 53
5 Female->Juniper 58
6 Female->Magnolia 58
7 Female->Pine 64
8 Female->Redwood 58
9 Female->Spruce 47
10 Male->Cedar Falls 52
11 Male->Centerville 58
12 Male->Cypress 61
13 Male->Elm Heights 65
14 Male->Juniper 64
15 Male->Magnolia 56
16 Male->Pine 57
17 Male->Redwood 53
18 Male->Spruce 46
You can use these set expressions in all the places where set
expressions are permitted:
-
-
-
As an argument to an MDX function that uses a set
Every tuple has a value (which might be null).
The value of a tuple is determined as follows:
-
DeepSee finds the rows in the fact table that correspond
to all the non-measure members used in the tuple expression.
-
DeepSee then finds values for those rows as follows:
-
If the tuple expression includes a specific measure,
DeepSee finds the value of that measure for each relevant row of the
fact table.
-
If the tuple expression does not include a specific
measure, DeepSee uses the default measure (typically,
%COUNT).
-
DeepSee aggregates those values together, using the
aggregation function specified for the measure.
For example, consider the following tuple:
(homed.32006,colord.red,allerd.[dairy products],MEASURES.[avg test score])
To determine the value of this tuple, DeepSee finds all the
patients in the fact table that belong to the 32006 ZIP code, and
whose favorite color is red, and who are allergic to dairy products.
DeepSee then accesses the values for the
Test Score measure for those patients and averages those values.
For another example, consider the following tuple (permitted
in DeepSee MDX):
(allerd.soy,allerd.wheat)
To determine the value of this tuple, DeepSee counts the patients
who are allergic to both soy and to wheat.
Finally, consider the following tuple:
(homed.juniper,homed.centerville)
To determine the value of this tuple, DeepSee counts the patients
whose home city is Juniper and whose home city is Centerville. The
value of this tuple is null, because each patient has one home city.
A tuple expression can refer to a member at any level in any
hierarchy of a dimension. Consider the following dimension (from the
Patients cube) which includes one hierarchy with four levels:
You can create tuples that use members of any of these levels.
For example, you can use any of the following tuple expressions:
(aged.[all patients])
(aged.[0 to 29])
(aged.5)
For another example, let us create variations of the preceding
expressions. In this case, let us include members of other dimensions
in the tuple expressions:
(aged.[all patients],gend.male)
(aged.[0 to 29],diagd.asthma)
(aged.5,allerd.soy,colord.red)
You can use sets of tuples as axes of a query. The following
example shows the simplest case, a set that consists of one tuple:
SELECT MEASURES.[%COUNT] ON 0, (homed.juniper,allerd.wheat,aged.[20 to 29]) ON 1 FROM demomdx
%COUNT
Juniper->wheat->20 to 29 1
The following example shows a set of tuples used as a query
axis:
WITH SET myset AS
'{(homed.[cedar falls],allerd.soy,colord.red),(homed.magnolia,allerd.soy,colord.green),
(homed.34577,allerd.eggs,colord.green)}'
SELECT MEASURES.[%COUNT] ON 0, myset ON 1 FROM demomdx
%COUNT
1 Cedar Falls->soy->Red *
2 Magnolia->soy->Green 1
3 34577->eggs->Green *
For another example, the following is a valid query in DeepSee
MDX:
WITH SET myset AS
'{(homed.[cedar falls],allerd.soy,colord.green),(colord.red,allerd.soy,homed.pine,gend.male)}'
SELECT MEASURES.[%COUNT] ON 0, myset ON 1 FROM demomdx
%COUNT
1 Cedar Falls->soy->Green *
2 Red->soy->Pine->Male *
Finally, the following example uses tuples that refer multiple
times to a single dimension:
SELECT MEASURES.[%COUNT] ON 0,
{(allerd.soy,allerd.wheat),(homed.juniper,homed.centerville)} ON 1 FROM demomdx
%COUNT
1 soy->wheat 4
2 Juniper->Centerville *
There are 4 patients who are allergic to both soy and wheat.
There are no patients with two home cities.
A cube is an n-dimensional structure that contains one
axis (or
edge) for each dimension.
The cells of this cube are tuples. An MDX query retrieves specific
tuples from the cube.
It is useful to visualize this cube, at least in simple cases.
The
DemoMDX cube has 10 dimensions (including the
Measures dimension). For the sake of simplicity, the following
figure shows three of those dimensions (
HomeD,
GenD, and
Measures). Note that only
three measures are actually shown.
Each axis of the cube is divided into segments, with one segment
for each of the lowest-level members of the corresponding dimension.
For the
HomeD axis, these segments are the members
of the
City level.
Each cell in the cube is a fully qualified tuple. Each tuple
has a value, as shown in the figure.
An MDX query is a request for a set of tuples, each of which
has a value. Consider the following query:
SELECT CROSSJOIN(MEASURES.[%COUNT],gend.gender.MEMBERS) ON 0, homed.city.MEMBERS ON 1 FROM demomdx
Female Male
1 Cedar Falls 569 571
2 Centerville 625 560
3 Cypress 575 543
4 Elm Heights 545 560
5 Juniper 570 546
6 Magnolia 566 503
7 Pine 562 563
8 Redwood 536 540
9 Spruce 550 516
Each measure that is aggregated by addition (such as
Age) is contained directly in the cube. For other measures,
MDX uses values from the cube and aggregates them as specified in
the measure definition.
For example, the
Avg Age measure is not contained
directly in the cube, but the
Age measure is; the
Age measure contains the cumulative age of all the patients
represented in a tuple. To calculate the
Avg Age measure, MDX divides
Age by
%COUNT. Consider the following query:
SELECT CROSSJOIN(MEASURES.[avg age],gend.gender.MEMBERS) ON 0, homed.city.members ON 1 FROM demomdx
Female Male
1 Cedar Falls 36.90 34.56
2 Centerville 35.98 34.68
3 Cypress 37.02 33.55
4 Elm Heights 36.87 34.05
5 Juniper 38.09 34.26
6 Magnolia 35.64 35.03
7 Pine 36.64 33.38
8 Redwood 36.70 36.52
9 Spruce 37.90 32.93
For now, we consider only dimensions that contain a single hierarchy.
For any dimension, only the lowest level is represented directly
on the corresponding cube axis.
For example, the following figure shows all the levels of the
HomeD dimension:
Notice that the
HomeD axis includes only
the leaf members of this dimension that is, only the members
of its lowest level. The higher levels consist of combinations of
lower members. For example, each member of the
ZIP level consists of one or more members of the
City dimension.
Now consider the following query:
SELECT CROSSJOIN(MEASURES.[%COUNT],gend.gender.MEMBERS) ON 0, homed.zip.members ON 1 FROM demomdx
Female Male
1 32006 1,120 1,062
2 32007 536 540
3 34577 1,703 1,609
4 36711 625 560
5 38928 1,114 1,131
For this query, DeepSee finds the relevant tuples of the cube
and obtains their values.
A dimension can have multiple hierarchies. For a dimension that
includes multiple hierarchies, the corresponding axis of the cube
contains one segment for each member of the lowest level in each hierarchy.
Consider the following theoretical cube:
Members of cube 'theoretical':
-----------------------------------------
...
Dimensions
...
Sales Date
H1
Sales Year
Sales Period
Sales Date
H2
Sales Quarter
...
The
Sales Date dimension contains two hierarchies.
The
H1 hierarchy has three levels:
The other hierarchy contains only one level.
(For reasons of space, the picture of the cube is not divided
into tuples.)
When a query uses, for example, the
Sales Quarter level, DeepSee uses the appropriate part of this axis and accesses
the requested tuples.