Using MDX with DeepSee
Filtering a Query
|
|
This chapter describes ways to filter data in MDX queries. It
discusses the following topics:
For example, consider the following query:
SELECT {MEASURES.[%COUNT],MEASURES.[avg age]} ON 0, gend.gender.MEMBERS ON 1 FROM demomdx WHERE homed.redwood
Patient Count Avg Age
1 Female 536 36.70
2 Male 540 36.52
This query accesses only one slice of the cube, the slice of
patients whose home city is Redwood. For example:
In this case, the Redwood slice is the only part of the cube
that the query considers.
If the WHERE clause uses a set or a tuple, however, the phrase
slicer axis is less useful, because in these cases, the
cube is not truly being sliced.
More generally, the WHERE clause can contain a set expression
instead of a single member expression. In this case MDX combines the
records with logical AND. For example, the following query uses only
patients whose favorite color is red and patients who are male:
SELECT MEASURES.[%COUNT] ON 0, homed.city.MEMBERS ON 1 FROM demomdx WHERE{colord.red,gend.male}
%COUNT
1 Cedar Falls 66
2 Centerville 72
3 Cypress 76
4 Elm Heights 81
5 Juniper 74
6 Magnolia 63
7 Pine 71
8 Redwood 72
9 Spruce 58
In this case, the query uses the set
{colord.red,gend.male}, which consists of two members. When DeepSee accesses the fact table,
it finds the records associated with
colord.red and the records associated with
gend.male and
it uses all those records.
Important:
Each set element is used as a separate slicer axis, and the
results of all the slicer axes (of all %FILTER clauses) are aggregated
together. This is the process of
axis folding (a filter is considered to be a query axis). Axis folding means
that if a given source record has a non-null result for each slicer
axis, that record is counted multiple times.
In axis folding, values are combined according to the aggregation
method for that measure, as specified in the cube definition. (In
the examples here,
%COUNT is added.)
The next section discusses how to filter queries in yet another
way.
In the
WHERE clause, you can instead specify a single tuple or a set of tuples.
For example:
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx
WHERE (aged.[age group].[60 +],gend.male)
%COUNT
1 Cedar Falls 7
2 Centerville 9
3 Cypress
12
4 Elm Heights 14
5 Juniper 8
6 Magnolia 9
7 Pine 7
8 Redwood 6
9 Spruce 2
WITH SET myset as '{(aged.[age group].[60 +],diagd.chd),(aged.[age group].[60+],diagd.asthma)}'
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx WHERE myset
%COUNT
1 Cedar Falls 5
2 Centerville 5
3 Cypress 8
4 Elm Heights 3
5 Juniper 3
6 Magnolia 5
7 Pine 2
8 Redwood 5
When you filter the query itself, it is often useful to use
the NON EMPTY keyword, so that the query returns only the non-null
values. Include this keyword at the start of any set expression that
might return a null value. For example:
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx
WHERE (aged.[age bucket].[30 to 39],diagd.chd)
%COUNT
1 Elm Heights 1
2 Magnolia 1
In contrast, if we did not use NON EMPTY, the result would be
as follows:
SELECT MEASURES.[%COUNT] ON 0, homed.city.MEMBERS ON 1 FROM demomdx
WHERE (aged.[age bucket].[30 to 39],diagd.chd)
%COUNT
1 Cedar Falls *
2 Centerville *
3 Cypress *
4 Elm Heights 1
5 Juniper *
6 Magnolia 1
7 Pine *
8 Redwood *
9 Spruce *
It is
often necessary to exclude a single member of a level. To do this
easily, you can use the
%NOT function, which is an InterSystems extension:
SELECT aged.[age bucket].MEMBERS ON 1 FROM patients WHERE aged.[age group].[0 to 29].%NOT
1 0 to 9 *
2 10 to 19 *
3 20 to 29 *
4 30 to 39 166
5 40 to 49 139
6 50 to 59 106
7 60 to 69 86
8 70 to 79 62
9 80+ 41
Queries that use the
%NOT function run more quickly than equivalent queries that use EXCEPT.
Often
it is necessary for the WHERE clause to refer to multiple members.
For example:
SELECT gend.MEMBERS ON 1 FROM patients WHERE {allerd.[ant bites],allerd.soy,allerd.wheat}
1 Female 56
2 Male 59
This query construction, however, means that DeepSee evaluates
the query results multiple times (once for each item in the WHERE
clause) and then combines them. This can be undesirably slow and can
double-count items. (In this example, a given patient can be counted
as many as three times, once for each allergy in the WHERE clause.)
With the %OR function, you can rewrite the query as follows:
SELECT gend.MEMBERS ON 1 FROM patients WHERE %OR({allerd.[ant bites],allerd.soy,allerd.wheat})
1 Female 55
2 Male 57
Note the numbers are lower, because this query does not double-count
any patients. Also, this query is faster than the preceding.
As of release 2014.1, you can use %OR with a set that contains
members of different levels (or even that contains tuples). For example:
SELECT NON EMPTY [Measures].[%COUNT] ON 0 FROM [Patients]
WHERE %OR({[AgeD].[H1].[Age Bucket].&[80+],[DiagD].[H1].[Diagnoses].&[CHD]})
Patient Count
71