Filtering a Query
This page describes ways to filter data in Business Intelligence MDX queries.
Also see Accessing the BI Samples.
Introduction to the WHERE Clause
An MDX query itself can also include a filter (the WHERE clause). The WHERE clause of an MDX query is commonly referred to as the slicer axis. If the WHERE clause contains only one member, the system accesses only a slice of the cube.
For example, consider the following query:
SELECT {MEASURES.[%COUNT],MEASURES.[avg age]} ON 0, gend.gender.MEMBERS ON 1 FROM demomdx WHERE homed.redwood
Patient Count Avg Age
1 Female 536 36.70
2 Male 540 36.52
This query accesses only one slice of the cube, the slice of patients whose home city is Redwood. For example:
In this case, the Redwood slice is the only part of the cube that the query considers.
If the WHERE clause uses a set or a tuple, however, the phrase slicer axis is less useful, because in these cases, the cube is not truly being sliced.
Using a Set in the WHERE Clause
More generally, the WHERE clause can contain a set expression instead of a single member expression. In this case MDX combines the records with logical AND. For example, the following query uses only patients whose favorite color is red and patients who are male:
SELECT MEASURES.[%COUNT] ON 0, homed.city.MEMBERS ON 1 FROM demomdx WHERE{colord.red,gend.male}
%COUNT
1 Cedar Falls 66
2 Centerville 72
3 Cypress 76
4 Elm Heights 81
5 Juniper 74
6 Magnolia 63
7 Pine 71
8 Redwood 72
9 Spruce 58
In this case, the query uses the set {colord.red,gend.male}, which consists of two members. When the system accesses the fact table, it finds the records associated with colord.red and the records associated with gend.male and it uses all those records.
Each set element is used as a separate slicer axis, and the results of all the slicer axes (of all %FILTER clauses) are aggregated together. This is the process of axis folding (a filter is considered to be a query axis). Axis folding means that if a given source record has a non-null result for each slicer axis, that record is counted multiple times.
In axis folding, values are combined according to the aggregation method for that measure, as specified in the cube definition. (In the examples here, %COUNT is added.)
For more details, see Axis Folding in the article How the Business Intelligence Query Engine Works in Implementing InterSystems Business Intelligence.
The next section discusses how to filter queries in yet another way.
Using Tuples in the WHERE Clause
In the WHERE clause, you can instead specify a single tuple or a set of tuples. For example:
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx
WHERE (aged.[age group].[60 +],gend.male)
%COUNT
1 Cedar Falls 7
2 Centerville 9
3 Cypress
12
4 Elm Heights 14
5 Juniper 8
6 Magnolia 9
7 Pine 7
8 Redwood 6
9 Spruce 2
For another example:
WITH SET myset as '{(aged.[age group].[60 +],diagd.chd),(aged.[age group].[60+],diagd.asthma)}'
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx WHERE myset
%COUNT
1 Cedar Falls 5
2 Centerville 5
3 Cypress 8
4 Elm Heights 3
5 Juniper 3
6 Magnolia 5
7 Pine 2
8 Redwood 5
When you filter the query itself, it is often useful to use the NON EMPTY keyword, so that the query returns only the non-null values. Include this keyword at the start of any set expression that might return a null value. For example:
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx
WHERE (aged.[age bucket].[30 to 39],diagd.chd)
%COUNT
1 Elm Heights 1
2 Magnolia 1
In contrast, if we did not use NON EMPTY, the result would be as follows:
SELECT MEASURES.[%COUNT] ON 0, homed.city.MEMBERS ON 1 FROM demomdx
WHERE (aged.[age bucket].[30 to 39],diagd.chd)
%COUNT
1 Cedar Falls *
2 Centerville *
3 Cypress *
4 Elm Heights 1
5 Juniper *
6 Magnolia 1
7 Pine *
8 Redwood *
9 Spruce *
The %NOT Optimization
It is often necessary to exclude a single member of a level. To do this easily, you can use the %NOT function, which is an InterSystems extension:
SELECT aged.[age bucket].MEMBERS ON 1 FROM patients WHERE aged.[age group].[0 to 29].%NOT
1 0 to 9 *
2 10 to 19 *
3 20 to 29 *
4 30 to 39 166
5 40 to 49 139
6 50 to 59 106
7 60 to 69 86
8 70 to 79 62
9 80+ 41
Queries that use the %NOT function run more quickly than equivalent queries that use EXCEPT.
The %OR Optimization
Often it is necessary for the WHERE clause to refer to multiple members. For example:
SELECT gend.MEMBERS ON 1 FROM patients WHERE {allerd.[ant bites],allerd.soy,allerd.wheat}
1 Female 56
2 Male 59
This query construction, however, means that the system evaluates the query results multiple times (once for each item in the WHERE clause) and then combines them. This can be undesirably slow and can double-count items. (In this example, a given patient can be counted as many as three times, once for each allergy in the WHERE clause.)
With the %OR function, you can rewrite the query as follows:
SELECT gend.MEMBERS ON 1 FROM patients WHERE %OR({allerd.[ant bites],allerd.soy,allerd.wheat})
1 Female 55
2 Male 57
Note the numbers are lower, because this query does not double-count any patients. Also, this query is faster than the preceding.
You can use %OR with a set that contains members of different levels (or even that contains tuples). For example:
SELECT NON EMPTY [Measures].[%COUNT] ON 0 FROM [Patients]
WHERE %OR({[AgeD].[H1].[Age Bucket].&[80+],[DiagD].[H1].[Diagnoses].&[CHD]})
Patient Count
71