Filtering a Query

This chapter describes ways to filter data in MDX queries. It discusses the following topics:

Introduction to the WHERE Clause

As noted in the section “Including a Simple Filter in a Query,” earlier in this book, an MDX query itself can also include a filter (the WHERE clause). The WHERE clause of an MDX query is commonly referred to as the slicer axis. If the WHERE clause contains only one member, the system accesses only a slice of the cube.

For example, consider the following query:

SELECT {MEASURES.[%COUNT],MEASURES.[avg age]} ON 0, gend.gender.MEMBERS ON 1 FROM demomdx WHERE homed.redwood
 
                       Patient Count              Avg Age
1 Female                         536                36.70
2 Male                           540                36.52

This query accesses only one slice of the cube, the slice of patients whose home city is Redwood. For example:

In this case, the Redwood slice is the only part of the cube that the query considers.

If the WHERE clause uses a set or a tuple, however, the phrase slicer axis is less useful, because in these cases, the cube is not truly being sliced.

Using a Set in the WHERE Clause

More generally, the WHERE clause can contain a set expression instead of a single member expression. In this case MDX combines the records with logical AND. For example, the following query uses only patients whose favorite color is red and patients who are male:

SELECT MEASURES.[%COUNT] ON 0, homed.city.MEMBERS ON 1 FROM demomdx WHERE{colord.red,gend.male}
 
                                    %COUNT
1 Cedar Falls                            66
2 Centerville                            72
3 Cypress                                76
4 Elm Heights                            81
5 Juniper                                74
6 Magnolia                               63
7 Pine                                   71
8 Redwood                                72
9 Spruce                                 58

In this case, the query uses the set {colord.red,gend.male}, which consists of two members. When DeepSee accesses the fact table, it finds the records associated with colord.red and the records associated with gend.male and it uses all those records.

Important:

Each set element is used as a separate slicer axis, and the results of all the slicer axes (of all %FILTER clauses) are aggregated together. This is the process of axis folding (a filter is considered to be a query axis). Axis folding means that if a given source record has a non-null result for each slicer axis, that record is counted multiple times.

In axis folding, values are combined according to the aggregation method for that measure, as specified in the cube definition. (In the examples here, %COUNT is added.)

For more details, see “Axis Folding” in the appendix “How the DeepSee Query Engine Works” in the DeepSee Implementation Guide.

The next section discusses how to filter queries in yet another way.

Using Tuples in the WHERE Clause

In the WHERE clause, you can instead specify a single tuple or a set of tuples. For example:

SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx 
WHERE (aged.[age group].[60 +],gend.male)
 
                                    %COUNT
1 Cedar Falls                             7
2 Centerville                             9
3 Cypress                          
      12
4 Elm Heights                            14
5 Juniper                                 8
6 Magnolia                                9
7 Pine                                    7
8 Redwood                                 6
9 Spruce                                  2

For another example:

WITH SET myset as '{(aged.[age group].[60 +],diagd.chd),(aged.[age group].[60+],diagd.asthma)}' 
SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx WHERE myset
 
                                    %COUNT
1 Cedar Falls                             5
2 Centerville                             5
3 Cypress                                 8
4 Elm Heights                             3
5 Juniper                                 3
6 Magnolia                                5
7 Pine                                    2
8 Redwood                                 5

When you filter the query itself, it is often useful to use the NON EMPTY keyword, so that the query returns only the non-null values. Include this keyword at the start of any set expression that might return a null value. For example:

SELECT MEASURES.[%COUNT] ON 0, NON EMPTY homed.city.MEMBERS ON 1 FROM demomdx 
WHERE (aged.[age bucket].[30 to 39],diagd.chd)
 
                                    %COUNT
1 Elm Heights                             1
2 Magnolia                                1

In contrast, if we did not use NON EMPTY, the result would be as follows:

SELECT MEASURES.[%COUNT] ON 0, homed.city.MEMBERS ON 1 FROM demomdx 
WHERE (aged.[age bucket].[30 to 39],diagd.chd)
 
                                    %COUNT
1 Cedar Falls                             *
2 Centerville                             *
3 Cypress                                 *
4 Elm Heights                             1
5 Juniper                                 *
6 Magnolia                                1
7 Pine                                    *
8 Redwood                                 *
9 Spruce                                  *

The %NOT Optimization

It is often necessary to exclude a single member of a level. To do this easily, you can use the %NOT function, which is an InterSystems extension:

SELECT aged.[age bucket].MEMBERS ON 1 FROM patients WHERE aged.[age group].[0 to 29].%NOT
 
 1 0 to 9                                  *
2 10 to 19                                *
3 20 to 29                                *
4 30 to 39                              166
5 40 to 49                              139
6 50 to 59                              106
7 60 to 69                               86
8 70 to 79                               62
9 80+                                    41

Queries that use the %NOT function run more quickly than equivalent queries that use EXCEPT.

The %OR Optimization

Often it is necessary for the WHERE clause to refer to multiple members. For example:

SELECT gend.MEMBERS ON 1 FROM patients WHERE {allerd.[ant bites],allerd.soy,allerd.wheat}
 
 
1 Female                                 56
2 Male                                   59

This query construction, however, means that DeepSee evaluates the query results multiple times (once for each item in the WHERE clause) and then combines them. This can be undesirably slow and can double-count items. (In this example, a given patient can be counted as many as three times, once for each allergy in the WHERE clause.)

With the %OR function, you can rewrite the query as follows:

SELECT gend.MEMBERS ON 1 FROM patients WHERE %OR({allerd.[ant bites],allerd.soy,allerd.wheat})
 
 
1 Female                                 55
2 Male                                   57

Note the numbers are lower, because this query does not double-count any patients. Also, this query is faster than the preceding.

As of release 2014.1, you can use %OR with a set that contains members of different levels (or even that contains tuples). For example:

SELECT NON EMPTY [Measures].[%COUNT] ON 0 FROM [Patients] 
WHERE %OR({[AgeD].[H1].[Age Bucket].&[80+],[DiagD].[H1].[Diagnoses].&[CHD]})
 
                             Patient Count
                                         71