Working with Sets
This page discusses how to create and use sets in Business Intelligence.
Also see Accessing the BI Samples.
Introduction to Sets
A set contains zero or more elements, and there are three general kinds of elements: members, scalar values, and tuples (for information on tuples, see Tuples and Cubes).
You use sets on the axes of MDX queries; you also use them to build other sets. Note that although an MDX set can be empty, you cannot use such an empty set as an axis.
Creating Set Expressions
The general syntax for a set expression is as follows:
{expression1, expression2, ...}
This list can include any number of elements. In InterSystems MDX, if the list includes only one element, you can omit the curly braces.
Each set element can be one of the following:
-
A member expression, which is either of the following:
-
An explicit reference to a single member by name.
-
An expression that uses an MDX function to return a single member.
-
-
An expression that uses an MDX function, like MEMBERS, to return a set.
-
A range of members in the same level, as shown in Working With Levels:
member1:membern
-
A scalar value, which is one of the following:
-
A reference to a measure. The expression MEASURES.[Avg Test Score] is a scalar value; it returns either a number or null in all contexts.
-
A numeric or string constant such as 37 or "label".
-
A numeric expression such as (37+3)/2.
-
A percentage literal. For example: 10%
There must be no space between the number and the percent sign.
-
An expression that uses an MDX function to return a scalar value.
For example, the PROPERTIES function returns a scalar value; for an introduction to this function, see Working With Levels.
For another example, the AVG function and other summary functions return scalar values; see Adding Summaries.
-
For complete details, see the InterSystems MDX Reference.
Creating Named Sets
It is often useful to create a set and assign a name to it, so that you can reuse that set in multiple ways. Also, the syntax of a query is often easier to read when you use named sets.
You can create one or more named sets within a query, as follows:
WITH with_clause1 with_clause2 ... SELECT query_details
Where:
-
Each expression with_clause1, with_clause2, and so on has the following syntax:
SET set_name AS 'set_expression'
-
query_details is your MDX query.
Then your query can refer the named set by name in all the places where you can use other set expressions.
For example:
WITH SET testset AS '{homed.city.members}'
SELECT MEASURES.[%COUNT] ON 0, testset ON 1 FROM demomdx
%COUNT
1 Cedar Falls 184
2 Centerville 194
3 Cypress 134
4 Elm Heights 146
5 Juniper 176
6 Magnolia 169
7 Pine 118
8 Redwood 182
9 Spruce 197
This named set is a query-scoped named set; its scope is the query. For information on session-scoped named sets, see CREATE SET Statement, in the InterSystems MDX Reference.
Your cubes might contain additional named sets that you can use in all queries; see Defining Models for InterSystems Business Intelligence.
Order of Members in a Set
Every set has an inherent order (a first member, a second member, and so on).
When you use an MDX function to return a set, that function determines the order of the members in the set. Wherever possible, MDX functions use the natural order of members as specified in the cube definition.
For example, the MEMBERS function returns the members of a level in the order specified in the cube definition (typically in alphabetic order or date order, depending on the level).
When you construct a set as described in the previous section, the order in which you list elements controls the order of the members in the set. For example, suppose that you specify a set as follows:
{gend.gender.MEMBERS,allerd.allergies.MEMBERS}
This set consists of the members of the Gender dimension, followed by the members of the Allergies dimension.
Selecting Subsets
SUBSET returns a set of members from a given set, by position. You specify a set, the starting position, and the number of members to return. The starting position is 0. For example:
SELECT MEASURES.[%COUNT] ON 0,SUBSET(homed.city.MEMBERS,0,3) ON 1 FROM demomdx
%COUNT
1 Cedar Falls 110
2 Centerville 99
3 Cypress 112
The EXCEPT function provides another way to get a subset; see the next section.
Also see Filtering a Query.
Sorting Sets
This section describes ways to sort sets. It discusses the following topics:
Sorting a Set by a Measure Value
It is often useful to sort members by the value of some measure. For example, you might want to sort departments by response, so that you can look at the departments with the slowest responses. Or you might sort products by their sales revenues so that you can look at the top-ranked products.
To return a set in the order that you specify, use the ORDER function. This function takes an argument, typically a measure reference, that specifies the value to use when determining the order of the set members. For example:
SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.city.MEMBERS,MEASURES.[avg test score],BDESC) ON 1 FROM demomdx
Avg Test Score
1 Juniper 75.08
2 Redwood 75.07
3 Cedar Falls 75.03
4 Elm Heights 74.96
5 Pine 74.76
6 Spruce 74.47
7 Magnolia 74.13
8 Cypress 73.96
9 Centerville 73.79
The optional third argument can be one of the following:
-
ASC (the default) — Use this to sort in ascending order, while preserving the hierarchy, if applicable.
-
DESC — Use this to sort in descending order, while preserving the hierarchy, if applicable.
-
BASC — Use this to break the hierarchy and sort all members in ascending order.
-
BDESC — Use this to break the hierarchy and sort all members in descending order.
For example, the following query breaks the hierarchy:
SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.MEMBERS,MEASURES.[avg testscore],BDESC) ON 1 FROM demomdx
Avg Test Score
1 Juniper 75.08
2 Redwood 75.07
3 32007 75.07
4 Cedar Falls 75.03
5 38928 75.00
6 Elm Heights 74.96
7 32006 74.78
8 Pine 74.76
9 Spruce 74.47
10 34577 74.28
11 Magnolia 74.13
12 Cypress 73.96
13 Centerville 73.79
14 36711 73.79
In contrast, the following preserves the hierarchy:
SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.MEMBERS,MEASURES.[avg testscore],DESC) ON 1 FROM demomdx
Avg Test Score
1 32007 75.07
2 Redwood 75.07
3 38928 75.00
4 Cedar Falls 75.03
5 Elm Heights 74.96
6 32006 74.78
7 Juniper 75.08
8 Spruce 74.47
9 34577 74.28
10 Pine 74.76
11 Magnolia 74.13
12 Cypress 73.96
13 36711 73.79
14 Centerville 73.79
Selecting a Top or Bottom Subset
It is useful to sort items in some way and then choose a subset from the top or bottom, such as the top five. The following MDX functions enable you to do so.
-
HEAD and TAIL return the first part or the last part of the set, respectively, given a member count. For example:
SELECT MEASURES.[%COUNT] ON 0,HEAD(homed.city.MEMBERS,3) ON 1 FROM demomdx %COUNT 1 Cedar Falls 110 2 Centerville 99 3 Cypress 112
The members of the returned set have the same order as in the original set.
-
TOPCOUNT and BOTTOMCOUNT are similar to HEAD and TAIL, respectively, but also include an optional argument to specify how to sort the set before extracting the subset.
For example, the following query returns the top-rated cities, by patient count.
SELECT MEASURES.[%COUNT] ON 0,TOPCOUNT(homed.city.MEMBERS,4,MEASURES.[%COUNT]) ON 1 FROM demomdx %COUNT 1 Juniper 122 2 Pine 121 3 Elm Heights 118 4 Magnolia 114
For another example:
SELECT MEASURES.[avg test score] ON 0,TOPCOUNT(homed.city.MEMBERS,5,MEASURES.[avg test score]) ON 1 FROM demomdx Avg Test Score 1 Juniper 75.08 2 Redwood 75.07 3 Cedar Falls 75.03 4 Elm Heights 74.96 5 Pine 74.76
-
TOPSUM and BOTTOMSUM are similar to TOPCOUNT and BOTTOMCOUNT, respectively. Instead of specifying the number of members to return, however, you specify a cutoff value that is applied to a total across the members. For example, you could retrieve the products that account for the top $5 million in sales.
-
TOPPERCENT and BOTTOMPERCENT are similar to TOPCOUNT and BOTTOMCOUNT, respectively. Instead of specifying the number of members to return, however, you specify a cutoff percentage that is applied to a total across the members. For example, you could retrieve the products that account for the top 10% of sales.
Applying Hierarchical Order
The HIERARCHIZE function accepts a set of members from the same dimension and returns a set containing those members in hierarchical order, that is, the order specified by the hierarchy. For example:
SELECT MEASURES.[%COUNT] ON 0,
HIERARCHIZE({homed.36711,homed.38928,homed.[elm heights],homed.Spruce}) ON 1 FROM demomdx
%COUNT
1 36711 99
2 Spruce 93
3 38928 228
4 Elm Heights 118
If the members belong to different hierarchies in the dimension, the different hierarchies are returned in an arbitrary order.
Combining Sets
Sets are the building blocks of MDX queries. When you write an MDX query, you must specify a set to use on each axis. The system supports the following MDX functions that you can use to combine sets:
-
UNION combines two sets (optionally discarding any duplicate members) and returns a set that contains all the members of these sets. For example:
WITH SET set1 AS '{allerd.eggs,allerd.soy,allerd.wheat}' SET set2 AS '{allerd.[dairy products],allerd.pollen,allerd.soy,allerd.wheat}' SELECT MEASURES.[%COUNT] ON 0, UNION(set1,set2) ON 1 FROM demomdx %COUNT 1 eggs 32 2 soy 36 3 wheat 52
Because the query does not use UNION with the ALL keyword, duplicates are removed.
-
INTERSECT examines two sets and returns a set that contains all the members that are in both sets, optionally retaining duplicates. For example:
WITH SET set1 AS 'TOPCOUNT(homed.city.members,5,MEASURES.[avg allergy count])' SET set2 AS 'TOPCOUNT(homed.city.members,5,MEASURES.[avg age])' SELECT MEASURES.[%COUNT] ON 0, INTERSECT(set1,set2) ON 1 FROM demomdx %COUNT 1 Magnolia 114 2 Redwood 111 3 Cypress 112 4 Cedar Falls 110
-
EXCEPT examines two sets and removes the members in the first set that also exist in the second set, optionally retaining duplicates. For example:
WITH SET set1 AS '{allerd.eggs,allerd.eggs,allerd.soy,allerd.wheat}' SET set2 AS '{allerd.[diary products],allerd.pollen,allerd.wheat}' SELECT MEASURES.[%COUNT] ON 0, EXCEPT(set1,set2) ON 1 FROM demomdx %COUNT 1 eggs 32 2 soy 36
Also see The %NOT Optimization.
-
CROSSJOIN returns a set that consists of the cross-product of two sets. Both sets can consist of members. Or one set can consist of members and the other set can consist of measures. If both sets contain measures, the analytics engine issues the error Two measures cannot be crossjoined. For example:
SELECT MEASURES.[%COUNT] ON 0, CROSSJOIN(diagd.diagnoses.MEMBERS, aged.[age group].MEMBERS) ON 1 FROM demomdx %COUNT 1 None->0 to 29 389 2 None->30 to 59 333 3 None->60+ 106 4 asthma->0 to 29 40 5 asthma->30 to 59 39 6 asthma->60+ 11 7 CHD->0 to 29 * 8 CHD->30 to 59 12 9 CHD->60+ 25 10 diabetes->0 to 29 1 11 diabetes->30 to 59 20 12 diabetes->60+ 24 13 osteoporosis->0 to 29 * 14 osteoporosis->30 to 59 * 15 osteoporosis->60+ 22
Note that the MDX shell displays a null value as an asterisk (*). For information on suppressing null values, see Removing Null Elements from a Set.
Also see the NONEMPTYCROSSJOIN function.
Also note that unlike the previous functions, which return sets of members, CROSSJOIN returns a set of tuples (as does NONEMPTYCROSSJOIN). Tuples are discussed in Tuples and Cubes.
Filtering a Set by a Measure or Property Value
You can also examine measure values for the members in a set and use those values to filter the set. To do so, you use the FILTER function.
The FILTER function uses a set and a logical expression. It examines a set and returns the subset in which the given logical expression is true for each member. The logical expression typically compares a measure value to a constant or to another measure value. For example:
SELECT MEASURES.[%COUNT] ON 0, FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115) ON 1 FROM demomdx
%COUNT
1 Elm Heights 118
2 Juniper 122
3 Pine 121
It is important to understand that this filtering occurs at an aggregate level: The measure value is computed for each possible member in the query. The FILTER function considers those aggregate values and removes members as appropriate.
You can use the same function with member properties as follows:
SELECT homed.h1.CURRENTMEMBER.PROPERTIES("Population") ON 0,
FILTER(homed.city.MEMBERS,homed.h1.CURRENTMEMBER.PROPERTIES("Population")>20000) ON 1 FROM demomdx
ZIP
1 Cedar Falls 90,000
2 Centerville 49,000
3 Elm Heights 33,194
4 Redwood 29,192
Removing Null Elements from a Set
In some cases, a set might contain null elements. For example, the CROSSJOIN function could potentially return null elements (as is shown in the preceding section).
If you precede the set expression with the keyword NON EMPTY, the system suppresses the null elements. For example:
SELECT MEASURES.[%COUNT] ON 0,NON EMPTY CROSSJOIN(diagd.diagnoses.MEMBERS,
aged.[age group].MEMBERS) ON 1 FROM demomdx
%COUNT
1 None->0 to 29 389
2 None->30 to 59 333
3 None->60+ 106
4 asthma->0 to 29 40
5 asthma->30 to 59 39
6 asthma->60+ 11
7 CHD->30 to 59 12
8 CHD->60+ 25
9 diabetes->0 to 29 1
10 diabetes->30 to 59 20
11 diabetes->60+ 24
12 osteoporosis->60+ 22
Removing Duplicates
When you combine sets, you may want to remove duplicates. This is true especially when you have created and combined sets in multiple steps. To be certain that the resulting set has no duplicates, you use the DISTINCT function.
For example, suppose that the query must return a specific city as reference, which is needed for comparison to the other cities. Consider the following query, which displays a reference city, followed by a set of cities with a given patient count:
WITH SET refcity AS '{homed.juniper}' SELECT MEASURES.[%COUNT] ON 0,
{refcity,FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115)} ON 1 FROM demomdx
%COUNT
1 Juniper 122
2 Elm Heights 118
3 Juniper 122
4 Pine 121
Compare to the following query, which removes the duplicate reference city:
WITH SET refcity AS '{homed.juniper}' SELECT MEASURES.[%COUNT] ON 0,
DISTINCT({refcity,FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115)}) ON 1 FROM demomdx
%COUNT
1 Juniper 122
2 Elm Heights 118
3 Pine 121
Counting the Elements of a Set
To count the elements of a set, use the COUNT function. For example:
SELECT COUNT(docd.doctor.MEMBERS) ON 0 FROM demomdx
COUNT
41
By default, COUNT considers any empty elements and counts them along with the non-empty elements. If you use the EXCLUDEEMPTY keyword as the second argument, this function returns the number of non-empty elements.
To see this, first consider the following query:
SELECT aged.[age group].MEMBERS ON 0, diagd.diagnoses.MEMBERS ON 1 FROM demomdx WHERE MEASURES.[%COUNT]
0 to 29 30 to 59 60+
1 None 389 333 106
2 asthma 40 39 11
3 CHD * 12 25
4 diabetes 1 20 24
5 osteoporosis * * 22
The following query counts the number of members of the Diagnoses level and uses the WHERE clause to get only patients in the age group 0 to 29:
WITH SET myset AS 'diagd.diagnoses.MEMBERS' SELECT COUNT(myset) ON 0 FROM demomdx WHERE aged.[0 to 29]
COUNT
5
In this query, COUNT returns 5 because it considers empty members. In contrast:
WITH SET myset AS 'diagd.diagnoses.MEMBERS' SELECT COUNT(myset,EXCLUDEEMPTY) ON 0 FROM demomdx WHERE aged.[0 to 29]
COUNT
3