# Working with Sets

This chapter discusses how to create and use sets.

Also see “Accessing the Samples Shown in This Book,” in the first chapter.

## Introduction to Sets

A set contains zero or more elements, and there are three general kinds of elements: members, scalar values, and tuples (for information on tuples, see the next chapter, “Tuples and Cubes”).

You use sets on the axes of MDX queries; you also use them to build other sets. Note that although an MDX set can be empty, you cannot use such an empty set as an axis.

## Creating Set Expressions

The general syntax for a set expression is as follows:

`{expression1, expression2, ...}`

This list can include any number of elements. In InterSystems MDX, if the list includes only one element, you can omit the curly braces.

Each set element can be one of the following:

• A member expression, which is either of the following:

• An explicit reference to a single member by name.

• An expression that uses an MDX function to return a single member.

• An expression that uses an MDX function, like MEMBERS, to return a set.

• A range of members in the same level, as shown in the chapter “Working With Levels”:

`member1:membern`
• A scalar value, which is one of the following:

• A reference to a measure. The expression MEASURES.[Avg Test Score] is a scalar value; it returns either a number or null in all contexts.

• A numeric or string constant such as 37 or "label".

• A numeric expression such as (37+3)/2.

• A percentage literal. For example: 10%

There must be no space between the number and the percent sign.

• An expression that uses an MDX function to return a scalar value.

For example, the PROPERTIES function returns a scalar value; for an introduction to this function, see the chapter “Working With Levels,” earlier in this book.

For another example, the AVG function and other summary functions return scalar values; see the chapter “Adding Summaries,” later in this book.

For complete details, see the InterSystems MDX Reference.

## Creating Named Sets

It is often useful to create a set and assign a name to it, so that you can reuse that set in multiple ways. Also, the syntax of a query is often easier to read when you use named sets.

You can create one or more named sets within a query, as follows:

`WITH with_clause1 with_clause2 ... SELECT query_details`

Where:

• Each expression with_clause1, with_clause2, and so on has the following syntax:

`SET set_name AS 'set_expression'`
• query_details is your MDX query.

Then your query can refer the named set by name in all the places where you can use other set expressions.

For example:

```WITH SET testset AS '{homed.city.members}'
SELECT MEASURES.[%COUNT] ON 0, testset ON 1 FROM demomdx

%COUNT
1 Cedar Falls                         184
2 Centerville                         194
3 Cypress                             134
4 Elm Heights                         146
5 Juniper                             176
6 Magnolia                            169
7 Pine                                118
8 Redwood                             182
9 Spruce                              197```
Note:

This named set is a query-scoped named set; its scope is the query. For information on session-scoped named sets, see “CREATE SET Statement,” in the InterSystems MDX Reference.

Your cubes might contain additional named sets that you can use in all queries; see Defining Models for InterSystems Business Intelligence.

## Order of Members in a Set

Every set has an inherent order (a first member, a second member, and so on).

When you use an MDX function to return a set, that function determines the order of the members in the set. Wherever possible, MDX functions use the natural order of members as specified in the cube definition.

For example, the MEMBERS function returns the members of a level in the order specified in the cube definition (typically in alphabetic order or date order, depending on the level).

When you construct a set as described in the previous section, the order in which you list elements controls the order of the members in the set. For example, suppose that you specify a set as follows:

`{gend.gender.MEMBERS,allerd.allergies.MEMBERS}`

This set consists of the members of the Gender dimension, followed by the members of the Allergies dimension.

## Selecting Subsets

SUBSET returns a set of members from a given set, by position. You specify a set, the starting position, and the number of members to return. The starting position is 0. For example:

```SELECT MEASURES.[%COUNT] ON 0,SUBSET(homed.city.MEMBERS,0,3) ON 1 FROM demomdx

%COUNT
1 Cedar Falls                           110
2 Centerville                            99
3 Cypress                               112
```

The EXCEPT function provides another way to get a subset; see the next section.

Also see the chapter “Filtering a Query,” later in this book.

## Sorting Sets

This section describes ways to sort sets. It discusses the following topics:

### Sorting a Set by a Measure Value

It is often useful to sort members by the value of some measure. For example, you might want to sort departments by response, so that you can look at the departments with the slowest responses. Or you might sort products by their sales revenues so that you can look at the top-ranked products.

To return a set in the order that you specify, use the ORDER function. This function takes an argument, typically a measure reference, that specifies the value to use when determining the order of the set members. For example:

```SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.city.MEMBERS,MEASURES.[avg test score],BDESC) ON 1 FROM demomdx

Avg Test Score
1 Juniper                             75.08
2 Redwood                             75.07
3 Cedar Falls                         75.03
4 Elm Heights                         74.96
5 Pine                                74.76
6 Spruce                              74.47
7 Magnolia                            74.13
8 Cypress                             73.96
9 Centerville                         73.79```

The optional third argument can be one of the following:

• ASC (the default) — Use this to sort in ascending order, while preserving the hierarchy, if applicable.

• DESC — Use this to sort in descending order, while preserving the hierarchy, if applicable.

• BASC — Use this to break the hierarchy and sort all members in ascending order.

• BDESC — Use this to break the hierarchy and sort all members in descending order.

For example, the following query breaks the hierarchy:

```SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.MEMBERS,MEASURES.[avg testscore],BDESC) ON 1 FROM demomdx

Avg Test Score
1 Juniper                            75.08
2 Redwood                            75.07
3 32007                              75.07
4 Cedar Falls                        75.03
5 38928                              75.00
6 Elm Heights                        74.96
7 32006                              74.78
8 Pine                               74.76
9 Spruce                             74.47
10 34577                              74.28
11 Magnolia                           74.13
12 Cypress                            73.96
13 Centerville                        73.79
14 36711                              73.79```

In contrast, the following preserves the hierarchy:

```SELECT MEASURES.[avg test score] ON 0,
ORDER(homed.MEMBERS,MEASURES.[avg testscore],DESC) ON 1 FROM demomdx

Avg Test Score
1 32007                              75.07
2 Redwood                            75.07
3 38928                              75.00
4 Cedar Falls                        75.03
5 Elm Heights                        74.96
6 32006                              74.78
7 Juniper                            75.08
8 Spruce                             74.47
9 34577                              74.28
10 Pine                               74.76
11 Magnolia                           74.13
12 Cypress                            73.96
13 36711                              73.79
14 Centerville                        73.79```

### Selecting a Top or Bottom Subset

It is useful to sort items in some way and then choose a subset from the top or bottom, such as the top five. The following MDX functions enable you to do so.

• HEAD and TAIL return the first part or the last part of the set, respectively, given a member count. For example:

```SELECT MEASURES.[%COUNT] ON 0,HEAD(homed.city.MEMBERS,3) ON 1 FROM demomdx

%COUNT
1 Cedar Falls                           110
2 Centerville                            99
3 Cypress                               112
```

The members of the returned set have the same order as in the original set.

• TOPCOUNT and BOTTOMCOUNT are similar to HEAD and TAIL, respectively, but also include an optional argument to specify how to sort the set before extracting the subset.

For example, the following query returns the top-rated cities, by patient count.

```SELECT MEASURES.[%COUNT] ON 0,TOPCOUNT(homed.city.MEMBERS,4,MEASURES.[%COUNT]) ON 1 FROM demomdx

%COUNT
1 Juniper                               122
2 Pine                                  121
3 Elm Heights                           118
4 Magnolia                              114
```

For another example:

```SELECT MEASURES.[avg test score] ON 0,TOPCOUNT(homed.city.MEMBERS,5,MEASURES.[avg test score]) ON 1 FROM demomdx

Avg Test Score
1 Juniper                             75.08
2 Redwood                             75.07
3 Cedar Falls                         75.03
4 Elm Heights                         74.96
5 Pine                                74.76```
• TOPSUM and BOTTOMSUM are similar to TOPCOUNT and BOTTOMCOUNT, respectively. Instead of specifying the number of members to return, however, you specify a cutoff value that is applied to a total across the members. For example, you could retrieve the products that account for the top \$5 million in sales.

• TOPPERCENT and BOTTOMPERCENT are similar to TOPCOUNT and BOTTOMCOUNT, respectively. Instead of specifying the number of members to return, however, you specify a cutoff percentage that is applied to a total across the members. For example, you could retrieve the products that account for the top 10% of sales.

### Applying Hierarchical Order

The HIERARCHIZE function accepts a set of members from the same dimension and returns a set containing those members in hierarchical order, that is, the order specified by the hierarchy. For example:

```SELECT MEASURES.[%COUNT] ON 0,
HIERARCHIZE({homed.36711,homed.38928,homed.[elm heights],homed.Spruce}) ON 1 FROM demomdx

%COUNT
1 36711                                  99
2 Spruce                                 93
3 38928                                 228
4 Elm Heights                           118
```

If the members belong to different hierarchies in the dimension, the different hierarchies are returned in an arbitrary order.

## Combining Sets

Sets are the building blocks of MDX queries. When you write an MDX query, you must specify a set to use on each axis. The system supports the following MDX functions that you can use to combine sets:

• UNION combines two sets (optionally discarding any duplicate members) and returns a set that contains all the members of these sets. For example:

```WITH SET set1 AS '{allerd.eggs,allerd.soy,allerd.wheat}'
SET set2 AS '{allerd.[dairy products],allerd.pollen,allerd.soy,allerd.wheat}'
SELECT MEASURES.[%COUNT] ON 0, UNION(set1,set2) ON 1 FROM demomdx

%COUNT
1 eggs                                   32
2 soy                                    36
3 wheat                                  52
```

Because the query does not use UNION with the ALL keyword, duplicates are removed.

• INTERSECT examines two sets and returns a set that contains all the members that are in both sets, optionally retaining duplicates. For example:

```WITH SET set1 AS 'TOPCOUNT(homed.city.members,5,MEASURES.[avg allergy count])'
SET set2 AS 'TOPCOUNT(homed.city.members,5,MEASURES.[avg age])'
SELECT MEASURES.[%COUNT] ON 0, INTERSECT(set1,set2) ON 1 FROM demomdx

%COUNT
1 Magnolia                              114
2 Redwood                               111
3 Cypress                               112
4 Cedar Falls                           110
```
• EXCEPT examines two sets and removes the members in the first set that also exist in the second set, optionally retaining duplicates. For example:

```WITH SET set1 AS '{allerd.eggs,allerd.eggs,allerd.soy,allerd.wheat}'
SET set2 AS '{allerd.[diary products],allerd.pollen,allerd.wheat}'
SELECT MEASURES.[%COUNT] ON 0, EXCEPT(set1,set2) ON 1 FROM demomdx

%COUNT
1 eggs                                   32
2 soy                                    36
```

Also see “The %NOT Optimization,” later in this book.

• CROSSJOIN returns a set that consists of the cross-product of two sets. Both sets can consist of members. Or one set can consist of members and the other set can consist of measures. If both sets contain measures, the analytics engine issues the error Two measures cannot be crossjoined. For example:

```SELECT MEASURES.[%COUNT] ON 0, CROSSJOIN(diagd.diagnoses.MEMBERS,
aged.[age group].MEMBERS) ON 1 FROM demomdx

%COUNT
1 None->0 to 29                        389
2 None->30 to 59                       333
3 None->60+                            106
4 asthma->0 to 29                       40
5 asthma->30 to 59                      39
6 asthma->60+                           11
7 CHD->0 to 29                           *
8 CHD->30 to 59                         12
9 CHD->60+                              25
10 diabetes->0 to 29                      1
11 diabetes->30 to 59                    20
12 diabetes->60+                         24
13 osteoporosis->0 to 29                  *
14 osteoporosis->30 to 59                 *
15 osteoporosis->60+                     22
```

Note that the MDX shell displays a null value as an asterisk (*). For information on suppressing null values, see “Removing Null Elements from a Set,” later in this chapter.

Also see the NONEMPTYCROSSJOIN function.

Also note that unlike the previous functions, which return sets of members, CROSSJOIN returns a set of tuples (as does NONEMPTYCROSSJOIN). Tuples are discussed later in this book.

## Filtering a Set by a Measure or Property Value

You can also examine measure values for the members in a set and use those values to filter the set. To do so, you use the FILTER function.

The FILTER function uses a set and a logical expression. It examines a set and returns the subset in which the given logical expression is true for each member. The logical expression typically compares a measure value to a constant or to another measure value. For example:

```SELECT MEASURES.[%COUNT] ON 0, FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115) ON 1 FROM demomdx

%COUNT
1 Elm Heights                           118
2 Juniper                               122
3 Pine                                  121```

It is important to understand that this filtering occurs at an aggregate level: The measure value is computed for each possible member in the query. The FILTER function considers those aggregate values and removes members as appropriate.

You can use the same function with member properties as follows:

```SELECT homed.h1.CURRENTMEMBER.PROPERTIES("Population") ON 0,
FILTER(homed.city.MEMBERS,homed.h1.CURRENTMEMBER.PROPERTIES("Population")>20000) ON 1 FROM demomdx

ZIP
1 Cedar Falls                        90,000
2 Centerville                        49,000
3 Elm Heights                        33,194
4 Redwood                            29,192```

## Removing Null Elements from a Set

In some cases, a set might contain null elements. For example, the CROSSJOIN function could potentially return null elements (as is shown in the preceding section).

If you precede the set expression with the keyword NON EMPTY, the system suppresses the null elements. For example:

```SELECT MEASURES.[%COUNT] ON 0,NON EMPTY CROSSJOIN(diagd.diagnoses.MEMBERS,
aged.[age group].MEMBERS) ON 1 FROM demomdx

%COUNT
1 None->0 to 29                        389
2 None->30 to 59                       333
3 None->60+                            106
4 asthma->0 to 29                       40
5 asthma->30 to 59                      39
6 asthma->60+                           11
7 CHD->30 to 59                         12
8 CHD->60+                              25
9 diabetes->0 to 29                      1
10 diabetes->30 to 59                    20
11 diabetes->60+                         24
12 osteoporosis->60+                     22
```

## Removing Duplicates

When you combine sets, you may want to remove duplicates. This is true especially when you have created and combined sets in multiple steps. To be certain that the resulting set has no duplicates, you use the DISTINCT function.

For example, suppose that the query must return a specific city as reference, which is needed for comparison to the other cities. Consider the following query, which displays a reference city, followed by a set of cities with a given patient count:

```WITH SET refcity AS '{homed.juniper}' SELECT MEASURES.[%COUNT] ON 0,
{refcity,FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115)} ON 1 FROM demomdx

%COUNT
1 Juniper                               122
2 Elm Heights                           118
3 Juniper                               122
4 Pine                                  121```

Compare to the following query, which removes the duplicate reference city:

```WITH SET refcity AS '{homed.juniper}' SELECT MEASURES.[%COUNT] ON 0,
DISTINCT({refcity,FILTER(homed.city.MEMBERS,MEASURES.[%COUNT]>115)}) ON 1 FROM demomdx

%COUNT
1 Juniper                               122
2 Elm Heights                           118
3 Pine                                  121```

## Counting the Elements of a Set

To count the elements of a set, use the COUNT function. For example:

```SELECT COUNT(docd.doctor.MEMBERS) ON 0 FROM demomdx

COUNT
41```

By default, COUNT considers any empty elements and counts them along with the non-empty elements. If you use the EXCLUDEEMPTY keyword as the second argument, this function returns the number of non-empty elements.

To see this, first consider the following query:

```SELECT aged.[age group].MEMBERS ON 0, diagd.diagnoses.MEMBERS ON 1 FROM demomdx WHERE MEASURES.[%COUNT]

0 to 29           30 to 59                60+
1 None                          389                333                106
2 asthma                         40                 39                 11
3 CHD                             *                 12                 25
4 diabetes                        1                 20                 24
5 osteoporosis                    *                  *                 22
```

The following query counts the number of members of the Diagnoses level and uses the WHERE clause to get only patients in the age group 0 to 29:

```WITH SET myset AS 'diagd.diagnoses.MEMBERS' SELECT COUNT(myset) ON 0 FROM demomdx WHERE aged.[0 to 29]

COUNT
5
```

In this query, COUNT returns 5 because it considers empty members. In contrast:

```WITH SET myset AS 'diagd.diagnoses.MEMBERS' SELECT COUNT(myset,EXCLUDEEMPTY) ON 0 FROM demomdx WHERE aged.[0 to 29]

COUNT
3
```
FeedbackOpens in a new window