Skip to main content

UNION (SQL)

Combines two or more SELECT statements.

Synopsis

select-statement {UNION [ALL] [%PARALLEL] select-statement}

select-statement {UNION [ALL]  [%PARALLEL] (query)}

(query) {UNION [ALL]  [%PARALLEL] select-statement}

(query) {UNION [ALL]  [%PARALLEL] (query)}

Description

A UNION combines two or more queries into a single query that retrieves data into a result. The queries that are combined by a UNION can be simple queries, consisting of a single SELECT statement, or compound queries.

For a union to be possible between SELECT statements, the number of columns specified in each leg must match. Specifying SELECTs with different numbers of columns results in an SQLCODE -9 error. You can specify a NULL column in one SELECT to pair with a data column in another SELECT in order to match the number of columns. For example:

SELECT Name,Salary,BirthDate
FROM Sample.Employee
UNION ALL
SELECT Name,NULL,BirthDate
FROM Sample.Person
Caution:

To use the SELECT * syntax in a UNION, the tables must contain the same number of columns. Therefore, future changes to the table definition by adding or deleting a column may cause unforeseen errors in unions of this sort.

InterSystems SQL determines the result column data types by automatically evaluating all legs of the UNION query and returning the data type with the highest precedence as follows: VARCHAR, DOUBLE, NUMERIC, BIGINT, INTEGER, SMALLINT, TINYINT. Other data types, such as DATE, are not assigned precedence. For example, the following program returns data type TINYINT, even though the DATE data type has a higher precedence in other contexts.

SELECT MyTinyIntField FROM Table1
     UNION ALL
SELECT MyDateField FROM Table2

If you want to return a data type other than the ones listed, you have to use an explicit CAST statement, as shown in the following example:

SELECT CAST(MyTinyInt AS DATE) FROM Table1
     UNION ALL
SELECT MyDateField FROM Table2

If the columns in the legs of the union differ in length, precision, or scale, the result column is assigned the largest value.

Result column names are taken from the name of the column (or column alias) in the first leg of the union. In situations where the corresponding columns in the two legs do not have the same names, it may be useful to use the same column alias in all of the legs to identify the result column.

If any column in any of the UNION legs is nullable, the result column metadata is reported as nullable.

String fields in the UNION result have the collation type of the corresponding SELECT fields, but are assigned EXACT collation if the field collations do not match.

UNION and UNION ALL

An ordinary UNION eliminates duplicate rows (all values identical) from the result. A UNION ALL preserves duplicate rows in the result.

Fields of different precision do not have identical values. For example, the values 33 (data type NUMERIC(9)) and 33.00 (data type NUMERIC(9,2)) are not considered identical.

Fields with different collations do not have identical values. For example, MyStringField and %SQLUPPER(MyStringField) are not considered identical, even if both values are all uppercase.

TOP and ORDER BY Clauses

A UNION statement can conclude with an ORDER BY clause which orders the result. This ORDER BY applies to the whole statement; it must be part of the outermost query, not a subquery. It does not have to be paired with a TOP clause. The following example shows this use of ORDER BY: the two SELECT statements select data, the data is combined by the UNION, then the ORDER BY sequences the results:

SELECT Name,Home_Zip FROM Sample.Person
  WHERE Home_Zip %STARTSWITH 9
UNION
SELECT Name,Office_Zip FROM Sample.Employee
  WHERE Office_Zip %STARTSWITH 8
ORDER BY Home_Zip 

Using a column number in ORDER BY that does not correspond to a SELECT list column results in an SQLCODE -5 error. Using a column name in ORDER BY that does not correspond to a SELECT list column results in an SQLCODE -6 error.

Either SELECT statements (or both) in a union can also contain an ORDER BY clause, but it must be paired with a TOP clause. This ORDER BY is applied to determine which rows are selected by the TOP clause. The following example shows this use of ORDER BY: the two SELECT statements each use an ORDER BY to sequence their rows, which determines which rows are selected as the top rows. The selected data is combined by the UNION, then the final ORDER BY sequences the results:

SELECT TOP 5 Name,Home_Zip FROM Sample.Person
  WHERE Home_Zip %STARTSWITH 9
  ORDER BY Name
UNION
SELECT TOP 5 Name,Office_Zip FROM Sample.Employee
  WHERE Office_Zip %STARTSWITH 8
  ORDER BY Office_Zip
ORDER BY Home_Zip

TOP may apply to the first SELECT in the union, or to the result of the union, depending on the placement of the ORDER BY clause:

  • TOP...ORDER BY applies to UNION result: the UNION is within a FROM clause subquery, and TOP and ORDER BY are applied to the results of the UNION. For example:

    SELECT TOP 10 Name,Home_Zip
      FROM (SELECT Name,Home_Zip FROM Sample.Person
              WHERE Name %STARTSWITH 'A'
            UNION
            SELECT Name,Home_Zip FROM Sample.Person
              WHERE Home_Zip %STARTSWITH 8)
    ORDER BY Home_Zip
  • TOP applies to first SELECT; ORDER BY applies to UNION result. For example:

    SELECT TOP 10 Name,Home_Zip 
      FROM Sample.Person
      WHERE Name %STARTSWITH 'A'
    UNION
    SELECT Name,Home_Zip FROM Sample.Person
      WHERE Home_Zip %STARTSWITH 8
    ORDER BY Home_Zip

Enclosing Parentheses

UNION supports optional enclosing parentheses for either or both of its SELECT statements, or for the entire UNION statement. You may specify one or more pairs of enclosing parentheses. The following are all valid uses of enclosing parentheses:

(SELECT ...) UNION SELECT ...
(SELECT ...) UNION (SELECT ...)
((SELECT ...)) UNION ((SELECT ...))
(SELECT ... UNION SELECT ...)
((SELECT ...) UNION (SELECT ...))

Each use of parentheses generates a separate cached query.

UNION/OR Optimization

By default, SQL automatic optimization transforms UNION subqueries to OR conditions, where deemed appropriate. This UNION/OR transformation allows EXISTS and other low-level predicates to migrate to top-level conditions where they are available to InterSystems IRIS query optimizer indexing. This default transformation is desirable in most situations. However, in some situations this UNION/OR transformation imposes a significant overhead burden. The %NOUNIONOROPT query optimization option disables this automatic UNION/OR transformation for all conditions in the WHERE clause associated with the FROM clause. Thus, in a complex query, you can disable automatic UNION/OR optimization for one subquery while allowing it in other subqueries. For further information on %NOUNIONOROPT, refer to the FROM clause.

If a condition involving a subquery is applied to a UNION, it is applied within each union operand, rather than at the end. This allows subquery optimizations to be applied in each UNION operand. For descriptions of subquery optimization options, refer to the FROM clause. In the following example, the WHERE clause condition is applied to each of the subqueries in the union, rather than to the result of the union:

SELECT Name,Age FROM 
  (SELECT Name,Age FROM Sample.Person
   UNION SELECT Name,Age FROM Sample.Employee)
WHERE Age IN (SELECT TOP 5 Age FROM Sample.Employee WHERE Age>55 ORDER BY Age)

UNION ALL Aggregate Optimization

SQL automatic optimization of a UNION ALL pushes a top-level aggregate into the legs of the union. This can result in significantly improved performance with or without the %PARALLEL keyword, For example:

SELECT COUNT(*) FROM (SELECT item1 FROM table1 UNION ALL SELECT item2 FROM table2) 

is optimized as:

SELECT SUM(y) FROM (SELECT COUNT(*) AS y FROM table1 UNION ALL SELECT COUNT(*) AS y FROM table2) 

This optimization applies to all top-level aggregate functions (not just COUNT), including queries with multiple top-level aggregate functions. For this optimization to be applied, the outer query must be a "onerow" query, with no WHERE or GROUP BY clause, it cannot reference %VID, and the UNION ALL must be the only stream in its FROM clause. The aggregates cannot be nested, and any aggregate function used cannot use %FOREACH() grouping or DISTINCT.

Parallel Processing

The %PARALLEL keyword supports parallelism and distributed processing on a multiprocessor system. It causes InterSystems IRIS to perform parallel processing on the UNION queries, assigning each query to a separate process on the same machine. In some cases that process will send the query to a different machine to be processed. These processes communicate via pipes, with InterSystems IRIS creating one or more temporary files to hold subquery results. The main process combines the resulting rows and returns the final results. For further details, refer to the Show Plan for a UNION query, comparing the Show Plan with and without the %PARALLEL keyword. To determine the number of processors on the current system use the %SYSTEM.Util.NumberOfCPUs()Opens in a new tab method.

In general, the more effort expended to produce each row, the more beneficial %PARALLEL becomes.

Specifying the %PARALLEL keyword disables automatic UNION-to-OR optimizations.

The following examples show the use of the %PARALLEL keyword:

SELECT Name FROM Sample.Employee WHERE Name %STARTSWITH 'A'
UNION %PARALLEL
SELECT Name FROM Sample.Person WHERE Name %STARTSWITH 'A'
ORDER BY Name
SELECT Name FROM Sample.Employee WHERE Name %STARTSWITH 'A'
UNION ALL %PARALLEL
SELECT Name FROM Sample.Person WHERE Name %STARTSWITH 'A'
ORDER BY Name

%PARALLEL is intended for SELECT queries and their subqueries. An INSERT command subquery cannot use %PARALLEL.

Adding the %PARALLEL keyword may not be appropriate for all UNION queries, and may result in an error. The following SQL constructs generally do not support UNION %PARALLEL execution: an outer join, a correlated field, an IN predicate condition containing a subquery, or a collection predicate. UNION %PARALLEL is supported for a FOR SOME predicate, but not for a FOR SOME %ELEMENT collection predicate. To determine if a UNION query can successfully use %PARALLEL, test each leg of the UNION separately. Separately test each leg query by adding a FROM %PARALLEL keyword. If one of the FROM %PARALLEL queries generates a query plan that does not show parallelization, then the UNION query will not support %PARALLEL.

UNION ALL and Aggregate Functions

SQL automatic optimization pushes UNION ALL aggregate functions into the union leg subqueries. SQL calculates the aggregate value for each subquery, and then combines the results to return the original aggregate value. For example:

SELECT COUNT(Name) FROM (SELECT Name FROM Sample.Person
                    UNION ALL SELECT Name FROM Sample.Employee)

Is optimized as:

SELECT SUM(y) FROM (SELECT COUNT(Name) AS y FROM Sample.Person
                    UNION ALL SELECT COUNT(Name) AS y FROM Sample.Employee)

This can result in substantial performance improvement. This optimization is applied with or without the %PARALLEL keyword. This optimization is applied to multiple aggregate functions.

This optimization transform only occurs under the following circumstances:

  • The outer query FROM clause must contain only a UNION ALL statement.

  • The outer query cannot contain a WHERE clause or a GROUP BY clause.

  • The outer query cannot contain a %VID (view ID) field.

  • Aggregate functions cannot contain a DISTINCT or %FOREACH keyword.

  • Aggregate functions cannot be nested.

Arguments

ALL

An optional keyword literal. If specified, duplicate data values are returned. If omitted, duplicate data values are suppressed.

%PARALLEL

An optional argument that specifies the %PARALLEL keyword. If specified, each side of the union is run in parallel as a separate process.

select-statement

A SELECT statement, which retrieves data from a database.

query

A query that combines one or more SELECT statements.

Examples

The following example creates a result that contains a row for every Name found in each of the two tables; if a Name is found in both tables, two rows are created. When the Name is an employee, it lists the office location, concatenated with the word “office” as State, and the employee’s Title. When Name is a person, it lists the home location, concatenated with the word “home” as State, and <null> for Title. The ORDER BY clause operates on the result; the combined rows are ordered by Name:

SELECT Name,Office_State||' office' AS State,Title 
FROM Sample.Employee
UNION
SELECT Name,Home_State||' home',NULL
FROM Sample.Person
ORDER BY Name

The following two examples show the effects of the ALL keyword. In the first example, UNION returns only unique values. In the second example, UNION ALL returns all values, including duplicates:

SELECT Name
FROM Sample.Employee
WHERE Name %STARTSWITH 'A'
UNION
SELECT Name
FROM Sample.Person
WHERE Name %STARTSWITH 'A'
ORDER BY Name
SELECT Name
FROM Sample.Employee
WHERE Name %STARTSWITH 'A'
UNION ALL
SELECT Name
FROM Sample.Person
WHERE Name %STARTSWITH 'A'
ORDER BY Name

See Also

FeedbackOpens in a new tab