Overview of Aggregate Functions
Supported Aggregate Functions
An aggregate function performs a task in relation to one or more values from a single column and returns a single value. The supported functions are:
SUM — returns the sum of the values of a specified column.
AVG — returns the average of the values of the specified column.
COUNT — returns the number of rows in a table, or the number of non-null values in a specified column.
MAX — returns the maximum value used within a specified column.
MIN — returns the minimum value used within a specified column.
VARIANCE, VAR_SAMP, VAR_POP — returns the statistical variance of the values of a specified column.
STDDEV, STDDEV_SAMP, STDDEV_POP — returns the statistical standard deviation of the values of a specified column.
LIST — returns all of the values used within a specified column as a comma-separated list.
%DLIST — returns all of the values used within a specified column as elements in an InterSystems IRIS list structure.
XMLAGG — returns all of the values used within a specified column as a concatenated string.
JSON_ARRAYAGG — returns all of the values used within a specified column as a JSON format array.
You can define additional user-defined aggregate functions (UDAFs) using the CREATE AGGREGATE command.
Aggregate functions ignore fields that are NULL. For example, LIST and %DLIST do not include elements for rows in which the specified field is NULL. COUNT only counts non-null values of the specified field.
Aggregate functions (with the exception of COUNT) cannot be applied to a stream field. Attempting to do so generates an SQLCODE -37 error. You can use COUNT to count stream field values, with some restrictions.
Aggregate functions are similar to window functions. However, aggregate functions take the values of a column from a group of rows and return the result as a single value. Window functions take the values of a column from a group of rows and return a value for each row. An aggregate function can be specified in a window function. A window function cannot be specified in an aggregate function. AVG(), MAX(), MIN(), and SUM() can be used as either aggregate functions or window functions.
Using Aggregate Functions
An aggregate function can be used in:
SELECT list, either as a listed selectItem or in a subquery selectItem.
HAVING clause. However, a HAVING clause must explicitly specify the aggregate function; it cannot specify an aggregate using the corresponding selectItem column alias or selectItem sequence number.
DISTINCT BY clause. However, specifying an aggregate function by itself is not meaningful and always returns a single row. More meaningful is to specify an aggregate function as part of an expression, such as DISTINCT BY(MAX(Age)-Age).
An aggregate function cannot be used directly in:
an ORDER BY clause. Attempting to do so generates an SQLCODE -73 error. However, you can use an aggregate function in an ORDER BY clause by specifying the corresponding column alias or selectItem sequence number.
a WHERE clause. Attempting to do so generates an SQLCODE -19 error.
a GROUP BY clause. Attempting to do so generates an SQLCODE -19 error.
a TOP clause. Attempting to do so generates an SQLCODE -1 error.
a JOIN. Attempting to specify an aggregate in an ON clause generates an SQLCODE -19 error. Attempting to specify an aggregate in a USING clause generates an SQLCODE -1 error.
However, you can supply an aggregate function value to these clauses (with the exception of the TOP clause) by using a subquery supplying a column alias. For example, to use a WHERE clause to select Age values that are less than the average Age value, you can place the AVG aggregate function in a subquery:
SELECT Name,Age,AvgAge FROM (SELECT Name,Age,AVG(Age) AS AvgAge FROM Sample.Person) WHERE Age < AvgAge ORDER BY Age
Combining Aggregates and Fields
InterSystems SQL allows you to specify an aggregate function with other SELECT items in a query. An aggregate such as COUNT(*) does not need to be in a separate query.
SELECT TOP 5 COUNT(*),Name,AVG(Age) FROM Sample.Person ORDER BY Name
When you specify an aggregate function and specify no field select items in the select list, InterSystems SQL returns one row. A TOP clause is ignored, unless it is TOP 0 (return no rows):
SELECT TOP 7 AVG(Age),LIST(Age) FROM Sample.Person WHERE Age > 75
When you specify an aggregate function and specify one or more field select items in the select list, InterSystems SQL returns as many rows as required for the field item:
SELECT DISTINCT Age,AVG(Age),LIST(Age) FROM Sample.Person WHERE Age > 75
Column Names and Aliases
By default, the column name assigned to the results of an aggregate function is Aggregate_n, where the n number suffix is the column order number, as specified in the SELECT list. Thus, the following example creates column names Aggregate_2 and Aggregate_5:
SELECT TOP 5 Home_State,COUNT(*),Name,Age,AVG(Age) FROM Sample.Person ORDER BY Name
To specify another column name (a column alias), use the AS keyword:
SELECT COUNT(*) AS PersonCount FROM Sample.Person,Sample.Employee
You can use a column alias to specify an aggregate field in an ORDER BY clause. The following example lists people in the order that their ages diverge from the average age:
SELECT Name,Age, AVG(Age) AS AvgAge, ABS(Age - AVG(Age)) AS RelAge FROM Sample.Person ORDER BY RelAge
For further details on column aliases, refer to the SELECT statement.
With ORDER BY
The LIST, %DLIST, XMLAGG, and JSON_ARRAYAGG functions combine the values of a table column from multiple rows into a single aggregate value. Because an ORDER BY clause is applied to the query result set after all aggregate fields are evaluated, ORDER BY cannot directly affect the sequence of values within these aggregates. Under certain circumstances, the results of these aggregates may appear in sequential order, but this ordering should not be relied upon. The values listed within a given aggregate result value cannot be explicitly ordered.
DISTINCT Keyword Clause
All aggregate functions support the optional DISTINCT keyword clause. This keyword limits the aggregate operation to only distinct (unique) field values. When using default field collation (%SQLUPPER), field values that differ only in lettercase are not considered distinct values. If DISTINCT is not specified, the default is to perform the aggregate operation on all non-NULL values, including duplicate values. The MIN and MAX aggregate functions support the DISTINCT keyword, although it perform no operation.
Aggregate functions support the full DISTINCT keyword clause syntax, including the optional BY(item-list) subclause. Refer to the DISTINCT clause for details.
The aggregate function DISTINCT field1 clause ignores field1 values that are NULL. This differs from the DISTINCT clause of the SELECT statement: a SELECT DISTINCT clause returns one row for the distinct NULL, just as it returns one row for each distinct field value. However, an aggregate function DISTINCT BY(field2) field1 does not ignore the distinct NULL for field2. For example, if FavoriteColors has 50 distinct values and multiple NULLs, the number of DISTINCT rows returned is 51, the COUNT(DISTINCT FavoriteColors) is 50, and the COUNT(DISTINCT BY(FavoriteColors) %ID) is 51:
SELECT DISTINCT FavoriteColors, COUNT(DISTINCT FavoriteColors), COUNT(DISTINCT BY(FavoriteColors) %ID) FROM Sample.Person
With DISTINCT and GROUP BY
A SELECT DISTINCT with a selectItem aggregate function and a GROUP BY clause returns the same results as if the DISTINCT keyword were not present. To achieve the desired results, put the aggregate function in a subquery.
For example, you wish to return the number of distinct counts of persons in states (there are states with 4 people, there are states with 6 people, etc.). You would expect to achieve this result as follows:
SELECT DISTINCT COUNT(*) AS PersonCounts FROM Sample.Person GROUP BY Home_State
Instead, you get a person count for each state, the same as if the DISTINCT keyword were not present:
SELECT COUNT(*) AS PersonCounts FROM Sample.Person GROUP BY Home_State
To achieve your intended result, you need to use a subquery, as follows:
SELECT DISTINCT * FROM (SELECT COUNT(*) AS PersonCounts FROM Sample.Person GROUP BY Home_State)
When a query returns aggregate values, the %ROWCOUNT value depends on the query:
Aggregate functions only: calculates aggregate values and returns %ROWCOUNT 1. If an aggregates-only query selects no rows, it still returns %ROWCOUNT 1: COUNT=0, other aggregate functions return NULL.
Aggregate functions only with GROUP BY: returns aggregate values for each group selected by the GROUP BY clause. %ROWCOUNT is the number of groups selected. If the query selects no rows, the GROUP BY selects no groups, and the query returns %ROWCOUNT 0.
Aggregate functions only with DISTINCT: calculates aggregate values and returns %ROWCOUNT 1. If the query selects no rows, the DISTINCT selects no distinct values, and the query returns %ROWCOUNT 0.
Aggregate functions only with TOP clause: For any non-zero TOP value, calculates aggregate values and returns %ROWCOUNT 1. For TOP=0, returns %ROWCOUNT 0, aggregates are not calculated.
Aggregates with fields: If the query returns field values as well as aggregate functions, the number of rows returned is the number of rows selected. If the query selects no rows, it returns %ROWCOUNT 0 and aggregates are not calculated.
These results are not affected the presence in the selectItem of subqueries or expressions.
Aggregates, Transactions, and Locking
Including an aggregate function in a query causes the query to return the current state of the data to all result set fields, including uncommitted changes to the data. Thus, an ISOLATION LEVEL READ COMMITTED setting is ignored for a query containing an aggregate function. The current state of uncommitted data is as follows:
INSERT and UPDATE: the aggregate calculation does include the modified values, even though these modifications are not yet committed and may be rolled back.
DELETE and TRUNCATE TABLE: the aggregate calculation does not include deleted rows, even though these deletions are not yet committed and may be rolled back.
Because aggregate functions usually involve data from a large number of rows, it is not acceptable to issue a transaction lock on all of the rows involved in an aggregate calculation. It is therefore possible that another user may be performing a transaction that modifies the data while an aggregate calculation is in process.
Aggregates and Sharded Tables
Support for aggregate functions is limited for sharded tables. For example, the aggregate function DISTINCT, %FOREACH, and %AFTERHAVING clauses are not supported for sharded tables. See Querying the Sharded Cluster in the chapter “Horizontally Scaling InterSystems IRIS for Data Volume with Sharding” in the Scalability Guide.
AVG, COUNT, %DLIST, JSON_ARRAYAGG, LIST, MAX, MIN, STDDEV, STDDEV_SAMP, STDDEV_POP, SUM, VARIANCE, VAR_SAMP, VAR_POP, XMLAGG aggregate functions
CREATE AGGREGATE command