Overview of Aggregate Functions
Supported Aggregate Functions
An aggregate function performs a task in relation to one or more values from a single column and returns a single value. The supported functions are:
-
SUM — returns the sum of the values of a specified column.
-
AVG — returns the average of the values of the specified column.
-
COUNT — returns the number of rows in a table, or the number of non-null values in a specified column.
-
MAX — returns the maximum value used within a specified column.
-
MIN — returns the minimum value used within a specified column.
-
VARIANCE, VAR_SAMP, VAR_POP — returns the statistical variance of the values of a specified column.
-
STDDEV, STDDEV_SAMP, STDDEV_POP — returns the statistical standard deviation of the values of a specified column.
-
LIST — returns all of the values used within a specified column as a comma-separated list.
-
%DLIST — returns all of the values used within a specified column as elements in a Caché list structure.
-
XMLAGG — returns all of the values used within a specified column as a concatenated string.
-
JSON_ARRAYAGG — returns all of the values used within a specified column as a JSON format array.
Aggregate functions ignore fields that are NULL. For example, LIST and %DLIST do not include elements for rows in which the specified field is NULL. COUNT only counts non-null values of the specified field.
All aggregate functions support the optional DISTINCT keyword clause. This keyword limits the aggregate operation to only distinct (unique) field values. The default is to perform the aggregate operation on all non-NULL values, including duplicate values. The MIN and MAX aggregate functions support the DISTINCT keyword, although it perform no operation.
Aggregate functions support the full DISTINCT keyword clause syntax, including the optional BY(item-list) subclause. Refer to the DISTINCT clause for details.
The aggregate function DISTINCT field1 clause ignores field1 values that are NULL. This differs from the DISTINCT clause of the SELECT statement: a SELECT DISTINCT clause returns one row for the distinct NULL, just as it returns one row for each distinct field value. However, an aggregate function DISTINCT BY(field2) field1 does not ignore the distinct NULL for field2. For example, if FavoriteColors has 50 distinct values and multiple NULLs, the number of DISTINCT rows returned is 51, the COUNT(DISTINCT FavoriteColors) is 50, and the COUNT(DISTINCT BY(FavoriteColors) %ID) is 51:
SELECT DISTINCT FavoriteColors,
COUNT(DISTINCT FavoriteColors),
COUNT(DISTINCT BY(FavoriteColors) %ID)
FROM Sample.Person
Aggregate functions (with the exception of COUNT) cannot be applied to a stream field. Attempting to do so generates an SQLCODE -37 error. You can use COUNT to count stream field values, with some restrictions.
Using Aggregate Functions
An aggregate function can be used in:
-
SELECT list, either as a listed select-item or in a subquery select-item.
-
HAVING clause. However, a HAVING clause must explicitly specify the aggregate function; it cannot specify an aggregate using the corresponding select-item column alias or select-item sequence number.
-
DISTINCT BY clause. However, specifying an aggregate function by itself is not meaningful and always returns a single row. More meaningful is to specify an aggregate function as part of an expression, such as DISTINCT BY(MAX(Age)-Age).
An aggregate function cannot be used directly in:
-
an ORDER BY clause. Attempting to do so generates an SQLCODE -73 error. However, you can use an aggregate function in an ORDER BY clause by specifying the corresponding column alias or select-item sequence number.
-
a WHERE clause. Attempting to do so generates an SQLCODE -19 error.
-
a GROUP BY clause. Attempting to do so generates an SQLCODE -19 error.
-
a TOP clause. Attempting to do so generates an SQLCODE -1 error.
-
a JOIN. Attempting to specify an aggregate in an ON clause generates an SQLCODE -19 error. Attempting to specify an aggregate in a USING clause generates an SQLCODE -1 error.
However, you can supply an aggregate function value to these clauses (with the exception of the TOP clause) by using a subquery supplying a column alias. For example, to use a WHERE clause to select Age values that are less than the average Age value, you can place the AVG aggregate function in a subquery:
SELECT Name,Age,AvgAge
FROM (SELECT Name,Age,AVG(Age) AS AvgAge FROM Sample.Person)
WHERE Age < AvgAge
ORDER BY Age
Queries Combining Aggregates and Fields
Caché SQL allows you to specify an aggregate function with other SELECT items in a query. An aggregate such as COUNT(*) does not need to be in a separate query.
SELECT TOP 5 COUNT(*),Name,AVG(Age)
FROM Sample.Person
ORDER BY Name
When you specify an aggregate function and specify one or more field select items in the select list, Caché SQL returns as many rows as required for the field item:
SELECT DISTINCT Age,AVG(Age),LIST(Age)
FROM Sample.Person
WHERE Age > 75
Aggregate-only Queries
When you specify one or more aggregate functions and specify no field select items in the select list, Caché SQL returns one row. A TOP clause is ignored, unless it is TOP 0 (return no rows):
SELECT TOP 7 AVG(Age),LIST(Name)
FROM Sample.Person
WHERE Age > 75
If the query returns no table data, an aggregates-only query still returns one row (%ROWCOUNT=1). The aggregate host variables are defined and set to the empty string, except for COUNT which is defined and set to 0. This is shown in the following Embedded SQL example:
KILL a,b
&sql(SELECT AVG(Age),LIST(Name) INTO :a,:b FROM Sample.Person
WHERE Age > 999)
WRITE %ROWCOUNT,!
WRITE "#",a,"#",b,"#"
However, if the select-list includes one or more field select items, or if the aggregate-only query contains a GROUP BY clause or a TOP 0 clause, the query returns 0 rows (%ROWCOUNT=0) and the aggregate INTO host variables are undefined.
Column Names and Aliases
By default, the column name assigned to the results of an aggregate function is Aggregate_n, where the n number suffix is the column order number, as specified in the SELECT list. Thus, the following example creates column names Aggregate_2 and Aggregate_5:
SELECT TOP 5 Home_State,COUNT(*),Name,Age,AVG(Age)
FROM Sample.Person
ORDER BY Name
To specify another column name (a column alias), use the AS keyword:
SELECT COUNT(*) AS PersonCount
FROM Sample.Person,Sample.Employee
You can use a column alias to specify an aggregate field in an ORDER BY clause. The following example lists people in the order that their ages diverge from the average age:
SELECT Name,Age,
AVG(Age) AS AvgAge,
ABS(Age - AVG(Age)) AS RelAge
FROM Sample.Person
ORDER BY RelAge
For further details on column aliases, refer to the SELECT statement.
With ORDER BY
The LIST, %DLIST, XMLAGG, and JSON_ARRAYAGG functions combine the values of a table column from multiple rows into a single aggregate value. Because an ORDER BY clause is applied to the query result set after all aggregate fields are evaluated, ORDER BY cannot directly affect the sequence of values within these aggregates. Under certain circumstances, the results of these aggregates may appear in sequential order, but this ordering should not be relied upon. The values listed within a given aggregate result value cannot be explicitly ordered.
With DISTINCT and GROUP BY
A SELECT DISTINCT with a select-item aggregate function and a GROUP BY clause returns the same results as if the DISTINCT keyword were not present. To achieve the desired results, put the aggregate function in a subquery.
For example, you wish to return the number of distinct counts of persons in states (there are states with 4 people, there are states with 6 people, etc.). You would expect to achieve this result as follows:
SELECT DISTINCT COUNT(*) AS PersonCounts
FROM Sample.Person
GROUP BY Home_State
Instead, you get a person count for each state, the same as if the DISTINCT keyword were not present:
SELECT COUNT(*) AS PersonCounts
FROM Sample.Person
GROUP BY Home_State
To achieve your intended result, you need to use a subquery, as follows:
SELECT DISTINCT *
FROM (SELECT COUNT(*) AS PersonCounts FROM Sample.Person
GROUP BY Home_State)
Row Counts
In a query that contains one or more aggregate functions the %ROWCOUNT value depends on the query:
-
If the select-list does not contain any references to fields in the FROM clause table(s), other than fields supplied to aggregate functions, the query returns %ROWCOUNT 1 when the query selects no rows. COUNT returns 0, other aggregate functions return NULL:
ZNSPACE "SAMPLES" SET q1="SELECT COUNT(*) AS NumRows,COUNT(FavoriteColors) AS NumColors,"_ "AVG(Age) AS AvgAge,MAX(Age) AS MaxAge FROM Sample.Person" SET q2="SELECT COUNT(*) AS NumRows,COUNT(FavoriteColors) AS NumColors,"_ "AVG(Age) AS AvgAge,MAX(Age) AS MaxAge FROM Sample.Person WHERE Name='ZZZ'" SET tStatement = ##class(%SQL.Statement).%New() QueryReturnsData SET qStatus = tStatement.%Prepare(q1) IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT} SET rset = tStatement.%Execute() DO rset.%Display() WRITE !,"Rowcount with data:",rset.%ROWCOUNT,!! QueryReturnsNoData SET qStatus = tStatement.%Prepare(q2) IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT} SET rset = tStatement.%Execute() DO rset.%Display() WRITE !,"Rowcount without data:",rset.%ROWCOUNT
The select-item list can include other items, so long as they do not reference fields in the FROM clause table(s):
ZNSPACE "SAMPLES" SET q1="SELECT COUNT(*) AS NumRows,COUNT(Name) AS NumNames,"_ "(SELECT Name FROM Sample.Company) AS SubQ,$LENGTH('this string'),%CLASSNAME "_ "FROM Sample.Person WHERE Name='ZZZ'" SET tStatement = ##class(%SQL.Statement).%New() SET qStatus = tStatement.%Prepare(q1) IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT} SET rset = tStatement.%Execute() DO rset.%Display() WRITE !,"Rowcount:",rset.%ROWCOUNT
However, if this type of query selects no rows and has a GROUP BY clause, it returns %ROWCOUNT 0.
-
A query containing TOP 0 always returns %ROWCOUNT 0. Aggregate functions are not evaluated:
ZNSPACE "SAMPLES" SET q1="SELECT TOP 0 COUNT(*) AS NumRows,COUNT(Name) AS NumNames,"_ "AVG(Age) AS AvgAge FROM Sample.Person" SET tStatement = ##class(%SQL.Statement).%New() SET qStatus = tStatement.%Prepare(q1) IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT} SET rset = tStatement.%Execute() DO rset.%Display() WRITE !,"Rowcount:",rset.%ROWCOUNT
-
A query containing aggregate functions and fields returns the %ROWCOUNT of rows returned. If the result set contains no rows, the query returns %ROWCOUNT 0, COUNT returns NULL:
ZNSPACE "SAMPLES" SET q1="SELECT TOP 4 COUNT(*) AS NumRows,COUNT(Name) AS NumNames,"_ "Name,AVG(Age) AS AvgAge FROM Sample.Person" SET q2="SELECT TOP 4 COUNT(* )AS NumRows,COUNT(Name) AS NumNames,"_ "Name,AVG(Age) AS AvgAge FROM Sample.Person WHERE Name='ZZZ'" SET tStatement = ##class(%SQL.Statement).%New() QueryReturnsData SET qStatus = tStatement.%Prepare(q1) IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT} SET rset = tStatement.%Execute() DO rset.%Display() WRITE !,"Rowcount with data:",rset.%ROWCOUNT,!! QueryReturnsNoData SET qStatus = tStatement.%Prepare(q2) IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT} SET rset = tStatement.%Execute() DO rset.%Display() WRITE !,"Rowcount without data:",rset.%ROWCOUNT
Aggregates, Transactions, and Locking
Including an aggregate function in a query causes the query to return the current state of the data to all result set fields, including uncommitted changes to the data. Thus, an ISOLATION LEVEL READ COMMITTED setting is ignored for a query containing an aggregate function. The current state of uncommitted data is as follows:
-
INSERT and UPDATE: the aggregate calculation does include the modified values, even though these modifications are not yet committed and may be rolled back.
-
DELETE and TRUNCATE TABLE: the aggregate calculation does not include deleted rows, even though these deletions are not yet committed and may be rolled back.
Because aggregate functions usually involve data from a large number of rows, it is not acceptable to issue a transaction lock on all of the rows involved in an aggregate calculation. It is therefore possible that another user may be performing a transaction that modifies the data while an aggregate calculation is in process.
See Also
-
AVG, COUNT, %DLIST, JSON_ARRAYAGG, LIST, MAX, MIN, STDDEV, STDDEV_SAMP, STDDEV_POP, SUM, VARIANCE, VAR_SAMP, VAR_POP, XMLAGG aggregate functions
-
SELECT statement