HAVING (SQL)
Synopsis
SELECT field
FROM table GROUP BY field
HAVING condition-expression
SELECT aggregatefunc(field %AFTERHAVING)
FROM table [GROUP BY field]
HAVING condition-expression
Description
The optional HAVING clause appears after the FROM clause and the optional WHERE and GROUP BY clauses, and before the optional ORDER BY clause.
The HAVING clause of a SELECT statement qualifies or disqualifies specific rows from the query selection. The rows that qualify are those for which the condition-expression is true. The condition-expression is a series of logical tests (predicates) which can be linked by the AND and OR logical operators. For further details, see the WHERE clause.
The HAVING clause is like a WHERE clause that can operate on groups, rather than on the full data set. Thus, in most cases, the HAVING clause is used either with an aggregate function using the %AFTERHAVING keyword, or in combination with a GROUP BY clause, or both.
A HAVING clause condition-expression can also specify an aggregate function. A WHERE clause condition-expression cannot specify an aggregate function. This is shown in the following example:
SELECT Name,Age,AVG(Age) AS AvgAge
FROM Sample.Person
HAVING Age > AVG(Age)
ORDER BY Age
A HAVING clause often serves to compare aggregates of sub-populations against aggregates for an entire population.
Specifying a Field
A field specified in a HAVING clause condition-expression or an %AFTERHAVING keyword expression must be specified as a field name or an aggregate function. You cannot specify a field or aggregate function by column number. You cannot specify a field or aggregate function by column alias; attempting to do so generates an SQLCODE -29 error. However, you can use a subquery to define a column alias, then use this alias in the HAVING clause. For example:
SELECT Y AS TeenYear,AVG(Y %AFTERHAVING) AS AvgTeenAge FROM
(SELECT Age AS Y FROM Sample.Person WHERE Age<20)
HAVING Y > 12 ORDER BY Y
Aggregate Functions in the select-item List
The HAVING clause selects which rows to return. By default, this row selection does not determine the value of aggregate functions in the select-item list because the HAVING clause is parsed after aggregate functions in the select-item list.
In the following example, only those rows with Age > 65 are returned. But the AVG(Age) is calculated based on all rows, not just those selected by the HAVING clause:
SELECT Name,Age,AVG(Age) AS AvgAge FROM Sample.Person
HAVING Age > 65
ORDER BY Age
Compare this to a WHERE clause, which selects both which rows to return and which row values to include in aggregate functions in the select-item list:
SELECT Name,Age,AVG(Age) AS AvgAge FROM Sample.Person
WHERE Age > 65
ORDER BY Age
A HAVING clause can be used in a query that only returns aggregate values:
-
Aggregate Threshold: The HAVING clause uses an aggregate threshold to determine whether to return 1 row (containing the query aggregate values) or 0 rows. Thus you can use a HAVING clause to only return an aggregate calculation when an aggregate threshold is achieved. The following example only returns an average of the Age values for all rows in the table when there are at least 100 rows in the table. If there are less than 100 rows, the average of the Age values for all rows might not be deemed meaningful, and therefore should not be returned:
SELECT AVG(Age) FROM Sample.Person HAVING COUNT(*)>99
-
Multiple Rows: A HAVING clause with an aggregate function and no GROUP BY clause returns the number of rows that fulfill the HAVING clause condition. The aggregate function value is calculated based on all of the rows in the table:
SELECT AVG(Age) FROM Sample.Person HAVING %ID<10
This is in contrast to a WHERE clause with an aggregate function, which returns one row. The aggregate function value is calculated based on rows that fulfill the WHERE clause condition:
SELECT AVG(Age) FROM Sample.Person WHERE %ID<10
%AFTERHAVING
The %AFTERHAVING keyword can be used with an aggregate function in the select-item list to specify that the aggregate operation is to be performed after the HAVING clause condition is applied.
SELECT Name,Age,AVG(Age) AS AvgAge,
AVG(Age %AFTERHAVING) AS AvgMiddleAge
FROM Sample.Person
HAVING Age > 40 AND Age < 65
ORDER BY Age
The %AFTERHAVING keyword only gives meaningful results if both of the following considerations are met:
-
The select-item list must contain at least one item that is a non-aggregate field reference. This field reference may be to any field in any table specified in the FROM clause, a field referenced using an implicit join (arrow syntax), the %ID alias, or an asterisk (*).
-
The HAVING clause condition must apply at least one non-aggregate condition. Therefore, HAVING Age>50, HAVING Age>AVG(Age), or HAVING Age>50 AND MAX(Age)>75 are valid conditions, but HAVING Age>50 OR MAX(Age)>75 is not a valid condition.
The following example uses a HAVING clause with a GROUP BY clause to return the state average age, and the state average age for people that are older than the average age for all rows in the table. It also uses a subquery to return the average age for all rows in the table:
SELECT Home_State,(SELECT AVG(Age) FROM Sample.Person) AS AvgAgeAllRecs,
AVG(Age) AS AvgAgeByState,AVG(Age %AFTERHAVING) AS AvgOlderByState
FROM Sample.Person
GROUP BY Home_State
HAVING Age > AVG(Age)
ORDER BY Home_State
Arguments
condition-expression
An expression consisting of one or more boolean predicates governing which data values are to be retrieved.
Logical Predicates
The SQL predicates fall into the following categories:
You cannot use the FOR SOME %ELEMENT collection predicate in a HAVING clause. This predicate can only be used in a WHERE clause.
Predicate Case-Sensitivity
A predicate uses the collation type defined for the field. By default, string data type fields are defined with SQLUPPER collation, which is not case-sensitive. You can define the string collation default for the current namespace and specify a non-default field collation type when defining a field/property.
The %INLIST, Contains operator ([), %MATCHES, and %PATTERN predicates do not use the field’s default collation. They always uses EXACT collation, which is case-sensitive.
A predicate comparison of two literal strings is always case-sensitive.
Predicate Conditions and %NOINDEX
You can preface a predicate condition with the %NOINDEX keyword to prevent the query optimizer using an index on that condition. This is most useful when specifying a range condition that is satisfied by the vast majority of the rows. For example, HAVING %NOINDEX Age >= 1. For further details, refer to Index Optimization Options.
Equality Comparison Predicates
The following are the available comparison predicates:
Predicate | Operation |
---|---|
= | Equals |
<> | Does not equal |
!= | Does not equal |
> | Is greater than |
< | Is less than |
>= | Is greater than or equal to |
<= | Is less than or equal to |
The following example uses a comparison predicate. It returns one record for each Age less than 21:
SELECT Name, Age FROM Sample.Person
GROUP BY Age
HAVING Age < 21
ORDER BY Age
Note that SQL defines comparison operations in terms of collation: the order in which values are sorted. Two values are equal if they collate in exactly the same way. A value is greater than another value if it collates after the second value. String data type field collation is based on the field’s default collation. By default, it is not case-sensitive. Thus, a comparison of two string field values or a comparison of a string field value with a string literal is (by default) not case-sensitive. For example, if Home_State field values are uppercase two-letter strings:
Expression | Value |
---|---|
'MA' = Home_State | TRUE for values MA. |
'ma' = Home_State | TRUE for values MA. |
'VA' < Home_State | TRUE for values VT, WA, WI, WV, WY. |
'ar' >= Home_State | TRUE for values AK, AL, AR. |
Note, however, that a comparison of two literal strings is case-sensitive: WHERE 'ma'='MA' is always FALSE.
BETWEEN Predicate
The following example uses a BETWEEN predicate, which is equivalent to a paired greater than or equal to and less than or equal to. It returns one record for each Age between 18 and 35, inclusive of 18 and 35:
SELECT Name, Age FROM Sample.Person
GROUP BY Age
HAVING Age BETWEEN 18 AND 35
ORDER BY Age
For further details, refer to BETWEEN.
IN and %INLIST Predicates
The IN predicate is used for matching a value to an unstructured series of items.
The %INLIST predicate is an InterSystems IRIS extension for matching a value to the elements of a list structure.
With either predicate you can perform equality comparisons and subquery comparisons.
IN has two formats. The first serves as shorthand for the use of multiple equality comparisons linked together with the OR operator. For instance:
SELECT Name, Home_State FROM Sample.Person
GROUP BY Home_State
HAVING Home_State IN ('ME','NH','VT','MA','RI','CT')
evaluates true if Home_State equals any of the values inside the parenthetical list. The list elements can be constants or expressions. Collation applies to the IN comparison as it applies to an equality test. By default, IN comparisons use the collation type of the field definition; by default string fields are defined as SQLUPPER, which is not case-sensitive.
When dates or times are used for IN predicate equality comparisons, the appropriate data type conversions are automatically performed. If the HAVING clause field is type TimeStamp, values of type Date or Time are converted to Timestamp. If the HAVING clause field is type Date, values of type TimeStamp or String are converted to Date. If the HAVING clause field is type Time, values of type TimeStamp or String are converted to Time.
The following examples both perform the same equality comparisons and return the same data. The GROUP BY field specifies to return only one record for each successful equality comparison. The DOB field is of data type Date:
SELECT Name,DOB FROM Sample.Person
GROUP BY DOB
HAVING DOB IN ({d '1951-02-02'},{d '1987-02-28'})
SELECT Name,DOB FROM Sample.Person
GROUP BY DOB
HAVING DOB IN ({ts '1951-02-02 02:37:00'},{ts '1987-02-28 16:58:10'})
For further details refer to Date and Time Constructs.
The %INLIST predicate can be used to perform an equality comparison on the elements of a list structure. %INLIST uses EXACT collation. Therefore, by default, %INLIST string comparisons are case-sensitive. For further details on list structures, see the SQL $LIST function.
The following example uses %INLIST to match a string value to the elements of the FavoriteColors list field:
SELECT Name,FavoriteColors FROM Sample.Person
HAVING 'Red' %INLIST FavoriteColors
It returns all records where FavoriteColors includes the element “Red”.
The following example matches Home_State column values to the elements of the northne (northern New England states) list:
SELECT Name,Home_State
FROM Sample.Person
HAVING Home_State %INLIST $LISTBUILD("VT","NH","ME")
You can also use IN or %INLIST with a subquery to test whether a column value (or any other expression) equals any of the subquery row values. For example:
SELECT Name,Home_State FROM Sample.Person
HAVING Name IN
(SELECT Name FROM Sample.Employee
HAVING Salary < 50000)
Note that the subquery must have exactly one item in the SELECT list.
%STARTSWITH Predicate
The InterSystems IRIS %STARTSWITH comparison operator permits you to perform partial matching on the initial characters of a string or numeric. The following example uses %STARTSWITH. It selects by age, then returns a record for each Name that begins with “S”:
SELECT Name,Age FROM Sample.Person
WHERE Age > 30
HAVING Name %STARTSWITH 'S'
ORDER BY Name
Like other string field comparisons, %STARTSWITH comparisons are not case-sensitive. For further details, refer to %STARTSWITH.
Contains Operator ([)
The Contains operator is the open bracket symbol: [. It permits you to match a substring (string or numeric) to any part of a field value. The comparison is always case-sensitive. The following example uses the Contains operator in a HAVING clause to select those records in which the Home_State value contains a “K”, and then do an %AFTERHAVING count on those states:
SELECT Home_State,COUNT(Home_State) AS States,
COUNT(Home_State %AFTERHAVING) AS KStates
FROM Sample.Person
HAVING Home_State [ 'K'
FOR SOME Predicate
The FOR SOME predicate of the HAVING clause determines whether or not to return a result set based on a condition test of one or more field values. This predicate has the following syntax:
FOR SOME (table[AS t-alias]) (fieldcondition)
FOR SOME specifies that fieldcondition must evaluate to true; at least one of the field values must match the specified condition. table can be a single table or a comma-separated list of tables, and can optionally take a table alias. fieldcondition specifies one or more conditions for one or more fields within the specified table. Both the table argument and the fieldcondition argument must be delimited by parentheses.
The following example shows the use of the FOR SOME predicate:
SELECT Name,Age
FROM Sample.Person
HAVING FOR SOME (Sample.Person)(Age>20)
ORDER BY Age
In the above example, if at least one field contains an Age value greater than 20, all of the records are returned. Otherwise, no records are returned.
For further details, refer to FOR SOME.
NULL Predicate
This detects undefined values. You can detect all null values, or all non-null values:
SELECT Name, FavoriteColors FROM Sample.Person
HAVING FavoriteColors IS NULL
SELECT Name, FavoriteColors FROM Sample.Person
HAVING FavoriteColors IS NOT NULL
ORDER BY FavoriteColors
Using the GROUP BY clause, you can return one record for each non-null value for a specified field:
SELECT Name, FavoriteColors FROM Sample.Person
GROUP BY FavoriteColors
HAVING FavoriteColors IS NOT NULL
ORDER BY FavoriteColors
For further details, refer to NULL.
EXISTS Predicate
This operates with subqueries to test whether a subquery evaluates to the empty set.
SELECT t1.disease FROM illness_tab t1 WHERE EXISTS
(SELECT t2.disease FROM disease_registry t2
WHERE t1.disease = t2.disease
HAVING COUNT(t2.disease) > 100)
For further details, refer to EXISTS.
LIKE, %MATCHES, and %PATTERN Predicates
These three predicates allow you to perform pattern matching.
-
LIKE allows you to pattern match using literals and wildcards. Use LIKE when you wish to return data values that contain a known substring of literal characters, or contain several known substrings in a known sequence. LIKE uses the collation of its target for letter case comparisons.
-
%MATCHES allows you to pattern match using literals, wildcards, and lists and ranges. Use %MATCHES when you wish to return data values that contain a known substring of literal characters, or contain one or more literal characters that fall within a list or range of possible characters, or contain several such substrings in a known sequence. %MATCHES uses EXACT collation for letter case comparisons.
-
%PATTERN allows you to specify a pattern of character types. For example, '1U4L1",".A' (1 uppercase letter, 4 lowercase letters, one literal comma, followed by any number of letter characters of either case). Use %PATTERN when you wish to return data values that contain a known sequence of character types. %PATTERN is especially useful when the data value is unimportant, but the character type format of those values is significant. %PATTERN can also specify known literal characters. It uses EXACT collation for literal comparisons, which are always case-sensitive.
To perform a comparison with the first characters of a string, use the %STARTSWITH predicate.
Examples
The following example returns a row for each state that has at least one person under the age of 21. For each row it returns the average, minimum, and maximum ages of all people in the state.
SELECT Home_State, MIN(Age) AS Youngest,
AVG(Age) AS AvgAge, MAX(Age) AS Oldest
FROM Sample.Person
GROUP BY Home_State
HAVING Age < 21
ORDER BY Youngest
The following example returns a row for each state that has at least one person under the age of 21. For each row it returns the average, minimum, and maximum ages of all people in the state. Using the %AFTERHAVING keyword, it also returns the average age of those people in the state under the age of 21 (AvgYouth), and the age of the oldest person in the state under the age of 21 (OldestYouth).
SELECT Home_State,AVG(Age) AS AvgAge,
AVG(Age %AFTERHAVING) AS AvgYouth,
MIN(Age) AS Youngest, MAX(Age) AS Oldest,
MAX(Age %AFTERHAVING) AS OldestYouth
FROM Sample.Person
GROUP BY Home_State
HAVING Age < 21
ORDER BY AvgAge
For further examples of %AFTERHAVING, refer to the individual aggregate functions.