Skip to main content

GROUP BY (SQL)

A SELECT clause that groups the resulting rows of a query according to one or more columns.

Synopsis

SELECT ...
GROUP BY field {,field2}

Description

GROUP BY is a clause of the SELECT command. The optional GROUP BY clause appears after the FROM clause and the optional WHERE clause, and before the optional HAVING and ORDER BY clauses.

The GROUP BY clause takes the resulting rows of a query and breaks them up into individual groups according to one or more database columns. When you use SELECT in conjunction with GROUP BY, one row is retrieved for each distinct value of the GROUP BY fields. GROUP BY treats fields with NULL (no specified value) as a separate distinct value group.

The GROUP BY clause is conceptually similar to the InterSystems IRIS aggregate function extension keyword %FOREACH, but GROUP BY operates on an entire query, while %FOREACH allows selection of aggregates on sub-populations without restricting the entire query population.

GROUP BY can be used in the SELECT clause of an INSERT command. GROUP BY cannot be used in an UPDATE or DELETE command.

Specifying a Field

The simplest form of a GROUP BY clause specifies a single field, such as GROUP BY City. This selects one arbitrary row for each unique City value. You can also specify a comma-separated list of fields, the combined value of which is treated as a single grouping term. It selects one arbitrary row for each unique combination of City and Age values. Therefore, GROUP BY City,Age returns the same results as GROUP BY Age,City.

The field(s) must be specified by column name. Valid field values include the following: a column name (GROUP BY City); an %ID (which returns all rows); a scalar function specifying a column name (GROUP BY ROUND(Age,-1)); a collation function specifying a column name (GROUP BY %EXACT(City)).

You cannot specify a field by column alias; attempting to do so generates an SQLCODE -29 error. You cannot specify a field by column number; this is interpreted as a literal and returns one row. You cannot specify an aggregate field; attempting to do so generates an SQLCODE -19 error. You cannot specify a subquery; this is interpreted as a literal and returns one row.

GROUP BY StreamField operates on the OID of a stream field, not its actual data. Because all stream field OIDs are unique values, GROUP BY has no effect on actual stream field duplicate data values. GROUP BY StreamField reduces the number records where the stream field is NULL to one record. For further details, see Storing and Using Stream Data (BLOBs and CLOBs).

A GROUP BY clause can use the arrow syntax (–>) operator to specify a field in a table that is not the base table. For example: GROUP BY Company->Name. For further details, refer to Implicit Joins (Arrow Syntax).

Specifying a literal as the field value in a GROUP BY clause returns 1 row; which row is returned is indeterminate. Thus, specifying 7, 'Chicago', '', 0, or NULL all return 1 row. However, if you specify a literal as a field value in a comma-separated list, the literal is ignored and GROUP BY selects one arbitrary row for each unique combination of the specified field names.

Aggregate Functions with GROUP BY and DISTINCT BY

The GROUP BY clause is applied before aggregate functions are calculated. In the following example, the COUNT aggregate function counts the number of rows in each GROUP BY group:

SELECT Home_State,COUNT(Home_State)
FROM Sample.Person
GROUP BY Home_State

The DISTINCT BY clause is applied after aggregate functions are calculated. In the following example, the COUNT aggregate function counts the number of rows in the entire table:

SELECT DISTINCT BY(Home_State) Home_State,COUNT(Home_State)
FROM Sample.Person

In order to calculate an aggregate function for the entire table, rather than a GROUP BY group, you can specify a select-item subquery:

SELECT Home_State,(SELECT COUNT(Home_State) FROM Sample.Person)
FROM Sample.Person
GROUP BY Home_State

A GROUP BY clause should not be used with a DISTINCT clause when the select list consists of an aggregate field. For example, the following query is intended to return the distinct numbers of people who share the same Home_State:

/* This query DOES NOT apply the DISTINCT keyword */
/* It is provided as a cautionary example  */
SELECT DISTINCT COUNT(*) AS mynum
FROM Sample.Person 
GROUP BY Home_State
ORDER BY mynum

This query did not return the expected results because it did not apply the DISTINCT keyword. To apply both a DISTINCT aggregate and a GROUP BY clause, use a subquery as shown in the following example:

SELECT DISTINCT *
FROM (SELECT COUNT(*) AS mynum
      FROM Sample.Person 
      GROUP BY Home_State) AS Sub
ORDER BY Sub.mynum

This example successfully returns the distinct numbers of people who share the same Home_State. For instance, if any Home_State is shared by 8 people, the query returns an 8.

If a query consist only of aggregate functions and does not return any data from the table, it returns %ROWCOUNT=1 with an empty string (or 0) value for the aggregate functions. For example:

SELECT AVG(Age) FROM Sample.Person WHERE Name %STARTSWITH 'ZZZZ'

However, if this type of query contains a GROUP BY clause, it returns %ROWCOUNT=0 and the aggregate function values remains undefined.

Collation, Letter Case, and Optimization

This section describes how GROUP BY handles data values that differ only in letter case.

  • Group Lettercase Variants Together (return uppercase):

    By default, GROUP BY groups together string values based on the collation specified for the field when it was created. InterSystems IRIS has a default string collation, which can be set for each namespace; the initial string collation default for all namespaces is SQLUPPER. Therefore, commonly, GROUP BY collation is not case-sensitive unless otherwise specified.

    GROUP BY groups together the values of a field with SQLUPPER collation based on their uppercase letter collation. Field values that differ only in letter case are grouped together. Grouped field values are returned in all uppercase letters. This has the performance advantage of allowing GROUP BY to use the index for the field, rather than accessing the actual field values. It is therefore only meaningful if an index exists for one or more of the selected fields. It has the consequence that the GROUP BY field value is returned in all uppercase letters, even if none of the actual data values are in all uppercase letters.

  • Group Lettercase Variants Together (return actual lettercase):

    GROUP BY can group together values that differ in lettercase and return grouped field values with an actual field lettercase value (randomly selecting). This has the advantage that the returned value is an actual value, showing the lettercase of at least one value in the data. It has the performance disadvantage of not being able to use the field’s index. You can specify this for an individual query by applying the %EXACT collation function to the select-item field.

  • Do Not Group Lettercase Variants Together (return actual lettercase):

    GROUP BY can perform case-sensitive grouping of values by applying the %EXACT collation function to the GROUP BY field. This has the advantage of returning every lettercase variant as a separate group. It has the performance disadvantage of not being able to use the field’s index.

You can configure this behavior system-wide for all queries that contain a GROUP BY clause by using the Management Portal. Select System Administration, Configuration, SQL and Object Settings, SQL. View and edit the GROUP BY and DISTINCT queries must produce original values check box. By default, this check box is not selected. This default groups alphabetic values by their uppercase letter collation. (This optimization also works for the DISTINCT clause.) For further details, refer to SQL and Object Settings Pages.

You can also set this option system-wide using the $SYSTEM.SQL.Util.SetOption()Opens in a new tab method FastDistinct option. To determine the current setting, call $SYSTEM.SQL.CurrentSettings()Opens in a new tab, which displays the DISTINCT optimization turned on setting; the default is 1.

This optimization takes advantage of indexes for the selected field(s). It is therefore only meaningful if an index exists for one or more of the selected fields. It collates field values as they are stored in the index; alphabetic strings are returned in all uppercase letters. You can set this system-wide option, then override it for specific queries by using the %EXACT collation function to preserve letter case.

The following examples show these behaviors. These examples assume that Sample.Person contains records with a Home_City field with SQLUPPER collation and values of ‘New York’ and ‘new york’:

SELECT Home_City FROM Sample.Person GROUP BY Home_City
/* groups together Home_City values by their uppercase letter values
   returns the name of each grouped city in uppercase letters. 
   Thus, 'NEW YORK' is returned.                          */
SELECT %EXACT(Home_City) FROM Sample.Person GROUP BY Home_City
/* groups together Home_City values by their uppercase letter values
   returns the name of a grouped city in original letter case. 
   Thus, 'New York' or 'new york' may be returned, but not both. */
SELECT Home_City FROM Sample.Person GROUP BY %EXACT(Home_City)
/* groups together Home_City values by their original letter case
   returns the name of each grouped city in original letter case. 
   Thus, both 'New York' and 'new york' are returned as separate groups. */

%ROWID

Specifying a GROUP BY clause causes a cursor-based Embedded SQL query to not set the %ROWID variable. %ROWID is not set even when GROUP BY does not limit the rows returned. This is shown in the following example:

   SET %ROWID=999
   &sql(DECLARE EmpCursor CURSOR FOR 
        SELECT Name, Home_State
        INTO :name,:state FROM Sample.Person
        WHERE Home_State %STARTSWITH 'M' 
        GROUP BY Home_State)
   &sql(OPEN EmpCursor)
        QUIT:(SQLCODE'=0)
   FOR { &sql(FETCH EmpCursor)
        QUIT:SQLCODE  
        WRITE !,"RowID: ",%ROWID," row count: ",%ROWCOUNT
        WRITE " Name=",name," State=",state
  }
   &sql(CLOSE EmpCursor)

This change of query behavior only applies to cursor-based Embedded SQL SELECT queries. Dynamic SQL SELECT queries and non-cursor Embedded SQL SELECT queries never set %ROWID.

Transaction Committed Changes

A query containing a GROUP BY clause does not support READ COMMITTED isolation level. In a transaction defined as READ COMMITTED, a SELECT statement without a GROUP BY clause returns only data modifications that have been committed; in other words, it returns the state of the data before the current transaction. A SELECT statement with a GROUP BY clause returns all data modifications made, whether or not they have been committed.

Arguments

field

One or more fields from which data is being retrieved. Either a single field name or a comma-separated list of field names.

Example

The following example groups names by their initial letter. It returns the initial letter, the count of names sharing that initial letter, and an example of a one of the name values. Names are grouped using their SQLUPPER collation, regardless of the letter case of the actual values. Note that the Name select-item contains the uppercase initial letter; %EXACT collation is used to display an actual name value:

SELECT Name AS Initial,COUNT(Name) AS SameInitial,%EXACT(Name) AS Example
FROM Sample.Person GROUP BY %SQLUPPER(Name,2)

See Also

FeedbackOpens in a new tab