Caché SQL Reference
DISTINCT
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

A SELECT clause that specifies to return only distinct values.
Synopsis
SELECT [DISTINCT [BY (item {,item})] ]  |  [ALL]
  select-item {,select-item2}
Arguments
DISTINCT Optional — Returns rows for which the combined select-item value(s) are unique.
DISTINCT BY (item1 {,item2}) Optional — Returns select-item values for rows for which the BY (item) value(s) are unique.
ALL Optional — Returns all rows in the result set. The default.
Description
The optional DISTINCT clause appears after the SELECT keyword and before the optional TOP clause and the first select-item.
The DISTINCT clause is applied to the result set of the SELECT statement. It limits the rows returned to those that contain a distinct (non-duplicate) value. If no DISTINCT clause is specified, the default is to display all the rows that fulfill the SELECT criteria. The ALL clause is the same as specifying no DEFAULT clause; if you specify ALL, SELECT returns all the rows in the table that fulfill the SELECT criteria.
The DISTINCT clause has two forms:
The DISTINCT clause is applied before the TOP clause. If both are specified, the SELECT returns only rows with unique values, the number of unique value rows specified in the TOP clause.
If the column specified in the DISTINCT clause has rows that are NULL (contain no value), DISTINCT returns one row with NULL as a distinct (unique) value, as shown in the following examples:
SELECT DISTINCT FavoriteColors FROM Sample.Person
 
SELECT DISTINCT BY (FavoriteColors) Name,FavoriteColors FROM Sample.Person
ORDER BY FavoriteColors
 
A DISTINCT clause is not meaningful in an Embedded SQL simple query, because in this type of Embedded SQL a SELECT always returns only one row of data. However, an Embedded SQL cursor–based query can return multiple rows of data; in a cursor-based query a DISTINCT clause returns only unique value rows.
DISTINCT and ORDER BY
The ORDER BY clause is applied before the DISTINCT clause. You can therefore use the combination of DISTINCT and ORDER BY to return values from specific rows.
For example, the following program returns one row for each distinct Home_State value. Because the rows have been ordered by Name value in ascending collation sequence, the row returned for each Home_State is the one with the highest Name value:
SELECT DISTINCT BY (Home_State) Home_State,Name FROM Sample.Person
ORDER BY Name
 
For further details and program examples, refer to the ORDER BY clause reference page.
DISTINCT and GROUP BY
DISTINCT and GROUP BY both group records by a specified field (or fields) and return one record for each unique value of that field. One significant difference between them is that DISTINCT calculates aggregate functions before grouping. GROUP BY calculates aggregate functions after grouping. This difference is shown in the following examples:
SELECT DISTINCT BY (ROUND(Age,-1)) Age,AVG(Age) AS AvgAge FROM Sample.Person
 /* AVG(Age) returns average of all ages in table */
 
SELECT Age,AVG(Age) AS AvgAge FROM Sample.Person GROUP BY ROUND(Age,-1)
 /* AVG(Age) returns an average age for each age group */
 
A DISTINCT clause can be specified with one or more aggregate function fields, though this is rarely meaningful because an aggregate function returns a single value. Thus the following example returns a single row:
SELECT DISTINCT BY (AVG(Age)) Name,Age,AVG(Age) FROM Sample.Person
 
Caution:
If a DISTINCT clause with aggregate functions as the only item or select-item is used with a GROUP BY clause, the DISTINCT clause is ignored. The intended combination of DISTINCT, aggregate function, and GROUP BY can be achieved using a subquery. For further details and program examples, refer to the GROUP BY clause reference page.
Letter Case and DISTINCT Optimization
DISTINCT groups together string values based on the collation type defined for the field. By default, string data type fields are defined with SQLUPPER collation, which is not case-sensitive. The “Collation” chapter of Using Caché SQL provides details on defining the string collation default for the current namespace and specifying a non-default field collation type when defining a field/property.
If the field/property collation type is SQLUPPER, grouped field values are returned in all uppercase letters. To group values by original letter case, or to display the returned values for a grouped field in their original letter case, use the %EXACT collation function. This is shown in the following examples, which assume that the Home_City field is defined with collation type SQLUPPER and contains the values ‘New York’ and ‘new york’:
SELECT DISTINCT BY (Home_City) Name,Home_City FROM Sample.Person
/* groups together Home_City values by their uppercase letter values
   returns the name of each grouped city in uppercase letters. 
   Thus, 'NEW YORK' is returned.                          */
 
SELECT DISTINCT BY (Home_City) Name,%EXACT(Home_City) FROM Sample.Person
/* groups together Home_City values by their uppercase letter values
   returns the name of each grouped city in original letter case. 
   Thus, 'New York' or 'new york' may be returned, but not both. */
 
SELECT DISTINCT BY (%EXACT(Home_City)) Name,Home_City FROM Sample.Person
/* groups together Home_City values by their original letter case
   returns the name of each grouped city in original letter case.
   Thus, both 'New York' and 'new york' are returned.
   Optimization is not used.                       */
 
You can optimize query performance for queries that contain a DISTINCT clause by using the Management Portal. Select [Home] > [Configuration] > [General SQL Settings], then view and edit the DISTINCT Optimization Turned ON option. (This optimization also works for the GROUP BY clause.) The default is “Yes”. For further details, refer to SQL configuration settings described in Caché Advanced Configuration Settings Reference.
You can also set this system-wide option to 1 or 0 with the SetFastDistinct() method:
  WRITE $SYSTEM.SQL.SetFastDistinct(1)
This optimization takes advantage of indices for the selected field(s). It is therefore only meaningful if an index exists for one or more of the selected fields. It collates field values as they are stored in the index; alphabetic strings are returned in all uppercase letters. You can set this system-wide option, then override it for specific queries by using the %EXACT collation function to preserve letter case.
Other Uses of DISTINCT
DISTINCT and %ROWID
Specifying the DISTINCT keyword causes a cursor-based Embedded SQL query to not set the %ROWID variable. %ROWID is not set even when DISTINCT does not limit the rows returned. This is shown in the following example:
   SET %ROWID=999
   &sql(DECLARE EmpCursor CURSOR FOR 
        SELECT DISTINCT Name, Home_State
        INTO :name,:state FROM Sample.Person
        WHERE Home_State %STARTSWITH 'M')
   &sql(OPEN EmpCursor)
   FOR { &sql(FETCH EmpCursor)
        QUIT:SQLCODE  
        WRITE !,"RowID: ",%ROWID," row count: ",%ROWCOUNT
        WRITE " Name=",name," State=",state
  }
   &sql(CLOSE EmpCursor)
 
This change of query behavior only applies to cursor-based Embedded SQL SELECT queries. Dynamic SQL SELECT queries and non-cursor Embedded SQL SELECT queries never set %ROWID.
DISTINCT and Transaction Processing
Specifying the DISTINCT keyword causes a query to retrieve all current data, including data that has not yet been committed by the current transaction. The transaction’s READ COMMITTED isolation mode parameter (if set) is ignored; all data is retrieved in READ UNCOMMITTED mode. For further details, refer to Transaction Processing in the “Modifying the Database” chapter of Using Caché SQL.
Examples
The following query returns one row for each distinct Home_State value:
SELECT DISTINCT Home_State FROM Sample.Person
ORDER BY Home_State
 
The following query returns one row for each distinct Home_State value, but returns additional fields for that row. Which row is retrieved is not predictable:
SELECT DISTINCT BY (Home_State) %ID,Name,Home_State,Office_State FROM Sample.Person
ORDER BY Home_State
 
The following query returns one row for each distinct combination of Home_State and Office_State values. Depending on the data, it will either return more rows or the same number of rows as the previous example:
SELECT DISTINCT BY (Home_State,Office_State) %ID,Name,Home_State,Office_State FROM Sample.Person
ORDER BY Home_State,Office_State
 
The following query uses DISTINCT BY to return one row for each distinct Name length:
SELECT DISTINCT BY ($LENGTH(Name)) Name,$LENGTH(Name) AS lname
FROM Sample.Person
ORDER BY lname
 
The following query uses DISTINCT BY to return one row for each distinct first element of FavoriteColors %List values. It lists one distinct row with FavoriteColors NULL:
SELECT DISTINCT BY ($LIST(FavoriteColors,1)) Name,FavoriteColors,$LIST(FavoriteColors,1) AS FirstColor
FROM Sample.Person
 
The following query returns the first 20 distinct Home_State values retrieved from Sample.Person in ascending collation sequence order. The “top” rows reflect the ORDER BY clause sequencing of all of the rows in Sample.Person.
SELECT DISTINCT TOP 20 Home_State FROM Sample.Person ORDER BY Home_State
 
The following query uses DISTINCT in both the main query and in a WHERE clause subquery. It returns the first 20 distinct Home_State values in Sample.Person that are also in Sample.Employee. If the subquery DISTINCT was not provided, it would retrieve the distinct Home_State values in Sample.Person that match a random selection of Home_State values in Sample.Employee:
SELECT DISTINCT TOP 20 Home_State FROM Sample.Person 
WHERE Home_State IN(SELECT DISTINCT TOP 20 Home_State FROM Sample.Employee)
ORDER BY Home_State
 
The following query returns the first 20 distinct FavoriteColor values. This reflects the ORDER BY clause sequencing of all of the rows in Sample.Person. The FavoriteColors field is known to have NULLs, so one distinct row with FavoriteColors NULL appears at the top of the collation sequence.
SELECT DISTINCT BY (FavoriteColors) TOP 20 FavoriteColors,Name FROM Sample.Person 
      ORDER BY FavoriteColors
 
Also note in the preceding example that because FavoriteColors is a list field, the collation sequence includes the element length byte. Thus distinct list values beginning with a three-letter element (RED) are listed before list values beginning with a four-letter element (BLUE).
See Also