Retrieves rows from one or more tables within a database.
Synopsis
SELECT [DISTINCT | ALL] [TOP int]
  select-item
  [INTO host-variable-list]
  FROM [%INORDER | %FULL] table-ref [[AS] t-alias]
    {,table-ref [[AS] t-alias]}
  [WHERE condition-expression]
  [GROUP BY scalar-expression]
  [HAVING condition-expression]
  [ORDER BY item-order-list [ASC | DESC] ]

select-item ::= 
  [t-alias.]*
  [t-alias.]scalar-expression [[AS] c-alias]
    {,[t-alias.]scalar-expression [[AS] c-alias]}
Arguments
DISTINCT
ALL
Optional — The DISTINCT keyword specifies that each row returned must contain a unique value for the specified field. When this keyword is in use, SELECT discards extra rows.
The ALL keyword specifies that all rows that meet the SELECT criteria be returned. This is the default for Caché SQL; the ALL keyword is provided for SQL compatibility.
TOP int Optional — The TOP keyword limits the number of rows returned to the number specified in int. These can be the “top” rows stored in the database (unpredictable order), or the top rows after sequencing by an ORDER BY clause. The DISTINCT keyword (if specified) is applied before TOP, specifying that int number of unique values are to be returned. The int argument in mandatory. It can be either an integer, or a variable that resolves to an integer. To force the parsing of int as an integer, you can enclose it in parentheses. If no TOP keyword is specified, the default is to display all the rows that meet the SELECT criteria.
select-item One or more columns to be retrieved. Multiple columns can be specified as a comma-separated list. You can also retrieve all columns by using the * symbol.
INTO host-variable-list Optional — One or more host variables into which select-item values are placed. See INTO clause for more details.
FROM table-ref The table or view from which data is being retrieved. Multiple tables can be specified as a comma-separated list. Any combination of tables or views can be specified. If you specify more than one table-ref here, Caché performs a JOIN on the tables and retrieves data from the results table of the JOIN operation. You can optionally assign an alias (t-alias) to each table-ref. You can specify the %INORDER and %FULL keywords, which optimize query execution. See FROM clause for more details.
WHERE condition-expression Optional — A qualifier specifying parameters for the data that is to be retrieved. See WHERE clause for more details.
GROUP BY scalar-expression Optional — A comma-separated list of one or more scalar expressions specifying how the retrieved data is to be organized; these may include column names. See GROUP BY clause for more details.
HAVING condition-expression Optional — A qualifier specifying parameters for the data that is to be retrieved. See HAVING clause for more details.
ORDER BY item-order-list Optional — A select-item or a comma-separated list of items that specify the order in which rows are displayed. Each item can have an optional ASC (ascending order) or DESC (descending order). The default is ascending order. An ORDER BY clause can only be used on the results of a query. It cannot be used with a SELECT statement that is a subquery (for example, in a UNION statement). See ORDER BY clause for more details.
scalar-expression A field identifier, an expression containing a field identifier, or a general expression, such as a function call or an arithmetic operation.
t-alias Optional — An alias for a table-ref (table or view) name.
c-alias Optional — An alias for a column (field) name.
Description
The SELECT statement retrieves data from a Caché database. In its simplest form, it retrieves one or more items from a single table; in this case, the items are specified by the select-item list and the table is specified by the FROM table-ref clause.
Caché returns a status variable SQLCODE, which indicates the success or failure of the SELECT. In addition, the SELECT operation sets the %ROWCOUNT local variable to the number of selected rows. Successful completion of a SELECT generally sets SQLCODE=0 and %ROWCOUNT to the number of rows selected. In the case of an embedded SQL containing a simple SELECT, data from (at most) one row is selected, so SQLCODE=0 and %ROWCOUNT is set to either 0 or 1. However, in the case of an embedded SQL SELECT that declares a cursor and fetches data from multiple rows, the operation only completes when the cursor has been advanced to the end of the data (SQLCODE=100); at that point, %ROWCOUNT is set to the total number of rows selected.
You can use a SELECT statement either as its own independent query, or as part of a larger statement, in which it is known as a subquery. A subquery is enclosed in parentheses.
The UNION statement allows you to combine one or more SELECT statements into a single query. If a query consists of a single SELECT statement, then the SELECT statement can conclude with an ORDER BY statement; if a query is a UNION of multiple SELECT statements, then the UNION statement can conclude with an ORDER BY statement.
Required Clauses
The following are required clauses for all SELECT statements:
Optional Clauses
The following optional clauses operate on the virtual table that a FROM clause returns. All are optional, but, if used, must appear in the order specified:
The DISTINCT Keyword
The DISTINCT keyword causes redundant field values to be eliminated:
SELECT DISTINCT Home_State FROM Sample.Person
 
This returns one row for each distinct state name. Note that because DISTINCT returns distinct values, rather than specific rows, listing more than one item retrieves all values that are distinct for both items. For example:
SELECT DISTINCT Home_State,Age FROM Sample.Person
 
Retrieves every record that contains a distinct combination of Home_State and Age values.
If the column specified in the DISTINCT clause contains NULL values, DISTINCT returns one row with a value of NULL, as shown in the following example.
SELECT DISTINCT FavoriteColors FROM Sample.Person
 
The syntax DISTINCT * is legal, but not meaningful, since all rows, by definition, contain some distinct unique identifier.
A DISTINCT clause can be specified with one or more aggregate fields:
SELECT DISTINCT Age, AVG(Age) FROM Sample.Person
 
The DISTINCT keyword is not meaningful in Embedded SQL, because a SELECT in Embedded SQL always returns only one row of data.
The TOP Keyword
The TOP keyword specifies that the SELECT statement return only a specified number of rows. It returns the specified number of rows that appear at the “top” of the returned virtual table. By default, which rows are the “top” rows of the table is unpredictable. However, Caché applies the DISTINCT and ORDER BY keywords (if specified) before selecting the TOP rows.
When int is an integer, it specifies the number of rows to return. When int is a parameter (enclosed in parentheses), the number of rows to return is specified by a variable.
The TOP keyword cannot be used in a sub-query.
TOP is generally used in a SELECT with an ORDER BY clause. Note that the default ascending ORDER BY collation sequence considers NULL to be the lowest (“top”) value, followed by the empty string ('').
The following query returns the first 20 rows retrieved from Sample.Person in the order that they are stored in the database.
SELECT TOP 20 Home_State,Name FROM Sample.Person
 
The following query returns the first 20 distinct Home_State values retrieved from Sample.Person in ascending collation sequence order.
SELECT DISTINCT TOP 20 Home_State FROM Sample.Person ORDER BY Home_State
 
The following query returns the first 20 distinct FavoriteColor values. The “top” rows reflect the ORDER BY clause sequencing of all of the rows in Sample.Person in descending (DESC) collation sequence. Descending collation sequence is used rather than the default ascending collation sequence because the FavoriteColors field is known to have NULLs, which would appear at the top of the ascending collation sequence.
SELECT DISTINCT TOP 40 FavoriteColors FROM Sample.Person 
      ORDER BY FavoriteColors DESC
 
Also note in the preceding example that because FavoriteColors is a list field, the collation sequence includes the element length byte. Thus six-letter elements (YELLOW, PURPLE, ORANGE) collate together, listed before five-letter elements (WHITE, GREEN, etc.).
The TOP keyword is not meaningful in Embedded SQL, because a SELECT in Embedded SQL always returns only one row of data.
The select-item
This is a mandatory element of all SELECT statements. Every SELECT statement must have a select-item and a FROM clause. Generally, a select-item refers to a field in the table(s) specified in the FROM clause.
A select-item can be:
The AS Keyword
When specifying a select-item, you can use the optional AS keyword to specify the name of the column returned by the query for that select-item:
select-item AS c-alias
If you omit the AS clause, Caché SQL supplies a unique column name, such as “Expression_1”, or “Aggregate_3”. The number here indicates the column number of the field, as returned by the query. The AS keyword follows immediately after each select-item in the list. The AS keyword itself is not required, but provides clarity of the syntax.
The c-alias name supplied in the AS clause must be a valid identifier.
In the following example, the table column “Home_State” is renamed “US_State_Abbrev”:
SELECT Name, Home_State AS US_State_Abbrev 
FROM Sample.Person
 
In the following example, the aggregate field column created by the AVG function is named “AvgAge”; its default name is “Aggregate_3” (an aggregate field in column 3).
SELECT Name, Age, AVG(Age) AS AvgAge FROM Sample.Person
 
The following example is identical to the previous, except that the AS keyword is here omitted. The use of this keyword is recommended, but not required.
SELECT Name, Age, AVG(Age) AvgAge FROM Sample.Person
 
FROM Clause
The FROM table-ref clause is a mandatory clause that specifies one or more table names or view names (or both), separated by commas.
A table name can take an alias (t-alias), with or without the optional AS keyword. This t-alias can be used to indicate that table when specifying a field name. For further details, refer to FROM.
The following three SELECT statements show the row counts for two individual tables, and the row count for a SELECT specifying both tables. This latter results in a much larger table, a Cartesian product, where every row in the first table is matched with every row of the second table, an operation known as a Cross Join.
SELECT COUNT(*)
FROM Sample.Company
 
SELECT COUNT(*)
FROM Sample.Vendor
 
SELECT COUNT(*)
FROM Sample.Company, Sample.Vendor
 
In most cases, the extensive data duplication of a cross join is not desirable, and some other type of join is preferable. Joins are specified in either the FROM clause or the WHERE clause. For further details, refer to JOIN.
The sequence in which join operations are performed upon a comma-separated table-ref list can be optimized by using the optional %INORDER and %FULL keywords in the FROM clause. For further details, see the FROM clause reference page.
The AS Keyword
When specifying a table-ref, you can use the optional AS keyword to specify an alias for that table name or view name:
table1 AS t-alias1,table2 AS t-alias2,...
This is commonly used to identify fields in the select-item by their source tables, as shown in the following:
SELECT TOP 50 t1.*, t2.Name 
      FROM Sample.Person AS t1, Sample.Company AS t2
 
The t-alias name supplied in the AS clause must be a valid identifier.
In the FROM clause, each table name for which an alias is desired is followed immediately by the AS keyword and its assigned t-alias.
The AS keyword itself is not required, but provides clarity of the syntax. The following example is identical to the previous, except that the AS keyword is here omitted. The use of this keyword is recommended, but not required.
SELECT TOP 50 t1.*, t2.Name 
      FROM Sample.Person t1, Sample.Company t2
 
WHERE Clause
The WHERE clause qualifies or disqualifies specific rows from the query selection. The rows that qualify are those for which the condition-expression is true. The condition-expression is a list of logical tests (predicates) which can be linked by the AND, OR, and NOT operators.
The SQL predicates fall into the following categories:
For further details on these logical predicates, see the WHERE clause reference page.
GROUP BY Clause
The GROUP BY clause takes the resulting rows of a query and breaks them up into individual groups according to one or more database columns. When you use SELECT in conjunction with GROUP BY, one row is retrieved for each distinct value of the GROUP BY fields. The GROUP BY clause is conceptually similar to the Caché extension %FOREACH, but GROUP BY operates on an entire query, while %FOREACH allows selection of aggregates on sub-populations without restricting the entire query population. For instance:
SELECT Home_State, COUNT(Home_State) AS Population
 FROM Sample.Person
  GROUP BY Home_State
 
This query returns one row for each distinct Home_State.
For further details, see the GROUP BY clause reference page.
HAVING Clause
The HAVING clause is like a WHERE clause that operates on groups. It is typically used in combination with the GROUP BY clause, or with the %AFTERHAVING keyword. The HAVING clause qualifies or disqualifies specific rows from the query selection. The rows that qualify are those for which the condition-expression is true. The condition-expression is a list of logical tests (predicates) which can be linked by the AND, OR, and NOT operators. For further details, see the HAVING clause reference page.
ORDER BY Clause
An ORDER BY clause consists of the ORDER BY keywords followed by a select-item or a comma-separated list of items that specify the order in which rows are displayed. Each item can have an optional ASC (ascending order) or DESC (descending order). The default is ascending order. An ORDER BY clause can only be used on the results of a query. It cannot be used with a SELECT statement that is a subquery (for example, in a UNION statement). For further details, see the ORDER BY clause reference page.
The following example returns the selected fields for all rows in the database, and orders these rows in ascending order by age:
SELECT Home_State, Name, Age 
FROM Sample.Person
ORDER BY Age
 
SELECT and Transaction Processing
A transaction is defined as either READ COMMITTED or READ UNCOMMITTED. If the current transaction is defined as READ UNCOMMITTED, a SELECT returns the current state of the data, including changes made to data during the current transaction which have not been committed. These changes may be subsequently rolled back.
If the current transaction is defined as READ COMMITTED, the behavior of the SELECT statement varies. Normally, a SELECT statement would only return data that has been committed. However, if the SELECT statement contains a DISTINCT keyword or a GROUP BY clause, the SELECT returns the current state of the data, including changes made to data during the current transaction which have not been committed. An aggregate function in a SELECT also returns the current state of the data for the specified column(s), including uncommitted changes.
For further details, refer to SET TRANSACTION and START TRANSACTION.
Examples
The following four examples perform similar queries, using different combinations of SELECT clauses. Note that these clauses must be specified in the correct order. In all four examples, three fields are selected from the Sample.Person table: Name, Home_State, and Age, and two fields (AvgAge and AvgMiddleAge) are computed.
HAVING/ORDER BY
In the following example, the AvgAge computed field is computed on all records in Sample.Person. The HAVING clause governs the AvgMiddleAge computed field, calculating the average age of those over 40 from all records in Sample.Person. Thus, every row has the same value for AvgAge and AvgMiddleAge. The ORDER BY clause sequences the display of the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
 AVG(Age %AFTERHAVING) AS AvgMiddleAge
 FROM Sample.Person
 HAVING Age > 40
 ORDER BY Home_State
 
WHERE/HAVING/ORDER BY
In the following example, the WHERE clause limits the selection to the seven specified northeastern states. The AvgAge computed field is computed on the records from those Home_States. The HAVING clause governs the AvgMiddleAge computed field, calculating the average age of those over 40 from the records from the specified Home_States. Thus, every row has the same value for AvgAge and AvgMiddleAge. The ORDER BY clause sequences the display of the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
 AVG(Age %AFTERHAVING) AS AvgMiddleAge
 FROM Sample.Person
 WHERE Home_State IN ('ME','NH','VT','MA','RI','CT','NY')
 HAVING Age > 40
 ORDER BY Home_State
 
GROUP BY/HAVING/ORDER BY
The GROUP BY clause causes the AvgAge computed field to be separately computed for each Home_State group. The GROUP BY clause also limits the output display to the first record encountered from each Home_State. The HAVING clause governs the AvgMiddleAge computed field, calculating the average age of those over 40 in each Home_State group. The ORDER BY clause sequences the display of the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
 AVG(Age %AFTERHAVING) AS AvgMiddleAge
 FROM Sample.Person
 GROUP BY Home_State
 HAVING Age > 40
 ORDER BY Home_State
 
WHERE/GROUP BY/HAVING/ORDER BY
The WHERE clause limits the selection to the seven specified northeastern states. The GROUP BY clause causes the AvgAge computed field to be separately computed for each of these seven Home_State groups. The GROUP BY clause also limits the output display to the first record encountered from each specified Home_State. The HAVING clause governs the AvgMiddleAge computed field, calculating the average age of those over 40 in each of the seven Home_State groups. The ORDER BY clause sequences the display of the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
 AVG(Age %AFTERHAVING) AS AvgMiddleAge
 FROM Sample.Person
 WHERE Home_State IN ('ME','NH','VT','MA','RI','CT','NY')
 GROUP BY Home_State
 HAVING Age > 40
 ORDER BY Home_State
 
Embedded SQL and Dynamic SQL Examples
Embedded SQL and Dynamic SQL can be used to issue a SELECT query from within a program in another language. Embedded SQL can be included in Caché ObjectScript code. Dynamic SQL can be included in either Caché ObjectScript code or Caché Basic code.
The following embedded SQL program retrieves data values from one record and places them in the output host variables specified in the INTO clause.
   NEW SQLCODE,%ROWCOUNT
   &sql(SELECT Home_State,Name,Age
        INTO :a, :b, :c
        FROM Sample.Person)
   IF SQLCODE=0 {
     WRITE !,"  Name=",b
     WRITE !,"  Age=",c
     WRITE !,"  Home State=",a
     WRITE !,"Row count is: ",%ROWCOUNT }
   ELSE {
     WRITE !,"SELECT failed, SQLCODE=",SQLCODE  }
 
For further details, refer to the Embedded SQL chapter in Using Caché SQL.
The following Dynamic SQL example returns the same three field values to the result.Data() method. By using the WHILE loop to repeatedly invoke the result.Next() method, this example returns values for the first 10 records (in Name order) in Sample.Person:
   NEW SQLCODE,%ROWCOUNT
  SET sqltext = "SELECT Home_State,Name,Age FROM Sample.Person ORDER BY Name"
  SET result = ##class(%Library.ResultSet).%New()
  SET status = result.Prepare(sqltext)
  SET status = result.Execute()
  IF SQLCODE=0 {
   SET x=0
   WHILE x < 10 {
     SET x=x+1
     SET status=result.Next()
      WRITE !,result.Data("Name")," "
      WRITE result.Data("Home_State")," "
      WRITE result.Data("Age") }
    WRITE !,"Row count is: ",%ROWCOUNT
    SET status = result.Close()
 }
  ELSE {
    WRITE !,"SELECT failed, SQLCODE=",SQLCODE }
 
For further details, refer to the Dynamic SQL chapter in Using Caché SQL.
See Also