Retrieves rows from one or more tables within a database.
Synopsis
SELECT [DISTINCT | ALL] [TOP int]
select-item
[INTO host-variable-list]
FROM [%INORDER | %FULL] table-ref [[AS] t-alias]
{,table-ref [[AS] t-alias]}
[WHERE condition-expression]
[GROUP BY scalar-expression]
[HAVING condition-expression]
[ORDER BY item-order-list [ASC | DESC] ]
select-item ::=
[t-alias.]*
[t-alias.]scalar-expression [[AS] c-alias]
{,[t-alias.]scalar-expression [[AS] c-alias]}
The
SELECT statement retrieves data from a Caché database.
In its simplest form, it retrieves one or more items from a single table; in this
case, the items are specified by the
select-item list and the table
is specified by the FROM
table-ref clause.
Caché returns a status variable
SQLCODE,
which indicates the success or failure of the
SELECT. In addition,
the
SELECT operation sets the
%ROWCOUNT local
variable to the number of selected rows. Successful completion of a
SELECT generally
sets SQLCODE=0 and %ROWCOUNT to the number of rows selected. In the case of an embedded
SQL containing a simple
SELECT, data from (at most) one row is
selected, so SQLCODE=0 and %ROWCOUNT is set to either 0 or 1. However, in the case
of an embedded SQL
SELECT that declares a cursor and fetches data
from multiple rows, the operation only completes when the cursor has been advanced
to the end of the data (SQLCODE=100); at that point, %ROWCOUNT is set to the total
number of rows selected.
You can use a
SELECT statement either as its own independent
query, or as part of a larger statement, in which it is known as a subquery. A subquery
is enclosed in parentheses.
The
UNION statement allows you to combine one
or more
SELECT statements into a single query. If a query consists
of a single
SELECT statement, then the
SELECT statement
can conclude with an
ORDER BY statement; if a query
is a
UNION of multiple
SELECT statements,
then the
UNION statement can conclude with an
ORDER
BY statement.
The following are required clauses for all
SELECT statements:
-
A
select-item list,
a comma-separated list of one or more items (the
select-item arguments)
to be retrieved from the table or otherwise generated. Most commonly, these items
are the names of columns in a table. The
select-item consists of
either a scalar expression specifying one or more individual items, or an asterisk
(*) referring to all the columns of a base table.
-
A
FROM clause specifies
one or more tables, views, subqueries, or
JOIN expressions
from which rows are to be retrieved.
The following optional clauses operate on the virtual table that a FROM clause
returns. All are optional, but, if used, must appear in the order specified:
-
A
WHERE clause, which
specifies one or more conditional constraints that rows must match. The particulars
of these constraints are specified by a
condition-expression.
Thus a WHERE clause returns all records that satisfy the conditional constraints.
-
A
GROUP BY clause,
which specifies a comma-delimited list of columns. These organize a query's result
set into subsets with matching values for one or more columns and determine the ordering
of the rows returned. GROUP BY allows scalar expressions as well as columns.
-
A
HAVING clause, which
specifies one or more conditional constraints. The particulars of these constraints
are specified by a
condition-expression and,
typically, operate on each group specified by a GROUP BY clause. Thus a HAVING clause
typically returns one record for each value that satisfies the conditional constraints.
-
An
ORDER BY clause,
which specifies the order in which rows should be displayed. An ORDER BY clause can
only be used if the
SELECT statement is not a subquery. For example,
an
ORDER BY can be used on the results of a
UNION,
but not on the
SELECT statements that are subqueries of the
UNION.
The DISTINCT keyword causes redundant field values to be eliminated:
SELECT DISTINCT Home_State FROM Sample.Person
This returns one row for each distinct state name. Note that because DISTINCT
returns distinct values, rather than specific rows, listing more than one item retrieves
all values that are distinct for both items. For example:
SELECT DISTINCT Home_State,Age FROM Sample.Person
Retrieves every record that contains a distinct combination of Home_State and
Age values.
If the column specified in the DISTINCT clause contains NULL values, DISTINCT
returns one row with a value of NULL, as shown in the following example.
SELECT DISTINCT FavoriteColors FROM Sample.Person
The syntax
DISTINCT * is legal, but not meaningful, since
all rows, by definition, contain some distinct unique identifier.
A DISTINCT clause can be specified with one or more aggregate fields:
SELECT DISTINCT Age, AVG(Age) FROM Sample.Person
The DISTINCT keyword is not meaningful in
Embedded SQL,
because a
SELECT in Embedded SQL always returns only one row of
data.
The TOP keyword specifies that the
SELECT statement return
only a specified number of rows. It returns the specified number of rows that appear
at the top of the returned virtual table. By default, which rows are
the top rows of the table is unpredictable. However, Caché applies
the DISTINCT and ORDER BY keywords (if specified) before selecting the TOP rows.
When
int is an integer, it specifies the number of rows to
return. When
int is a parameter (enclosed in parentheses), the
number of rows to return is specified by a variable.
The TOP keyword cannot be used in a sub-query.
TOP is generally used in a
SELECT with an ORDER BY clause.
Note that the default ascending ORDER BY collation sequence considers NULL to be the
lowest (top) value, followed by the empty string ('').
The following query returns the first 20 rows retrieved from Sample.Person in
the order that they are stored in the database.
SELECT TOP 20 Home_State,Name FROM Sample.Person
The following query returns the first 20 distinct Home_State values retrieved
from Sample.Person in ascending collation sequence order.
SELECT DISTINCT TOP 20 Home_State FROM Sample.Person ORDER BY Home_State
The following query returns the first 20 distinct FavoriteColor values. The
top rows reflect the ORDER BY clause sequencing of all of the rows in
Sample.Person in descending (DESC) collation sequence. Descending collation sequence
is used rather than the default ascending collation sequence because the FavoriteColors
field is known to have NULLs, which would appear at the top of the ascending collation
sequence.
SELECT DISTINCT TOP 40 FavoriteColors FROM Sample.Person
ORDER BY FavoriteColors DESC
Also note in the preceding example that because FavoriteColors is a list field,
the collation sequence includes the element length byte. Thus six-letter elements
(YELLOW, PURPLE, ORANGE) collate together, listed before five-letter elements (WHITE,
GREEN, etc.).
The TOP keyword is not meaningful in
Embedded SQL,
because a
SELECT in Embedded SQL always returns only one row of
data.
This is a mandatory element of all
SELECT statements. Every
SELECT statement
must have a
select-item and a FROM clause. Generally, a
select-item refers
to a field in the table(s) specified in the FROM clause.
-
A list of one or more column names (fields), separated by commas:
SELECT Name,Age FROM Sample.Person
When the FROM clause specifies more than one table or view, you should include
the table name (or a table name alias) as part of the
select-item,
using periods, as shown in the following two examples:
SELECT Sample.Company.Name, Sample.Person.Name
FROM Sample.Person, Sample.Company
SELECT p.Name, c.Name
FROM Sample.Person AS p, Sample.Company AS c
When the
select-item references embedded object property
(serial class) data, use underline syntax. Underline syntax consists of the name of
the object property, an underscore, and the property within the embedded object: for
example,
Home_City and
Home_State. For further
details, refer to
Objects and SQL in
Using
Caché Objects.
SELECT Home_City,Home_State FROM Sample.Person
-
Asterisk syntax (*), which selects all the columns in a table:
SELECT * FROM Sample.Person
If the
select-item uses the asterisk syntax, it specifies
a reference to all the columns in a table. Asterisk syntax can be qualified or unqualified:
-
If the
select-item is unqualified (that is, simply
the asterisk), it must be the only
select-item in the list.
-
If the
select-item is qualified by prefixing a
table name (or table name alias) and period (.) before the asterisk, the
select-item selects
all the columns in the specified table. Qualified asterisk syntax can be combined
with other select items for other tables.
In the following example,
select-item consists
of qualified asterisk syntax that selects all columns from one table, and a list of
column names from another table.
SELECT TOP 50 Sample.Person.*, Sample.Company.Name,
Sample.Company.Home_State
FROM Sample.Person, Sample.Company
Note:
SELECT * is a fully-supported part of Caché SQL that can be extremely
convenient during application development and debugging. However, in production applications
the preferred programming practice is to explicitly list the selected fields, rather
than using the asterisk syntax form. Explicitly listing fields makes your application
clearer and easier to understand, easier to maintain, and easier to search for fields
by name.
-
-
A
select-item list containing one or more arithmetic
expressions. These arithmetic operations generally use one or more field values:
SELECT Name, Age, Age - AVG(Age) FROM Sample.Person
However, it is possible to include arithmetic operations that do not involve
any field values in a
select-item list:
SELECT Name, Age, 9 - 6 FROM Sample.Person
-
A
select-item list containing one or more SQL conversion
functions:
SELECT Name,UCASE(Name) FROM Sample.Person
-
-
AVG average of the values in
column
-
SUM sum of the values in column
-
COUNT number of non-null values
in column
-
MAX largest value in column
-
MIN smallest value in column
-
LIST all of the values used
within a specified column as a comma-separated list.
-
XMLAGG all of the values
used within a specified column as a concatenated string.
An aggregate function always returns a single value. The argument of an aggregate
function may be any of the following:
-
A single column namecomputes the aggregate for all non-null
values of the rows selected by the query:
SELECT AVG(Age) FROM Sample.Person
-
A scalar expression is also permitted to compute an aggregate:
SELECT SUM(Age) / COUNT(*) FROM Sample.Person
-
The wildcard character (*) used with the COUNT function to
compute the number of rows in the table:
SELECT COUNT(*) FROM Sample.Person
-
A select distinct function computes the aggregate by eliminating
redundant values:
SELECT COUNT(DISTINCT Home_State) FROM Sample.Person
-
While ANSI SQL does not allow the combination of column names and
aggregate functions in a single
SELECT statement, Caché
SQL extends the standard by allowing this:
SELECT Name, COUNT(DISTINCT Home_State) FROM Sample.Person
-
An aggregate function using %FOREACH. This causes the aggregate to
be computed for each distinct value of a column or columns:
SELECT DISTINCT Home_State, AVG(Age %FOREACH(Home_State))
FROM Sample.Person
-
An aggregate function using %AFTERHAVING. This causes the aggregate
to be computed on a sub-population specified with the HAVING clause:
SELECT AVG(Age %AFTERHAVING) FROM Sample.Person
HAVING (Age > AVG(Age))
would give the average age for those persons whose age is above the average
for all persons in the database.
-
An extrinsic Caché ObjectScript function call operating on
a database column:
SELECT $$REFORMAT^ABC(name)FROM MyTable
Note that you can only invoke extrinsic functions within an SQL statement if
you configured this option system-wide. Go to the
System Management Portal,
select Configuration, then select SQL Settings. View and edit the current setting
of
Allow Extrinsic Functions in SQL Statements. The default is
No;
by default, attempting to invoke extrinsic functions issues an SQLCODE -372 error
code.
You cannot use an extrinsic function to call a % routine (a routine with a name
that begins with the % character). Attempting to do so issues an SQLCODE -373 error
code.
-
A
select-item which does not reference any fields
in the specified table:
SELECT UCASE('fred') FROM Sample.Person
SELECT 7 * 7, 7 * 8 FROM Sample.Person
Even though no data is being referenced, the specified table must exist. In
a query of this type, the number of rows returned corresponds to the number of rows
in the specified table. To return just one row, use the DISTINCT keyword.
When specifying a
select-item, you can use the optional AS
keyword to specify the name of the column returned by the query for that
select-item:
If you omit the AS clause, Caché SQL supplies a unique column name, such
as Expression_1, or Aggregate_3. The number here indicates
the column number of the field, as returned by the query. The AS keyword follows immediately
after each
select-item in the list. The AS keyword itself is not
required, but provides clarity of the syntax.
In the following example, the table column Home_State is renamed
US_State_Abbrev:
SELECT Name, Home_State AS US_State_Abbrev
FROM Sample.Person
In the following example, the aggregate field column created by the AVG function
is named AvgAge; its default name is Aggregate_3 (an aggregate
field in column 3).
SELECT Name, Age, AVG(Age) AS AvgAge FROM Sample.Person
The following example is identical to the previous, except that the AS keyword
is here omitted. The use of this keyword is recommended, but not required.
SELECT Name, Age, AVG(Age) AvgAge FROM Sample.Person
The FROM
table-ref clause is a mandatory clause that specifies
one or more table names or view names (or both), separated by commas.
A table name can take an alias (
t-alias), with or without
the optional AS keyword. This
t-alias can be used to indicate that
table when specifying a field name. For further details, refer to
FROM.
The following three
SELECT statements show the row counts
for two individual tables, and the row count for a
SELECT specifying
both tables. This latter results in a much larger table, a Cartesian product, where
every row in the first table is matched with every row of the second table, an operation
known as a
Cross Join.
SELECT COUNT(*)
FROM Sample.Company
SELECT COUNT(*)
FROM Sample.Vendor
SELECT COUNT(*)
FROM Sample.Company, Sample.Vendor
In most cases, the extensive data duplication of a cross join is not desirable,
and some other type of join is preferable. Joins are specified in either the FROM
clause or the WHERE clause. For further details, refer to
JOIN.
The sequence in which join operations are performed upon a comma-separated
table-ref list
can be optimized by using the optional %INORDER and %FULL keywords in the FROM clause.
For further details, see the
FROM clause reference
page.
When specifying a
table-ref, you can use the optional AS
keyword to specify an alias for that table name or view name:
table1 AS t-alias1,table2 AS t-alias2,...
This is commonly used to identify fields in the
select-item by
their source tables, as shown in the following:
SELECT TOP 50 t1.*, t2.Name
FROM Sample.Person AS t1, Sample.Company AS t2
In the FROM clause, each table name for which an alias is desired is followed
immediately by the AS keyword and its assigned
t-alias.
The AS keyword itself is not required, but provides clarity of the syntax. The
following example is identical to the previous, except that the AS keyword is here
omitted. The use of this keyword is recommended, but not required.
SELECT TOP 50 t1.*, t2.Name
FROM Sample.Person t1, Sample.Company t2
The WHERE clause qualifies or disqualifies specific rows from the query selection.
The rows that qualify are those for which the
condition-expression is
true. The
condition-expression is a list of logical tests (predicates)
which can be linked by the AND, OR, and NOT operators.
The SQL predicates fall into the following categories:
For further details on these logical predicates, see the
WHERE clause
reference page.
The GROUP BY clause takes the resulting rows of a query and breaks them up into
individual groups according to one or more database columns. When you use
SELECT in
conjunction with GROUP BY, one row is retrieved for each distinct value of the GROUP
BY fields. The GROUP BY clause is conceptually similar to the Caché extension
%FOREACH, but GROUP BY operates on an entire query, while %FOREACH allows selection
of aggregates on sub-populations without restricting the entire query population.
For instance:
SELECT Home_State, COUNT(Home_State) AS Population
FROM Sample.Person
GROUP BY Home_State
This query returns one row for each distinct Home_State.
For further details, see the
GROUP BY clause
reference page.
The HAVING clause is like a WHERE clause that operates on groups. It is typically
used in combination with the GROUP BY clause, or with the %AFTERHAVING keyword. The
HAVING clause qualifies or disqualifies specific rows from the query selection. The
rows that qualify are those for which the
condition-expression is
true. The
condition-expression is a list of logical tests (predicates)
which can be linked by the AND, OR, and NOT operators. For further details, see the
HAVING clause reference page.
An ORDER BY clause consists of the ORDER BY keywords followed by a select-item
or a comma-separated list of items that specify the order in which rows are displayed.
Each item can have an optional ASC (ascending order) or DESC (descending order). The
default is ascending order. An ORDER BY clause can only be used on the results of
a query. It cannot be used with a
SELECT statement that is a subquery
(for example, in a UNION statement). For further details, see the
ORDER
BY clause reference page.
The following example returns the selected fields for all rows in the database,
and orders these rows in ascending order by age:
SELECT Home_State, Name, Age
FROM Sample.Person
ORDER BY Age
SELECT and Transaction Processing
A transaction is defined as either READ COMMITTED or READ UNCOMMITTED. If the
current transaction is defined as READ UNCOMMITTED, a
SELECT returns
the current state of the data, including changes made to data during the current transaction
which have not been committed. These changes may be subsequently rolled back.
If the current transaction is defined as READ COMMITTED, the behavior of the
SELECT statement
varies. Normally, a
SELECT statement would only return data that
has been committed. However, if the
SELECT statement contains a
DISTINCT keyword
or a
GROUP BY clause, the
SELECT returns the
current state of the data, including changes made to data during the current transaction
which have not been committed. An aggregate function in a
SELECT also
returns the current state of the data for the specified column(s), including uncommitted
changes.
The following four examples perform similar queries, using different combinations
of
SELECT clauses. Note that these clauses must be specified in
the correct order. In all four examples, three fields are selected from the Sample.Person
table: Name, Home_State, and Age, and two fields (AvgAge and AvgMiddleAge) are computed.
In the following example, the AvgAge computed field is computed on all records
in Sample.Person. The
HAVING clause governs the AvgMiddleAge computed
field, calculating the average age of those over 40 from all records in Sample.Person.
Thus, every row has the same value for AvgAge and AvgMiddleAge. The
ORDER
BY clause sequences the display of the rows alphabetically by the Home_State
field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
AVG(Age %AFTERHAVING) AS AvgMiddleAge
FROM Sample.Person
HAVING Age > 40
ORDER BY Home_State
In the following example, the
WHERE clause limits the selection
to the seven specified northeastern states. The AvgAge computed field is computed
on the records from those Home_States. The
HAVING clause governs
the AvgMiddleAge computed field, calculating the average age of those over 40 from
the records from the specified Home_States. Thus, every row has the same value for
AvgAge and AvgMiddleAge. The
ORDER BY clause sequences the display
of the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
AVG(Age %AFTERHAVING) AS AvgMiddleAge
FROM Sample.Person
WHERE Home_State IN ('ME','NH','VT','MA','RI','CT','NY')
HAVING Age > 40
ORDER BY Home_State
The
GROUP BY clause causes the AvgAge computed field to be
separately computed for each Home_State group. The
GROUP BY clause
also limits the output display to the first record encountered from each Home_State.
The
HAVING clause governs the AvgMiddleAge computed field, calculating
the average age of those over 40 in each Home_State group. The
ORDER BY clause
sequences the display of the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
AVG(Age %AFTERHAVING) AS AvgMiddleAge
FROM Sample.Person
GROUP BY Home_State
HAVING Age > 40
ORDER BY Home_State
WHERE/GROUP BY/HAVING/ORDER BY
The
WHERE clause limits the selection to the seven specified
northeastern states. The
GROUP BY clause causes the AvgAge computed
field to be separately computed for each of these seven Home_State groups. The
GROUP
BY clause also limits the output display to the first record encountered
from each specified Home_State. The
HAVING clause governs the AvgMiddleAge
computed field, calculating the average age of those over 40 in each of the seven
Home_State groups. The
ORDER BY clause sequences the display of
the rows alphabetically by the Home_State field value.
SELECT Name,Home_State,Age,AVG(Age) AS AvgAge,
AVG(Age %AFTERHAVING) AS AvgMiddleAge
FROM Sample.Person
WHERE Home_State IN ('ME','NH','VT','MA','RI','CT','NY')
GROUP BY Home_State
HAVING Age > 40
ORDER BY Home_State
Embedded SQL and Dynamic SQL Examples
Embedded SQL and Dynamic SQL can be used to issue a
SELECT query
from within a program in another language. Embedded SQL can be included in Caché
ObjectScript code. Dynamic SQL can be included in either Caché ObjectScript
code or Caché Basic code.
The following embedded SQL program retrieves data values from one record and
places them in the output
host variables specified
in the
INTO clause.
NEW SQLCODE,%ROWCOUNT
&sql(SELECT Home_State,Name,Age
INTO :a, :b, :c
FROM Sample.Person)
IF SQLCODE=0 {
WRITE !," Name=",b
WRITE !," Age=",c
WRITE !," Home State=",a
WRITE !,"Row count is: ",%ROWCOUNT }
ELSE {
WRITE !,"SELECT failed, SQLCODE=",SQLCODE }
The following Dynamic SQL example returns the same three field values to the
result.Data() method. By using the
WHILE loop to repeatedly invoke
the result.Next() method, this example returns values for the first 10 records (in
Name order) in Sample.Person:
NEW SQLCODE,%ROWCOUNT
SET sqltext = "SELECT Home_State,Name,Age FROM Sample.Person ORDER BY Name"
SET result = ##class(%Library.ResultSet).%New()
SET status = result.Prepare(sqltext)
SET status = result.Execute()
IF SQLCODE=0 {
SET x=0
WHILE x < 10 {
SET x=x+1
SET status=result.Next()
WRITE !,result.Data("Name")," "
WRITE result.Data("Home_State")," "
WRITE result.Data("Age") }
WRITE !,"Row count is: ",%ROWCOUNT
SET status = result.Close()
}
ELSE {
WRITE !,"SELECT failed, SQLCODE=",SQLCODE }