Using Caché SQL
Querying the Database
[Back] [Next]
Go to:

This chapter discusses how to query the database. It includes information on the following topics:

Types of Queries
A query is a statement which performs data retrieval and generates a result set. A query can consist of any of the following:
Using a SELECT Statement
A SELECT statement selects one or more rows of data from one or more tables or views. A simple SELECT is shown in the following example:
In this example, Name and DOB are columns (data fields) in the Sample.Person table.
The order that clauses must be specified in a SELECT statement is: SELECT DISTINCT TOP ... select-items INTO ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY. All of these clauses are optional, except SELECT select-items. (The optional FROM clause is required to perform any operations on stored data, and therefore is almost always required in a query.) Refer to the SELECT statement syntax for details on the required order for specifying SELECT clauses.
The operation of a SELECT statement can be understood by noting its semantic processing order (which is not the same as the SELECT syntax order). The clauses of a SELECT are processed in the following order:
  1. FROM clause — specifies a table, a view, multiple tables or views using JOIN syntax, or a subquery.
  2. WHERE clause — restricts what data is selected using various criteria.
  3. GROUP BY clause — organizes the selected data into subsets with matching values; only one record is returned for each value.
  4. HAVING clause — restricts what data is selected from groups using various criteria.
  5. select-item — selects a data fields from the specified table or view. A select-item can also be an expression which may or may not reference a specific data field.
  6. ORDER BY clause — sorts the results by different values.
This semantic order shows that a table alias (which is defined in the FROM clause) can be recognized by all clauses, but a column alias (which is defined in the SELECT select-items) can only be recognized by the ORDER BY clause.
To use a column alias in other SELECT clauses you can use a subquery, as shown in the following example:
      (SELECT Name AS Interns FROM Sample.Employee WHERE Age<21) 
In this example, Name and Age are columns (data fields) in the Sample.Person table, and Interns is a column alias for Name.
Selecting Fields
When you issue a SELECT, Caché SQL attempts to match each specified select-item field name to a property defined in the class corresponding to the specified table. Each class property has both a property name and a SqlFieldName. If you defined the table using SQL, the field name specified in the CREATE TABLE command is the SqlFieldName, and Caché generated the property name from the SqlFieldName.
Field names, class property names, and SqlFieldName names have different naming conventions:
The translation between these three names for a field determine several aspects of query behavior. You can specify a select-item field name using any combination of letter case and Caché SQL will identify the appropriate corresponding property. The data column header name in the result set display is the SqlFieldName, not the field name specified in the select-item. This is why the letter case of the data column header may differ from the select-item field name.
To avoid the performance cost of letter case resolution, the select-item field name should use the exact letter case of the SqlFieldName, or you should provide a column alias. It is sometimes beneficial to specify exact column aliases, for example, FamilyName AS FamilyName.
You can specify a column alias for a select-item field. A column alias can be in any mix of letter case, and can contain non-alphanumeric characters, following identifier naming conventions. A column alias can be referenced using any combination of letter case and Caché SQL resolves to the letter case specified in the select-item field. Caché always attempts to match to the list of column aliases before attempting to match to the list of properties corresponding to defined fields. If you have defined a column alias, the data column header name in the result set display is the column alias in the specified letter case, nor the field name.
When a SELECT query completes successfully, Caché SQL generates a result set class for that query. The result set class contains a property corresponding to each selected field. If a SELECT query contains duplicate field names, Caché generates unique property names for each instance of the field in the query by appending a character. For this reason, you cannot include more than 36 instances of the same field in a query.
The generated result set class for a query also contains properties for column aliases. To avoid the performance cost of letter case resolution, you should use the same letter case when referencing a column alias as the letter case used when specifying the column alias in the SELECT statement.
In addition to user-specified column aliases, Caché SQL also automatically generates up to three aliases for each field name, aliases which correspond to common letter case variants of the field name. These generated aliases are invisible to the user. They are provided for performance reasons, because accessing a property through an alias is faster than resolving letter case through letter case translation. For example, if SELECT specifies FAMILYNAME and the corresponding property is familyname, Caché SQL resolves letter case using a generated alias (FAMILYNAME AS familyname). However, if SELECT specifies fAmILyNaMe and the corresponding property is familyname, Caché SQL must resolves letter case using the slower letter case translation process.
A select-item item can also be an expression, an aggregate function, a subquery, a user-defined function, as asterisk, or some other value. For further details on select-item items other than field names, refer to The select-item section of the SELECT command reference page.
The JOIN Operation
A JOIN provides a way to link data in one table with data in another table and are frequently used in defining reports and queries. Within SQL, a JOIN is an operation that combines data from two tables to produce a third, subject to a restrictive condition. Every row of the resulting table must satisfy the restrictive condition.
Caché SQL supports five types of joins (some with multiple syntactic forms): CROSS JOIN, INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. Outer joins support the ON clause with a full range of conditional expression predicates and logical operators. There is partial support for NATURAL outer joins and outer joins with a USING clause. For definitions of these join types and further details, refer to the JOIN page in the Caché SQL Reference.
If a query contains a join, all of the field references within that query must have an appended table alias. Because Caché does not include the table alias in the data column header name, you may wish to provide column aliases for select-item fields to clarify which table is the source of the data.
The following example uses a join operation to match the “fake” (randomly-assigned) zip codes in Sample.Person with the real zip codes and city names in Sample.USZipCode. A WHERE clause is provided because USZipCode does not include all possible 5-digit zip codes:
SELECT P.Home_City,P.Home_Zip AS FakeZip,Z.ZipCode,Z.City AS ZipCity,Z.State
FROM Sample.Person AS P LEFT OUTER JOIN Sample.USZipCode AS Z 
ON P.Home_Zip=Z.ZipCode
ORDER BY P.Home_City
Queries Selecting Large Numbers of Fields
A query cannot select more than 1,000 select-item fields.
A query selecting more than 150 select-item fields may have the following performance consideration. Caché automatically generates result set column aliases. These generated aliases are provided for field names without user-defined aliases to enable rapid resolution of letter case variations. Letter case resolution using an alias is significantly faster than letter case resolution by letter case translation. However, the number of generated result set column aliases is limited to 500. Because commonly Caché generates three of these aliases (for the three most common letter case variations) for each field, Caché generates aliases for roughly the first 150 specified fields in the query. Therefore, a query referencing less than 150 fields commonly has better result set performance than a query referencing significantly more fields. This performance issue can be avoided by specifying an exact column alias for each field select-item in a very large query (for example, SELECT FamilyName AS FamilyName) and then making sure that you use the same letter case when referencing the result set item by column alias.
Executing Named Queries
You can define a query using CREATE QUERY, and then execute it by name using CALL. In the following example, the first is an SQL program that defines the query AgeQuery, the second is Dynamic SQL that executes the query:
CREATE QUERY Sample.AgeQuery(IN topnum INT DEFAULT 10,IN minage INT 20)
   SELECT TOP :topnum Name,Age FROM Sample.Person
   WHERE Age > :minage ;
  ZNSPACE "Samples"
  SET mycall = "CALL Sample.AgeQuery(11,65)"
  SET tStatement = ##class(%SQL.Statement).%New()
  SET tStatus = tStatement.%Prepare(mycall)
  SET rset = tStatement.%Execute()
  DO rset.%Display()
You can execute a named query defined in a class using %Library.ResultSet. The following two Dynamic SQL examples execute the ByName query defined in the Sample.Person class, passing a string to limit the names returned to those that start with that string value. The two examples use different syntax, but are functionally equivalent:
  SET rs=##class(%ResultSet).%New()
  SET rs.ClassName="Sample.Person"
  set rs.QueryName="ByName"
  DO rs.Execute("H")
  DO rs.%Display()
  SET rs=##class(%ResultSet).%New("Sample.Person:ByName")
  DO rs.Execute("K")
  DO rs.%Display()
For further details, refer to Dynamic SQL Using Older Result Set Classes.
For information on query names automatically assigned to executed queries, refer to the Cached Queries chapter of Caché SQL Optimization Guide.
Queries Invoking User-defined Functions
Caché SQL allows you to invoke class methods within SQL queries. This provides a powerful mechanism for extending the syntax of SQL.
To create a user-defined function, define a class method within a persistent Caché class. The method must have a literal (non-object) return value. This has to be a class method because there will not be an object instance within an SQL query on which to invoke an instance method. It also has to be defined as being an SQL stored procedure.
For example, we can define a Cube() method within the class MyApp.Person:
Class MyApp.Person Extends %Persistent [language = basic]
/// Find the Cube of a number
ClassMethod Cube(val As %Integer) As %Integer [SqlProc]
    Return val * val * val
You can create SQL functions with the CREATE FUNCTION, CREATE METHOD or CREATE PROCEDURE statements.
To call an SQL function, specify the name of the SQL procedure. A SQL function may be invoked in SQL code anywhere where a scalar expression may be specified. The function name may be qualified with its schema name, or unqualified. Unqualified function names take the current default schema. A function name may be a delimited identifier.
An SQL function must have a parameter list, enclosed in parentheses. The parameter list may be empty, but the parentheses are mandatory. All specified parameters act as input parameters. Output parameters are not supported.
An SQL function must return a value.
For example, the following SQL query invokes a user-defined SQL function as a method, just as if it was a built-in SQL function:
SELECT %ID, Age, MyApp.Person_Cube(Age) FROM MyApp.Person
For each value of Age, this query will invoke the Cube() method and place its return value within the results.
SQL functions may be nested.
If the specified function is not found, Caché issues an SQLCODE -359 error. If the specified function name is ambiguous, Caché issues an SQLCODE -358 error.
Collection Properties
A collection property that is projected as a child table to SQL from a class using default storage (%Library.CacheStorage) is also projected as a single column in the table projected by the class. The value of this column is the serilized value of the collection. This single column property is projected as an SQL list field.
For example, the collection column Home in Sample.Person is projected as a child table containing the columns Home_Street, Home_City, Home_State, and Home_Zip. The following example returns values from these child table columns:
SELECT TOP 4 Name,Home_Street,Home_City,Home_State,Home_Zip
FROM Sample.Person
The following example returns the same values as a single collection column with the data in list format:
FROM Sample.Person
By default, this Home column is not projected as a column.
You can use the $SYSTEM.SQL configuration methods GetCollectionProjection() and SetCollectionProjection() to determine whether to project a collection as a column if the collection is projected as a child table. Changes made to this systemwide setting takes effect for each class when that class is compiled or recompiled.
Queries Invoking Free-text Search
Caché supports what is called “free-text search,” which includes support for:
This feature enables SQL to support full text indexing, and also enables SQL to index and reference individual elements of a collection without projecting the collection property as a child table. While the underlying mechanisms that support collection indexing and full text indexing are closely related, text retrieval has many special properties, and therefore special classes and SQL features have been provided for text retrieval.
For details on the underlying classes that support these features, see the %Text.Text class.
Full Text Indexing and Text Retrieval through SQL
The %Library.Text class and the %Text package has been provided to index text and to search textual data with SQL. To use the feature on an existing %String property, change %String to %Text and set the LANGUAGECLASS parameter. For English text, the declaration would be as follows:
 Property myNotes As %Text ( LANGUAGECLASS="%Text.English", MAXLEN=1000 );
    Index myFullTextIndex On myNotes(KEYS);
LANGUAGECLASS specifies the name of a helper class that provides the necessary interface to SQL and to the indexer so that efficient text indexing and text search may be carried out. Specifying a MAXLEN value (in bytes) is required for %Text properties.
The available text-aware predicates are %CONTAINS, %CONTAINSTERM, and %SIMILARITY.
Once the %Text property has been declared and optionally indexed, you can issue full text queries using the SQL %CONTAINS predicate, as follows:
SELECT plaintiff, legalBrief FROM transcript
    WHERE plaintiff = 'John Doe' AND
          legalBrief %CONTAINS ('neighbor', 'tree', 'roof')
The query above returns all transcripts where the plaintiff is 'John Doe' and where the terms 'neighbor' AND 'tree' AND 'roof' are in its legalBrief.
Caché includes language-specific parsers in the %Text package for English, Spanish, French, Italian, German, Japanese, Portuguese. While easy to use, the language specific subclasses can be configured to perform many sophisticated operations, such as word stemming to map various forms of the same word into a common root (block, blocks, blocking, and so on), or to support multi-word phrases (n-grams), or to filter out noise words, or to perform automatic text classification (for example, for junk-mail filtering), as well as other features.
Multi-word strings may be specified to %CONTAINS, even if the type class is not configured to support n-grams of the full length of the query. For example, the following predicate may be specified even if the %Text class is configured to store only individual words:
SELECT author FROM famousQuotations WHERE
  quotetext %CONTAINS('eggs in one basket')
The query above would return all authors where the documents contain exactly the specified phrases, even if the text class represents the document only with single words (NGRAMLEN=1). When the pattern is longer than the n-gram length as in the case above, SQL filters the rows with the "[" (contains) operator. Because the "[" operator is case-sensitive, the %CONTAINS predicate is also case-sensitive on patterns longer than NGRAMLEN. An implication is that NGRAMLEN can also affect which rows get returned, as in the following case:
 mission %CONTAINS('Fortune 5')
If NGRAMLEN >= 2, then only the rows containing 'Fortune 5' are returned.
If NGRAMLEN < 2, then rows containing 'Fortune 5', 'Fortune 50', 'Fortune 500', and so on may be returned, since all of these patterns satisfy the "[" test.
See the class documentation of the %Text.Text class and its language-specific subclasses (such as %Text.English) for more information about the capabilities of the Text interface.
Collection Indexing and Querying Collections through SQL
Collections may be referenced from the SQL WHERE clause with a FOR clause. For example:
 FOR SOME %ELEMENT(collectionRef) [AS label] (predicate)
The FOR SOME %ELEMENT clause can be used for list collections and arrays that specify STORAGEDEFAULT="list". The predicate may contain one reference to the pseudo-columns %KEY, %VALUE, or both. A few examples should help to clarify how the FOR SOME %ELEMENT clause may be used. The following returns the name and the list of FavoriteColors for each person whose FavoriteColors include 'Red'.
SELECT Name,FavoriteColors FROM Sample.Person
    WHERE FOR SOME %ELEMENT(FavoriteColors) (%Value = 'Red')
Any SQL predicate may appear after the %Value (or %Key), so for example the following is also legal syntax:
SELECT Name,FavoriteColors FROM Sample.Person
    WHERE FOR SOME %ELEMENT(Sample.Person.FavoriteColors)
        (%Value IN ('Red', 'Blue', 'Green'))
A list collection is considered a special case of an array collection that has sequential numeric keys 1, 2, and so on. Array collections may have arbitrary non-null keys:
 FOR SOME (children) (%Key = 'betty' AND %Value > 5)
In addition to the built-in list and array collection types, generalized collections may be created by providing a BuildValueArray() class method for any property. The BuildValueArray() class method transforms the value of a property into a local array, where each subscript of the array is a %KEY and the value is the corresponding %VALUE.
In addition to simple selections on the %KEY or %VALUE, it is also possible to logically connect two collections, as in the following example:
   FOR SOME %ELEMENT(flavors) AS f
      (f.%VALUE IN ('Chocolate', 'Vanilla') AND
       FOR SOME %ELEMENT(toppings) AS t
           (t.%VALUE = 'Butterscotch' AND
            f.%KEY = t.%KEY))
This example has two collections: flavors and toppings, that are positionally related through their key. The query qualifies a row that has chocolate or vanilla specified as an element of flavors, and that also has butterscotch listed as the corresponding topping, where the correspondence is established through the %KEY.
Usage Notes and Restrictions
Pseudo-Field Variables
Caché SQL queries support the following pseudo-field values:
Pseudo-field variables can only be returned for a table that contains data.
If multiple tables are specified in the FROM clause you must use table aliases, as shown in the following example:
      INTO :name,:rid,:ptname,:etname
    FROM Sample.Person AS P,Sample.Employee AS E)
    WRITE ptname," Name is: ",name,!
    WRITE ptname," RowId is: ",rid,!
    WRITE "1st TableName is: ",ptname,!
    WRITE "2nd TableName is: ",etname,!
The %TABLENAME and %CLASSNAME columns are assigned the default column name Literal_n, where n is the select-item position of the pseudo-field variable in the SELECT statement.
Query Metadata
You can use Dynamic SQL to return metadata about the query, such as the number of columns specified in the query, the name (or alias) of a column specified in the query, and the data type of a column specified in the query.
The following Caché ObjectScript Dynamic SQL example returns the column name and an integer code for the column's ODBC data type for all of the columns in Sample.Person:
  SET myquery="SELECT * FROM Sample.Person"
  SET rset = ##class(%SQL.Statement).%New()
  SET tStatus = rset.%Prepare(myquery)
    SET x=rset.%Metadata.columns.Count()
  WHILE x>0 {
    SET column=rset.%Metadata.columns.GetAt(x)
  WRITE !,x," ",column.colName," ",column.ODBCType
  SET x=x-1 }
  WRITE !,"end of columns"
In this example, columns are listed in reverse column order. Note that the FavoriteColors column, which contains list structured data, returns a data type of 12 (VARCHAR) because ODBC represents a Caché list data type value as a string of comma-separated values.
For further details, refer to the Dynamic SQL chapter of this manual, and the %SQL.Statement class in the InterSystems Class Reference.
Queries and ECP
Caché implementations that use Enterprise Cache Protocol (ECP) can synchronize query results. ECP is a distributed data caching architecture that manages the distribution of data and locks among a heterogeneous network of server systems.
If ECP synchronization is active, each time a SELECT statement is executed Caché forces all pending ECP requests to the database server. On completion this guarantees that the client cache is in sync. This synchronization occurs in the Open logic of the query. This is in the OPEN cursor execution if this is a cursor query.
You can establish ECP synchronization using the Management Portal. Go to the Management Portal. From System Administration, select Configuration, then SQL and Object Settings, then General SQL Settings ([Home] > [Configuration] > [General SQL Settings]). On this screen you can view and edit the current setting of SQL SELECT Synchronizes ECP Cache The default is “No” (ECP synchronization is not performed). This is a systemwide setting; changing this setting immediately affects all Caché processes on the system.
For further details, refer to the Caché Distributed Data Management Guide