Caché SQL Optimization Guide
Optimizing Query Performance
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

Caché SQL automatically uses a Query Optimizer to create a query plan that provides optimal query performance in most circumstances. This Optimizer improves query performance in many ways, including determining which indices to use, determining the order of evaluation of multiple AND conditions, determining the sequence of tables when performing multiple joins, and many other optimization operations. You can supply “hints” to this Optimizer in the FROM clause of the query. This chapter describes tools that you can use to evaluate a query plan and to modify how Caché SQL will optimize a specific query.

Caché supports the following tools for optimizing SQL queries:
The following SQL query performance tools are described in other chapters of this manual:
The following tools are used to optimize table data, and thus can have a significant effect on all queries run against that table:
This chapter also describes how to Write Query Optimization Plans to a File, and how to generate an SQL Troubleshooting Report to submit to InterSystems WRC.
Management Portal SQL Performance Tools
The Caché Management Portal provides access to the following SQL performance tools. From the Management Portal, select the System Explorer option. From there you select Tools, then select SQL Performance Tools, then one of the following SQL performance tools:
SQL Runtime Statistics
You can use SQL Runtime Statistics to measure the performance of query execution on your system. SQL Runtime Statistics measures the performance of SELECT, INSERT, UPDATE, and DELETE operations (collectively known as query operations). This feature is off by default. After activating it, you must recompile SQL queries.
You can use the Caché Management Portal or the %SYS.PTools.SQLStats class to collect performance statistics on an SQL query. By using this class you can determine for each SQL query: the compile time, the number of global references, the number of lines of code executed, the number of times a module is called, the total execution time, the time to first row, disk wait (the disk read access time, in milliseconds), and the number of rows processed.
Gather Query Performance Statistics
You must activate statistics (Stats) code generation to collect performance statistics, using either of the following:
For either of these interfaces, you specify one of the following options: 0 turn off statistics code generation; 1 turn on statistics code generation for all queries, but do not gather statistics (the default); 2 record statistics for just the outer loop of the query (gather statistics at the open and close of the query); 3 record statistics for all module levels of the query. Modules can be nested. If so, the MAIN module statistics are inclusive numbers, the overall results for the full query.
For SetSQLStatsJob() the options differ slightly. They include: -1 turn off statistics for this job; 0 use the system setting value. The 1, 2, and 3 options are the same as SetSQLStats() and override the system setting. The default is 0.
To gather SQL Stats data, queries need to be compiled with statistics code generation turned on (option 1, the default):
This information is stored in %SYS.PTools.SQLQuery and %SYS.PTools.SQLStats.
Purging a cached query purges any related SQL Stats data. Dropping a table or view purges any related SQL Stats data.
Display Query Performance Statistics
You can display performance statistics for an SQL query as follows:
The following examples collect runtime statistics from an INSERT statement:
CreateATable
  &sql(CREATE TABLE sample.sqltest (FullName VARCHAR(25),MyDate DATE DEFAULT CURRENT_DATE))
    IF SQLCODE=0 { WRITE "sqltest table created",! }
    ELSE {WRITE "table create failed SQLCODE=",SQLCODE,! }
 
InsertData
  SET oldstat=$SYSTEM.SQL.SetSQLStatsJob(2)
  &sql(INSERT INTO sample.sqltest(FullName) SELECT Name FROM Sample.Person WHERE Name BETWEEN 'A' AND 'J')
  WRITE "Inserted ",%ROWCOUNT," rows in table SQLCODE=",SQLCODE,!
  DO ##class(%SYS.PTools.SQLStats).GetLastSQLStats()
  DO %sqlcontext.DumpResults()
Cleanup
   DO $SYSTEM.SQL.SetSQLStatsJob(oldstat)
   &sql(DROP TABLE sample.sqltest)
 
You can use the SQLStatsView query to display these statistics, as shown in the following example:
  ZNSPACE "Samples"
  DO ##class(%SYS.PTools.SQLStats).Purge("Samples")
  DO ##class(%SYSTEM.SQL).SetSQLStatsJob(2)
  SET myquery = "SELECT TOP 5 Name,DOB FROM Sample.Person"
  SET tStatement = ##class(%SQL.Statement).%New()
  SET qStatus = tStatement.%Prepare(myquery)
   IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT}
  SET rset = tStatement.%Execute()
  DO rset.%Display()
  WRITE !,"end of query result set",!!
  KILL rset
  DO ##class(%ResultSet).RunQuery("%SYS.PTools.SQLStats","SQLStatsView")
  WRITE !!,"End of SQL Statistics"
 
Exporting Query Performance Statistics
You can export query performance statistics to a text file. By default, columns in this text file are delimited by tabs. If you don't specify a filename argument, these methods create a .psql file in the Mgr directory, using your system ID, Caché installation directory, and Caché version to generate a file name. If you specify a filename argument, these methods create a file in the Mgr subdirectory for the current namespace, or in the path location you specify. This export is limited to data in the current namespace.
Runtime Statistics and Show Plan
The SQL Runtime Statistics tool can be used to display the Show Plan for a query with runtime statistics.
The Alternate Show Plans tool can be used to compare show plans with stats, displaying runtime statistics for a query. The Alternate Show Plans tool in its Show Plan Options displays estimated statistics for a query. If gathering runtime statistics is activated, its Compare Show Plans with Stats option displays actual runtime statistics; if runtime statistics are not active, this option displays estimate statistics.
Using Indices
Indexing provides a mechanism for optimizing queries by maintaining a sorted subset of commonly requested data. Determining which fields should be indexed requires some thought: too few or the wrong indices and key queries will run too slowly; too many indices can slow down INSERT and UPDATE performance (as the index values must be set or updated).
What to Index
To determine if adding an index improves query performance, run the query from the Management Portal SQL interface and note in Performance the number of global references. Add the index and then rerun the query, noting the number of global references. A useful index should reduce the number of global references. You can prevent use of an index by using the %NOINDEX keyword as preface to a WHERE clause or ON clause condition.
You should index fields (properties) that are specified in a JOIN. A LEFT OUTER JOIN starts with the left table, and then looks into the right table; therefore, you should index the field from the right table. In the following example, you should index T2.f2:
   FROM Table1 AS T1 LEFT OUTER JOIN Table2 AS T2 ON T1.f1 = T2.f2
An INNER JOIN should have indices on both ON clause fields.
Run Show Plan and follow to the first map. If the first bullet item in the Query Plan is “Read master map”, or the Query Plan calls a module whose first bullet item is “Read master map”, the query first map is the master map rather than an index map. Unless the table is relatively small, you should create an index so that when you rerun this query the Query Plan first map says “Read index map.”
You should index fields that are specified in a WHERE clause equal condition.
You may wish to index fields that are specified in a WHERE clause range condition, and fields specified in GROUP BY and ORDER BY clauses.
Under certain circumstances, an index based on a range condition could make a query slower. This can occur if the vast majority of the rows meet the specified range condition. For example, if the query clause WHERE Date < CURRENT_DATE is used with a database in which most of the records are from prior dates, indexing on Date may actually slow down the query. This is because the Query Optimizer assumes range conditions will return a relatively small number of rows, and optimizes for this situation. You can determine if this is occurring by prefacing the range condition with %NOINDEX and then run the query again.
If you are performing a comparison using an indexed field, the field as specified in the comparison should have the same collation type as it has in the corresponding index. For example, the Name field in the WHERE clause of a SELECT or in the ON clause of a JOIN should have the same collation as the index defined for the Name field. If there is a mismatch between the field collation and the index collation, the index may be less effective or may not be used at all. For further details, refer to Index Collation in the “Defining and Building Indices” chapter of this manual.
For details on how to create an index and the available index types and options, refer to the CREATE INDEX command in the Caché SQL Reference, and the Defining and Building Indices chapter of this manual.
Index Configuration Options
The following system-wide configuration methods can be used to optimize use of indices in queries:
For further details, refer to SQL configuration settings described in Caché Advanced Configuration Settings Reference.
Index Usage Analysis
You can analyze index usage by SQL cached queries using either of the following:
Index Analyzer
From the Management Portal Tools interface, select System Explorer, then Tools, then select SQL Performance Tools, then Index Analyzer ([Home] > [SQL Index Analyzer]). It provides an SQL Statement Count display for the current namespace, and four index analysis report options.
SQL Statement Count
At the top of the SQL Index Analyzer there is an option to count all SQL statements in the namespace. Press the Gather SQL Statements button. The SQL Index Analyzer displays “Gathering SQL statements ....” while the count is in progress, then “Done” when the count is complete. SQL statements are counted in three categories: a Cached Query count, a Class Method count, and a Class Query count. These counts are for the entire current namespace, and are not affected by the Include System Queries? option or the Schema Selection option.
However, note that running an SQL Index Analyzer Report Option with a Schema Selection generates 1 cached query. Running the Index usage option generates an additional 3 cached queries (a total of 4 if Schema Selection is specified). These generated cached queries will be counted in subsequent use of Gather SQL Statements. Repeated use of the different Report Option choices with different schema selections does not generate additional cached queries.
The corresponding method is GetSQLStatements() in the %SYS.PTools.SQLUtilities class.
Report Options
You can either examine reports for the cached queries for a selected schema in the current namespace, or (by not selecting a schema) examine reports for all cached queries in the current namespace. You can include or exclude system queries in this analysis. The index analysis report options are:
When you select one of these options, the system automatically performs the operation and displays the results. The first time you select an option or invoke the corresponding method, the system generates the results data; if you select that option or invoke that method again, Caché redisplays the same results. To generate new results data you must use the Gather SQL Statements button to reinitialize the Index Analyzer results tables. To generate new results data for the %SYS.PTools.SQLUtilities methods, you must invoke GetSQLStatements() to reinitialize the Index Analyzer results tables. Changing the Include System Queries? check box option also reinitializes the Index Analyzer results tables.
IndexUsage() Method
The following example demonstrates the use of the IndexUsage() method:
  ZNSPACE "Samples"
  DO ##class(%SYS.PTools.SQLUtilities).IndexUsage(1)
  SET utils = "SELECT %EXACT(Type), Count(*) As QueryCount "_
              "FROM %SYS_PTools.SQLUtilities GROUP BY Type"
  SET utilresults = "SELECT SchemaName, Tablename, IndexName, UsageCount "_
                    "FROM %SYS_PTools.SQLUtilResults ORDER BY UsageCount"
  SET tStatement = ##class(%SQL.Statement).%New()
  SET qStatus = tStatement.%Prepare(utils)
   IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT}
  SET rset = tStatement.%Execute()
  DO rset.%Display()
  WRITE !,"End of utilities data",!!
  SET qStatus = tStatement.%Prepare(utilresults)
   IF qStatus'=1 {WRITE "%Prepare failed:" DO $System.Status.DisplayError(qStatus) QUIT}
  SET rset = tStatement.%Execute()
  DO rset.%Display()
  WRITE !,"End of results data"
 
Note that because results are ordered by UsageCount, indices with UsageCount > 0 are listed at the end of the result set.
Index Optimization Options
By default, the Caché SQL query optimizer uses sophisticated and flexible algorithms to optimize the performance of complex queries involving multiple indices. In most cases, these defaults provide optimal performance. However, in infrequent cases, you may wish to give “hints” to the query optimizer by specifying optimize-option keywords.
The FROM clause supports the %ALLINDEX and %IGNOREINDEX optimize-option keywords. These optimize-option keywords govern all index use in the query. They are described in detail in the FROM clause reference page of the Caché SQL Reference.
You can use the %NOINDEX condition-level hint to specify exceptions to the use of an index for a specific condition. The %NOINDEX hint is placed in front of each condition for which no index should be used. For example, WHERE %NOINDEX hiredate < ?. This is most commonly used when the overwhelming majority of the data is selected (or not selected) by the condition. With a less-than (<) or greater-than (>) condition, use of the %NOINDEX condition-level hint is often beneficial. With an equality condition, use of the %NOINDEX condition-level hint provides no benefit. With a join condition, %NOINDEX is not supported for =* and *= WHERE clause outer joins; %NOINDEX is supported for ON clause joins.
The %NOINDEX keyword can be used to override indexing optimization established in the FROM clause. In the following example, the %ALLINDEX optimization keyword applies to all condition tests except the E.Age condition:
  SELECT P.Name,P.Age,E.Name,E.Age
  FROM %ALLINDEX Sample.Person AS P LEFT OUTER JOIN Sample.Employee AS E
       ON P.Name=E.Name
  WHERE P.Age > 21 AND %NOINDEX E.Age < 65
 
Show Plan
Show Plan displays the execution plan for SELECT, UPDATE, DELETE, TRUNCATE TABLE, and some INSERT operations. These are collectively known as query operations because they use a SELECT query as part of their execution. Show Plan is performed when a query operation is prepared; you do not have to actually execute the query operation to generate an execution plan.
Show Plan displays what Caché considers to be the optimal execution plan. However, for most queries there is more than one possible execution plan. You can also display alternate show plans.
Displaying an Execution Plan
You can use Show Plan to display the execution plan for a query in any of the following ways:
Show Plan by default returns values in Logical mode. However, when invoking Show Plan from the Management Portal or the SQL Shell, Show Plan uses Runtime mode.
Execution Plan: Statement Text and Query Plan
The Show Plan execution plan consists of two components, Statement Text and Query Plan:
Statement Text replicates the original query, with the following modifications: The Show Plan button from the Management Portal SQL interface displays the SELECT query prefaced with DECLARE QRS CURSOR FOR (QRS is Query Result Set). This is done to allow Show Plan to use a frozen plan. The Show Plan button display also performs literal substitution, replacing each literal with a ?, unless you have suppressed literal substitution by enclosing the literal value in double parentheses. These modifications are not done when displaying a show plan using the ShowPlan() method, or when displaying alternate show plans.
Query Plan shows the plan that would be used to execute the query. A Query Plan can consist of the following:
Some operations create a Show Plan that indicates no Query Plan could be generated:
Alternate Show Plans
You can display alternate execution plans for a query using the Management Portal or the ShowPlanAlt() method.
From the Management Portal System Explorer, select Tools, SQL Performance Tools, Alternate Show Plans ([Home] > [Alternate Show Plans]). Using this tool, you input a query then press the Show Plan Options button to display multiple alternate show plans. Select the plans that you wish to compare, then press the Compare Show Plans with Stats button to run them and display their SQL statistics.
The ShowPlanAlt() method shows all of the execution plans for a query. It first shows the plan the Caché considers optimal (lowest cost), the same Show Plan display as the ShowPlan() method. ShowPlanAlt() then allows you to select an alternate plan to display. Alternate plans are listed in ascending order of cost. Specify the ID number of an alternate plan at the prompt to display its execution plan. ShowPlanAlt() then prompts you for the ID of another alternate plan. To exit this utility, press the return key at the prompt.
The following example displays the same execution plan as the ShowPlan() example, then lists alternate plans and prompts you to specify an alternate plan for display:
  ZNSPACE "Samples"
  DO $SYSTEM.SQL.SetSQLStatsJob(3)
  SET mysql=1
  SET mysql(1)="SELECT TOP 4 Name,DOB FROM Sample.Person ORDER BY Age"
  DO $SYSTEM.SQL.ShowPlanAlt(.mysql,0,1)
To display an alternate plan, specify the plan’s ID number from the displayed list and press Return. To exit ShowPlanAlt(), just press Return.
Also refer to the PossiblePlans methods in the %SYS.PTools.SQLUtilities class.
Stats
The Show Plans Options lists assigns each alternate show plan a Cost value, which enables you to make relative comparisons between the execution plans.
The Alternate Show Plan details provides for each Query Plan a set of stats (statistics) for the Query Totals, and (where applicable) for each Query plan module. The stats for each module include Time (overall performance, in seconds), Globals (number of global references), Commands (number of commands executed), and Disk Wait (disk read latency, in milliseconds). The Query Totals stats also includes Rows (the number of rows returned).
Writing Query Optimization Plans to a File
The following utility lists the query optimization plan(s) for one or more queries to a text file.
QOPlanner^%apiSQL(infile,outfile,eos,schemapath)
infile A file pathname to a text file containing a listing of cached queries. Specified as a quoted string.
outfile A file pathname where query optimization plans are to be listed. Specified as a quoted string. If the file does not exist, the system creates it. If the file already exists, Caché overwrites it.
eos Optional — The end-of-statement delimiter used to separate the individual cached queries in the infile listing. Specified as a quoted string. The default is “GO”. If this eos string does not match the cached query separator, no outfile is generated.
schemapath Optional — A comma-separated list of schema names that specifies a search path for unqualified table names, view names, or stored procedure names. Can include DEFAULT_SCHEMA, the current system-wide schema default. If infile contains #Import directives, QOPlanner adds these #Import package/schema names to the end of schemapath.
The following is an example of evoking this query optimization plans listing utility. This utility takes as input the file generated by the ExportSQL^%qarDDLExport() utility, as described in Listing Cached Queries to a File section of the “Cached Queries” chapter. You can either generate this query listing file, or write a query (or queries) to a text file.
  DO QOPlanner^%apiSQL("C:\temp\test\qcache.txt","C:\temp\test\qoplans.txt","GO")
When executed from the Terminal command line progress is displayed to the terminal screen, such as the following example:
Importing SQL Statements from file: C:\temp\test\qcache.txt
 
Recording any errors to principal device and log file: C:\temp\test\qoplans.txt
  
  SQL statement to process (number 1):
      SELECT TOP ? P . Name , E . Name FROM Sample . Person AS P , 
      Sample . Employee AS E ORDER BY E . Name
  Generating query plan...Done
 
  SQL statement to process (number 2):
      SELECT TOP ? P . Name , E . Name FROM %INORDER Sample . Person AS P 
      NATURAL LEFT OUTER JOIN Sample . Employee AS E ORDER BY E . Name
  Generating query plan...Done
 
Elapsed time: .16532 seconds
The created query optimization plans file contains entries such as the following:
<pln>
<sql>
 SELECT TOP ? P . Name , E . Name FROM Sample . Person AS P , Sample . Employee AS E ORDER BY E . Name
</sql>
Read index map Sample.Employee.NameIDX.
Read index map Sample.Person.NameIDX.
</pln>
######
<pln>
<sql>
 SELECT TOP ? P . Name , E . Name FROM %INORDER Sample . Person AS P 
    NATURAL LEFT OUTER JOIN Sample . Employee AS E ORDER BY E . Name
</sql>
Read master map Sample.Person.IDKEY.
Read extent bitmap Sample.Employee.$Employee.
Read master map Sample.Employee.IDKEY.
Update the temp-file.
Read the temp-file.
Read master map Sample.Employee.IDKEY.
Update the temp-file.
Read the temp-file.
</pln>
######
You can use the query optimization plan text files to compare generated optimization plans using different variants of a query, or compare optimization plans between different versions of Caché.
When exporting the SQL queries to the text file, a query that comes from a class method or class query will be preceded by the code line:
#import <package name>
This #import statement tells the QOPlanner utility what default package/schema to use for the plan generation of the query. When exporting the SQL queries from a routine, any #import lines in the routine code prior to the SQL statement will also precede the SQL text in the export file. Queries exported to the text file from cached queries are assumed to contain fully qualified table references; if a table reference in a text file is not fully qualified, the QOPlanner utility uses the default schema defined on the system when QOPlanner is run.
Parallel Query Processing
The optional %PARALLEL keyword is specified in the FROM clause of a query. It suggests that Caché perform parallel processing of the query, using multiple processors (if applicable). This can significantly improve performance of some queries that uses one or more COUNT, SUM, AVG, MAX, or MIN aggregate functions, and/or a GROUP BY clause, as well as many other types of queries. These are commonly queries that process a large quantity of data and return a small result set. For example, SELECT AVG(SaleAmt) FROM %PARALLEL User.AllSales GROUP BY Region would likely use parallel processing.
A “one row” query that specifies only aggregate functions, expressions, and subqueries performs parallel processing, with or without a GROUP BY clause. However, a “multi-row” query that specifies both individual fields and one or more aggregate functions does not perform parallel processing unless it includes a GROUP BY clause. For example, SELECT Name,AVG(Age) FROM %PARALLEL Sample.Person does not perform parallel processing, but SELECT Name,AVG(Age) FROM %PARALLEL Sample.Person GROUP BY Home_State does perform parallel processing.
If a query that specifies %PARALLEL is compiled in Runtime mode, all constants are interpreted as being in ODBC format.
Specifying %PARALLEL may degrade performance for some queries. Running a query with %PARALLEL on a system with multiple concurrent users may result in degraded overall performance.
For further details, refer to the FROM clause in the Caché SQL Reference.
%PARALLEL Keyword Ignored
Regardless of the presence of the %PARALLEL keyword in the FROM clause, some queries may use linear processing, not parallel processing. Caché makes the decision whether or not to use parallel processing for a query after optimizing that query, applying other query optimization options (if specified). Caché may determine that the optimized form of the query is not suitable for parallel processing, even if the user-specified form of the query would appear to benefit from parallel processing. You can determine if and how Caché has partitioned a query for parallel processing using Show Plan.
In the following circumstances specifying %PARALLEL does not perform parallel processing. The query executes successfully and no error is issued, but parallelization is not performed:
%PARALLEL in Subqueries
%PARALLEL is intended for SELECT queries and their subqueries. An INSERT command subquery cannot use %PARALLEL.
%PARALLEL is ignored when applied to a subquery that is correlated with an enclosing query. For example:
SELECT name,age FROM Sample.Person AS p 
WHERE 30<(SELECT AVG(age) FROM %PARALLEL Sample.Employee where Name = p.Name)
%PARALLEL is ignored when applied to a subquery that includes a complex predicate, or a predicate that optimizes to a complex predicate. Predicates that are considered complex include the %CONTAINS, %CONTAINSTERM, FOR SOME, and FOR SOME %ELEMENT predicates.
Shared Memory Considerations
For parallel processing, Caché supports multiple InterProcess Queues (IPQ). Each IPQ handles a single parallel query. It allows parallel work unit subprocesses to send rows of data back to the main process so the main process does not have to wait for a work unit to complete. This enables parallel queries to return their first row of data as quickly as possible, without waiting for the entire query to complete. It also improves performance of aggregate functions.
Parallel query execution uses shared memory from the generic memory heap (gmheap). Users may need to increase gmheap size if they are using parallel SQL query execution. As a general rule, the memory requirement for each IPQ is 4 x 64k = 256k. Caché splits a parallel SQL query into the number of available CPU cores. Therefore, users need to allocate this much extra gmheap:
<Number of concurrent parallel SQL requests> x <Number cores> x 256 = <required size increase (in kilobytes) of gmheap>
Note that this formula is not 100% accurate, because a parallel query can spawn sub queries which are also parallel. Therefore it is prudent to allocate more extra gmheap than is specified by this formula.
Failing to allocate adequate gmheap results in errors reported to cconsole.log. SQL queries may fail. Other errors may also occur as other subsystems try to allocate gmheap.
To review gmheap usage by an instance, including IPQ usage in particular, from the home page of the management portal choose System then System Usage, and click the Shared Memory Heap Usage link; see Generic (Shared) Memory Heap Usage in the “Monitoring Caché Using the Management Portal” chapter of the Caché Monitoring Guide for more information.
To change the size of the generic memory heap or gmheap (sometimes known as the shared memory heap or SMH), from the home page of the management portal choose System Administration then Configuration then Additional Settings then Advanced Memory; see Advanced Memory Settings in the “Caché Additional Configuration Settings” chapter of the Caché Additional Configuration Settings Reference for more information.
Cached Query Considerations
If you are running a cached SQL query which uses %PARALLEL and while this query is being initialized you do something that purges cached queries, then this query could get a <NOROUTINE> error reported from one of the worker jobs. Typical things that causes cached queries to be purged are calling $SYSTEM.SQL.Purge() or recompiling a class which this query references. Recompiling a class automatically purges any cached queries relating to that class.
If this error occurs, running the query again will probably execute successfully. Removing %PARALLEL from the query will avoid any chance of getting this error.
SQL Statements and Plan State
An SQL query which uses %PARALLEL can result in multiple SQL Statements. The Plan State for these SQL Statements is Unfrozen/Parallel. A query with a plan state of Unfrozen/Parallel cannot be frozen by user action. Refer to the SQL Statements chapter for further details.