Interpreting an SQL Query Plan
This chapter explains how to read a system-generated InterSystems SQL Query Plan. It breaks down the tools you can use to view these plans, as well as how to interpret the language you will find in them.
When a SQL query is compiled, the process produces a set of instructions to access and return the data specified by the query. The instructions and the sequence in which they are executed are influenced by the data the SQL compiler has about the structure and content of the tables involved in the query. The compiler attempts to use information such as table sizes and available indexes to make the set of instructions as efficient as possible.
The query access plan (Show Plan) is a human-readable translation of that resulting set of instructions. The author of the query can use this query access plan to see how the data will be accessed. While the SQL compiler tries to make the most efficient use of data as specified by the query, you may know more about some aspect of the stored data than is evident to the compiler. In this case, you can make use of the query plan to modify the original query to provide more information or more guidance to the query compiler.
Viewing the Plan
You can use the EXPLAIN or Show Plan tools to display an execution plan for SELECT, DECLARE, UPDATE, DELETE, TRUNCATE TABLE, and some INSERT operations. These are collectively known as query operations because they use a SELECT query as part of their execution. InterSystems IRIS generates an execution plan when a query operation is prepared; you do not have to actually execute the query to generate an execution plan.
By default, these tools display what InterSystems IRIS considers to be the optimal query plan. For most queries there is more than one possible query plan. In addition to the query plan that InterSystems IRIS deems as optimal, you can also generate and display alternate query execution plans.
InterSystems IRIS provides the following query plan tools:
The SQL EXPLAIN command can be used to generate an XML-formatted query plan and, optionally, alternate query plans and SQL statistics. All generated query plans and statistics are included in a single result set field named Plan. Note that the EXPLAIN command can only be used with a SELECT query.
The Management Portal—>System Explorer—>SQL interface Show Plan button.
The Management Portal—>System Explorer—>Tools—>SQL Performance Tools.
The $SYSTEM.SQL.Explain() method can be used to generate and display an XML-formatted query plan and, optionally, alternate query plans.
The SHOW PLAN and SHOW PLANALT Shell commands can be used from the SQL Shell to display the execution plan for the most recently executed query.
For generated %PARALLEL and Sharded queries, these tools display all of the applicable query plans.
You can generate a query execution plan by executing an EXPLAIN command, like the one in the following example:
EXPLAIN SELECT TOP 10 Name,DOB FROM Sample.Person
To use the EXPLAIN command, a user must have the %Development:USE resource.
If you specify the ALT keyword, the EXPLAIN command will generate alternate query plans and include them in the returned query plan.
If you specify the STAT keyword, the EXPLAIN command will generate performance statistics for each module of the query. For each module, the following statistics are returned:
<ModuleName>: module name.
<TimeSpent>: total execution time for the module, in seconds.
<GlobalRefs>: a count of global references.
<LinesOfCode>: a count of lines of code executed.
<DiskWait>: disk wait time in seconds.
<RowCount>: number of rows in result set.
<ModuleCount>: number of times this module was executed.
<Counter>: number of times this program was executed.
The query plan is returned as an XML-formatted string. The highest order tag is <plans>. If you did not specify the ALT keyword, the <plans> tag will contain a single <plan> tag that has <sql> tags (which specify the query), a <cost> tag (which contains the relative cost of this particular plan), and some text that describes how the SQL optimizer processed the query. If you did specify the ALT keyword, multiple <plan> tags will appear in the <plans> tag. If you specified the STAT keyword, a series of <stats> tags will appear after the <cost> tag; a separate <stats> tag is generated for each module involved in the processing of the query.
Using Show Plan in the Management Portal
You can use Show Plan to display the execution plan for a query in any of the following ways:
From the Management Portal SQL interface: Select System Explorer, then SQL. Select a namespace by clicking the name of the current namespace displayed at the top of the page. (You can set the Management Portal default namespace for each user.) Write a query, then press the Show Plan button. (You can also invoke Show Plan from the Show History listing by clicking the plan option for a listed query.) See Executing SQL Statements in the “Using the Management Portal SQL Interface” chapter of this manual.
From the Management Portal Tools interface: Select System Explorer, then Tools, then select SQL Performance Tools, then SQL Runtime Statistics:
From the Query Test tab: Select a namespace by clicking the name of the current namespace displayed at the top of the page. Write a query in the text box. Then press the Show Plan with SQL Stats button. This generates a Show Plan without executing the query.
From the View Stats tab: Press the Show Plan button for one of the listed queries. The listed queries include both those written at Execute Query, and those written at Query Test.
Show Plan by default returns values in Logical mode. However, when invoking Show Plan from the Management Portal or the SQL Shell, Show Plan uses Runtime mode.
Structure of the Plan
The Show Plan execution plan consists of two components, Statement Text and Query Plan.
Statement Text replicates the original query, with the following modifications: The Show Plan button from the Management Portal SQL interface displays the SQL statement with comments and line breaks removed. Whitespace is standardized. The Show Plan button display also performs literal substitution, replacing each literal with a ?, unless you have suppressed literal substitution by enclosing the literal value in double parentheses. These modifications are not performed when displaying a show plan using the Explain() method, or when displayed using the SQL Runtime Statistics or Alternate Show Plans tools.
Query Plan shows the plan that would be used to execute the query. A Query Plan can include the following:
“Frozen Plan” is the first line of Query Plan if the query plan has been frozen; otherwise, the first line is blank.
“Relative cost” is an integer value which is computed from many factors as an abstract number for comparing the efficiency of different execution plans for the same query. This calculation takes into account (among other factors) the complexity of the query, the presence of indexes, and the size of the table(s). Relative cost is not useful for comparing two different queries. “Relative cost not available” is returned by certain aggregate queries, such as COUNT(*) or MAX(%ID) without a WHERE clause.
The Query Plan consists of a main module, and (when needed) one or more subcomponents. One or more module subcomponents may be shown, named alphabetically, starting with B: Module:B, Module:C, etc.), and listed in the order of execution (not necessarily alphabetically).
A named subquery module is shown for each subquery in the query. Subquery modules are named alphabetically. Subquery naming skips one or more letters before each named subquery. When the end of the alphabet is reached, additional subqueries are numbered, parsing Z=26 and using the same skip sequence. The following example is an every-third subquery naming sequence starting with Subquery:F: F, I, L, O, R, U, X, 27, 30, 33. The following example is an every-second subquery naming sequence starting with Subquery:G: G, I, K, M, O, Q, S, U, W, Y, 27, 29. If a subquery calls a module, the module is placed in alphabetical sequence after the subquery with no skip. For example, Subquery:H might call Module:I.
“Read master map” as the first bullet item in the main module indicates an inefficient Query Plan. The Query Plan begins execution with one of the following map type statements Read master map... (no available index), Read index map... (use available index), or Generate a stream of idkey values using the multi-index combination... (Multi Index, use multiple indexes). Because the master map reads the data itself, rather than an index to the data, Read master map... almost always indicates an inefficient Query Plan. Unless the table is relatively small, you should define an index so that when you regenerate the Query Plan the first map says Read index map.... For information on interpreting a Query Plan, refer to “Interpret an SQL Execution Plan.”
Some operations create a Show Plan that indicates no Query Plan could be generated:
Non-query INSERT: An INSERT... VALUES() command does not perform a query, and therefore does not generate a Query Plan.
Query always FALSE: In a few cases, InterSystems IRIS can determine when preparing a query that a query condition will always be false, and thus cannot return data. The Show Plan informs you of this situation in the Query Plan component. For example, a query containing the condition WHERE %ID IS NULL or the condition WHERE Name %STARTSWITH('A') AND Name IS NULL cannot return data, and therefore InterSystems IRIS generates no execution plan. Rather than generating an execution plan, the Query Plan says “Output no rows”. If a query contains a subquery with one of these conditions, the subquery module of the Query Plan says “Subquery result NULL, found no rows”. This condition check is limited to a few situations involving NULL, and is not intended to catch all self-contradictory query conditions.
Invalid query: Show Plan displays an SQLCODE error message for most invalid queries. However, in a few cases, Show Plan displays as empty. For example, WHERE Name = $$$$$ or WHERE Name %STARTSWITH('A") (note single-quote and double-quote). In these cases, Show Plan displays no Statement Text, and Query Plan says [No plan created for this statement]. This commonly occurs when quotation marks delimiting a literal are imbalanced. It also occurs when you specify two or more leading dollar signs without specifying the correct syntax for a user-defined (“extrinsic”) function.
Reading the Plan
The result of Show Plan is a series of statements about what processing will be done to access and present the data specified in the query. The following provides information on how to interpret Show Plan statements.
An SQL table is stored as a set of maps. Each table has a master map that contains all the data in the table; the table may also have other maps such as index maps and bitmaps. Each map can be pictured as a multidimensional global, with the data for some fields in one or more subscripts, and with the remaining fields stored in the node value. The subscripts control what data is being accessed.
For the master map, the RowID or the IDKEY fields are normally used as the map subscripts.
For an index map, normally other fields are used as the leading subscript(s), with the RowID/IDKEY fields as additional lower-level subscripts.
For a bitmap, the bitmap layer can be thought of as an additional RowID subscript level. However, bitmaps can only be used for RowIDs that are positive integers.
The plan for a query could access several tables. When accessing a table, the plan may access a single map (index or master map), two maps (an index map followed by the master map), or, in the case of a multi-index plan, several maps.
In accessing the data via a map, the plan indicates the subscripts used. It also indicates what the actual subscript values will be: a single given value, a set of given values, a range of values, or all values present in the table for that subscript. Which one is chosen depends on the conditions specified in the query. Obviously, accessing a single subscript value or only a few subscript values is faster than accessing all the values at that subscript level.
Conditions and Expressions
When the query is run, various conditions specified by the query are tested. Except for certain subscript-limiting conditions as just mentioned, the Show Plan output does not explicitly indicate the testing of conditions. It is always best to test conditions as early as possible. The optimal place for testing the various conditions can be inferred from the plan details.
Similarly, Show Plan does not detail the computation of expressions and sub-expressions. Besides simplicity, the main reason for this is that in most database environments, table and index access constitute the more important aspect of processing; the cost of retrieving the table data dominates the overall query cost, as disk access speed is still orders of magnitude slower than CPU processing.
When accessing data from a table, it is often necessary to examine multiple rows iteratively. Such access is indicated by a loop. The instructions to be executed for each pass are referred to as the body of the loop. They are visually indicated by being indented. It is common for database access involving multiple tables to require loops within loops. In this case, each loop level is indicated by a further indentation when compared to the previous level.
A query plan might also indicate the need to build and use an intermediate temporary file (temp-file). This is a “scratch” area in a local array. It is used to save temporary results for various purposes, such as sorting. Just like a map, a temp-file has one or more subscripts, and possibly also node data.
Some temp-files contain data from processing a single table. In this instance, building the temp-file could be considered pre-processing for the data in that table. Reading such a temp-file may or may not be followed by accessing the master map of the source table. In other cases, temp-files could contain the results of processing multiple tables. In still other situations, temp-files are used to store grouped aggregate values, to check DISTINCT, etc.
The building of temp-files, as well as other processing, may be delegated to a separate unit of work called a module. Each module is named. When separate modules are listed, the plan indicates where each module is invoked. When execution of the module finishes, processing resumes at the next statement following the module invocation.
Queries Sent for Processing
For external tables linked through an ODBC or JDBC gateway connection, the plan shows the text of the query being sent to the remote SQL Gateway Connection to retrieve the requested data from the remote tables.
For parallel query processing and for sharding, the plan shows the various queries being sent to be processed in parallel or on the shards. The plan used for each of these queries is also displayed.
Sub-Queries, JOINs and UNIONs
Some subqueries (and views) within the given query might also be processed separately. Their plans are specified in separate subquery sections. The precise place where a subquery section is called from is not indicated in the plan. This is because they are often invoked as part of the processing of conditions or expressions.
For queries that specify OUTER JOIN, the plan may indicate the possible generation of a row of NULLs if no matching rows were found, in order to satisfy the requirements of the outer join semantics.
For UNION, the plan might indicate the combining of the result rows from the various union subqueries in a separate module, where further processing of these result rows may be done.
Alternate Show Plans
You can display alternate execution plans for a query using the Management Portal or the Explain() method.
To display alternate execution plans for a query from the Management Portal using either of the following:
Select System Explorer, select Tools, select SQL Performance Tools, then select Alternate Show Plans.
Select System Explorer, select SQL, then from the Tools drop-down menu select Alternate Show Plans.
Using the Alternate Show Plans tool:
Input an SQL query text, or retrieve one using the Show History button. You can clear the query text field by clicking the round "X" circle on the right hand side.
Press the Show Plan Options button to display multiple alternate show plans. The Run ... in the background check box is unselected by default, which is the preferred setting for most queries. It is recommended that you select the Run ... in the background check box for large or complex queries. While a long query is being run in background a View Process button is shown. Clicking View Process opens the Process Details page in a new tab. From the Process Details page, you can view the process, and may Suspend, Resume or Terminate the process.
Possible Plans are listed in ascending order by Cost, with the Map Type and Starting Map. You can select the Show Plan (no statistics) or Show Plan with Stats link for each plan for further details.
From the list of possible plans, use the check boxes to select the plans that you wish to compare, then press the Compare Show Plans with Stats button to run them and display their SQL statistics.
The Explain()Opens in a new tab method with the all qualifier shows all of the execution plans for a query. It first shows the plan the InterSystems IRIS considers optimal (lowest cost), then displays alternate plans. Alternate plans are listed in ascending order of cost.
The following example displays the optimal execution plan, then lists alternate plans:
SET mysql(1)="SELECT TOP 4 Name,DOB FROM Sample.Person ORDER BY Age"
Also refer to the possiblePlans methods in the %SYS.PTools.StatsSQLOpens in a new tab class.
The Show Plans Options lists assigns each alternate show plan a Cost value, which enables you to make relative comparisons between the execution plans.
The Alternate Show Plan details provides for each Query Plan a set of stats (statistics) for the Query Totals, and (where applicable) for each Query plan module. The stats for each module include Time (overall performance, in seconds), Global Refs (number of global references), Commands (number of lines executed), and Read Latency (disk wait, in milliseconds). The Query Totals stats also includes the number of Rows Returned.