InterSystems SQL Optimization Guide
There are a number of things you can do to ensure the maximum performance of InterSystems SQL tables on InterSystems IRIS Data Platform™. The optimizations can have a significant effect on any queries run against this table. The following performance optimizing considerations are discussed in this chapter:
to analyze representative table data in a populated table; this generated metadata is used to optimize future queries.
ExtentSize, Selectivity, and BlockCount
When the Query Optimizer decides the most efficient way to execute a specific SQL query, three of the things it considers are:
row count for each table used within the query.
the percentage of distinct values calculated for each column used by the query.
count for each SQL map used by the query.
In order to ensure that the Query Optimizer can make the correct decisions, it is important that these values are set correctly.
You can explicitly set any of these statistics during class (table) definition, prior to populating the table with data.
After populating the table with representative data, you can run Tune Table
to calculate these statistics.
After running Tune Table, you can override a calculated statistic by specifying an explicit value.
You can compare your explicitly set statistics to the Tune Table generated results. If the assumptions made by Tune Table result in less-than-optimal results from the Query Optimizer, you can use an explicitly set statistic rather than a Tune Table generated statistic.
the Class Editor window displays the class source code. At the bottom of the source code it displays the Storage definition, which includes the class ExtentSize
, and the Selectivity
(and, where appropriate, the OutlierSelectivity
) for each property.
value for a table is simply the number of rows (roughly) stored within the table.
At development time, you can provide an initial ExtentSize
value. If you do not specify an ExtentSize
, the default is 100,000.
Typically, you provide a rough estimate of what you expect the size of this table will be when populated with data. It is not important to have an exact number. This value is used to compare the relative costs of scanning over different tables; the most important thing is to make sure that the relative values of ExtentSize
between associated tables represent an accurate ratio (that is, small tables should have a small value and large tables a large one).
provides an %EXTENTSIZE
parameter keyword to specify the expected number of rows in the table, as shown in the following example:
CREATE TABLE Sample.DaysInAYear (%EXTENTSIZE 366,
MonthName VARCHAR(24),Day INTEGER,
Holiday VARCHAR(24),ZodiacSign VARCHAR(24))
A persistent class definition for a table can specify an ExtentSize parameter within the storage definition
In this example, the fragment is the storage definition for the MyClass
class, which specifies a value of 200 for ExtentSize
If your table has real (or realistic) data, you can automatically calculate and set its ExtentSize
value using the Tune Table facility within the Management Portal
; for details, see the following section on Tune Table
Within an InterSystems SQL table (class), every column (property) has a Selectivity
value associated with it. The Selectivity
value for a column is the percentage of rows within a table that would be returned as a result of query searching for a typical value of the column. Selectivity is 1/D where D is the number of distinct values for the field.
is based on roughly equal quantities of the distinct values. For example, suppose a table contains a Gender
column whose values are roughly evenly distributed between M
. The Selectivity
value for the Gender
column would be 50%
. The Selectivity
value for a more distinguishing property, such as Street Name
, is typically a small percentage.
A field in which all the values are the same has a selectivity of 100%. A field that is defined as Unique (all values different) has a selectivity of 1 (which should not be confused with a selectivity of 1.0000%).
At development time, you can provide this value by defining a Selectivity
parameter within the storage definition that is part of the class definition for the table:
To view a class’s storage definition, in Studio, from the
; Studio includes the storage at the bottom of the source code for the class.
Typically you provide an estimate of what you expect the Selectivity
will be when used within an application. As with ExtentSize
, it is not important to have an exact number. Many of the data type classes provided by InterSystems IRIS will provide reasonable default values for Selectivity
If your table has real (or realistic) data, you can automatically calculate and set its Selectivity
values using the Tune Table facility
within the Management Portal
. Tune Table determines if a field has an outlier value, a value that is far more common than any other value. If so, Tune Table calculates a separate Outlier Selectivity
percentage, and calculates Selectivity
based on the presence of this outlier value. The presence of an outlier value may dramatically change the Selectivity
When you run Tune Table, it measures the actual block count for each SQL map. Unless specified otherwise, the Tune Table measured values replace the class compiler approximate values. These Tune Table measured values are represented in the class definition as negative integers, to distinguish them from specified BlockCount values. This is shown in the following example:
Tune Table measured values are represented in Tune Table as positive integers, identified as Measured by TuneTable
You can define explicit BlockCount values in the class definition. You can explicitly specify a block count as a positive integer, as shown in the following example:
When you define a class you can omit defining the BlockCount for a map, explicitly specify a BlockCount as a positive integer, or explicitly define the BlockCount as NULL.
If you do not specify a BlockCount, or specify a BlockCount of 0, the class compiler estimates the block count. Running Tune Table replaces the class compiler estimated value.
If you specify an explicit positive integer BlockCount, running Tune Table does not replace this explicit BlockCount value. Explicit class definition block count values are represented in Tune Table as positive integers, identified as Defined in class definition
. These block count values are not changed by subsequently running Tune Table.
If you specify an explicit BlockCount of NULL, the SQL Map uses the BlockCount value estimated by the class compiler. Because BlockCount is defined in the class definition, running Tune Table does not replace this estimated BlockCount value.
The size of all InterSystems SQL map blocks is 2048 bytes (2K bytes).
Tune Table does not measure BlockCount in the following circumstances:
If the table is a child table projected by an array or a list collection. The BlockCount values for these types of child tables are the same as BlockCount for the data map of the parent table.
If a global map is a remote global
(a global in a different namespace). The estimated BlockCount used during class compilation is used instead.
Tune Table is a utility that examines the data in a table and returns statistics about the ExtentSize (the number of rows in the table), the relative distribution of distinct values in each field, and the Average Field Size (average length of values in each field). It also generates the BlockCount for each SQL map
. You can specify that Tune Table use this information to update the metadata associated with a table and each of its fields. The query optimizer can subsequently use these statistics to determine the most efficient execution plan for a query.
Using Tune Table on an external table
will only calculate the ExtentSize. Tune Table cannot calculate field Selectivity values, Average Field Size, or map BlockCount values for an external table.
You should run Tune Table on each table after that table has been populated with a representative quantity of real data. Commonly, you only need to run Tune Table once, as a final step in application development, before the data goes live. Tune Table is not a maintenance utility; it should not
be run periodically on live data.
There are rare cases where running Tune Table can decrease SQL performance. While Tune Table can be run on live data, it is recommended that you run Tune Table on a test system with real data, rather than on a production system.
Generally, Tune Table should not be re-run when table data is added, modified, or deleted, unless there is an order-of-magnitude change to the characteristics of the current data, as follows:
Relative Table Sizes: Tune Table assumes that it is analyzing a representative subset of the data. This subset can be only a small percentage of the full data set, if it is a representative subset. Tune Table results remain relevant as the number of rows in a table changes, provided that the ExtentSizes of tables involved in joins or other relationships maintain roughly the same relative sizes. ExtentSize needs to be updated if the ratio between joined tables changes by an order of magnitude. This is important for JOIN
statements, because the SQL optimizer uses ExtentSize when optimizing the table join order. As a general rule, a smaller table is joined before a larger table, regardless of the join order specified in the query. Therefore, you would want to re-run Tune Table on one or more tables if the ratio of rows in tableA to tableB changes from 1000:2000 to 10000:2000, but not if it changes to 2100:4000.
Even Value Distribution: Tune Table assumes that every data value is equally likely. If it detects an outlier value, it assumes that every data value other than the outlier value is equally likely. Tune Table establishes Selectivity by analyzing the current data values for each field. Equal likelihood in real data is always a rough approximation; normal variation in the number of distinct data values and their relative distribution should not warrant re-running Tune Table. However, an order-of-magnitude change in the number of possible values for a field (the ratio of distinct values to records), or the overall likelihood of a single field value can result in inaccurate Selectivity. Dramatically changing the percentage of records with a single field value can cause Tune Table to designate an outlier value or to remove outlier value designation, significantly changing the calculated Selectivity. If the Selectivity of a field no longer reflects the actual distribution of data values, you should re-run Tune Table.
A significant InterSystems IRIS upgrade, or a new site installation may warrant re-running Tune Table.
You can run Tune Table using the Management Portal
, or by invoking the $SYSTEM.SQL.TuneTable()
method. Before you run Tune Table, the table must contain some data.
If the table is mapped to a readonly datababase, Tune Table cannot be performed and an error message is generated.
After running the Tune Table facility, the resulting ExtentSize
values are saved in the class’s storage definition. To view the storage definition, in Studio, from the
; Studio includes the storage at the bottom of the source code for the class.
Tune Table from the Management Portal
To run Tune Table from the Management Portal
Select System Explorer
, then SQL
. Select a namespace by clicking the Switch
option at the top of the page, then selecting a namespace from the displayed list. (You can set the Management Portal default namespace
for each user.)
Tune a Single Table: Expand the Tables
category and select a table from the list. Once you have selected a table, click the Actions
drop-down list and select Tune Table Information
. This displays the table’s current ExtentSize and Selectivity information. If Tune Table has never been run, ExtentSize=100000, no Selectivity, Outlier Selectivity, Outlier Value, or Average Field Size information is shown (other than the ID having a selectivity of 1), and the Map BlockCount information is listed as Estimated by class compiler
Tune All Tables in the Schema: click the Actions
drop-down list and select Tune All Tables in Schema
. This displays the Tune Table box. Select the Finish
button to run Tune Table on all tables in the schema. When Tune Table completes this box displays a Done
button. Select Done
to exit the Tune Table box.
The SQL Tune Table window has two tabs: Selectivity
and Map BlockCount
. These tabs display the current values generated by Tune Table. They also allow you to manually set different values than the values generated by Tune Table.
Keep class up to date
check box. Any changes to statistics generated by Tune Table, or by user input value from the Tune Table interface, or from Tune Table methods are immediately represented in the class definition.
If this box is checked (Yes), the modified class definition is automatically compiled, and thus kept up to date. Cached queries
that use this persistent class (table) are not purged. This is the preferred option when making changes to statistics on a live system
If this box is not checked (No), the modified class definition is flagged as out of date, requiring compilation. Cached queries
that use this persistent class (table) are immediately purged. The default is not checked.
Fields table with columns for Field Name, Selectivity, Notes
, Outlier Selectivity, Outlier Value, and Average Field Size
. By clicking on a Fields table heading, you can sort by that column’s values. By clicking on a Fields table row, you can manually set values for Selectivity, Outlier Selectivity, Outlier Value, and Average Field Size for that field.
Map Name table with columns for SQL Map Name, BlockCount, and Source of BlockCount. By clicking on an individual map name, you can manually set a BlockCount value for that map name.
From the Selectivity
tab, you can click the
button to run Tune Table on this table.
Tune Table using a Method
You can use the $SYSTEM.SQL.TuneTable()
method to run the Tune Table facility on a table in the current namespace. This is shown in the following example:
In this example, the second parameter specifies that the Calculated results from this invocation of Tune Table should not update the table’s current ExtentSize
values. The third parameter specifies to display the Tune Table results at the Terminal.
When using the TuneTable()
method, the following error messages may be generated:
Running Tune Table on Live Data
Tune Table can be run on live systems (systems where table data is being actively accessed and modified). Running Tune Table will have a performance impact.
After running Tune Table, you will need to purge cached queries
(either manually, or by recompiling the class) for the new Tune Table values to be used by the query optimizer.
Extent Size and the Row Count
When running the Tune Table facility from the Management Portal
, the ExtentSize
is the actual count of the rows currently in the table. By default, the TuneTable()
method also uses the actual row count as the ExtentSize. When a table contains a large number of rows, it may be preferable to perform analysis on a smaller number of rows. When running the TuneTable()
method, you can optionally specify a different ExtentSize, as shown in the following example:
You can use this option to improve TuneTable()
performance when running against a table with a large number of rows. If you specify an ExtentSize smaller than the actual number of rows in the table, this ExtentSize number should be large enough to sample representative data.
A specified ExtentSize can be smaller or larger than the actual number of rows. However, ExtentSize should not significantly exceed the actual number of rows in the current table data. When you specify an ExtentSize, TuneTable()
extrapolates row Ids for that number of rows, then performs sampling. If the ExtentSize greatly exceeds the actual number of rows, most of the sampled row Ids will not correspond to actual row data. If this is the case, field selectivities cannot be calculated; instead, TuneTable()
lists the ExtentSize you specified as the CALCULATED ExtentSize and a smaller number as the SAMPLESIZE; TuneTable()
returns <Not Specified> for these non-existent Calculated values.
You can specify a SamplePercent for the TuneTable()
method. This specifies the percentage of the ExtentSize to sample. SamplePercent defaults to "" (the null string). This default samples the data as follows:
If ExtentSize < 1000: SamplePercent="" uses the entire extent size. For example, an ExtentSize of 600 would use a sample of 600.
If ExtentSize >= 1000: SamplePercent="" uses 3*sqrt(ExtentSize). For example, an ExtentSize of 10,000 would use a sample of 300.
You can set an ExtentSize of 0. This may be desirable when you have a table that is never intended to be populated with data, but used for other purposes such as query joins. When you set ExtentSize to 0, InterSystems IRIS sets the Selectivity of each field as 100%, and the Average Field Size
of each field as 0.
CALCSELECTIVITY Parameter and Not Calculating Selectivity
Under certain circumstances, you may not want the Tune Table facility to calculate the Selectivity
for a property. To prevent Selectivity
from being calculated, specify the value of the property’s CALCSELECTIVITY
parameter to 0 (the default is 1). In Studio, you can set CALCSELECTIVITY
on the Property Parameters
page of the New Property Wizard
or in the list of a property’s parameters in the Inspector (you may need to contract and re-expand the property parameter list to display it).
One circumstance where you should specify CALCSELECTIVITY
=0 is a field that is known to contain only one value in all rows (Selectivity=100%), if that field is not indexed.
Selectivity and Outlier Selectivity
Tune Table calculates a Selectivity
for each property (field) value as a percentage. It does this by sampling the data, so selectivity is always an estimate, not an exact value. Selectivity is based on the assumption that all property values are, or could be, equally likely. This is a reasonable assumption for most data. For example, in a general population table most data values are typical: any given date of birth will appear in around .27% of the data (1 in 365); roughly half will be female and half male (50%). A field that is defined as Unique has a selectivity of 1 (which should not be confused with a selectivity of 1.0000 (1%). A selectivity percentage is sufficient for most properties.
For a few properties, Tune Table also calculates an Outlier Selectivity
. This is a percentage for a single property value that appears much more frequently in the sample than the other data values. Tune Table only returns an outlier selectivity when there is a substantial difference between the frequency of one data value and the frequency of the other data values. Tune Table returns, at most, one outlier for a table, regardless of the distribution of data values. If an outlier is selected, Tune Table displays this value as the Outlier Value
. NULL is represented as <Null>
If Tune Table returns an outlier selectivity, the normal selectivity is still the percentage of each non-outlier data value within the whole set of rows. For example, if the outlier selectivity is 80% and the regular selectivity is 1%, then in addition to the outlier value, you can expect to find about 20 ((1-.80)/.01) additional non-outlier values.
The most common example of outlier selectivity is a property that permits NULLs. If the number of records with NULL for a property greatly exceeds the number of records that have any specific data value for that property, NULL is the outlier. The following is the Selectivity and Outlier Selectivity for the FavoriteColors field:
SELECTIVITY of FIELD FavoriteColors
CURRENT = 1.8966%
CALCULATED = 1.4405%
CURRENT OUTLIER = 45.0000%, VALUE = <Null>
CALCULATED OUTLIER = 39.5000%, VALUE = <Null>
If a field only contains one distinct value (all rows have the same value), it has a Selectivity of 100%. A value that has a selectivity of 100% is not considered to be an outlier. Normally, Tune Table establishes Selectivity and Outlier Selectivity values by sampling the data and estimating, which is not sufficient for establishing that there is only one value. All rows must be examined. Consequently, Tune Table can only recognize a field of this sort if the field is indexed, the field is the first field of the index, and the field and the index have the same collation type. If the field is not indexed as described, you should manually specify a Selectivity of 100%, delete any outlier selectivity, and set CALCSELECTIVITY
=0 to prevent Tune Table attempting to calculate selectivity or specify this value as an outlier.
By default, the query optimizer assumes that a query will not
select for the outlier value. For example, a query commonly selects for specific field value(s) and returns a small number of records from the database, rather than returning the large number of records in which that field value is the outlier. The query optimizer always uses Selectivity
for construction of the query plan unless you perform some action that requests consideration of the Outlier Selectivity
There are several actions you can perform to adjust query optimization based on selecting for an outlier value:
If the Outlier Value
is <null>, specify an IS NULL or IS NOT NULL condition for that field in the query WHERE clause. This causes the query optimizer to use the Outlier Selectivity
when constructing the query.
If the Outlier Value
is a data value, the query optimizer assumes that the field value you are selecting for is not the outlier value. For example, the employee records for a Massachusetts-based company might have an Office_State field outlier of ‘MA’ (Massachusetts). The optimizer assumes that the query is not going to select for ‘MA’, because that would return the majority of the records in the database. If, however, you are writing a query that does
select for the outlier value, you can inform the optimizer of this by enclosing the outlier value in double parentheses. When querying on the field, specify a WHERE clause such as the following: WHERE Office_State=(('MA'))
. This technique suppresses literal substitution
and forces the query optimizer to use the Outlier Selectivity
when constructing the query plan. This syntax is necessary for Dynamic SQL queries, and for queries written outside of InterSystems IRIS that are supplied using ODBC/JDBC. It is not necessary for class queries, Embedded SQL queries, or queries accessed through a view. For further details, refer to the WHERE clause
Optimization for an outlier value is automatically performed using Run Time Plan Choice (RTPC). Do not use the Bias queries as outlier
configuration option. Leave this check box unselected.
The Management Portal Tune Table Information
option displays a Notes column for each field. The values in this field are system-defined and non-modifiable. They include the following:
: A table has one RowID
, which is defined by the system. Its name is commonly ID, but it can have a different system-assigned name. Because all of its values are (by definition) unique, its Selectivity is always 1. If the class definition includes SqlRowIdPrivate
, the Notes column value is RowID field, Hidden field
: A hidden field is defined as private, and is not displayed by SELECT *. By default, CREATE TABLE
defines the RowID field as hidden; you can specify the %PUBLICROWID
keyword to make the RowID not hidden and public. By default, tables defined by a persistent class definition define the RowID as not hidden; you can specify SqlRowIdPrivate
to define the RowID as hidden and private. Container fields are defined as hidden.
Running Tune Table calculates the average field size (in characters) for all non-Stream fields, based on the current table data set. This is (unless otherwise noted) the same as AVG($LENGTH(field))
, rounded to two decimal places. You can change this average field size for individual fields to reflect the anticipated average size of the field’s data.
NULL: Because the $LENGTH
function treats NULL fields as having a length of 0, NULL fields are averaged in, with a length 0. This may result in an Average Field Size of less than one character.
Empty column: If a column contains no data (no field values for all of the rows), the average field size value is 1, not 0. The AVG($LENGTH(field))
is 0 for a column that contains no data.
ExtentSize=0: When you set ExtentSize to 0, Average Field Size for all fields is reset to 0.
Logical field values: Average Field Size is always calculated based on the field’s Logical (internal) value.
List fields: InterSystems IRIS List fields are calculated based on their Logical (internal) encoded value. This encoded length is longer than the total length of the elements in the list.
Stream fields: A stream field does not have an average field size.
If the property parameter CALCSELECTIVITY
is set to 0 for a property/field, Tune Table does not calculate the Average Field Size for that property/field.
You can modify an Average Field Size
calculated value by selecting an individual field from the Tune Table display. This displays the values for that field in the Details
area to the right of the display. You can modify the Average Field Size
to a value that better fits the anticipated full data set. Because Tune Table performs no validation when you set this value, you should make sure that the field is not a Stream field, and that the value you specify is not larger than the maximum field size (MaxLen
You can modify a BlockCount
calculated value by selecting an individual SQL Map Name from the Tune Table display. This displays the block count for that Map Name in the Details
area to the right of the display. You can modify the BlockCount
to a value that better fits the anticipated full data set. Because Tune Table performs no validation when you set this value, you should make sure that the block count is a valid value. Modifying BlockCount
changes the Source of BlockCount
to Defined in class definition
. For further details, refer to the BlockCount
section in this chapter.
Exporting and Re-importing Tune Table Statistics
You can export Tune Table statistics from a table or group of tables and then import these Tune Table statistics into a table or group of tables. The following are three circumstances in which you might want to perform this export/import. (For simplicity, these describe the export/import of statistics from a single table; in actual use, export/import of statistics from multiple inter-related tables is often performed):
To model a production system: A production table is fully populated with actual data and optimized using Tune Table. In a test environment you create a table with the same table definition but far less data. By exporting the Tune Table statistics from the production table and importing them into the test table, you can model the production table optimization on the test table.
To replicate a production system: A production table is fully populated with actual data and optimized using Tune Table. A second production table with the same table definition is created. (For example, a production environment and its backup environment, or a multiple identical table definitions with each table containing the patient records for a different hospital.) By exporting the Tune Table statistics from the first table and importing them into the second table, you can give the second table the same optimization as the first table without the overhead of running Tune Table a second time or waiting for the second table to be populated with representative data.
To revert to a prior set of statistics: You can create optimization statistics for a table by running Tune Table or by explicitly setting statistics. By exporting these statistics you can preserve them while experimenting with other statistics settings. Once you have determined the optimal set of statistics, you can import them back into the table.
You can export Tune Table statistics to an XML file using the $SYSTEM.SQL.ExportTuneStats()
method. This method can export the Tune Table statistics for one, more than one, or all tables within a namespace, as shown in the following examples:
/* Exports TuneTable Statistics for all schemas/tables in the current namespace */
/* Exports TuneTable Statistics for all tables in the Sample schema */
/* Exports TuneTable Statistics for all tables beginning with the letter "P" in the Sample schema */
/* Exports TuneTable Statistics for the Sample.Person table */
has a KeepClassUpToDate boolean option. If TRUE (and update is TRUE), ImportTuneStats()
will update the class definition with the new EXTENTSIZE and SELECTIVITY values, but the class definition will be kept as up-to-date. In many cases, however, it is desirable to recompile the class after its table has been tuned so that queries in the class definition can be recompiled and the SQL query optimizer can use the updated data statistics. The default is FALSE (0). Note that if the class is deployed the class definition will not be updated.
has a ClearCurrentStats boolean option. If TRUE, ImportTuneStats()
will clear any prior EXTENTSIZE, SELECTIVITY, BLOCKCOUNT and other Tune Table statistics from the existing table before importing the stats. This can be used if you want to completely clear those table stats that are not specified in the import file, instead of leaving them defined in the persistent class for the table. The default is FALSE (0).
does not find the corresponding table, it skips that table and proceeds to the next table specified in the import file. If a table is found, but some of the fields are not found, those fields will simply be skipped.
The BlockCount for a map in a class storage definition cannot be inherited. The BlockCount can only appear in the storage definition of the class where the map originated. ImportTuneStats()
only sets the projected table's BlockCount metadata and not the class storage BlockCount metadata if the map originated in a super class.