Skip to main content

Table Statistics for Query Optimizer

To ensure maximum performance of InterSystems SQL tables, you should employ representative or anticipated data distribution metrics. As these metrics, referred to as table statistics, are used extensively by the optimizer for query planning, they can have a significant effect on any queries run against this table.

Important:

Table statics gathered by the system may include per-field values that may be visible as part of the class definition or upon export. These values cannot be traced back to a particular row, but their visibility may be a concern in certain environments.

You can set the CALCSELECTIVITY property parameter to 0 to prevent this visibility.

Collected Statistics vs. Fixed Statistics

There are two types of statistics that the query optimizer can use to create the most efficient query plan: collected statistics and fixed statistics. These two types of statistics are stored and maintained differently, but capture the same information. Each set of statistics captures the following information:

  • Exent Size: The row count for each table used within the query.

  • Field Selectivity: The percentage of rows selected when filtering for a random field value; the inverse of cardinality.

  • Average Field Size: The average length of the field.

  • Outlier Selectivity and Value: If one field value occurs significantly more frequently than all other values, the percentage of rows selected when filtering for this field value. Fields with an even distribution of values may not have any outlier value.

  • Map Size: The approximate size of the storage global used for this index or data map.

  • Histogram: The distribution of field values across a series of buckets.

If any fixed statistics exist, the query optimizer uses them to create a query plan, ignoring any collected statistics. This default behavior allows you to directly control the optimizer’s choice of query plan for your queries by testing out assigned values on your data, such as when your data has a very predictable distribution of values so that these expected statistics can be packaged as part of your application code. It also ensure backwards compatibility with any version pre-2025.2.

InterSystems recommends using representative statistics at all times to ensure efficient query plans. Using collected statistics that are automatically refreshed is the best guarantee for gathering representative statistics, though fixed statistics offer predictability when used in controlled environments.

Collected Statistics

Collected statistics are derived directly from the data that is stored in the table, rather than set manually. They are collected through either an automatic utility that runs at scheduled intervals or through an explicit collection command that you issue.

This set of statistics is stored alongside the data as part of the table’s extent index. When collected, the system stores some metadata alongside the statistics. The metadata collected includes:

  • The date and time at which the statistics were collected.

  • The duration of the collection process.

  • The username of the user that collected them.

  • The sample percentage of the table (if sampling was performed).

When new statistics for a table are collected, they are appended to the set of existing statistics; you can query the history of the table’s collected statistics through a catalog function.

Automatic Statistic Collection Mechanisms

Collected statistics can be maintained automatically by system task in a background process.

The system task collects statistics for tables until it reaches its configurable timeout value, at which point it stops collecting statistics until the next scheduled task run. In order to ensure that the system collects statistics for every table, it prioritizes table based on the following criteria:

  1. Tables that have no statistics at all.

  2. Tables whose most recently collected statistics have been invalidated (such as creating a different global mapping for the table), ranked by the age of those statistics in descending order.

  3. Tables for which the sum of the recorded ROWCOUT of INSERT, UPDATE, and DELETE statements (according to the runtime statistics in the Statement Index) exceeds 25% of the table’s most recently recorded extent size, ranked by percentage in descending order.

When statistics are older than 400 days, they are automatically purged as part of the automatic task. You may purge stale statistics manually using %SYSTEM.SQL.Stats.Table.PurgeStaleStats().

If you would like to omit a table from the automatic collection mechanism, set the SKIPAUTOMATICSTATSCOLLECTION class parameter to 1 or use the ALTER TABLE command as follows:

ALTER TABLE Sample.Table SET RUNTIME SKIPAUTOMATICSTATSCOLLECTION = TRUE

On the SQL configuration page in the Management Portal (accessed at System Administration > Configuration > SQL and Object Settings > SQL), you can change some of the default behavior of the automatic utility. By default, the following behavior is specified.

  • The system gathers collected statistics for tables that have fixed statistics. This is controlled by the Disable automatic statistics collection for tables with fixed statistics option or, alternatively, the AutoStatsForFixedStatsTable CPF parameter.

  • The system does not gather collected statistics for tables whose globals map to remote databases. This is controlled by the Enable automatic statistics collection for tables whose default data global maps to remote databases option or, alternatively, the AutoStatsForRemoteGlobalTable CPF parameter.

  • The system gathers collected statistics only for tables that support an efficient sampling algorithm. Most tables using default row or columnar storage support efficient sampling. The presence of parent-child relationships, complex class inheritance structures, or custom storage definitions may prevent efficient sampling. This is controlled by the Disable automatic statistics collection for tables supporting efficient sampling only option or, alternatively, the AutoStatsForEfficientSamplingTableOnly CPF parameter.

Manual Statistic Collection Mechanisms

To manually start the collected statistic utility, issue a COLLECT STATISTICS command, which allows you to collect statistics at either the table or schema level. You can issue this command at any time.

Note:

COLLECT STATISTICS is a synonym for the TUNE TABLE command available on versions 2025.1 and earlier.

Ignore Fixed Statistics

By default, the query optimizer ignores collected statistics in favor of fixed statistics (if they exist). To change this default behavior (for example, if you are considering dropping fixed statistics to evaluate the impact of using collected statistics), you must change the fixed statistics runtime setting at the table level or at the query level.

To ignore fixed statistics (and use collected statistics instead) at the table level, use the ALTER TABLE SQL command to set the IGNOREFIXEDSTATS setting to true. For example:

ALTER TABLE Example.Person SET IGNOREFIXEDSTATS = TRUE

To ignore fixed statistics at the query level, include the %NOFIXEDSTATS keyword on queries to cause the optimizer to build a query plan using the latest set of collected statistics.

SELECT * FROM %NOFIXEDSTATS Sample.t1

ALTER TABLE Example.Person DROP FIXED STATISTICS

Fixed Statistics

Fixed statistics are explicitly set by developers or administrators to deterministically influence a query plan, rather than collected by an automatic utility. By manually setting values for these statistics, you can directly inform the query optimizer on how to create an efficient query plan. Since these statistics are likely used alongside active application code, they are stored as part of the class’s storage definition.

When at least one statistic is present in the class definition, such as a custom extent size or selectivity for one field, the table is treated as having fixed statistics and will use compiler-generated defaults for all other statistics.

Setting Fixed Statistics: ALTER TABLE and ALTER SCHEMA

You can use the ALTER TABLE or ALTER SCHEMA commands to convert the latest collected statistics into fixed statistics, storing the collected statistics as part of a class’s storage definition, rather than its extent index. For example:

ALTER TABLE Example.Person FIX STATISTICS
ALTER SCHEMA Example FIX STATISTICS
Setting Fixed Statistics: Extent Size

At development time, you can provide an initial Extent Size value. If you do not specify an Extent Size, the default is 100,000.

  • CREATE TABLE provides an %EXTENTSIZE parameter keyword to specify the expected number of rows in the table, as shown in the following example:

    CREATE TABLE Sample.DaysInAYear (%EXTENTSIZE 366, 
                                     MonthName VARCHAR(24),Day INTEGER,
                                     Holiday VARCHAR(24),ZodiacSign VARCHAR(24))
  • A persistent class definition for a table can specify an ExtentSize parameter within the storage definition:

    <Storage name="Default">
    <Data name="MyClassDefaultData">
    ...
    <ExtentSize>200</ExtentSize>
    ...
    </Storage>
    
    
    

    In this example, the fragment is the storage definition for the MyClass class, which specifies a value of 200 for ExtentSize.

Alternatively, in the Management Portal, navigate to System Explorer > SQL, then select the table you would like to modify fixed statistics for from the drop down on the left. Select Actions from the toolbar at the top and select Tune Table information... from the list to open the Table Statistics window. On the Selectivity tab, click the edit button to the right of Current Extentsize (per shard if table is sharded) to open the ExtentSize override box. Enter your approximate ExtentSize, then click Save.

Setting Fixed Statistics: Field Selectivity
Edit the Class Definition Directly

At development time, you can provide this value by defining a Selectivity parameter within the storage definition that is part of the class definition for the table:

<Storage name="Default">
<Data name="MyClassDefaultData">
...
<Property name="CoinFlip">
<Selectivity>50%</Selectivity>
</Property>
...
</Storage>
Call an API

You can also use the SetFieldSelectivity() method to set the Selectivity value for a specific field.

Change the Value from the Management Portal

Alternatively, in the Management Portal, navigate to System Explorer > SQL, then select the table you would like to modify fixed statistics for from the drop down on the left. Select Actions from the toolbar at the top and select Tune Table information... from the list to open the Table Statistics window. On the Selectivity tab, select the field you would like to modify the selectivity for and in the Details menu on the right side, enter your approximate selectivity into the Selectivity box, then click Save.

When you edit the selectivity through the Management Portal, you can specify Selectivity either as a percentage of rows with a percent (%) sign or as an integer number of rows (no percent sign). If specified as an integer number of rows, InterSystems IRIS uses the extent size to calculate the Selectivity percentage.

Setting Fixed Statistics: Average Field Size
  1. Open the Management Portal.

  2. Navigate to System Explorer > SQL and select the table you would like to modify fixed statistics for from the drop down on the left.

  3. Select Actions from the toolbar at the top and select Tune Table information... from the list to open the Table Statistics window.

  4. On the Selectivity tab, select the field you would like to modify the selectivity for and in the Details menu on the right side, enter your approximate average field size into the Average Field Size box.

  5. Click Save.

Setting Fixed Statistics: Outlier Selectivity
  1. Open the Management Portal.

  2. Navigate to System Explorer > SQL and select the table you would like to modify fixed statistics for from the drop down on the left.

  3. Select Actions from the toolbar at the top and select Tune Table information... from the list to open the Table Statistics window.

  4. On the Selectivity tab, select the field you would like to modify the outlier selectivity for.

  5. In the Details menu on the right side, enter your approximate outlier selectivity with a percent sign (%) into the Outlier Selectivity box and enter the value of the specific outlier into the Outlier Value box. If you leave the Outlier Value blank, the system assumes the outlier value is <Null>.

  6. Click Save.

Setting Fixed Statistics: Map Size
  1. Open the Management Portal.

  2. Navigate to System Explorer > SQL and select the table you would like to modify fixed statistics for from the drop down on the left.

  3. Select Actions from the toolbar at the top and select Tune Table information... from the list to open the Table Statistics window.

  4. On the Map Size tab, select the SQL Map Name you would like to Map Size for .

  5. In the Details menu on the right side, enter your approximate Map Size into the Map Size box.

  6. Click Save.

Statistics Reference

Note:

While the following sections describe each kind of table statistics in detail, it should only be considered as a reference. The use cases for manually updating these fixed statistics beyond the methods previously described are extremely rare and should only be used in the most advanced circumstances.

InterSystems recommends using collected statistics based on actual table data.

Extent Size

The Extent Size value for a table is simply the number of rows (roughly) stored within the table. This value is used to compare the relative costs of scanning over different tables; the most important thing is to make sure that the relative values of Extent Size between associated tables represent an accurate ratio (that is, small tables should have a small value and large tables a large one).

Field Selectivity

Within an InterSystems SQL table, every column has a Selectivity value associated with it. The Selectivity value for a column is the percentage of rows within a table that would be returned as a result of query searching for a typical value of the column. The presence of outlier values may change how InterSystems SQL interprets Selectivity values.

Selectivity is based on roughly equal quantities of the distinct values. For example, suppose a table contains a CoinFlip column whose values are roughly evenly distributed between “heads” and “tails”. The Selectivity value for the CoinFlip column would be 50%. The Selectivity value for a more distinguishing property, such as State, is typically a smaller percentage. A field in which all the values are the same has a selectivity of 100%.

A field that is defined as Unique (all values different) has a selectivity of 1 (which should not be confused with a selectivity of 1.0000%). For example, a RowID has a selectivity of 1.

If your table has real (or realistic) data, the automatic collected statistics utility calculates Selectivity. The utility also determines if a value that is far more common than any other value; if it finds one, it calculates a separate Outlier Selectivity percentage to represent the selectivity for that outlier value, and calculates Selectivity representing the selectivity value that is not the outlier value. Therefore, the presence of an outlier value may dramatically change the Selectivity value.

Selectivity values are calculated and retained for tables only. When a view selects from a table, it inherits the selectivity values for the selected fields, irrespective of the view’s query, even though that view may have a different distribution of rows than the source table because of filters applied as part of the view query. This can affect the accuracy of view field selectivity.

Average Field Size

The Average Field Size is average length (in characters) for all non-Stream fields, based on the current table data set. It is the same as AVG($LENGTH(field)), rounded to two decimal places. The automatic collected statistics utility calculates this value for you. However, you may set the Average Field Size as a fixed statistic to reflect an anticipated average length of the field.

The statistics collection utility handles the following special cases:

  • NULL: Since the $LENGTH function treats NULL fields as having a length of 0, NULL fields are averaged in, with a length 0. This may result in an Average Field Size of less than one character.

  • Empty column: If a column contains no data (no field values for all of the rows), the average field size value is 1, not 0. The AVG($LENGTH(field)) is 0 for a column that contains no data.

  • ExtentSize=0: When you set ExtentSize to 0, Average Field Size for all fields is reset to 0, as the table is considered empty.

  • Logical field values: Average Field Size is always calculated based on the field’s Logical (internal) value as this is the metric that is relevant to the optimizer.

  • List fields: InterSystems IRIS List fields are calculated based on their Logical (internal) encoded value. This encoded length is longer than the total length of the elements in the list.

  • Stream fields: A stream field does not have an average field size.

If the property parameter CALCSELECTIVITY is set to 0 for a property/field, the collected statistics utility does not calculate the Average Field Size for that property/field.

Outlier Selectivity and Value

Some properties are assigned an Outlier Selectivity. This value is a percentage for a single property value that appears much more frequently in the sample than the other data values. The automatic collected statistics utility returns an outlier selectivity when there is a substantial difference between the frequency of one data value and the frequency of the other data values. At most, the utility returns one outlier for a table, regardless of the distribution of data values. If an outlier is selected, the value is displayed as the Outlier Value. NULL is represented as <Null>. If the collected statistics utility returns an Outlier Selectivity, the normal selectivity is still the percentage of each non-outlier data value within the whole set of rows.

The most common example of outlier selectivity is a property that permits NULLs. If the number of records with NULL for a property greatly exceeds the number of records that have any specific data value for that property, NULL is the outlier.

Map Size

When you compile a persistent class, the class compiler estimates the size of the map and the approximate numbers of map blocks used by each global map (index and data maps). The Management Portal reports the Map Size as the number of megabytes the map takes up; however, in the storage of the persistent class, it is reported as the number of map blocks the map takes up (reported as a BlockCount unit).

The automatic collected statistics utility does not measure the Map Size in the following circumstances:

  • If the table is a child table projected by an array or a list collection. The Map Size values for these types of child tables are the same as Map Sizes for the data map of the parent table.

  • If a global map is empty, such as when there is a new index that has not yet been built. The estimated Map Size used during class compilation is used instead.

  • If a global map is a remote global (a global in a different namespace). The estimated Map Size used during class compilation is used instead.

BlockCount in Storage Details

The BlockCount for each SQL map is stored in the storage definition of the class. Unless specified otherwise, the measured values replace the class compiler approximate values. These measured values are represented in the class definition as negative integers, to distinguish them from manually specified BlockCount values.

<SQLMap name="IDKEY">
 <BlockCount>-4</BlockCount>
</SQLMap>

You can define explicit BlockCount values in the class definition as a positive integer.

<SQLMap name="IDKEY">
 <BlockCount>12</BlockCount>
</SQLMap>

When you define a class, you can either omit defining the BlockCount for a map, explicitly specify a BlockCount as a positive integer, or explicitly define the BlockCount as NULL.

  • If you do not specify a BlockCount, or specify a BlockCount of 0, the class compiler estimates the block count. The automatic collected statistics utility replaces this value.

  • If you specify an explicit positive integer BlockCount, collected statistics utilities does not change it. Explicit class definition block count values are represented as positive integers, identified as Defined in class definition.

  • If you specify an explicit BlockCount of NULL, the SQL Map uses the BlockCount value estimated by the class compiler. Since BlockCount is “defined” in the class definition, running the collected statistics utility does not replace this estimated BlockCount value.

Histograms

For each collated field of a table, the collected statistics utility compiles histograms. These histograms describe the distribution of values by storing the boundary values of a fixed set of buckets. This information is used to estimate the selectivity of range conditions by interpolating these bucket boundaries with the range boundaries specified for the query.

To demonstrate the use of histograms in an example, say you wanted to query a certain set of transactions that were completed before a certain date using the following query:

SELECT * FROM OwnGoods.Transactions WHERE EncounterDate > '2019-01-01'

When processing this query, the optimizer can disregard any rows that have an EncounterDate from before 2019-01-01. The stored histogram for the OwnGoods.Transations table can now be used to estimate the selectivity of this condition by comparing the date with the histograms bucket boundaries. For example, assume that this date falls into the second of 16 buckets (the standard number used in histograms). If that is the case, then this condition will filter out at most about 12% (or 2/16ths) of the table’s rows and the optimizer can evaluate the benefit of using any available index on Encounter Date.

Though the above example uses a range condition with a lower bound as an example, the approach is equally effective when using an upper bound or both an upper and lower bound.

Histograms are created and stored only by the automatic collected statistics utility and cannot be created manually. However, you may use ALTER TABLE ... FIX STATISTICS to store the histogram as part of the class definition.

Exporting and Importing Statistics

You can import and export collected statistics using the corresponding methods in the $SYSTEM.SQL.Stats.Table class. Both the Import() and Export() methods contain arguments that allow you to appropriately export collected statistics.

You can export table statistics from a table or group of tables and then subsequently import into another. The following are three circumstances in which you might want to perform this export/import:

  • To model a production system: A production table is fully populated with actual data and representative statistics have been collected. In a test environment you create a table with the same table definition, but with far less data. By exporting the statistics from the production table and importing them into the test table, you can model the production table’s data distribution on the test table.

  • To revert to a prior set of statistics: You can create collect representative statistics for a table by manually running COLLECT STATISTICS, letting the automatic collected statistics utility complete, or by explicitly fixing the statistics. By exporting these statistics you can preserve them while experimenting with other statistics settings. Once you have determined the optimal set of statistics, you can import them back into the table.

You can export collected statistics to an XML file using the $SYSTEM.SQL.Stats.Table.Export() method. This method can export the collected statistics for one, more than one, or all tables within a namespace, as shown in the following examples:

  DO $SYSTEM.SQL.Stats.Table.Export("C:\AllStats.xml")
  /*  Exports statistics for all schemas/tables in the current namespace */
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SampleStats.xml","Sample")
  /*  Exports statistics for all tables in the Sample schema */
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SamplePStats.xml","Sample","P*")
  /*  Exports statistics for all tables beginning with the letter "P" in the Sample schema */
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SamplePersonStats.xml","Sample","Person")
  /*  Exports statistics for the Sample.Person table */
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SampleStats.xml","Sample","Person",,2)
  /* Exports statistics stored in the extent metadata database for table Sample.Person in the current namespace */

You can re-import any statistics that were exported using $SYSTEM.SQL.Stats.Table.Export() by using the $SYSTEM.SQL.Stats.Table.Import() method.

$SYSTEM.SQL.Stats.Table.Import() has a ClearCurrentStats boolean option. If TRUE, $SYSTEM.SQL.Stats.Table.Import() will clear any prior statistics from the existing table before importing the new statistics. The default is FALSE (0).

If $SYSTEM.SQL.Stats.Table.Import() does not find the corresponding table, it skips that table and proceeds to the next table specified in the import file. If a table is found, but some of the fields are not found, those fields will be skipped.

The BlockCount for a map in a class storage definition cannot be inherited from a super class. The BlockCount can only appear in the storage definition of the class where the map originated. $SYSTEM.SQL.Stats.Table.Import() only sets the projected table's BlockCount metadata and not the class storage BlockCount metadata if the map originated in a super class.

Note:

Collected statistics are new in version 2025.2. Any collected statistics exported through the above methods from an instance running version 2025.2 or later cannot be imported into any pre-2025.2 release.

Instead, you should export fixed statistics as part of the class definition.

Importing and Exporting Statistics with Class Definitions

When you use $SYSTEM.OBJ.Export() to export a class definition, the latest collected and fixed table statistics are included in the exported file. They can subsequently be imported using $SYSTEM.OBJ.Import().

By default, only fixed statistics are exported. However, if you add .ext to the class name you are exporting, you can export the latest collected statistics. For example:

do $SYSTEM.OBJ.Export("Sample.Person.ext","PersonLatestCollectedStats.xml")

If you pass in the qualifier /exportselectivity=1 when exporting a class definition, the system exports fixed statistics in the existing class definition format.

Important:

If you intend on exporting statistics from a current version of InterSystems IRIS and importing it into an instance of InterSystems IRIS that is running version 2025.1 or earlier, be sure to use the /exportversion=2025.1 or /exportselectivity=0 qualifiers when exporting to ensure backwards compatibility.

FeedbackOpens in a new tab