Caché SQL Optimization Guide
Optimizing Tables
[Home] [Back] [Next]
InterSystems: The power behind what matters   
Class Reference   
Search:    

There are a number of things you can do to ensure the maximum performance of Caché SQL tables. The optimizations can have a significant effect on any queries run against this table. The following performance optimizing considerations are discussed in this chapter:

ExtentSize, Selectivity, and BlockCount
When the Query Optimizer decides the most efficient way to execute a specific SQL query, three of the things it considers are:
In order to ensure that the Query Optimizer can make the correct decisions, it is important that these values are set correctly.
You can compare your explicitly set statistics to the Tune Table generated results. If the assumptions made by Tune Table result in less-than-optimal results from the Query Optimizer, you can use an explicitly set statistic rather than a Tune Table generated statistic.
In Studio the Class Editor window displays the class source code. At the bottom of the source code it displays the Storage definition, which includes the class ExtentSize, and the Selectivity (and, where appropriate, the OutlierSelectivity) for each property.
ExtentSize
The ExtentSize value for a table is simply the number of rows (roughly) stored within the table.
At development time, you can provide an initial ExtentSize value. If you do not specify an ExtentSize, the default is 100,000.
Typically, you provide a rough estimate of what you expect the size of this table will be when populated with data. It is not important to have an exact number. This value is used to compare the relative costs of scanning over different tables; the most important thing is to make sure that the relative values of ExtentSize between associated tables represent an accurate ratio (that is, small tables should have a small value and large tables a large one).
If your table has real (or realistic) data, you can automatically calculate and set its ExtentSize value using the Tune Table facility within the Management Portal; for details, see the following section on Tune Table.
Selectivity
Within a Caché SQL table (class), every column (property) has a Selectivity value associated with it. The Selectivity value for a column is the percentage of rows within a table that would be returned as a result of query searching for a typical value of the column. Selectivity is 1/D where D is the number of distinct values for the field.
Selectivity is based on roughly equal quantities of the distinct values. For example, suppose a table contains a Gender column whose values are roughly evenly distributed between “M” and “F”. The Selectivity value for the Gender column would be 50%. The Selectivity value for a more distinguishing property, such as Street Name, is typically a small percentage.
A field in which all the values are the same has a selectivity of 100%. A field that is defined as Unique (all values different) has a selectivity of 1 (which should not be confused with a selectivity of 1.0000%).
At development time, you can provide this value by defining a Selectivity parameter within the storage definition that is part of the class definition for the table:
<Storage name="Default">
<Data name="MyClassDefaultData">
...
<Property name="Gender">
<Selectivity>50%</Selectivity>
</Property>
...
</Storage>
To view a class’s storage definition, in Studio, from the View menu, select View Storage; Studio includes the storage at the bottom of the source code for the class.
Typically you provide an estimate of what you expect the Selectivity will be when used within an application. As with ExtentSize, it is not important to have an exact number. Many of the data type classes provided by Caché will provide reasonable default values for Selectivity.
You can also use the $SYSTEM.SQL.SetFieldSelectivity() method to set the Selectivity value for a specific field (property).
If your table has real (or realistic) data, you can automatically calculate and set its Selectivity values using the Tune Table facility within the Management Portal. Tune Table determines if a field has an outlier value, a value that is far more common than any other value. If so, Tune Table calculates a separate Outlier Selectivity percentage, and calculates Selectivity based on the presence of this outlier value. The presence of an outlier value may dramatically change the Selectivity value.
BlockCount
When you compile a persistent class, the class compiler computes approximate numbers of map blocks used by each SQL map based on the ExtentSize and the property definitions. You can view these BlockCount values in the Map BlockCount tab of the Tune Table facility. The BlockCount is identified in Tune Table as Estimated by class compiler. Note that if you change the ExtentSize, you must close and re-open the SQL Tune Table window to see this change reflected in the BlockCount values.
When you run Tune Table, it measures the actual block count for each SQL map. Unless specified otherwise, the Tune Table measured values replace the class compiler approximate values. These Tune Table measured values are represented in the class definition as negative integers, to distinguish them from specified BlockCount values. This is shown in the following example:
<SQLMap name="IDKEY">
 <BlockCount>-4</BlockCount>
</SQLMap>
Tune Table measured values are represented in Tune Table as positive integers, identified as Measured by TuneTable.
You can define explicit BlockCount values in the class definition. You can explicitly specify a block count as a positive integer, as shown in the following example:
<SQLMap name="IDKEY">
 <BlockCount>12</BlockCount>
</SQLMap>
When you define a class you can omit defining the BlockCount for a map, explicitly specify a BlockCount as a positive integer, or explicitly define the BlockCount as NULL.
The size of all Caché SQL map blocks is 2048 bytes (2K bytes).
Tune Table does not measure BlockCount in the following circumstances:
Tune Table
Tune Table is a utility that examines the data in a table and returns statistics about the ExtentSize (the number of rows in the table), the relative distribution of distinct values in each field, and the Average Field Size (average length of values in each field). It also generates the BlockCount for each SQL map. You can specify that Tune Table use this information to update the metadata associated with a table and each of its fields. The query optimizer can subsequently use these statistics to determine the most efficient execution plan for a query.
Using Tune Table on an external table will only calculate the ExtentSize. Tune Table cannot calculate field Selectivity values, Average Field Size, or map BlockCount values for an external table.
When to Run Tune Table
You should run Tune Table on each table after that table has been populated with a representative quantity of real data. Commonly, you only need to run Tune Table once, as a final step in application development, before the data goes “live.” Tune Table is not a maintenance utility; it should not be run periodically on live data.
Note:
There are rare cases where running Tune Table can decrease SQL performance. While Tune Table can be run on live data, it is recommended that you run Tune Table on a test system with real data, rather than on a production system.
Generally, Tune Table should not be re-run when table data is added, modified, or deleted, unless there is an order-of-magnitude change to the characteristics of the current data, as follows:
Running Tune Table
You can run Tune Table using the Management Portal, or by invoking the $SYSTEM.SQL.TuneTable() method. Before you run Tune Table, the table must contain some data.
After running the Tune Table facility, the resulting ExtentSize and Selectivity values are saved in the class’s storage definition. To view the storage definition, in Studio, from the View menu, select View Storage; Studio includes the storage at the bottom of the source code for the class.
Tune Table from the Management Portal
To run Tune Table from the Management Portal:
  1. Select System Explorer, then SQL ([Home] > [SQL]). Select a namespace by clicking the Switch option at the top of the page, then selecting a namespace from the displayed list. (You can set the Management Portal default namespace for each user.)
  2. Select a Schema from the drop-down list on the left side of the screen, or use a Filter. For further details on how to use Schema and Filter, refer to Browsing SQL Schemas in the “Using the Management Portal SQL Interface” of this manual.
  3. Do one of the following:
The SQL Tune Table window has two tabs: Selectivity and Map BlockCount. These tabs display the current values generated by Tune Table. They also allow you to manually set different values than the values generated by Tune Table.
The Selectivity tab contains the following fields:
The Map BlockCount tab contains the following fields:
From the Selectivity tab, you can click the Tune Table button to run Tune Table on this table.
Tune Table using a Method
You can use the $SYSTEM.SQL.TuneTable() method to run the Tune Table facility on a table. This is shown in the following example:
  ZNSPACE "Samples"
  DO $SYSTEM.SQL.TuneTable("Sample.Person",0,1)
 
In this example, the second parameter specifies that the Calculated results from this invocation of Tune Table should not update the table’s current ExtentSize and Selectivity values. The third parameter specifies to display the Tune Table results at the Terminal.
When using the TuneTable() method, the following error messages may be generated:
You can use the $SYSTEM.SQL.TuneSchema() method to run the Tune Table facility for all the tables in a schema.
Running Tune Table on Live Data
Tune Table can be run on live systems (systems where table data is being actively accessed and modified). Running Tune Table will have a performance impact.
Running Tune Table on a live system with the Keep class up to date check box unchecked, or running the TuneTable() method with KeepUpToDate=0 can cause application errors, because in-use cached queries could be automatically purged (deleted). You should check the Keep class up to date check box or the $SYSTEM.SQL.TuneTable() KeepClassUpToDate=1 parameter to prevent the class from being treated as out-of-date after Tune Table is run.
After running Tune Table, you will need to purge cached queries (either manually, or by recompiling the class) for the new Tune Table values to be used by the query optimizer.
Extent Size and the Row Count
When running the Tune Table facility from the Management Portal, the ExtentSize is the actual count of the rows currently in the table. By default, the TuneTable() method also uses the actual row count as the ExtentSize. When a table contains a large number of rows, it may be preferable to perform analysis on a smaller number of rows. When running the TuneTable() method, you can optionally specify a different ExtentSize, as shown in the following example:
  ZNSPACE "Samples"
  DO $SYSTEM.SQL.TuneTable("Sample.Person",0,1,,,,,50)
 
You can use this option to improve TuneTable() performance when running against a table with a large number of rows. If you specify an ExtentSize smaller than the actual number of rows in the table, this ExtentSize number should be large enough to sample representative data.
A specified ExtentSize can be smaller or larger than the actual number of rows. However, ExtentSize should not significantly exceed the actual number of rows in the current table data. When you specify an ExtentSize, TuneTable() extrapolates row Ids for that number of rows, then performs sampling. If the ExtentSize greatly exceeds the actual number of rows, most of the sampled row Ids will not correspond to actual row data. If this is the case, field selectivities cannot be calculated; instead, TuneTable() lists the ExtentSize you specified as the CALCULATED ExtentSize and a smaller number as the SAMPLESIZE; TuneTable() returns <Not Specified> for these non-existent Calculated values.
You can specify a SamplePercent for the TuneTable() method. This specifies the percentage of the ExtentSize to sample. SamplePercent defaults to "" (the null string). This default samples the data as follows:
You can set an ExtentSize of 0. This may be desirable when you have a table that is never intended to be populated with data, but used for other purposes such as query joins. When you set ExtentSize to 0, Caché sets the Selectivity of each field as 100%, and the Average Field Size of each field as 0.
CALCSELECTIVITY Parameter and Not Calculating Selectivity
Under certain circumstances, you may not want the Tune Table facility to calculate the Selectivity for a property. To prevent Selectivity from being calculated, specify the value of the property’s CALCSELECTIVITY parameter to 0 (the default is 1). In Studio, you can set CALCSELECTIVITY on the Property Parameters page of the New Property Wizard or in the list of a property’s parameters in the Inspector (you may need to contract and re-expand the property parameter list to display it).
One circumstance where you should specify CALCSELECTIVITY=0 is a field that is known to contain only one value in all rows (Selectivity=100%), if that field is not indexed.
Selectivity and Outlier Selectivity
Tune Table calculates a Selectivity for each property (field) value as a percentage. It does this by sampling the data, so selectivity is always an estimate, not an exact value. Selectivity is based on the assumption that all property values are, or could be, equally likely. This is a reasonable assumption for most data. For example, in a general population table most data values are typical: any given date of birth will appear in around .27% of the data (1 in 365); roughly half will be female and half male (50%). A field that is defined as Unique has a selectivity of 1 (which should not be confused with a selectivity of 1.0000 (1%). A selectivity percentage is sufficient for most properties.
For a few properties, Tune Table also calculates an Outlier Selectivity. This is a percentage for a single property value that appears much more frequently in the sample than the other data values. Tune Table only returns an outlier selectivity when there is a substantial difference between the frequency of one data value and the frequency of the other data values. Tune Table returns, at most, one outlier for a table, regardless of the distribution of data values. If an outlier is selected, Tune Table displays this value as the Outlier Value. NULL is represented as <Null>.
If Tune Table returns an outlier selectivity, the normal selectivity is still the percentage of each non-outlier data value within the whole set of rows. For example, if the outlier selectivity is 80% and the regular selectivity is 1%, then in addition to the outlier value, you can expect to find about 20 ((1-.80)/.01) additional non-outlier values.
The most common example of outlier selectivity is a property that permits NULLs. If the number of records with NULL for a property greatly exceeds the number of records that have any specific data value for that property, NULL is the outlier. The following is the Selectivity and Outlier Selectivity for the FavoriteColors field:
SELECTIVITY of FIELD FavoriteColors
   CURRENT =     1.8966%
   CALCULATED =  1.4405%
   CURRENT OUTLIER = 45.0000%, VALUE = <Null>
   CALCULATED OUTLIER = 39.5000%, VALUE = <Null> 
If a field only contains one distinct value (all rows have the same value), it has a Selectivity of 100%. A value that has a selectivity of 100% is not considered to be an outlier. Normally, Tune Table establishes Selectivity and Outlier Selectivity values by sampling the data and estimating, which is not sufficient for establishing that there is only one value. All rows must be examined. Consequently, Tune Table can only recognize a field of this sort if the field is indexed, the field is the first field of the index, and the field and the index have the same collation type. If the field is not indexed as described, you should manually specify a Selectivity of 100%, delete any outlier selectivity, and set CALCSELECTIVITY=0 to prevent Tune Table attempting to calculate selectivity or specify this value as an outlier.
To modify these Selectivity, Outlier Selectivity, and Outlier Value calculated values, select an individual field from the Tune Table display. This displays these values for that field in the Details area to the right of the display. You can modify Selectivity, Outlier Selectivity, and/or Outlier Value to values that better fit the anticipated full data set.
Outlier Optimization
By default, the query optimizer assumes that a query will not select for the outlier value. For example, a query commonly selects for specific field value(s) and returns a small number of records from the database, rather than returning the large number of records in which that field value is the outlier. The query optimizer always uses Selectivity for construction of the query plan unless you perform some action that requests consideration of the Outlier Selectivity.
There are several actions you can perform to adjust query optimization based on selecting for an outlier value:
The Notes Column
The Management Portal Tune Table Information option displays a Notes column for each field. The values in this field are system-defined and non-modifiable. They include the following:
An IDENTITY field, ROWVERSION field, SERIAL field, or UNIQUEIDENTIFIER (GUID) field is not identified in the Notes column.
Average Field Size
Running Tune Table calculates the average field size (in characters) for all non-Stream fields, based on the current table data set. This is (unless otherwise noted) the same as AVG($LENGTH(field)), rounded to two decimal places. You can change this average field size for individual fields to reflect the anticipated average size of the field’s data.
If the property parameter CALCSELECTIVITY is set to 0 for a property/field, Tune Table does not calculate the Average Field Size for that property/field.
You can modify an Average Field Size calculated value by selecting an individual field from the Tune Table display. This displays the values for that field in the Details area to the right of the display. You can modify the Average Field Size to a value that better fits the anticipated full data set. Because Tune Table performs no validation when you set this value, you should make sure that the field is not a Stream field, and that the value you specify is not larger than the maximum field size (MaxLen).
The Average Field Size is also displayed in the Management Portal Catalog Details tab Fields option table. Tune Table must have been run for the Fields option table to display Average Field Size values. For further details, refer to the Catalog Details Tab section in the “Using the Management Portal SQL Interface” chapter of this guide.
Map BlockCount Tab
The Tune Table Map BlockCount tab displays the SQL Map Name, BlockCount (as a positive integer), and Source of BlockCount. The Source of BlockCount can be Defined in class definition, Estimated by class compiler, or Measured by TuneTable. Running Tune Table changes Estimated by class compiler to Measured by TuneTable; it does not affect Defined in class definition values.
You can modify a BlockCount calculated value by selecting an individual SQL Map Name from the Tune Table display. This displays the block count for that Map Name in the Details area to the right of the display. You can modify the BlockCount to a value that better fits the anticipated full data set. Because Tune Table performs no validation when you set this value, you should make sure that the block count is a valid value. Modifying BlockCount changes the Source of BlockCount to Defined in class definition. For further details, refer to the BlockCount section in this chapter.
Exporting and Re-importing Tune Table Statistics
You can export Tune Table statistics to an XML file using the $SYSTEM.SQL.ExportTuneStats() method. This method can export the Tune Table statistics for one, more than one, or all tables within a namespace, as shown in the following examples:
  DO $SYSTEM.SQL.ExportTuneStats("C:\AllStats.xml")
  /*  Exports TuneTable Statistics for all schemas/tables in the current namespace */
  DO $SYSTEM.SQL.ExportTuneStats("C:\SampleStats.xml","Sample")
  /*  Exports TuneTable Statistics for all tables in the Sample schema */
  DO $SYSTEM.SQL.ExportTuneStats("C:\SamplePStats.xml","Sample","P*")
  /*  Exports TuneTable Statistics for all tables beginning with the letter "P" in the Sample schema */
  DO $SYSTEM.SQL.ExportTuneStats("C:\SamplePersonStats.xml","Sample","Person")
  /*  Exports TuneTable Statistics for the Sample.Person table */
You can re-import Tune Table statistics that were exported using ExportTuneStats() by using the $SYSTEM.SQL.ImportTuneStats() method.
ImportTuneStats() has a KeepClassUpToDate boolean option. If TRUE (and update is TRUE), ImportTuneStats() will update the class definition with the new EXTENTSIZE and SELECTIVITY values, but the class definition will be kept as up-to-date. In many cases, however, it is desirable to recompile the class after its table has been tuned so that queries in the class definition can be recompiled and the SQL query optimizer can use the updated data statistics. The default is FALSE (0). Note that if the class is deployed the class definition will not be updated.
ImportTuneStats() has a ClearCurrentStats boolean option. If TRUE, ImportTuneStats() will clear any prior EXTENTSIZE, SELECTIVITY, BLOCKCOUNT and other Tune Table statistics from the existing table before importing the stats. This can be used if you want to completely clear those table stats that are not specified in the import file, instead of leaving them defined in the persistent class for the table. The default is FALSE (0).
If ImportTuneStats() does not find the corresponding table, it skips that table and proceeds to the next table specified in the import file. If a table is found, but some of the fields are not found, those fields will simply be skipped.
The BlockCount for a map in a class storage definition cannot be inherited. The BlockCount can only appear in the storage definition of the class where the map originated. ImportTuneStats() only sets the projected table's BlockCount metadata and not the class storage BlockCount metadata if the map originated in a super class.