Introduction to SQL Performance Optimization
InterSystems SQL supports several features that optimize SQL performance for InterSystems IRIS® data platform.
Table Definition Optimization
SQL performance is fundamentally dependent upon good data architecture. Dividing your data into multiple tables and establishing relationships between those tables is essential to efficient SQL. How to define tables and their relations to each other is beyond the scope of this book.
This book describes the following operations to optimize a table definition. These operations require that the table be defined, but do not require the table to be populated with data:
Global Naming Strategy: you can use the USEEXTENTSET parameter to specify shorter and more efficient hashed global names for data and index lookup operations.
Indices: you can define an index for a table field or group of fields. You can define several different type of indices: standard, bitmap, bitslice, and bitmap extent. SQL optimization uses the defined indices, rather than the data values themselves, to access specific records for a query, update, or delete operation.
Table Data Optimization
You can perform the following operations to optimize table access based on analysis of typical data in the table:
Tune Table: examines typical table data and generates ExtentSize (number of rows), Selectivity (percent of rows having a specific value), and BlockCount metadata. The query optimizer uses this information to determine the most efficient query execution plan.
Selectivity and Outlier Selectivity: determines the percent of rows having a specific value for a field, and whether one value is an outlier value, a value significantly more common than the other values for that field.
In almost all cases, execution of a query written in Embedded SQL is faster than the same query written in Dynamic SQL. Also note that because of cached queries, re-execution of a query is significantly faster than its initial execution for both Embedded SQL and Dynamic SQL. For further details, see Dynamic SQL versus Embedded SQL.
You can perform the following operations to optimize the execution of a specific query. These query optimizations use existing table definition and table data optimizations:
Runtime Statistics: used to measure the performance of query execution on your system.
Show Plan displays the execution plan for a query.
Cached Queries and Literal Substitution: maintains a cache of recent dynamic queries, allowing for re-execution of a query without repeating the overhead of preparing the query.
SQL Statements and Frozen Plans allows you to retain a query execution plan, allowing you to make changes to a table without risking the performance degradation of an existing query.
Index configuration and usage: used to specify how existing indices should be used.
Index optimization hints: %ALLINDEX, %IGNOREINDEX
JOIN optimization hints: %FIRSTTABLE, %FULL, %INORDER, %STARTTABLE
Subquery optimization hints: %NOFLATTEN, %NOMERGE, %NOREDUCE, %NOSVSO
Parallel query execution: FROM %PARALLEL
You can also improve query performance against large database tables by using data sharding.
By default, the Memory and Startup Settings default to Automatically configured, and the Maximum Per-Process Memory defaults to 262144 kb. To optimize SQL running on a production system, you should change the default to Manually configured, and increase the Maximum Per-Process Memory setting. For further details, refer to Memory and Startup Settings in the “Configuring InterSystems IRIS” chapter of the System Administration Guide.
Optimizing Large Tables using Sharding
Sharding is the partitioning of data and its associated cache across a number of systems. A sharded cluster partitions large database tables horizontally — that is, by row — across multiple InterSystems IRIS instances, called shard data servers, while allowing applications to transparently access these tables through a single instance, called the shard master data server.
A table must be defined as sharded. A sharded table can only be used in a sharded environment; a non-sharded table can be used in a sharded or non-sharded environment. Not all tables are good candidates for sharding. Optimal performance in a sharded environment is generally achieved by using a combination of sharded tables (generally very large tables) and non-sharded tables. For further details, refer to Evaluating the Benefits of Sharding and Evaluate Existing Tables for Sharding in the Scalability Guide.
For further details on deploying a sharded cluster, creating sharded tables, and querying using sharding, refer to Horizontally Scaling InterSystems IRIS for Data Volume with Sharding in the Scalability Guide.
InterSystems SQL supports Fast Select, Fast Insert, and Fast Truncate Table. “Fast” means that standard invocations of these SQL commands are executed using highly-efficient internal code. These Fast operations “just work”; no special syntax is used and no optimization options are provided.
Fast Select is supported for SELECT queries over ODBC or JDBC. Fast Insert is supported for INSERT operations over JDBC. Fast Truncate Table is supported for TRUNCATE TABLE operations that do not involve referential integrity.
Not all tables can support Fast operations, and not all command syntax can be executed using fast execution. InterSystems SQL performs fast execution in circumstances where it can be; if fast execution cannot be performed, InterSystems SQL performs standard execution of the specified command.