Vertically Scaling InterSystems IRIS
Scaling a system vertically by increasing its capacity and resources is a common, well-understood practice. Recognizing this, InterSystems IRIS includes a number of built-in capabilities that help you leverage the gains. Some operate transparently, while others require specific adjustments on your part to take full advantage.
This chapter discusses how to calculate the memory and CPU requirements of a server hosting an InterSystems IRIS instance and application, both initially and after collecting benchmarking and load testing results and information from existing sites, and how to take best advantage of vertical scaling by increasing system memory or the CPU core count. In some cases, you may use these guidelines to evaluate whether a system chosen on other criteria (such as corporate standards and cloud budget limits) is roughly sufficient to handle your workload requirements; in others, you may use them to plan the system you need based on those requirements. Additional actions that may improve performance are also discussed.
Memory Management and Scaling for InterSystems IRIS
Memory management is a critical element in optimizing performance and availability. For procedures for allocating memory within InterSystems IRIS using the Management Portal, see Memory and Startup Settings in the “Configuring InterSystems IRIS” chapter of the System Administration Guide.
Memory Management Overview
The goal of memory planning and management is to provide enough memory on each host for all of the entities running on the host under all normal operating circumstances. This is a critical factor in both performance and availability.
Generally, there are four main consumers of memory on a server hosting an InterSystems IRIS instance, as shown:
Operating system, including the file system cache.
Running applications, services, and processes other than InterSystems IRIS and the application based on it.
The memory needs of other processes running on the system can vary widely. If possible, make realistic estimates of the memory to be consumed by the software that will be cohosted with InterSystems IRIS.
InterSystems IRIS and application processes.
InterSystems IRIS is process-based. If you look at the operating system statistics while your application is running, you will see numerous processes running as part of InterSystems IRIS.
InterSystems IRIS shared memory, which includes:
The database cache (also known as the global buffer pool).
The routine cache.
The generic memory heap (gmheap).
Other shared memory structures.
For the best possible performance, all four of these should be kept in physical (system) memory under all normal operating conditions. Virtual memory, and the mechanisms for using it such as swap space and paging, are important because they enable the system to continue operating during a transient memory capacity problem, but the highest priority (if resources allow) is to provide enough physical memory to avoid using virtual memory at all under normal operating circumstances. Achieving this involves these two steps:
Estimating your memory requirements before deployment.
Allocating shared memory within InterSystems IRIS once deployed.
Of course, every application is different and any given system may require a series of adjustments to optimize memory use. However, the following sections provide general guidelines to use as a basis in estimating memory requirements and allocating memory in InterSystems IRIS. Benchmarking and performance load testing the application will further influence your estimate of the ideal memory sizing and parameters.
If you have not configured sufficient physical memory on a Linux system and thus regularly come close to capacity, you run the risk that the out-of-memory (OOM) killer may misidentify long-running InterSystems IRIS processes that touch a lot of memory in normal operation, such as the write daemon and CSP server processes, as the source of the problem and terminate them. This results in an outage of the InterSystems IRIS instance and requires crash recovery at the next startup. Disabling the OOM killer is not recommended, however, because this safety mechanism keeps your operating system from crashing when memory runs short, giving you a chance to intervene and restore InterSystems IRIS to normal operation. The recommended way to avoid this problem is to configure enough physical memory that the OOM killer never comes into play. (For a detailed discussion of process memory in InterSystems IRIS, see Process Memory in InterSystems Products.)
Estimating Memory Requirements
Estimate the system memory to be installed in a physical system or provisioned in a virtual system as follows:
Estimate the memory required by the first two consumers listed in the previous section:
Operating system, including the file system cache.
Other installed programs.
For the sum of up to 1000 typical InterSystems IRIS processes and the instance’s shared memory requirements, plan on 4 to 8 GB per CPU core (physical or virtual).
This core count should not include any threads such as Intel HyperThreading (HT) or IBM Simultaneous Multi-Threading (SMT) (see General Performance Enhancement on InterSystems IRIS Platforms). So, for example, if you have an IBM AIX LPAR with 8 cores allocated, the calculation would be 4-8 GB * 8 = 32 to 64 GB of total RAM allocated to that LPAR, even with SMT-4 enabled (which would appear as 32 logical processors).
Bear in mind that the number of InterSystems IRIS processes running and their memory needs can vary significantly, and that shared memory requirements are influenced by various factors, including in particular the size of the database cache, which is typically larger for instances handling query-intensive workloads (including the nodes of distributed cache clusters and sharded clusters, as described in the following two chapters of this guide). In production, you can review the instance’s use of shared memory and process memory as described in the next step.
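As a worked example of the per-core guideline, the following Python sketch computes a rough RAM range. It assumes that your separate estimate for the operating system and co-hosted software is simply added to the 4 to 8 GB per core figure; the function name and parameters are illustrative and not part of any InterSystems tool.

```python
def estimate_total_ram_gb(cores, os_and_other_gb=0):
    """Rough (low, high) RAM range in GB for a host running InterSystems IRIS.

    cores: physical cores, or vCPUs on a VM. Do NOT count SMT or
    Hyper-Threading threads as cores.
    os_and_other_gb: your separate estimate for the operating system,
    file system cache, and co-hosted software (assumed to be additive).
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    # Guideline: 4 to 8 GB per core covers up to ~1000 typical IRIS
    # processes plus the instance's shared memory.
    low = 4 * cores + os_and_other_gb
    high = 8 * cores + os_and_other_gb
    return low, high


# AIX LPAR example: 8 cores with SMT-4 (32 logical CPUs); count 8, not 32.
print(estimate_total_ram_gb(8))  # (32, 64)
```

Passing the 32 logical processors instead of the 8 cores would overstate the requirement fourfold.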
Allocating InterSystems IRIS Shared Memory
Allocate shared memory within InterSystems IRIS as follows:
On servers with less than 64 GB of RAM, allocate
50% of total memory to the database cache
256 MB minimum to the routine cache
256 MB minimum to the generic memory heap
On servers with more than 64 GB of RAM, allocate
70% of total memory to the database cache
512 MB minimum to the routine cache
384 MB minimum to the generic memory heap
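The two allocation tiers can be expressed as a small Python function. This is a sketch under stated assumptions: the guidelines do not say how to treat a server with exactly 64 GB, so the sketch places it in the larger tier, and it treats 1 GB as 1024 MB. The routine cache and generic memory heap values it returns are minimums, not targets.

```python
MB_PER_GB = 1024  # assumption: binary units


def shared_memory_plan_mb(total_ram_gb):
    """Starting-point InterSystems IRIS shared memory allocation in MB."""
    total_mb = total_ram_gb * MB_PER_GB
    if total_ram_gb < 64:
        db_fraction, routine_min_mb, gmheap_min_mb = 0.50, 256, 256
    else:
        # Assumption: a server with exactly 64 GB uses the larger tier.
        db_fraction, routine_min_mb, gmheap_min_mb = 0.70, 512, 384
    return {
        "database_cache_mb": int(total_mb * db_fraction),
        "routine_cache_min_mb": routine_min_mb,
        "gmheap_min_mb": gmheap_min_mb,
    }


print(shared_memory_plan_mb(32))
print(shared_memory_plan_mb(128))
```

For a 32 GB server this yields a 16,384 MB database cache; for a 128 GB server, about 91,750 MB.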
Once the system is in production, you can review actual memory usage within InterSystems IRIS as follows:
To view the instance’s shared memory usage, go to the Management Portal’s Shared Memory Heap Usage page (System Operation > System Usage, then click the Shared Memory Heap Usage button); for more information, see Generic (Shared) Memory Heap Usage in the Monitoring Guide.
To roughly estimate the maximum memory usage by InterSystems IRIS processes, multiply the peak number of running processes by the default Maximum Per-Process Memory (bbsiz) setting of 262,144 KB (256 MB). However, if this setting has been changed to -1 for “unlimited” (see Setting the Maximum Per-Process Memory in the System Administration Guide), which InterSystems recommends for most production systems, a more detailed analysis is required to estimate the maximum memory usage by these processes. To learn more about memory use by InterSystems IRIS processes, see Process Memory in InterSystems Products.
If System Monitor (described in Using System Monitor in the Monitoring Guide) generates the alert Updates may become suspended due to low available buffers or the warning Available buffers are getting low (25% above the threshold for suspending updates) while the system is under normal production workload, the database cache (global buffer pool) is not large enough and should be increased to optimize performance.
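The rough process-memory estimate is a single multiplication. The Python sketch below assumes bbsiz is expressed in KB with a default of 262,144 KB (256 MB), and it deliberately refuses to produce a number when bbsiz is -1, because in that case no simple upper bound exists. The function name is hypothetical.

```python
DEFAULT_BBSIZ_KB = 262_144  # assumed default bbsiz, in KB (256 MB)


def max_process_memory_gb(peak_processes, bbsiz_kb=DEFAULT_BBSIZ_KB):
    """Rough upper bound on memory used by InterSystems IRIS processes, in GB."""
    if bbsiz_kb == -1:
        # -1 means "unlimited": there is no per-process cap to multiply by.
        raise ValueError("bbsiz is unlimited (-1); analyze actual process memory instead")
    return peak_processes * bbsiz_kb / (1024 * 1024)  # KB -> GB


print(max_process_memory_gb(400))  # 100.0
```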
For swap space or the page file, as a general guideline, configure the smaller of (a) 25 to 50% of your physical memory or (b) 32 GB as virtual memory. As noted in Memory Management Overview, swapping and paging degrade performance and should come into play only when transient memory capacity problems (such as the failure of a memory card) require it. Further, you should configure alerts to notify operators when the system uses virtual memory so they can take immediate action to avoid more severe consequences.
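The swap-sizing guideline and the recommended alert can be sketched in Python as follows. The alert check assumes a Linux host, where the kernel reports SwapTotal and SwapFree (in kB) in /proc/meminfo; the function names are hypothetical, and in practice you would wire the check into whatever monitoring system your operators already use.

```python
import os

SWAP_CAP_GB = 32


def recommended_swap_gb(physical_ram_gb, fraction=0.25):
    """Smaller of (fraction of physical RAM) and 32 GB; fraction is 0.25 to 0.50."""
    if not 0.25 <= fraction <= 0.50:
        raise ValueError("guideline range is 25% to 50% of physical memory")
    return min(physical_ram_gb * fraction, SWAP_CAP_GB)


def swap_in_use_kb(meminfo_text):
    """Swap currently in use, in kB, parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


print(recommended_swap_gb(64))        # 16.0
print(recommended_swap_gb(256, 0.5))  # 32

if os.path.exists("/proc/meminfo"):  # Linux only
    with open("/proc/meminfo") as f:
        used = swap_in_use_kb(f.read())
    if used > 0:
        # Hypothetical hook: replace print with your alerting mechanism.
        print(f"ALERT: {used} kB of swap in use")
```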
When large and huge pages are configured, as is highly recommended, InterSystems shared memory segments are pinned in physical memory and never swapped out; for more information, see Configuring Large and Huge Pages.
For procedures for allocating memory to the routine and database caches, configuring the generic memory heap, and setting the maximum memory per process, see Memory and Startup Settings in the “Configuring InterSystems IRIS” chapter of the System Administration Guide.
Vertically Scaling for Memory
Performance problems in production systems are often due to insufficient memory for application needs. Adding memory to the server hosting one or more InterSystems IRIS instances lets you allocate more to the database cache, the routine cache, generic memory, or some combination. A database cache that is too small to hold the workload’s working set forces queries to fall back to disk, greatly increasing the number of disk reads required and creating a major performance problem, so this is often a primary reason to add memory. Increases in generic memory and the routine cache may also be helpful under certain circumstances.
Configuring Large and Huge Pages
Where supported, the use of large and huge memory pages can be of significant performance benefit and is highly recommended, as described in the following:
IBM AIX® — The use of large pages is highly recommended, especially when configuring over 16 GB of shared memory (the sum of the database cache, the routine cache, and the generic memory heap, as discussed in Estimating Memory Requirements).
By default, when large pages are configured, the system automatically uses them in memory allocation. If shared memory cannot be allocated in large pages, it is allocated in standard (small) pages. However, you can use the memlock parameter for finer-grained control over large pages.
For more information, see Configuring Large Pages on IBM AIX® in the “Preparing to Install” chapter of the Installation Guide and memlock in the Configuration Parameter File Reference.
Linux (all distributions) — The use of static huge pages (2 MB) when available is highly recommended for either physical (bare metal) servers or virtualized servers. Using static huge pages for the InterSystems IRIS shared memory segments yields an average CPU utilization reduction of approximately 10 to 15%, depending on the application.
By default, when huge pages are configured, InterSystems IRIS attempts to provision shared memory in huge pages on startup. If there is not enough space, InterSystems IRIS reverts to standard pages and orphans the allocated huge page space, potentially causing system paging. However, you can use the memlock parameter to control this behavior and fail at startup if huge page allocation fails.
For more information, see Configuring Huge Pages on Linux in the “Preparing to Install” chapter of the Installation Guide and memlock in the Configuration Parameter File Reference.
On Windows, the use of large pages is recommended to reduce page table entry (PTE) overhead.
By default, when large pages are configured, InterSystems IRIS attempts to provision shared memory in large pages on startup. If there is not enough space, InterSystems IRIS reverts to standard pages. However, you can use the memlock parameter to control this behavior and fail at startup if large page allocation fails.
For more information, see Configuring Large Pages on Windows in the “Preparing to Install” chapter of the Installation Guide and memlock in the Configuration Parameter File Reference.
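On Linux, static huge pages are reserved through the vm.nr_hugepages kernel parameter. The following Python sketch computes how many 2 MB pages would cover a given shared memory size. It assumes a 2 MB huge page size and that you supply the total shared memory size the instance actually allocates (which includes the other shared memory structures, not only the caches and heap); the 17,501 MB figure is hypothetical, and the Installation Guide sections cited above remain the authoritative procedure.

```python
import math

HUGE_PAGE_MB = 2  # assumed static huge page size (typical on x86-64 Linux)


def nr_hugepages(shared_memory_mb):
    """Number of 2 MB huge pages needed to hold shared_memory_mb, rounded up."""
    return math.ceil(shared_memory_mb / HUGE_PAGE_MB)


# Hypothetical instance that allocates 17,501 MB of shared memory.
pages = nr_hugepages(17501)
print(f"vm.nr_hugepages = {pages}")  # vm.nr_hugepages = 8751
```

Reserving fewer pages than the instance needs brings back the fallback to standard pages described above, which is one reason to consider the memlock parameter.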
CPU Sizing and Scaling for InterSystems IRIS
InterSystems IRIS is designed to make the most of a system’s total CPU capacity. Keep in mind that not all processors or processor cores are alike. They vary in surface characteristics such as clock speed, number of threads per core, and processor architecture, and virtualization affects them to varying degrees.
Basic CPU Sizing
Applications vary significantly from one to another, and there is no better measure of CPU resource requirements than the results of benchmarking and load testing your application, together with performance statistics collected from existing sites. If neither benchmarking results nor existing customer performance data are available, start with one of the following calculations:
1-2 processor cores per 100 users.
1 processor core for every 200,000 global references per second.
These recommendations are only starting points when application-specific data is not available, and may not be appropriate for your application. It is very important to benchmark and load test your application to verify its exact CPU requirements.
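Both starting-point rules are simple ratios. The Python sketch below applies them and rounds up to whole cores; the function names are illustrative, and the results are only as good as the rules of thumb themselves.

```python
import math


def cores_for_users(users):
    """(low, high) cores from the 1-2 cores per 100 users rule of thumb."""
    return math.ceil(users / 100), math.ceil(users * 2 / 100)


def cores_for_global_refs(refs_per_sec):
    """Cores from the 1 core per 200,000 global references/second rule of thumb."""
    return math.ceil(refs_per_sec / 200_000)


print(cores_for_users(500))              # (5, 10)
print(cores_for_global_refs(1_000_000))  # 5
```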
Balancing Core Count and Speed
Given a choice between faster CPU cores and more CPU cores, consider the following:
The more processes your application uses, the greater the benefit of raising the core count to increase concurrency and overall throughput.
The fewer processes your application uses, the greater the benefit of the fastest possible cores.
For example, an application with a great many users concurrently running simple queries will benefit from a higher core count, while one with relatively fewer users executing compute-intensive queries would benefit from faster but fewer cores. In theory, both applications would benefit from many fast cores, assuming there is no resource contention when multiple processes are running on all those cores simultaneously. As noted in Estimating Memory Requirements, the number of processor cores is a factor in estimating the memory to provision for a server, so increasing the core count may require additional memory.
Virtualization Considerations for CPU
Production systems are sized based on benchmarks and measurements at live customer sites. Virtualization using shared storage adds very little CPU overhead compared to bare metal, so it is valid to size virtual CPU requirements from bare metal monitoring.
For hyper-converged infrastructure (HCI) deployments, add 10% to your estimated host-level CPU requirements to cover the overhead of HCI storage agents or appliances.
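Combining the 10% HCI allowance with the one-to-one physical-to-virtual CPU equivalence described in the best practices below, a host-level estimate can be sketched in Python as shown. The function is hypothetical and assumes the VMs' vCPU counts are simply summed at the host level.

```python
def host_cores_needed(vm_vcpus, hci=False):
    """Host-level physical cores for a set of VMs (illustrative sketch).

    Assumes 1 vCPU ~ 1 physical core for highly utilized production VMs,
    plus 10% on HCI hosts for storage agents or appliances.
    """
    total = sum(vm_vcpus)
    if hci:
        # Add 10% and round up using integer math (no float rounding error).
        total = (total * 110 + 99) // 100
    return total


print(host_cores_needed([8, 8, 4], hci=True))  # 22
```

Integer arithmetic is used for the rounding so that floating-point error cannot push an exact result up to the next whole core.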
In determining the best core count for individual VMs, strike a balance between the number of hosts required for availability and minimizing costs and host management overhead; by increasing core counts, you may be able to satisfy the former requirement without violating the latter.
The following best practices should be applied to virtual CPU allocation:
Production systems, especially database servers, are assumed to be highly utilized and should therefore be initially sized based on assumed equivalence between a physical CPU and its virtual counterpart. If you need six physical CPUs, assume you need six virtual CPUs.
Do not allocate more vCPUs than are required for optimal performance. Although large numbers of vCPUs can be allocated to a virtual machine, there can be a (usually small) performance overhead for managing unused vCPUs. The key is to monitor your systems regularly to ensure that vCPUs are correctly allocated.
Leveraging Core Count with Parallel Query Execution
When you upgrade by adding CPU cores, an InterSystems IRIS feature called parallel query execution helps you take the most effective advantage of the increased capacity.
Parallel query execution is built on a flexible infrastructure for maximizing CPU usage that spawns one process per CPU core, and is most effective with large data volumes, such as analytical workloads that make large aggregations.
For more information on parallel query processing, see Parallel Query Processing in the “Optimizing Query Performance” chapter of the SQL Optimization Guide.
General Performance Enhancement on InterSystems IRIS Platforms
The following information may be helpful in improving the performance of your InterSystems IRIS deployment.
In most situations, the use of Intel Hyper-Threading or AMD Simultaneous Multithreading (SMT) is recommended for improved performance, either within a physical server or at the hypervisor layer in virtualized environments. There may be situations in a virtualized environment in which disabling Hyper-Threading or SMT is warranted; however, those are exceptional cases specific to a given application.
In the case of IBM AIX®, IBM Power processors offer multiple levels of SMT at 2, 4, and 8 threads per core. With the latest IBM Power9 processors, SMT-8 is the level most commonly used with InterSystems IRIS. There may be cases, however, especially with previous generation Power7 and Power8 processors, in which SMT-2 or SMT-4 is more appropriate for a given application. Benchmarking the application is the best approach to determining the ideal SMT level for a specific deployment.
By default, InterSystems IRIS allocates the minimum number of semaphore sets by maximizing the number of semaphores per set (see Semaphores in InterSystems Products). However, there is some evidence that this is not ideal for performance on Linux systems with non-uniform memory access (NUMA) architecture.
To address this, the semsperset parameter in the configuration parameter file (CPF) can be used to specify a lower number of semaphores per set. By default, semsperset is set to 0, which specifies the default behavior. Determining the most favorable setting will likely require some experimentation; if you have InterSystems IRIS deployed on a Linux/NUMA system, InterSystems recommends that you try an initial value of 250.
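To see why a lower semsperset value means more semaphore sets, the following Python sketch does the arithmetic. Everything in it is an assumption for illustration, not a model of IRIS internals: the total of 1,000 semaphores is hypothetical, and the sketch assumes the default behavior (semsperset = 0) packs up to a per-set limit imposed by the operating system, such as the SEMMSL value on Linux.

```python
import math


def semaphore_set_count(total_semaphores, semsperset, os_max_per_set):
    """Number of semaphore sets needed (illustrative model, not IRIS internals)."""
    if semsperset == 0:
        per_set = os_max_per_set  # default: as many semaphores per set as allowed
    else:
        per_set = min(semsperset, os_max_per_set)
    return math.ceil(total_semaphores / per_set)


# Hypothetical instance needing 1,000 semaphores on a kernel allowing 32,000 per set
print(semaphore_set_count(1000, 0, 32000))    # 1
print(semaphore_set_count(1000, 250, 32000))  # 4
```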