Vertically Scaling InterSystems IRIS
Scaling a system vertically by increasing its capacity and resources is a common, well-understood practice. Recognizing this, InterSystems IRIS™ includes a number of built-in capabilities that help you leverage the gains. Some operate transparently, while others require specific adjustments on your part to take full advantage.
This chapter discusses how to calculate the memory and CPU requirements of a server hosting an InterSystems IRIS instance and application. It covers both the initial estimate and the revised estimate you can make once benchmarking results, load testing results, and information from existing sites are available. It also describes how to take best advantage of vertical scaling by increasing system memory or CPU core count. You may use these guidelines to evaluate whether a system chosen on other criteria (such as corporate standards or cloud budget limits) is roughly sufficient for your workload, or to plan the system you need based on your workload requirements. Additional actions that may improve performance are also discussed.
Memory management is a critical element in optimizing performance. For basic information on memory allocation and management in InterSystems IRIS, see Managing InterSystems IRIS Memory in the Preparing to Install chapter of the Installation Guide.
Generally, there are four main consumers of memory on a server hosting an InterSystems IRIS instance. At a high level, you can calculate the amount of physical memory required by simply adding up the requirements of each of the items on the following list. All of them use real memory, but they can also use virtual memory; a key part of capacity planning is to size a system so that there is enough physical memory to avoid paging.
- Operating system, including the file system cache
- InterSystems IRIS and application processes
  InterSystems IRIS is process-based. If you look at the operating system statistics while your application is running, you will see numerous processes running as part of InterSystems IRIS.
- InterSystems IRIS shared memory, which includes:
  - The database and routine caches
  - The generic memory heap (gmheap)
  - Other shared memory structures
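The "add up each consumer" calculation described above can be sketched as follows. This is a minimal illustration, not an InterSystems tool: the class, its field names, and every number in the example (including the per-process figure) are hypothetical placeholders that you would replace with your own estimates or measurements.

```python
from dataclasses import dataclass


@dataclass
class MemoryEstimate:
    """Hypothetical per-consumer estimates for one server, in GB."""
    os_and_fs_cache_gb: float   # operating system plus file system cache
    process_count: int          # InterSystems IRIS and application processes
    per_process_gb: float       # assumed average private memory per process
    database_cache_gb: float    # global buffers
    routine_cache_gb: float
    gmheap_gb: float
    other_shared_gb: float      # other shared memory structures

    def shared_memory_gb(self) -> float:
        # InterSystems IRIS shared memory: caches, gmheap, and other structures.
        return (self.database_cache_gb + self.routine_cache_gb
                + self.gmheap_gb + self.other_shared_gb)

    def total_gb(self) -> float:
        processes_gb = self.process_count * self.per_process_gb
        return self.os_and_fs_cache_gb + processes_gb + self.shared_memory_gb()


def fits_without_paging(estimate: MemoryEstimate, physical_gb: float) -> bool:
    # Paging is avoided only if all consumers fit in physical memory at once.
    return estimate.total_gb() <= physical_gb


est = MemoryEstimate(os_and_fs_cache_gb=8, process_count=500, per_process_gb=0.05,
                     database_cache_gb=64, routine_cache_gb=1, gmheap_gb=1,
                     other_shared_gb=2)
print(f"shared memory: {est.shared_memory_gb():.1f} GB")
print(f"total: {est.total_gb():.1f} GB")
print("fits in 128 GB:", fits_without_paging(est, 128))
```

With these placeholder values the total is about 101 GB, so a 128 GB server leaves headroom while a 96 GB server would risk paging.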
Of course, every application is different and any given system may require a series of adjustments to optimize memory use. However, the following provides general rules to use as a basis in sizing memory for your application. Benchmarking and performance load testing the application will further influence your estimate of the ideal memory sizing and parameters.
When InterSystems IRIS is first installed, routine and database cache memory allocation is set to Automatically, under which InterSystems IRIS allocates a conservative fraction of the available physical memory for the database cache (global buffers), not to exceed 1 GB. This setting is not appropriate for production use.
For procedures for allocating memory to the routine and database caches, configuring the generic memory heap, and setting the maximum per-process memory (also on the Memory and Startup page), see Managing InterSystems IRIS Memory in the Preparing to Install chapter of the Installation Guide.
If you are configuring a data server in a distributed cache cluster, see Increase Data Server Database Caches for ECP Control Structures in the Horizontally Scaling for User Volume with Distributed Caching chapter of this guide for important information about adjustments to database cache sizes that may be necessary.
Performance problems in production systems are often due to insufficient memory for application needs. Adding memory to the server hosting one or more InterSystems IRIS instances lets you allocate more to the database cache, the routine cache, generic memory, or some combination. A database cache that is too small to hold the workload’s working set forces queries to fall back to disk, greatly increasing the number of disk reads required and creating a major performance problem, so this is often a primary reason to add memory. Increases in generic memory and the routine cache may also be helpful under certain circumstances.
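Why an undersized database cache is so costly can be illustrated with a toy model. The sketch below uses a plain least-recently-used (LRU) cache, chosen only for illustration; it is not the buffer-replacement algorithm InterSystems IRIS actually uses, and the workload (repeated passes over a 1,000-block working set) is hypothetical.

```python
from collections import OrderedDict


def count_disk_reads(accesses, cache_blocks):
    """Count cache misses (disk reads) for a generic LRU cache.

    Illustrative model only; not InterSystems IRIS's buffer management.
    """
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
        else:
            misses += 1                     # block must be read from disk
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used block
    return misses


# Hypothetical workload: 10 passes over a working set of 1,000 blocks.
working_set = list(range(1000))
accesses = working_set * 10

for size in (1200, 1000, 900):
    print(size, count_disk_reads(accesses, size))
```

When the cache holds the whole working set, only the first pass reads from disk (1,000 reads). In this access pattern, a cache just 10% too small evicts each block before it is reused, so every one of the 10,000 accesses becomes a disk read. Real workloads are rarely this extreme, but the effect is the same in kind.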
Where supported, the use of large and huge memory pages can be of significant performance benefit, as described in the following:
IBM AIX®: The use of large pages is recommended, especially when configuring over 16 GB of shared memory (the sum of the database cache, the routine cache, and the generic memory heap, as discussed in Calculating Initial Memory Requirements).
By default, when large pages are configured, the system automatically uses them in memory allocation. If shared memory cannot be allocated in large pages, it is allocated in standard (small) pages. However, you can use the memlock parameter for finer-grained control over large pages.
Linux (all distributions): The use of static huge pages (2 MB), when available, is highly recommended for both physical (bare metal) and virtualized servers. Using static huge pages for the InterSystems IRIS shared memory segments yields an average CPU utilization reduction of approximately 10 to 15%, depending on the application.
By default, when huge pages are configured, InterSystems IRIS attempts to provision shared memory in huge pages on startup. If there is not enough space, InterSystems IRIS reverts to standard pages and orphans the allocated huge page space, potentially causing system paging. However, you can use the memlock parameter to control this behavior and fail at startup if huge page allocation fails.
The use of large pages is recommended to reduce page table entry (PTE) overhead.
By default, when large pages are configured, InterSystems IRIS attempts to provision shared memory in large pages on startup. If there is not enough space, InterSystems IRIS reverts to standard pages. However, you can use the memlock parameter to control this behavior and fail at startup if large page allocation fails.
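The page-sizing arithmetic behind the AIX and Linux guidance above can be sketched as follows. This is a minimal illustration under stated assumptions: the shared memory figure is only the sum named in the AIX guidance, the headroom percentage is a hypothetical safety margin, and the cache sizes in the example are placeholders. On Linux, the resulting page count is the kind of value typically configured through the kernel's `vm.nr_hugepages` setting; confirm the actual shared memory segment size your instance requests before relying on it.

```python
HUGE_PAGE_MB = 2                          # Linux static huge page size cited above
AIX_LARGE_PAGE_THRESHOLD_MB = 16 * 1024   # "over 16 GB of shared memory" on AIX


def shared_memory_mb(db_cache_mb: int, routine_cache_mb: int, gmheap_mb: int) -> int:
    """Sum used in the AIX guidance: database cache + routine cache + gmheap."""
    return db_cache_mb + routine_cache_mb + gmheap_mb


def aix_large_pages_strongly_recommended(shared_mb: int) -> bool:
    # Large pages are recommended on AIX in general, and especially above 16 GB.
    return shared_mb > AIX_LARGE_PAGE_THRESHOLD_MB


def linux_huge_pages_needed(shared_mb: int, headroom_pct: int = 0) -> int:
    """Number of 2 MB huge pages covering shared_mb plus optional headroom.

    headroom_pct is a hypothetical margin for other shared structures.
    Integer ceiling division keeps the result exact.
    """
    padded_mb = -(-shared_mb * (100 + headroom_pct) // 100)
    return -(-padded_mb // HUGE_PAGE_MB)


shared = shared_memory_mb(db_cache_mb=65536, routine_cache_mb=1024, gmheap_mb=1024)
print(shared, aix_large_pages_strongly_recommended(shared))
print(linux_huge_pages_needed(shared), linux_huge_pages_needed(shared, headroom_pct=5))
```

Provisioning too few huge pages is what triggers the fallback to standard pages described above, which is why a small margin is worth considering.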
InterSystems IRIS is designed to make the most of a system’s total CPU capacity. Keep in mind that not all processors or processor cores are alike. There are obvious variations, such as clock speed, number of threads per core, and processor architecture, as well as the varying impact of virtualization.
Applications vary significantly from one another, and there is no better measure of CPU resource requirements than benchmarking and load testing your application and collecting performance statistics from existing sites. If neither benchmarking nor existing customer performance data is available, start with one of the following calculations:
These recommendations are only starting points when application-specific data is not available, and may not be appropriate for your application. It is very important to benchmark and load test your application to verify its exact CPU requirements.
Given a choice between faster CPU cores and more CPU cores, consider the following:
The more processes your application uses, the greater the benefit of raising the core count to increase concurrency and overall throughput.
The fewer processes your application uses, the greater the benefit of the fastest possible cores.
For example, an application with a great many users concurrently running simple queries benefits from a higher core count, while one with relatively few users executing compute-intensive queries benefits from fewer but faster cores. In theory, both applications would benefit from many fast cores, assuming there is no resource contention when multiple processes run on all those cores simultaneously. As noted in Calculating Initial Memory Requirements, the number of processor cores is a factor in estimating the memory to provision for a server, so increasing the core count may require additional memory.
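The faster-versus-more-cores trade-off can be illustrated with a deliberately simple model. The sketch assumes every process is CPU-bound, that throughput scales with the number of cores that have work, and that there is no contention; the two server configurations and their relative per-core speeds are hypothetical. It is a teaching aid, not a substitute for benchmarking.

```python
def relative_throughput(busy_processes: int, cores: int, per_core_speed: float) -> float:
    """Toy model: only cores with a process to run contribute throughput.

    Assumes CPU-bound processes and no resource contention.
    """
    active_cores = min(busy_processes, cores)
    return active_cores * per_core_speed


fewer_faster = {"cores": 16, "per_core_speed": 1.25}  # hypothetical server A
more_slower = {"cores": 32, "per_core_speed": 1.0}    # hypothetical server B

for processes in (8, 200):
    a = relative_throughput(processes, **fewer_faster)
    b = relative_throughput(processes, **more_slower)
    print(f"{processes} processes: fewer/faster={a}, more/slower={b}")
```

With 8 busy processes, every process has a core on either server, so the faster cores win (10.0 versus 8.0). With 200 processes, both servers are saturated and the larger core count wins (20.0 versus 32.0), matching the guidance above.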
Production systems are sized based on benchmarks and measurements at live customer sites. Virtualization using shared storage adds very little CPU overhead compared to bare metal, so it is valid to size virtual CPU requirements from bare metal monitoring.
For hyper-converged infrastructure (HCI) deployments, add 10% to your estimated host-level CPU requirements to cover the overhead of HCI storage agents or appliances.
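Applying the HCI adjustment is simple arithmetic, but rounding matters because cores are provisioned in whole units. The helper below is a hypothetical illustration of adding the 10% overhead and rounding up.

```python
def host_cores_with_hci(estimated_host_cores: int, overhead_pct: int = 10) -> int:
    """Host-level core estimate plus HCI storage overhead, rounded up.

    overhead_pct defaults to the 10% cited for HCI storage agents or appliances.
    Integer ceiling division avoids floating-point rounding at exact boundaries.
    """
    return -(-estimated_host_cores * (100 + overhead_pct) // 100)


for cores in (40, 23):
    print(cores, "->", host_cores_with_hci(cores))
```

An estimate of 40 host cores becomes 44, and 23 cores (25.3 after the adjustment) rounds up to 26.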
In determining the best core count for individual VMs, balance the number of hosts required for availability against the goal of minimizing costs and host management overhead; by increasing core counts, you may be able to meet the availability requirement without compromising the cost and management goals.
The following best practices should be applied to virtual CPU allocation:
Production systems, especially database servers, are assumed to be highly utilized and should therefore initially be sized on the assumption that one virtual CPU is equivalent to one physical CPU. If you need six physical CPUs, assume you need six virtual CPUs.
To optimize performance, do not allocate more vCPUs than required. Although large numbers of vCPUs can be allocated to a virtual machine, there can be a (usually small) performance overhead for managing unused vCPUs. The key is to monitor your systems regularly to ensure that vCPUs are correctly allocated.
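The two practices above, initial 1:1 sizing followed by monitoring for unused vCPUs, can be sketched as a pair of checks. This is a hypothetical illustration: the peak-busy figure would come from your own monitoring data, and the 25% headroom is an assumed policy value, not an InterSystems recommendation.

```python
def initial_vcpus(physical_cpus_needed: int) -> int:
    # Initial production sizing assumes one vCPU per physical CPU.
    return physical_cpus_needed


def overallocated(allocated_vcpus: int, peak_busy_vcpus: float, headroom_pct: int = 25) -> bool:
    """Flag a VM whose monitored peak leaves many allocated vCPUs unused.

    peak_busy_vcpus comes from your own monitoring; headroom_pct is a
    hypothetical policy margin above the observed peak.
    """
    needed = peak_busy_vcpus * (100 + headroom_pct) / 100
    return allocated_vcpus > needed


print(initial_vcpus(6))       # six physical CPUs -> six vCPUs
print(overallocated(16, 6))   # peak of 6 busy vCPUs on a 16-vCPU VM
print(overallocated(8, 7))    # peak of 7 busy vCPUs on an 8-vCPU VM
```

In this example, a 16-vCPU VM that peaks at 6 busy vCPUs is flagged as a candidate for reduction, while an 8-vCPU VM peaking at 7 is not.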
When you upgrade by adding CPU cores, an InterSystems IRIS feature called parallel query execution helps you take the most effective advantage of the increased capacity.
Parallel Query Execution
Parallel query execution is built on a flexible infrastructure for maximizing CPU usage that spawns one process per CPU core, and is most effective with large data volumes, such as analytical workloads that make large aggregations.
For more information on parallel query processing, see Parallel Query Processing in the “Optimizing Query Performance” chapter of the SQL Optimization Guide.
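The general pattern behind parallel aggregation (split the data, compute partial aggregates with one worker per core, then merge) can be sketched in Python. This is a conceptual illustration only, not how InterSystems IRIS implements parallel query execution. InterSystems IRIS uses separate processes, whereas this sketch uses a thread pool to keep it self-contained. In standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so the sketch demonstrates the structure, not the speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts contiguous chunks of nearly equal size."""
    size, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def partial_sum_count(chunk):
    # Each worker computes a partial aggregate over its own chunk.
    return sum(chunk), len(chunk)


def parallel_average(rows, workers=None):
    if not rows:
        raise ValueError("cannot average an empty sequence")
    workers = workers or os.cpu_count() or 1  # default: one worker per core
    chunks = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    # Merge partial (sum, count) pairs; averaging the averages would be wrong
    # when chunks differ in size.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(partition(list(range(10)), 3))
print(parallel_average(list(range(1, 101)), workers=4))
```

Carrying (sum, count) pairs rather than per-chunk averages is what keeps the merged result exact regardless of how unevenly the data is split.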
The following information may be helpful in improving the performance of your InterSystems IRIS deployment.
In most situations, the use of Intel Hyper-Threading or AMD Simultaneous Multithreading (SMT) is recommended for improved performance, either within a physical server or at the hypervisor layer in virtualized environments. There may be situations in a virtualized environment in which disabling Hyper-Threading or SMT is warranted; however, those are exceptional cases specific to a given application.
In the case of IBM AIX®, IBM Power processors offer multiple levels of SMT at 2, 4, and 8 threads per core. With the latest IBM Power9 processors, SMT-8 is the level most commonly used with InterSystems IRIS. There may be cases, however, especially with previous generation Power7 and Power8 processors, in which SMT-2 or SMT-4 is more appropriate for a given application. Benchmarking the application is the best approach to determining the ideal SMT level for a specific deployment.
By default, InterSystems IRIS allocates the minimum number of semaphore sets by maximizing the number of semaphores per set (see Semaphores in InterSystems Products). However, there is some evidence that this is not ideal for performance on Linux systems with non-uniform memory access (NUMA) architecture.
To address this, the semsperset parameter in the iris.cpf file can be used to specify a lower number of semaphores per set. By default, semsperset is set to 0, which specifies the default behavior. Determining the most favorable setting will likely require some experimentation; if you have InterSystems IRIS deployed on a Linux/NUMA system, InterSystems recommends that you try an initial value of 250.
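The effect of lowering semsperset on the number of semaphore sets can be sketched as follows. The assumptions are stated in the code: the per-set kernel limit (SEMMSL) of 32000 is assumed to be the default on recent Linux kernels, so check the first value in /proc/sys/kernel/sem on your own system. The total semaphore count of 3,000 is hypothetical, since InterSystems IRIS determines its own count.

```python
# Assumed Linux default for SEMMSL (maximum semaphores per set) on recent
# kernels; the first value in /proc/sys/kernel/sem shows your system's limit.
ASSUMED_SEMMSL = 32000


def semaphore_sets(total_semaphores: int, semsperset: int,
                   semmsl: int = ASSUMED_SEMMSL) -> int:
    """Number of semaphore sets needed to hold total_semaphores.

    semsperset == 0 models the default behavior: as many semaphores per set
    as the kernel allows, which yields the fewest sets.
    """
    per_set = semmsl if semsperset == 0 else min(semsperset, semmsl)
    return -(-total_semaphores // per_set)  # integer ceiling division


total = 3000  # hypothetical; InterSystems IRIS determines its own count
print(semaphore_sets(total, 0), semaphore_sets(total, 250))
```

Under these assumptions, the default packs all 3,000 semaphores into a single set, while semsperset=250 spreads them across 12 sets.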
Content Date/Time: 2019-08-23 06:48:00