Deploy Compute Nodes for Workload Separation and Increased Query Throughput
For advanced use cases in which extremely low query latencies are required, potentially at odds with a constant influx of data, compute nodes can be added to provide a transparent caching layer for servicing queries. Each compute node caches the sharded data on the data node it is associated with, as well as nonsharded data when necessary. When a cluster includes compute nodes, read-only queries are automatically executed in parallel on the compute nodes, rather than on the data nodes; all write operations (insert, update, delete, and DDL operations) continue to be executed on the data nodes. This division of labor separates the query and data ingestion workloads while maintaining the advantages of parallel processing and distributed caching, improving the performance of both. Assigning multiple compute nodes per data node can further improve the query throughput and performance of the cluster.
When compute nodes are added to a cluster, they are automatically distributed as evenly as possible across the data nodes. Adding compute nodes yields significant performance improvement only when there is at least one compute node per data node; since a cluster is only as fast as its slowest data node, the most efficient use of resources, in general, is to assign the same number of compute nodes to each data node. Because compute nodes support query execution only and do not store any data, their hardware profile can be tailored to suit those needs, for example by emphasizing memory and CPU and keeping storage to the bare minimum.
For information about planning compute nodes and load balancing application connections to clusters with compute nodes, see Plan Compute Nodes.
You can add compute nodes to a sharded cluster using any of the deployment methods described in Deploying the Sharded Cluster. The automatic deployment methods described there all include compute nodes as an option. This section provides additional manual instructions for deploying compute nodes. First, manually deploy and configure the sharded cluster using the steps described in that section, then complete deployment with the steps described here, as follows:
Provision or identify the infrastructure
Note:
Include hosts for your planned compute nodes along with those for the planned data nodes. For example, if your plan calls for eight data nodes and eight compute nodes, your cluster requires 16 hosts. All nodes in a sharded cluster should have identical or at least closely comparable specifications and resources, with the exception of storage for compute nodes, which do not use any in their sharded cluster role.
Configure the data nodes using either the Management Portal or the %SYSTEM.Cluster API.
Add compute nodes using either the Management Portal or the %SYSTEM.Cluster API, as described in the following sections.
Configure or Deploy Compute Nodes Using the Management Portal
When you add a compute node to a cluster, it is assigned to a data node that previously had the minimum number of associated compute nodes, so as to automatically balance compute nodes across the data nodes. The procedure is the same regardless of whether the cluster is mirrored.
On all systems, you can open the Management Portal by loading in your browser the URL http://host:webserverport/csp/sys/UtilHome.csp, where host is the host identifier and webserverport is the instance’s web server port number, for example http://localhost:52773/csp/sys/UtilHome.csp. (For more detailed information about opening the Management Portal using this URL, see the instructions for an instance deployed in a container or one installed from a kit in the InterSystems IRIS Connection Information section of InterSystems IRIS Basics: Connecting an IDE.) On a Windows system, you can also open the Management Portal by clicking the InterSystems IRIS icon in the system tray and selecting Management Portal.
To add an instance on a networked system to your cluster as a compute node, use the following steps.
Open the Management Portal for the instance, select System Administration > Configuration > System Configuration > Sharding > Enable Sharding, and on the dialog that displays, click OK. (The value of the Maximum Number of ECP Connections setting need not be changed as the default is appropriate for virtually all clusters.)
Restart the instance. (There is no need to close the browser window or tab containing the Management Portal; you can simply reload it after the instance has fully restarted.)
Navigate to the Configure Node-Level page (System Administration > Configuration > System Configuration > Sharding > Configure Node-Level) and click the Configure button.
On the CONFIGURE NODE-LEVEL CLUSTER dialog, select Add this instance to an existing sharded cluster and respond to the prompts that display as follows:
Enter the cluster URL, which is the address displayed for any node on the Shards tab of the Configure Node-Level page on an instance that already belongs to the cluster, as described in the Configure the remaining data nodes step in “Deploy the Cluster Using the Management Portal”.
Note:
If the cluster is mirrored, enter the address of a primary data node or a compute node, but not a backup data node.
Select compute at the Role prompt to configure the instance as a compute node.
In some cases, the hostname known to InterSystems IRIS does not resolve to an appropriate address, or no hostname is available. If for this or any other reason, you want other cluster nodes to communicate with this node using its IP address instead, enter the IP address at the Override hostname prompt.
The Mirrored cluster checkbox is not available, because configuration of compute nodes is the same regardless of mirroring (except for the cluster URL provided, as noted above).
Click OK to return to the Configure Node-Level page, which now includes two tabs, Shards and Sharded Tables. The data and compute nodes you have configured so far are listed under Shards, starting with node 1, and showing which data node each compute node is assigned to.
Click Verify Shards to verify that the compute node is correctly configured and can communicate with the others.
Note:
If you have many compute nodes to configure, you can make the verification operation automatic by clicking the Advanced Settings button and selecting Automatically verify shards on assignment on the ADVANCED SETTINGS dialog. (Other settings in this dialog should be left at the defaults when you deploy a sharded cluster.)
Deploy Compute Nodes Using the %SYSTEM.Cluster API
To add an instance on a networked system to your cluster as a compute node, open the InterSystems Terminal for the instance and call the $SYSTEM.Cluster.AttachAsComputeNode() method, specifying the hostname of an existing cluster node and the superserver port of its InterSystems IRIS instance, for example:
set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://datanode2:1972")
To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:
zwrite status
If a call does not succeed, display the user-friendly error message by entering:
do $SYSTEM.Status.DisplayError(status)
If you provided the IP address of the template node when configuring it (see Configure node 1 in “Deploy the Cluster Using the %SYSTEM.Cluster API”), use the IP address instead of the hostname.
set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://100.00.0.01:1972")
If you want other nodes to communicate with this one using its IP address, specify the IP address as the second argument.
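For example, combining the two previous notes, a call that identifies the template node by IP address and also supplies this node's own IP address as the second argument might look like the following (both addresses are placeholders for your own):

set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://100.00.0.01:1972", "100.00.0.02")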
From the perspective of another node (which is what you need in this procedure), the superserver port of a containerized InterSystems IRIS instance depends on which host port the superserver port was published or exposed as when the container was created. For details on and examples of this, see Running an InterSystems IRIS Container with Durable %SYS and Running an InterSystems IRIS Container: Docker Compose Example in Running InterSystems Products in Containers and Container networking in the Docker documentation.
The default superserver port number of a kit-installed InterSystems IRIS instance that is the only such instance on its host is 1972. To see or set the instance’s superserver port number, select System Administration > Configuration > System Configuration > Memory and Startup in the instance’s Management Portal. (For information about opening the Management Portal for the instance, see InterSystems IRIS Connection Information in InterSystems IRIS Basics: Connecting an IDE.)
If the cluster node you identify in the first argument is a data node, it is used as the template; if it is a compute node, the data node to which it is assigned is used as the template. The AttachAsComputeNode() call does the following:
Enables the ECP and sharding services
Associates the new compute node with a data node that previously had the minimum number of associated compute nodes, so as to automatically balance compute nodes across the data nodes.
Creates the cluster namespace, configuring it to match the settings on the template node (specified in the first argument), as described in Configure Node 1 in “Configure the Cluster Using the %SYSTEM.Cluster API”, and creating all needed mappings.
Sets all SQL configuration options to match the template node.
If a namespace of the same name as the cluster namespace already exists on the new compute node, it is used as the cluster namespace, and only the mappings are replicated.
If you want other cluster nodes to communicate with this node using its IP address instead of its hostname, supply the IP address as the second argument.
The AttachAsComputeNode() call returns an error if the InterSystems IRIS instance is already a node in a sharded cluster.
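Putting these pieces together, a minimal attach-and-check sequence in the Terminal might look like the following sketch (the hostname datanode2 is a placeholder for one of your own cluster nodes):

set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://datanode2:1972")
if 'status { do $SYSTEM.Status.DisplayError(status) }

This simply combines the attach call with the error-display idiom shown above, so a failed attach (for example, because the instance already belongs to a cluster) is reported immediately.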
When you have configured all of the compute nodes, you can call the $SYSTEM.Cluster.ListNodes() method to list them, for example:
set status = $system.Cluster.ListNodes()

NodeId   NodeType   DataNodeId   Host           Port
1        Data                    datanode1      1972
2        Data                    datanode2      1972
3        Data                    datanode3      1972
1001     Compute    1            computenode1   1972
1002     Compute    2            computenode2   1972
1003     Compute    3            computenode3   1972
When compute nodes are deployed, the list indicates the node ID of the data node that each compute node is assigned to. You can also use the $SYSTEM.Cluster.GetMetadata() method to retrieve metadata for the cluster, including the names of the cluster and master namespaces and their default globals databases, as well as settings for the node on which you issue the call.
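As a sketch, the following assumes GetMetadata() returns its results in an array passed by reference (check the %SYSTEM.Cluster class reference for the exact signature):

set status = $SYSTEM.Cluster.GetMetadata(.metadata)
if status { zwrite metadata }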