Deploy Compute Nodes for Workload Separation and Increased Query Throughput
For advanced use cases in which extremely low query latencies are required, potentially at odds with a constant influx of data, compute nodes can be added to provide a transparent caching layer for servicing queries. Each compute node caches the sharded data on the data node it is associated with, as well as nonsharded data when necessary. When a cluster includes compute nodes, read-only queries are automatically executed in parallel on the compute nodes, rather than on the data nodes; all write operations (insert, update, delete, and DDL operations) continue to be executed on the data nodes. This division of labor separates the query and data ingestion workloads while maintaining the advantages of parallel processing and distributed caching, improving the performance of both. Assigning multiple compute nodes per data node can further improve the query throughput and performance of the cluster.
When compute nodes are added to a cluster, they are automatically distributed as evenly as possible across the data nodes. Adding compute nodes yields significant performance improvement only when there is at least one compute node per data node; since a cluster is only as fast as its slowest data node, the most efficient use of resources, in general, is to assign the same number of compute nodes to each data node. Because compute nodes support query execution only and do not store any data, their hardware profile can be tailored to suit those needs, for example by emphasizing memory and CPU and keeping storage to the bare minimum.
For information about planning compute nodes and load balancing application connections to clusters with compute nodes, see Plan Compute Nodes.
You can add compute nodes to a sharded cluster using any of the deployment methods described in Deploying the Sharded Cluster. The automatic deployment methods described there all include compute nodes as an option. This section provides additional manual instructions for deploying compute nodes. First, manually deploy and configure the sharded cluster using the steps described in that section, then complete deployment with the steps described here, as follows:
Provision or identify the infrastructure
Note:
Include hosts for your planned compute nodes along with those for the planned data nodes. For example, if your plan calls for eight data nodes and eight compute nodes, your cluster requires 16 hosts. All nodes in a sharded cluster should have identical or at least closely comparable specifications and resources, with the exception of storage for compute nodes, which do not use any in their sharded cluster role.
Configure the data nodes using either the Management Portal or the %SYSTEM.Cluster API.
Add compute nodes using either the Management Portal or the %SYSTEM.Cluster API, as described in the following sections.
Configure or Deploy Compute Nodes Using the Management Portal
When you add a compute node to a cluster, it is assigned to a data node that previously had the minimum number of associated compute nodes, so as to automatically balance compute nodes across the data nodes. The procedure is the same regardless of whether the cluster is mirrored.
On all systems, you can open the Management Portal by loading in your browser the URL http://host:webserverport/csp/sys/UtilHome.csp, where host is the host identifier and webserverport is the instance’s web server port number, for example http://localhost:52773/csp/sys/UtilHome.csp. (For more detailed information about opening the Management Portal using this URL, see the instructions for an instance deployed in a container or one installed from a kit in the InterSystems IRIS Connection Information section of InterSystems IRIS Basics: Connecting an IDE.) On a Windows system, you can also open the Management Portal by clicking the InterSystems IRIS icon in the system tray and selecting Management Portal.
To add an instance on a networked system to your cluster as a compute node, use the following steps.
Open the Management Portal for the instance, select System Administration > Configuration > System Configuration > Sharding > Enable Sharding, and on the dialog that displays, click OK. (The value of the Maximum Number of ECP Connections setting need not be changed as the default is appropriate for virtually all clusters.)
Restart the instance. (There is no need to close the browser window or tab containing the Management Portal; you can simply reload it after the instance has fully restarted.)
Navigate to the Configure Node-Level page (System Administration > Configuration > System Configuration > Sharding > Configure Node-Level) and click the Configure button.
On the CONFIGURE NODE-LEVEL CLUSTER dialog, select Add this instance to an existing sharded cluster and respond to the prompts that display as follows:
Enter the cluster URL, which is the address displayed for any node on the Shards tab of the Configure Node-Level page on an instance that already belongs to the cluster, as described in the Configure the remaining data nodes step in “Deploy the Cluster Using the Management Portal”.
Note:
If the cluster is mirrored, enter the address of a primary data node or a compute node, but not a backup data node.
Select compute at the Role prompt to configure the instance as a compute node.
In some cases, the hostname known to InterSystems IRIS does not resolve to an appropriate address, or no hostname is available. If for this or any other reason, you want other cluster nodes to communicate with this node using its IP address instead, enter the IP address at the Override hostname prompt.
The Mirrored cluster checkbox is not available, because configuration of compute nodes is the same regardless of mirroring (except for the cluster URL provided, as noted above).
Click OK to return to the Configure Node-Level page, which now includes two tabs, Shards and Sharded Tables. The data and compute nodes you have configured so far are listed under Shards, starting with node 1, and showing which data node each compute node is assigned to.
Click Verify Shards to verify that the compute node is correctly configured and can communicate with the others.
Note:
If you have many compute nodes to configure, you can make the verification operation automatic by clicking the Advanced Settings button and selecting Automatically verify shards on assignment on the ADVANCED SETTINGS dialog. (Other settings in this dialog should be left at the defaults when you deploy a sharded cluster.)
Deploy Compute Nodes Using the %SYSTEM.Cluster API
To add an instance on a networked system to your cluster as a compute node, open the InterSystems Terminal for the instance and call the $SYSTEM.Cluster.AttachAsComputeNode() method, specifying the hostname of an existing cluster node and the superserver port of its InterSystems IRIS instance, for example:
set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://datanode2:1972")
To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:
zwrite status
If a call does not succeed, display the user-friendly error message by entering:
do $SYSTEM.Status.DisplayError(status)
If you provided the IP address of the template node when configuring it (see Configure node 1 in “Deploy the Cluster Using the %SYSTEM.Cluster API”), use the IP address instead of the hostname.
set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://100.00.0.01:1972")
If you want other nodes to communicate with this one using its IP address, specify the IP address as the second argument.
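For example, combining the two previous notes, a call that identifies the template node by IP address and also supplies this node's own IP address as the second argument might look like the following (both addresses are placeholders for your own):

set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://100.00.0.01:1972", "100.00.0.02")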
From the perspective of another node (which is what you need in this procedure), the superserver port of a containerized InterSystems IRIS instance depends on which host port the superserver port was published or exposed as when the container was created. For details on and examples of this, see Running an InterSystems IRIS Container with Durable %SYS and Running an InterSystems IRIS Container: Docker Compose Example in Running InterSystems Products in Containers and Container networking in the Docker documentation.
The default superserver port number of a kit-installed InterSystems IRIS instance that is the only such instance on its host is 1972. To see or set the instance’s superserver port number, select System Administration > Configuration > System Configuration > Memory and Startup in the instance’s Management Portal. (For information about opening the Management Portal for the instance, see InterSystems IRIS Connection Information in InterSystems IRIS Basics: Connecting an IDE.)
If the cluster node you identify in the first argument is a data node, it is used as the template; if it is a compute node, the data node to which it is assigned is used as the template. The AttachAsComputeNode() call does the following:
Enables the ECP and sharding services
Associates the new compute node with a data node that previously had the minimum number of associated compute nodes, so as to automatically balance compute nodes across the data nodes.
Creates the cluster namespace, configuring it to match the settings on the template node (specified in the first argument), as described in Configure Node 1 in “Configure the Cluster Using the %SYSTEM.Cluster API”, and creating all needed mappings.
Sets all SQL configuration options to match the template node.
If a namespace of the same name as the cluster namespace already exists on the new compute node, it is used as the cluster namespace, and only the mappings are replicated.
If you want other cluster nodes to communicate with this node using its IP address instead of its hostname, supply the IP address as the second argument.
The AttachAsComputeNode() call returns an error if the InterSystems IRIS instance is already a node in a sharded cluster.
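Putting these pieces together, a minimal attach-and-check sequence in the Terminal might look like the following sketch (the hostname datanode2 is a placeholder for one of your own cluster nodes):

set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://datanode2:1972")
if 'status { do $SYSTEM.Status.DisplayError(status) }

This simply combines the attach call with the error-display idiom shown above, so a failed attach (for example, because the instance already belongs to a cluster) is reported immediately.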
When you have configured all of the compute nodes, you can call the $SYSTEM.Cluster.ListNodes() method to list them, for example:
set status = $system.Cluster.ListNodes()

NodeId   NodeType   DataNodeId   Host           Port
1        Data                    datanode1      1972
2        Data                    datanode2      1972
3        Data                    datanode3      1972
1001     Compute    1            computenode1   1972
1002     Compute    2            computenode2   1972
1003     Compute    3            computenode3   1972
When compute nodes are deployed, the list indicates the node ID of the data node that each compute node is assigned to. You can also use the $SYSTEM.Cluster.GetMetadata() method to retrieve metadata for the cluster, including the names of the cluster and master namespaces and their default globals databases, as well as settings for the node on which you issue the call.
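As a sketch, the following assumes GetMetadata() returns its results in an array passed by reference (check the %SYSTEM.Cluster class reference for the exact signature):

set status = $SYSTEM.Cluster.GetMetadata(.metadata)
if status { zwrite metadata }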