
Horizontally Scaling for Data Volume with Sharding

This chapter describes the deployment and use of an InterSystems IRIS sharded cluster.

Overview of InterSystems IRIS Sharding

Sharding is a significant horizontal scalability feature of InterSystems IRIS data platform. An InterSystems IRIS sharded cluster partitions both data storage and caching across a number of servers, providing flexible, inexpensive performance scaling for queries and data ingestion while maximizing infrastructure value through highly efficient resource utilization. Sharding is easily combined with the considerable vertical scaling capabilities of InterSystems IRIS, greatly widening the range of workloads for which InterSystems IRIS can provide solutions.

Note:

For a brief introduction to sharding that includes a hands-on exploration of deploying and using a sharded cluster, see First Look: Scaling for Data Volume with an InterSystems IRIS Sharded Cluster.

Elements of Sharding

Horizontally scaling InterSystems IRIS through sharding can benefit a wide range of applications, but provides the greatest gains in use cases involving one or both of the following:

  • Large amounts of data retrieved from disk, complex processing of data, or both, for example as in analytic workloads

  • High-volume, high-velocity data ingestion

Sharding horizontally partitions large database tables and their associated indexes across multiple InterSystems IRIS instances, called data nodes, while allowing applications to access these tables through any one of those instances. Together, the data nodes form a sharded cluster. This architecture provides the following advantages:

  • Queries against a sharded table are run in parallel on all of the data nodes, with the results merged, aggregated, and returned as full query results to the application.

  • Because the data partitions are hosted by separate instances, each has a dedicated cache to serve its own partition of the data set, rather than a single instance’s cache serving the entire data set.

With sharding, the performance of queries against large tables is no longer constrained by the resources of a single system. By distributing both query processing and caching across multiple systems, sharding provides near-linear scaling of both compute and memory resources, allowing you to design and maintain a cluster tailored to your workload. When you scale out by adding data nodes, sharded data can be rebalanced across the cluster. The distributed data layout can be further exploited for parallel data loading and with third party frameworks like Apache Spark.

A shard is a subset of a table's rows, with each row contained within exactly one shard, and all shards of a table containing roughly the same number of rows. Each data node hosts a data shard, which is comprised of one shard of each sharded table on the cluster. A federated software component called the sharding manager keeps track of which shards (and therefore which table rows) are located on which data nodes and directs queries accordingly, as well as managing other sharded operations. Each table is automatically horizontally partitioned across the data nodes by using one of its fields as a shard key, which provides a deterministic method of distributing data evenly. A shard key is typically the table’s RowID (the default), but can also be a user-defined field or set of fields.

While sharded data is physically partitioned across the data nodes, it is all logically visible from any data node (as are nonsharded data, metadata, and code). Each data node has a cluster namespace (identically named across the cluster) that provides transparent access to all data and code on the cluster; applications can connect to any node's cluster namespace and experience the full data set as if it were local. Application connections can therefore be load balanced across all of the data nodes in the cluster to take greatest advantage of parallel query processing and partitioned caching.

Figure: Basic sharded cluster (data nodes)

Nonsharded data is stored only on the first data node configured (called data node 1, or just node 1). This distinction is transparent to the user except for the fact that more data is stored on node 1, but this difference is typically small. From the perspective of the application SQL, the distinction between sharded and nonsharded tables is transparent.

InterSystems IRIS mirroring can be used to provide high availability for the data nodes in a sharded cluster; a mirrored failover pair of InterSystems IRIS instances can be added to a cluster as easily as a single instance. For more information on deploying a mirrored sharded cluster, see Mirror Data Nodes for High Availability.

For advanced use cases in which extremely low query latencies are required, potentially at odds with a constant influx of data, compute nodes can be added to provide a transparent caching layer for servicing queries, separating the query and data ingestion workloads and improving the performance of both. Assigning multiple compute nodes per data node can further improve the cluster’s query throughput. For more information about compute nodes and instructions for deploying them, see Deploy Compute Nodes for Workload Separation and Increased Query Throughput.

Evaluating the Benefits of Sharding

InterSystems IRIS sharding can benefit a wide range of applications, but provides the greatest gains in use cases involving the following:

  • Relatively large data sets, queries that return large amounts of data, or both.

    Sharding scales caching capacity to match data size by partitioning the cache along with the data, leveraging the memory resources of multiple systems. Each data node dedicates its database cache (global buffer pool) to a fraction of the data set, as compared to a single instance’s database cache being available for all of the data. The resulting improvement becomes most evident when the data in regular use is too big to fit in the database cache of a single nonsharded instance.

  • A high volume of complex queries doing large amounts of data processing.

    Sharding scales query processing throughput by decomposing queries and executing them in parallel across multiple data nodes, leveraging the computing resources of multiple systems. The resulting improvement is most evident when queries against the cluster:

    • Read large amounts of data from persistent storage, and in particular have a high ratio of data retrieved to results returned.

    • Involve significant compute work (including aggregation, grouping, and sorting).

  • High-volume or high-speed data ingestion, or both.

    Sharding scales data ingestion through the InterSystems IRIS JDBC driver’s use of direct connections to the data nodes for parallel loading, distributing ingestion across multiple instances. The gains are enhanced if the data can be assumed to be already validated, allowing uniqueness checking to be omitted during the load.

Each of these factors on its own influences the potential gain from sharding, but the benefit may be enhanced where they combine. For example, a combination of large amounts of data ingested quickly, large data sets, and complex queries that retrieve and process a lot of data makes many of today’s analytic workloads very good candidates for sharding.

As previously noted, and as discussed in more detail in Planning an InterSystems IRIS Sharded Cluster, under many circumstances the most beneficial approach is to combine InterSystems IRIS sharding with vertical scaling to address some of the factors described above.

Note:

In the current release, sharding does not support workloads involving complex transactions requiring atomicity, and a sharded cluster cannot be used for such workloads.

Namespace-level Sharding Architecture

Previous versions of this document described a sharding architecture involving a larger and different set of node types (shard master data server, shard data server, shard master application server, shard query server). This namespace-level architecture remains in place as the transparent foundation of the new node-level architecture, and is fully compatible with it. In the node-level architecture, the cluster namespace (identically named across the cluster) provides transparent access to all sharded and nonsharded data and code on the cluster; the master namespace, now located on the first data node, still provides access to metadata, nonsharded data, and code, but is fully available to all data nodes. This arrangement provides a more uniform and straightforward model that is simpler and more convenient to deploy and use.

The %SYSTEM.Sharding API and the Sharding Configuration page of the Management Portal remain available for use in sharded cluster deployment based on the namespace-level architecture; see Deploying the Namespace-level Architecture for procedures.

Deploying the Sharded Cluster

This section provides procedures for deploying an InterSystems IRIS sharded cluster consisting of data nodes.

Note:

For an important discussion of performance planning, including memory management and scaling, CPU sizing and scaling, and other considerations, see the “Vertical Scaling” chapter of this guide.

HealthShare Health Connect does not support sharding.

There are several ways to deploy a sharded cluster, described in the sections that follow: using InterSystems Cloud Manager (ICM), using the %SYSTEM.Cluster API, or using CPF settings.

Note:

The most typical sharded cluster configuration involves one InterSystems IRIS instance per node. When deploying using ICM, this configuration is the only option. The provided procedure for using the Sharding API assumes this configuration as well.

InterSystems recommends the use of an LDAP server to implement centralized security across the nodes of a sharded cluster. For information about using LDAP with InterSystems IRIS, see the “Using LDAP” chapter of the Security Administration Guide.

If you are considering deploying compute nodes, the best approach is typically to evaluate the operation of your basic sharded cluster before deciding whether the cluster can benefit from their addition. Compute nodes can be easily added to an existing cluster by reprovisioning your ICM deployment, using the %SYSTEM.Cluster API, or configuring or deploying them using CPF settings. For more information on planning and adding compute nodes, see Plan Compute Nodes and Deploy Compute Nodes for Workload Separation and Increased Query Throughput.

A mirrored sharded cluster performs in exactly the same way as a nonmirrored cluster of the same number of data nodes, aside from the added failover capability of each node. If you are interested in deploying a mirrored sharded cluster, see Mirror Data Nodes for High Availability for procedures.

Regardless of the method you use to deploy the cluster, the first steps are to decide how many data nodes are to be included in the cluster and plan the sizes of the database caches and globals databases on those nodes, as described in the following sections.

Plan Data Nodes

Depending on the anticipated working set of the sharded data you intend to store on the cluster and the nature of the queries you will run against it, as few as four data nodes may be appropriate for your cluster. Since you can always add data nodes to an existing cluster and rebalance the sharded data (see Add Data Nodes and Rebalance Data), erring on the conservative side is reasonable.

A good basic method for an initial estimate of the ideal number of data nodes needed for a production configuration (subject to resource limitations) is to calculate the total amount of database cache needed for the cluster and then determine which combination of server count and memory per server is optimal in achieving that, given your circumstances and resource availability. This is not unlike the usual sizing process, except that it involves dividing the resources required across multiple systems. (For an important discussion of performance planning, including memory management and scaling, CPU sizing and scaling, and other considerations, see the “Vertical Scaling” chapter of this guide.)

The size of the database cache required starts with your estimation of the total amount of sharded data you anticipate storing on the cluster, and of the amount of nonsharded data on the cluster that will be frequently joined with sharded data. You can then use these totals to estimate the working sets for both sharded data and frequently joined nonsharded data, which added together represent the total database caching capacity needed for all the data nodes in the cluster. This calculation is detailed in Planning an InterSystems IRIS Sharded Cluster.

Considering all your options regarding both number of nodes and memory per node, you can then configure enough data nodes so that the database cache (global buffer pool) on each data node equals, or comes close to equalling, its share of that capacity. Under many scenarios, you will be able to roughly determine the number of data nodes to start with simply by dividing the total cache size required by the memory capacity of the systems you have available to deploy as cluster nodes.
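
For example, a rough sketch of this calculation, using purely illustrative numbers:

Estimated working set of sharded data:                      1,200 GB
Working set of frequently joined nonsharded data:             100 GB
Total database cache needed across the cluster:             1,300 GB

Memory per available server:                                  256 GB
Database cache allocation per server (for example, ~70%):    ~180 GB

Data nodes to start with:  1300 GB / 180 GB = 7.2, rounded up to 8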

All data nodes in a sharded cluster should have identical or at least closely comparable specifications and resources; parallel query processing is only as fast as the slowest data node. In addition, the configuration of all IRIS instances in the cluster should be consistent; database settings such as collation and those SQL settings configured at instance level (default date format, for example) should be the same on all nodes to ensure correct SQL query results. Standardized procedures and tools like ICM can help ensure this consistency.

The general recommended best practice is to load balance application connections across all of the data nodes in a cluster. ICM can automatically provision and configure a load balancer for the data nodes as needed when deploying in a public cloud; if deploying a sharded cluster by other means, a load balancing mechanism is required.

Estimate the Database Cache and Database Sizes

Before deploying your sharded cluster, determine the size of the database cache to be allocated on each data node. It is also useful to know the expected size of the data volume needed for the default globals database on each data node, so you can ensure that there is enough free space for expected growth.

When you deploy a sharded cluster using ICM, you can specify these settings by including properties in the configuration files. When you deploy manually using the Sharding API, you can specify database cache size before configuring the sharded cluster, and specify database settings in your calls. Both deployment methods provide default settings.

Bear in mind that the sizes below are guidelines, not requirements, and that your estimates for these numbers are likely to be adjusted in practice.

Database Cache Sizes

As described in Planning an InterSystems IRIS Sharded Cluster, the amount of memory that should ideally be allocated to the database cache on a data node is that node’s share of the total of the expected sharded data working set, plus the overall expected working set of nonsharded data frequently joined to sharded data.

Important:

Because the default setting is not appropriate for production use, database cache allocation is required after deployment, regardless of configuration.

Globals Database Sizes

As described in Planning an InterSystems IRIS Sharded Cluster, the target sizes of the default globals databases are as follows:

  • For the cluster namespace — Each server’s share of the total size of the sharded data, according to the calculation described in that section, plus a margin for greater than expected growth.

  • For the master namespace on node 1 — The total size of nonsharded data, plus a margin for greater than expected growth.

All deployment methods configure the sizes of these databases by default, so there is no need for you to do so. However, you should ensure that the storage on which these databases are located can accommodate their target sizes.

Deploy the Cluster Using InterSystems Cloud Manager

There are several stages involved in provisioning and deploying a containerized InterSystems IRIS configuration, including a sharded cluster, with ICM. The ICM Guide provides complete documentation of ICM, including details of each of the stages. This section briefly reviews the stages and provides links to the ICM Guide.

Launch ICM

ICM is provided as a Docker image. With the exception of InterSystems IRIS licenses and security-related files as described, everything required by ICM to carry out its provisioning, deployment, and management tasks is included in the ICM container, including a /Samples directory that provides you with samples of the elements required by ICM, customized to the supported cloud providers. To launch ICM, on a system on which Docker is installed, you use the docker run command with the ICM image from the InterSystems repository to start the ICM container.

For detailed information about launching ICM, see Launch ICM in the “Using ICM” chapter of the ICM Guide.

Obtain Security-Related Files

Before defining your deployment, you must obtain security-related files including cloud provider credentials and keys for SSH and TLS. For more information about these files and how to obtain them, see Obtain Security-Related Files in the “Using ICM” chapter.

Define the Deployment

ICM uses JSON files as input. To provide the needed parameters to ICM, you must represent your target configuration and the platform on which it is to be deployed in two of ICM’s JSON configuration files: the defaults.json file, which contains information about the entire deployment, and the definitions.json file, which contains information about the types and numbers of the nodes provisioned and deployed by ICM, as well as details specific to each node type. For example, the defaults file determines which cloud provider your sharded cluster nodes are provisioned on and the locations of the required security files and InterSystems IRIS license keys, while the definitions file determines how many data nodes and compute nodes are included in the sharded cluster and the specifications of their hosts. Most ICM parameters have defaults; a limited number of parameters can be specified on the ICM command line as well as in the configuration file.

For sample defaults and definitions files for sharded cluster deployment, see Define the Deployment in the “Using ICM” chapter of the ICM Guide. You can create your files by adapting the template defaults.json and definitions.json files provided with ICM in the /Samples directory (for example, /Samples/AWS for AWS deployments), or start with the contents of the samples provided in the documentation. For a complete list of the fields you can include in these files, see ICM Configuration Parameters in the “ICM Reference” chapter of the ICM Guide.

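For orientation, a minimal definitions.json entry for the data nodes of a basic AWS-based sharded cluster might look like the following sketch; the count, license key file name, instance type, volume size, and merge file path are illustrative placeholders, and the cache- and storage-related fields are discussed later in this section:

[
    {
        "Role": "DATA",
        "Count": "4",
        "LicenseKey": "iris-sharding.key",
        "InstanceType": "m5.2xlarge",
        "DataVolumeSize": "250",
        "UserCPF": "/Samples/cpf/iris.cpf"
    }
]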

Note:

All InterSystems IRIS instances in a sharded cluster must have sharding licenses.

ICM includes the node types DATA and COMPUTE for provisioning and deploying a cluster’s data and compute nodes along with the AR node type for arbiters for mirrored clusters, as well as such types as WS (web server) and LB (load balancer) for associated systems. Node types representing nodes in the namespace-level architecture are also included. For detailed descriptions of the node types (for use in the Role field in the definitions file) that ICM can provision, configure, and deploy services on, see ICM Node Types in the “ICM Reference” chapter of the ICM Guide.

When creating your configuration files, bear in mind that they must represent not only the number of data nodes you want to include but their database cache and database sizes, which you determined in Estimate the Database Cache and Database Sizes. This can be accomplished as follows:

  • Database cache size — Every InterSystems IRIS instance, including those running in the containers deployed by ICM, is installed with a predetermined set of configuration settings, recorded in its configuration parameters file (CPF). The UserCPF field specifies a CPF merge file containing one or more of these configuration settings; you can customize all of the InterSystems IRIS instances you deploy by including it in your defaults file. You can also customize settings for a specific node type by including it in the node definition. For example, to allocate a database cache of 150 GB of 8-kilobyte blocks on each data node, you would customize the value of the globals CPF setting by including UserCPF in the DATA node definition to specify a CPF merge file containing the following:

    [config]
    globals=0,0,150000,0,0,0

    A sample CPF merge file, /Samples/cpf/iris.cpf, is included in the ICM container, and all of the sample defaults.json files contain the UserCPF property, specifying this file. For information about using a merge file to override CPF settings, see Deploying with Customized InterSystems IRIS Configurations in the “ICM Reference” chapter of the ICM Guide.

    Of course, the cloud nodes you provision as data nodes must have sufficient memory to accommodate the target database cache size. The instance type you specify in the defaults file, or in the DATA node definition in the definitions file, determines the characteristics of the provisioned cloud nodes that become data nodes, including memory. The name of this field is different for each cloud provider; the equivalent fields for the four cloud providers are shown in the table that follows (see Provider-Specific Parameters in the ICM Guide for more information).

    Provider   Field          For information, see
    AWS        InstanceType   Amazon EC2 Instance Types in the AWS documentation
    GCP        MachineType    Machine types in the GCP documentation
    Azure      Size           Sizes for virtual machines in Azure in the Azure documentation
    Tencent    InstanceType   Instance Types in the Tencent documentation
    vSphere    Memory         Provider-Specific Parameters in the ICM Guide
    Important:

    The larger a cloud instance type or storage volume is, the more it costs. It is therefore advisable to size as accurately as possible, without wasting capacity.

  • Database size — The DataVolumeSize property (see General Parameters in the “ICM Reference” chapter of the ICM Guide) determines the size of the deployed InterSystems IRIS instance’s storage volume for data, which is where the default globals databases for the master and shard namespaces are located. This setting must be large enough to accommodate the target size of the default globals database, as described in Estimate the Database Cache and Database Sizes. In the case of AWS and GCP, this setting is limited by the field DataVolumeType (see Provider-Specific Parameters in the ICM Guide).

    Note:

    In some cases, it may be advisable to increase the size of the generic memory heap on the cluster members, which can be done by using a CPF merge file to override the gmheap setting. For information about allocating memory to the generic memory heap, see gmheap in the Configuration Parameter File Reference.
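
    For example, such a merge file might contain only a gmheap override like the one below; the value shown matches the example used later in this document for systems with more than 64 GB of memory:

    [config]
    gmheap=393216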

It is important that you determine the values for the database cache and globals database sizes based on your particular situation, and include them as needed. In general, appropriate sizing of the data nodes in a sharded cluster and configuration of the InterSystems IRIS instances on them is a complex matter influenced by many factors; as your experience accumulates, you are likely to modify some of these settings.

Provision the Infrastructure

When your definitions files are complete, begin the provisioning phase by issuing the command icm provision on the ICM command line. This command allocates and configures the nodes specified in the definitions file. At completion, ICM also provides a summary of the nodes and associated components that have been provisioned, and outputs a command line which can be used to delete the infrastructure at a later date, for example:

Machine              IP Address      DNS Name                      
-------              ---------       -------                      
ACME-DATA-TEST-0001  00.53.183.209   ec2-00-53-183-209.us-west-1.compute.amazonaws.com
ACME-DATA-TEST-0002  00.56.59.42     ec2-00-56-59-42.us-west-1.compute.amazonaws.com
ACME-DATA-TEST-0003  00.67.1.11      ec2-00-67-1-11.us-west-1.compute.amazonaws.com
ACME-DATA-TEST-0004  00.193.117.217  ec2-00-193-117-217.us-west-1.compute.amazonaws.com
ACME-LB-TEST-0000    (virtual DATA)  ACME-LB-TEST-1546467861.amazonaws.com
To destroy: icm unprovision [-cleanUp] [-force]

Once your infrastructure is provisioned, you can use several infrastructure management commands. For detailed information about these and the icm provision command, including reprovisioning an existing configuration to scale out or in or to modify the nodes, see Provision the Infrastructure in the “Using ICM” chapter of the ICM Guide.

Deploy and Manage Services

ICM carries out deployment of InterSystems IRIS and other software services using Docker images, which it runs as containers by making calls to Docker. In addition to Docker, ICM also carries out some InterSystems IRIS-specific configuration over JDBC. There are many container management tools available that can be used to extend ICM’s deployment and management capabilities.

The icm run command downloads, creates, and starts the specified container on the provisioned nodes. The icm run command has a number of useful options, and also lets you specify Docker options to be included, so the command line can take many forms depending on your needs. Here are just two examples:

  • When deploying InterSystems IRIS images, you must set the password for the predefined accounts on the deployed instances. The simplest way to do this is to omit a password specification from both the definitions files and the command line, which causes ICM to prompt you for the password (with typing masked) when you execute icm run. But this may not be possible in some situations, such as when running ICM commands with a script, in which case you need either the -iscPassword command line option or the iscPassword field in the defaults file.

  • You can deploy different containers on different nodes — for example, InterSystems IRIS on the DM and AM nodes and the InterSystems Web Gateway on the WS nodes — by specifying different values for the DockerImage field (such as intersystems/iris:stable and intersystems/webgateway:stable) in the different node definitions in the definitions.json file. To deploy multiple containers on a node or nodes, however, you can run the icm run command more than once — the first time to deploy the image(s) specified by the DockerImage field, and subsequent times using the -image and -container options (and possibly the -role or -machine option) to deploy a custom container.
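
    For instance, building on the second example above (the image and container names are placeholders, and the options are those described in that example), a first run followed by a run that adds a custom container to the data nodes might look like:

    icm run
    icm run -container customsensors -image acme/sensors:1.0 -role DATA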

For a full discussion of the use of the icm run command, including redeploying services on an existing configuration, see The icm run Command in the “Using ICM” chapter of the ICM Guide.

At deployment completion, ICM sends a link to the appropriate node’s Management Portal, for example:

Management Portal available at: http://ec2-00-153-49-109.us-west-1.compute.amazonaws.com:52773/csp/sys/UtilHome.csp 

In the case of a sharded cluster, the provided link is for node 1.

Once your containers are deployed, you can use a number of ICM commands to manage the deployed containers and interact with the containers and the InterSystems IRIS instances and other services running inside them; for more information, see Container Management Commands and Service Management Commands in the “Using ICM” chapter of the ICM Guide.

Unprovision the Infrastructure

Because public cloud platform instances continually generate charges and unused instances in private clouds consume resources to no purpose, it is important to unprovision infrastructure in a timely manner. The icm unprovision command deallocates the provisioned infrastructure based on the state files created during provisioning. As described in Provision the Infrastructure, the needed command line is provided when the provisioning phase is complete, and is also contained in the ICM log file, for example:

To destroy: icm unprovision [-cleanUp] [-force]

For more detailed information about the unprovisioning phase, see Unprovision the Infrastructure in the “Using ICM” chapter of the ICM Guide.

Deploy the Cluster Using the %SYSTEM.Cluster API

Use the following procedure to deploy a basic InterSystems IRIS sharded cluster of data nodes using the %SYSTEM.Cluster API. You will probably find it useful to refer to the %SYSTEM.Cluster class documentation in the InterSystems Class Reference.

Note:

As with all classes in the %SYSTEM package, the %SYSTEM.Cluster methods are available through $SYSTEM.Cluster.

This procedure assumes you are deploying the InterSystems IRIS instances on the nodes before you begin configuring the cluster, but can be adapted for use with existing instances.

Note:

This procedure does not cover the deployment of mirrored data nodes; that procedure is provided in Mirror for High Availability. Similarly, see Deploy Compute Nodes for information about using the API to add compute nodes to a basic cluster.

Provision or Identify the Infrastructure

Provision or identify the needed number of networked host systems (physical, virtual, or cloud) — one host for each data node.

This procedure assumes that the data nodes are mutually accessible through TCP/IP. A minimum network bandwidth of 1 Gbps between all nodes is recommended, but 10 Gbps or more is preferred, if available; greater network throughput increases the performance of the sharded cluster.

Deploy InterSystems IRIS on the Data Nodes

This procedure assumes that each system hosts or will host a single InterSystems IRIS instance. All data nodes in a sharded cluster should have identical or at least closely comparable specifications and resources. (The same is true of compute nodes, although storage is not a consideration in their case.)

Important:

All InterSystems IRIS instances in a sharded cluster must be of the same version, and all must have sharding licenses.

All instances should have their database directories and journal directories located on separate storage devices, if possible. This is particularly important when high volume data ingestion is concurrent with running queries. For guidelines for file system and storage configuration, including journal storage, see “File System Recommendations” and “Storage Recommendations” in the “Preparing to Install” chapter of the Installation Guide and Journaling Best Practices in the “Journaling” chapter of the Data Integrity Guide.

On each host system, do the following:

  1. Deploy an instance of InterSystems IRIS, either by creating a container from an InterSystems-provided image (as described in Running InterSystems Products in Containers) or by installing InterSystems IRIS from a kit (as described in the Installation Guide).

  2. Ensure that the storage device hosting the instance’s databases is large enough to accommodate the target globals database size, as described in Estimate the Database Cache and Database Sizes.

  3. Allocate the database cache (global buffer pool) for the instance according to the size you determined in Estimate the Database Cache and Database Sizes. For the Management Portal procedure for allocating the database cache, see Memory and Startup Settings in the “Configuring InterSystems IRIS” chapter of the System Administration Guide; you can also allocate the cache using the globals parameter, either by editing the instance’s configuration parameter file (CPF) or, on UNIX® and Linux platforms, deploying the instance with the desired value using a CPF merge file.

    Important:

    The default automatic database cache setting is not appropriate for sharded cluster configurations.

    In some cases, it may also be advisable to increase the size of the generic memory heap on the cluster members. The generic memory heap can be configured either using the Management Portal, as described for the gmheap parameter in the Configuration Parameter File Reference, or in the instance’s CPF file using gmheap, as described above for globals.

    Note:

    For general guidelines for estimating the memory required for an InterSystems IRIS instance’s routine and database caches as well as the generic memory heap, see Calculating Initial Memory Requirements in the “Vertical Scaling” chapter.

  4. Ensure that the instance has both the MaxServerConn and MaxServers parameters set to values at least as great as the planned number of nodes in the sharded cluster. These values can be viewed and changed using the Maximum number of application servers and Maximum number of data servers settings on the ECP Settings page of the Management Portal (System Administration > Configuration > Connectivity > ECP Settings), or by editing the instance’s configuration parameter file (CPF). On UNIX® and Linux platforms, the instances can be deployed or installed with the desired values using a CPF merge file.
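
For example, on UNIX® and Linux platforms, the database cache, generic memory heap, and ECP settings from steps 3 and 4 could be applied at deployment time with a CPF merge file such as the following; the values shown are illustrative and correspond to the examples used elsewhere in this document:

[config]
globals=0,0,204800,0,0,0
gmheap=393216
MaxServerConn=64
MaxServers=64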

Important:

If you change any of the parameters described in the above procedure by editing an instance’s CPF, you must restart the instance after doing so.

Configure the Data Nodes

For each instance in the cluster and its host, perform the steps described for its role within the cluster.

Note:

Remember that the calls described in this procedure do not work with mirrored nodes; for information on deploying mirrored data nodes, see Mirror for High Availability.

Configure Node 1

A sharded cluster is initialized when you configure the first data node, which is referred to as data node 1 (or just node 1). This data node differs from the others in that it stores the cluster’s nonsharded data, metadata, and code, and hosts the master namespace that provides all of the data nodes with access to that data. This distinction is completely transparent to the user except for the fact that more data is stored on the first data node, a difference that is typically small.

To configure node 1, open the InterSystems Terminal for the instance and call the $SYSTEM.Cluster.Initialize() method, for example:

set status = $SYSTEM.Cluster.Initialize()
Note:

To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:

zw status

Reviewing status after each call is a good general practice, as a call might fail silently under some circumstances. If a call does not succeed (status is not 1), display the user-friendly error message by entering:

do $SYSTEM.Status.DisplayError(status) 

The Initialize() call creates the master and cluster namespaces (IRISDM and IRISCLUSTER, respectively) and their default globals databases, and adds the needed mappings. Node 1 serves as a template for the rest of the cluster; the name of the cluster namespace, the characteristics of its default globals database (also called the shard database), and its mappings are directly replicated on the second data node you configure, and then directly or indirectly on all other data nodes. The SQL configuration settings of the instance are replicated as well.

To control the names of the cluster and master namespaces and the characteristics of their globals databases, you can specify existing namespaces as the cluster namespace, master namespace, or both by including one or both names as arguments. For example:

set status = $SYSTEM.Cluster.Initialize("CLUSTER","MASTER",,)

When you do this, the existing default globals database of each namespace you specify remains in place. This allows you to control the characteristics of the shard database, which are then replicated on other data nodes in the cluster.

By default, any host can become a cluster node; the third argument to Initialize() lets you specify which hosts can join the cluster by providing a comma-separated list of IP addresses or hostnames. Any node not in the list cannot join the cluster.

In some cases, the hostname known to InterSystems IRIS does not resolve to an appropriate address, or no hostname is available. If for this or any other reason, you want other cluster nodes to communicate with this node using its IP address instead, include the IP address as the fourth argument. (You cannot supply a hostname as this argument, only an IP address.) In either case, you will use the host identifier (hostname or IP address) to identify node 1 when configuring the second data node; you will also need the superserver (TCP) port of the instance.
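
For example, the following call (with placeholder addresses) initializes node 1 with the default namespace names, limits cluster membership to three listed hosts, and instructs other cluster nodes to communicate with this node using the specified IP address:

set status = $SYSTEM.Cluster.Initialize(,,"10.0.0.1,10.0.0.2,10.0.0.3","10.0.0.1")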

Note:

From the perspective of another node (which is what you need in this procedure), the superserver port of a containerized InterSystems IRIS instance depends on which host port the superserver port was published or exposed as when the container was created. For details on and examples of this, see Running an InterSystems IRIS Container with Durable %SYS and Running an InterSystems IRIS Container: Docker Compose Example in Running InterSystems Products in Containers and Container networking in the Docker documentation.

The default superserver port number of a kit-installed InterSystems IRIS instance that is the only such on its host is 51773. To see or set the instance’s superserver port number, select System Administration > Configuration > System Configuration > Memory and Startup in the instance’s Management Portal. (For information about opening the Management Portal for the instance, see InterSystems IRIS Connection Information in InterSystems IRIS Basics: Connecting an IDE.)

The Initialize() method returns an error if the InterSystems IRIS instance is already a node in a sharded cluster, or is a mirror member.

Configure the Remaining Data Nodes

To configure each additional data node, open the Terminal for the InterSystems IRIS instance and call the $SYSTEM.Cluster.AttachAsDataNode() method, specifying the hostname of an existing cluster node (node 1, if you are configuring the second node) and the superserver port of its InterSystems IRIS instance, for example:

set status = $SYSTEM.Cluster.AttachAsDataNode("IRIS://datanode1:51773")

If you supplied an IP address as the fourth argument to Initialize() when initializing node 1, use the IP address instead of the hostname to identify node 1 in the first argument, for example:

set status = $SYSTEM.Cluster.AttachAsDataNode("IRIS://100.00.0.01:51773")
Note:

For important information about determining the correct superserver port to specify, see the previous step, Configure Node 1.

The AttachAsDataNode() call does the following:

  • Creates the cluster namespace and shard database, configuring them to match the settings on the template node (specified in the first argument), as described in Configure Node 1, and creating the needed mappings, including those to the globals and routines databases of the master namespace on node 1 (including any user-defined mappings).

  • Sets all SQL configuration options to match the template node.

  • Because this node may later be used as the template node for AttachAsDataNode(), sets the list of hosts eligible to join the cluster to those you specified (if any) in the Initialize() call on node 1.

Note:

If a namespace of the same name as the cluster namespace on the template node exists on the new data node, it and its globals database are used as the cluster namespace and shard database, and only the mappings are replicated. If the new node is subsequently used as the template node, the characteristics of these existing elements are replicated.

The AttachAsDataNode() call returns an error if the InterSystems IRIS instance is already a node in a sharded cluster or is a mirror member, or if the template node specified in the first argument is a mirror member.

As noted in the previous step, the hostname known to InterSystems IRIS may not resolve to an appropriate address, or no hostname is available. To have other cluster nodes communicate with this node using its IP address instead, include the IP address as the second argument. (You cannot supply a hostname as this argument, only an IP address.)
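
For example, the following call (the hostname, port, and IP address are placeholders) attaches a data node using node 1 as the template and tells other cluster nodes to reach the new node at the specified address:

set status = $SYSTEM.Cluster.AttachAsDataNode("IRIS://datanode1:51773","10.0.0.3")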

When you have configured all of the data nodes, you can call the $SYSTEM.Cluster.ListNodes() method to list them, for example:

set status = $system.Cluster.ListNodes()
NodeId  NodeType    Host          Port
1       Data        datanode1     51773
2       Data        datanode2     51773
3       Data        datanode3     51773

As shown, data nodes are assigned numeric IDs representing the order in which they are attached to the cluster.

The recommended best practice is to load balance application connections across all of the data nodes in a cluster.

For information about adding compute nodes to your cluster, see Deploy Compute Nodes for Workload Separation and Increased Query Throughput.

Configure or Deploy the Cluster Using CPF Settings

Every InterSystems IRIS instance is installed and operates with a file in the installation directory named iris.cpf, which contains most of its configuration settings. The instance reads this configuration parameter file, or CPF, at startup to obtain the values for these settings. One of the tasks you can accomplish by modifying the settings in an instance’s CPF is to configure it as a member of a sharded cluster.

There are two ways to use CPF settings to create a sharded cluster, as follows:

  • Configure existing instances by manually modifying their CPFs

    Assuming they are installed on an appropriately networked group of hosts, you can configure a group of existing instances as a sharded cluster by modifying their iris.cpf files to set the needed parameters, and then restarting them so that the new settings can take effect. As described in Configure Node 1 in “Deploy the Cluster Using the %SYSTEM.Cluster API”, node 1 must be separately configured before other data nodes, so you must modify the node 1 instance’s CPF and restart it before the remaining data nodes.

  • Deploy new instances with customized CPFs

    On UNIX® and Linux platforms, you can deploy a cluster by customizing the CPF of each instance before it starts and reads its settings for the first time by using the CPF merge feature. You do this by using the ISC_CPF_MERGE_FILE environment variable to specify a separate file containing one or more settings to be merged into the CPF with which a new instance is installed or deployed; this allows you to deploy multiple instances with differing CPFs from the same source without having to manually modify each CPF individually, supporting automated deployment and a DevOps approach. Use of this feature is described in the documentation on configuration merge; a brief containerized sketch follows this list.

    Because the instance with the CPF merge file configuring it as data node 1 must be running before the other data nodes can be configured, you must ensure that this instance is deployed and successfully started before other instances are deployed as the remaining data nodes.
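
    For example, for a containerized data node on UNIX® or Linux, one way to apply a merge file is to mount it into the container and name it in the ISC_CPF_MERGE_FILE environment variable when the container is created; the image name, container name, paths, and port mapping below are placeholders, and other options (such as those for durable %SYS and licensing) are omitted for brevity:

    docker run --name iris-node1 --detach \
      --publish 51773:51773 \
      --volume /data/node1/config:/external \
      --env ISC_CPF_MERGE_FILE=/external/data-node-1.cpf \
      intersystems/iris:stable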

The procedure provided here can be used either to configure existing instances as a cluster or deploy new instances. The CPF settings named in the following steps are linked to their respective entries in the Configuration Parameter File Reference. The steps involved are as follows:

  1. Provision or identify the infrastructure

  2. Configure or deploy data node 1

  3. Configure or deploy the remaining data nodes

If the names of the hosts of the cluster nodes match or will match a predictable pattern, you can also configure or deploy the cluster using a hostname pattern.

Note:

This procedure does not cover the deployment of mirrored data nodes; instructions for this are provided in Mirror for High Availability. Similarly, see Deploy Compute Nodes for information about using CPF settings to add compute nodes to a basic cluster.

Provision or Identify the Infrastructure

Provision or identify the needed number of networked host systems (physical, virtual, or cloud) — one host for each data node. This procedure assumes that the data nodes meet the following requirements:

  • They each host or will host a single InterSystems IRIS instance with a sharding-enabled license. All instances must be of the same version.

  • They are mutually accessible through TCP/IP. A minimum network bandwidth of 1 Gbps between all nodes is recommended, but 10 Gbps or more is preferred, if available; greater network throughput increases the performance of the sharded cluster.

Configure or Deploy Data Node 1

Data node 1 must be configured or deployed first. The following table includes the CPF settings required, to be modified for an existing instance or included in the CPF merge file for an instance you are deploying. Once the instance has been modified and restarted or deployed, you can verify that the settings are as desired by viewing the iris.cpf file.

CPF settings for data node 1:

  • [Startup] ShardRole: Determines the node’s role in the cluster. Value for data node 1: NODE1.

  • [config] MaxServerConn: Sets the maximum number of concurrent connections from ECP clients that an ECP server can accept. This setting and MaxServers must each be equal to or greater than the number of nodes in the cluster; optionally set them higher than the currently planned number of nodes, to allow for adding nodes later without having to modify them.

  • [config] MaxServers: Sets the maximum number of concurrent connections to ECP servers that an ECP client can maintain.

  • [config] globals: Allocates shared memory to the database cache for 8-kilobyte, 16-kilobyte, 32-kilobyte, and 64-kilobyte buffers. Specify the target cache size you determined for the data nodes, as described in Estimate the Database Cache and Database Sizes. For example, to allocate a 200 GB database cache in 8-kilobyte buffers only, the value would be 0,0,204800,0,0,0.

  • [config] gmheap: Optionally configures the size of the generic memory heap. You probably need to increase this setting from the default of 37.5 MB to optimize the performance of the data nodes. For information about sizing the generic memory heap, see Calculating Initial Memory Requirements in the “Vertically Scaling InterSystems IRIS” chapter.
Note:

The default automatic database cache (globals) setting is not appropriate for sharded cluster configurations.

Configure an existing instance as data node 1

To configure an existing instance as node 1, edit the install-dir/iris.cpf file to insert the [Startup] setting and update the [config] settings as shown below, then restart the instance.

[ConfigFile]
Product=IRIS
Version=2020.1

[Databases]
...
[Startup] (insert the following setting)
ShardRole=NODE1
...
[config] (modify the following settings)
...
MaxServerConn=64
MaxServers=64
...
globals=0,0,204800,0,0,0
gmheap=393216
Note:

The globals setting of 0,0,204800,0,0,0 in the example allocates a 200 GB database cache of 8-kilobyte buffers.

The gmheap setting of 393216 KB (384 MB) in the example represents the minimum recommended generic memory heap size for systems with more than 64 GB of memory.

Deploy an instance as data node 1

To deploy node 1 using the CPF merge feature, prepare and specify a CPF merge file with the following contents; the merge file contains only the settings that will be changed in the default CPF before the instance’s initial startup. (The values shown for the [config] settings are examples.)

[Startup]
ShardRole=NODE1
[config]
MaxServerConn=64
MaxServers=64
globals=0,0,204800,0,0,0
gmheap=393216

Configure or Deploy the Remaining Data Nodes

Once node 1 has been a) configured and restarted or b) deployed and successfully started, the other data node instances can be configured and restarted or deployed in any order. For each of the remaining data nodes, use the settings indicated in Configure or Deploy Data Node 1, with two modifications:

  • Set ShardRole to DATA rather than NODE1.

  • Add ShardClusterURL to the [Startup] section, set to the cluster URL of node 1.

The following table and CPF excerpt explain and illustrate these settings:

CPF settings for remaining data nodes:

  • [Startup] ShardRole (CHANGE): Determines the node’s role in the cluster. Value for remaining data nodes: DATA.

  • [Startup] ShardClusterURL (ADD): Identifies the existing node (typically node 1) to use as a template when adding a data node to the cluster, as described in Configure the Remaining Data Nodes in the %SYSTEM.Cluster API procedure. Value: the hostname and superserver port of node 1 in the form IRIS://host:port.

  • [config] MaxServerConn: Sets the maximum number of concurrent connections from ECP clients that an ECP server can accept. This setting and MaxServers must each be equal to or greater than the number of nodes in the cluster; optionally set them higher than the currently planned number of nodes, to allow for adding nodes later without having to modify them.

  • [config] MaxServers: Sets the maximum number of concurrent connections to ECP servers that an ECP client can maintain.

  • [config] globals: Allocates shared memory to the database cache for 8-kilobyte, 16-kilobyte, 32-kilobyte, and 64-kilobyte buffers. Specify the target cache size you determined for the data nodes, as described in Estimate the Database Cache and Database Sizes. For example, to allocate a 200 GB database cache in 8-kilobyte buffers only, the value would be 0,0,204800,0,0,0.

  • [config] gmheap: Optionally configures the size of the generic memory heap. The default is 37.5 MB; you probably need to increase this setting to optimize the performance of the data nodes. For information about sizing the generic memory heap, see Calculating Initial Memory Requirements in the “Vertically Scaling InterSystems IRIS” chapter.

[Startup]
ShardRole=DATA
ShardClusterURL=IRIS://datanode1:51773
[config]
MaxServerConn=64
MaxServers=64
globals=0,0,204800,0,0,0
gmheap=393216

Configure the remaining existing data node instances by updating the indicated settings in their iris.cpf files and restarting them, or deploy the remaining data node instances using a CPF merge file (as illustrated above) to customize the settings. Once the instance has been deployed or modified and restarted, you can verify that the settings are as desired by viewing the iris.cpf file.

Configure or Deploy the Cluster Using a Hostname Pattern

If the names of the existing hosts you want to configure as a sharded cluster match a predictable pattern, or the new hosts you are provisioning for the cluster will do so, you can determine the cluster roles of the instances you configure or deploy based on those names. This approach allows you to use the same CPF modifications or CPF merge file for all instances, although they must still be configured in the right order. To do this, on all nodes, use the settings indicated in Configure or Deploy Data Node 1, with the modifications explained and illustrated in the following table and CPF excerpt:

CPF settings when using a hostname pattern:

  • [Startup] ShardRole (CHANGE): Determines the node’s role in the cluster. When set to AUTO, the node to configure as node 1 is identified by matching its hostname to the regular expression specified by ShardMasterRegexp, while the others are configured as additional data nodes. Cannot be set to AUTO unless ShardMasterRegexp is specified. Value: AUTO.

  • [Startup] ShardMasterRegexp (ADD): When ShardRole=AUTO, identifies the instance to configure as node 1 by matching the name of the instance’s host to the regular expression specified as its value. For example, if you specify ShardRole=AUTO and ShardMasterRegexp=-0$ for instances on hosts named data-0, data-1, and data-2, data-0 is configured as node 1 and the others as additional data nodes. If no hostname matches the specified regular expression, or more than one does, none of the nodes are configured as data nodes. Value: a regular expression matching the hostname of the intended node 1.

  • [Startup] ShardRegexp (ADD): When ShardRole=AUTO, validates that the hostname of the instance matches the regular expression provided. For example, if ShardRole=AUTO, ShardMasterRegexp=-0$, and ShardRegexp=-[0-9]+$ are specified for instances on hosts named data-0, data-11, data-22, and data-2b, the instances are configured as follows:
      • data-0 becomes node 1 (it matches ShardMasterRegexp)
      • data-11 and data-22 become additional data nodes (they don’t match ShardMasterRegexp but do match ShardRegexp)
      • data-2b does not join the cluster (it matches neither regular expression)
    Value: a regular expression matching cluster node hostnames.

  • [config] MaxServerConn: Sets the maximum number of concurrent connections from ECP clients that an ECP server can accept. This setting and MaxServers must each be equal to or greater than the number of nodes in the cluster; optionally set them higher than the currently planned number of nodes, to allow for adding nodes later without having to modify them.

  • [config] MaxServers: Sets the maximum number of concurrent connections to ECP servers that an ECP client can maintain.

  • [config] globals: Allocates shared memory to the database cache for 8-kilobyte, 16-kilobyte, 32-kilobyte, and 64-kilobyte buffers. Specify the target cache size you determined for the data nodes, as described in Estimate the Database Cache and Database Sizes. For example, to allocate a 200 GB database cache in 8-kilobyte buffers only, the value would be 0,0,204800,0,0,0.

  • [config] gmheap: Optionally configures the size of the generic memory heap. You probably need to increase this setting from the default of 37.5 MB to optimize the performance of the data nodes. For information about sizing the generic memory heap, see Calculating Initial Memory Requirements in the “Vertically Scaling InterSystems IRIS” chapter.

[Startup]
ShardRole=AUTO
ShardMasterRegexp=-0$
ShardRegexp=-[0-9]+$
[config]
MaxServerConn=64
MaxServers=64
globals=0,0,204800,0,0,0
gmheap=393216

Configure and restart node 1 or deploy it with a CPF merge file as illustrated, confirm that it is running, then configure the remaining existing data node instances by updating the indicated settings in their iris.cpf files and restarting, or deploy the remaining data node instances using the same CPF merge file. At any stage you can optionally verify an instance’s settings by viewing its iris.cpf file.

Creating Sharded Tables and Loading Data

Once the cluster is fully configured, you can plan and create the sharded tables and load data into them. The steps involved are described in the following sections.

Evaluate Existing Tables for Sharding

Although the ratio of sharded to nonsharded data on a cluster is typically high, when planning the migration of an existing schema to a sharded cluster it is worth remembering that not every table is a good candidate for sharding. In deciding which of your application’s tables to define as sharded tables and which to define as nonsharded tables, your primary considerations should be improving query performance and/or the rate of data ingestion, based on the following factors (which are also discussed in Evaluating the Benefits of Sharding). As you plan, remember that the distinction between sharded and nonsharded tables is totally transparent to the application SQL; like index selection, sharding decisions have implications for performance only.

  • Overall size — All other things being equal, the larger the table, the greater the potential gain.

  • Data ingestion — Does the table receive frequent and/or large INSERT statements? Parallel data loading means sharding can improve their performance.

  • Query volume — Which tables are queried most frequently on an ongoing basis? Again, all other things being equal, the higher the query volume, the greater the potential performance improvement.

  • Query type — Among the larger tables with higher query volume, those that frequently receive queries that read a lot of data (especially with a high ratio of data read to results returned) or do a lot of computing work are excellent candidates for sharding. For example, is the table frequently scanned by broad SELECT statements? Does it receive many queries involving aggregate functions?

Having identified some good candidates for sharding, review the following considerations:

  • Frequent joins — As discussed in Choose a Shard Key, tables that are frequently joined can be sharded with equivalent shard keys to enable cosharded joins, so that joining can be performed locally on individual shards, enhancing performance. Review each frequently-used query that joins two large tables with an equality condition to evaluate whether it represents an opportunity for a cosharded join. If the queries that would benefit from cosharding the tables represent a sizeable portion of your overall query workload, these joined tables are good candidates for sharding.

    However, when a large table is frequently joined to a much smaller one, sharding the large one and making the small one nonsharded may be most effective. Careful analysis of the frequency and query context of particular joins can be very helpful in choosing which tables to shard.

  • Unique constraints — A unique constraint on a sharded table can have a significant negative impact on insert/update performance unless the shard key is a subset of the unique key; see Choose a Shard Key for more information.

Important:

Regardless of other factors, tables that are involved in complex transactions requiring atomicity should never be sharded.

Create Sharded Tables

Sharded tables (as well as nonsharded tables) can be created in the cluster namespace on any node, using a SQL CREATE TABLE statement containing a sharding specification, which indicates that the table is to be sharded and with what shard key — the field or fields that determine which rows of a sharded table are stored on which shards. Once the table is created with the appropriate shard key, which provides a deterministic method of evenly distributing the table’s rows across the shards, you can load data into it using INSERT and dedicated tools.

Choose a Shard Key

By default, when you create a sharded table and do not specify a shard key, data is loaded into it using the system-assigned RowID as the shard key; for example, with two shards, the row with RowID=1 would go on one shard and the one with RowID=2 would go on the other, and so on. This is called a system-assigned shard key, or SASK, and is often the simplest and most effective approach because it offers the best guarantee of an even distribution of data and allows the most efficient parallel data loading.

Note:

By default, the RowID field is named ID and is assigned to column 1. If a user-defined field named ID is added, the RowID field is renamed to ID1 when the table is compiled, and it is the user-defined ID field that is used by default when you shard without specifying a key.

You also have the option of specifying one or more fields as the shard key when you create a sharded table; this is called a user-defined shard key, or UDSK. You might have good opportunities to use UDSKs if your schema includes semantically meaningful unique identifiers that do not correspond to the RowID, for example when several tables in a schema contain an accountnumber field.

An additional consideration concerns queries that join large tables. Every sharded query is decomposed into shard-local queries, each of which is run independently and locally on its shard and needs to see only the data that resides on that shard. When the sharded query involves one or more joins, however, the shard-local queries typically need to see data from other shards, which requires more processing time and uses more of the memory allocated to the database cache. This extra overhead can be avoided by enabling a cosharded join, in which the rows from the two tables that will be joined are placed on the same shard. When a join is cosharded, a query involving that join is decomposed into shard-local queries that join only rows on the same shard and thus run independently and locally, as with any other sharded query.

You can enable a cosharded join using one of two approaches:

  • Specify equivalent UDSKs for two tables.

  • Use a SASK for one table and, for the other, the coshard with keywords together with the appropriate shard key.

To use equivalent UDSKs, simply specify the frequently joined fields as the shard keys for the two tables. For example, suppose you will be joining the CITATION and VEHICLE tables to return the traffic citations associated with each vehicle, as follows:

SELECT * FROM citation, vehicle where citation.vehiclenumber = vehicle.vin

To make this join cosharded, you would create both tables with the respective equivalent fields as the shard keys:

CREATE TABLE VEHICLE (make VARCHAR(30) not null, model VARCHAR(20) not null, 
  year INT not null, vin VARCHAR(17) not null, shard key (vin))

CREATE TABLE CITATION(citationid VARCHAR(8) not null, date DATE not null, 
  licensenumber VARCHAR(12) not null, plate VARCHAR(10) not null,
  vehiclenumber VARCHAR(17) not null, shard key (vehiclenumber))

Because the sharding algorithm is deterministic, this would result in both the VEHICLE row and the CITATION rows (if any) for a given VIN (a value in the vin and vehiclenumber fields, respectively) being located on the same shard (although the field value itself does not in any way determine which shard each set of rows is on). Thus, when the query cited above is run, each shard-local query can execute the join locally, that is, entirely on its shard. A join cannot be cosharded in this manner unless it includes an equality condition between the two fields used as shard keys. Likewise, you can use multiple-field UDSKs to enable a cosharded join, as long as the shard keys for the respective tables have the same number of fields, in the same order, of types that allow the field values to be compared for equality.
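
For instance, a minimal sketch of a cosharded join enabled with equivalent two-field UDSKs might look like the following; the ACCOUNT and ACCTTXN tables and their fields are hypothetical and are shown only to illustrate the matching key structure:

-- both shard keys contain the same number of fields, in the same order, with comparable types
CREATE TABLE ACCOUNT (region VARCHAR(10) not null, accountnumber INT not null,
  balance DECIMAL(12,2), shard key (region, accountnumber))

CREATE TABLE ACCTTXN (txid INT not null, region VARCHAR(10) not null,
  accountnumber INT not null, amount DECIMAL(12,2), shard key (region, accountnumber))

-- the join includes an equality condition on each pair of shard key fields
SELECT * FROM accttxn, account
  WHERE accttxn.region = account.region AND accttxn.accountnumber = account.accountnumber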

The other approach, which is effective in many cases, involves creating one table using a SASK, and then another by specifying the coshard with keywords to indicate that it is to be cosharded with the first table, and a shard key with values that are equivalent to the system-assigned RowID values of the first table. For example, you might be frequently joining the ORDER and CUSTOMER tables in queries like the following:

SELECT * FROM orders, customers where orders.customer = customers.%ID

In this case, because the field on one side of the join represents the RowID, you would start by creating that table, CUSTOMER, with a SASK, as follows:

CREATE TABLE CUSTOMER (firstname VARCHAR(50) not null, lastname VARCHAR(75) not null, 
  address VARCHAR(50) not null, city VARCHAR(25) not null, zip INT, shard)

To enable the cosharded join, you would then shard the ORDER table, in which the customer field is defined as a reference to the CUSTOMER table, by specifying a coshard with the CUSTOMER table on that field, as follows:

CREATE TABLE ORDER (date DATE not null, amount DECIMAL(10,2) not null, 
  customer CUSTOMER not null, shard key (customer) coshard with CUSTOMER)

As with the UDSK example previously described, this would result in each row from ORDER being placed on the same shard as the row from CUSTOMER with RowID value matching its customer value (for example, all ORDER rows in which customer=427 would be placed on the same shard as the CUSTOMER row with ID=427). A cosharded join enabled in this manner must include an equality condition between the ID of the SASK-sharded table and the shard key specified for the table that is cosharded with it.

Generally, the most beneficial cosharded joins can be enabled using either of the following, as indicated by your schema:

  • SASKs representing structural relationships between tables and the coshard with keywords, as illustrated in the example, in which the customer field in the ORDER table is a reference to the RowID of the CUSTOMER table.

  • UDSKs involving semantically meaningful fields that do not correspond to the RowID and so cannot be cosharded using coshard with, as illustrated by the use of the equivalent vin and vehiclenumber fields from the VEHICLE and CITATION tables. (UDSKs involving fields that happen to be used in many joins but represent more superficial or ad hoc relationships are usually not as helpful.)

Like queries with no joins and those joining sharded and nonsharded data, cosharded joins scale well with increasing numbers of shards, and they also scale well with increasing numbers of joined tables. Joins that are not cosharded perform well with moderate numbers of shards and joined tables, but scale less well with increasing numbers of either. For these reasons, you should carefully consider cosharded joins at this stage, just as, for example, indexing is taken into account to improve performance for frequently-queried sets of fields.

When selecting shard keys, bear in mind these general considerations:

  • The shard key of a sharded table cannot be changed, and its values cannot be updated.

  • All other things being equal, a balanced distribution of a table’s rows across the shards is beneficial for performance, and the algorithms used to distribute rows achieve the best balance when the shard key contains large numbers of different values but no major outliers (in terms of frequency); this is why the default RowID typically works so well. A well-chosen UDSK with similar characteristics may also be effective, but a poor choice of UDSK may lead to an unbalanced data distribution that does not significantly improve performance.

  • When a large table is frequently joined to a much smaller one, sharding the large one and making the small one nonsharded may be more effective than enabling a cosharded join.

Evaluate Unique Constraints

When a sharded table has a unique constraint (see Field Constraint and Unique Fields Constraint in the “Create Table” entry in the InterSystems SQL Reference), uniqueness is guaranteed across all shards. Generally, this means uniqueness must be enforced across all shards for each row inserted or updated, which substantially slows insert/update performance. When the shard key is a subset of the fields of the unique key, however, uniqueness can be guaranteed across all shards by enforcing it locally on the shard on which a row is inserted or updated, avoiding any performance impact.

For example, suppose an OFFICES table for a given campus includes the buildingnumber and officenumber fields. While building numbers are unique within the campus, and office numbers are unique within each building, the two must be combined to make each employee’s office address unique within the campus, so you might place a unique constraint on the table as follows:

CREATE TABLE OFFICES (countrycode CHAR(3), buildingnumber INT not null, officenumber INT not null,
  employee INT not null, CONSTRAINT address UNIQUE (buildingnumber,officenumber))

If the table is to be sharded, however, and you want to avoid any insert/update impact on performance, you must use buildingnumber, officenumber, or both as the shard key. For example, if you shard on buildingnumber (by adding shard key (buildingnumber) to the statement above), all rows for each building are located on the same shard, so when inserting a row for the employee whose address is “building 10, office 27”, the uniqueness of the address can be enforced locally on the shard containing all rows in which buildingnumber=10; if you shard on officenumber, all rows in which officenumber=27 are on the same shard, so the uniqueness of “building 10, office 27” can be enforced locally on that shard. On the other hand, if you use a SASK, or employee as a UDSK, any combination of buildingnumber and officenumber may appear on any shard, so the uniqueness of “building 10, office 27” must be enforced across all shards, impacting performance.
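
For example, a sketch of the OFFICES table sharded on buildingnumber, based on the CREATE TABLE statement above, looks like this:

CREATE TABLE OFFICES (countrycode CHAR(3), buildingnumber INT not null, officenumber INT not null,
  employee INT not null, CONSTRAINT address UNIQUE (buildingnumber,officenumber),
  shard key (buildingnumber))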

For these reasons, you may want to avoid defining unique constraints on a sharded table unless one of the following is true:

  • The shard key is a subset of the fields of every unique constraint (although such a shard key may not distribute data as effectively as a SASK or a different UDSK).

  • Insert and update performance is considered much less important than query performance for the table in question.

Note:

Enforcing uniqueness in application code (for example, based on some counter) can eliminate the need for unique constraints within a table, simplifying shard key selection.

Create the Tables

Create the empty sharded tables using standard CREATE TABLE statements (see CREATE TABLE in the SQL Reference) in the cluster namespace on any data node in the cluster. As shown in the examples in Choose a Shard Key, there are two types of sharding specifications when creating a table:

  • To shard on the system-assigned shard key (SASK), include the shard keyword in the CREATE TABLE statement.

  • To shard on a user-defined shard key (UDSK), follow shard with the key keyword and the field or fields to shard on, for example shard key (customerid, purchaseid). Both forms are shown in the sketch after this list.
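
For example, the two forms might look like the following sketch; the PURCHASE and PURCHASEITEM tables and their fields are hypothetical:

-- SASK: shard on the system-assigned RowID
CREATE TABLE PURCHASE (customerid INT not null, amount DECIMAL(10,2) not null, shard)

-- UDSK: shard on one or more user-defined fields
CREATE TABLE PURCHASEITEM (customerid INT not null, purchaseid INT not null,
  sku VARCHAR(20) not null, shard key (customerid, purchaseid))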

Note:

If the PK_IS_IDKEY option is set when you create a table, as described in Defining the Primary Key in the “Create Table” entry in the SQL Reference, the table’s RowID is the primary key; in such a case, using the default shard key means the primary key is the shard key. The best practice, however, if you want to use the primary key as the shard key, is to explicitly specify the shard key, so that there is no need to determine the state of this setting before creating tables.

You can display a list of all of the sharded tables on a cluster, including their names, owners, and shard keys, by navigating to the Sharding Configuration page of the Management Portal (System Administration > Configuration > System Configuration > Sharding Configuration) on node 1 or another data node, selecting the cluster namespace, and selecting the Sharded Tables tab. For a table you have loaded with data, you can click the Details link to see how many of the table’s rows are stored on each data node in the cluster.

Sharded Table Creation Constraints

The following constraints apply to sharded table creation:

  • You cannot use ALTER TABLE to make an existing nonsharded table into a sharded table (you can however use ALTER TABLE to alter a sharded table).

  • The SHARD KEY fields must be of numeric or string data types. The only collations currently supported for shard key fields are exact, SQLString, and SQLUpper, with no truncation.

  • All data types are supported except stream fields, the ROWVERSION field, and SERIAL (%Counter) fields.

  • A sharded table cannot include %CLASSPARAMETER VERSIONPROPERTY.

For further details on the topics and examples in this section, see CREATE TABLE in the InterSystems SQL Reference.

Defining Sharded Tables Using Sharded Classes

In addition to using DDL to define sharded tables, you can define classes as sharded using the Sharded class keyword; for details, see Defining a Sharded Table by Creating a Persistent Class in the “Defining Tables” chapter of Using InterSystems SQL. The class compiler has been extended to warn at compile time against the use of class definition features that are incompatible with sharding (such as customized storage definitions). More developed workload mechanisms and support for some of these incompatible features, such as the use of stream properties, will be introduced in upcoming versions of InterSystems IRIS.

Load Data Onto the Cluster

Data can be loaded into sharded tables by using INSERT statements through any InterSystems IRIS interface that supports SQL, for example the Management Portal, the Terminal, or JDBC. Rapid bulk loading of data into sharded tables is supported by the transparent parallel load capability built into the InterSystems IRIS JDBC driver, as well as by the InterSystems IRIS Connector for Spark (see Deploy and Manage Services), which leverages the same capability. Java-based applications also transparently benefit from the InterSystems IRIS JDBC driver’s parallel loading capability.

Load Data Using INSERT

You can verify that a sharded table was created as intended by loading data using an INSERT or INSERT SELECT FROM through any InterSystems IRIS interface that supports SQL and then querying the table or tables in question.
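
For example, using the VEHICLE table created in Choose a Shard Key, a quick check might look like the following; the inserted values are hypothetical:

INSERT INTO VEHICLE (make, model, year, vin)
  VALUES ('Ford', 'Focus', 2018, '1FADP3F20JL123456')

SELECT COUNT(*) FROM VEHICLE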

Load Data Using the InterSystems IRIS Spark Connector

The InterSystems IRIS Spark Connector allows you to add Apache Spark capabilities to a sharded cluster. The recommended configuration is to locate Spark worker nodes on the data node hosts and a Spark master node on the node 1 host, connected to the corresponding InterSystems IRIS instances. When you deploy a sharded cluster using ICM, the Apache Spark image provided by InterSystems lets you easily and conveniently create this configuration (see The icm run Command in the “Using ICM” chapter of the ICM Guide). For more information about the Spark Connector and using it to load data, see Using the InterSystems IRIS Spark Connector.

Load Data Using the InterSystems IRIS JDBC Driver

Using the transparent parallel load capability of the InterSystems IRIS JDBC driver, you can construct a tool that retrieves data from a data source and passes it to the target table on the sharded cluster by means of JDBC connections, as follows:

  • Any database for which a suitable JDBC driver is available can act as the data source.

  • The InterSystems IRIS JDBC driver, which has been optimized for the parallel insertion of large numbers of records into the shards of a sharded table, is used to connect to the target table on the sharded cluster. (See Using Java JDBC with InterSystems IRIS for complete information about the InterSystems IRIS JDBC driver.)

    Note:

    For most data loading operations, including simple INSERTs, the JDBC driver uses direct connections to the data nodes brokered by the cluster. This requires the driver client to reach the data nodes at the IP addresses or hostnames with which they were assigned to the cluster, and means you cannot execute such operations if this is not possible. For example, when connecting from a local client to a sharded cluster provisioned in the cloud by ICM, the data node IP addresses known to and returned by the shard master will be on the cloud subnet and thus inaccessible from the local machine.

For your convenience, InterSystems provides the Simple Data Transfer utility, a Java command-line utility for massive data transfer from a JDBC data source or CSV file to a JDBC-compliant database. While the utility works with any supported target or source, it is optimized to work with InterSystems IRIS as the target, and is intended primarily for extremely fast relocation of huge datasets. The utility works with both nonsharded and sharded namespaces, and takes full advantage of parallelization when the target table is sharded. For more information about Simple Data Transfer, see “Using the Simple Data Transfer Utility”.

Create and Load Nonsharded Tables

You can create nonsharded tables in the master namespace on the shard master data server, and load data into them, using your customary methods. These tables are immediately available to the cluster for both nonsharded queries and sharded queries that join them to sharded tables. (This is in contrast to architectures in which nonsharded tables must be explicitly replicated to each node that may need them.) See Evaluate Existing Tables for Sharding for guidance in choosing which tables to load as nonsharded.

Querying the Sharded Cluster

The master namespace and the sharded tables it contains are fully transparent, and SQL queries involving any mix of sharded and nonsharded tables in the master namespace, or the corresponding namespace on a shard master app server, are no different from any SQL queries against any tables in an InterSystems IRIS namespace. No special query syntax is required to identify sharded tables or shard keys. Queries can join multiple sharded tables, as well as sharded and nonsharded tables. Everything is supported except as noted in the following list of limitations and restrictions, which apply to the initial version of the InterSystems IRIS sharded cluster; the goal is to remove them all in future versions.

  • The only referential integrity constraints that are enforced for sharded tables are foreign keys when the two tables are cosharded, and the only referential action supported is NO ACTION.

  • Shard key fields must be of numeric or string data types. The only collations currently supported for shard key fields are exact, SQLString, and SQLUpper, with no truncation.

  • Row-level security for sharded tables is not currently supported.

  • Linked tables sourcing their content through a SQL Gateway connection cannot be sharded.

  • Queries with stream fields in the SELECT list are not currently supported.

  • Use of the following InterSystems IRIS SQL extensions is not currently supported:

    • Aggregate function extensions, including %FOREACH and %AFTERHAVING.

    • Nested aggregate functions.

    • Queries with both a nonaggregated field and an aggregate function, unless the GROUP BY clause is used.

    • The FOR SOME %ELEMENT predicate condition.

    • The %INORDER keyword.

Note:

If you want to explicitly purge cached queries on the data nodes, you can either purge all cached queries from the master namespace, or purge cached queries for a specific table. Both of these actions propagate the purge to the data nodes. Purging of individual cached queries is never propagated to the data nodes. For more information about purging cached queries, see Purging Cached Queries in the “Cached Queries” chapter of the SQL Optimization Guide.

Additional Sharded Cluster Options

Sharding offers many configuration options suited to a range of needs. This section provides brief coverage of additional options of interest, including the following:

  • Add Data Nodes and Rebalance Data

  • Mirror Data Nodes for High Availability

  • Deploy Compute Nodes for Workload Separation and Increased Query Throughput

For further assistance in evaluating the benefits of these options for your cluster, please contact the InterSystems Worldwide Response Center (WRC).

Add Data Nodes and Rebalance Data

As described in Planning an InterSystems IRIS Sharded Cluster, the number of data nodes you include in a cluster when first deployed is influenced by a number of factors, including but not limited to the estimated working set for sharded tables and the compute resources you have available. Over time, you may want to increase the number of data nodes because the size of your sharded data may grow significantly enough to make a higher shard count desirable, for example, or because a resource constraint has been removed. Data nodes can be added by reprovisioning and redeploying the cluster using ICM (see Reprovisioning the Infrastructure and Redeploying Services in the ICM Guide), or by repeating the steps outlined in Configure the Shard Data Servers.

When you add data nodes to a cluster, there is no data stored on them. Sharded data that is already on the cluster and data that is loaded onto the cluster after they are added is distributed as follows:

  • Existing rows of all existing sharded tables remain (roughly) evenly distributed across the original set of data nodes.

  • The rows of sharded tables created after data nodes are added are evenly distributed across all of the data nodes, including the new ones.

  • Rows added to existing sharded tables after the data nodes are added are evenly distributed across the original set of data nodes; that is, none are placed on the new data nodes.

You can, however, use the $SYSTEM.Sharding.Rebalance() API call to rebalance existing sharded data across the expanded set of data nodes. For example, if you go from four data nodes to eight, rebalancing takes you from four existing data nodes with one fourth of the sharded data on each, plus four empty new data nodes, to eight data nodes with one eighth of the data on each. Rebalancing also allows rows added to existing sharded tables after data nodes are added to be evenly distributed across all of the data nodes. Thus, after you rebalance, all sharded data (including existing tables, rows added to existing tables, and new tables) is evenly distributed across all of the data nodes.

Rebalancing cannot coincide with queries and updates, and so can take place only when the sharded cluster is offline and no other sharded operations are possible. (In a future release, this limitation will be removed.) For this reason, the $SYSTEM.Sharding.Rebalance() call places the sharded cluster in a state in which queries and updates of sharded tables are not permitted to execute, and return an error if attempted.

Each rebalancing call can specify a time limit, however, so that the call can be scheduled in a maintenance window, move as much data as possible within the window, and return the sharded cluster to a fully-usable state before the window ends. By using this approach with repeated calls, you can fully rebalance the cluster over a series of scheduled maintenance outages without otherwise interfering with its operation. You can also specify the minimum amount of data to be moved by the call; if it is not possible to move that much data within the specified time limit, no rebalancing occurs.
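
For example, a minimal sketch of invoking the rebalancing API from the InterSystems Terminal follows. The call is shown here without arguments; consult the %SYSTEM.Sharding class reference for the exact names and order of the arguments that control the time limit and the minimum amount of data to move.

set status = $SYSTEM.Sharding.Rebalance()
zw status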

Note:

Query and update operations execute correctly before rebalancing is performed (when newly added data nodes are still empty), in between the calls of a multicall rebalancing operation, and after rebalancing is complete, but they are most efficient after all of the data has been rebalanced across all of the data nodes.

The illustration that follows shows the process of adding shards and rebalancing data using a multicall rebalancing operation.

Adding a Shard and Rebalancing Data

Mirror Data Nodes for High Availability

An InterSystems IRIS mirror is a logical grouping of physically independent InterSystems IRIS instances simultaneously maintaining exact copies of production databases, so that if the instance providing access to the databases becomes unavailable, another can take over. A mirror can provide high availability through automatic failover, in which a failure of the InterSystems IRIS instance providing database access (or its host system) causes another instance to take over automatically. The “Mirroring” chapter of the High Availability Guide contains detailed information about InterSystems IRIS mirroring.

The data nodes in a sharded cluster can be mirrored; the recommended general best practice is that either all of these nodes are mirrored, or that none are. Note that when data nodes are mirrored, sharded queries can transparently complete successfully even if one or more data nodes fail over during query execution.

You can deploy a mirrored sharded cluster using ICM, the %SYSTEM.Cluster API, or CPF settings. You can also convert an existing nonmirrored cluster to a mirrored cluster.

Because they do not store persistent data, compute nodes never require mirroring.

Note:

This release does not support the use of async members in mirrors serving as data nodes in sharded clusters.

Deploy a Mirrored Cluster Using ICM

To deploy a fully mirrored sharded cluster using InterSystems Cloud Manager (ICM), refer to Define the Deployment and make the following changes:

  1. Add "Mirror": "true" to the defaults file.

  2. Define an even number of DATA nodes in the definitions file. (If you define an odd number of DATA nodes when Mirror is set to true, provisioning fails.)

  3. If you want to configure a mirror arbiter, include an AR node in the definitions file.

For more information on deploying mirrored configurations with ICM, see ICM Cluster Topology and Mirroring in the “ICM Reference” chapter of the ICM Guide.

Deploy a Mirrored Cluster Using the %SYSTEM.Cluster API

To deploy each data node in a mirrored cluster using the API, you can begin with either the primary failover member in an existing mirror or a nonmirrored instance, as follows:

  • If you start with an existing mirror primary, the API adds it to the cluster as a data node without any changes to its mirror configuration.

  • If you start with a nonmirrored instance, the API configures it as a mirror primary, based on the settings you provide in arguments, before adding it to the cluster as a data node.

Either way, the primary of the mirrored data node is attached to the cluster, and requires the addition of a backup failover member to complete the mirror, as follows:

  • If you started with an existing mirror primary, you can add the existing backup to the cluster by specifying it as the backup of that primary; the backup is attached to the cluster as a data node without any change to the mirror configuration. (If the primary you added does not have a backup, you can add a nonmirrored instance as the backup and the API automatically configures it as the backup before attaching it to the cluster.)

  • If you started with a nonmirrored instance that the API configured as a mirror primary, add a nonmirrored instance by specifying it as the backup of that primary; the API automatically configures it as the backup before attaching it to the cluster.

The recommended best practice is that all mirrored data nodes be completed, with both members of the mirror attached to the cluster, before any data is stored on the cluster.

The procedure for deploying a mirrored cluster is similar to that for deploying a nonmirrored cluster, as described in Deploy the Cluster Using the %SYSTEM.Cluster API. After you provision or identify the infrastructure and install InterSystems IRIS on the cluster nodes as described in that section, you may find it convenient to configure all of the mirrors before deploying them as a sharded cluster. The Management Portal and %SYSTEM.MIRROR API allow you to specify more settings than the %SYSTEM.Cluster calls described in the following procedure; see Configuring Mirroring in the “Mirroring” chapter of the High Availability Guide for details. Even if you plan to use the API calls to configure the mirrors, it is a good idea to review the procedures and settings in Creating a Mirror in the “Mirroring” chapter before you do so.

To initialize the node 1 (first data node) mirror and configure the remaining mirrored data nodes, do the following:

  1. On the intended node 1 primary, open the InterSystems Terminal for the instance and call the $SYSTEM.Cluster.InitializeMirrored() method, for example:

    set status = $SYSTEM.Cluster.InitializeMirrored()
    Note:

    To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:

    zw status

    If a call does not succeed, display the user-friendly error message by entering:

    do $SYSTEM.Status.DisplayError(status) 

    This call initializes the cluster on the node in the same way as $SYSTEM.Cluster.Initialize(), as described in Configure Node 1 in “Deploy the Cluster Using the %SYSTEM.Cluster API”; review that section for explanations of the first four arguments (none required) to InitializeMirrored(), which are the same as for Initialize(). If the instance is not already a mirror primary, you can use the next five arguments to configure it as one; if it is already a primary, these are ignored. The mirror arguments are as follows:

    • Arbiter host

    • Arbiter port

    • Directory containing the Certificate Authority certificate, local certificate, and private key file required to secure the mirror with TLS, if desired. The call expects the files to be named CAFile.pem, CertificateFile.pem, and PrivateKeyFile.pem, respectively.

    • Name of the mirror.

    • Name of this mirror member.

    The InitializeMirrored() call returns an error if the InterSystems IRIS instance belongs to an existing sharded cluster or is a mirror member other than primary.

  2. On the intended node 1 backup, open the Terminal for the InterSystems IRIS instance and call $SYSTEM.Cluster.AttachAsMirroredNode(), specifying the host and superserver port of the node 1 primary as the cluster URL in the first argument, and the mirror role backup in the second, for example:

    set status = $SYSTEM.Cluster.AttachAsMirroredNode("IRIS://node1prim:51773","backup")

    If you supplied an IP address as the fourth argument to InitializeMirrored() when initializing the node 1 primary, use the IP address instead of the hostname to identify node 1 in the first argument, for example:

    set status = $SYSTEM.Cluster.AttachAsMirroredNode("IRIS://100.00.0.01:51773","backup")
    Note:

    The default superserver port number of an InterSystems IRIS instance that is the only such on its host is 51773. To see or set the instance’s superserver port number, select System Administration > Configuration > System Configuration > Memory and Startup in the instance’s Management Portal. (For information about opening the Management Portal for the instance, see InterSystems IRIS Connection Information in InterSystems IRIS Basics: Connecting an IDE.)

    This call attaches the node as a data node in the same way as $SYSTEM.Cluster.AttachAsDataNode(), as described in Configure the Remaining Data Nodes in “Deploy the Cluster Using the %SYSTEM.Cluster API”, and ensures that it is the backup member of the node 1 mirror. If the node is backup to the node 1 primary before you issue the call — that is, you are initializing an existing mirror as node 1 — the mirror configuration is unchanged; if it is not a mirror member, it is added to the node 1 primary’s mirror as backup. Either way, the namespace, database, and mappings configuration of the node 1 primary are replicated on this node. (The third argument to AttachAsMirroredNode is the same as the second for AttachAsDataNode, that is, the IP address of the host, included if you want the other cluster members to use it in communicating with this node.)

  3. To configure mirrored data nodes other than node 1, use $SYSTEM.Cluster.AttachAsMirroredNode() to attach both the primary and the backup to the cluster, as follows (a brief sketch of both calls follows these two steps):

    1. When adding the primary, specify the node 1 primary in the cluster URL and primary as the second argument. If the instance is not already the primary in a mirror, use the fourth argument and the four that follow to configure it as one; the arguments are as listed for the InitializeMirrored() call in the preceding. If the instance is already a mirror primary, the mirror arguments are ignored if provided.

    2. When adding a backup, specify the intended primary in the cluster URL and backup as the second argument. If the instance is already configured as backup in the mirror in which the node you specify is primary, its mirror configuration is unchanged; if it is not yet a mirror member, it is configured as backup.
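
    For example, a minimal sketch of these two calls, using the hostnames node2prim and node2back from the example output in the next step and assuming node2prim is already the primary of an existing mirror (otherwise, supply the mirror arguments described above). Run the first call on node2prim and the second on node2back:

    set status = $SYSTEM.Cluster.AttachAsMirroredNode("IRIS://node1prim:51773","primary")

    set status = $SYSTEM.Cluster.AttachAsMirroredNode("IRIS://node2prim:51773","backup")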

  4. When you have configured all of the data nodes, you can call the $SYSTEM.Cluster.ListNodes() method to list them. When a cluster is mirrored, the list indicates the mirror name and role for each member of a mirrored data node, for example:

    set status = $system.Cluster.ListNodes()
    NodeId  NodeType    Host          Port    Mirror  Failover
    1       Data        node1prim     51773   MIRROR1 Primary
    1       Data        node1back     51773   MIRROR1 Backup
    2       Data        node2prim     51773   MIRROR2 Primary
    2       Data        node2back     51773   MIRROR2 Backup
    
Note:

The recommended best practice is to load balance application connections across all of the mirrored data nodes in the cluster.

The InitializeMirrored() call returns an error if

  • The current InterSystems IRIS instance is already a node of a sharded cluster.

  • The current instance is already a mirror member, but not the primary.

  • You specify (in the first two arguments) a cluster namespace or master namespace that already exists, and its globals database is not mirrored.

The AttachAsMirroredNode() call returns an error if

  • The current InterSystems IRIS instance is already a node in a sharded cluster.

  • The role primary is specified and the cluster node specified in the first argument is not the node 1 primary, or the current node is neither a mirror primary nor a nonmirrored instance.

  • The role backup is specified and the cluster node specified in the first argument is not a primary mirror member, or is primary in a mirror that already has a backup failover member.

  • The cluster namespace (or master namespace, when adding the node 1 backup) already exists on the current instance and its globals database is not mirrored.

Configure or Deploy a Mirrored Cluster Using CPF Settings

There are two ways to configure or deploy a mirrored cluster using CPF settings, as follows:

  • Configure or deploy the data node mirrors manually

    The manual method of configuring or deploying mirrored data nodes requires several different sets of CPF modifications or merge files, because different combinations of the ShardMirrorMember setting, which specifies the failover role (primary or backup) of the node, and ShardClusterURL must be applied to different nodes in the correct order.

  • Configure or deploy the mirrored data nodes automatically using a host name pattern

    If the names of the existing hosts you want to configure as a sharded cluster match a predictable pattern, or the new hosts you are provisioning for the cluster will do so, you can determine which node becomes the node 1 primary, which becomes its backup, and which of the others become primaries and backups based on those names. This approach allows you to use the same CPF modifications or CPF merge file for all instances, although they must still be configured in the right order.

Configure or deploy the data node mirrors manually

To configure or deploy a mirrored cluster manually using CPF settings, follow these steps:

  1. Configure and restart or deploy the node 1 primary using the procedure and settings indicated in Configure or Deploy Data Node 1 in “Configure or Deploy the Cluster Using CPF Settings”, with the modifications explained and illustrated in the following table and CPF excerpt:

    CPF settings for data node 1 mirror primary
    Section Setting Description Value for data node 1
    [Startup] ShardRole
    Determines the node’s role in the cluster.
    NODE1
    ShardMirrorMember (ADD)
    Determines the node’s role in the mirror.
    primary
    [config] MaxServerConn
    Sets the maximum number of concurrent connections from ECP clients that an ECP server can accept.
    Each of these settings must be equal to or greater than the number of nodes in the cluster. Optionally set them higher than the currently planned number of nodes, to allow for adding nodes later without having to modify them.
    MaxServers
    Sets the maximum number of concurrent connections to ECP servers that an ECP client can maintain.
    globals
    Allocates shared memory to the database cache for 8-kilobyte, 16-kilobyte, 32-kilobyte, and 64-kilobyte buffers.
    Specify the target cache size you determined for data nodes, as described in Estimate the Database Cache and Database Sizes. For example, to allocate a 200 GB database cache in 8-kilobyte buffers only, the value would be 0,0,204800,0,0,0.
    gmheap
    Optionally configures the size of the generic memory heap.
    You probably need to increase this setting from the default of 37.5 MB to optimize the performance of data nodes. For information about sizing the generic memory heap, see Calculating Initial Memory Requirements in the “Vertically Scaling InterSystems IRIS” chapter.
    [Startup]
    ShardRole=NODE1
    ShardMirrorMember=primary
    [config]
    MaxServerConn=64
    MaxServers=64
    globals=0,0,204800,0,0,0
    gmheap=393216
    

    Settings for the remaining nodes are provided separately in the steps that follow; for a summary of the parameter values used to configure a mirrored cluster, see the table CPF parameters for configuring a mirrored sharded cluster after the last step.

  2. When the node 1 primary has been configured and restarted, or has been deployed and has started successfully, configure or deploy the node 1 backup using the settings in the previous step, with the modifications illustrated in the following CPF excerpt:

    [Startup]
    ShardRole=DATA
    ShardClusterURL=IRIS://primarynode1:51773
    ShardMirrorMember=backup
    [config]
    MaxServerConn=64
    MaxServers=64
    globals=0,0,204800,0,0,0
    gmheap=393216

    When the node 1 backup has been configured and restarted, or has been deployed and has started successfully, you have created the node 1 mirror.

  3. Configure or deploy the primaries of the remaining data nodes using the settings indicated in the previous step, with the modifications illustrated in the following CPF excerpt:

    [Startup]
    ShardRole=DATA
    ShardClusterURL=IRIS://primarynode1:51773
    ShardMirrorMember=primary
    [config]
    MaxServerConn=64
    MaxServers=64
    globals=0,0,204800,0,0,0
    gmheap=393216

    After you have configured and restarted or deployed and successfully started half of the data nodes as mirror primaries, proceed to the next step.

  4. To configure or deploy a data node as a mirror backup, you must specify the mirror primary it is joining using the ShardClusterURL setting. For this reason, each of the remaining data nodes must be configured or deployed with a separate set of CPF modifications or CPF merge file. Do this using the settings indicated in the previous step, with the modifications illustrated in the following CPF excerpt:

    [Startup]
    ShardRole=DATA
    ShardClusterURL=IRIS://primarynode4:51773
    ShardMirrorMember=backup
    [config]
    MaxServerConn=64
    MaxServers=64
    globals=0,0,204800,0,0,0
    gmheap=393216

    When each data node backup has been configured and restarted, or has been deployed and has started successfully, you have created all of the remaining data node mirrors and the sharded cluster is complete.

The following table summarizes the use of the relevant parameters to configure the entire cluster of mirrored data nodes:

CPF parameters for configuring a mirrored sharded cluster
hostname   ShardRole   ShardMirrorMember   ShardClusterURL       Node configuration
data-0     NODE1       primary             n/a                   Node 1 primary
data-1     DATA        backup              IRIS://data-0:51773   Backup of node 1 primary
data-2     DATA        primary             IRIS://data-0:51773   Second data node primary
data-3     DATA        backup              IRIS://data-2:51773   Backup of primary on data-2
data-4     DATA        primary             IRIS://data-0:51773   Third data node primary
data-5     DATA        backup              IRIS://data-4:51773   Backup of primary on data-4

The data node primaries other than node 1 can be configured or deployed in any order; a backup cannot be configured or deployed until its primary is running.

Configure or deploy the data node mirrors automatically using a host name pattern

To configure or deploy a mirrored cluster automatically with CPF settings using a host name pattern, use the procedure and settings indicated in Configure or Deploy the Cluster Using a Hostname Pattern in “Configure or Deploy the Cluster Using CPF Settings”, with the modifications explained and illustrated in the following table and CPF excerpt:

CPF Settings when using a hostname pattern
Section Setting Description Value for remaining data nodes
[Startup] ShardRole
Determines the node’s role in the cluster; when set to AUTO, the node to configure as node 1 is identified by matching its hostname to the regular expression specified by ShardMasterRegexp, while the others are configured as additional data nodes. Cannot be set to AUTO unless ShardMasterRegexp is specified.
AUTO
ShardMirrorMember (ADD)
When used with ShardRole=AUTO, ShardMirrorMember=auto configures or deploys mirror failover roles based on the hostnames of the nodes on which the instances are deployed: if the integer following the final hyphen (-) in the hostname is even, the instance is configured as a primary, and if odd, as a backup. For example, if instances are deployed on four nodes with the hostnames data-0, data-1, data-2, and data-3, and ShardRole=AUTO, ShardMirrorMember=auto, and ShardMasterRegexp=-0$ are specified, the instances are configured as follows:
  • data-0 becomes the node 1 primary (matches ShardMasterRegexp; digit after final hyphen in hostname is even)
  • data-1 becomes the node 1 backup (hostname ends with odd digit and is next after node 1 primary hostname)
  • data-2 becomes a data node primary (hostname ends with even digit)
  • data-3 becomes the backup for the data node primary on data-2 (hostname ends with odd digit)
auto
ArbiterURL (ADD)
Identifies the arbiter used by the mirrors in a mirrored sharded cluster; include in any CPF merge file containing ShardMirrorMember to configure the specified mirrors with an arbiter.
host:port
ShardMasterRegexp
When ShardRole=AUTO, identifies the instance to configure as node 1 by matching the name of the instance’s host to the regular expression specified as its value; see the ShardMirrorMember row for an example. If no hostnames match the specified regular expression, or more than one does, none of the nodes are configured as data nodes.
regular_expression matching cluster node hostnames
ShardRegexp
When ShardRole=AUTO, validates that the hostname of the instance matches the regular expression provided. For example, if ShardRegexp=-[0-9]$ and a fifth host called data-2b is added to the example provided for ShardMirrorMember above, data-2b is not configured as a data node because it matches neither ShardMasterRegexp (required for node 1) nor ShardRegexp (required for the remaining data nodes).
regular_expression matching cluster node hostnames
[config] MaxServerConn
Sets the maximum number of concurrent connections from ECP clients that an ECP server can accept.
Each of these settings must be equal to or greater than the number of nodes in the cluster. Optionally set them higher than the currently planned number of nodes, to allow for adding nodes later without having to modify them.
MaxServers
Sets the maximum number of concurrent connections to ECP servers that an ECP client can maintain.
globals
Allocates shared memory to the database cache for 8-kilobyte, 16-kilobyte, 32-kilobyte, and 64-kilobyte buffers.
Specify the target cache size you determined for data nodes, as described in Estimate the Database Cache and Database Sizes. For example, to allocate a 200 GB database cache in 8-kilobyte buffers only, the value would be 0,0,204800,0,0,0.
gmheap
Optionally configures the size of the generic memory heap.
You probably need to increase this setting from the default of 37.5 MB to optimize the performance of data nodes. For information about sizing the generic memory heap, see Calculating Initial Memory Requirements in the “Vertically Scaling InterSystems IRIS” chapter.
[Startup]
ShardRole=AUTO
ShardMirrorMember=auto
ArbiterURL=arbiter:2188
ShardMasterRegexp=-0$
ShardRegexp=-[0-9]$
[config]
MaxServerConn=64
MaxServers=64
globals=0,0,204800,0,0,0
gmheap=393216

Configure and restart the node 1 primary or deploy it with a CPF merge file as illustrated, confirm that it is running, then configure the remaining existing data node instances by updating the same settings in their iris.cpf files and restarting, or deploy the remaining data node instances using the same CPF merge file. At any stage you can optionally verify an instance’s settings by viewing its iris.cpf file.

Convert a Nonmirrored Cluster to a Mirrored Cluster

You can convert an existing nonmirrored sharded cluster to a mirrored cluster using the procedure outlined in this section. The following is an overview of the tasks involved:

  • Provision and prepare enough new nodes to provide a backup for each existing data node in the cluster.

  • Create a mirror on each existing data node and then call $SYSTEM.Sharding.AddDatabasesToMirrors() on node 1 to automatically convert the cluster to a mirrored configuration.

  • Back up the now-mirrored master and shard databases on the existing data nodes (the first failover member in each mirror).

  • For each intended second failover member (new node), select the first failover member (existing data node) to be joined, then create databases on the new node corresponding to the mirrored databases on the first failover member, add the new node to the mirror as second failover member, and restore the databases from the backup made on the first failover member to automatically add them to the mirror.

  • Call $SYSTEM.Sharding.VerifyShards() on any of the mirror primaries (original data nodes) to validate information about the backups and add it to the sharding metadata.

You can perform the entire procedure within a single maintenance window (that is, a scheduled period of time during which the application is offline and there is no user activity on the cluster), or you can split it between two maintenance windows, as noted in the instructions.

The detailed steps are provided in the following. If you are not already familiar with it, review the section Deploy the Cluster Using the %SYSTEM.Cluster API before continuing. Familiarity with mirror configuration procedures, as described in the Configuring Mirroring chapter of the High Availability Guide, is also helpful but not required; the steps in this procedure provide links to that chapter where appropriate.

  1. Prepare the nodes that are to be added to the cluster as backup failover members according to the instructions in the first two steps of “Deploy the Cluster Using the %SYSTEM.Cluster API”, Provision or identify the infrastructure and Deploy InterSystems IRIS on the data nodes. The host characteristics and InterSystems IRIS configuration of the prospective backups should be the same as the existing data nodes in all respects (see Mirror Configuration Guidelines in the High Availability Guide).

    Note:

    It may be helpful to make a record, by hostnames or IP addresses, of the intended first failover member (existing data node) and second failover member (newly added node) of each failover pair.

  2. Begin a maintenance window for the sharded cluster.

  3. On each current data node, start the ISCAgent, then create a mirror and configure the first failover member.

  4. To convert the cluster to a mirrored configuration, open the InterSystems Terminal for the instance on node 1 and in the master namespace (IRISDM by default) call the $SYSTEM.Sharding.AddDatabasesToMirrors() method (see %SYSTEM.Sharding API) as follows:

    set status = $SYSTEM.Sharding.AddDatabasesToMirrors()
    Note:

    To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:

    zw status

    Reviewing status after each call is a good general practice, as a call might fail silently under some circumstances. If a call does not succeed (status is not 1), display the user-friendly error message by entering:

    do $SYSTEM.Status.DisplayError(status) 

    The AddDatabasesToMirrors() call does the following:

    • Adds the master and shard databases on node 1 (see Initialize node 1 in “Deploy the Cluster Using the %SYSTEM.Cluster API”) and the shard databases on the other data nodes to their respective mirrors.

    • Reconfigures all ECP connections between nodes as mirror connections, including those between compute nodes (if any) and their associated data nodes.

    • Reconfigures remote databases on all data nodes and adjusts all related mappings accordingly.

    • Updates the sharding metadata to reflect the reconfigured connections, databases, and mappings.

    When the call has successfully completed, the sharded cluster is in a fully usable state (although failover is not yet possible because the backup failover members have not yet been added).

  5. Perform a coordinated backup of the data nodes (that is, one in which all nodes are backed up at the same logical point in time). Specifically, on each of the first failover members (the existing data nodes), back up the shard database (IRISCLUSTER by default), and on node 1, also back up the master database (IRISDM by default).

  6. Optionally, end the current maintenance window and allow application activity while you prepare the prospective backup mirror members in the next step.

  7. On each node to be added to the cluster as a second failover member, start the ISCAgent, then create the cluster namespace and shard database (IRISCLUSTER by default), using the same name and database directory as on the intended first failover member. On the intended node 1 second failover member, also add the master namespace and database (IRISDM by default) in the same manner.

  8. If not in a maintenance window, start a new one.

  9. On each new node, perform the tasks required to add any nonmirrored instance as the second failover member of an existing mirror that includes mirrored databases containing data, as described in Configuring Mirroring in the “Mirroring” chapter of the High Availability Guide.

    Note:

    Sharding automatically creates all the mappings it needs and propagates to the shards any user-defined mappings in the master namespace. Therefore, the only mappings that must be manually created during this process are any user-defined mappings in the master namespace, which must be created only in the master namespace on the node 1 second failover member.

  10. Open the InterSystems Terminal for the instance on any of the primaries (original data nodes) and in the cluster namespace (or the master namespace on node 1) call the $SYSTEM.Sharding.VerifyShards() method (see %SYSTEM.Sharding API) as follows:

    set status = $SYSTEM.Sharding.VerifyShards()

    This call automatically adds the necessary information about the second failover members of the mirrors to the sharding metadata.

    Note:

    All of the original cluster nodes must be the current primary of their mirrors when this call is made. Therefore, if any mirror has failed over since the second failover member was added, arrange a planned failover back to the original failover member before performing this step. (For one procedure for planned failover, see Maintenance of Primary Failover Member in the High Availability Guide; for information about using the iris stop command to exit the graceful shutdown referred to in that procedure, see Controlling InterSystems IRIS Instances in the System Administration Guide.)

Important:

With the completion of the last step above, the maintenance window can be terminated. However, InterSystems strongly recommends testing each mirror by executing a planned failover (see above) before the cluster goes into production.

Deploy Compute Nodes for Workload Separation and Increased Query Throughput

For advanced use cases in which extremely low query latencies are required, potentially at odds with a constant influx of data, compute nodes can be added to provide a transparent caching layer for servicing queries. Each compute node caches the sharded data on the data node it is associated with, as well as nonsharded data when necessary. When a cluster includes compute nodes, read-only queries are automatically executed in parallel on the compute nodes, rather than on the data nodes; all write operations (insert, update, delete, and DDL operations) continue to be executed on the data nodes. This division of labor separates the query and data ingestion workloads while maintaining the advantages of parallel processing and distributed caching, improving the performance of both. Assigning multiple compute nodes per data node can further improve the query throughput and performance of the cluster.

When compute nodes are added to a cluster, they are automatically distributed as evenly as possible across the data nodes. Adding compute nodes yields significant performance improvement only when there is at least one compute node per data node. Because compute nodes support query execution only and do not store any data, their hardware profile can be tailored to suit those needs, for example by emphasizing memory and CPU and keeping storage to the bare minimum.

Sharded cluster with compute nodes

For information about planning compute nodes and load balancing application connections to clusters with compute nodes, see Plan Compute Nodes.

The following sections describe procedures for deploying compute nodes using ICM, the %SYSTEM.Cluster API, and CPF settings.

Deploy Compute Nodes Using ICM

To include compute nodes in the sharded cluster when deploying using InterSystems Cloud Manager (ICM), include the desired number of COMPUTE nodes in the definitions file, covered in Define the Deployment. As described in that section, you can use the instance type field to define the compute nodes’ total memory and a CPF merge file to determine their database cache size. Compute nodes are automatically assigned to data nodes in round robin fashion, distributing them as evenly as possible. The recommended best practice is to deploy the same number of compute nodes per data node, so define the same number of COMPUTE nodes as DATA nodes, or twice as many, and so on.

To add compute nodes to an existing cluster of data nodes, add a COMPUTE node definition to the definitions.json file and then reprovision and redeploy, as described in Reprovisioning the Infrastructure and Redeploying Services in the “Using ICM” chapter of the ICM Guide.

Note:

If the number of DATA nodes in the definitions file is greater than the number of COMPUTE nodes, ICM issues a warning.

Deploy Compute Nodes Using the %SYSTEM.Cluster API

To add an instance on a networked system to your cluster as a compute node, open the InterSystems Terminal for the instance and call the $SYSTEM.Cluster.AttachAsComputeNode() method specifying the hostname of an existing cluster node and the superserver port of its InterSystems IRIS instance, for example:

set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://datanode2:51773")
Note:

To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:

zw status

If a call does not succeed, display the user-friendly error message by entering:

do $SYSTEM.Status.DisplayError(status) 

If you provided the IP address of the template node when configuring it (see Configure node 1 in “Deploy the Cluster Using the %SYSTEM.Cluster API”), use the IP address instead of the hostname.

set status = $SYSTEM.Cluster.AttachAsComputeNode("IRIS://100.00.0.01:51773")

If you want other nodes to communicate with this one using its IP address, specify the IP address as the second argument.

Note:

From the perspective of another node (which is what you need in this procedure), the superserver port of a containerized InterSystems IRIS instance depends on which host port the superserver port was published or exposed as when the container was created. For details on and examples of this, see Running an InterSystems IRIS Container with Durable %SYS and Running an InterSystems IRIS Container: Docker Compose Example in Running InterSystems Products in Containers and Container networking in the Docker documentation.

The default superserver port number of a kit-installed InterSystems IRIS instance that is the only such on its host is 51773. To see or set the instance’s superserver port number, select System Administration > Configuration > System Configuration > Memory and Startup in the instance’s Management Portal. (For information about opening the Management Portal for the instance, see InterSystems IRIS Connection Information in InterSystems IRIS Basics: Connecting an IDE.)

If the cluster node you identify in the first argument is a data node, it is used as the template; if it is a compute node, the data node to which it is assigned is used as the template. The AttachAsComputeNode() call does the following:

  • Enables the ECP and sharding services

  • Associates the new compute node with a data node that previously had the minimum number of associated compute nodes, so as to automatically balance compute nodes across the data nodes.

  • Creates the cluster namespace, configuring it to match the settings on the template node (specified in the first argument), as described in Configure Node 1, and creating all needed mappings.

  • Sets all SQL configuration options to match the template node.

If a namespace of the same name as the cluster namespace already exists on the new compute node, it is used as the cluster namespace, and only the mappings are replicated.


The AttachAsComputeNode() call returns an error if the InterSystems IRIS instance is already a node in a sharded cluster.

When you have configured all of the compute nodes, you can call the $SYSTEM.Cluster.ListNodes() method to list them, for example:

set status = $system.Cluster.ListNodes()
NodeId  NodeType    DataNodeId    Host          Port
1       Data                      datanode1     51773
2       Data                      datanode2     51773
3       Data                      datanode3     51773
1001    Compute     1             computenode1  51773
1002    Compute     2             computenode2  51773
1003    Compute     3             computenode3  51773

When compute nodes are deployed, the list indicates the node ID of the data node that each compute node is assigned to. You can also use the $SYSTEM.Cluster.GetMetadata() method to retrieve metadata for the cluster, including the names of the cluster and master namespaces and their default globals databases, as well as settings for the node on which you issue the call.

Configure or Deploy Compute Nodes Using CPF Settings

To configure an existing instance or deploy a new instance as a compute node in a sharded cluster of data nodes, follow the procedure in Configure or deploy the remaining data nodes in “Configure or Deploy the Cluster Using CPF Settings”, using the CPF settings listed there but modified for the compute node role. The following table and CPF excerpt explain and illustrate these settings:

CPF settings for compute nodes

[Startup] ShardRole (CHANGE)
Description: Determines the node’s role in the cluster.
Value for compute nodes: COMPUTE

[Startup] ShardClusterURL
Description: Identifies the existing node to use as a template when adding a node to the cluster, as described in Configure the remaining data nodes in the %SYSTEM.Cluster API procedure. When ShardRole=COMPUTE, if a data node is specified, it is used as the template; if a compute node is specified, the data node to which it is assigned is used as the template.
Value for compute nodes: Hostname and superserver port of an existing node, in the form IRIS://host:port.

[config] MaxServerConn
Description: Sets the maximum number of concurrent connections from ECP clients that an ECP server can accept.
Value for compute nodes: Equal to or greater than the number of nodes in the cluster. Optionally set it higher than the currently planned number of nodes, to allow for adding nodes later without having to modify it.

[config] MaxServers
Description: Sets the maximum number of concurrent connections to ECP servers that an ECP client can maintain.
Value for compute nodes: Same as MaxServerConn.

[config] globals
Description: Allocates shared memory to the database cache for 8-kilobyte, 16-kilobyte, 32-kilobyte, and 64-kilobyte buffers.
Value for compute nodes: The same target cache size you specified for the data nodes when you created the cluster, as described in Estimate the Database Cache and Database Sizes. For example, to allocate a 200 GB database cache in 8-kilobyte buffers only, the value would be 0,0,204800,0,0,0.

[config] gmheap
Description: Optionally configures the size of the generic memory heap.
Value for compute nodes: You probably need to increase this setting from the default of 37.5 MB to optimize the performance of compute nodes. For information about sizing the generic memory heap, see Calculating Initial Memory Requirements in the “Vertically Scaling InterSystems IRIS” chapter.

[Startup]
ShardRole=COMPUTE
ShardClusterURL=IRIS://datanode1:51773
[config]
MaxServerConn=64
MaxServers=64
globals=0,0,204800,0,0,0
gmheap=393216

Configure each compute node instance by updating the indicated settings in its iris.cpf file and restarting it, or deploy the compute node instances using a CPF merge file (as illustrated above) to customize the settings. Once an instance has been deployed or modified and restarted, you can verify that the settings are as desired by viewing the iris.cpf file.

Compute nodes are automatically assigned to data nodes in round robin fashion, distributing them as evenly as possible. The recommended best practice is to deploy the same number of compute nodes per data node, so define the same number of COMPUTE nodes as DATA nodes, or twice as many, and so on.

Install Multiple Data Nodes per System

With a given number of systems hosting data nodes, configuring multiple data node instances per system using the %SYSTEM.Sharding API can significantly increase data ingestion throughput. Therefore, when the goal is to achieve the highest data ingestion throughput at the lowest cost, consider configuring two or three data node instances per host. The gain achieved will depend on server type, server resources, and overall workload. While adding to the total number of systems might achieve the same throughput gain, or more (without dividing a host system’s memory among multiple database caches), adding instances is less expensive than adding systems.
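
For example (a sketch only; the host, ports, and namespace names are hypothetical), two instances installed on the same host, each listening on its own superserver port, can be assigned as separate shards with repeated $SYSTEM.Sharding.AssignShard() calls on the shard master, using the call form shown in the namespace-level deployment procedure later in this document:

set status = $SYSTEM.Sharding.AssignShard("MASTER","shardhost1",51773,"SHARD1")
set status = $SYSTEM.Sharding.AssignShard("MASTER","shardhost1",51774,"SHARD2")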

InterSystems IRIS Sharding Reference

This section contains additional information about planning, deploying, and using a sharded configuration, including the following:

Planning an InterSystems IRIS Sharded Cluster

This section provides some first-order guidelines for planning a basic sharded cluster, and for adding compute nodes if appropriate. It is not intended to represent detailed instructions for a full-fledged design and planning process. The following tasks are addressed:

Combine Sharding with Vertical Scaling

Planning for sharding typically involves considering the tradeoff between resources per system and number of systems in use. At the extremes, the two main approaches can be stated as follows:

  • Scale vertically to make each system and instance as powerful as feasible, then scale horizontally by adding additional powerful nodes.

  • Scale horizontally using multiple affordable but less powerful systems as a cost-effective alternative to one high-end, heavily-configured system.

In practice, in most situations, a combination of these approaches works best. Unlike other horizontal scaling approaches, InterSystems IRIS sharding is easily combined with InterSystems IRIS’s considerable vertical scaling capacities. In many cases, a cluster of 4 to 16 data nodes hosted on reasonably high-capacity systems will yield the greatest benefit.

Plan a Basic Cluster of Data Nodes

To use these guidelines, you need to estimate several variables related to the amount of data to be stored on the cluster.

  1. First, review the data you intend to store on the cluster to estimate the following:

    1. Total size of all the sharded tables to be stored on the cluster, including their indexes.

    2. Total size of the nonsharded tables (including indexes) to be stored on the cluster that will be frequently joined with sharded tables.

    3. Total size of all of the nonsharded tables (including indexes) to be stored on the cluster. (Note that the previous estimate is a subset of this estimate.)

  2. Translate these totals into estimated working sets, based on the proportion of the data that is regularly queried.

    Estimating working sets can be a complex matter. You may be able to derive useful information about these working sets from historical usage statistics for your existing database cache(s). In addition to or in place of that, divide your tables into the three categories and determine a rough working set for each by doing the following:

    • For significant SELECT statements frequently made against the table, examine the WHERE clauses. Do they typically look at a subset of the data that you might be able to estimate the size of based on table and column statistics? Do the subsets retrieved by different SELECT statements overlap with each other or are they additive?

    • Review significant INSERT statements for size and frequency. It may be more difficult to translate these into a working set, but as a simplified approach, you might estimate the average hourly ingestion rate in MB (records per second * average record size * 3600) and add that to the working set for the table (a worked example follows the variable table below).

    • Consider any other frequent queries for which you may be able to specifically estimate results returned.

    • Bear in mind that while queries joining a nonsharded table and a sharded table count towards the working set NonshardSizeJoinedWS, queries against that same nonsharded data table that do not join it to a sharded table count towards the working set NonshardSizeTotalWS; the same nonsharded data can be returned by both types of queries, and thus would count towards both working sets.

    You can then add these estimates together to form a single estimate for the working set of each table, and add those estimates to roughly calculate the overall working sets. These overall estimates are likely to be fairly rough and may turn out to need adjustment in production. Add a safety factor of 50% to each estimate, and then record the final total data sizes and working sets as the following variables:

    Cluster Planning Variables

    ShardSize, ShardSizeWS: Total size and working set of sharded tables (plus safety factor)

    NonshardSizeJoined, NonshardSizeJoinedWS: Total size and working set of nonsharded tables that are frequently joined to sharded tables (plus safety factor)

    NonshardSizeTotal, NonshardSizeTotalWS: Total size and working set of nonsharded tables (plus safety factor)

    NodeCount: Number of data node instances
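
    As a hypothetical illustration of the ingestion-rate estimate described above (the record rate and size are invented for this sketch), a table ingesting 2,000 records per second at an average record size of 1,000 bytes adds roughly the following to its working set each hour:

    hourly ingestion (MB) = records per second * average record size * 3600 / 1,048,576
                          = 2,000 * 1,000 * 3,600 / 1,048,576
                          ≈ 6,866 MB (about 6.7 GB per hour)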

In reviewing the guidelines in the table that follows, bear the following in mind:

  • Generally speaking and all else being equal, more shards will perform faster due to the added parallelism, up to a point of diminishing returns due to overhead, which typically occurs at around 16 data nodes.

  • The provided guidelines represent the ideal or most advantageous configuration, rather than the minimum requirement.

    For example, as noted in Evaluating the Benefits of Sharding, sharding improves performance in part by caching data across multiple systems, rather than all data being cached by a single nonsharded instance, and the gain is greatest when the data in regular use is too big to fit in the database cache of a nonsharded instance. As indicated in the guidelines, for best performance the database cache of each data node instance in a cluster would equal at least the combined size of the sharded data working set and the frequently joined nonsharded data working set, with performance decreasing as total cache size decreases (all else being equal). But as long as the total of all the data node caches is greater than or equal to the cache size of a given single nonsharded instance, the sharded cluster will outperform that nonsharded instance. Therefore, if it is not possible to allocate database cache memory on the data nodes equal to what is recommended, get as close as you can. Furthermore, your initial estimates may turn out to need adjustment in practice.

  • Database cache refers to the database cache (global buffer pool) memory allocation that must be made for each instance. The manual procedure is described in Memory and Startup Settings in the “Configuring InterSystems IRIS” chapter of the System Administration Guide; when deploying with ICM, you can override the globals setting in the instance’s configuration parameters file (CPF) by specifying a CPF merge file, as described in Deploying with Customized InterSystems IRIS Configurations in the “ICM Reference” chapter of the ICM Guide. For guidelines for allocating memory to an InterSystems IRIS instance’s routine and database caches as well as the generic memory heap, see Calculating Initial Memory Requirements in the “Vertical Scaling” chapter of this book.

    Important:

    Because the default setting is not appropriate for production use, database cache allocation is required after deployment, regardless of configuration.

  • Default globals database indicates the target size of the database in question, which is the maximum expected size plus a margin for greater than expected growth. The file system hosting the database should be able to accommodate this total, with a safety margin there as well. For general information about InterSystems IRIS database size and expansion and the management of free space relative to InterSystems IRIS databases, and procedures for specifying database size and other characteristics when configuring instances manually, see Configuring Databases in the “Configuring InterSystems IRIS” chapter and Maintaining Local Databases in the “Managing InterSystems IRIS” chapter of the System Administration Guide.

    When deploying using ICM, you can use the DataVolumeSize parameter (see General Parameters in the ICM Guide) to determine the size of the instance’s storage volume for data, which is where the default globals databases for the master and cluster namespaces are located; this must be large enough to accommodate the target size of the default globals database. On some platforms, this setting is limited by the field DataVolumeType (see Provider-Specific Parameters in the ICM Guide).

    Important:

    When deploying manually, ensure that all instances have database directories and journal directories located on separate storage devices (ICM arranges this automatically). This is particularly important when high volume data ingestion is concurrent with running queries. For guidelines for file system and storage configuration, including journal storage, see “File System Recommendations” and “Storage Recommendations” in the “Preparing to Install” chapter of the Installation Guide and Journaling Best Practices in the “Journaling” chapter of the Data Integrity Guide.

  • The number of data nodes (NodeCount) and the database cache size on each data node are both variables. The desired relationship between the sum of the data nodes’ database cache sizes and the total working set estimates can be created by varying the number of shards and the database cache size per shard server. This choice can be based on factors such as cost tradeoffs between system costs and memory costs; where more systems with lower memory resources are available, you can allocate smaller amounts of memory to the database caches, and when memory per system is higher, you can configure fewer servers. Generally speaking and all else being equal, more shards will perform faster due to the added parallelism, up to a point of diminishing returns (caused in part by increased sharding manager overhead). The most favorable configuration is typically in the 4-16 shard range, so other factors aside, for example, 8 data nodes with M memory each are likely to perform better than 64 shards with M/8 memory each.

  • Bear in mind that if you need to add data nodes after the cluster has been loaded with data, you can automatically redistribute the sharded data across the new servers (although this must be done with the cluster offline); see Add Data Nodes and Rebalance Data for more information. On the other hand, you cannot remove a data node with sharded data on it, and a server’s sharded data cannot be automatically redistributed to other data nodes, so adding data nodes to a production cluster involves considerably less effort than reducing the number of data nodes, which requires dropping all sharded tables before removing the data nodes, then reloading the data after.

  • Parallel query processing is only as fast as the slowest data node, so the best practice is for all data nodes in a sharded cluster to have identical or at least closely comparable specifications and resources. In addition, the configuration of all IRIS instances in the cluster should be consistent; database settings such as collation and those SQL settings configured at instance level (default date format, for example) should be the same on all nodes to ensure correct SQL query results. Standardized procedures and tools like ICM can help ensure this consistency.

Note:

The recommended best practice is to load balance application connections across all of the data nodes in a cluster. ICM can automatically provision and configure a load balancer for the data nodes as needed when deploying in a public cloud; if deploying a sharded cluster by other means, a load balancing mechanism is required.

The recommendations in the following table assume that you have followed the procedures for estimating total data and working set sizes described in the foregoing, including adding a 50% safety factor to the results of your calculations for each variable.

Cluster Planning Guidelines

Database cache on data nodes
Should be at least: (ShardSizeWS / NodeCount) + NonshardSizeJoinedWS
Notes: This recommendation assumes that your application requires 100% in-memory caching. Depending on the extent to which reads can be made from fast storage such as solid-state drives instead, the size of the cache can be reduced. (A worked example follows this table.)

Default globals database for cluster namespace on each data node
Should be at least: ShardSize / NodeCount, plus space for expected growth
Notes: When data ingestion performance is a major consideration, consider setting the initial size of the database equal to the expected maximum size, thereby avoiding the performance impact of automatic database expansion. However, if running in a cloud environment, you should also consider the cost impact of paying for storage you are not using.

Default globals database for master namespace on node 1 (see Configuring Namespaces)
Should be at least: NonshardSizeTotal, possibly plus space for expected growth
Notes: Nonsharded data is likely to grow less over time than sharded data, but of course this depends on your application.

IRISTEMP database on shard master data server
Should be at least: No specific guideline. The ideal initial size depends on your data set, workload, and query syntax, but will probably be in excess of 100 GB and could be considerably more.
Notes: Ensure that the database is located on the fastest possible storage, with plenty of space for significant expansion.

CPU
Should be at least: No specific recommendations.
Notes: All InterSystems IRIS servers can benefit from greater numbers of CPUs, whether or not sharding is involved. Vertical scaling of CPU, memory, and storage resources can always be used in conjunction with sharding to provide additional benefit, but is not specifically required, and is governed by the usual cost/performance tradeoffs.
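
As a worked example of the database cache guideline in the first row of the table above (the sizes are hypothetical, chosen only to illustrate the arithmetic), suppose ShardSizeWS is 1,200 GB, NonshardSizeJoinedWS is 80 GB, and NodeCount is 8:

Database cache per data node >= (ShardSizeWS / NodeCount) + NonshardSizeJoinedWS
                             >= (1,200 GB / 8) + 80 GB
                              = 150 GB + 80 GB
                              = 230 GB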


Plan Compute Nodes

As described in Overview of InterSystems IRIS Sharding, compute nodes cache the data stored on data nodes and automatically process read-only queries, while all write operations (insert, update, delete, and DDL operations) are executed on the data nodes. The scenarios most likely to benefit from the addition of compute nodes to a cluster are as follows:

  • When high volume data ingestion is concurrent with high query volume, one compute node per data node can improve performance by separating the query workload (compute nodes) from the data ingestion workload (data nodes)

  • When high multiuser query volume is a significant performance factor, multiple compute nodes per data node increases overall query throughput (and thus performance) by permitting multiple concurrent sharded queries to run against the data on each underlying data node. (Multiple compute nodes do not increase the performance of individual sharded queries running one at a time, which is why they are not beneficial unless multiuser query workloads are involved.) Multiple compute nodes also maintain workload separation when a compute node fails, as queries can still be processed on the remaining compute nodes assigned to that data node.

When planning compute nodes, consider the following factors:

  • If you are considering deploying compute nodes, the best approach is typically to evaluate the operation of your basic sharded cluster before deciding whether the cluster can benefit from their addition. Compute nodes can easily be added to an existing cluster by reprovisioning your ICM deployment or using the %SYSTEM.Cluster API. For information about adding compute nodes, see Deploy Compute Nodes for Workload Separation and Increased Query Throughput.

  • When compute nodes are added to a cluster, they are automatically distributed as evenly as possible across the data nodes. Bear in mind that adding compute nodes yields significant performance improvement only when there is at least one compute node per data node. (If your definitions file specifies fewer COMPUTE nodes than DATA nodes, ICM issues a warning.)

  • The recommended best practice is to assign the same number of compute nodes to each data node. Therefore, if you are planning eight data nodes, for example, recommended choices for the number of compute nodes include zero, eight, sixteen, and so on.

  • Because compute nodes support query execution only and do not store any data, their hardware profile can be tailored to suit those needs, for example by emphasizing memory and CPU and keeping storage to the bare minimum. All compute nodes in a sharded cluster should have closely comparable specifications and resources.

  • Follow the data node database cache size recommendations (see Plan a Basic Cluster of Data Nodes) for compute nodes; ideally, each compute node should have the same size database cache as the data node to which it is assigned.

The distinction between data and compute nodes is completely transparent to applications, which can connect to any node's cluster namespace. Application connections can therefore be load balanced across all of the data and compute nodes in a cluster, and under most application scenarios this is the most advantageous approach. What is actually best for a particular scenario depends on whether you would prefer to optimize query processing or data ingestion. If sharded queries are most important, you can prioritize them by load balancing across the data nodes, so applications are not competing with shard-local queries for compute node resources; if high-speed ingestion using parallel load is most important, load balance across the compute nodes to avoid application activity on the data nodes. If queries and data ingestion are equally important, or you cannot predict the mix, load balance across all nodes.

ICM allows you to automatically add a load balancer to your DATA node or COMPUTE node definitions; to load balance across all DATA and COMPUTE nodes, you can provision WS nodes (see ICM Node Types in the “ICM Reference” chapter of the ICM Guide), which automatically add all DATA and COMPUTE nodes to their remote server lists. You can also create your own load balancing arrangement.

Coordinated Backup and Restore of Sharded Clusters

When data is distributed across multiple systems, as in an InterSystems IRIS sharded cluster, backup and restore procedures may involve additional complexity. Where strict consistency of the data across a sharded cluster is required, independently backing up and restoring individual nodes is insufficient, because the backups may not all be created at the same logical point in time. This makes it impossible to be certain, when the entire cluster is restored following a failure, that ordering is preserved and the logical integrity of the restored databases is thereby ensured.

For example, suppose update A of data on data node S1 was committed before update B of data on data node S2. Following a restore of the cluster from backup, logical integrity requires that if update B is visible, update A must be visible as well. But if backups of S1 and S2 are taken independently, it is impossible to guarantee that the backup of S1 was made after A was committed, even if the backup of S2 was made after B was committed, so restoring the backups independently could lead to S1 and S2 being inconsistent with each other.

If, on the other hand, the procedures used coordinate either backup or restore and can therefore guarantee that all systems are restored to the same logical point in time — in this case, following update B — ordering is preserved and the logical integrity that depends on it is ensured. This is the goal of coordinated backup and restore procedures.

To greatly reduce the chances of having to use any of the procedures described here to restore your sharded cluster, you can deploy it with mirrored data servers, as described in Mirror for High Availability. Even if the cluster is unmirrored, most data errors (data corruption, for example, or accidental deletion of data) can be remedied by restoring the data server on which the error occurred from the latest backup and then recovering it to the current logical point in time using its journal files. The procedures described here are for use in much rarer situations requiring a cluster-wide restore.

This section covers the following topics:

Coordinated Backup and Restore Approaches for Sharded Clusters

Coordinated backup and restore of a sharded cluster always involves all of the data servers in the cluster — that is, the shard master data server and the data nodes. The InterSystems IRIS Backup API includes a Backup.ShardedCluster class that supports three approaches to coordinated backup and restore of a sharded cluster’s data servers.

Bear in mind that the goal of all approaches is to restore all data servers to the same logical point in time, but the means of doing so varies. In one, it is the backups themselves that share a logical point in time, but in the others, InterSystems IRIS database journaling provides the common logical point in time, called a journal checkpoint, to which the databases are restored. The approaches include:

  • Coordinated backups

  • Uncoordinated backups followed by coordinated journal checkpoints

  • A coordinated journal checkpoint included in uncoordinated backups

To understand how these approaches work, it is important that you understand the basics of InterSystems IRIS data integrity and crash recovery, which are discussed in the “Introduction to Data Integrity” chapter of the Data Integrity Guide. Database journaling, a critical feature of data integrity and recovery, is particularly significant for this topic. Journaling records all updates made to an instance’s databases in journal files. This makes it possible to recover updates made between the time a backup was taken and the moment of failure (or another selected point) by restoring updates from the journal files following restore from backup. Journal files are also used to ensure transactional integrity by rolling back transactions that were left open by the failure. For detailed information about journaling, see the “Journaling” chapter of the Data Integrity Guide.

Considerations when selecting an approach to coordinated backup and restore include the following:

  • The degree to which activity is interrupted by the backup procedure.

  • The complexity of the required restore procedure.

  • The frequency with which the backup procedure should be performed to guarantee sufficient recoverability.

These issues are discussed in detail later in this section.

Coordinated Backup and Restore API Calls

The methods in the Backup.ShardedCluster class can be invoked on a sharded cluster’s shard master data server or on one of its shard master application servers (if they exist). All of the methods take a ShardMasterNamespace argument; this is the name of either the master namespace on the shard master data server, or the namespace on a shard master application server that is mapped to the default globals database of the master namespace. (For information about how this relationship is configured with the API, see Configure the Shard Master App Servers; ICM creates this configuration automatically, but the result is the same.)

The available methods are as follows:

  • Backup.ShardedCluster.Quiesce()

    Blocks all activity on all data servers of the sharded cluster, and waits until all pending writes have been flushed to disk. Backups of the cluster’s data servers taken under Quiesce() represent a logical point in time.

  • Backup.ShardedCluster.Resume()

    Resumes activity on the data servers after they are paused by Quiesce().

  • Backup.ShardedCluster.JournalCheckpoint()

    Creates a coordinated journal checkpoint and switches each data server to a new journal file, then returns the checkpoint number and the names of the precheckpoint journal files. The precheckpoint files are the last journal files on each data server that should be included in a restore; later journal files contain data that occurred after the logical point in time represented by the checkpoint.

  • Backup.ShardedCluster.ExternalFreeze()

    Freezes physical database writes, but not application activity, across the cluster, and then creates a coordinated journal checkpoint and switches each data server to a new journal file, returning the checkpoint number and the names of the precheckpoint journal files. The backups taken under ExternalFreeze() do not represent a logical point in time, but they include the precheckpoint journal files, thus enabling restore to the logical point in time represented by the checkpoint.

  • Backup.ShardedCluster.ExternalThaw()

    Resumes disk writes after they are suspended by ExternalFreeze().

You can review the technical documentation of these calls in the InterSystems Class Reference.

Procedures for Coordinated Backup and Restore

The steps involved in the three coordinated backup and restore approaches provided by the Sharding API are described in the following sections.

Data server backups should, in general, include not only database files but all files used by InterSystems IRIS, including the journal directories, write image journal, and installation data directory, as well as any needed external files. The locations of these files depend in part on how the cluster was deployed (see Deploying the Sharded Cluster); the measures required to include them in backups may have an impact on your choice of approach.

Important:

The restore procedures described here assume that the data server being restored has no mirror failover partner available, and would be used with a mirrored data server only in a disaster recovery situation in which mirror recovery procedures (see Disaster Recovery Procedures in the “Mirroring” chapter of the High Availability Guide) are insufficient.  If the data server being restored is part of a mirror, remove it from the mirror, complete the restore procedure described, and then rebuild it as described in Rebuilding a Mirror Member in the “Mirroring” chapter.

Create Coordinated Backups
  1. Call Backup.ShardedCluster.Quiesce(), which pauses activity on all data servers in the cluster (and thus all application activity) and waits until all pending writes have been flushed to disk. When this process is completed and the call returns, all databases and journal files across the cluster are at the same logical point in time. (See the sketch at the end of this procedure.)

  2. Create backups of all data servers in the cluster. Although the database backups are coordinated, they may include open transactions; when the data servers are restarted after being restored from backup, InterSystems IRIS recovery uses the journal files to restore transactional integrity by rolling these transactions back.

  3. When backups are complete, call Backup.ShardedCluster.Resume() to restore normal data server operation.

    Important:

    Resume() must be called within the same job that called Quiesce(). A failure return may indicate that the backup images taken under Quiesce() were not reliable and may need to be discarded.

  4. Following a failure, on each data server:

    1. Restore the backup image.

    2. Verify that the only journal files present are those in the restored image from the time of the backup.

      Caution:

      This is critically important because at startup, recovery restores the journal files and rolls back any transactions that were open at the time of the backup. If journal data later than the time of the backup exists at startup, it could be restored and cause the data server to be inconsistent with the others.

    3. Restart the data server.

    The data server is restored to the logical point in time at which database activity was quiesced.

Note:

As an alternative to the first three steps in this procedure, you can gracefully shut down all data servers in the cluster, create cold backups, and restart the data servers.
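
The following is a minimal sketch of steps 1 and 3, as they might appear in a routine or method run in a single job; the master namespace name MASTER is a placeholder, and Quiesce() and Resume() may accept additional optional arguments, so consult the Backup.ShardedCluster entry in the InterSystems Class Reference for the exact signatures.

set status = ##class(Backup.ShardedCluster).Quiesce("MASTER")
if 'status { do $SYSTEM.Status.DisplayError(status) quit }
// Step 2: create backups of all data servers in the cluster here.
// Resume() must be called within the same job that called Quiesce().
set status = ##class(Backup.ShardedCluster).Resume("MASTER")
if 'status { do $SYSTEM.Status.DisplayError(status) }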

Create Uncoordinated Backups Followed by Coordinated Journal Checkpoints
  1. Create backups of the databases on all data servers in the cluster while the data servers are in operation and application activity continues. These backups may be taken at entirely different times using any method of your choice and at any intervals you choose.

  2. Call Backup.ShardedCluster.JournalCheckpoint() on a regular basis, preferably as a scheduled task (see the sketch at the end of this procedure). This method creates a coordinated journal checkpoint and returns the name of the last journal file to include in a restore on each data server in order to reach that checkpoint. Bear in mind that it is the time of the latest checkpoint and the availability of the precheckpoint journal files that dictate the logical point in time to which the data servers can be recovered, rather than the timing of the backups.

    Note:

    Before switching journal files, JournalCheckpoint() briefly quiesces all data servers in the sharded cluster to ensure that the precheckpoint files all end at the same logical moment in time; as a result, application activity may be very briefly paused during execution of this method.

  3. Ensure that for each data server, you store a complete set of journal files from the time of its last backup to the time at which the most recent coordinated journal checkpoint was created, ending with the precheckpoint journal file, and that all of these files will remain available following a server failure (possibly by backing up the journal files regularly). The database backups are not coordinated and may include partial transactions, but when the data servers are restarted after being restored from backup, recovery uses the coordinated journal files to bring all databases to the same logical point in time and to restore transactional integrity.

  4. Following a failure, identify the latest checkpoint available as a common restore point for all data servers. This means that for each data server you must have a database backup that precedes the checkpoint, plus all intervening journal files up to and including the precheckpoint journal file.

    Caution:

    This is critically important because at startup, recovery restores the journal files and rolls back any transactions that were open at the time of the backup. If journal files later than the precheckpoint journal file exist at startup, they could be restored and cause the data server to be inconsistent with the others.

  5. On each data server, restore the databases from the backup preceding the checkpoint, restoring journal files up to the checkpoint. Ensure that no journal data after that checkpoint is applied; the simplest way to do this is to check whether the server has any later journal files and, if so, move or delete them, and then delete the journal log.

    The data server is now restored to the logical point in time at which the coordinated journal checkpoint was created.
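
The following is a sketch of step 2, suitable for wrapping in a scheduled task; MASTER is a placeholder master namespace name, and the by-reference output arguments used here to capture the checkpoint number and precheckpoint journal file names are assumptions, so check the Backup.ShardedCluster entry in the InterSystems Class Reference for the actual signature.

set status = ##class(Backup.ShardedCluster).JournalCheckpoint("MASTER",.checkpointNum,.precheckpointFiles)
if 'status { do $SYSTEM.Status.DisplayError(status) quit }
// Record the checkpoint number and the precheckpoint journal file names with the
// corresponding backups so you can identify a common restore point after a failure.
zwrite checkpointNum,precheckpointFiles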

Include a Coordinated Journal Checkpoint in Uncoordinated Backups
  1. Call Backup.ShardedCluster.ExternalFreeze() (see the sketch at the end of this section). This method freezes physical database writes on all data servers in the sharded cluster by suspending their write daemons; application activity continues, but updates are written to the journal files only and are not committed to disk. Before returning, the method creates a coordinated journal checkpoint and switches each data server to a new journal file, then returns the checkpoint number and the names of the precheckpoint journal files. At this point, the precheckpoint journal files represent a single logical point in time.

  2. Create backups of all data servers in the cluster. The database backups are not coordinated and may include partial transactions, but when restoring the data servers you will ensure that they are recovered to the journal checkpoint, bringing all databases to the same logical point in time and restoring transactional integrity.

    Note:

    By default, when the write daemons have been suspended by Backup.ShardedCluster.ExternalFreeze() for 10 minutes, application processes are blocked from making further updates (due to the risk that journal buffers may become full). However, this period can be extended using an optional argument to ExternalFreeze() if the backup process requires more time.

  3. When all backups are complete, call Backup.ShardedCluster.ExternalThaw() to resume the write daemons and restore normal data server operation.

    Important:

    A failure return may indicate that the backup images taken under ExternalFreeze() were not reliable and may need to be discarded.

  4. Following a failure, on each data server:

    1. Restore the backup image.

    2. Remove any journal files present in the restored image that are later than the precheckpoint journal file returned by ExternalFreeze().

    3. Follow the instructions in Starting InterSystems IRIS Without Automatic WIJ and Journal Recovery in the “Backup and Restore” chapter of the Data Integrity Guide to manually recover the InterSystems IRIS instance. When you restore the journal files, start with the journal file that was switched to by ExternalFreeze() and end with the precheckpoint journal file returned by ExternalFreeze(). (Note that these may be the same file, in which case this is the one and only journal file to restore.)

      Note:

      If you are working with containerized InterSystems IRIS instances, see Upgrading When Manual Startup is Required in Running InterSystems Products in Containers for instructions for doing a manual recovery inside a container.

    The data server is restored to the logical point in time at which the coordinated journal checkpoint was created by the ExternalFreeze() method.

Note:

This approach requires that the databases and journal files on each data server be located such that a single backup can include them both.
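
The following is a minimal sketch of steps 1 and 3 of this procedure; MASTER is a placeholder master namespace name, and the by-reference output arguments capturing the checkpoint number and precheckpoint journal file names (as well as any timeout-extension argument) are assumptions, so consult the Backup.ShardedCluster entry in the InterSystems Class Reference for the exact ExternalFreeze() and ExternalThaw() signatures.

set status = ##class(Backup.ShardedCluster).ExternalFreeze("MASTER",.checkpointNum,.precheckpointFiles)
if 'status { do $SYSTEM.Status.DisplayError(status) quit }
// Step 2: back up all data servers while physical database writes are suspended.
set status = ##class(Backup.ShardedCluster).ExternalThaw("MASTER")
if 'status { do $SYSTEM.Status.DisplayError(status) }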

Sharding APIs

At this release, InterSystems IRIS provides two APIs for use in configuring and managing a sharded cluster:

%SYSTEM.Cluster API

For more detail on the %SYSTEM.Cluster API methods and instructions for calling them, see the %SYSTEM.Cluster class documentation in the InterSystems Class Reference.

Use the %SYSTEM.Cluster API methods in the following ways:

%SYSTEM.Cluster methods include the following:

%SYSTEM.Sharding API

For more detail on the %SYSTEM.Sharding API methods and instructions for calling them, see the %SYSTEM.Sharding class documentation in the InterSystems Class Reference.

Use the %SYSTEM.Sharding API methods in the following ways:

  • Enable an InterSystems IRIS instance to act as a shard master or shard server by calling the EnableSharding method.

  • Define the set of shards belonging to a master namespace by making repeated calls to AssignShard in the master namespace, one call for each shard.

  • Once shards have been assigned, verify that they are reachable and correctly configured by calling VerifyShards.

  • If additional shards are assigned to a namespace that already contains sharded tables, and the new shards can't be reached for automatic verification during the calls to AssignShard, you can call ActivateNewShards to activate them once they are reachable.

  • List all the shards assigned to a master namespace by calling ListShards.

  • When converting a nonmirrored cluster to a mirrored cluster, after creating a mirror on each existing data node, add the master database and shard databases to their respective mirrors by calling AddDatabasesToMirrors.

  • Rebalance existing sharded data across the cluster after adding data nodes/shard data servers with $SYSTEM.Sharding.Rebalance() (see Add Shard Data Servers and Rebalance Data).

  • Assign a shard data server to a different shard namespace at a different address by calling ReassignShard.

  • Remove a shard from the set belonging to a master namespace by calling DeassignShard.

  • Set sharding configuration options by calling SetOption, and retrieve their current values by calling GetOption.

%SYSTEM.Sharding methods include the following:

Deploying the Namespace-level Architecture

Use the following procedure to deploy an InterSystems IRIS sharded cluster with the older namespace-level architecture, consisting of a shard master, shard data servers, and optionally shard master application servers using the %SYSTEM.Sharding API. Instructions are also provided for deploying the cluster using the Sharding Configuration page in the Management Portal (System Administration > Configuration > System Configuration > Sharding Configuration).

Note:

As with all classes in the %SYSTEM package, the %SYSTEM.Sharding methods are available through $SYSTEM.Sharding.

This procedure assumes each InterSystems IRIS instance is installed on its own system.

Provision or Identify the Infrastructure

Identify the needed number of networked host systems (physical, virtual, or cloud) — one host each for the shard master, shard data servers, and shard master app servers (if any).

A minimum network bandwidth of 1 Gbps is recommended, but 10 Gbps or more is preferred, if available; greater network throughput increases the performance of the sharded cluster.

Install InterSystems IRIS on the Cluster Nodes

  1. Install an instance of InterSystems IRIS on each system, as described in the Installation Guide.

    Note:

    All InterSystems IRIS instances in a sharded cluster must be of the same version.

    All InterSystems IRIS instances in a sharded cluster must have sharding licenses.

  2. Ensure that the storage device hosting each instance’s databases is large enough to accommodate the target globals database size, as described in Estimate the Database Cache and Database Sizes.

    All instances should have database directories and journal directories located on separate storage devices, if possible. This is particularly important when high volume data ingestion is concurrent with running queries. For guidelines for file system and storage configuration, including journal storage, see “File System Recommendations” and “Storage Recommendations” in the “Preparing to Install” chapter of the Installation Guide and Journaling Best Practices in the “Journaling” chapter of the Data Integrity Guide.

  3. Allocate the database cache (global buffer pool) for each instance, depending on its eventual role in the cluster, according to the sizes you determined in Estimate the Database Cache and Database Sizes. For the procedure for allocating the database cache, see Memory and Startup Settings in the “Configuring InterSystems IRIS” chapter of the System Administration Guide.

    Note:

    In some cases, it may be advisable to increase the size of the generic memory heap on the cluster members. For information on how to allocate memory to the generic memory heap, see gmheap in the Configuration Parameter File Reference.

    For guidelines for allocating memory to an InterSystems IRIS instance’s routine and database caches as well as the generic memory heap, see Calculating Initial Memory Requirements in the “Vertical Scaling” chapter.

Configure the Cluster Nodes

Perform the following steps on the instances that fill each role in the cluster.

Configure IP Addresses for Cluster Communications (Optional)

Under some circumstances, the API may be unable to resolve the hostnames of one or more nodes into IP addresses that are usable for interconnecting the nodes of a cluster. When this is the case, you can call $SYSTEM.Sharding.SetNodeIPAddress() (see %SYSTEM.Sharding API) to specify the IP address to be used for each node. To use $SYSTEM.Sharding.SetNodeIPAddress(), you must call it on every intended cluster node before making any other %SYSTEM.Sharding API calls on those nodes, for example:

set status = $SYSTEM.Sharding.SetNodeIPAddress("00.53.183.209")

When this call is used, you must use the IP address you specify for each node, rather than the hostname, as the shard-host argument when calling $SYSTEM.Sharding.AssignShard() on the shard master to assign the node to the cluster, as described in the following procedure.

Configure the Shard Data Servers

On each shard data server instance:

  1. Create the shard namespace using the Management Portal, as described in Create/Modify a Namespace in the “Configuring InterSystems IRIS” chapter of the System Administration Guide. (The namespace need not be interoperability-enabled.)

    Create a new database for the default globals database, making sure that it is located on a device with sufficient free space to accommodate its target size, as described in Estimate the Database Cache and Database Sizes. If data ingestion performance is a significant consideration, set the initial size of the database to its target size.

    Select the globals database you created for the namespace’s default routines database.

    Note:

    As noted in Estimate the Database Cache and Database Sizes, the shard master data server and shard data servers all share a single default globals database physically located on the shard master and known as the master globals database. The default globals database created when a shard namespace is created remains on the shard, however, becoming the local globals database, which contains the data stored on the shard. Because the shard data server does not start using the master globals database until assigned to the cluster, for clarity, the planning guidelines and instructions in this document refer to the eventual local globals database as the default globals database of the shard namespace.

    A new namespace is automatically created with IRISTEMP configured as the temporary storage database; do not change this setting for the shard namespace.

  2. For a later step, record the DNS name or IP address of the host system, the superserver (TCP) port of the instance, and the name of the shard namespace you created.

    Note:

    From the perspective of another node (which is what you need in this procedure), the superserver port of a containerized InterSystems IRIS instance depends on which host port the superserver port was published or exposed as when the container was created. For details on and examples of this, see Running an InterSystems IRIS Container with Durable %SYS and Running an InterSystems IRIS Container: Docker Compose Example in Running InterSystems Products in Containers and Container networking in the Docker documentation.

    The default superserver port number of a kit-installed InterSystems IRIS instance that is the only such on its host is 51773. To see or set the instance’s superserver port number, select System Administration > Configuration > System Configuration > Memory and Startup in the instance’s Management Portal. (For information about opening the Management Portal for the instance, see InterSystems IRIS Connection Information in InterSystems IRIS Basics: Connecting an IDE.)

  3. In an InterSystems Terminal window, in any namespace, call $SYSTEM.Sharding.EnableSharding (see %SYSTEM.Sharding API) to enable the instance to participate in a sharded cluster, as follows:

    set status = $SYSTEM.Sharding.EnableSharding()

    No arguments are required.

    Note:

    To see the return value (for example, 1 for success) for each API call detailed in these instructions, enter:

    zw status

    Reviewing status after each call is a good general practice, as a call might fail silently under some circumstances. If a call does not succeed (status is not 1), display the user-friendly error message by entering:

    do $SYSTEM.Status.DisplayError(status) 

    After making this call, restart the instance, unless you had previously changed the values of the MaxServerConn and MaxServers CPF settings as described in Deploy InterSystems IRIS on the Data Nodes in the procedure for deploying a sharded cluster using the %SYSTEM.Cluster API.

Management Portal

Take the following steps to deploy using the Management Portal instead of the API:

  • Create the shard namespace by following the instructions in step 1, and make sure you have recorded the needed information about the instance as detailed in step 2.

  • Navigate to the Sharding Configuration page (System Administration > Configuration > System Configuration > Sharding Configuration) and use the Enable Sharding button to enable sharding. Then restart the instance, unless you had previously changed the values of the MaxServerConn and MaxServers CPF settings as described in Deploy InterSystems IRIS on the Data Nodes in the procedure for deploying a sharded cluster using the %SYSTEM.Cluster API.

Configure the Shard Master Data Server

On the shard master data server instance:

  1. Create the master namespace using the Management Portal, as described in Create/Modify a Namespace in the “Configuring InterSystems IRIS” chapter of the System Administration Guide. (The namespace need not be interoperability-enabled.)

    Ensure that the default globals database you create is located on a device with sufficient free space to accommodate its target size, as described in Estimate the Database Cache and Database Sizes. If data ingestion performance is a significant consideration, set the initial size of the database to its target size.

    Select the globals database you created for the namespace’s default routines database.

    Note:

    A new namespace is automatically created with IRISTEMP configured as the temporary storage database; do not change this setting for the master namespace. Because the intermediate results of sharded queries are stored in IRISTEMP, this database should be located on the fastest available storage with significant free space for expansion, particularly if you anticipate many concurrent sharded queries with large result sets.

  2. In an InterSystems Terminal window, in any namespace, do the following:

    1. Call $SYSTEM.Sharding.EnableSharding() (see %SYSTEM.Sharding API) to enable the instance to participate in a sharded cluster (no arguments are required), as follows:

      set status = $SYSTEM.Sharding.EnableSharding()

      After making this call, restart the instance, unless you had previously changed the values of the MaxServerConn and MaxServers CPF settings as described in Deploy InterSystems IRIS on the Data Nodes in the procedure for deploying a sharded cluster using the %SYSTEM.Cluster API.

    2. Call $SYSTEM.Sharding.AssignShard() (see %SYSTEM.Sharding API) once for each shard data server, to assign the shard to the master namespace you created, as follows:

      set status = $SYSTEM.Sharding.AssignShard("master-namespace","shard-host",shard-superserver-port,
          "shard_namespace")

      where the arguments represent the name of the master namespace you created and the information you recorded for that shard data server in the previous step, for example:

      set status = $SYSTEM.Sharding.AssignShard("master","shardserver3",51773,"shard3")
    3. To verify that you have assigned the shards correctly, you can issue the following command and verify the hosts, ports, and namespace names:

      do $SYSTEM.Sharding.ListShards()
      Shard   Host                       Port    Namespc  Mirror  Role    VIP
      1       shard1.internal.acme.com   56775   SHARD1
      2       shard2.internal.acme.com   56777   SHARD2
      ...
      Note:

      For important information about determining the superserver port of an InterSystems IRIS instance, see step 2 of the procedure in Configure the Shard Data Servers.

    4. To confirm that the ports are correct and all needed configuration of the nodes is in place so that the shard master can communicate with the shard data servers, call $SYSTEM.Sharding.VerifyShards() (see %SYSTEM.Sharding API) as follows:

      do $SYSTEM.Sharding.VerifyShards() 

      The $SYSTEM.Sharding.VerifyShards() call identifies a number of errors. For example, if the port provided in a $SYSTEM.Sharding.AssignShard() call is a port that is open on the shard data server host but not the superserver port for an InterSystems IRIS instance, the shard is not correctly assigned; the $SYSTEM.Sharding.VerifyShards() call indicates this.

      After configuring shard master application servers as described in the next section, you can call $SYSTEM.Sharding.VerifyShards() on each of them as well to confirm that they can communicate with the shard master data server and the shards.

Management Portal:

Take the following steps to deploy using the Management Portal instead of the API:

  • Create the master namespace by following the instructions in step 1.

  • Navigate to the Sharding Configuration page (System Administration > Configuration > System Configuration > Sharding Configuration) and use the Enable Sharding button to enable sharding. Then restart the instance, unless you had previously changed the values of the MaxServerConn and MaxServers CPF settings as described in Deploy InterSystems IRIS on the Data Nodes in the procedure for deploying a sharded cluster using the %SYSTEM.Cluster API.

  • Return to the Sharding Configuration page (reloading if necessary) and, for each shard data server:

    • Click the Assign Shard button and enter the shard data server’s host, the instance’s superserver port, and the name of the shard namespace in the Assign Shard dialog. Leave the drop-down set to Data Shard, and leave the Mirrored checkbox cleared. Click Finish to assign the shard data server to the cluster.

    • Click the Verify Shards button to verify that the shards have been correctly configured and that the shard master can communicate with them. If the operation reports an error, you can use the Edit link to review and if necessary correct the information you entered, or the Deassign link to deassign the shard data server and repeat the Assign Shard operation.

      Note:

      If you have many shard data servers to assign, you can make the verification operation automatic by clicking the Advanced Settings button and selecting the Automatically verify shards on assignment option in the Advanced Settings dialog. Leave the other settings in this dialog at their defaults when you deploy a sharded cluster.

Configure the Shard Master App Servers

On each shard master app server (if you are configuring them):

  1. In a Terminal window, in any namespace, call $SYSTEM.Sharding.EnableSharding() (see %SYSTEM.Sharding API) to enable the instance to participate in a sharded cluster, as follows:

    set status = $SYSTEM.Sharding.EnableSharding()

    No arguments are required. After making this call, restart the instance, unless you had previously changed the values of the MaxServerConn and MaxServers CPF settings as described in Deploy InterSystems IRIS on the Data Nodes in the procedure for deploying a sharded cluster using the %SYSTEM.Cluster API.

  2. As described in Configuring an Application Server in the “Horizontally Scaling Systems for User Volume with InterSystems Distributed Caching” chapter of this guide:

    • Add the shard master data server as a data server.

      Note:

      Do not change the Maximum number of data servers and Maximum number of application servers settings on the ECP Settings page, which were specified by the $SYSTEM.Sharding.EnableSharding() call.

    • Create a namespace on the shard master app server and configure the default globals and routines databases of the master namespace on the shard master data server as this namespace’s default globals and routines databases, thereby adding them as remote databases. This is the namespace in which to execute queries, rather than the master namespace on the shard master data server. (A configuration sketch follows this list.)
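
    These app server steps can also be scripted. The following is a minimal sketch using the Config.ECPServers, Config.Databases, and Config.Namespaces classes in the %SYS namespace; the server name, address, port, database directory, and namespace name are placeholders, sharding is assumed to already be enabled on the instance (step 1), and the exact class properties should be confirmed against the class reference for your version:

      // Sketch only; run in %SYS on the shard master app server.
      // All names, addresses, ports, and directories below are placeholders.
      zn "%SYS"

      // Add the shard master data server as an ECP data server.
      kill props
      set props("Address") = "masterdata.internal.acme.com"
      set props("Port") = 51773
      set status = ##class(Config.ECPServers).Create("MASTERDATA",.props)

      // Add the master namespace's globals/routines database as a remote database.
      kill props
      set props("Server") = "MASTERDATA"
      set props("Directory") = "/isc/iris/mgr/master/"   // its directory on the data server
      set status = ##class(Config.Databases).Create("MASTERDB",.props)

      // Create the local namespace in which queries will be executed, using the
      // remote database as its default globals and routines database.
      kill props
      set props("Globals") = "MASTERDB"
      set props("Routines") = "MASTERDB"
      set status = ##class(Config.Namespaces).Create("MASTER",.props)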

If you have configured shard master app servers, configure the desired mechanism to distribute application connections across them.

Management Portal:

Take the following steps to deploy using the Management Portal instead of the API:

  • Navigate to the Sharding Configuration page (System Administration > Configuration > System Configuration > Sharding Configuration) and use the Enable Sharding button to enable sharding. Then restart the instance, unless you had previously changed the values of the MaxServerConn and MaxServers CPF settings as described in Deploy InterSystems IRIS on the Data Nodes in the procedure for deploying a sharded cluster using the %SYSTEM.Cluster API.

  • Add the shard master data server as a data server, create a namespace, and configure the master namespace’s globals and routines databases as the databases for the new namespace, as described in step 2.

Reserved Names

The following names are used by InterSystems IRIS and should not be used in the names of user-defined elements:

  • The package name IRIS.Shard is reserved for system-generated shard-local classnames and should not be used for user-defined classes.

  • The schema name IRIS_Shard is reserved for system-generated shard-local table names and should not be used for user-defined tables.

  • The prefixes IRIS.Shard., IS., and BfVY. are reserved for globals of shard-local tables, and in shard namespaces are mapped to the shard’s local databases. User-defined global names and global names for nonsharded tables should not begin with these prefixes. Using these prefixes for globals other than those of shard-local tables can result in unpredictable behavior.
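
For example (the global names here are purely illustrative), application globals should use names that avoid the reserved prefixes:

  // Illustrative only; ^MyApp.Config is a hypothetical application global.
  set ^MyApp.Config("retention") = 30      // fine: does not begin with a reserved prefix
  // Avoid user globals such as ^IS.Config or ^IRIS.Shard.Config: in shard
  // namespaces the IS. and IRIS.Shard. prefixes are mapped to the shard's
  // local databases, so such globals can behave unpredictably.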