Manage Clusters, Instances, and Alert Rules

To monitor InterSystems IRIS instances with System Alerting and Monitoring (SAM) version 2.0, you must first create at least one SAM cluster (that is, a group of instances) and then define each instance as part of a cluster. You can then set alerts for a cluster to receive a notification if a metric for any instance in the cluster crosses a given threshold. This page describes how to manage clusters, instances, and alert rules using the SAM web portal; you can also manage them using the SAM REST API.

Note:

Clusters are sets of instances that do not intersect. In other words, an instance can only be assigned to one cluster. Attempting to add the same instance to a second cluster yields an error.
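The one-cluster-per-instance rule can be pictured as a mapping from each instance to a single cluster. The following is a minimal sketch in Python, not SAM's implementation; the ClusterRegistry class, its methods, and the host name are hypothetical and serve only to illustrate the rule.

```python
class ClusterRegistry:
    """Toy model of disjoint clusters: each instance maps to at most one cluster."""

    def __init__(self):
        self._cluster_of = {}  # instance identifier -> cluster name

    def add_instance(self, cluster, instance):
        current = self._cluster_of.get(instance)
        if current is not None and current != cluster:
            # Mirrors the rule above: a second cluster assignment is an error.
            raise ValueError(f"{instance} already belongs to cluster {current!r}")
        self._cluster_of[instance] = cluster

    def instances(self, cluster):
        return sorted(i for i, c in self._cluster_of.items() if c == cluster)


registry = ClusterRegistry()
registry.add_instance("highTraffic", "iris-a.example.com:52773")
try:
    registry.add_instance("lowTraffic", "iris-a.example.com:52773")
except ValueError as err:
    print(err)  # iris-a.example.com:52773 already belongs to cluster 'highTraffic'
```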

Organize Clusters for Related Instances

Grouping instances into SAM clusters allows you to monitor your system with greater precision. If your system contains instances for which you have different performance expectations, organize them into different clusters so that you can apply different sets of alert rules to them. For example, you may organize instances running on servers configured to handle high volumes of traffic into a cluster named highTraffic and other instances into a cluster named lowTraffic. You can then set an alert for when the number of SQL queries a lowTraffic instance receives per second exceeds a certain threshold, but set a higher threshold for alerts on instances in the highTraffic cluster.

Add or Edit a Cluster

To add a SAM cluster, do the following:

  1. Navigate to the main (Monitor Clusters) page in the SAM web application. Selecting System Alerting and Monitoring from anywhere within the application returns you to the main page.

  2. Select + New Cluster to open the Add New Cluster dialog.

  3. Fill in the following fields:

    • Cluster Name — A unique identifier for the cluster. SAM ignores case when it evaluates cluster names. If you attempt to assign a name to a cluster which differs from the name of an existing cluster in case alone (for example, transactarchive and transactArchive), you will receive an error.

    • Description — An optional description for the cluster.

  4. Select Add Cluster. This opens the Edit Cluster dialog, where you can continue to define your new cluster by adding instances and setting alert rules.

    Note:

    You can edit or delete an existing SAM cluster by navigating to the Monitor Clusters page and selecting the edit icon on the card for that cluster.

    When you change the Cluster Name for an existing cluster, the instance metrics that the Prometheus database retained from before the name change remain associated with the previous cluster name. The name change has no effect on the availability of this data to queries that use instance names.
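Because cluster names are compared without regard to case, it can help to check a proposed name against the existing ones before submitting it. The sketch below assumes a simple case-folded comparison; how SAM compares names internally is not described here, and the name_conflicts helper is hypothetical.

```python
def name_conflicts(new_name, existing_names):
    """Return the existing names that match new_name when case is ignored."""
    key = new_name.casefold()
    return [name for name in existing_names if name.casefold() == key]


existing = ["transactArchive", "highTraffic"]
print(name_conflicts("transactarchive", existing))  # ['transactArchive']
print(name_conflicts("lowTraffic", existing))       # []
```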

Define the Instances You Want to Monitor

System Alerting and Monitoring can collect metrics and alerts from any InterSystems IRIS instance version 2020.1 or later. To add an instance to SAM, first prepare the instance for monitoring (as described in the initial setup instructions) and then add it to a SAM cluster.

Note:

There is no specific limit to the number of instances SAM can monitor, but this number is constrained by the available space in the SAM database. The maximum database size for the SAM Community Edition is 10 GB, which is enough to support monitoring of about 40 instances with the default settings.

You can monitor the SAM Manager to ensure the SAM database does not run out of space. If you need to monitor more instances from SAM, consider using a different license.
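The two figures above imply a rough per-instance footprint that can be used for back-of-the-envelope planning. The sketch below assumes that database usage scales linearly with the number of instances, which is a simplification and not a documented guarantee; the function name is hypothetical.

```python
COMMUNITY_DB_LIMIT_GB = 10   # maximum database size stated above
INSTANCES_AT_LIMIT = 40      # approximate capacity with the default settings

# Rough per-instance footprint implied by those two figures.
gb_per_instance = COMMUNITY_DB_LIMIT_GB / INSTANCES_AT_LIMIT


def estimated_db_size_gb(instance_count):
    """Linear estimate of SAM database size; an assumption, not a guarantee."""
    return instance_count * gb_per_instance


print(gb_per_instance)           # 0.25
print(estimated_db_size_gb(25))  # 6.25
```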

Add or Edit an Instance within a Cluster

To add or edit an InterSystems IRIS instance within a SAM cluster, do the following:

  1. Ensure that you have prepared the instance for monitoring by SAM.

  2. Navigate to the main (Monitor Clusters) page in the SAM web application. Clicking System Alerting & Monitoring from anywhere within the application returns you to this page.

  3. Select the cluster to which the instance belongs (or will belong). If there are no clusters, you must create one.

  4. Click Edit Cluster to open the Edit Cluster dialog.

  5. Click the + Add Instance button at the top of the Instances table.

    Note:

    To edit or delete an existing instance, select it from the Instances table.

  6. Fill in the following fields:

    • Host Name – The fully qualified domain name or IP address of the machine hosting the target InterSystems IRIS instance.

      InterSystems recommends using domain names whenever possible, as IP addresses may change.

      Note:

      If the instance you are monitoring is located on the same system as SAM, you may enter host.docker.internal in this field.

    • Web Server Port – The web server port of the target InterSystems IRIS instance.

    • Dashboard — The Grafana dashboard you wish to display on the single instance page for this instance. SAM includes one pre-configured dashboard (SAM Dashboard); you can create others within Grafana.

    • Instance name and Description – Optional text descriptors to help you identify the instance.

    • URL Prefix — An optional URL prefix for the instance address (useful for systems which host multiple instances at the same domain). URL prefixes must include a slash at the beginning, but not at the end.

  7. Click Save to begin monitoring the instance with SAM.
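The rule for the URL Prefix field (a leading slash is required, a trailing slash is not allowed) is easy to get wrong, so a small check can be useful before entering the values. This is a sketch only; the helper names, host name, and port values are hypothetical examples, and the composed string is for illustration rather than the exact address SAM builds.

```python
def validate_url_prefix(prefix):
    """Check the rule above: must begin with a slash and must not end with one."""
    if not prefix:
        return  # the prefix is optional
    if not prefix.startswith("/"):
        raise ValueError("URL prefix must begin with a slash")
    if prefix.endswith("/"):
        raise ValueError("URL prefix must not end with a slash")


def instance_address(host, port, prefix=""):
    """Compose host name, web server port, and optional prefix into one string."""
    validate_url_prefix(prefix)
    return f"{host}:{port}{prefix}"


print(instance_address("iris.example.com", 52773))         # iris.example.com:52773
print(instance_address("iris.example.com", 80, "/iris2"))  # iris.example.com:80/iris2
```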

Set Rules for when to Receive Alerts

System Alerting and Monitoring automatically collects InterSystems IRIS alerts from the instances it monitors. If you want to specify additional events that generate alerts, you can do so by defining Prometheus alert rules.

An alert displays information about the instance that generated it; the time the alert fired; and the alert name, message, and severity. A Prometheus alert rule indicates when SAM should fire an alert by evaluating an alert expression written in Prometheus Query Language (PromQL) using data from your instances collected in real time.

Alert rules are defined at the cluster level, but are evaluated distinctly for each instance within the cluster. This means instances within a cluster share the same alert rules, but generate alerts individually. By sorting your instances into clusters based on your performance expectations for them and setting different alert rules for each cluster, you can calibrate your monitoring strategy more precisely.
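To illustrate how one cluster-level rule produces per-instance alerts, the following sketch applies a single threshold to sample readings from several instances. The instance names and values are hypothetical, and the function only imitates the idea; the actual evaluation is performed by Prometheus.

```python
def firing_instances(samples, threshold):
    """Return the instances whose sample exceeds the shared threshold."""
    return [instance for instance, value in samples.items() if value > threshold]


# Hypothetical iris_cpu_usage readings for three instances in one cluster.
cpu_usage = {"iris-a": 95.0, "iris-b": 42.0, "iris-c": 91.5}

# One rule (threshold 90) is shared by the whole cluster, but each instance
# is checked on its own, so two separate alerts result.
print(firing_instances(cpu_usage, 90))  # ['iris-a', 'iris-c']
```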

Note:

By default, alerts display in the SAM web portal, in Alerts tables. You can specify additional actions for System Alerting and Monitoring to perform when an alert fires, such as sending a text or email. To do so, refer to the instructions for writing and importing custom alert handlers in a later section on this page.

Create an Alert Rule for a Cluster

To create a new alert rule for a cluster, do the following:

  1. Navigate to the main (Monitor Clusters) page in the SAM web application. Clicking System Alerting & Monitoring from anywhere within the application navigates to this page.

  2. Select the cluster for which you would like to create an alert rule. If there are no clusters, you must create one.

  3. Click Edit Cluster to open the Edit Cluster dialog.

  4. Click the + Add Alert Rule button at the top of the Alert Rules table.

    Note:

    You may select an existing alert rule from this table to edit or delete it.

  5. Fill in the following fields:

    • Name – Any name for the alert rule. It is often useful to include the metric the rule uses in the name.

    • Severity – Either Critical or Warning. The severity of the alert determines the impact it will have on the instance state (described in a later section on this page).

      Note:

      You can give multiple alert rules the same name, but different severities. If both rules fire at the same time, SAM suppresses the rule with lower severity. This behavior reduces duplicate alerts firing for the same event.

    • Expression – An expression that defines when the alert fires, written in Prometheus Query Language.

      The alert expression syntax section contains an overview of the Prometheus Query Language syntax and several examples.

    • Message – A text description of the alert rule, which SAM displays when the alert fires.

      Note:

      The Alert message supports the $value variable, which contains the evaluated value of an alert expression. The syntax is:

      {{ $value }}

      The $value variable only holds one value; as such, you should not use it for alert rules that evaluate to multiple values (such as a rule that uses the and operator).

  6. Click Save. SAM validates the alert expression and then adds the alert rule to the cluster.
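The suppression behavior described in the Severity note (two rules with the same name but different severities firing together) can be sketched as follows. This assumes suppression is decided per instance and per rule name, which the text above does not spell out; the data layout and function name are hypothetical.

```python
SEVERITY_RANK = {"Warning": 1, "Critical": 2}


def suppress_lower_severity(firing):
    """Keep only the highest-severity alert for each (instance, name) pair."""
    best = {}
    for alert in firing:
        key = (alert["instance"], alert["name"])
        kept = best.get(key)
        if kept is None or SEVERITY_RANK[alert["severity"]] > SEVERITY_RANK[kept["severity"]]:
            best[key] = alert
    return list(best.values())


firing = [
    {"instance": "iris-a", "name": "HighCPU", "severity": "Warning"},
    {"instance": "iris-a", "name": "HighCPU", "severity": "Critical"},
    {"instance": "iris-b", "name": "HighCPU", "severity": "Warning"},
]
kept = suppress_lower_severity(firing)
print([(a["instance"], a["severity"]) for a in kept])
# [('iris-a', 'Critical'), ('iris-b', 'Warning')]
```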

Below is an example of an alert rule:

[Figure: New Alert Rule dialog. When adding a new alert rule, every field is required.]
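As a text stand-in for an alert rule, the sketch below models the four fields as a dictionary, checks that none is blank, and shows how a {{ $value }} placeholder in the message might be filled in. The field keys and helper functions are hypothetical, and the substitution only imitates the effect; Prometheus performs the real templating.

```python
import re

REQUIRED_FIELDS = ("name", "severity", "expression", "message")
VALUE_PLACEHOLDER = re.compile(r"\{\{\s*\$value\s*\}\}")

# Hypothetical rule; the expression is taken from the examples on this page.
rule = {
    "name": "iris_cpu_usage high",
    "severity": "Warning",
    "expression": 'iris_cpu_usage{cluster="test"}>90',
    "message": "CPU usage is at {{ $value }} percent",
}


def missing_fields(rule):
    """List the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(rule.get(f, "")).strip()]


def render_message(template, value):
    """Substitute the evaluated expression value for {{ $value }}."""
    return VALUE_PLACEHOLDER.sub(lambda _match: str(value), template)


print(missing_fields(rule))                   # []
print(render_message(rule["message"], 93.5))  # CPU usage is at 93.5 percent
```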

Basic Syntax for Alert Expressions

This section provides an overview of how to write alert expressions and some examples.

Note:

If you want to learn how to write advanced alert expressions, read about the full capabilities of PromQL on the “Querying Prometheus” page in the Prometheus Documentation (https://prometheus.io/docs/prometheus/latest/querying/basics/).

A simple alert expression compares a metric to a value. For example:

# Greater than 80 percent of InterSystems IRIS licenses are in use:
iris_license_percent_used{cluster="production"}>80

# There are fewer than 5 active InterSystems IRIS processes:
iris_process_count{cluster="test"}<5

# The disk storing the MYDATA database is over 75% full:
iris_disk_percent_full{cluster="test",id="MYDATA"}>75

# Same as above, but specifying directory instead of database name:
iris_disk_percent_full{cluster="production",dir="/IRIS/mgr/MYDATA"}>75

The basic format for an alert expression based on a single metric is:

metric_name{cluster="cluster_name",label(s)}>value

The parts of this format are:

  • metric_name – The metric that the alert rule uses. See Metric Descriptions in the “Monitoring InterSystems IRIS Using REST API” section of Monitoring Guide for a table of all the default metrics you can use.

  • cluster_name – The cluster to which the alert rule applies. SAM applies the alert rule to all instances in the specified cluster. If any instance in the cluster triggers the rule, SAM generates an alert for that instance.

  • additional_labels (label(s) in the format above) – If a metric contains labels, you may include these after the cluster label. Multiple labels are separated by commas. All metrics must include the cluster label, described above.

  • operator (> in the format above) – The following comparison operators are available:

    • > (greater than) or >= (greater than or equal to)

    • < (less than) or <= (less than or equal to)

    • == (equal to) or != (not equal to)

  • value – The value can be positive or negative, and may include a decimal component.
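The basic format can also be assembled programmatically, which helps keep the cluster label from being forgotten. The following is a minimal sketch; the alert_expression helper is hypothetical and does not validate metric names or escape label values.

```python
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "==", "!="}


def alert_expression(metric_name, cluster_name, operator, value, **labels):
    """Assemble metric_name{cluster="...",label="..."} followed by operator and value."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    # The cluster label always comes first; additional labels follow it.
    parts = [f'cluster="{cluster_name}"']
    parts += [f'{key}="{val}"' for key, val in labels.items()]
    label_text = ",".join(parts)
    return f"{metric_name}{{{label_text}}}{operator}{value}"


print(alert_expression("iris_license_percent_used", "production", ">", 80))
# iris_license_percent_used{cluster="production"}>80
print(alert_expression("iris_disk_percent_full", "test", ">", 75, id="MYDATA"))
# iris_disk_percent_full{cluster="test",id="MYDATA"}>75
```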

Alert Examples

The examples which follow demonstrate some of the capabilities of PromQL.

Example 1: Basic Alert Rule

The simplest alert expressions directly compare a single metric to a value.

The following alert expression evaluates iris_cpu_usage, which measures the total percent of CPU in use on the machine running InterSystems IRIS. If the value of iris_cpu_usage exceeds 90 for any InterSystems IRIS instance in the test cluster, the alert fires.

iris_cpu_usage{cluster="test"}>90

Example 2: Arithmetic Operators

PromQL supports the following arithmetic operators, ordered by precedence:

  1. ^ (exponentiation)

  2. * (multiplication), / (division), % (modulo)

  3. + (addition), - (subtraction)

Arithmetic operators are particularly useful when writing an alert expression that contains two or more metrics.

The following expression is triggered when the USER database in the test cluster is more than 90 percent full. The expression calculates the percentage by dividing the database size (iris_db_size_mb) by the database maximum size (iris_db_max_size_mb) and multiplying the result by 100.

(iris_db_size_mb{cluster="test",id="USER"}/iris_db_max_size_mb{cluster="test",id="USER"})*100>90
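The arithmetic in this expression can be checked by hand with sample numbers. The sketch below reproduces the same calculation in Python; the two sizes are hypothetical values chosen for illustration, not readings from a real instance.

```python
def percent_full(size_mb, max_size_mb):
    """Mirror (iris_db_size_mb / iris_db_max_size_mb) * 100."""
    return (size_mb / max_size_mb) * 100


# Hypothetical sample values for the USER database.
size_mb, max_size_mb = 1920, 2048
used = percent_full(size_mb, max_size_mb)
print(used)       # 93.75
print(used > 90)  # True
```

With these numbers the left side of the comparison is 93.75, so the alert would fire for that instance.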

Example 3: Logical OR Operator

PromQL supports logical operators for writing more complex rules. When using the or operator, the expression evaluates two conditions and fires if either is true.

One use for the or operator is to check whether a metric falls outside of a certain range. The following alert expression is triggered when either of the following conditions is true:

  • There are more than 20 active ECP connections in the production cluster.

  • There is fewer than one active ECP connection in the production cluster.

iris_ecp_conn{cluster="production"}<1 or iris_ecp_conn{cluster="production"}>20
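The range check expressed with or can be tried out on a few sample values to see where the boundaries fall. This is a sketch of the comparison logic only, with a hypothetical helper name; it does not query Prometheus.

```python
def outside_range(value, low, high):
    """True when value < low or value > high, as in the expression above."""
    return value < low or value > high


for connections in (0, 1, 12, 20, 21):
    print(connections, outside_range(connections, 1, 20))
# 0 True
# 1 False
# 12 False
# 20 False
# 21 True
```

The boundary values 1 and 20 do not trigger the alert because the expression uses the strict operators < and >.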

Example 4: Logical AND Operator

PromQL also supports the and operator. When using the and operator, the expression evaluates two conditions and fires if both are true.

The following example shows an alert rule that fires when both conditions are true:

  • There are unread alerts in the test cluster.

  • The system health state of an instance in the test cluster is something other than 0.

iris_system_alerts_new{cluster="test"}>=1 and iris_system_monitor_health_state{cluster="test"}!=0
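In PromQL, and keeps only those results from the left-hand side that have a result with exactly matching labels on the right-hand side, so both conditions must hold for the same instance. The sketch below imitates that matching with dictionaries keyed by instance; the instance names and values are hypothetical, and real label sets contain more than the instance name.

```python
def vector_and(left, right):
    """Keep left-hand samples whose label key also appears on the right."""
    return {labels: value for labels, value in left.items() if labels in right}


# Hypothetical samples that already passed each comparison, keyed by instance.
new_alerts = {"iris-a": 2, "iris-b": 1}  # iris_system_alerts_new >= 1
unhealthy = {"iris-b": 2, "iris-c": 1}   # iris_system_monitor_health_state != 0

print(vector_and(new_alerts, unhealthy))  # {'iris-b': 1}
```

Only iris-b satisfies both conditions, so only that instance would generate an alert.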