
System Alerting and Monitoring Guide

System Alerting and Monitoring (SAM) is a cluster monitoring solution for InterSystems IRIS® data platform version 2020.1 and later. Whether your application runs on a local mirrored pair or in the cloud with multiple application and data servers, you can use SAM to monitor your application.

Group your InterSystems IRIS instances into one or more clusters, then observe a high-level summary of your application performance. Use the SAM web portal to view real-time performance metrics for your clusters and instances. Keep an eye out for the alerts SAM sends when a metric crosses a user-defined threshold. SAM packages all your monitoring needs behind an easy-to-use web interface.

The core System Alerting and Monitoring application is built on an InterSystems IRIS instance called the SAM Manager. Using cloud-native software, the SAM Manager collects and stores performance data and alerts from the InterSystems IRIS instances you care about.

Note:

You can also interface with SAM using its REST API. For details, see the System Alerting and Monitoring API Reference.

Deploying SAM

This section covers the following topics:

SAM Component Breakdown

System Alerting and Monitoring is made up of multiple open-source technologies, augmenting their features with enterprise resiliency. The SAM application consists of the following containers:

  • Alertmanager v0.20.0

  • Grafana v6.7.1

  • Nginx 1.17.9

  • Prometheus v2.17.1

  • SAM Manager 1.0.0.115

Each container performs a different role in the SAM application. Prometheus, an efficient cloud-native monitoring tool, collects time series data at a regular interval from all your target InterSystems IRIS instances. The SAM Manager stores these metrics, enabling high availability and scalability features not present in default Prometheus databases. Grafana, a world-class metrics visualization tool, presents these metrics in graphs that make it easy to examine the state of your application. The Alertmanager aggregates InterSystems IRIS alerts, which are pre-configured on the target instances, and Prometheus alerts, which you configure from the SAM web application.

These containers communicate over the Nginx web server, which is set to port 8080 by default. When trying to access any component of SAM, such as Grafana or the SAM Manager, do so using the Nginx port. Nginx also serves the SAM web application, which provides a graphical interface for configuring SAM and monitoring your instances.

Docker Compose makes it possible to run all these containers simultaneously. When you run SAM, Docker Compose starts each of these containers, which are listed in the SAM docker-compose.yml file.

Note:

For more information about the benefits of containerized applications, see Why Containers? in Running InterSystems Products in Containers.

First-Time Preparations

The first time you deploy System Alerting and Monitoring, perform the following steps to prepare your machine:

Ensure Docker Compose is installed

For instructions, see “Install Docker Compose” in the Docker documentation (https://docs.docker.com/compose/install/). The following versions are required:

  • Docker Engine version 19.03.098 or higher

  • Docker Compose version 1.25 or higher

Acquire System Alerting and Monitoring distribution files

InterSystems provides several files that define the container configuration necessary for SAM. These files include the following:

  • config/ – A directory that contains settings for the SAM application.

  • docker-compose.yml – A file that defines the SAM Components, which Docker Compose uses to deploy SAM.

  • readme.txt – A brief text document for getting started.

  • start.sh, stop.sh – Scripts to facilitate starting and stopping SAM.

You can obtain these files from either:

Unzip the distribution tarball

If you obtain the distribution files as a tarball, use the following command to uncompress it while preserving permissions:

tar zpxvf sam-<version-number>-unix.tar.gz

Replace <version-number> with the version of the SAM tarball you have.

Set up your license key

SAM comes with a free, built-in Community Edition license with the capacity to monitor approximately 40 instances. If you are using the Community Edition license, you may skip this step.

Note:

The Community Edition license for SAM limits its container to using eight cores.

To specify a different SAM license, you must edit the docker-compose.yml file obtained in the previous step. To do so:

  1. Open docker-compose.yml in a text editor.

  2. Locate the iris service in the docker-compose.yml file.

  3. Add a new command directly beneath the image line that specifies the desired license key to use.

    For example, with a key named iris.key located in the SAM /config directory, add:

    [...]
    image: intersystems/sam:1.0.0.100
    command: ["--key","/config/iris.key"]
    init: true
    [...]

Configure your firewall for System Alerting and Monitoring

By default, SAM deploys on port 8080 of the host system. On Linux machines, you can check whether port 8080 is accepting connections by using the netcat (nc) command:

$ nc -zv localhost 8080
Connection to localhost 8080 port [tcp/http-alt] succeeded!
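
The same check can be scripted. The following is a minimal Python sketch, not part of the SAM distribution; the function name port_accepts_connections is hypothetical. Like nc -zv, it only reports whether something accepts TCP connections on the given host and port.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError when the port is closed or filtered.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_accepts_connections("localhost", 8080) checks the default SAM port on the local machine.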

Change the default System Alerting and Monitoring port

If necessary, you can change the host port mapping in the nginx section of the docker-compose.yml file. To do so:

  1. Open docker-compose.yml in a text editor.

  2. Locate the nginx service in the docker-compose.yml file.

  3. In the ports section, enter the desired port on your host machine. For example, if you would like to access SAM on port 9999, edit the section to look like:

    [...]
    ports:
       - 9999:8080
    [...]

For more information, see the “ports” section of the Docker Compose File Reference (https://docs.docker.com/compose/compose-file/#ports).

Starting and Stopping SAM

InterSystems provides two scripts that make it easy to start or stop System Alerting and Monitoring.

To start SAM:

  1. Using the cd command in the command line, navigate to the directory containing the SAM docker-compose.yml file, which was acquired during initial setup.

  2. Next, run the start.sh script:

    ./start.sh

    This runs a Docker Compose command to start the SAM application.

  3. Optionally, you can use the docker ps command to confirm that all the containers are running. The output should look similar to the following:

    $ docker ps
    CONTAINER ID   IMAGE                       COMMAND                  CREATED             STATUS                       PORTS                                                  NAMES
    2aaa06f06a9c   nginx:1.17.9-alpine         "nginx -g 'daemon of..." About an hour ago   Up About an hour             80/tcp, 0.0.0.0:8080->8080/tcp                         sam_nginx_1
    0e2b30fcb376   grafana/grafana:6.7.1       "/run.sh"                About an hour ago   Up About an hour             3000/tcp                                               sam_grafana_1
    d2c825f9d220   prom/alertmanager:v0.20.0   "/bin/alertmanager -..." About an hour ago   Up About an hour             9093/tcp                                               sam_alertmanager_1 
    4851893bc369   prom/prometheus:v2.17.1     "/bin/prometheus --w..." About an hour ago   Up About an hour             9090/tcp                                               sam_prometheus_1
    61120be391df   intersystems/sam:1.0.0.83   "/iris-main"             About an hour ago   Up About an hour (healthy)   2188/tcp, 51773/tcp, 52773/tcp, 53773/tcp, 54773/tcp   sam_iris_1
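
If you want to automate this check, the sketch below shows one possible approach in Python. It assumes the container names follow the sam_<service>_1 pattern shown in the sample output above (the prefix depends on the directory SAM runs from), and that you pass it the text printed by docker ps --format "{{.Names}}". The function name is hypothetical.

```python
from typing import List

# Service names taken from the sample `docker ps` output above.
EXPECTED_SERVICES = ("nginx", "grafana", "alertmanager", "prometheus", "iris")


def missing_sam_containers(names_output: str) -> List[str]:
    """Return the SAM services that have no running container.

    names_output is the text printed by: docker ps --format "{{.Names}}"
    """
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    missing = []
    for service in EXPECTED_SERVICES:
        # Assumption: containers are named sam_<service>_<n>.
        if not any(name.startswith(f"sam_{service}_") for name in running):
            missing.append(service)
    return missing
```

An empty list means every expected container was found in the output.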

Once SAM is up and running, you can access it from a web browser or using the SAM API.

To stop SAM:

  1. Using the cd command in the command line, navigate to the directory containing the SAM docker-compose.yml file.

  2. Next, run the stop.sh script:

    ./stop.sh

    This runs a Docker Compose command to stop the SAM application.

  3. Optionally, you can use the docker ps command to confirm that all the containers have stopped. Use the -a flag to view all containers, even those that are not running:

    docker ps -a

Accessing SAM from a Web Browser

When System Alerting and Monitoring is running, you can access it from a web browser at the following address:

http://<sam-domain-name>:<port>/api/sam/app/index.csp

where <sam-domain-name> is the DNS name or IP address of the system SAM is running on, and <port> is the configured Nginx port (8080 by default). You may want to bookmark this address.
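
As a small illustration of how the address is assembled, the Python sketch below builds the SAM web application address, along with the SAM Manager address given in the Tuning the SAM Manager section, from a host name and port. The function name and the example host are hypothetical.

```python
def sam_urls(host: str, port: int = 8080) -> dict:
    """Build the SAM addresses for a host and Nginx port (8080 by default)."""
    base = f"http://{host}:{port}"
    return {
        "web_app": f"{base}/api/sam/app/index.csp",
        "sam_manager": f"{base}/csp/sys/UtilHome.csp",
    }
```

For example, sam_urls("sam.example.com", 9999) returns the addresses to use after changing the host port mapping to 9999.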

When accessing SAM, you must log in using a valid User Name and Password. Like InterSystems IRIS, SAM includes several predefined accounts with the default password SYS. Choose any of these accounts with login permissions (such as Admin or SuperUser) and log in using the default password SYS.

The first time you sign in with one of the predefined accounts, SAM prompts you to enter a new password. To secure the SAM application, be sure to set a new password for all the predefined accounts. For a list of all the predefined accounts, see Predefined User Accounts in the “Users” chapter of the Security Administration Guide.

Advanced Configuration

System Alerting and Monitoring is designed to begin monitoring InterSystems IRIS instances with minimal setup. Each SAM component is configured by the docker-compose.yml file, reducing the necessary setup work.

While the default SAM configuration is usually sufficient, it is possible to adjust the docker-compose.yml file settings. This section describes configuration changes you may want to consider.

When constantly running queries for data older than two hours, increase the Prometheus retention time.

SAM includes two databases: the Prometheus database (used for short-term metrics storage) and an InterSystems IRIS database (used for longer-term storage). The Prometheus database retains data for two hours in a cache optimized for rapid querying, while the InterSystems IRIS database retains the data for long-term analysis.

If you constantly run queries for data older than two hours, increasing the Prometheus retention time may increase performance. Adjust this setting by changing the “--storage.tsdb.retention.time” flag in the docker-compose.yml file. For more information, see “Operational aspects” in the Prometheus documentation (https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects).

Setting Up SAM

Once System Alerting and Monitoring is deployed, you can specify the InterSystems IRIS instances you want to monitor, which must be grouped into SAM clusters. You can also perform additional setup actions in order to maximize the utility and performance of SAM.

These actions can be performed as necessary to establish your desired System Alerting and Monitoring configuration:

Creating a New Cluster

Within System Alerting and Monitoring, you must group InterSystems IRIS instances into clusters, which are unique sets of instances. Once you have created a cluster, you can view the alerts and statuses of all instances within the cluster.

To create a new cluster:

  1. Navigate to the main System Alerting and Monitoring page. Clicking System Alerting & Monitoring from anywhere within the SAM application navigates to the main page.

  2. Open the Add New Cluster dialog. To do this, click the + New Cluster button (if this is the first cluster, click Create Your First Cluster instead).

  3. Fill in the following information about your cluster:

    • Cluster name — The name can be any combination of numbers and letters. Cluster names must be unique.

      Note:

      Cluster names are converted to lowercase before they are saved. This means it is not possible to have two clusters with the same name but different casing.

    • Description (optional)

  4. Click Add Cluster to create the cluster.

After creating a cluster, SAM immediately displays the Edit Cluster dialog, allowing you to continue to define the cluster.
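
The naming rule above has a practical consequence: two names that differ only in casing collide. The following Python sketch models that rule for planning purposes. It is an illustration of the documented behavior, not SAM's own code, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def conflicting_cluster(new_name: str, existing: Iterable[str]) -> Optional[str]:
    """Return the existing cluster name that new_name would collide with, if any.

    Models the rule that cluster names are converted to lowercase before saving.
    """
    wanted = new_name.lower()
    for name in existing:
        if name.lower() == wanted:
            return name
    return None
```

For example, conflicting_cluster("Production", ["production", "test"]) reports a collision with the existing production cluster.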

Adding an Instance to SAM

System Alerting and Monitoring can collect metrics and alerts from any InterSystems IRIS instance version 2020.1 or higher. To add an instance to SAM, first prepare the instance and then add it to a SAM cluster.

Note:

There is no specific limit to the number of instances SAM can monitor, but this number is constrained by the available memory in the SAM database. The maximum database size for the SAM Community Edition is 10GB, which is enough to support monitoring of about 40 instances with the default settings.

You can monitor the SAM Manager to ensure the SAM database does not run out of space. If you need to monitor more instances from SAM, consider using a different license.

Preparing the Instance for Monitoring

In order for SAM to monitor an instance, the following must be true:

The /api/monitor endpoint allows unauthenticated access

Each InterSystems IRIS instance version 2020.1 or higher contains the /api/monitor web application. In order for SAM to collect metrics and alerts from the /api/monitor endpoint, the endpoint must allow for unauthenticated access. To make sure this is the case:

  1. Open the Management Portal of the InterSystems IRIS instance you would like to add to SAM.

  2. Go to the Web Applications page (System Administration > Security > Applications > Web Applications).

  3. Select /api/monitor to open the Edit Web Application page.

  4. In the Security Settings section, select Unauthenticated.

For more information about the /api/monitor web application, see Monitoring InterSystems IRIS Using REST API in Monitoring Guide.
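
To confirm that the endpoint answers without credentials, you can request it from a script. The sketch below is one possible check in Python. It assumes the metrics are served at /api/monitor/metrics in the plain-text format that Prometheus scrapes; that path is not stated in this section, so verify it against the Monitoring Guide. The function names are hypothetical.

```python
import urllib.request


def fetch_metrics_text(host: str, port: int, timeout: float = 5.0) -> str:
    """Request the metrics page without credentials.

    urlopen raises urllib.error.HTTPError if the server answers with an
    error status, for example when authentication is still required.
    """
    url = f"http://{host}:{port}/api/monitor/metrics"  # assumed path
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse simple 'name{labels} value' lines into a dictionary of floats."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.rpartition(" ")
        if not name:
            continue  # not a 'name value' pair
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # skip lines whose last field is not a number
    return metrics
```

Calling parse_metrics(fetch_metrics_text(host, port)) with the IP and Port you plan to enter in SAM gives a quick view of what the instance exposes.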

Necessary monitoring tools are enabled for the instance

InterSystems IRIS instances have built-in monitoring tools, and SAM uses these tools to collect information about the instance. Check that the following tools are enabled on the InterSystems IRIS instance:

  • System Monitor, which SAM uses when determining the instance state. By default, System Monitor is enabled.

  • Log Monitor, which enables SAM to see alerts from the instance. By default, Log Monitor is enabled and writes alerts to the alerts log.

    Important:

    SAM is only able to view instance alerts if Log Monitor writes them to the alerts log. If Log Monitor sends alerts by email instead of by writing them to the alerts log, SAM cannot view alerts for the instance.

The instance has a unique IP and Port combination

In order for SAM to monitor an InterSystems IRIS instance, the instance must be uniquely identifiable by an IP (or domain name) and Port.

Note:

SAM does not support connecting to an InterSystems IRIS instance with a URL prefix; URL prefixes are most common when multiple InterSystems IRIS instances are located on the same system.

For more information about configuring multiple InterSystems IRIS instances on the same system and URL prefixes, see the Connecting to Remote Servers topic in System Administration Guide.

Adding the Instance to a Cluster

To add an InterSystems IRIS instance to a SAM cluster, do the following:

  1. Ensure that the instance has been prepared to be monitored by SAM.

  2. Navigate to the main System Alerting and Monitoring page. Clicking System Alerting & Monitoring from anywhere within the SAM application navigates to the main page.

  3. Select the cluster to which you would like to add the instance. If there are no clusters, you must create one.

  4. Click Edit Cluster to open the Edit Cluster dialog.

  5. Click the +New button at the top of the Instances table.

    Note:

    You may select an existing instance from this table to edit or delete it.

  6. Fill in the following fields:

    • IP – The fully qualified domain name or IP address of the machine hosting the target InterSystems IRIS instance.

      InterSystems recommends using domain names whenever possible, as IP addresses may change.

      Note:

      If the instance you are monitoring is located on the same system as SAM, you may enter host.docker.internal in this field.

    • Port – The web server port of the target InterSystems IRIS instance.

    • Cluster – The cluster to add the target instance to. When first adding an instance, this defaults to the current cluster.

      Note:

      Moving an instance between clusters may produce temporary irregularities in the reported alerts, dashboard, and state for that instance. These irregularities should resolve within a few hours.

    • Instance name and Description – Optional text descriptors to help you identify the instance.

  7. Click Add Instance to begin monitoring the instance with SAM.

Defining Cluster Alert Rules

System Alerting and Monitoring automatically collects InterSystems IRIS alerts from the instances it monitors. If you want to specify additional events that generate alerts, you can do so by defining Prometheus alert rules.

An alert displays information about the instance that generated it; the time the alert fired; and the alert name, message, and severity. A Prometheus alert rule indicates when SAM should fire an alert.

Alert rules are defined on a cluster level, but evaluated distinctly for each instance within the cluster. This means instances within a cluster share the same alert rules, but generate alerts individually.

To create a new alert rule for a cluster, do the following:

  1. Navigate to the main System Alerting and Monitoring page. Clicking System Alerting & Monitoring from anywhere within the SAM application navigates to the main page.

  2. Select the cluster for which you would like to create an alert rule. If there are no clusters, you must create one.

  3. Click Edit Cluster to open the Edit Cluster dialog.

  4. Click the +New button at the top of the Alert Rules table.

    Note:

    You may select an existing alert rule from this table to edit or delete it.

  5. Fill in the following fields:

    • Alert rule name – Any name for the alert rule. It is often useful to include the metric the rule uses in the name.

    • Alert severity – Either Critical or Warning. The severity of the alert determines the impact it will have on the instance state; see Understanding Instance State for more details.

      Note:

      You can give multiple alert rules the same name, but different severities. If both rules fire at the same time, SAM suppresses the rule with lower severity. This behavior reduces duplicate alerts firing for the same event.

    • Alert expression – An expression that defines when the alert fires, written in Prometheus Query Language.

      The Alert Expression Syntax section below contains an overview of the Prometheus Query Language syntax and several examples.

    • Alert message – A text description of the alert rule, which SAM displays when the alert fires.

      Note:

      The Alert message supports the $value variable, which contains the evaluated value of an alert expression. The syntax is:

      {{ $value }}

      The $value variable only holds one value; as such, you should not use it for alert rules that evaluate to multiple values (such as a rule that uses the and operator).

  6. Click Add Alert Rule. SAM validates the alert expression and then adds the alert rule to the cluster.
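
The note on alert severity describes how SAM suppresses the lower-severity rule when two rules with the same name fire together. The Python sketch below illustrates that behavior on sample data. It is not SAM's implementation; the dictionary keys and the choice to compare alerts per instance are assumptions made for the example.

```python
SEVERITY_RANK = {"warning": 1, "critical": 2}


def suppress_lower_severity(fired: list) -> list:
    """Keep, for each (instance, alert name) pair, only the most severe alert."""
    best = {}
    for alert in fired:
        key = (alert["instance"], alert["name"])
        current = best.get(key)
        if (current is None
                or SEVERITY_RANK[alert["severity"]] > SEVERITY_RANK[current["severity"]]):
            best[key] = alert
    return list(best.values())
```

With a Warning and a Critical rule that share a name and fire for the same instance, only the Critical alert remains in the result.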

Below is an example of an alert rule:

When adding a New Alert Rule, every field is required.

Alert Expression Syntax

To write an alert expression, you must use Prometheus Query Language (PromQL). This section provides an overview of how to write alert expressions and some examples.

Note:

If you want to learn how to write advanced alert expressions, read about the full capabilities of PromQL on the “Querying Prometheus” page in the Prometheus Documentation (https://prometheus.io/docs/prometheus/latest/querying/basics/).

A simple alert expression compares a metric to a value. For example:

# Greater than 80 percent of InterSystems IRIS licenses are in use:
iris_license_percent_used{cluster="production"}>80

# There are fewer than 5 active InterSystems IRIS processes:
iris_process_count{cluster="test"}<5

# The disk storing the MYDATA database is over 75% full:
iris_disk_percent_full{cluster="test",id="MYDATA"}>75

# Same as above, but specifying directory instead of database name:
iris_disk_percent_full{cluster="production",dir="/IRIS/mgr/MYDATA"}>75

The basic format for an alert expression based on a single metric is:

metric_name{cluster="cluster_name",label(s)}>value

  • metric_name – The metric that the alert rule uses. See Metric Descriptions in the “Monitoring InterSystems IRIS Using REST API” section of Monitoring Guide for a table of all the default metrics you can use.

  • cluster_name – The cluster to which the alert rule applies. SAM applies the alert rule to all instances in the specified cluster. If any instance in the cluster triggers the rule, SAM generates an alert for that instance.

  • additional_labels – If a metric contains labels, you may include these after the cluster label. Multiple labels are separated by commas. All metrics must include the cluster label, described above.

  • operator – The following comparison operators are available:

    • > (greater than) or >= (greater than or equal to)

    • < (less than) or <= (less than or equal to)

    • == (equal to) or != (not equal to)

  • value – The value can be positive or negative, and may include a decimal component.
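
If you generate alert expressions from a script, the basic format can be assembled mechanically. The Python sketch below is one way to do that. The helper name alert_expression is hypothetical, and SAM still validates the expression when you add the alert rule.

```python
OPERATORS = {">", ">=", "<", "<=", "==", "!="}


def alert_expression(metric: str, cluster: str, operator: str, value, **labels) -> str:
    """Build metric_name{cluster="...",label="..."}<operator><value>."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    # The cluster label always comes first; extra labels follow, comma-separated.
    parts = [f'cluster="{cluster}"']
    parts += [f'{name}="{val}"' for name, val in labels.items()]
    return metric + "{" + ",".join(parts) + "}" + operator + str(value)
```

For example, alert_expression("iris_disk_percent_full", "test", ">", 75, id="MYDATA") reproduces the third expression in the examples above.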

Alert Examples

Below are several examples of alert expressions that demonstrate some of the capabilities of PromQL.

Example 1: Basic Alert Rule

The simplest alert expression directly compares a single metric to a value.

The following alert expression evaluates iris_cpu_usage, which measures the total percent of CPU in use on the machine running InterSystems IRIS. If the value of iris_cpu_usage exceeds 90 for any InterSystems IRIS instance in the test cluster, the alert fires.

iris_cpu_usage{cluster="test"}>90

Example 2: Arithmetic Operators

PromQL supports the following arithmetic operators, ordered by precedence:

  1. ^ (exponentiation)

  2. * (multiplication), / (division), % (modulo)

  3. + (addition), - (subtraction)

Arithmetic operators are particularly useful when writing an alert expression that contains two or more metrics.

The following expression is triggered when the USER database in the test cluster is greater than 90 percent full. The expression calculates the percent by dividing the database size (iris_db_size_mb) by the database maximum size (iris_db_max_size_mb).

(iris_db_size_mb{cluster="test",id="USER"}/iris_db_max_size_mb{cluster="test",id="USER"})*100>90
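
To see the arithmetic with concrete numbers, the Python sketch below repeats the calculation outside of PromQL. The sample sizes are made up for illustration, and the function names are hypothetical.

```python
def db_percent_full(size_mb: float, max_size_mb: float) -> float:
    """Same arithmetic as the expression: (size / max size) * 100.

    max_size_mb must be non-zero.
    """
    return (size_mb / max_size_mb) * 100


def rule_fires(size_mb: float, max_size_mb: float, threshold: float = 90.0) -> bool:
    """True when the computed percentage exceeds the threshold."""
    return db_percent_full(size_mb, max_size_mb) > threshold
```

With a database of 920 MB and a maximum size of 1000 MB, the percentage is above 90 and the rule fires; at 450 MB it does not.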

Example 3: Logical OR Operator

PromQL supports logical operators for writing more complex rules. When using the or operator, the expression evaluates two conditions and fires if either is true.

One use for the or operator is to check whether a metric falls outside of a certain range. The following alert expression is triggered when either of the following conditions is true:

  • There are greater than 20 active ECP connections in the production cluster.

  • There is less than one active ECP connection in the production cluster.

iris_ecp_conn{cluster="production"}<1 or iris_ecp_conn{cluster="production"}>20

Example 4: Logical AND Operator

PromQL also supports the and operator. When using the and operator, the expression evaluates two conditions and fires if both are true.

The following example shows an alert rule that fires when both conditions are true:

  • There are unread alerts in the test cluster.

  • The system health state of an instance in the test cluster is something other than 0.

iris_system_alerts_new{cluster="test"}>=1 and iris_system_monitor_health_state{cluster="test"}!=0
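
In PromQL, the or and and operators combine sets of time series rather than single true/false values, which is why an alert expression is evaluated separately for each instance. The Python sketch below models Examples 3 and 4 with one sample per instance. It simplifies series matching to the instance address alone, and all sample values are invented.

```python
def keep_if(samples: dict, predicate) -> dict:
    """Model a comparison such as metric>20: keep only the series that pass."""
    return {inst: value for inst, value in samples.items() if predicate(value)}


def promql_and(left: dict, right: dict) -> dict:
    """Keep left-hand series that also have a matching series on the right."""
    return {inst: value for inst, value in left.items() if inst in right}


def promql_or(left: dict, right: dict) -> dict:
    """Keep all left-hand series, plus right-hand series not already present."""
    merged = dict(right)
    merged.update(left)
    return merged


# Example 3: ECP connection count outside the range 1 to 20.
ecp_conn = {"10.0.0.24:9092": 0, "10.0.0.25:9092": 10, "10.0.0.26:9092": 25}
out_of_range = promql_or(keep_if(ecp_conn, lambda v: v < 1),
                         keep_if(ecp_conn, lambda v: v > 20))

# Example 4: unread alerts and a health state other than 0.
alerts_new = {"10.0.0.24:9092": 2, "10.0.0.25:9092": 0, "10.0.0.26:9092": 1}
health_state = {"10.0.0.24:9092": 2, "10.0.0.25:9092": 1, "10.0.0.26:9092": 0}
firing = promql_and(keep_if(alerts_new, lambda v: v >= 1),
                    keep_if(health_state, lambda v: v != 0))
```

Here only 10.0.0.24:9092 satisfies both conditions of Example 4, so it is the only instance in firing, while out_of_range contains the two instances with 0 and 25 connections.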

Adjusting Configuration Settings

The main System Alerting and Monitoring page contains a gear icon, located near the top of the screen. Click this icon to access the Configuration Settings dialog.

From this dialog, you can set the number of days (between 1 and 30) for SAM to store alert and metric data.

Note:

The Advanced Configuration section describes how to change other SAM settings.

Tuning the SAM Manager

The SAM Manager is the InterSystems IRIS instance that powers the System Alerting and Monitoring application. You can open the SAM Manager from a web browser using the following address:

http://<sam-domain-name>:<port>/csp/sys/UtilHome.csp

where <sam-domain-name> is the DNS name or IP address of the system SAM is running on, and <port> is the configured Nginx port (8080 by default).

Important:

The SAM Manager should not be used to develop or run any application; it is strictly for use by SAM. This section describes the appropriate uses and interactions with the SAM Manager.

For a general-purpose InterSystems IRIS instance, see the InterSystems IRIS Community Edition.

You can do the following actions with the SAM Manager:

Adjusting Startup Settings

The SAM Manager initially allocates memory on startup as follows:

  • 2,000 MB of 8KB blocks for the database cache

  • 300 MB for the routines cache

This allocation should be sufficient when monitoring a modest number (30 or fewer) of InterSystems IRIS instances. If you are monitoring a large number of instances, or find that the SAM Manager is regularly using the full amount of allocated memory, you can increase these limits.

For details on adjusting these settings, see the Allocating Memory to the Database and Routine Caches topic in the System Administration Guide.

Clearing the SAM Database

System Alerting and Monitoring Community Edition has a maximum database size limit of 10 GB. If this limit is met, SAM may exhibit unexpected behavior, and it becomes necessary to clear the database.

In the SQL page of the SAM Manager (System Explorer > SQL), enter the following command to delete all SAM metric data:

DELETE FROM %SAM.PrometheusSample

To prevent the SAM database from filling up again, consider using a different license or lowering the number of days that SAM stores metrics (as described in Adjusting Configuration Settings).

Monitoring the SAM Manager

It is possible to use System Alerting and Monitoring to monitor the SAM Manager, as the SAM Manager is itself an InterSystems IRIS instance. This allows you to keep track of whether the SAM Database is at risk of filling up, and make sure the configured cache sizes are sufficient for SAM operations.

Monitoring the SAM Manager is similar to monitoring any other instance, as described in the Adding an Instance to SAM section, with the following difference:

For the IP and Port fields, specify the fully qualified DNS name and port (8080 by default) where SAM runs. You can see these values in the address bar of your browser when accessing SAM. For example, if the URL for SAM is:

http://<sam-domain-name>:<port>/api/sam/app/index.csp

Specify <sam-domain-name> in the IP field, and <port> in the Port field.

Note:

Specifying localhost in the IP field does not work; you must enter a fully qualified DNS name.
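
The mapping from the SAM address to the IP and Port fields can be sketched in Python as follows. The function name is hypothetical, and the example only mirrors the rules stated above, including the restriction on localhost.

```python
from urllib.parse import urlsplit


def instance_fields_from_url(sam_url: str) -> tuple:
    """Return the (IP field, Port field) values for a SAM address."""
    parts = urlsplit(sam_url)
    host = parts.hostname
    if not host:
        raise ValueError("the URL does not contain a host name")
    if host == "localhost":
        raise ValueError("enter a fully qualified DNS name instead of localhost")
    # An http URL without an explicit port uses port 80.
    port = parts.port if parts.port is not None else 80
    return host, port
```

For example, instance_fields_from_url("http://sam.example.com:8080/api/sam/app/index.csp") yields sam.example.com for the IP field and 8080 for the Port field.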

Adding Alert Handlers

You can create alert handlers that specify additional actions for System Alerting and Monitoring to perform when an alert fires, such as sending a text or email. Setting up an alert handler is a two-step process:

  1. Writing the Alert Handler

  2. Importing the Alert Handler

Writing the Alert Handler

To create an alert handler, you must create a class using an ObjectScript IDE. Connect this IDE to an InterSystems IRIS instance that is not part of SAM.

Important:

You cannot use the SAM Manager to create the alert handler, as SAM is not a development platform.

Instead, you must connect the IDE to a different InterSystems IRIS instance (such as the InterSystems IRIS Community Edition), and later import the alert handler into the SAM Manager.

After setting up the IDE, create a class with the following characteristics:

  • The class extends the %SAM.AbstractAlertsHandler class.

  • The class implements the HandleAlerts() class method. Within this method, specify the desired behavior when an alert fires.

When SAM detects a new alert (or multiple new alerts), SAM calls the HandleAlerts() method of all alert handlers. The HandleAlerts() method receives a %DynamicArray packet of alerts with the following format:

[
  {
    "labels":{
      "alertname":"High CPU Usage",
      "cluster":"1",
      "instance":"10.0.0.24:9092",
      "job":"SAM",
      "severity":"critical"
    },
    "annotations":{
      "description":"CPU usage exceeded the 95% threshold."
    },
    "ts": "2020-04-17 18:07:42.536"
  },
  {
    "labels":{
      "alertname":"iris_system_alert",
      "cluster":"1",
      "instance":"10.0.0.24:9092",
      "job":"SAM",
      "severity":"critical"
    },
    "annotations":{
      "description":"Previous system shutdown was abnormal, system forced down or crashed"
    },
    "ts":"2020-04-17 18:07:36.926"
  }
]

Note:

Alerts generated by InterSystems IRIS are all named iris_system_alert.
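
Before writing the handler, it can help to explore the packet structure in a scripting language. The Python sketch below parses a shortened copy of the packet shown above and summarizes each alert. It only illustrates the data format: the handler itself must be an ObjectScript class, and the function name and the "prometheus rule" label are invented for the example.

```python
import json
from datetime import datetime

# Shortened copy of the sample packet above.
PACKET = """[
  {"labels": {"alertname": "High CPU Usage", "cluster": "1",
              "instance": "10.0.0.24:9092", "job": "SAM", "severity": "critical"},
   "annotations": {"description": "CPU usage exceeded the 95% threshold."},
   "ts": "2020-04-17 18:07:42.536"},
  {"labels": {"alertname": "iris_system_alert", "cluster": "1",
              "instance": "10.0.0.24:9092", "job": "SAM", "severity": "critical"},
   "annotations": {"description": "Previous system shutdown was abnormal"},
   "ts": "2020-04-17 18:07:36.926"}
]"""


def summarize(packet_json: str) -> list:
    """Return (timestamp, source, severity, description) tuples, oldest first."""
    rows = []
    for alert in json.loads(packet_json):
        labels = alert["labels"]
        # Alerts generated by InterSystems IRIS are all named iris_system_alert.
        source = "iris" if labels["alertname"] == "iris_system_alert" else "prometheus rule"
        ts = datetime.strptime(alert["ts"], "%Y-%m-%d %H:%M:%S.%f")
        rows.append((ts, source, labels["severity"],
                     alert["annotations"]["description"]))
    return sorted(rows)
```

Calling summarize(PACKET) lists the InterSystems IRIS alert first, because its timestamp is the earlier of the two.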

Below is an example of an alert handler class. This example writes a message to the messages log (or Console Log) whenever an alert fires:

/// An example Alert Handler class, which writes messages to the messages log.
Class User.AlertHandler Extends %SAM.AbstractAlertsHandler
{

/// Called by SAM with a %DynamicArray packet of new alerts.
ClassMethod HandleAlerts(packet As %DynamicArray) As %Status
{
    // Walk through every alert in the packet
    set iter = packet.%GetIterator()
    while iter.%GetNext(.idx, .alert) {
        set msg = alert.annotations.description
        // Log critical alerts with severity 2 and all other alerts with severity 1
        if alert.labels.severity = "critical" {
            set severity = 2
        } else {
            set severity = 1
        }
        do ##class(%SYS.System).WriteToConsoleLog(msg, 1, severity)
    }
    quit $$$OK
}

}

Importing the Alert Handler into SAM

After creating the alert handler, the next step is to import it into SAM.

  1. First, export the alert handler in XML format. How to do this depends on the IDE you are using.

  2. Next, log in to the SAM Manager from a web browser, using the following address:

    http://<sam-domain-name>:<port>/csp/sys/UtilHome.csp

    where <sam-domain-name> is the DNS name or IP address of the system SAM is running on, and <port> is the configured Nginx port (8080 by default).

  3. From the SAM Manager, navigate to the Classes page (System Explorer > Classes).

  4. Make sure the SAM namespace is selected, then click Import. This brings up the Import Classes dialog.

  5. In the Import Classes dialog:

    • For The import file resides on, select My Local Machine.

    • For Select the path and name of the import file, click the Choose File button and select the alert handler XML file from your file system.

  6. At the bottom of the dialog, click Next, then Import. A result dialog should appear to tell you the status of your import.

After the import is complete, you have successfully added the alert handler to SAM. From now on, any time SAM detects a new alert, it calls the HandleAlerts() method of your class.

If you ever need to update an alert handler, simply repeat the steps above with the newer version. This replaces the previous version with the new one.

Monitoring with SAM

Once System Alerting and Monitoring is fully set up, you can use it to see real-time metrics and alerts for your InterSystems IRIS instances. The SAM application consists of multiple pages that display this information at different levels of detail.

These pages are:

  • Monitor Clusters Page – the “home page” of SAM, which displays an overview of all clusters.

  • Single Cluster Page – a more focused view, which displays only the information for instances in a single cluster.

  • Single Instance Page – the narrowest and most detailed view, which displays the instance’s details, alerts, and metrics dashboard.

The following sections describe various details of SAM:

Monitor Clusters Page

The Monitor Clusters page displays an overview of all your clusters. You can navigate to the Monitor Clusters page at any time by clicking the System Alerting & Monitoring title at the top of any SAM page.

Each SAM cluster appears as a circle depicting the state of all the cluster’s instances. The Monitor Clusters page also includes an Alerts table, showing the recent alerts from all monitored instances, and provides access to the configuration settings.

To see detailed information about a specific cluster or instance, simply click on it.

Single Cluster Page

To view details about a cluster, click on the cluster card on the Monitor Clusters page.

The Cluster Page displays an Alerts table, showing the recent alerts from all instances in that cluster. There is also an Instances table with details about the target instances. The Instances table shows the following details:

  • IP:Port – The IP address and Port which specify where a target instance is located. You can click this to “zoom in” to the Instance page.

  • State – The state of the instance, which can be OK, Warning, Critical, or Unreachable. See the Understanding Instance State section below for a description of how SAM determines instance state.

  • Name – The name of the instance.

  • Description – The description of the instance.

Single Instance Page

To see the Instance Page, click on an instance’s IP:Port. The Instance Page contains the following sections:

  • A Details table, which contains the instance’s IP:Port, State, Name, Description, and a link to the Management Portal. For details about how SAM calculates State, see the Understanding Instance State section below.

  • An Alerts table, showing the recent alerts for the current instance.

  • A Dashboard, which shows an overview of the Grafana Dashboard for the instance.

The page also has an Edit Instance button, which allows you to modify some of the instance details, and a Delete Instance button, which allows you to remove the instance from SAM.

Note:

If you edit an instance and change its network address, SAM purges all existing alerts tied to that instance. This is because SAM assumes that different network addresses refer to different instances.

Grafana Dashboard

The Dashboard displays several graphs of metrics, providing a snapshot of recent activity on the instance. This section describes the information visible in the dashboard by default.

The Dashboard is generated using Grafana, an open-source metrics visualization tool. You can click View in Grafana to edit the dashboard. For more information about customizing the dashboard, see the Grafana documentation (https://grafana.com/docs/guides/getting_started/).

Note:

If you edit the dashboard to display metrics older than two hours, you may want to increase the Prometheus database retention time.

The default dashboard contains the following information:

| Dashboard Graph | Metric(s) used | Description |
|---|---|---|
| CPU Utilization | iris_cpu_usage | The CPU usage of the system running the instance for the past 30 minutes |
| Glorefs | iris_glo_ref_per_sec, iris_glo_ref_rem_per_sec | The global references to local (blue line) and remote (orange line) databases for the past 30 minutes |
| Global Updates | iris_glo_update_per_sec | Updates to globals located on local databases per second for the past 30 minutes |
| IRIS Disk Percent | iris_disk_percent_full | Percent of used space on the storage volume for the IRISSYS database |
| IRIS Disk Remaining | iris_directory_space | Free space available on the storage volume for the IRISSYS database |
| Database Reads | iris_phys_reads_per_sec | Physical database block reads from disk per second for the past 30 minutes |
| IRIS Database Latency | iris_db_latency | Milliseconds to complete a random read from the database for the past 30 minutes |
| IRIS Pri Jnl Size | iris_jrn_size{id="primary"} | Current size of the primary journal file |
| Pri Jnl Free | iris_jrn_free_space{id="primary"} | Free space available on the primary journal directory’s storage volume |
| WIJ Free | iris_jrn_free_space{id="WIJ"} | Free space available on the WIJ journal directory’s storage volume |
| License Current Pct | iris_license_percent_used | Percent of licenses currently in use |
| Licenses Available | iris_license_available | Number of licenses currently not in use |
| System Alerts | iris_system_alerts | The number of alerts posted to the messages log since system startup |

Viewing the Alerts Table

Multiple pages in System Alerting and Monitoring include an alerts table. By default, an alerts table displays alerts from the last hour; to view all alerts, select Show All.

Alerts tables contain the following information:

  • Last Reported – The most recent time the alert was reported.

  • Cluster – The cluster containing the instance that generated the alert.

  • IP:Port – The IP address and Port of the instance that generated the alert.

  • Severity – The severity of the alert: either Critical or Warning.

  • Source – The source that generated the alert: either IRIS or Prometheus.

    • An IRIS alert is generated by an InterSystems IRIS instance. The instance’s log monitor scans the messages log and posts notifications with severity 2 or higher to the alerts log, where SAM collects them. For more information, see the Monitoring Guide.

    • A Prometheus alert is generated by SAM according to user-defined alert rules. For more information, see the Defining Cluster Alert Rules section above.

  • Name – The name of the alert.

  • Message – The message associated with the alert.

Understanding Instance Metrics

All InterSystems IRIS instances collect metrics that describe the status and operation of the instance. System Alerting and Monitoring allows you to monitor those metrics over time, and use them to configure alert rules.

For a list of all these metrics, see Metrics Description in the “Monitoring InterSystems IRIS Using REST API” section of the Monitoring Guide that corresponds to your version of InterSystems IRIS. The Create Application Metrics section on the same page describes how to create your own metrics.

Understanding Instance State

Instance state indicates whether an InterSystems IRIS instance has fired any alerts recently. There are four possible values for instance state: OK, Warning, Critical, or Unreachable. A state of OK means there have been no recent alerts. When an instance fires an alert, System Alerting and Monitoring elevates that instance’s state to Warning or Critical. Unreachable means that, for some reason, SAM cannot access the instance.

Note:

A state of OK does not necessarily mean there are no problems with an instance. Likewise, you may determine that no action is required for an instance with a Critical state. The instance state reflects the number of recent alerts, but does not provide comprehensive information about the instance.

Instance state is a combination of two factors: the InterSystems IRIS instance’s System Health State (which SAM obtains from the iris_system_state metric), and recent Prometheus alerts generated by the instance. For information about the System Health State, see System Monitor Health State in the “Using System Monitor” chapter of the Monitoring Guide. For more information about Prometheus alerts, see the Manage Cluster Alert Rules section above.

System Alerting and Monitoring determines instance state as follows:

  • The state is Critical if either of the following is true:

    • A Prometheus alert with severity Critical fired within the past 30 minutes.

    • The System Health State is 2 or -1.

  • Otherwise, the state is Warning if any of the following are true:

    • A Prometheus alert with severity Critical fired between 30 and 60 minutes ago.

    • A Prometheus alert with severity Warning fired within the past 30 minutes.

    • The System Health State is 1.

  • Finally, the state is OK if both of the following are true:

    • No Prometheus alerts have fired in the past hour.

    • The System Health State is 0.

  • Unreachable means SAM cannot access the instance. See the section below for more information.
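
The rules above can be sketched as a small function. This is an illustrative sketch only, not SAM's actual implementation: the function name instance_state, the (severity, minutes since fired) alert tuples, and the reachable flag are hypothetical. The text does not say how an alert exactly 30 minutes old is classified, or what state applies when the only alert is a Warning that fired 30 to 60 minutes ago, so the sketch makes an assumption in each case and labels it in the comments.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (severity, minutes_since_fired),
# where severity is "Critical" or "Warning".
Alert = Tuple[str, float]


def instance_state(health_state: Optional[int],
                   alerts: Sequence[Alert],
                   reachable: bool = True) -> str:
    """Return OK, Warning, Critical, or Unreachable per the rules above."""
    if not reachable:
        return "Unreachable"

    critical_ages = [age for sev, age in alerts if sev == "Critical"]
    warning_ages = [age for sev, age in alerts if sev == "Warning"]

    # Critical: a Critical alert within the past 30 minutes, or a
    # System Health State of 2 or -1.
    # Assumption: an alert exactly 30 minutes old still counts as recent.
    if any(age <= 30 for age in critical_ages) or health_state in (2, -1):
        return "Critical"

    # Warning: a Critical alert 30 to 60 minutes old, a Warning alert
    # within the past 30 minutes, or a System Health State of 1.
    if (any(30 < age <= 60 for age in critical_ages)
            or any(age <= 30 for age in warning_ages)
            or health_state == 1):
        return "Warning"

    # Assumption: anything else, including a Warning alert that is
    # 30 to 60 minutes old, falls through to OK.
    return "OK"
```

For example, instance_state(0, [("Critical", 45)]) evaluates to "Warning", because the Critical alert is more than 30 minutes old but less than an hour old.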

Troubleshooting an Unreachable Instance

There are many reasons the state of an instance could become Unreachable. This section provides several potential causes and solutions.

If none of these steps resolve the Unreachable status, contact the InterSystems Worldwide Response Center (WRC) for further troubleshooting help.

Target instance has an IP address in the 172.17.x.x range

System Alerting and Monitoring may not be able to reach an instance with an IP address in the 172.17.x.x range (for example, 172.17.123.123). This is because Docker uses this IP range for its own networks.

You can resolve this issue by changing the Docker IP address range. To do this, specify a different range (e.g. 10.10.x.x) in the Docker daemon using the default-address-pools option.
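
As a rough illustration, the following sketch prints a Docker daemon configuration fragment that sets default-address-pools. The 10.10.0.0/16 base and the subnet size of 24 are example values, and the daemon_config helper is hypothetical. On Linux the daemon configuration file is typically /etc/docker/daemon.json, but confirm the location for your platform.

```python
import json


def daemon_config(base: str = "10.10.0.0/16", size: int = 24) -> str:
    """Return a daemon.json fragment that moves Docker networks to `base`.

    `base` is the address pool in CIDR notation; `size` is the prefix
    length of each subnet Docker carves out of that pool.
    """
    config = {"default-address-pools": [{"base": base, "size": size}]}
    return json.dumps(config, indent=2)


# Print the fragment so it can be merged into the daemon configuration file.
print(daemon_config())
```

If the configuration file already contains other settings, merge this key into the existing JSON object rather than replacing the file. The Docker daemon must be restarted for the change to take effect, and networks that already exist may keep their old subnets until they are recreated.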

Target instance is not outputting metrics

The instance you are monitoring with SAM may not be outputting metrics properly. You can check this by using the curl command in the command window, or by viewing the metrics endpoint for the target instance in your web browser at the following URL:

http://<instance-host>:<port>/api/monitor/metrics

If this displays a list of metrics, the instance is outputting metrics properly.

Otherwise, the instance may not be properly configured. In that case, ensure that the instance is on InterSystems IRIS version 2020.1 or higher and that the /api/monitor application allows for unauthenticated access, as described in the Adding an Instance to SAM section.
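
If you prefer to script this check, the following minimal sketch fetches the metrics endpoint without credentials and applies a simple heuristic to the response. The function names are hypothetical. The heuristic assumes the response contains at least one line beginning with iris_, as the metrics listed in the default dashboard do.

```python
import urllib.request


def metrics_url(host: str, port: int) -> str:
    """Build the metrics endpoint URL for a target instance."""
    return f"http://{host}:{port}/api/monitor/metrics"


def looks_like_metrics(body: str) -> bool:
    """Heuristic: at least one line starts with the iris_ prefix."""
    return any(line.startswith("iris_") for line in body.splitlines())


def instance_outputs_metrics(host: str, port: int, timeout: float = 5.0) -> bool:
    """Fetch the endpoint unauthenticated and apply the heuristic above."""
    try:
        with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status == 200 and looks_like_metrics(body)
    except OSError:
        # Covers connection failures, timeouts, and HTTP errors such as
        # 401 (authentication required) or 404 (application not found).
        return False
```

A call such as instance_outputs_metrics("iris-host", 52773), where the host name and port are placeholders for your own instance, returns False when the endpoint is unreachable or requires authentication.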

The SAM database is full

If the SAM database fills up, instances may show up as Unreachable and stop reporting metrics. To check whether this is the case:

  1. Open the SAM Manager from a web browser, using the following address:

    http://<sam-domain-name>:8080/csp/sys/UtilHome.csp

  2. Navigate to the Databases page (System Operation > Databases).

  3. Select Free Space View.

  4. Check the % Free column for the SAM database to see whether the value is 0.

If the database is full, you should free some space by deleting data, as described in the Clearing the SAM Database section. Once you have done so, shut down System Alerting and Monitoring using the stop.sh script, and restart it using start.sh.

To prevent this from happening again, consider lowering the number of days SAM stores data using the Configuration Settings menu.

SAM 1.1 Release Notes

Overview

Version 1.1 of InterSystems System Alerting and Monitoring (SAM) provides performance improvements for the graphs in the Grafana dashboard and the underlying Prometheus queries, especially when displaying metrics over a longer period of time.

Upgrade notes and requirements:

  • You must deploy SAM 1.1 as an upgrade to an existing SAM 1.0 installation.

  • The upgrade requires a new docker-compose.yml file.

    Note:

    The upgrade does not update or overwrite the files in the /config directory created by the SAM 1.0 installation.

  • Performance improvements to SAM use an index that may include up to 50% more data. You must have the space available to accommodate this index.

Performing the Upgrade

To upgrade from SAM 1.0 to SAM 1.1, perform the following steps:

  1. Shut down the existing SAM installation.

  2. Copy version 1.1 of the docker-compose.yml file into the SAM installation directory.

  3. Restart SAM.

Restarting SAM pulls a SAM 1.1 image, and then uses that image to upgrade and run the SAM container.

The new SAM image uses InterSystems IRIS 2021.1.2. To accommodate this, the new docker-compose.yml includes a new ‘iris-init’ service, which runs briefly at startup and then exits.

About Rebuilding the Index

The upgrade creates a new index for any existing data at the initial startup of the new version. Depending on the amount of data, this may take several minutes or longer.

In the SAM messages.log file, there are entries marking the start and completion of creating the index:

[Utility.Event] SAM Manager starting Index rebuild for PrometheusSample class.
...
[Utility.Event] SAM Manager completed Index rebuild for PrometheusSample class.

Note that older data may not be available in the SAM dashboards until this process has completed.
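
To check on the rebuild without reading the log by hand, you could scan messages.log for the two entries shown above. The following is a minimal sketch: the rebuild_status function is hypothetical, and it assumes the message text matches those entries exactly.

```python
from typing import Iterable

# Message text taken from the log entries shown above.
START = "SAM Manager starting Index rebuild for PrometheusSample class."
DONE = "SAM Manager completed Index rebuild for PrometheusSample class."


def rebuild_status(lines: Iterable[str]) -> str:
    """Return 'not started', 'in progress', or 'completed'.

    The most recent matching entry in the log determines the result.
    """
    status = "not started"
    for line in lines:
        if START in line:
            status = "in progress"
        elif DONE in line:
            status = "completed"
    return status
```

To use it, open the SAM messages.log file (its location depends on your installation) and pass the file object to rebuild_status, which reads it line by line.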
