************* PRERELEASE CONTENT *************

System Alerting and Monitoring Guide

System Alerting and Monitoring (SAM) is a cluster monitoring solution for InterSystems IRIS® data platform version 2020.1 and later. Whether your application runs on a local mirrored pair or in the cloud with multiple application and data servers, you can use SAM to monitor your application.

Group your InterSystems IRIS instances into one or more clusters, then observe a high-level summary of your application performance. Use the SAM web portal to view real-time performance metrics for your clusters and instances. Keep an eye out for the alerts SAM sends when a metric crosses a user-defined threshold. SAM packages all your monitoring needs behind an easy-to-use web interface.

The core SAM application is built on an InterSystems IRIS instance called the SAM Manager. Using cloud-native software, the SAM Manager collects and stores performance data and alerts from the InterSystems IRIS instances you care about.

Learn more about SAM from the following topics:

Note:

You can also interface with SAM using its REST API. For details, see the System Alerting and Monitoring API Reference.

Deploying SAM

This section covers the following topics:

SAM Component Breakdown

SAM is made up of multiple open-source technologies, augmenting their features with enterprise resiliency. The SAM application consists of the following containers:

  • Alertmanager v0.20.0

  • Grafana v6.7.1

  • Nginx 1.17.9

  • Prometheus v2.17.1

  • SAM Manager 1.0.0.115

Each container performs a different role in the SAM application. Prometheus, an efficient cloud-native monitoring tool, collects time series data at a regular interval from all your target InterSystems IRIS instances. The SAM Manager stores these metrics, enabling high availability and scalability features not present in default Prometheus databases. Grafana, a world-class metrics visualization tool, presents these metrics in graphs that make it easy to examine the state of your application. The Alertmanager aggregates InterSystems IRIS alerts, which are pre-configured on the target instances, and Prometheus alerts, which you configure from the SAM web application.

These containers communicate over the Nginx web server, which is set to port 8080 by default. When trying to access any component of SAM, such as Grafana or the SAM Manager, do so using the Nginx port. Nginx also serves the SAM web application, which provides a graphical interface for configuring SAM and monitoring your instances.

Docker Compose makes it possible to run all these containers simultaneously. When you run SAM, Docker Compose starts each of these containers, which are listed in the SAM docker-compose.yml file.

Note:

For more information about the benefits of containerized applications, see Why Containers? in Running InterSystems Products in Containers.

First-Time Preparations

The first time you deploy SAM, perform the following steps to prepare your machine:

Ensure Docker Compose is installed

For instructions, see “Install Docker Compose” in the Docker documentation (https://docs.docker.com/compose/install/). The following versions are required:

  • Docker Engine version 19.03.098 or higher

  • Docker Compose version 1.25 or higher

Acquire SAM distribution files

InterSystems provides several files that define the container configuration necessary for SAM. These files include the following:

  • config/ – A directory that contains settings for the SAM application.

  • docker-compose.yml – A file that defines the SAM Components, which Docker Compose uses to deploy SAM.

  • readme.txt – A brief text document for getting started.

  • start.sh, stop.sh – Scripts to facilitate starting and stopping SAM.

You can obtain these files from either:

Unzip the distribution tarball

If you obtain the distribution files as a tarball, use the following command to uncompress it while preserving permissions:

tar zpxvf sam-1.0-unix.tar.gz
Configure your firewall for SAM

By default, SAM deploys on port 8080 of the host system. On Linux machines, you can check whether port 8080 is available by using the netcat (nc) command:

$ nc -zv localhost 8080
Connection to localhost 8080 port [tcp/http-alt] succeeded!
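
Where netcat is not installed, a similar check can be sketched with Python's standard socket module. The helper name port_accepts_connections is hypothetical and not part of SAM. As with the "succeeded" message above, a True result only means that something is accepting TCP connections on that port.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, similar to `nc -zv`."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_accepts_connections("localhost", 8080) tests the default SAM port.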
Change the default SAM port

If necessary, you can change the host port mapping in the nginx section of the docker-compose.yml file. To do so:

  1. Open docker-compose.yml in a text editor.

  2. Locate the nginx service in the docker-compose.yml file.

  3. In the ports section, enter the desired port on your host machine. For example, if you would like to access SAM on port 9999, edit the section to look like:

    [...]
    ports:
       - 9999:8080
    [...]

For more information, see the “ports” section of the Docker Compose File Reference (https://docs.docker.com/compose/compose-file/#ports).
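
In the short ports syntax, the left-hand number is the host port and the right-hand number is the container port, so only the left-hand side changes. The following minimal Python sketch illustrates that rule; set_host_port is a hypothetical helper for illustration, not a Docker or SAM utility.

```python
def set_host_port(mapping: str, new_host_port: int) -> str:
    """Replace the host side of a short-syntax "HOST:CONTAINER" port mapping."""
    host_port, sep, container_port = mapping.partition(":")
    if not sep or not host_port.isdigit() or not container_port.isdigit():
        raise ValueError(f"expected HOST:CONTAINER, got {mapping!r}")
    # Only the host port changes; Nginx keeps listening on the container port.
    return f"{new_host_port}:{container_port}"


print(set_host_port("8080:8080", 9999))  # prints 9999:8080
```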

Starting and Stopping SAM

InterSystems provides two scripts that make it easy to start or stop SAM.

To start SAM:

  1. Using the cd command in the command line, navigate to the directory containing the SAM docker-compose.yml file, which was acquired during initial setup.

  2. Next, run the start.sh script:

    ./start.sh

    This runs a Docker Compose command to start the SAM application.

  3. Optionally, you can use the docker ps command to confirm that all the containers are running. The output should look similar to the following:

    $ docker ps
    CONTAINER ID   IMAGE                       COMMAND                  CREATED             STATUS                       PORTS                                                  NAMES
    2aaa06f06a9c   nginx:1.17.9-alpine         "nginx -g 'daemon of..." About an hour ago   Up About an hour             80/tcp, 0.0.0.0:8080->8080/tcp                         sam_nginx_1
    0e2b30fcb376   grafana/grafana:6.7.1       "/run.sh"                About an hour ago   Up About an hour             3000/tcp                                               sam_grafana_1
    d2c825f9d220   prom/alertmanager:v0.20.0   "/bin/alertmanager -..." About an hour ago   Up About an hour             9093/tcp                                               sam_alertmanager_1 
    4851893bc369   prom/prometheus:v2.17.1     "/bin/prometheus --w..." About an hour ago   Up About an hour             9090/tcp                                               sam_prometheus_1
    61120be391df   intersystems/sam:1.0.0.83   "/iris-main"             About an hour ago   Up About an hour (healthy)   2188/tcp, 51773/tcp, 52773/tcp, 53773/tcp, 54773/tcp   sam_iris_1

Once SAM is up and running, you can access it from a web browser or using the SAM API.
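
The optional docker ps check in the start procedure can also be scripted. The sketch below is one possible approach in Python: it asks docker ps for container names only and reports which expected containers are absent. The names come from the sample output above, and the sam_ prefix is an assumption, since it depends on the Compose project name on your system.

```python
import subprocess

# Names taken from the sample `docker ps` output; the "sam_" prefix is an
# assumption that depends on the Compose project name.
EXPECTED = {"sam_nginx_1", "sam_grafana_1", "sam_alertmanager_1",
            "sam_prometheus_1", "sam_iris_1"}


def missing_containers(names_output: str, expected=EXPECTED) -> set:
    """Return the expected container names that are absent from the output."""
    running = {line.strip() for line in names_output.splitlines() if line.strip()}
    return set(expected) - running


def running_container_names() -> str:
    """Run `docker ps` and return one running container name per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling missing_containers(running_container_names()) returns an empty set when every expected container is running.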

To stop SAM:

  1. Using the cd command in the command line, navigate to the directory containing the SAM docker-compose.yml file.

  2. Next, run the stop.sh script:

    ./stop.sh

    This runs a Docker Compose command to stop the SAM application.

  3. Optionally, you can use the docker ps command to confirm that all the containers have stopped. Use the -a flag to view all containers, even those that are not running:

    docker ps -a

Accessing SAM from a Web Browser

When SAM is running, you can access it from a web browser at the following address:

http://<sam-domain-name>:<port>/api/sam/app/index.csp

where <sam-domain-name> is the DNS name or IP address of the system SAM is running on, and <port> is the configured Nginx port (8080 by default). You may want to bookmark this address.

When accessing SAM, you must log in using a valid User Name and Password. Like InterSystems IRIS, SAM includes several predefined accounts with the default password SYS. Choose any of these accounts with login permissions (such as Admin or SuperUser) and log in using the default password SYS.

The first time you sign in with one of the predefined accounts, SAM prompts you to enter a new password. To secure the SAM application, be sure to set a new password for all the predefined accounts. For a list of all the predefined accounts, see Predefined User Accounts in the “Users” chapter of the Security Administration Guide.

Setting Up SAM

Once SAM is deployed, you can specify the InterSystems IRIS instances you want to monitor, which must be grouped into SAM clusters. You can also perform additional setup actions in order to maximize the utility and performance of SAM.

These actions can be performed as necessary to establish your desired SAM configuration:

Creating a New Cluster

Within SAM, you must group InterSystems IRIS instances into clusters, which are unique sets of instances. Once you have created a cluster, you can view the alerts and statuses of all instances within the cluster.

To create a new cluster:

  1. Navigate to the main SAM page. Clicking System Alerting & Monitoring from anywhere within the SAM application navigates to the main page.

  2. Open the Add New Cluster dialog. To do this, click the + New Cluster button (if this is the first cluster, click Create Your First Cluster instead).

  3. Fill in the following information about your cluster:

    • Cluster name — The name can be any combination of numbers and letters. Cluster names must be unique.

    • Description (optional)

  4. Click Add Cluster to create the cluster.

After creating a cluster, SAM immediately displays the Edit Cluster dialog, allowing you to continue to define the cluster.

Adding an Instance to SAM

SAM can collect metrics and alerts from any InterSystems IRIS instance version 2020.1 or higher. To add an instance to SAM, first prepare the instance, and then add it to a SAM cluster.

Preparing the Instance to be Monitored

InterSystems IRIS instances version 2020.1 or higher contain the /api/monitor web application, which allows an instance to be monitored by SAM. In order for SAM to collect metrics and alerts from the /api/monitor endpoint, that endpoint must allow for unauthenticated access.

To make sure unauthenticated access is allowed:

  1. Open the Management Portal of the InterSystems IRIS instance you would like to add to SAM.

  2. Go to the Web Applications page (System Administration > Security > Applications > Web Applications).

  3. Select /api/monitor to open the Edit Web Application page.

  4. In the Security Settings section, select Unauthenticated.

For more information about the /api/monitor web application, see Monitoring InterSystems IRIS Using REST API in Monitoring Guide.

Adding the Instance to a Cluster

To add an InterSystems IRIS instance to a SAM cluster, do the following:

  1. Navigate to the main SAM page. Clicking System Alerting & Monitoring from anywhere within the SAM application navigates to the main page.

  2. Select the cluster to which you would like to add the instance. If there are no clusters, you must create one.

  3. Click Edit Cluster to open the Edit Cluster dialog.

  4. Click the +New button at the top of the Instances table.

    Note:

    You may select an existing instance from this table to edit or delete it.

  5. Fill in the following fields:

    • IP – The fully qualified domain name or IP address of the machine hosting the target InterSystems IRIS instance.

      InterSystems recommends using domain names whenever possible, as IP addresses may change.

      Note:

      If the instance you are monitoring is located on the same system as SAM, you may enter host.docker.internal in this field.

    • Port – The web server port of the target InterSystems IRIS instance.

    • Cluster – The cluster to add the target instance to. When first adding an instance, this defaults to the current cluster.

      Note:

      Moving an instance between clusters may produce temporary irregularities in the reported alerts, dashboard, and state for that instance. These irregularities should resolve within a few hours.

    • Instance name and Description – Optional text descriptors to help you identify the instance.

  6. Click Add Instance to begin monitoring the instance with SAM.

Adjusting Configuration Settings

The main SAM page contains a gear icon, located near the top of the screen. Click this icon to access the Configuration Settings dialog.

From this dialog, you can set the number of days (between 1 and 30) for SAM to store alert and metric data.

Defining Cluster Alert Rules

SAM automatically collects InterSystems IRIS alerts from the instances it monitors. If you want to specify additional events that generate alerts, you can do so by defining Prometheus alert rules.

An alert displays information about the instance that generated it; the time the alert fired; and the alert name, message, and severity. A Prometheus alert rule indicates when SAM should fire an alert.

Alert rules are defined on a cluster level, but evaluated distinctly for each instance within the cluster. This means instances within a cluster share the same alert rules, but generate alerts individually.

To create a new alert rule for a cluster, do the following:

  1. Navigate to the main SAM page. Clicking System Alerting & Monitoring from anywhere within the SAM application navigates to the main page.

  2. Select the cluster for which you would like to create an alert rule. If there are no clusters, you must create one.

  3. Click Edit Cluster to open the Edit Cluster dialog.

  4. Click the +New button at the top of the Alert Rules table.

    Note:

    You may select an existing alert rule from this table to edit or delete it.

  5. Fill in the following fields:

    • Alert rule name – Any name for the alert rule. It is often useful to include the metric the rule uses in the name.

    • Alert severity – Either Critical or Warning. The severity of the alert determines the impact it will have on the instance state; see Understanding Instance State for more details.

      Note:

      You can give multiple alert rules the same name, but different severities. If both rules fire at the same time, SAM suppresses the rule with lower severity. This behavior reduces duplicate alerts firing for the same event.

    • Alert expression – An expression that defines when the alert fires, written in Prometheus Query Language.

      The Alert Expression Syntax section below contains an overview of the Prometheus Query Language syntax and several examples.

    • Alert message – A text description of the alert rule, which SAM displays when the alert fires.

      Note:

      The Alert message supports the $value variable, which contains the evaluated value of an alert expression. The syntax is:

      {{ $value }}

      The $value variable only holds one value; as such, you should not use it for alert rules that evaluate to multiple values (such as a rule that uses the and operator).

  6. Click Add Alert Rule. SAM validates the alert expression and then adds the alert rule to the cluster.

Below is an example of an alert rule:

When adding a New Alert Rule, every field is required.

Alert Expression Syntax

To write an alert expression, you must use Prometheus Query Language (PromQL). This section provides an overview of how to write alert expressions and some examples.

Note:

If you want to learn how to write advanced alert expressions, read about the full capabilities of PromQL on the “Querying Prometheus” page in the Prometheus Documentation (https://prometheus.io/docs/prometheus/latest/querying/basics/).

A simple alert expression compares a metric to a value. For example:

# Greater than 80 percent of InterSystems IRIS licenses are in use:
iris_license_percent_used{cluster="production"}>80

# There are fewer than 5 active InterSystems IRIS processes:
iris_process_count{cluster="test"}<5

# The disk storing the MYDATA database is over 75% full:
iris_disk_percent_full{cluster="test",id="MYDATA"}>75

# Same as above, but specifying directory instead of database name:
iris_disk_percent_full{cluster="production",dir="/IRIS/mgr/MYDATA"}>75

The basic format for an alert expression based on a single metric is:

metric_name{cluster="cluster_name",additional_labels} operator value

  • metric_name – The metric that the alert rule uses. See Metric Descriptions in the “Monitoring InterSystems IRIS Using REST API” section of Monitoring Guide for a table of all the default metrics you can use.

  • cluster_name – The cluster to which the alert rule applies. SAM applies the alert rule to all instances in the specified cluster. If any instance in the cluster triggers the rule, SAM generates an alert for that instance.

  • additional_labels – If a metric contains labels, you may include these after the cluster label. Multiple labels are separated by commas. All metrics must include the cluster label, described above.

  • operator – The following comparison operators are available:

    • > (greater than) or >= (greater than or equal to)

    • < (less than) or <= (less than or equal to)

    • == (equal to) or != (not equal to)

  • value – The value can be positive or negative, and may include a decimal component.
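
To make the parts of a single-metric expression concrete, the following Python sketch assembles one from its components. It is only an illustration: build_alert_expression is a hypothetical helper, not part of SAM or Prometheus, and it does not escape quotation marks inside label values.

```python
def build_alert_expression(metric, cluster, operator, value, labels=None):
    """Assemble metric_name{cluster="...",labels} operator value as a string."""
    allowed = {">", ">=", "<", "<=", "==", "!="}
    if operator not in allowed:
        raise ValueError(f"unsupported comparison operator: {operator!r}")
    # The cluster label comes first, followed by any additional labels.
    pairs = [("cluster", cluster)] + list((labels or {}).items())
    selector = ",".join(f'{name}="{val}"' for name, val in pairs)
    return f"{metric}{{{selector}}}{operator}{value}"


print(build_alert_expression("iris_disk_percent_full", "test", ">", 75, {"id": "MYDATA"}))
# prints iris_disk_percent_full{cluster="test",id="MYDATA"}>75
```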

Alert Examples

Below are several examples of alert expressions that demonstrate some of the capabilities of PromQL.

Example 1: Basic Alert Rule

The simplest alert expression directly compares a single metric to a value.

The following alert expression evaluates iris_cpu_usage, which measures the total percent of CPU in use on the machine running InterSystems IRIS. If the value of iris_cpu_usage exceeds 90 for any InterSystems IRIS instance in the test cluster, the alert fires.

iris_cpu_usage{cluster="test"}>90
Example 2: Arithmetic Operators

PromQL supports the following arithmetic operators, ordered by precedence:

  1. ^ (exponentiation)

  2. * (multiplication), / (division), % (modulo)

  3. + (addition), - (subtraction)

Arithmetic operators are particularly useful when writing an alert expression that contains two or more metrics.

The following expression is triggered when the USER database in the test cluster is greater than 90 percent full. The expression calculates the percent by dividing the database size (iris_db_size_mb) by the database maximum size (iris_db_max_size_mb).

(iris_db_size_mb{cluster="test",id="USER"}/iris_db_max_size_mb{cluster="test",id="USER"})*100>90
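
As a worked check of the arithmetic, the same calculation can be reproduced in Python. The sizes used here are hypothetical sample values, not figures from a real instance.

```python
def percent_full(size_mb: float, max_size_mb: float) -> float:
    """Mirror (iris_db_size_mb / iris_db_max_size_mb) * 100 for one database."""
    return size_mb / max_size_mb * 100


# Hypothetical sample: a 960 MB database with a 1,024 MB maximum size.
usage = percent_full(960, 1024)
print(usage, usage > 90)  # prints 93.75 True
```

With these values the expression evaluates to 93.75, which exceeds 90, so the alert would fire.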
Example 3: Logical OR Operator

PromQL supports logical operators for writing more complex rules. When using the or operator, the expression evaluates two conditions and fires if either is true.

One use for the or operator is to check whether a metric falls outside of a certain range. The following alert expression is triggered when either of the following conditions is true:

  • There are more than 20 active ECP connections in the production cluster.

  • There are fewer than one active ECP connections in the production cluster (that is, none).

iris_ecp_conn{cluster="production"}<1 or iris_ecp_conn{cluster="production"}>20
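
The range check can be pictured per instance with a short Python sketch. The instance names and connection counts are hypothetical, and the code only imitates the comparison logic; Prometheus evaluates the real expression.

```python
def outside_range(value: float, low: float = 1, high: float = 20) -> bool:
    """True when value < low or value > high, as in the expression above."""
    return value < low or value > high


# Hypothetical ECP connection counts for three instances in one cluster.
samples = {"iris-a:52773": 0, "iris-b:52773": 12, "iris-c:52773": 25}
firing = [instance for instance, count in samples.items() if outside_range(count)]
print(firing)  # prints ['iris-a:52773', 'iris-c:52773']
```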
Example 4: Logical AND Operator

PromQL also supports the and operator. When using the and operator, the expression evaluates two conditions and fires if both are true.

The following example shows an alert rule that fires when both conditions are true:

  • There are unread alerts in the test cluster.

  • The system health state of an instance in the test cluster is something other than 0.

iris_system_alerts_new{cluster="test"}>=1 and iris_system_monitor_health_state{cluster="test"}!=0
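
In PromQL, and keeps the samples from the left-hand side whose label sets also appear on the right-hand side, and the values are carried over from the left. The Python sketch below imitates that matching with dictionaries, using the instance name as a stand-in for the full label set. The instance names and values are hypothetical.

```python
def promql_and(left: dict, right: dict) -> dict:
    """Keep left-hand samples whose label set also appears on the right."""
    return {labels: value for labels, value in left.items() if labels in right}


# Hypothetical samples that already passed each comparison.
new_alerts = {"iris-a:52773": 1, "iris-b:52773": 1}  # iris_system_alerts_new >= 1
unhealthy = {"iris-b:52773": 2, "iris-c:52773": 1}   # health state != 0
print(promql_and(new_alerts, unhealthy))  # prints {'iris-b:52773': 1}
```

Only iris-b satisfies both conditions, so it is the only instance for which the rule would fire.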

Tuning the SAM Manager

The SAM Manager is the InterSystems IRIS instance that powers the SAM application. You can open the SAM Manager from a web browser using the following address:

http://<sam-domain-name>:<port>/csp/sys/UtilHome.csp

where <sam-domain-name> is the DNS name or IP address of the system SAM is running on, and <port> is the configured Nginx port (8080 by default).

Important:

The SAM Manager should not be used to develop or run any application; it is strictly for use by SAM. This section describes the appropriate uses and interactions with the SAM Manager.

For a general purpose InterSystems IRIS instance, see the InterSystems IRIS community edition.

You can do the following actions with the SAM Manager:

Adjusting Startup Settings

The SAM Manager initially allocates memory on startup as follows:

  • 2,000 8KB blocks for the database cache

  • 300 8KB blocks for the routines cache

This allocation should be sufficient when monitoring a modest number (30 or fewer) of InterSystems IRIS instances. If you are monitoring a large number of instances, or find that the SAM Manager is regularly using the full amount of allocated memory, you can increase these limits.
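
To put the default allocation in more familiar units, the block counts above can be converted to megabytes. This small Python sketch assumes 1 MB = 1,024 KB; cache_mb is a hypothetical helper name.

```python
BLOCK_KB = 8  # block size stated above


def cache_mb(blocks: int, block_kb: int = BLOCK_KB) -> float:
    """Convert a number of cache blocks to megabytes (1 MB = 1024 KB)."""
    return blocks * block_kb / 1024


print(cache_mb(2000))  # database cache: prints 15.625
print(cache_mb(300))   # routine cache: prints 2.34375
```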

For details on adjusting these settings, see the Allocating Memory to the Database and Routine Caches topic in the System Administration Guide.

Clearing the SAM Database

The SAM Community Edition has a maximum database size limit of 10GB. If this limit is met, SAM may exhibit unexpected behavior, and it becomes necessary to clear the database.

In the SQL page of the SAM Manager (System Explorer > SQL), enter the following command to delete all SAM metric data:

DELETE FROM %SAM.PrometheusSample

To prevent the SAM database from filling up again, consider lowering the number of days that SAM stores metrics. This option is available in the configuration settings.

Monitoring the SAM Manager

It is possible to use SAM to monitor the SAM Manager, as the SAM Manager is itself an InterSystems IRIS instance. This allows you to keep track of whether the SAM Database is at risk of filling up, and make sure the configured cache sizes are sufficient for SAM operations.

Monitoring the SAM Manager is similar to monitoring any other instance, as described in the Adding an Instance to SAM section, with the following difference:

For the IP and Port fields, specify the fully qualified DNS name and port (8080 by default) where SAM runs. You can see these values in the address bar of your browser when accessing SAM. For example, if the URL for SAM is:

http://<sam-domain-name>:<port>/api/sam/app/index.csp

Specify <sam-domain-name> in the IP field, and <port> in the Port field.

Note:

Specifying localhost in the IP field does not work; you must enter a fully qualified DNS name.

Using SAM

Once SAM is fully set up, you can use it to see real-time metrics and alerts for your InterSystems IRIS instances. The SAM application consists of multiple pages that display this information at different levels of detail.

These pages are:

  • Monitor Clusters Page – the “home page” of SAM, which displays an overview of all clusters.

  • Single Cluster Page – a more focused view, which displays only the information for instances in a single cluster.

  • Single Instance Page – the narrowest and most detailed view, which displays the instance’s details, alerts, and metrics dashboard.

The following sections describe various details of SAM:

Monitor Clusters Page

The Monitor Clusters page displays an overview of all your clusters. You can navigate to the Monitor Clusters page at any time by clicking the System Alerting & Monitoring title at the top of any SAM page.

Each SAM cluster appears as a circle depicting the state of all the cluster’s instances. The Monitor Clusters page also includes an Alerts table, showing the recent alerts from all monitored instances, and provides access to the configuration settings.

To see detailed information about a specific cluster or instance, simply click on it.

Single Cluster Page

To view details about a cluster, click on the cluster card on the Monitor Clusters page.

The Cluster Page displays an Alerts table, showing the recent alerts from all instances in that cluster. There is also an Instances table with details about the target instances. The Instances table shows the following details:

  • IP:Port – The IP address and Port which specify where a target instance is located. You can click this to “zoom in” to the Instance page.

  • State – The state of the instance, which can be OK, Warning, Critical, or Unreachable. See the Understanding Instance State section below for a description of how SAM determines instance state.

  • Name – The name of the instance.

  • Description – The description of the instance.

Single Instance Page

To see the Instance Page, click on an instance’s IP:Port. The Instance Page contains the following sections:

  • A Details table, which contains the instance’s IP:Port, State, Name, Description, and a link to the Management Portal. For details about how SAM calculates State, see the Understanding Instance State section below.

  • An Alerts table, showing the recent alerts for the current instance.

  • A Dashboard, which shows an overview of the Grafana Dashboard for the instance.

The page also has an Edit Instance button, which allows you to modify some of the instance details, and a Delete Instance button, which allows you to remove the instance from SAM.

Note:

If you edit an instance and change its network address, SAM purges all existing alerts tied to that instance. This is because SAM assumes that different network addresses refer to different instances.

Grafana Dashboard

The Dashboard displays several graphs of metrics, providing a snapshot of recent activity on the instance. This section describes the information visible in the dashboard by default.

The Dashboard is generated using Grafana, an open-source metrics visualization tool. You can click View in Grafana to edit the dashboard. For more information about customizing the dashboard, check out the Grafana documentation (https://grafana.com/docs/guides/getting_started/).

The default dashboard contains the following information:

Dashboard Graph | Metric(s) used | Description
CPU Utilization | iris_cpu_usage | The CPU usage of the system running the instance for the past 30 minutes.
Glorefs | iris_glo_ref_per_sec, iris_glo_ref_rem_per_sec | The global references to local (blue line) and remote (orange line) databases for the past 30 minutes.
Global Updates | iris_glo_update_per_sec | Updates to globals located on local databases per second for the past 30 minutes.
IRIS Disk Percent | iris_disk_percent_full | Percent of used space on the storage volume for the IRISSYS database.
IRIS Disk Remaining | iris_directory_space | Free space available on the storage volume for the IRISSYS database.
Database Reads | iris_phys_reads_per_sec | Physical database block reads from disk per second for the past 30 minutes.
IRIS Database Latency | iris_db_latency | Milliseconds to complete a random read from the database for the past 30 minutes.
IRIS Pri Jnl Size | iris_jrn_size{id="primary"} | Current size of the primary journal file.
Pri Jnl Free | iris_jrn_free_space{id="primary"} | Free space available on the primary journal directory’s storage volume.
WIJ Free | iris_jrn_free_space{id="WIJ"} | Free space available on the WIJ journal directory’s storage volume.
License Current Pct | iris_license_percent_used | Percent of licenses currently in use.
Licenses Available | iris_license_available | Number of licenses currently not in use.
System Alerts | iris_system_alerts | The number of alerts posted to the messages log since system startup.

Viewing the Alerts Table

Multiple pages in SAM include an alerts table. By default, an alerts table displays alerts from the last hour; to view all alerts, select Show All.

Alerts tables contain the following information:

  • Last Reported – The most recent time the alert was reported.

  • Cluster – The cluster containing the instance that generated the alert.

  • IP:Port – The IP address and Port of the instance that generated the alert.

  • Severity – The severity of the alert: either Critical or Warning.

  • Source – The source that generated the alert: either IRIS or Prometheus.

    • An IRIS alert is generated by an InterSystems IRIS instance. The instance’s log monitor scans the messages log and posts notifications with severity 2 or higher to the alerts log, where SAM collects them. For more information, see the Monitoring Guide.

    • A Prometheus alert is generated by SAM according to user-defined alert rules. For more information, see the Defining Cluster Alert Rules section above.

  • Name – The name of the alert.

  • Message – The message associated with the alert.

Understanding Instance Metrics

All InterSystems IRIS instances collect metrics that describe the status and operation of the instance. SAM allows you to monitor those metrics over time, and use them to configure alert rules.

For a list of all these metrics, see Metric Descriptions in the “Monitoring InterSystems IRIS Using REST API” section of the Monitoring Guide that corresponds to your version of InterSystems IRIS. The Create Application Metrics section on the same page describes how to create your own metrics.

Understanding Instance State

Instance state indicates whether an InterSystems IRIS instance has fired any alerts recently. There are four possible values for instance state: OK, Warning, Critical, or Unreachable. A state of OK means there have been no recent alerts. When an instance fires an alert, SAM elevates that instance’s state to Warning or Critical. Unreachable means that, for some reason, SAM cannot access the instance.

Note:

A state of OK does not necessarily mean there are no problems with an instance. Likewise, you may determine that no action is required for an instance with a Critical state. The instance state reflects the number of recent alerts, but does not provide comprehensive information about the instance.

Instance state is a combination of two factors: the InterSystems IRIS instance’s System Health State (which SAM obtains from the iris_system_state metric), and recent Prometheus alerts generated by the instance. For information about the System Health State, see System Monitor Health State in the “Using System Monitor” chapter of the Monitoring Guide. For more information about Prometheus alerts, see the Defining Cluster Alert Rules section above.

SAM determines instance state as follows:

  • The state is Critical if either of the following is true:

    • A Prometheus alert with severity Critical fired within the past 30 minutes.

    • The System Health State is 2 or -1.

  • Otherwise, the state is Warning if any of the following are true:

    • A Prometheus alert with severity Critical fired between 30 and 60 minutes ago.

    • A Prometheus alert with severity Warning fired within the past 30 minutes.

    • The System Health State is 1.

  • Finally, the state is OK if both of the following are true:

    • No Prometheus alerts have fired in the past hour.

    • The System Health State is 0.

  • Unreachable means SAM cannot access the instance. See the section below for more information.
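
The rules above can be sketched as a small Python function. This is an illustration of the documented logic, not SAM's actual implementation. The function name, the (severity, minutes_ago) tuple format, and the handling of exact 30- and 60-minute boundaries are assumptions. The rules do not say what happens when a Warning alert fired 30 to 60 minutes ago; the sketch lets that case fall through to OK, which is also an assumption.

```python
def instance_state(reachable, health_state, alerts):
    """Classify an instance from its health state and recent Prometheus alerts.

    alerts is a list of (severity, minutes_ago) tuples.
    """
    if not reachable:
        return "Unreachable"
    critical_30 = any(s == "Critical" and m <= 30 for s, m in alerts)
    critical_60 = any(s == "Critical" and 30 < m <= 60 for s, m in alerts)
    warning_30 = any(s == "Warning" and m <= 30 for s, m in alerts)
    if critical_30 or health_state in (2, -1):
        return "Critical"
    if critical_60 or warning_30 or health_state == 1:
        return "Warning"
    # Assumption: anything not matched above is OK, including a Warning alert
    # that fired 30 to 60 minutes ago, which the rules do not cover.
    return "OK"


print(instance_state(True, 0, [("Critical", 45)]))  # prints Warning
```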

Troubleshooting an Unreachable Instance

There are many reasons the state of an instance could become Unreachable. This section provides several potential causes and solutions.

If none of these steps resolve the Unreachable status, contact the InterSystems Worldwide Response Center (WRC) for further troubleshooting help.

Target instance not outputting metrics

The instance you are monitoring with SAM may not be outputting metrics properly. You can check this by using the curl command in the command window, or by viewing the metrics endpoint for the target instance in your web browser at the following URL:

http://<instance-host>:<port>/api/monitor/metrics

If this displays a list of metrics, the instance is outputting metrics properly.

Otherwise, the instance may not be properly configured. In that case, ensure that the instance is on InterSystems IRIS version 2020.1 or higher and that the /api/monitor application allows for unauthenticated access, as described in the Adding an Instance to SAM section.
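
The same check can be scripted. The following Python sketch requests the metrics endpoint without credentials and parses the response into a dictionary. The function names are hypothetical, and the parser assumes each sample line ends with a single numeric value and no timestamp. If the endpoint requires authentication, urlopen raises an HTTPError rather than returning metrics.

```python
from urllib.request import urlopen


def fetch_metrics(host: str, port: int, timeout: float = 5.0) -> str:
    """GET /api/monitor/metrics without credentials and return the body."""
    url = f"http://{host}:{port}/api/monitor/metrics"
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(body: str) -> dict:
    """Map each sample line (metric name plus labels) to its numeric value."""
    samples = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples
```

For example, parse_metrics(fetch_metrics("<instance-host>", <port>)) returns a non-empty dictionary when the instance is outputting metrics.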

The SAM database is full

If the SAM database fills up, instances may show up as Unreachable and stop reporting metrics. To check whether this is the case:

  1. Open the SAM Manager from a web browser, using the following address:

    http://<sam-domain-name>:8080/csp/sys/UtilHome.csp
  2. Navigate to the Databases page (System Operation > Databases).

  3. Select Free Space View.

  4. Check the % Free column for the SAM database to see whether the value is 0.

If the database is full, you should free some space by deleting data, as described in the Clearing the SAM Database section. Once you have done so, shut down SAM using the stop.sh script, and restart it using start.sh.

To prevent this from happening again, consider lowering the number of days SAM stores data from the Configuration Settings menu.