Inspect System Performance Using the SAM Web Portal

Important:

System Alerting and Monitoring (SAM) has been deprecated; the following documentation is provided for existing users only. Customers interested in a comprehensive view of their operational platform can access the metrics API and structured logs of InterSystems products from within another observability tool. Existing users who would like assistance identifying an alternative solution should contact the WRC.

Once you have defined your clusters, instances, and alert rules within System Alerting and Monitoring (SAM) version 1.0 or 1.1, you can use it to see real-time metrics and alerts for your InterSystems IRIS instances. The SAM web portal consists of multiple pages that display this information at different levels of detail.

These pages are:

  • Monitor Clusters Page – the “home page” for the SAM web portal, which displays an overview of all clusters.

  • Single Cluster Page – a more focused view, which displays only the information for instances in a single cluster.

  • Single Instance Page – the narrowest and most detailed view, which displays the instance’s details, alerts, and metrics dashboard.

These pages allow you to:

  • Review the alerts table

  • Inspect instance metrics

  • Understand instance state

Monitor Clusters Page

The Monitor Clusters page is the home page for the SAM web application and displays an overview of all your clusters. You can return to it at any time by clicking the System Alerting & Monitoring title at the top of any other page in the application.

The Monitor Clusters page displays a cluster card for each cluster in your system. This card features a circle depicting the state of all the instances for the cluster. The Monitor Clusters page also includes an Alerts table, showing the recent alerts from all monitored instances, and provides access to the configuration settings.

To see detailed information about a specific cluster, click on the cluster card or the cluster name in the Alerts table. To see detailed information about a specific instance, click on the IP:Port for the instance in the Alerts table.

Configuration Settings Dialog

The main System Alerting and Monitoring page contains a gear icon, located near the top of the screen. Click this icon to access the Configuration Settings dialog.

From this dialog, you can set the number of days (between 1 and 30) for SAM to store alert and metric data in an IRIS database for long-term analysis.

Note:

The section “Tune and Troubleshoot SAM” describes procedures for changing other SAM settings, including how to increase the retention time for data in the Prometheus database, which can help you improve performance if you are constantly querying data more than two hours old.

Single Cluster Page

To view details about a cluster, click on the cluster card on the Monitor Clusters page.

The single cluster page displays an Alerts table, showing the recent alerts from all instances in that cluster. There is also an Instances table with details about the target instances. The Instances table shows the following details:

  • IP:Port – The IP address and Port which specify where a target instance is located. You can click this to “zoom in” to the Instance page.

  • State – The state of the instance, which can be OK, Warning, Critical, or Unreachable. See “Understand Instance State” below for a description of how SAM determines instance state.

  • Name – The name of the instance.

  • Description – The description of the instance.

Single Instance Page

To see the page for a single instance, click on the IP:Port for the instance. The single instance page contains the following sections:

  • A Details table, which contains the instance’s IP:Port, State, Name, Description, and a link to the Management Portal. For details about how SAM calculates State, see the “Understand Instance State” section below.

  • An Alerts table, showing the recent alerts for the current instance.

  • A Dashboard, which shows an overview of the Grafana Dashboard for the instance.

The page also has an Edit Instance button, which allows you to modify some of the instance details, and a Delete Instance button, which allows you to remove the instance from SAM.

Note:

If you edit an instance and change its network address, SAM purges all existing alerts tied to that instance. This is because SAM assumes that different network addresses refer to different instances.

Grafana Dashboard

The Dashboard displays several graphs of metrics, providing a snapshot of recent activity on the instance. This section describes the information visible in the dashboard by default.

The Dashboard is generated using Grafana, an open-source metrics visualization tool. You can click View in Grafana to edit the dashboard. For more information about customizing the dashboard, see the Grafana documentation (https://grafana.com/docs/guides/getting_started/).

Note:

If you edit the dashboard to display metrics older than two hours, you may want to increase the Prometheus database retention time.

The default dashboard contains the following information:

| Dashboard Graph | Metric(s) used | Description |
|---|---|---|
| CPU Utilization | iris_cpu_usage | The CPU usage of the system running the instance for the past 30 minutes. |
| Glorefs | iris_glo_ref_per_sec, iris_glo_ref_rem_per_sec | The global references to local (blue line) and remote (orange line) databases for the past 30 minutes. |
| Global Updates | iris_glo_update_per_sec | Updates to globals located on local databases per second for the past 30 minutes. |
| IRIS Disk Percent | iris_disk_percent_full | Percent of used space on the storage volume for the IRISSYS database. |
| IRIS Disk Remaining | iris_directory_space | Free space available on the storage volume for the IRISSYS database. |
| Database Reads | iris_phys_reads_per_sec | Physical database block reads from disk per second for the past 30 minutes. |
| IRIS Database Latency | iris_db_latency | Milliseconds to complete a random read from the database for the past 30 minutes. |
| Total Pri Jnl Size | iris_jrn_size{id="primary"} | Current size of the primary journal file. |
| Pri Jnl Free | iris_jrn_free_space{id="primary"} | Free space available on the primary journal directory’s storage volume. |
| WIJ Free | iris_jrn_free_space{id="WIJ"} | Free space available on the WIJ journal directory’s storage volume. |
| License Current Pct | iris_license_percent_used | Percent of licenses currently in use. |
| Licenses Available | iris_license_available | Number of licenses currently not in use. |
| System Alerts | iris_system_alerts | The number of alerts posted to the messages log since system startup. |

Review the Alerts Table

Each page that SAM provides for inspecting your system (the Monitor Clusters page, the Cluster page, and the Instance page) includes an Alerts table listing the alerts recorded for instances within the scope of that page. By default, this table displays alerts from the last hour; to view all alerts, select Show All.

An Alerts table contains the following information:

  • Last Reported – The most recent time the alert was reported.

  • Cluster – The cluster containing the instance that generated the alert.

  • IP:Port – The IP address and Port of the instance that generated the alert.

  • Severity – The severity of the alert: either Critical or Warning.

  • Source – The source that generated the alert: either IRIS or Prometheus.

    • An IRIS alert is generated by an InterSystems IRIS instance. The instance’s log monitor scans the messages log and posts notifications with severity 2 or higher to the alerts log, where SAM collects them. For more information, see the Monitoring Guide.

    • A Prometheus alert is generated by SAM according to user-defined alert rules. For more information, refer to the previous section on alerts.

  • Name – The name of the alert.

  • Message – The message associated with the alert.
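
The columns and the default one-hour window described above can be modeled in a few lines. The following is a minimal sketch, not SAM's own data model: the AlertRow class, its field names, the visible_alerts helper, and the newest-first ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class AlertRow:
    """One row of an Alerts table (hypothetical field names)."""
    last_reported: datetime
    cluster: str
    ip_port: str
    severity: str  # "Critical" or "Warning"
    source: str    # "IRIS" or "Prometheus"
    name: str
    message: str


def visible_alerts(rows: List[AlertRow], now: datetime,
                   show_all: bool = False) -> List[AlertRow]:
    """Return the rows to display: the last hour by default, or all rows."""
    if not show_all:
        cutoff = now - timedelta(hours=1)
        rows = [r for r in rows if r.last_reported >= cutoff]
    # Assumption: newest alerts are listed first.
    return sorted(rows, key=lambda r: r.last_reported, reverse=True)
```

Calling visible_alerts(rows, now) mirrors the default view, and passing show_all=True corresponds to selecting Show All.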

Inspect Instance Metrics

All InterSystems IRIS instances collect metrics that describe the status and operation of the instance. System Alerting and Monitoring allows you to monitor those metrics over time, and use them to configure alert rules.

For a list of all these metrics, see Metrics Description in the “Monitoring InterSystems IRIS Using REST API” section of the Monitoring Guide that corresponds to your version of InterSystems IRIS. The Create Application Metrics section on the same page describes how to create your own metrics.
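
If you want to examine these metrics outside of SAM, it can help to see how a metric name and its labels map to a value. The sketch below assumes the metrics are available as Prometheus text exposition format, which SAM's use of Prometheus suggests. The sample text and its values are made up, and parse_metrics and lookup are hypothetical helpers, not part of any InterSystems API.

```python
import re
from typing import Dict, Tuple

# Hypothetical sample text; metric names come from the dashboard table,
# but the values are invented for illustration.
SAMPLE = """\
# Lines starting with '#' are comments or metadata.
iris_cpu_usage 12
iris_license_percent_used 5
iris_jrn_free_space{id="primary"} 20480
iris_jrn_free_space{id="WIJ"} 10240
"""

# name, optional {labels}, value, optional timestamp
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

Key = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_metrics(text: str) -> Dict[Key, float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[Key, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/metadata lines
        match = LINE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = tuple(sorted(LABEL.findall(match.group("labels") or "")))
        samples[(match.group("name"), labels)] = float(match.group("value"))
    return samples


def lookup(samples: Dict[Key, float], name: str, **labels: str) -> float:
    """Fetch one sample by metric name and exact label set."""
    return samples[(name, tuple(sorted(labels.items())))]
```

For example, lookup(parse_metrics(SAMPLE), "iris_jrn_free_space", id="WIJ") selects the same series that the WIJ Free graph uses. A production tool would more likely rely on an existing Prometheus client library than on a hand-written parser.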

Understand Instance State

Instance state indicates whether an InterSystems IRIS instance has fired any alerts recently. There are four possible values for instance state: OK, Warning, Critical, or Unreachable. A state of OK means there have been no recent alerts. When an alert fires for an instance, System Alerting and Monitoring elevates that instance’s state to Warning or Critical. Unreachable means that, for some reason, SAM cannot access the instance.

Note:

A state of OK does not necessarily mean there are no problems with an instance. Likewise, you may determine that no action is required for an instance with a Critical state. The instance state reflects the number of recent alerts, but does not provide comprehensive information about the instance.

Instance state is a combination of two factors: the InterSystems IRIS instance’s System Health State (which SAM obtains from the iris_system_state metric), and recent Prometheus alerts generated by the instance. For information about the System Health State, see System Monitor Health State in the “Using System Monitor” chapter of the Monitoring Guide. For more information about Prometheus alerts, see the Receive Alerts section above.

System Alerting and Monitoring determines instance state as follows:

  • The state is Critical if either of the following is true:

    • A Prometheus alert with severity Critical fired within the past 30 minutes.

    • The System Monitor Health State is 2 or -1.

  • Otherwise, the state is Warning if any of the following are true:

    • A Prometheus alert with severity Critical fired between 30 and 60 minutes ago.

    • A Prometheus alert with severity Warning fired within the past 30 minutes.

    • The System Monitor Health State is 1.

  • Finally, the state is OK if both of the following are true:

    • No Prometheus alerts have fired in the past hour.

    • The System Monitor Health State is 0.

  • Unreachable means SAM cannot access the instance. See the section below for guidance on troubleshooting an unreachable instance.
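
The rules above can be read as an ordered series of checks. The following minimal sketch expresses them in Python; it is an illustration of the documented rules, not SAM's actual implementation. The instance_state function, its parameters, and the handling of exact 30- and 60-minute boundaries are assumptions. The list does not say what happens when the only recent alert is a Warning that fired 30 to 60 minutes ago; the sketch reports OK in that case, which is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# One alert as a (severity, fired_at) pair; severity is "Critical" or "Warning".
Alert = Tuple[str, datetime]


def instance_state(health_state: int, alerts: Iterable[Alert],
                   now: datetime, reachable: bool = True) -> str:
    """Apply the listed rules in order; a sketch, not SAM's own code."""
    if not reachable:
        return "Unreachable"

    zero = timedelta(0)
    half_hour = timedelta(minutes=30)
    hour = timedelta(minutes=60)
    ages = [(severity, now - fired_at) for severity, fired_at in alerts]

    critical_recent = any(s == "Critical" and zero <= a <= half_hour for s, a in ages)
    critical_older = any(s == "Critical" and half_hour < a <= hour for s, a in ages)
    warning_recent = any(s == "Warning" and zero <= a <= half_hour for s, a in ages)

    if critical_recent or health_state in (2, -1):
        return "Critical"
    if critical_older or warning_recent or health_state == 1:
        return "Warning"
    # Assumption: anything not matched above is reported as OK, including
    # a Warning alert that is 30 to 60 minutes old (the list does not cover it).
    return "OK"
```

Because the Critical check runs first, an instance with a System Monitor Health State of 2 is Critical even when no Prometheus alert has fired, which matches the "either of the following" wording in the list.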
