Inspect System Performance Using the SAM Web Portal
Once you have defined your clusters, instances, and alert rules within System Alerting and Monitoring (SAM) version 1.0 or 1.1, you can use it to see real-time metrics and alerts for your InterSystems IRIS instances. The SAM web portal consists of multiple pages that display this information at different levels of detail.
These pages are:
Monitor Clusters Page – the “home page” for the SAM web portal, which displays an overview of all clusters.
Single Cluster Page – a more focused view, which displays only the information for instances in a single cluster.
Single Instance Page – the narrowest and most detailed view, which displays the instance’s details, alerts, and metrics dashboard.
These pages allow you to review the Alerts table, inspect instance metrics, and understand instance state.
Monitor Clusters Page
The Monitor Clusters page displays an overview of all your clusters. It is the home page for the SAM web application, and you can return to it at any time by clicking the System Alerting & Monitoring title at the top of any other page in the application.
The Monitor Clusters page displays a cluster card for each cluster in your system. This card features a circle depicting the state of all the instances for the cluster. The Monitor Clusters page also includes an Alerts table, showing the recent alerts from all monitored instances, and provides access to the configuration settings.
To see detailed information about a specific cluster, click on the cluster card or the cluster name in the Alerts table. To see detailed information about a specific instance, click on the IP:Port for the instance in the Alerts table.
Configuration Settings Dialog
The main System Alerting and Monitoring page contains a gear icon, located near the top of the screen. Click this icon to access the Configuration Settings dialog.
From this dialog, you can set the number of days (between 1 and 30) for SAM to store alert and metric data in an IRIS database for long-term analysis.
Single Cluster Page
To view details about a cluster, click on the cluster card on the Monitor Clusters page.
The single cluster page displays an Alerts table, showing the recent alerts from all instances in that cluster. There is also an Instances table with details about the target instances. The Instances table shows the following details:
IP:Port – The IP address and port that specify where a target instance is located. You can click this to “zoom in” to the Instance page.
State – The state of the instance, which can be OK, Warning, Critical, or Unreachable. See “Understand Instance State” below for a description of how SAM determines instance state.
Name – The name of the instance.
Description – The description of the instance.
Single Instance Page
To see the page for a single instance, click on the IP:Port for the instance. The single instance page contains the following sections:
A Details table, which contains the instance’s IP:Port, State, Name, Description, and a link to the Management Portal. For details about how SAM calculates State, see the “Understand Instance State” section below.
An Alerts table, showing the recent alerts for the current instance.
A Dashboard, which shows an overview of the Grafana Dashboard for the instance.
The page also has an Edit Instance button, which allows you to modify some of the instance details, and a Delete Instance button, which allows you to remove the instance from SAM.
If you edit an instance and change its network address, SAM purges all existing alerts tied to that instance. This is because SAM assumes that different network addresses refer to different instances.
The Dashboard displays several graphs of metrics, providing a snapshot of recent activity on the instance. This section describes the information visible in the dashboard by default.
The Dashboard is generated using Grafana, an open-source metrics visualization tool. You can click View in Grafana to edit the dashboard. For more information about customizing the dashboard, see the Grafana documentation (https://grafana.com/docs/guides/getting_started/).
If you edit the dashboard to display metrics older than two hours, you may want to increase the Prometheus database retention time.
The default dashboard contains the following information:
|Dashboard Graph||Metric(s) used||Description|
| || ||The CPU usage of the system running the instance for the past 30 minutes|
| || ||The global references to local (blue line) and remote (orange line) databases for the past 30 minutes|
| || ||Updates to globals located on local databases per second for the past 30 minutes|
|IRIS Disk Percent|| ||Percent of used space on the storage volume for the IRISSYS database|
|IRIS Disk Remaining|| ||Free space available on the storage volume for the IRISSYS database|
| || ||Physical database block reads from disk per second for the past 30 minutes|
|IRIS Database Latency|| ||Milliseconds to complete a random read from the database for the past 30 minutes|
|Total Pri Jnl Size|| ||Current size of the primary journal file|
|Pri Jnl Free|| ||Free space available on the primary journal directory’s storage volume|
| || ||Free space available on the WIJ journal directory’s storage volume|
|License Current Pct|| ||Percent of licenses currently in use|
| || ||Number of licenses currently not in use|
| || ||The number of alerts posted to the messages log since system startup|
Review the Alerts Table
Each page that SAM provides for inspecting your system (the Monitor Clusters page, the Cluster page, and the Instance page) includes an Alerts table listing alerts recorded for instances within the scope of that page. By default, this table displays alerts from the last hour; to view all alerts, select Show All.
An Alerts table contains the following information:
Last Reported – The most recent time the alert was reported.
Cluster – The cluster containing the instance that generated the alert.
IP:Port – The IP address and Port of the instance that generated the alert.
Severity – The severity of the alert: either Critical or Warning.
Source – The source that generated the alert: either IRIS or Prometheus.
An IRIS alert is generated by an InterSystems IRIS instance. The instance’s log monitor scans the messages log and posts notifications with severity 2 or higher to the alerts log, where SAM collects them. For more information, see the Monitoring Guide.
A Prometheus alert is generated by SAM according to user-defined alert rules. For more information, refer to the previous section on alerts.
Name – The name of the alert.
Message – The message associated with the alert.
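As a rough illustration of the table described above, the following Python sketch models one row of an Alerts table and the default "last hour" filter. The `AlertRow` class and `visible_alerts` function are hypothetical names invented for this example; they are not part of SAM, and the sketch only mirrors the columns and the Show All behavior described in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class AlertRow:
    """One row of an Alerts table (hypothetical model of the listed columns)."""
    last_reported: datetime  # most recent time the alert was reported
    cluster: str             # cluster containing the instance
    ip_port: str             # "IP:Port" of the instance
    severity: str            # "Critical" or "Warning"
    source: str              # "IRIS" or "Prometheus"
    name: str
    message: str


def visible_alerts(rows: Iterable[AlertRow], now: datetime,
                   show_all: bool = False) -> List[AlertRow]:
    """Return the rows shown by default (last hour), or every row if show_all."""
    if show_all:
        return list(rows)
    cutoff = now - timedelta(hours=1)
    return [row for row in rows if row.last_reported >= cutoff]
```

Calling `visible_alerts(rows, now)` keeps only alerts reported within the past hour, while `visible_alerts(rows, now, show_all=True)` corresponds to selecting Show All.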
Inspect Instance Metrics
All InterSystems IRIS instances collect metrics that describe the status and operation of the instance. System Alerting and Monitoring allows you to monitor those metrics over time, and use them to configure alert rules.
For a list of all these metrics, see Metrics Description in the “Monitoring InterSystems IRIS Using REST API” section of the Monitoring Guide that corresponds to your version of InterSystems IRIS. The Create Application Metrics section on the same page describes how to create your own metrics.
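To inspect the raw metrics outside of SAM, a short script can read them directly from an instance. The Python sketch below assumes that the instance serves its metrics as Prometheus-style text at the path `/api/monitor/metrics`; verify the path against the Monitoring Guide for your version. The function names and the sample text are invented for this example, and `example_metric` is a placeholder rather than a real metric name.

```python
from typing import Dict
from urllib.request import urlopen

# Assumption: metrics are exposed at this path on the instance's web server.
METRICS_PATH = "/api/monitor/metrics"

# Illustrative sample only. iris_system_state is named in this document;
# example_metric is a made-up placeholder showing a labeled metric.
SAMPLE = """\
# comment lines start with '#'
iris_system_state 0
example_metric{id="USER"} 0.25
"""


def fetch_metrics_text(base_url: str, timeout: float = 5.0) -> str:
    """Download the metrics text from an instance (not called in this sketch)."""
    with urlopen(base_url.rstrip("/") + METRICS_PATH, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' lines into a dict, skipping blanks and comments.

    Simplification: assumes each line ends with the value (no timestamp).
    """
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name.strip()] = float(value)
    return metrics


metrics = parse_metrics(SAMPLE)
print(metrics["iris_system_state"])
```

Against a live instance, `parse_metrics(fetch_metrics_text("http://host:port"))` would return the same kind of dictionary, where the host and port are those of your own instance.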
Understand Instance State
Instance state indicates whether an InterSystems IRIS instance has fired any alerts recently. There are four possible values for instance state: OK, Warning, Critical, or Unreachable. A state of OK means there have been no recent alerts. When an alert fires for an instance, System Alerting and Monitoring elevates that instance’s state to Warning or Critical. Unreachable means that, for some reason, SAM cannot access the instance.
A state of OK does not necessarily mean there are no problems with an instance. Likewise, you may determine that no action is required for an instance with a Critical state. The instance state reflects the number of recent alerts, but does not provide comprehensive information about the instance.
Instance state is a combination of two factors: the InterSystems IRIS instance’s System Health State (which SAM obtains from the iris_system_state metric), and recent Prometheus alerts generated by the instance. For information about the System Health State, see System Monitor Health State in the “Using System Monitor” chapter of the Monitoring Guide. For more information about Prometheus alerts, see the Receive Alerts section above.
System Alerting and Monitoring determines instance state as follows:
The state is Critical if either of the following is true:
A Prometheus alert with severity Critical fired within the past 30 minutes.
The System Monitor Health State is 2 or -1.
Otherwise, the state is Warning if any of the following are true:
A Prometheus alert with severity Critical fired between 30 and 60 minutes ago.
A Prometheus alert with severity Warning fired within the past 30 minutes.
The System Monitor Health State is 1.
Finally, the state is OK if both of the following are true:
No Prometheus alerts have fired in the past hour.
The System Monitor Health State is 0.
Unreachable means SAM cannot access the instance. See the section below for guidance on troubleshooting an unreachable instance.
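The rules above can be sketched as a small Python function. This is an illustration of the decision order described in this section, not SAM's actual implementation; the `PrometheusAlert` class, the `instance_state` function, and the `reachable` flag are hypothetical names. Cases the rules do not spell out, such as a Warning alert that fired between 30 and 60 minutes ago on an instance with a health state of 0, fall through to OK here, which is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class PrometheusAlert:
    severity: str        # "Critical" or "Warning"
    fired_at: datetime   # when the alert fired


def instance_state(health_state: int, alerts: Iterable[PrometheusAlert],
                   now: datetime, reachable: bool = True) -> str:
    """Apply the rules in order: Unreachable, Critical, Warning, then OK."""
    if not reachable:
        return "Unreachable"

    half_hour = timedelta(minutes=30)
    hour = timedelta(minutes=60)
    # Pair each alert's severity with its age; ignore alerts dated in the future.
    aged = [(a.severity, now - a.fired_at) for a in alerts
            if now >= a.fired_at]

    critical_recent = any(s == "Critical" and age <= half_hour for s, age in aged)
    critical_older = any(s == "Critical" and half_hour < age <= hour for s, age in aged)
    warning_recent = any(s == "Warning" and age <= half_hour for s, age in aged)

    if critical_recent or health_state in (2, -1):
        return "Critical"
    if critical_older or warning_recent or health_state == 1:
        return "Warning"
    # Assumption: anything not matched above is reported as OK.
    return "OK"
```

For example, an instance with a health state of 0 and a single Critical alert that fired 45 minutes ago evaluates to Warning, because the alert is older than 30 minutes but still within the past hour.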