Inspect System Performance Using the SAM Web Portal
System Alerting and Monitoring (SAM) has been deprecated; the following documentation is provided for existing users only. Customers interested in a comprehensive view of their operational platform can access the metrics API and structured logs of InterSystems products within another observability tool. Existing users who would like assistance identifying an alternative solution should contact the WRC.
Once you have defined your clusters, instances, and alert rules within System Alerting and Monitoring (SAM) version 2.0, you can use it to see real-time metrics and alerts for your InterSystems IRIS instances. The SAM web portal consists of multiple pages that display this information at different levels of detail.
These pages are:
- Monitor Clusters Page – the “home page” for the SAM web portal, which displays an overview of all clusters.
- Single Cluster Page – a more focused view, which displays only the information for instances in a single cluster.
- Single Instance Page – the narrowest and most detailed view, which displays the instance’s details, alerts, and metrics dashboard.
These pages allow you to:
- Review the Alerts table
- Inspect instance metrics
- Understand instance state
Monitor Clusters Page
The Monitor Clusters page displays an overview of all your clusters. It is the home page for the SAM web application, and you can return to it at any time by selecting the InterSystems logo (beside the System Alerting & Monitoring title) at the top of any other page in the application.
The Monitor Clusters page displays a cluster card for each cluster in your system. This card features a circle depicting the state of all the instances for the cluster. The Monitor Clusters page also includes an Alerts table, showing the recent alerts from all monitored instances, and provides access to the configuration settings.
To see detailed information about a specific cluster, click on the cluster card or the cluster name in the Alerts table. To see detailed information about a specific instance, click on the IP:Port for the instance in the Alerts table.
Settings Menu
The main System Alerting and Monitoring page contains a gear icon, located near the top of the screen. Click this icon to access the Settings drop-down menu, which contains the following options:
- Settings – opens the Configuration Settings dialog, where you can:
  - Set the number of days (between 1 and 30) for SAM to store alert and metric data in an IRIS database for long-term analysis.
  - Use HTTPS to connect to your InterSystems IRIS instances, if you have configured SAM and your target instances for SSL/TLS.
- Save Current Configuration – exports data about your clusters, instances, and alerts as a JSON file. Use this file to migrate your system configuration into another SAM instance.
- Open Existing Configuration – opens a File Upload window, where you can import a SAM system configuration JSON file.
Caution: Importing a SAM system configuration erases the current configuration and cannot be undone.
The section “Tune and Troubleshoot SAM” describes procedures for changing other SAM settings, including how to increase the retention time for data in the Prometheus database, which can help you improve performance if you are constantly querying data more than two hours old.
Single Cluster Page
To view details about a cluster, click on the cluster card on the Monitor Clusters page.
The single cluster page displays an Alerts table, showing the recent alerts from all instances in that cluster. There is also an Instances table with details about the target instances. The Instances table shows the following details:
- IP:Port – The IP address and port that specify where a target instance is located. You can click this to “zoom in” to the Instance page.
- State – The state of the instance, which can be OK, Warning, Critical, or Unreachable. See “Understand Instance State” below for a description of how SAM determines instance state.
- Name – The name of the instance.
- Description – The description of the instance.
Single Instance Page
To see the page for a single instance, click on the IP:Port for the instance. The single instance page contains the following sections:
- A Details table, which contains the instance’s IP:Port, State, Name, Description, and a link to the Management Portal. For details about how SAM calculates State, see the “Understand Instance State” section below.
- An Alerts table, showing the recent alerts for the current instance.
- A Dashboard, which shows an overview of the current Grafana Dashboard for the instance.
The page also has an Edit Instance button, which allows you to modify some of the instance details, and a Delete Instance button, which allows you to remove the instance from SAM.
If you edit an instance and change its network address, SAM purges all existing alerts tied to that instance. This is because SAM assumes that different network addresses refer to different instances.
Grafana Dashboards
A Dashboard displays several graphs of metrics, providing a snapshot of recent activity on the instance.
The table below describes the information reported by the pre-configured SAM Dashboard:
Dashboard Graph | Metric(s) used | Description |
---|---|---|
CPU Utilization | iris_cpu_usage | The CPU usage of the system running the instance for the past 30 minutes |
Glorefs | iris_glo_ref_per_sec, iris_glo_ref_rem_per_sec | The global references to local (blue line) and remote (orange line) databases for the past 30 minutes |
Global Updates | iris_glo_update_per_sec | Updates to globals located on local databases per second for the past 30 minutes |
IRIS Disk Percent | iris_disk_percent_full | Percent of used space on the storage volume for the IRISSYS database |
IRIS Disk Remaining | iris_directory_space | Free space available on the storage volume for the IRISSYS database |
Database Reads | iris_phys_reads_per_sec | Physical database block reads from disk per second for the past 30 minutes |
IRIS Database Latency | iris_db_latency | Milliseconds to complete a random read from the database for the past 30 minutes |
Total Pri Jnl Size | iris_jrn_size{id="primary"} | Current size of the primary journal file |
Pri Jnl Free | iris_jrn_free_space{id="primary"} | Free space available on the primary journal directory’s storage volume |
WIJ Free | iris_jrn_free_space{id="WIJ"} | Free space available on the WIJ journal directory’s storage volume |
License Current Pct | iris_license_percent_used | Percent of licenses currently in use |
Licenses Available | iris_license_available | Number of licenses currently not in use |
System Alerts | iris_system_alerts | The number of alerts posted to the messages log since system startup |
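If you want to pull the same series outside the dashboard, the metric selectors in the table can be sent to the Prometheus HTTP API. The sketch below only builds a `query_range` URL covering the 30-minute window that most graphs use. The base URL, the step size, and the helper name `build_range_query_url` are assumptions for illustration; how (or whether) your SAM deployment exposes its Prometheus endpoint is not covered here.

```python
from urllib.parse import urlencode

# Metric selectors copied from the dashboard table (a subset, for brevity).
DASHBOARD_QUERIES = {
    "CPU Utilization": "iris_cpu_usage",
    "Pri Jnl Free": 'iris_jrn_free_space{id="primary"}',
    "WIJ Free": 'iris_jrn_free_space{id="WIJ"}',
}


def build_range_query_url(base_url, graph, end_ts, window_minutes=30, step_seconds=15):
    """Build a Prometheus /api/v1/query_range URL for one dashboard graph.

    base_url is an assumption: point it at wherever your Prometheus API is reachable.
    end_ts is a Unix timestamp in seconds; the window ends at that time.
    """
    params = {
        "query": DASHBOARD_QUERIES[graph],
        "start": end_ts - window_minutes * 60,
        "end": end_ts,
        "step": step_seconds,
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical host and timestamp, used only to show the call shape.
url = build_range_query_url("http://localhost:9090/", "WIJ Free", 1700000000)
```

The resulting URL can be requested with any HTTP client; the label selector is percent-encoded by `urlencode`, so it does not need to be escaped by hand.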
You can create and edit SAM dashboards using Grafana, an open-source metrics visualization tool. Select View in Grafana to open Grafana. After you save a new dashboard in Grafana, you can set it as the default Dashboard for an instance by selecting its name from the drop-down menu in the New Instance or Edit Instance dialog for that instance.
For more information about creating and editing dashboards, see the Grafana documentation (https://grafana.com/docs/guides/getting_started/).
If you edit the dashboard to display metrics older than two hours, you may want to increase the Prometheus database retention time.
To add or edit an instance definition using the REST API in this version of SAM, you must specify the dashboard you wish to display for the instance by setting the dashboardId property. The value of dashboardId is the uid which Grafana assigns the dashboard when it is saved. You can find the value of uid by examining the JSON model for the dashboard in Grafana.
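As a rough illustration of the `dashboardId` step, the sketch below reads the top-level `uid` field from a dashboard's JSON model (as exported from Grafana) and adds it to a request body. Only `dashboardId` and `uid` come from the text above; the other instance fields, the sample model, and the helper name are hypothetical, and the actual SAM REST endpoint and request schema are not shown.

```python
import json


def instance_payload_with_dashboard(dashboard_json, instance_fields):
    """Return a copy of instance_fields with dashboardId set to the dashboard's uid."""
    model = json.loads(dashboard_json)
    uid = model.get("uid")
    if not uid:
        # A dashboard that has never been saved may not have a uid yet.
        raise ValueError("dashboard JSON model has no uid; save the dashboard in Grafana first")
    payload = dict(instance_fields)
    payload["dashboardId"] = uid
    return payload


# Illustrative JSON model; a real one contains many more fields.
sample_model = '{"title": "My IRIS Dashboard", "uid": "abc123XYZ", "panels": []}'

# Field names other than dashboardId are placeholders, not the documented schema.
payload = instance_payload_with_dashboard(sample_model, {"name": "iris-a", "description": "demo"})
```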
Review the Alerts Table
Each page that SAM provides for inspecting your system (the Monitor Clusters page, the Cluster page, and the Instance page) includes an Alerts table listing alerts recorded for instances within the scope of that page. By default, this table displays alerts from the last hour; to view all alerts, select Show All.
An Alerts table contains the following information:
- Last Reported – The most recent time the alert was reported.
- Cluster – The cluster containing the instance that generated the alert.
- IP:Port – The IP address and port of the instance that generated the alert.
- Severity – The severity of the alert: either Critical or Warning.
- Source – The source that generated the alert: either IRIS or Prometheus.
  - An IRIS alert is generated by an InterSystems IRIS instance. The instance’s log monitor scans the messages log and posts notifications with severity 2 or higher to the alerts log, where SAM collects them. For more information, see the Monitoring Guide.
  - A Prometheus alert is generated by SAM according to user-defined alert rules. For more information, refer to the previous section on alerts.
- Name – The name of the alert.
- Message – The message associated with the alert.
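To make the table's columns and its default one-hour filter concrete, here is a small model of an alert record and the Show All toggle. This is a sketch for reasoning about the view, not SAM's implementation: the `Alert` class, the `visible_alerts` helper, and the newest-first ordering are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # One field per column of the Alerts table.
    last_reported: datetime
    cluster: str
    ip_port: str
    severity: str  # "Critical" or "Warning"
    source: str    # "IRIS" or "Prometheus"
    name: str
    message: str


def visible_alerts(alerts, now, show_all=False):
    """Return alerts from the last hour, or every alert when show_all is True."""
    if show_all:
        selected = list(alerts)
    else:
        cutoff = now - timedelta(hours=1)
        selected = [a for a in alerts if a.last_reported >= cutoff]
    # Newest first; the ordering is an assumption, not documented behavior.
    return sorted(selected, key=lambda a: a.last_reported, reverse=True)
```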
Inspect Instance Metrics
All InterSystems IRIS instances collect metrics that describe the status and operation of the instance. System Alerting and Monitoring allows you to monitor those metrics over time, and use them to configure alert rules.
For a list of all these metrics, see Metrics Description in the “Monitoring InterSystems IRIS Using REST API” section of the Monitoring Guide that corresponds to your version of InterSystems IRIS. The Create Application Metrics section on the same page describes how to create your own metrics.
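InterSystems IRIS publishes these metrics in the Prometheus text exposition format (on recent versions, through the `/api/monitor/metrics` endpoint). The sketch below parses that format into a dictionary so individual values can be inspected. The sample text uses made-up values, and the parser assumes each sample line has no trailing timestamp; treat it as a minimal illustration rather than a complete parser.

```python
def parse_metrics(text):
    """Parse Prometheus text-format lines into {"name{labels}": float}.

    Assumes each sample line is "<name and labels> <value>" with no timestamp.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE/comment lines
        # Split on the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


# Illustrative values only; not output captured from a real instance.
SAMPLE = """\
# sample metrics
iris_cpu_usage 12
iris_jrn_free_space{id="WIJ"} 2048.5
iris_system_state 0
"""

parsed = parse_metrics(SAMPLE)
```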
Understand Instance State
Instance state indicates whether an InterSystems IRIS instance has fired any alerts recently. There are four possible values for instance state: OK, Warning, Critical, or Unreachable. A state of OK means there have been no recent alerts. When an alert fires for an instance, System Alerting and Monitoring elevates that instance’s state to Warning or Critical. Unreachable means that, for some reason, SAM cannot access the instance.
A state of OK does not necessarily mean there are no problems with an instance. Likewise, you may determine that no action is required for an instance with a Critical state. The instance state reflects the number of recent alerts, but does not provide comprehensive information about the instance.
Instance state is a combination of two factors: the InterSystems IRIS instance’s System Health State (which SAM obtains from the iris_system_state metric), and recent Prometheus alerts generated by the instance. For information about the System Health State, see System Monitor Health State in the “Using System Monitor” chapter of the Monitoring Guide. For more information about Prometheus alerts, see the Receive Alerts section above.
System Alerting and Monitoring determines instance state as follows:
- The state is Critical if either of the following is true:
  - A Prometheus alert with severity Critical fired within the past 30 minutes.
  - The System Monitor Health State is 2 or -1.
- Otherwise, the state is Warning if any of the following are true:
  - A Prometheus alert with severity Critical fired between 30 and 60 minutes ago.
  - A Prometheus alert with severity Warning fired within the past 30 minutes.
  - The System Monitor Health State is 1.
- Finally, the state is OK if:
  - No Prometheus alerts have fired in the past hour.
  - The System Monitor Health State is 0.
Unreachable means SAM cannot access the instance. See the section below for guidance on troubleshooting an unreachable instance.
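The rules above can be expressed as a short decision function. This is a sketch of the documented logic, not SAM's actual code: the function name, the `(severity, fired_at)` alert representation, the `reachable` flag, and the handling of exact 30- and 60-minute boundaries are assumptions. The rules do not say what happens when the only recent alert is a Warning fired 30 to 60 minutes ago; this sketch falls through to OK in that case.

```python
from datetime import datetime, timedelta


def instance_state(health_state, alerts, now, reachable=True):
    """Return "Critical", "Warning", "OK", or "Unreachable".

    health_state: System Monitor Health State (-1, 0, 1, or 2).
    alerts: list of (severity, fired_at) tuples for Prometheus alerts.
    """
    if not reachable:
        return "Unreachable"

    half_hour_ago = now - timedelta(minutes=30)
    hour_ago = now - timedelta(minutes=60)

    critical_recent = any(s == "Critical" and t >= half_hour_ago for s, t in alerts)
    critical_older = any(s == "Critical" and hour_ago <= t < half_hour_ago for s, t in alerts)
    warning_recent = any(s == "Warning" and t >= half_hour_ago for s, t in alerts)

    if critical_recent or health_state in (2, -1):
        return "Critical"
    if critical_older or warning_recent or health_state == 1:
        return "Warning"
    return "OK"
```

For example, a Critical alert that fired 45 minutes ago on an instance with Health State 0 yields Warning, because the alert has aged out of the 30-minute Critical window but is still within the hour.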