Monitoring InterSystems IRIS Using REST API
Every InterSystems IRIS® data platform instance contains a REST interface that provides statistics about the instance. The REST API provides a way to gather information from multiple machines running InterSystems IRIS, allowing you to monitor in detail all instances that comprise your application.
This appendix describes the metrics the /api/monitor service provides. These metrics are compatible with Prometheus, an open-source monitoring and alerting tool. Configuring Prometheus to scrape multiple connected InterSystems IRIS instances provides a cohesive view of your entire system, making it easier to evaluate whether the system is behaving properly and efficiently.
For an introduction to creating and using REST interfaces, see First Look: Developing REST Interfaces with InterSystems Products.
/api/monitor Service
The /api/monitor service provides information about the InterSystems IRIS instance on which it runs. By default, the /api/monitor web application is enabled with “Unauthenticated” access. For information about setting up authentication for this service, see the Securing REST Services chapter in Creating REST Services.
This API has the following two endpoints:
-
/metrics Endpoint, which returns all instance metrics, and can be configured to return specific application metrics.
-
/alerts Endpoint, which returns any system alerts that have been posted since the endpoint was last scraped.
InterSystems IRIS logs any errors in the SystemMonitor.log file, which is located in the install-dir/mgr directory.
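As a rough illustration of how a client might address the two endpoints, the following Python sketch builds their URLs and fetches a response body. It assumes the default unauthenticated access over plain HTTP and the default web server port 52773 mentioned later in this appendix; the helper names monitor_url and fetch and the host name are hypothetical, not part of InterSystems IRIS.

```python
from urllib.request import urlopen

DEFAULT_PORT = 52773  # default web server port cited later in this appendix


def monitor_url(host, endpoint, port=DEFAULT_PORT):
    """Build the URL of an /api/monitor endpoint ("metrics" or "alerts")."""
    if endpoint not in ("metrics", "alerts"):
        raise ValueError(f"unknown /api/monitor endpoint: {endpoint!r}")
    return f"http://{host}:{port}/api/monitor/{endpoint}"


def fetch(host, endpoint, port=DEFAULT_PORT, timeout=5.0):
    """Return the raw response body of one endpoint as text.

    Assumes unauthenticated access; a secured instance needs credentials.
    """
    with urlopen(monitor_url(host, endpoint, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


# "iris-host.example.com" is a placeholder host name.
print(monitor_url("iris-host.example.com", "metrics"))
```

Calling fetch("iris-host.example.com", "metrics") would then return the metrics text, provided the instance is reachable and the web application has not been secured.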
/metrics Endpoint
The /metrics endpoint returns a list of metrics, which are described in the Metric Descriptions section. The Create Application Metrics section contains instructions for how to define custom metrics.
To configure Prometheus to scrape an instance of InterSystems IRIS, follow the instructions in First Steps With Prometheus (https://prometheus.io/docs/introduction/first_steps/).
Metric Descriptions
The metrics are returned in a text-based format, described in the Exposition Formats page of the Prometheus documentation (https://prometheus.io/docs/instrumenting/exposition_formats/). Each metric is listed on a single line, with a single space separating the name from the value.
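To make the line format concrete, here is a minimal Python sketch that splits such text into name, labels, and value. It is a simplified reader, not a full Prometheus parser: it assumes label values contain no escaped quotes, and the sample values are illustrative rather than real output. The function name parse_metrics is hypothetical.

```python
import re

# Metric name, optional {labels}, exactly one space, then the value.
_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})? (?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = _LINE.match(line)
        if match is None:
            continue  # skip lines that are not "name value"
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # skip lines whose value is not numeric
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, value))
    return samples


# Illustrative input only; the numbers are made up.
sample = (
    'iris_cpu_usage 12\n'
    'iris_db_size_mb{id="USER",dir="/iris/mgr/user/"} 11\n'
)
for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```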
InterSystems IRIS metrics are listed in the table below. Metric names with a label appear here with line breaks to improve readability.
This table contains metrics for the version of InterSystems IRIS documented here. As metrics may be added in newer versions, be sure this documentation matches your version of InterSystems IRIS.
Metric Name | Description |
---|---|
iris_cpu_pct
{id="ProcessType"} |
Percent of CPU usage by InterSystems IRIS process type. ProcessType can be any of the following:
ECPWorker, ECPCliR, ECPCliW, ECPSrvR, ECPSrvW, LICENSESRV, WDAUX, WRTDMN, JRNDMN, GARCOL, CSPDMN, CSPSRV, ODBCSRC, MirrorMaster, MirrorPri, MirrorBack, MirrorPre, MirrorSvrR, MirrorJrnR, MirrorSK, MirrorComm (For more information about InterSystems IRIS Processes, see Secure InterSystems Processes and Operating System Resources.) |
iris_cpu_usage | Percent of CPU usage for all programs on the operating system |
iris_csp_activity
{id="IPaddress:port"} |
Number of web requests served by the Web Gateway Server since it was started |
iris_csp_actual_connections
{id="IPaddress:port"} |
Number of current connections to this server by the Web Gateway Server |
iris_csp_gateway_latency
{id="IPaddress:port"} |
Amount of time to obtain a response from the Web Gateway Server when fetching iris_csp_ metrics, in milliseconds |
iris_csp_in_use_connections
{id="IPaddress:port"} |
Number of current connections to this server by the Web Gateway Server that are processing a web request |
iris_csp_private_connections
{id="IPaddress:port"} |
Number of current connections to this server by the Web Gateway Server that are reserved for state-aware applications (Preserve mode 1) |
iris_csp_sessions | Number of currently active web session IDs on this server |
iris_cache_efficiency | Ratio of global references to physical reads and writes, as a percent |
iris_db_expansion_size_mb
{id="database"} |
Amount by which to expand database, in megabytes |
iris_db_free_space
{id="database"} |
Free space available in database, in megabytes (This metric is only updated once per day, and may not reflect recent changes.) |
iris_db_latency
{id="database"} |
Amount of time to complete a random read from database, in milliseconds |
iris_db_max_size_mb
{id="database"} |
Maximum size to which database can grow, in megabytes |
iris_db_size_mb
{id="database",dir="path"} |
Size of database, in megabytes |
iris_directory_space
{id="database",dir="path"} |
Free space available on the database directory’s storage volume, in megabytes |
iris_disk_percent_full
{id="database",dir="path"} |
Percent of space filled on the database directory’s storage volume |
iris_ecp_conn | Total number of active client connections on this ECP application server |
iris_ecp_conn_max | Maximum active client connections from this ECP application server |
iris_ecp_connections | Number of servers synchronized when this ECP application server synchronizes with its configured ECP data servers |
iris_ecp_latency | Latency between the ECP application server and the ECP data server, in milliseconds |
iris_ecps_conn | Total active client connections to this ECP data server per second |
iris_ecps_conn_max | Maximum active client connections to this ECP data server |
iris_glo_a_seize_per_sec | Number of ASeizes on the global resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_glo_n_seize_per_sec | Number of NSeizes on the global resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_glo_ref_per_sec | Number of references to globals located on local databases per second |
iris_glo_ref_rem_per_sec | Number of references to globals located on remote databases per second |
iris_glo_seize_per_sec | Number of seizes on the global resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_glo_update_per_sec | Number of updates (SET and KILL commands) to globals located on local databases per second |
iris_glo_update_rem_per_sec | Number of updates (SET and KILL commands) to globals located on remote databases per second |
iris_jrn_block_per_sec | Journal blocks written to disk per second |
iris_jrn_free_space
{id="JournalType",dir="path"} |
Free space available on each journal directory’s storage volume, in megabytes. JournalType can be WIJ, primary, or secondary |
iris_jrn_size
{id="JournalType"} |
Current size of each journal file, in megabytes. JournalType can be WIJ, primary, or secondary |
iris_license_available | Number of licenses not currently in use |
iris_license_consumed | Number of licenses currently in use |
iris_license_percent_used | Percent of licenses currently in use |
iris_log_reads_per_sec | Logical reads per second |
iris_obj_a_seize_per_sec | Number of ASeizes on the object resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_obj_del_per_sec | Number of objects deleted per second |
iris_obj_hit_per_sec | Number of object references per second, in process memory |
iris_obj_load_per_sec | Number of objects loaded from disk per second, not in shared memory |
iris_obj_miss_per_sec | Number of object references not found in memory per second |
iris_obj_new_per_sec | Number of objects initialized per second |
iris_obj_seize_per_sec | Number of seizes on the object resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_page_space_percent_used | Percent of maximum allocated page file space used |
iris_phys_mem_percent_used | Percent of physical memory (RAM) currently in use |
iris_phys_reads_per_sec | Physical database blocks read from disk per second |
iris_phys_writes_per_sec | Physical database blocks written to disk per second |
iris_process_count | Total number of active InterSystems IRIS processes |
iris_rtn_a_seize_per_sec | Number of ASeizes on the routine resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_rtn_call_local_per_sec | Number of local routine calls per second |
iris_rtn_call_miss_per_sec | Number of routine calls not found in memory per second |
iris_rtn_call_remote_per_sec | Number of remote routine calls per second |
iris_rtn_load_per_sec | Number of routines locally loaded from or saved to disk per second |
iris_rtn_load_rem_per_sec | Number of routines remotely loaded from or saved to disk per second |
iris_rtn_seize_per_sec | Number of seizes on the routine resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_sam_get_db_sensors_seconds | Amount of time it took to collect iris_db* sensors, in seconds |
iris_sam_get_jrn_sensors_seconds | Amount of time it took to collect iris_jrn* sensors, in seconds |
iris_sam_get_sql_sensors_seconds | Amount of time it took to collect iris_sql* sensors, in seconds |
iris_sam_get_wqm_sensors_seconds | Amount of time it took to collect iris_wqm* sensors, in seconds |
iris_smh_available
{id="purpose"} |
Shared memory available by purpose, in kilobytes (For more information, including a list of identifiers for purpose, see Generic (Shared) Memory Heap Usage in the “Monitoring InterSystems IRIS Using the Management Portal” section of Monitoring Guide.) |
iris_smh_percent_full
{id="purpose"} |
Percent of allocated shared memory in use by purpose (For more information, including a list of identifiers for purpose, see Generic (Shared) Memory Heap Usage in the “Monitoring InterSystems IRIS Using the Management Portal” section of Monitoring Guide.) |
iris_smh_total | Shared memory allocated for current instance, in kilobytes |
iris_smh_total_percent_full | Percent of allocated shared memory in use for current instance |
iris_smh_used
{id="purpose"} |
Shared memory in use by purpose, in kilobytes (For more information, including a list of identifiers for purpose, see Generic (Shared) Memory Heap Usage in the “Monitoring InterSystems IRIS Using the Management Portal” section of Monitoring Guide.) |
iris_sql_queries_avg_runtime
{id="namespace"} |
Average SQL statement runtime, in seconds |
iris_sql_queries_avg_runtime_std_dev
{id="namespace"} |
Standard deviation of the average SQL statement runtime |
iris_sql_queries_per_second
{id="namespace"} |
Average number of SQL statements, per second |
iris_system_alerts | The number of alerts posted to the messages log since system startup |
iris_system_alerts_log | The number of alerts currently located in the alerts log |
iris_system_alerts_new | Whether new alerts are available on the /api/monitor/alerts endpoint, as a Boolean |
iris_system_state | A number representing the system monitor health state (For more information, see System Monitor Health State in the “Using System Monitor” section of Monitoring Guide.) |
iris_trans_open_count | Number of open transactions on the current instance |
iris_trans_open_secs | Average duration of open transactions on the current instance, in seconds |
iris_trans_open_secs_max | Duration of longest currently open transaction on the current instance, in seconds |
iris_wd_buffer_redirty | Number of database buffers the write daemon wrote during its most recent cycle that were also written in the prior cycle |
iris_wd_buffer_write | Number of database buffers the write daemon wrote during its most recent cycle |
iris_wd_cycle_time | Amount of time the most recent write daemon cycle took to complete, in milliseconds |
iris_wd_proc_in_global | Number of processes actively holding global buffers at start of the most recent write daemon cycle |
iris_wd_size_write | Size of database buffers the write daemon wrote during its most recent cycle, in kilobytes |
iris_wd_sleep | Amount of time that the write daemon was inactive before its most recent cycle began, in milliseconds |
iris_wd_temp_queue | Number of in-memory buffers the write daemon used at the start of its most recent cycle |
iris_wd_temp_write | Number of in-memory buffers the write daemon wrote during its most recent cycle |
iris_wdwij_time | Amount of time the write daemon spent writing to the WIJ file during its most recent cycle, in milliseconds |
iris_wd_write_time | Amount of time the write daemon spent writing buffers to databases during its most recent cycle, in milliseconds |
iris_wij_writes_per_sec | WIJ physical block writes per second |
iris_wqm_active_worker_jobs
{id="category"} |
Average number of worker jobs running logic that are not blocked |
iris_wqm_commands_per_sec
{id="category"} |
Average number of commands executed in this Work Queue Management category, per second |
iris_wqm_globals_per_sec
{id="category"} |
Average number of global references run in this Work Queue Management category, per second |
iris_wqm_max_active_worker_jobs
{id="category"} |
Maximum number of active workers since the last log entry was recorded |
iris_wqm_max_work_queue_depth
{id="category"} |
Maximum number of entries in the queue of this Work Queue Management category since the last log entry was recorded |
iris_wqm_waiting_worker_jobs
{id="category"} |
Average number of idle worker jobs waiting for a group to connect to and do work for |
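One way a monitoring client might use labeled metrics from the table above is to flag the label values that cross a limit. The sketch below assumes samples have already been parsed into (name, labels, value) tuples; the function name over_threshold, the 90 percent limit, and the sample values are all hypothetical choices for illustration, not recommendations.

```python
def over_threshold(samples, metric, limit):
    """Return {label id: value} for samples of `metric` whose value exceeds `limit`."""
    return {
        labels.get("id", ""): value
        for name, labels, value in samples
        if name == metric and value > limit
    }


# Illustrative samples only; the values are made up.
samples = [
    ("iris_disk_percent_full", {"id": "USER", "dir": "/data/user/"}, 91.0),
    ("iris_disk_percent_full", {"id": "IRISSYS", "dir": "/iris/mgr/"}, 40.0),
    ("iris_license_percent_used", {}, 15.0),
]

print(over_threshold(samples, "iris_disk_percent_full", 90.0))  # prints {'USER': 91.0}
```

In a Prometheus deployment the same check would more typically be written as an alerting rule; the Python form here only shows how the id label distinguishes one database from another.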
Create Application Metrics
To add custom application metrics to those returned by the /metrics endpoint:
-
Create a new class that inherits from %SYS.Monitor.SAM.Abstract.
-
Define the PRODUCT parameter as the name of your application. This can be anything except for iris, which is reserved for the InterSystems IRIS metrics.
-
Implement the GetSensors() method to define the desired custom metrics, as follows:
-
The method must contain one or more calls to the SetSensor() method. This method sets the name and value for an application metric. The values should be integers or floating-point numbers to ensure compatibility with Prometheus and InterSystems SAM.
You can optionally define a label for the metric, though if you do, you must always define a label for that particular metric.
Note: For best practices when choosing metric and label names, see Metric and Label Naming in the Prometheus documentation (https://prometheus.io/docs/practices/naming/).
-
The method must return $$$OK if successful.
Important: A slow implementation of GetSensors() can negatively impact system performance. Be sure to test that your implementation of GetSensors() is efficient, and avoid implementations that could time out or hang.
-
Compile the class. An example is shown below:
    /// Example of a custom class for the /metrics API
    Class MyMetrics.Example Extends %SYS.Monitor.SAM.Abstract
    {
    Parameter PRODUCT = "myapp";
    /// Collect metrics from the specified sensors
    Method GetSensors() As %Status
    {
        do ..SetSensor("my_counter",$increment(^MyCounter),"my_label")
        do ..SetSensor("my_gauge",$random(100))
        return $$$OK
    }
    }
-
Use the AddApplicationClass() method of the SYS.Monitor.SAM.Config class to add the custom class to the /metrics configuration. Pass as arguments the name of the class and the namespace where it is located.
For example, enter the following in the Terminal from the %SYS namespace:
    %SYS>set status = ##class(SYS.Monitor.SAM.Config).AddApplicationClass("MyMetrics.Example", "USER")
    %SYS>zw status
    status=1
-
Ensure that the /api/monitor web application has the necessary Application Roles to access the custom metrics. For details on how to edit application roles, see Edit an Application: The Application Roles Tab.
This step grants /api/monitor access to the data needed for the custom metric. For example, if the custom metric class is located in the USER database (protected by the %DB_USER resource), grant /api/monitor the %DB_USER role.
-
Review the output of the /metrics endpoint by pointing your browser to http://<instance-host>:52773/api/monitor/metrics (where 52773 is the default WebServer port). The metrics you defined should appear after the InterSystems IRIS metrics, such as:
    [...]
    myapp_my_counter{id="my_label"} 1
    myapp_my_gauge 92
The /metrics endpoint now returns the custom metrics you defined. The InterSystems IRIS metrics include an “iris_” prefix, while your custom metrics use the value of PRODUCT as a prefix.
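The prefix convention makes it straightforward for a client to separate built-in metrics from application metrics. The following Python sketch groups metric names by the text before the first underscore; the function name group_by_product is hypothetical, and the approach assumes the PRODUCT value itself contains no underscore.

```python
from collections import defaultdict


def group_by_product(metric_names):
    """Group metric names by prefix (the text before the first underscore)."""
    groups = defaultdict(list)
    for name in metric_names:
        product, _, rest = name.partition("_")
        groups[product].append(rest)
    return dict(groups)


# Names taken from the examples in this appendix.
names = ["iris_cpu_usage", "myapp_my_counter", "myapp_my_gauge"]
print(group_by_product(names))
# prints {'iris': ['cpu_usage'], 'myapp': ['my_counter', 'my_gauge']}
```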
/alerts Endpoint
The /alerts endpoint fetches the most recent alerts from the alerts.log file and returns them in JSON format, such as:
    {"time":"2019-08-15T10:36:38.313Z","severity":2,"message":"Failed to allocate 1150MB shared memory using large pages. Switching to small pages."}
When /alerts is called, it returns the alerts that have been generated since the previous time /alerts was called. The iris_system_alerts_new metric is a Boolean that indicates whether new alerts have been generated.
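Because each call returns only the alerts generated since the previous call, a client generally needs to decode and store what it receives. The Python sketch below decodes a single alert object of the shape shown above and converts its timestamp; the function name parse_alert is hypothetical, and how multiple alerts are delimited in one response is not covered here and should be checked against real output.

```python
import json
from datetime import datetime, timezone


def parse_alert(raw):
    """Decode one alert object into a dict with a timezone-aware timestamp."""
    alert = json.loads(raw)
    # The trailing "Z" marks UTC; %f accepts the millisecond fraction.
    parsed = datetime.strptime(alert["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
    alert["time"] = parsed.replace(tzinfo=timezone.utc)
    return alert


# The alert text is the example shown in this section.
raw = (
    '{"time":"2019-08-15T10:36:38.313Z","severity":2,'
    '"message":"Failed to allocate 1150MB shared memory using large pages. '
    'Switching to small pages."}'
)
alert = parse_alert(raw)
print(alert["severity"], alert["time"].isoformat())
```

A polling client could check iris_system_alerts_new on /metrics first and call /alerts only when that metric indicates new alerts, which avoids consuming the alert list unnecessarily.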
For more information about when and how alerts are generated, see the Using Log Monitor chapter of this guide.