Monitoring InterSystems IRIS Using REST API
Every InterSystems IRIS® data platform instance contains a REST interface that provides statistics about the instance. The REST API provides a way to gather information from multiple machines running InterSystems IRIS, allowing you to monitor in detail all instances that comprise your application.
This appendix describes the metrics the /api/monitor service provides. These metrics are compatible with Prometheus, an open-source monitoring and alerting tool. Configuring Prometheus to scrape multiple connected InterSystems IRIS instances provides a cohesive view of your entire system, making it easier to evaluate whether the system is behaving properly and efficiently.
For an introduction to creating and using REST interfaces, see Introduction to Creating REST Services. To quickly try it on your own, see Developing REST Interfaces.
/api/monitor Service
The /api/monitor service provides information about the InterSystems IRIS instance on which it runs. By default, the /api/monitor web application is enabled with “Unauthenticated” access. For information about setting up authentication for this service, see the Securing REST Services chapter in Creating REST Services.
This API has the following two endpoints:
- /metrics Endpoint, which returns all instance metrics and can be configured to return specific application metrics.
- /alerts Endpoint, which returns any system alerts that have been posted since the endpoint was last scraped.
InterSystems IRIS logs any errors in the SystemMonitor.log file, which is located in the install-dir/mgr directory.
/metrics Endpoint
The /metrics endpoint returns a list of metrics, which are described in Metric Descriptions. You can also enable the collection of additional metrics about active interoperability productions, as described in Interoperability Metrics. Create Application Metrics contains instructions for how to define custom metrics.
To configure Prometheus to scrape an instance of InterSystems IRIS, follow the instructions in First Steps With Prometheus (https://prometheus.io/docs/introduction/first_steps/).
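As a rough illustration of scraping several instances, the sketch below renders a minimal Prometheus configuration file from a list of targets. The host names, the job name, and the 15s interval are hypothetical; the keys used (scrape_interval, scrape_configs, job_name, metrics_path, static_configs, targets) are standard Prometheus configuration keys, and the metrics path is the one described in this appendix.

```python
def render_scrape_config(targets, job_name="iris", scrape_interval="15s"):
    """Return the text of a minimal prometheus.yml for the given host:port targets."""
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    # Prometheus defaults to /metrics, so the IRIS path is set explicitly.",
        "    metrics_path: /api/monitor/metrics",
        "    static_configs:",
        "      - targets:",
    ]
    # One list entry per instance to scrape.
    lines.extend(f"          - '{target}'" for target in targets)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical instances; 52773 is the default web server port.
    print(render_scrape_config(["iris-a.example.com:52773", "iris-b.example.com:52773"]))
```

Listing every instance under a single job keeps the metric names identical across instances, so Prometheus distinguishes them by its own instance label.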
Metric Descriptions
The metrics are returned in a text-based format, described in the Exposition Formats page of the Prometheus documentation (https://prometheus.io/docs/instrumenting/exposition_formats/). Each metric is listed on a single line, with a single space separating the name from the value.
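To make the line format concrete, here is a minimal parsing sketch. It assumes each non-comment line ends with a space followed by a numeric value, as described above; the sample text and its values are invented for illustration. Any label block is kept as part of the key.

```python
def parse_metrics(text):
    """Map each metric line (name plus any label block) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines allowed by the exposition format.
        if not line or line.startswith("#"):
            continue
        # The value follows the last space; splitting from the right keeps
        # label values that contain spaces intact.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


if __name__ == "__main__":
    sample = 'iris_cpu_usage 12\niris_db_size_mb{id="USER",dir="/data/user/"} 11\n'
    for key, value in parse_metrics(sample).items():
        print(key, value)
```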
InterSystems IRIS metrics are listed in the table below. Metric names with a label appear here with line breaks to improve readability.
This table contains metrics for the version of InterSystems IRIS documented here. As metrics may be added in newer versions, be sure this documentation matches your version of InterSystems IRIS.
Metric Name | Description |
---|---|
iris_cpu_pct
{id="ProcessType"} |
Percent of CPU usage by InterSystems IRIS process type. ProcessType can be any of the following:
ECPWorker, ECPCliR, ECPCliW, ECPSrvR, ECPSrvW, LICENSESRV, WDAUX, WRTDMN, JRNDMN, GARCOL, CSPDMN, CSPSRV, ODBCSRC, MirrorMaster, MirrorPri, MirrorBack, MirrorPre, MirrorSvrR, MirrorJrnR, MirrorSK, MirrorComm (For more information about InterSystems IRIS Processes, see Secure InterSystems Processes and Operating System Resources.) |
iris_cpu_usage | Percent of CPU usage for all programs on the operating system |
iris_csp_activity
{id="IPaddress:port"} |
Number of web requests served by the Web Gateway Server since it was started |
iris_csp_actual_connections
{id="IPAddress:port"} |
Number of current connections to this server by the Web Gateway Server |
iris_csp_gateway_latency
{id="IPaddress:port"} |
Amount of time to obtain a response from the Web Gateway Server when fetching iris_csp_ metrics, in milliseconds |
iris_csp_in_use_connections
{id="IPaddress:port"} |
Number of current connections to this server by the Web Gateway Server that are processing a web request |
iris_csp_private_connections
{id="IPaddress:port"} |
Number of current connections to this server by the Web Gateway Server that are reserved for state-aware applications (Preserve mode 1) |
iris_csp_sessions | Number of currently active web session IDs on this server |
iris_cache_efficiency | Ratio of global references to physical reads and writes, as a percent |
iris_db_expansion_size_mb
{id="database"} |
Amount by which to expand database, in megabytes |
iris_db_free_space
{id="database"} |
Free space available in database, in megabytes (This metric is only updated once per day, and may not reflect recent changes.) |
iris_db_latency
{id="database"} |
Amount of time to complete a random read from database, in milliseconds |
iris_db_max_size_mb
{id="database"} |
Maximum size to which database can grow, in megabytes |
iris_db_size_mb
{id="database",dir="path"} |
Size of database, in megabytes |
iris_directory_space
{id="database",dir="path"} |
Free space available on the database directory’s storage volume, in megabytes |
iris_disk_percent_full
{id="database",dir="path"} |
Percent of space filled on the database directory’s storage volume |
iris_ecp_conn | Total number of active client connections on this ECP application server |
iris_ecp_conn_max | Maximum active client connections from this ECP application server |
iris_ecp_connections | Number of servers synchronized when this ECP application server synchronizes with its configured ECP data servers |
iris_ecp_latency | Latency between the ECP application server and the ECP data server, in milliseconds |
iris_ecps_conn | Total active client connections to this ECP data server per second |
iris_ecps_conn_max | Maximum active client connections to this ECP data server |
iris_glo_a_seize_per_sec | Number of Aseizes on the global resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_glo_n_seize_per_sec | Number of Nseizes on the global resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_glo_ref_per_sec | Number of references to globals located on local databases per second |
iris_glo_ref_rem_per_sec | Number of references to globals located on remote databases per second |
iris_glo_seize_per_sec | Number of seizes on the global resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_glo_update_per_sec | Number of updates (SET and KILL commands) to globals located on local databases per second |
iris_glo_update_rem_per_sec | Number of updates (SET and KILL commands) to globals located on remote databases per second |
iris_jrn_block_per_sec | Journal blocks written to disk per second |
iris_jrn_free_space
{id="JournalType",dir="path"} |
Free space available on each journal directory’s storage volume, in megabytes. JournalType can be WIJ, primary, or secondary |
iris_jrn_size
{id="JournalType"} |
Current size of each journal file, in megabytes. JournalType can be WIJ, primary, or secondary |
iris_license_available | Number of licenses not currently in use |
iris_license_consumed | Number of licenses currently in use |
iris_license_percent_used | Percent of licenses currently in use |
iris_log_reads_per_sec | Logical reads per second |
iris_obj_a_seize_per_sec | Number of Aseizes on the object resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_obj_del_per_sec | Number of objects deleted per second |
iris_obj_hit_per_sec | Number of object references per second, in process memory |
iris_obj_load_per_sec | Number of objects loaded from disk per second, not in shared memory |
iris_obj_miss_per_sec | Number of object references not found in memory per second |
iris_obj_new_per_sec | Number of objects initialized per second |
iris_obj_seize_per_sec | Number of seizes on the object resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_page_space_percent_used | Percent of maximum allocated page file space used |
iris_phys_mem_percent_used | Percent of physical memory (RAM) currently in use |
iris_phys_reads_per_sec | Physical database blocks read from disk per second |
iris_phys_writes_per_sec | Physical database blocks written to disk per second |
iris_process_count | Total number of active InterSystems IRIS processes |
iris_rtn_a_seize_per_sec | Number of Aseizes on the routine resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_rtn_call_local_per_sec | Number of local routine calls per second |
iris_rtn_call_miss_per_sec | Number of routine calls not found in memory per second |
iris_rtn_call_remote_per_sec | Number of remote routine calls per second |
iris_rtn_load_per_sec | Number of routines locally loaded from or saved to disk per second |
iris_rtn_load_rem_per_sec | Number of routines remotely loaded from or saved to disk per second |
iris_rtn_seize_per_sec | Number of seizes on the routine resource per second (For more information, see Considering Seizes, ASeizes, and NSeizes in the “Monitoring Performance Using ^mgstat” section of Monitoring Guide.) |
iris_sam_get_db_sensors_seconds | Amount of time it took to collect iris_db* sensors, in seconds |
iris_sam_get_jrn_sensors_seconds | Amount of time it took to collect iris_jrn* sensors, in seconds |
iris_sam_get_sql_sensors_seconds | Amount of time it took to collect iris_sql* sensors, in seconds |
iris_sam_get_wqm_sensors_seconds | Amount of time it took to collect iris_wqm* sensors, in seconds |
iris_smh_available
{id="purpose"} |
Shared memory available by purpose, in kilobytes (For more information, including a list of identifiers for purpose, see Generic (Shared) Memory Heap Usage in “Monitoring InterSystems IRIS Using the Management Portal” section of Monitoring Guide.) |
iris_smh_percent_full
{id="purpose"} |
Percent of allocated shared memory in use by purpose (For more information, including a list of identifiers for purpose, see Generic (Shared) Memory Heap Usage in “Monitoring InterSystems IRIS Using the Management Portal” section of Monitoring Guide.) |
iris_smh_total | Shared memory allocated for current instance, in kilobytes |
iris_smh_total_percent_full | Percent of allocated shared memory in use for current instance |
iris_smh_used
{id="purpose"} |
Shared memory in use by purpose, in kilobytes (For more information, including a list of identifiers for purpose, see Generic (Shared) Memory Heap Usage in “Monitoring InterSystems IRIS Using the Management Portal” section of Monitoring Guide.) |
iris_sql_active_queries
{id="namespace"} |
The number of SQL statements currently executing |
iris_sql_active_queries_95_percentile
{id="namespace"} |
For the current set of active SQL statements, the 95th percentile elapsed time since a statement began executing |
iris_sql_active_queries_99_percentile
{id="namespace"} |
For the current set of active SQL statements, the 99th percentile elapsed time since a statement began executing |
iris_sql_commands_per_second
{id="namespace"} |
Average number of ObjectScript commands executed to perform SQL queries, per second |
iris_sql_queries_avg_runtime
{id="namespace"} |
Average SQL statement runtime, in seconds |
iris_sql_queries_avg_runtime_std_dev
{id="namespace"} |
Standard deviation of the average SQL statement runtime |
iris_sql_queries_per_second
{id="namespace"} |
Average number of SQL statements, per second |
iris_system_alerts | The number of alerts posted to the messages log since system startup |
iris_system_alerts_log | The number of alerts currently located in the alerts log |
iris_system_alerts_new | Whether new alerts are available on the /api/monitor/alerts endpoint, as a Boolean |
iris_system_state | A number representing the system monitor health state (For more information, see System Monitor Health State in the “Using System Monitor” section of Monitoring Guide.) |
iris_trans_open_count | Number of open transactions on the current instance |
iris_trans_open_secs | Average duration of open transactions on the current instance, in seconds |
iris_trans_open_secs_max | Duration of longest currently open transaction on the current instance, in seconds |
iris_wd_buffer_redirty | Number of database buffers the write daemon wrote during the most recent cycle that were also written in the prior cycle |
iris_wd_buffer_write | Number of database buffers the write daemon wrote during its most recent cycle |
iris_wd_cycle_time | Amount of time the most recent write daemon cycle took to complete, in milliseconds |
iris_wd_proc_in_global | Number of processes actively holding global buffers at start of the most recent write daemon cycle |
iris_wd_size_write | Size of database buffers the write daemon wrote during its most recent cycle, in kilobytes |
iris_wd_sleep | Amount of time that the write daemon was inactive before its most recent cycle began, in milliseconds |
iris_wd_temp_queue | Number of in-memory buffers the write daemon used at the start of its most recent cycle |
iris_wd_temp_write | Number of in-memory buffers the write daemon wrote during its most recent cycle |
iris_wdwij_time | Amount of time the write daemon spent writing to the WIJ file during its most recent cycle, in milliseconds |
iris_wd_write_time | Amount of time the write daemon spent writing buffers to databases during its most recent cycle, in milliseconds |
iris_wij_writes_per_sec | WIJ physical block writes per second |
iris_wqm_active_worker_jobs
{id="category"} |
Average number of worker jobs running logic that are not blocked |
iris_wqm_commands_per_sec
{id="category"} |
Average number of commands executed in this Work Queue Management category, per second |
iris_wqm_globals_per_sec
{id="category"} |
Average number of global references run in this Work Queue Management category, per second |
iris_wqm_max_active_worker_jobs
{id="category"} |
Maximum number of active workers since the last log entry was recorded |
iris_wqm_max_work_queue_depth
{id="category"} |
Maximum number of entries in the queue of this Work Queue Management category since the last log |
iris_wqm_waiting_worker_jobs
{id="category"} |
Average number of idle worker jobs waiting for a group to connect to and do work for |
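As one way to put the metrics in this table to use, the following sketch splits each line into a name, a label dictionary, and a value, then reports the samples that exceed a per-metric limit. The threshold values and the sample text are hypothetical, and the regular expressions cover only the simple label syntax shown in this table.

```python
import re

# name, optional {label block}, whitespace, value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (name, labels, value) for a metric line, or None if it is not one."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def over_threshold(text, thresholds):
    """List the samples whose value is above the limit set for their metric name."""
    hits = []
    for line in text.splitlines():
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        limit = thresholds.get(name)
        if limit is not None and value > limit:
            hits.append((name, labels, value))
    return hits


if __name__ == "__main__":
    sample = (
        'iris_disk_percent_full{id="USER",dir="/data/user/"} 91\n'
        'iris_disk_percent_full{id="IRISSYS",dir="/data/mgr/"} 40\n'
        'iris_license_percent_used 12\n'
    )
    limits = {"iris_disk_percent_full": 90, "iris_license_percent_used": 80}
    for hit in over_threshold(sample, limits):
        print(hit)
```

In a Prometheus deployment, checks like these would normally be written as alerting rules; the sketch is only meant to show how the labels identify which database or volume a value belongs to.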
Interoperability Metrics
In addition to the metrics described in the previous section, an InterSystems IRIS instance can also record metrics about active interoperability productions and include them in the output of the /metrics endpoint. The recording of these interoperability metrics is disabled by default. To enable it, you must perform the following steps for each interoperability production you want to monitor:
- Open a Terminal session for the InterSystems IRIS instance running the production you want to monitor. If necessary, switch to the namespace associated with the production by executing the following command:
  set $namespace = "[interopNS]"
  where [interopNS] is the namespace name.
- In the Terminal, execute the following command to enable the collection of metrics for the active production within the current namespace (SAM refers to System Alerting and Monitoring, the InterSystems monitoring solution):
  do ##class(Ens.Util.Statistics).EnableSAMForNamespace()
  Note: If the recording of metrics is enabled for a namespace but the corresponding production is not active, the /metrics endpoint does not return any metrics.
  The Ens.Util.Statistics class provides methods for customizing the output of the /metrics endpoint. For example, invoking the method DisableSAMIncludeHostLabel provides aggregated metrics for the entire production instead of providing them for each host individually.
  The metrics available after completing this step are described in the Basic Interoperability Metrics table below.
- If you would like to collect additional metrics about activity volume for a production, you must enable activity monitoring by invoking the class method Ens.Util.Statistics.EnableStatsForProduction in the corresponding namespace using the Terminal. You must also add the Ens.Activity.Operation.Local business operation to the production. This process is detailed in Enabling Activity Monitoring on the Monitoring Activity Volume page.
  The additional metrics available after completing this step are described in the Activity Volume Metrics table below.
- If you would like to collect additional HTTP transmission metrics for an interoperability web client which uses the EnsLib.HTTP.OutboundAdapter or the EnsLib.SOAP.OutboundAdapter, you must enable the reporting of HTTP metrics for the corresponding business operation by performing the following steps:
  - Open the Management Portal for the InterSystems IRIS instance containing the web client you want to monitor.
  - Select Interoperability and choose the namespace containing the web client.
  - Select Configure > Production to open the Production Configuration page.
  - Select the operation which uses the HTTP or SOAP outbound adapter.
  - In the Alerting Control section of the Production Settings > Settings panel, select the Provide Metrics for SAM checkbox.
  - Select Apply to save your settings.
  The additional metrics available after completing this step are described in the HTTP Metrics table below.
  Note: Currently, HTTP transmission metrics are only collected for business operations which invoke actors using the Queue style (not inProc). For more information on the difference between these invocation styles, see Defining a Business Operation Class in Defining Business Operations.
InterSystems IRIS interoperability metrics are listed in the tables below. Metric names with a label appear here with line breaks to improve readability.
These tables contain metrics for the version of InterSystems IRIS documented here. As metrics may be added in newer versions, be sure this documentation matches your version of InterSystems IRIS.
Basic Interoperability Metrics
Metric Name | Description |
---|---|
iris_interop_alert_delay
{id="namespace",host="host",production="production"} |
Number of hosts within the production and namespace that have triggered a Queue Wait Alert. If output has been configured to include host labels, the hosts that have triggered Queue Wait Alerts are provided separately and the value will be 1. |
iris_interop_hosts
{id="namespace",status="status",host="host",production="production"} |
Number of hosts within the production and namespace which currently have the specified status. If output has been configured to include host labels, the status of each host is provided separately and the value will be 1. status can be OK, Error, Retry, Starting, Inactive, or Unconfigured. |
iris_interop_messages
{id="namespace",host="host",production="production"} |
Number of messages processed since the production started. If output has been configured to include host labels, the number of messages processed by each host is provided separately. |
iris_interop_messages_per_sec
{id="namespace",host="host",production="production"} |
Average number of messages processed within the production and namespace in a second over the most recent sampling interval. If output has been configured to include host labels, the number of messages processed by each host is provided separately. |
iris_interop_queued
{id="namespace",host="host",production="production"} |
Number of messages currently queued within the production and namespace. If output has been configured to include host labels, the number of messages currently queued for each host is provided separately. |
Activity Volume Metrics
Metric Name | Description |
---|---|
iris_interop_avg_processing_time
{id="namespace",hosttype="HostType",host="host",production="production",messagetype="MessageType"} |
Average length of time required to process a message of the specified MessageType within the production and namespace, in seconds. HostType can be service, operation, or actor (that is, process). MessageType is user-defined; if no MessageType is specified, "-" is returned. If output has been configured to include host labels, the message processing time for each host is provided separately. |
iris_interop_avg_queueing_time
{id="namespace",hosttype="HostType",host="host",production="production",messagetype="MessageType"} |
Average duration that a message of the specified MessageType spent in the queue while being processed by a host of HostType within the production and namespace, in seconds. HostType can be service, operation, or actor (that is, process). MessageType is user-defined; if no MessageType is specified, "-" is returned. If output has been configured to include host labels, the queueing time for each host is provided separately. |
iris_interop_sample_count
{id="namespace",hosttype="HostType",host="host",production="production",messagetype="MessageType"} |
Number of messages of the specified MessageType processed by a host of HostType within the production and namespace over the most recent sampling interval. HostType can be service, operation, or actor (that is, process). MessageType is user-defined; if no MessageType is specified, "-" is returned. If output has been configured to include host labels, the number of messages processed by each host is provided separately. |
iris_interop_sample_count_per_sec
{id="namespace",hosttype="HostType",host="host",production="production",messagetype="MessageType"} |
Number of messages of the specified MessageType processed per second by a host of HostType within the production and namespace, averaged over the most recent sampling interval. HostType can be service, operation, or actor (that is, process). MessageType is user-defined; if no MessageType is specified, "-" is returned. If output has been configured to include host labels, the number of messages processed by each host is provided separately. |
HTTP Metrics
Metric Name | Description |
---|---|
iris_interop_avg_http_received_chars
{id="namespace",host="host",production="production"} |
Average number of characters received per HTTP or SOAP response within the production and namespace over the most recent sampling interval. If output has been configured to include host labels, the average number of characters received by each host is provided separately. |
iris_interop_avg_http_sent_chars
{id="namespace",host="host",production="production"} |
Average number of characters sent per HTTP or SOAP request within the production and namespace over the most recent sampling interval. If output has been configured to include host labels, the average number of characters sent by each host is provided separately. |
iris_interop_avg_http_ttfc
{id="namespace",host="host",production="production"} |
Time to First Character (TTFC): average length of time between the start of an HTTP or SOAP request and the first character of the corresponding response, in seconds. If output has been configured to include host labels, the TTFC for each host is provided separately. |
iris_interop_avg_http_ttlc
{id="namespace",host="host",production="production"} |
Time to Last Character (TTLC): average length of time between the start of an HTTP or SOAP request and the last character of the corresponding response. If output has been configured to include host labels, the TTLC for each host is provided separately. |
iris_interop_http_sample_count
{id="namespace",host="host",production="production"} |
Number of HTTP or SOAP transmissions sent within the production and namespace over the most recent sampling interval. If output has been configured to include host labels, the number of transmissions sent by each host is provided separately. |
iris_interop_http_sample_count_per_sec
{id="namespace",host="host",production="production"} |
Number of HTTP or SOAP transmissions sent per second within the production and namespace, averaged over the most recent sampling interval. If output has been configured to include host labels, the number of transmissions sent by each host per second is provided separately. |
Create Application Metrics
To add custom application metrics to those returned by the /metrics endpoint:
- Create a new class that inherits from %SYS.Monitor.SAM.Abstract.
- Define the PRODUCT parameter as the name of your application. This can be anything except for iris, which is reserved for the InterSystems IRIS metrics.
- Implement the GetSensors() method to define the desired custom metrics, as follows:
  - The method must contain one or more calls to the SetSensor() method. This method sets the name and value for an application metric. The values should be integers or floating point numbers to ensure compatibility with Prometheus and InterSystems SAM.
    You can optionally define a label for the metric, though if you do, you must always define a label for that particular metric.
    Note: For best practices when choosing metric and label names, see Metric and Label Naming in the Prometheus documentation (https://prometheus.io/docs/practices/naming/).
  - The method must return $$$OK if successful.
    Important: A slow implementation of GetSensors() can negatively impact system performance. Be sure to test that your implementation of GetSensors() is efficient, and avoid implementations that could time out or hang.
- Compile the class. An example is shown below:
  /// Example of a custom class for the /metrics API
  Class MyMetrics.Example Extends %SYS.Monitor.SAM.Abstract
  {

  Parameter PRODUCT = "myapp";

  /// Collect metrics from the specified sensors
  Method GetSensors() As %Status
  {
      do ..SetSensor("my_counter",$increment(^MyCounter),"my_label")
      do ..SetSensor("my_gauge",$random(100))
      return $$$OK
  }

  }
- Use the AddApplicationClass() method of the SYS.Monitor.SAM.Config class to add the custom class to the /metrics configuration. Pass as arguments the name of the class and the namespace where it is located.
  For example, enter the following in the Terminal from the %SYS namespace:
  %SYS>set status = ##class(SYS.Monitor.SAM.Config).AddApplicationClass("MyMetrics.Example", "USER")
  %SYS>w status
  status=1
- Ensure that the /api/monitor web application has the necessary Application Roles to access the custom metrics. For details on how to edit application roles, see Edit an Application: The Application Roles Tab.
  This step grants /api/monitor access to the data needed for the custom metric. For example, if the custom metric class is located in the USER database (protected by the %DB_USER resource), grant /api/monitor the %DB_USER role.
- Review the output of the /metrics endpoint by pointing your browser to http://<instance-host>:52773/api/monitor/metrics (where 52773 is the default web server port). The metrics you defined should appear after the InterSystems IRIS metrics, such as:
  [...]
  myapp_my_counter{id="my_label"} 1
  myapp_my_gauge 92
The /metrics endpoint now returns the custom metrics you defined. The InterSystems IRIS metrics include an “iris_” prefix, while your custom metrics use the value of PRODUCT as a prefix.
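Because the prefix identifies where a metric comes from, a consumer can separate the two groups with a simple string test. The sketch below is one hypothetical way to do that on the client side; the function name is invented, and the sample lines reuse the myapp example above alongside one built-in metric with an invented value.

```python
def split_by_origin(text, builtin_prefix="iris_"):
    """Return (builtin_lines, custom_lines) from /metrics output."""
    builtin, custom = [], []
    for line in text.splitlines():
        line = line.strip()
        # Ignore blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(builtin_prefix):
            builtin.append(line)
        else:
            custom.append(line)
    return builtin, custom


if __name__ == "__main__":
    sample = 'iris_process_count 28\nmyapp_my_counter{id="my_label"} 1\nmyapp_my_gauge 92\n'
    builtin, custom = split_by_origin(sample)
    print("built-in:", builtin)
    print("custom:", custom)
```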
/alerts Endpoint
The /alerts endpoint fetches the most recent alerts from the alerts.log file and returns them in JSON format, such as:
{"time":"2019-08-15T10:36:38.313Z","severity":2,\
"message":"Failed to allocate 1150MB shared memory using large pages. Switching to small pages."}
When /alerts is called, it returns the alerts that have been generated since the previous time /alerts was called. The iris_system_alerts_new metric is a Boolean that indicates whether new alerts have been generated.
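Since each call returns only the alerts generated after the previous call, a client that needs a history has to keep what it receives. The sketch below shows one way to accumulate and filter alerts. It assumes the response body is either a single JSON object like the example above or a JSON array of such objects (the exact envelope is not specified here), and the class and method names are hypothetical.

```python
import json


def parse_alerts(body):
    """Decode an /alerts response body into a list of alert dictionaries."""
    body = body.strip()
    if not body:
        return []
    data = json.loads(body)
    # Assumption: a lone object and an array of objects are both possible.
    return [data] if isinstance(data, dict) else list(data)


class AlertStore:
    """Keeps every alert seen so far, because the endpoint does not repeat them."""

    def __init__(self):
        self.alerts = []

    def ingest(self, body):
        """Add the alerts from one response and return just the new ones."""
        new_alerts = parse_alerts(body)
        self.alerts.extend(new_alerts)
        return new_alerts

    def at_or_above(self, min_severity):
        """Return stored alerts whose severity is at least min_severity."""
        return [a for a in self.alerts if a.get("severity", 0) >= min_severity]


if __name__ == "__main__":
    body = (
        '{"time":"2019-08-15T10:36:38.313Z","severity":2,'
        '"message":"Failed to allocate 1150MB shared memory using large pages. '
        'Switching to small pages."}'
    )
    store = AlertStore()
    store.ingest(body)
    for alert in store.at_or_above(2):
        print(alert["time"], alert["message"])
```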
For more information about when and how alerts are generated, see the Using Log Monitor chapter of this guide.