Using Caché System Monitor
Caché System Monitor is a flexible, user-extensible utility used to monitor a Caché instance and generate notifications when the values of one or more of a wide range of metrics indicate a potential problem. As provided, System Monitor incorporates the following Caché instance monitoring tools:
-
System Monitor monitors system status and resources, generating notifications (alerts and warnings) based on fixed parameters and tracking overall system health.
-
Caché Health Monitor (Health Monitor) samples key system and user-defined metrics and compares them to user-configurable parameters and established normal values, generating notifications when samples exceed applicable thresholds.
-
Caché Application Monitor (Application Monitor) samples significant system metrics, stores the values in the local namespace, and evaluates them using user-created alert definitions. When an alert is triggered, it can either generate an email notification or call a specified class method.
All three tools run in the %SYS namespace by default. System Monitor and Application Monitor can optionally be run in other namespaces under namespace-specific configurations and settings. You can define and configure your own components to extend the capabilities of System Monitor in each namespace as your needs require.
See Caché System Monitoring Tools in the “Using Caché Monitor” chapter of this guide for an overview of general Caché instance monitoring tools and Manage Email Options in that chapter for information about configuring Caché Monitor to generate email messages from notifications in the console log, including those generated by System Monitor. See Monitoring Log Files in the “Monitoring Caché Using the Management Portal” chapter for information about the log files discussed in this chapter.
Caché System Monitor
System Monitor samples important system status and resource usage indicators, such as the status of ECP connections and the percentage of the lock table in use, and generates notifications—alerts, warnings, and “status OK” messages—based on fixed statuses and thresholds. These notifications are written to the console log, allowing Caché Monitor to generate email messages from them if configured to do so. System Monitor also maintains a single overall system health state.
System Monitor is managed using the ^%SYSMONMGR utility.
The System Monitor Process
In each namespace in which it is configured to run, System Monitor gathers and delivers system metric information in three stages using three types of classes (or System Monitor components) in sequence:
-
Stage 1: Obtain metric information
Sensor classes incorporate methods for obtaining the values of system or application metrics. For example, the system sensor class SYS.Monitor.SystemSensors includes the GetProcessCount() method, which returns the number of active processes for the Caché instance, and the GetLockTable() method, which returns the percentage of the instance’s lock table that is in use.
At a fixed interval, System Monitor calls the GetSensors() method of each configured sensor class. A sensor class may do one of the following:
-
Return an array of sensor name/value pairs to be passed by System Monitor to subscriber classes (described in stage 2)
-
Evaluate the sensor values it obtains and return notifications to be posted by System Monitor to notifier classes (described in stage 3)
One of the sensor classes provided with System Monitor, SYS.Monitor.SystemSensors, returns a name/value array. The other, %SYS.Monitor.AppMonSensor, performs its own evaluations and generates its own notifications.
-
Stage 2: Evaluate metric information
Subscriber classes incorporate methods for evaluating sensor values and generating notifications. After calling each sensor class that returns a name/value array, System Monitor calls the Receive() method of each subscriber class, populating the SensorReading property with the array. For each sensor name/value pair provided to its Receive() method, the subscriber class evaluates the value and, if appropriate, returns a notification containing text and a severity code.
For example, when System Monitor passes the name/value array returned from SYS.Monitor.SystemSensors.GetSensors() to subscriber classes,
-
the system subscriber, SYS.Monitor.SystemSubscriber, may discover that the LockTablePercentFull value is over 85, its warning threshold for that sensor, and generate a notification containing a severity code of 1 and appropriate text.
-
the Health Monitor subscriber, SYS.Monitor.Health.Control, may determine that the ProcessCount value is too high, based on that sensor’s configured parameters and established normal values, and return a notification containing a severity code of 2 and appropriate text.
-
Stage 3: Generate notifications
Notifier classes incorporate methods for passing notifications to one or more alerting systems. After calling each sensor class and subscriber class, System Monitor calls the Post() method of each notifier class, populating the Notifications property with the notifications returned by sensor or subscriber classes. The notifier class then passes each notification to the desired alerting method; for example, when the system notifier receives the notifications returned by the system subscriber for LockTablePercentFull and the Health Monitor subscriber for ProcessCount, it writes the severity code and text to the console log. This approach allows notifications to be passed to independent alerting systems such as those in Ensemble and TrakCare, as well as user-defined alerting systems.
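As a rough illustration of how data moves through the three stages, the following Python sketch models one cycle with plain functions. It is not Caché code and uses no InterSystems API: the function names are hypothetical, the sample values are invented, and only the LockTablePercentFull sensor name and its warning threshold of 85 come from the example above. It also covers only the name/value path, not sensor classes such as %SYS.Monitor.AppMonSensor that evaluate their own readings.

```python
# Hypothetical model of one System Monitor cycle; not an InterSystems API.

def system_sensors():
    """Stage 1: return sensor name/value pairs (values invented for the example)."""
    return {"LockTablePercentFull": 90, "ProcessCount": 120}

def system_subscriber(readings):
    """Stage 2: evaluate name/value pairs and return (severity, text) notifications."""
    notifications = []
    lock = readings.get("LockTablePercentFull")
    if lock is not None and lock > 85:  # warning threshold from the example
        notifications.append((1, f"LockTablePercentFull = {lock} (warning threshold 85)"))
    return notifications

def log_notifier(notifications, log):
    """Stage 3: pass each notification to an alerting system (here, a list)."""
    for severity, text in notifications:
        log.append(f"severity {severity}: {text}")

def run_cycle(sensors, subscribers, notifiers, log):
    collected = []
    for sensor in sensors:
        readings = sensor()
        for subscriber in subscribers:
            collected.extend(subscriber(readings))
    # Notifiers run after all sensors and subscribers have been called.
    for notifier in notifiers:
        notifier(collected, log)

log = []
run_cycle([system_sensors], [system_subscriber], [log_notifier], log)
print(log)
```

Keeping the notifier separate from the subscriber is what lets the same notification reach the console log, an Ensemble or TrakCare alerting system, or a user-defined one without changing the evaluation logic.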
System Monitor starts automatically when the instance starts and begins calling the configured sensor classes in each of the configured startup namespaces, passing sensor values to configured subscriber classes and notifications to configured notifier classes in turn. You can define and configure your own System Monitor sensor, subscriber and notifier classes on a per-namespace basis.
In an emergency, System Monitor may need to be shut down. The class method %SYS.Monitor.Enabled([flag]) sets, clears, and reports the status of System Monitor. If flag is 0, System Monitor will not start.
Tracking System Monitor Notifications
Typically, any System Monitor alert (notification of severity 2) or sequence of System Monitor warnings (severity 1) should be investigated. Health Monitor can also generate System Monitor alerts and warnings.
System Monitor alerts and warnings, including those generated by Health Monitor, and System Monitor status messages (severity 0) are written to the console log (install-dir\mgr\cconsole.log). (All System Monitor and Health Monitor status messages are written to the System Monitor log, install-dir\mgr\SystemMonitor.log. Application Monitor alerts are not written to logs, but can be sent by email or passed to a specified notification method.)
To track System Monitor alerts and warnings, you can do the following:
-
View System Monitor alerts using the ^%SYSMONMGR utility. This option lets you display alerts for all sensors or for a specific sensor and view all recorded alerts or only those occurring during a specified time period, but it does not display warnings.
-
Monitor the console log (see Monitoring Log Files in the “Monitoring Caché Using the Management Portal” chapter). Bear in mind that when a sequence of System Monitor alerts is generated for a given sensor within a short period of time, only the first is written to the console log.
Note: In the console log, System Monitor status notifications are labeled with initial capitals, for example [System Monitor] started in %SYS, whereas warnings, alerts, and OK messages are labeled in uppercase, such as [SYSTEM MONITOR] CPUusage Warning: CPUusage = 90 ( Warnvalue is 85).
-
Configure Caché Monitor to send email notifications of alerts (and optionally warnings) appearing in the console log (instead of writing them to the alerts log, the default). When relying on this method, keep in mind that Caché Monitor does not generate a notification for every console log entry of the configured severity; when there is a series of entries from a given process (such as System Monitor) within about one hour, a notification is generated for the first entry only. For example, if a network problem causes System Monitor alerts concerning ECP connections, open transactions, and shadow server connections to be generated over a 15-minute period, Caché Monitor generates only one notification (for whichever alert came first). For this reason, when you receive a single System Monitor notification from Caché Monitor, you should immediately view System Monitor alerts and consult the console log.
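Both the console log behavior and the Caché Monitor email behavior described above amount to forwarding only the first event in a series. The Python sketch below illustrates that idea; the class name is hypothetical, and measuring the one-hour window from the last forwarded event is an assumption, since the text says only "within about one hour".

```python
from datetime import datetime, timedelta

class FirstInWindowThrottle:
    """Forward only the first event per source within a time window (hypothetical)."""

    def __init__(self, window):
        self.window = window
        self.last_forwarded = {}  # source -> time of the last forwarded event

    def should_forward(self, source, when):
        last = self.last_forwarded.get(source)
        if last is None or when - last >= self.window:
            self.last_forwarded[source] = when
            return True
        return False  # part of a series that was already reported

# Three System Monitor alerts over about 15 minutes produce a single notification.
throttle = FirstInWindowThrottle(timedelta(hours=1))
start = datetime(2024, 1, 1, 9, 0)
alerts = ["ECP connections", "open transactions", "shadow server connections"]
sent = [throttle.should_forward("System Monitor", start + timedelta(minutes=7 * i))
        for i in range(len(alerts))]
print(sent)
```

The second and third alerts are suppressed, which is why a single email should prompt you to check the alerts list and the console log rather than treat the email as the whole story.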
System Monitor Status and Resource Metrics
The following table lists the system status and resource usage metrics sampled by System Monitor and, for each, the notification thresholds and rules that result in warnings (severity 1), alerts (severity 2), and “status OK” notifications (severity 0).
Metric | Description | Notification Rules |
---|---|---|
Disk Space | Available space in a database directory | |
Journal Space | Available space in the journal directory | |
Paging | Percentage of physical memory and paging space used | |
Lock Table | Percentage of the lock table in use | |
Write Daemon | Status of the write daemon | |
ECP Connections | State of connections to ECP application servers or ECP data servers | |
Shadow Server | Status of connection to shadow sources | |
Shared Memory Heap (Generic Memory Heap) | Status of shared memory heap (SMH), also known as generic memory heap (gmheap) | |
Open Transactions | Duration of longest open local or remote (ECP) transactions | |
License Expiration | Days until license expires | |
SSL/TLS Certificate Expiration | Days until certificate expires | |
ISCAgent (mirror members only) | ISCAgent status | |
System Monitor Health State
System Monitor maintains a single value summarizing overall system health in a register in shared memory. This value is based on notifications posted to the console log (see Monitoring Log Files in the “Monitoring Caché Using the Management Portal” chapter of this guide), including both system alerts generated directly by the Caché instance and alerts and warnings generated by System Monitor and its Health Monitor component.
At startup, the system health state is set based on the number of system (not System Monitor) alerts posted to the console log during the startup process. Once System Monitor is running, the health state can be elevated by either system alerts or System Monitor alerts or warnings. Status is cleared to the next lower level when 30 minutes have elapsed since the last system alert or System Monitor alert or warning was posted. The following table shows how the system health state is determined.
State | Set at startup when ... | Set following startup when ... | Cleared to ... |
---|---|---|---|
GREEN/ok(0) | no system alerts are posted during startup | 30 minutes (if state was YELLOW) or 60 minutes (if state was RED) have elapsed since the last system alert or System Monitor alert or warning was posted | n/a |
YELLOW/warning(1) | up to four system alerts are posted during startup | state is GREEN and | GREEN when 30 minutes have elapsed since the last system alert or System Monitor alert or warning was posted |
RED/alert(2) | five or more system alerts are posted during startup | | YELLOW when 30 minutes have elapsed since the last system alert or System Monitor alert or warning was posted |
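The startup and clearing rules in the table can be expressed as two small functions. This Python sketch is illustrative only: the function names are hypothetical, and it does not model how alerts and warnings raise the state after startup, or the HUNG state described below.

```python
GREEN, YELLOW, RED = 0, 1, 2  # numeric values match ok(0), warning(1), alert(2)

def startup_state(system_alerts_at_startup):
    """State set at startup from the number of system alerts posted during startup."""
    if system_alerts_at_startup == 0:
        return GREEN
    if system_alerts_at_startup <= 4:
        return YELLOW
    return RED

def cleared_state(state, minutes_since_last_alert):
    """Lower the state one level for each full 30 minutes with no alert or warning."""
    return max(GREEN, state - minutes_since_last_alert // 30)

print(cleared_state(startup_state(5), 45))
```

Clearing one level per 30 minutes is what produces the 60-minute figure in the GREEN row: a RED state needs two quiet 30-minute intervals to return to GREEN.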
A fourth state, HUNG, can occur when global updates are blocked. Specifically, the following events change the state to HUNG:
-
The journal daemon is paused for more than 5 seconds or frozen (see Journal I/O Errors in the “Journaling” chapter of the Caché Data Integrity Guide).
-
Any of switches 10, 11, 13, or 14 are set (see Using Switches in the “Managing Caché Remotely” chapter of Caché Specialized System Tools and Utilities).
-
The write daemon is stopped for any reason or sets the updates locked flag for more than 3 seconds.
-
The number of available buffers falls into the critical region and remains there for more than 5 seconds.
When the health state changes to HUNG, the reason is written to the console log.
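The four HUNG conditions can be read as a single predicate over the instance's current condition. In the following Python sketch, the InstanceSnapshot fields are hypothetical stand-ins for whatever Caché measures internally; only the thresholds (5 seconds, switches 10, 11, 13, and 14, 3 seconds, and 5 seconds) come from the list above.

```python
from dataclasses import dataclass, field

@dataclass
class InstanceSnapshot:
    """Hypothetical view of instance conditions; field names are invented."""
    journal_paused_seconds: float = 0.0
    journal_frozen: bool = False
    switches_set: set = field(default_factory=set)
    write_daemon_stopped: bool = False
    updates_locked_seconds: float = 0.0
    buffers_critical_seconds: float = 0.0

def hung_reasons(s):
    """Return the reasons the state would change to HUNG (empty list if none)."""
    reasons = []
    if s.journal_frozen or s.journal_paused_seconds > 5:
        reasons.append("journal daemon paused more than 5 seconds or frozen")
    if s.switches_set & {10, 11, 13, 14}:
        reasons.append("switch 10, 11, 13, or 14 set")
    if s.write_daemon_stopped or s.updates_locked_seconds > 3:
        reasons.append("write daemon stopped or updates locked more than 3 seconds")
    if s.buffers_critical_seconds > 5:
        reasons.append("available buffers critical for more than 5 seconds")
    return reasons

print(hung_reasons(InstanceSnapshot(switches_set={13})))
```

Returning the list of reasons rather than a plain boolean mirrors the fact that the reason for a change to HUNG is written to the console log.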
The System Monitor health state can be viewed using:
-
the View System Health option on the View System Data menu of ^%SYSMONMGR (does not report HUNG)
-
the $SYSTEM.Monitor API, which lets you access the system status directly. Use $SYSTEM.Monitor.State() to return the system status; see also the SetState, Clear, Alert, and ClearAlerts methods.
-
the ccontrol list and ccontrol qlist commands (do not include health state on Windows)
When System Monitor is not running, the System Monitor health state is always GREEN.
System Monitor Defaults
System Monitor calls a provided set of classes that can be augmented, runs in the %SYS namespace, and operates under three default settings that can be changed.
Default System Monitor Components
Five classes are provided with Caché and configured in System Monitor in the %SYS namespace by default.
Sensor classes:
-
SYS.Monitor.SystemSensors
System sensor class obtaining sensor values to be passed to configured subscriber classes, including the System Monitor subscriber (SYS.Monitor.SystemSubscriber) and Caché Health Monitor subscriber (SYS.Monitor.Health.Control).
-
%SYS.Monitor.AppMonSensor
Class providing sensor, subscriber and notification services for Caché Application Monitor; obtains sensor values and stores them in the local namespace, evaluates the values based on user-defined alerts and either generates an email message or calls a user-specified method when an alert is triggered, based on the alert definition.
Subscriber classes:
-
SYS.Monitor.Health.Control
Subscriber class for Health Monitor; receives and evaluates statistical sensor values from SYS.Monitor.SystemSensors and posts notifications to the system notifier.
-
SYS.Monitor.SystemSubscriber
System Monitor subscriber available to all sensor classes; contains all code required to monitor and analyze the sensors in SYS.Monitor.SystemSensors. Generates System Monitor notifications and Health Monitor notifications for some sensors.
Notifier class:
-
SYS.Monitor.SystemNotify
System notifier available to all subscriber classes. On receiving a notification from the system subscriber (SYS.Monitor.SystemSubscriber) or Health Monitor subscriber (SYS.Monitor.Health.Control), writes it to the System Monitor log, and to the console log if it is of severity 2 (alert). (See the chapter “Monitoring Caché Using the Management Portal” in this guide for information on these log files.)
The system notifier also generates a single overall evaluation of system status that can be obtained using the SYS.Monitor.State() method, which returns 0 (OK), 1 (warning), or 2 (alert).
User-defined classes can be configured using ^%SYSMONMGR.
Default System Monitor Namespace
All System Monitor and Application Monitor configurations and settings are namespace-specific. By default, System Monitor starts and runs only in the %SYS namespace. Additional startup namespaces for System Monitor and Application Monitor can be configured using ^%SYSMONMGR. Following any change you make to the System Monitor or Application Monitor configuration for a namespace, you must restart System Monitor in the namespace for the change to take effect.
Health Monitor runs only in the %SYS namespace.
Default System Monitor Settings
By default, the System Monitor is always running when the instance is running; it can be stopped using ^%SYSMONMGR but will start automatically again when the instance next starts.
By default, System Monitor:
-
calls the GetSensors() method of each configured sensor class every 30 seconds
-
writes only alerts, warnings and messages to the System Monitor log, and does not write sensor readings
-
does not save sensor readings
These settings can be changed using ^%SYSMONMGR.
Using the ^%SYSMONMGR Utility
The ^%SYSMONMGR utility lets you manage and configure the System Monitor. The utility can be executed in any namespace, and changes made with it affect only the namespace in which it is started. You must maintain a separate System Monitor configuration for each startup namespace you configure by starting ^%SYSMONMGR in that namespace. Following any change you make to the System Monitor configuration for a namespace, you must restart System Monitor in the namespace for the change to take effect.
To manage the System Monitor, enter the following command in the Terminal:
%SYS>do ^%SYSMONMGR
The main menu appears.
1) Start/Stop System Monitor
2) Set System Monitor Options
3) Configure System Monitor Classes
4) View System Monitor State
5) Manage Application Monitor
6) Manage Health Monitor
7) View System Data
8) Exit

Option?
Enter the number of your choice or press Enter to exit the utility.
The options in the main menu let you perform System Monitor tasks as described in the following table:
Option | Description |
---|---|
1) Start/Stop System Monitor | |
2) Set System Monitor Options | |
3) Configure System Monitor Components | |
4) View System Monitor State | |
5) Manage Application Monitor | |
6) Manage Health Monitor | |
7) View System Data | |
Start/Stop System Monitor
When a Caché instance starts, System Monitor starts automatically and begins calling configured classes in each configured startup namespace; this cannot be changed. While the instance is running, however, you can stop System Monitor, and must do so in order to change the configuration of Caché Health Monitor. In addition, following any change you make to the System Monitor configuration for a namespace, you must restart System Monitor in the namespace for the change to take effect.
When you enter 1 at the main menu, the following menu is displayed:
1) Start System Monitor
2) Stop System Monitor
3) Exit
Enter 2 to stop System Monitor when it is running, and 1 to start it when it is stopped.
System Monitor monitors the size of the console log and rolls it over when required, thereby limiting the space it uses to twice the MaxConsoleLogSize configuration setting, which is 5 MB by default. When System Monitor is stopped, therefore, the console log may grow beyond this limit until the instance is restarted or the PurgeErrorsAndLogs task is run. See Monitoring Log Files in the “Monitoring Caché Using the Management Portal” chapter for information about the console log.
Set System Monitor Options
To change global System Monitor settings or to return them to their default values, stop System Monitor if it is running and then enter 2 at the main menu:
1) Set Sample Interval
2) Set Debugging Level
3) Reset Defaults
4) Manage Debug Data
5) Exit
Enter 1 to set the interval at which System Monitor calls each configured sensor class; the default is 30 seconds.
Enter 2 to set the debugging level. The default is 0 (base), which writes System Monitor and Health Monitor status and error messages to the System Monitor log and does not save sensor readings. Debugging level 1 (log all sensors) writes sensor readings to the System Monitor log along with messages and saves sensor readings, which can then be viewed using the View Sensor Data option of the View System Data menu.
Enter 3 to reset the sample interval, debugging level, and saving of sensor readings to their default values.
Enter 4 to set the number of days for which sensor readings are saved; the default is 5.
Your changes take effect when you next start or restart System Monitor.
Configure System Monitor Components
As described in Caché System Monitor, you can create your own sensor, subscriber and notifier classes by extending %SYS.Monitor.AbstractSensor, %SYS.Monitor.AbstractSubscriber, and %SYS.Monitor.AbstractNotification, respectively, and configure them in System Monitor to extend the capabilities of the provided classes described in Default System Monitor Components. You can also add namespaces other than %SYS for System Monitor to start and run in.
Configure System Monitor Classes
When you enter 3 at the main menu, the following menu is displayed:
1) Configure Components
2) Configure Startup Namespaces
3) Exit
Enter 1 to display the options for configuring classes:
1) List Classes
2) Add Class
3) Delete Class
4) Exit
Enter 1 to list the currently configured classes for the namespace in which you started ^%SYSMONMGR, including provided system classes and those you have configured.
Enter 2 to configure a user-defined class for the namespace in which you started ^%SYSMONMGR. The class you specify must exist in the namespace and be recognized by System Monitor as a valid sensor, subscriber or notifier class. You can also enter a description of the class.
Enter 3 to delete a user-defined class you have configured.
Configuring or deleting a class affects only the namespace in which you started ^%SYSMONMGR.
Configure System Monitor Namespaces
When an instance starts, System Monitor automatically starts as a separate process in each configured startup namespace (by default %SYS only). All System Monitor configurations and settings are namespace-specific. When you make changes using ^%SYSMONMGR, the changes affect only the namespace in which you started the utility.
All instances of ^%SYSMONMGR write messages to the same System Monitor log. Startup namespaces can be configured from any namespace.
When you enter 3 at the main menu, the following menu is displayed:
1) Configure Components
2) Configure Startup Namespaces
3) Exit
Enter 2 to display the options for configuring namespaces:
1) List Startup Namespaces
2) Add Namespace
3) Delete Namespace
4) Exit
Enter 1 to list the currently configured startup namespaces.
Enter 2 to add a startup namespace.
Enter 3 to delete a startup namespace. (You cannot delete %SYS.)
View System Monitor State
Enter 4 at the main menu to display the status of System Monitor and its components in the namespace in which you started ^%SYSMONMGR, for example:
Component State
System Monitor OK
%SYS.Monitor.AppMonSensor None
SYS.Monitor.SystemSensors OK
SYS.Monitor.Health.Control Running: Period is Thursday 09:00 - 11:30
SYS.Monitor.SystemSubscriber OK
SYS.Monitor.SystemNotifier OK
In this example, System Monitor and its system sensor, subscriber and notifier classes are running normally, as is Health Monitor’s subscriber class. However, none of Application Monitor’s classes are activated (see Manage Monitor Classes), so it is not evaluating sensor samples or generating alerts.
Manage Application Monitor
Manage Health Monitor
View System Data
Enter 7 at the main menu to display options for viewing System Monitor information about the system.
1) View Sensor Data
2) View System Health
3) View Alerts
4) Exit
Enter 1 to view saved sensor readings, if you have enabled saving of sensor data using the Manage Sensor Readings option on the Set System Monitor Options menu. You can display saved readings for all sensors or for a specific sensor, and you can view all saved sensor readings or only those for a time period you specify.
Enter 2 to view the System Monitor health state, including all alerts between the previous GREEN state and the current state, if not GREEN.
Enter 3 to view System Monitor alerts. You can display alerts for all sensors or for a specific sensor, and you can view all alerts within the period you specified using the Manage Sensor Readings option on the Set System Monitor Options menu, or only those for a time period you specify.
Defining System Monitor Components
The SYS.Monitor API lets you define your own sensor, subscriber, and notifier classes.
Sensor Classes
Sensor classes extend %SYS.Monitor.AbstractSensor. The System Monitor controller initially calls each sensor class’s Start() method; thereafter, on each cycle, it calls the GetSensors() method. The SetSensor() method is used within the sensor class to set sensor name/value pairs in the SensorReading property, which is returned by GetSensors() and passed to all subscriber classes.
A sensor class may also evaluate sensor readings and, as a result of its evaluation, call the %SYS.Monitor.Email class for generating email messages from notifications or any user-defined alerting method.
Subscriber Classes
Subscriber classes extend %SYS.Monitor.AbstractSubscriber. The System Monitor controller initially calls each subscriber class’s Start() method; thereafter, on each cycle, it calls the Receive() method once for each sensor class called in the cycle, passing the SensorReading property with the sensor name/value pairs received from that sensor class. The subscriber class may evaluate one or more of the name/value pairs and set notifications using the Notify() method, which populates the Notifications property.
A subscriber class may also, as a result of its sensor evaluation, call the %SYS.Monitor.Email class for generating email messages from notifications, or any user-defined alerting method.
%SYS.Monitor.SampleSubscriber is provided as a sample subscriber class.
Notifier Classes
Notifier classes extend %SYS.Monitor.AbstractNotification. The System Monitor controller initially calls each notifier class’s Start() method; thereafter, on each cycle, it calls the Post() method once for each subscriber class called in the cycle, passing the Notifications property with the notifications received from that subscriber. The notifier class then passes the notifications to its alerting method(s), which may include the %SYS.Monitor.Email class for generating email messages from notifications or any user-defined alerting method.
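To make the call order concrete, here is a Python analogue of the three abstract classes and the controller loop. The class and method names only mirror %SYS.Monitor.AbstractSensor, %SYS.Monitor.AbstractSubscriber, %SYS.Monitor.AbstractNotification and their documented Start(), GetSensors(), SetSensor(), Receive(), Notify(), and Post() methods; the signatures, the ProcessCount limit, and the example subclasses are hypothetical, so consult the class reference for the real ObjectScript interfaces.

```python
from abc import ABC, abstractmethod

class Sensor(ABC):
    """Analogue of %SYS.Monitor.AbstractSensor (hypothetical signatures)."""
    def __init__(self):
        self.sensor_reading = {}
    def start(self):
        pass
    def set_sensor(self, name, value):
        self.sensor_reading[name] = value
    @abstractmethod
    def get_sensors(self):
        """Populate sensor_reading via set_sensor()."""

class Subscriber(ABC):
    """Analogue of %SYS.Monitor.AbstractSubscriber (hypothetical signatures)."""
    def __init__(self):
        self.notifications = []
    def start(self):
        pass
    def notify(self, text, severity):
        self.notifications.append((severity, text))
    @abstractmethod
    def receive(self, sensor_reading):
        """Evaluate name/value pairs and call notify() as needed."""

class Notifier(ABC):
    """Analogue of %SYS.Monitor.AbstractNotification (hypothetical signatures)."""
    def start(self):
        pass
    @abstractmethod
    def post(self, notifications):
        """Pass notifications to an alerting method."""

class Controller:
    def __init__(self, sensors, subscribers, notifiers):
        self.sensors, self.subscribers, self.notifiers = sensors, subscribers, notifiers
        for component in sensors + subscribers + notifiers:
            component.start()  # called once, before the first cycle

    def cycle(self):
        for sensor in self.sensors:
            sensor.sensor_reading = {}
            sensor.get_sensors()
            for subscriber in self.subscribers:  # receive() once per sensor class
                subscriber.receive(dict(sensor.sensor_reading))
        for subscriber in self.subscribers:  # post() once per subscriber class
            for notifier in self.notifiers:
                notifier.post(list(subscriber.notifications))
            subscriber.notifications.clear()

class ProcessCountSensor(Sensor):
    def __init__(self, count):
        super().__init__()
        self.count = count
    def get_sensors(self):
        self.set_sensor("ProcessCount", self.count)

class ProcessCountSubscriber(Subscriber):
    def __init__(self, limit):
        super().__init__()
        self.limit = limit  # hypothetical limit, not a Caché default
    def receive(self, sensor_reading):
        value = sensor_reading.get("ProcessCount")
        if value is not None and value > self.limit:
            self.notify(f"ProcessCount = {value} (limit {self.limit})", 2)

class ListNotifier(Notifier):
    def __init__(self):
        self.posted = []
    def post(self, notifications):
        self.posted.extend(notifications)

notifier = ListNotifier()
controller = Controller([ProcessCountSensor(500)], [ProcessCountSubscriber(400)], [notifier])
controller.cycle()
print(notifier.posted)
```

Clearing each subscriber's notifications after they are posted keeps one cycle's notifications from being reported again in the next cycle.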
Caché Health Monitor
Caché Health Monitor monitors a running Caché instance by sampling the values of a broad set of key metrics during specific periods and comparing them to configured parameters for the metric and established normal values for those periods; if sampled values are too high, Health Monitor generates an alert (notification of severity 2) or warning (severity 1). For example, if CPU usage values sampled by Health Monitor at 10:15 AM on a Monday are too high based on the configured maximum value for CPU usage or normal CPU usage samples taken during the Monday 9:00 AM to 11:30 AM period, Health Monitor generates a notification.
Caché Health Monitor Overview
Health Monitor uses a fixed set of rules to evaluate sampled values and identify those that are abnormally high. This design is based on the approach to monitoring manufacturing processes described in the “Process or Product Monitoring and Control” section of the NIST/SEMATECH e-Handbook of Statistical Methods, with deviation from normal values determined using rules based on the WECO statistical probability rules (Western Electric Rules), both adapted specifically for Caché monitoring purposes.
Health Monitor alerts (severity 2) and warnings (severity 1) are written to the console log (install-dir\mgr\cconsole.log). See Tracking System Monitor Notifications for information about ways to make sure you are aware of these notifications.
Health Monitor status messages (severity 0) are written to the System Monitor log (install-dir\mgr\SystemMonitor.log).
Unlike System Monitor and Application Monitor, Health Monitor runs only in the %SYS namespace.
Health Monitor Process Description
By default, Health Monitor does not start automatically when the instance starts; for this to happen, you must enable Health Monitor within Caché System Monitor using the ^%SYSMONMGR utility. (You can specify an interval to wait after Caché starts before starting Health Monitor when it is enabled, allowing the instance to reach normal operating conditions before sampling begins.) You can always use the utility to see the current status of Health Monitor. For more information, see Using ^%SYSMONMGR to Manage Health Monitor, later in this chapter.
The basic elements of the Health Monitor process are as follows:
-
Health Monitor samples 41 system sensors defined in SYS.Monitor.SystemSensors (see Default System Monitor Components).
Some sensors represent an overall metric for the Caché instance; for example, the LicensePercentUsed sensor samples the percentage of the instance’s authorized license units that are currently in use, while the JournalGrowthRate sensor samples the amount of data (in KB per minute) written to the instance’s journal files.
Other sensors apply to a particular sensor item, such as a database or mirror; for example, DBLatency sensors sample the time (in milliseconds) required to complete a random read on each mounted database, while DBReads sensors sample the number of reads per minute from each mounted database (the databases are specified by directory).
-
Each sensor is represented by a sensor object within Health Monitor that sets at least one and possibly three parameters, as follows:
-
a required base (minimum) value for sensor samples
-
optionally, either a maximum value and warning value, or a multiplier and warning multiplier
For example, by default the DBLatency sensor object specifies a base of 1000, a maximum value of 3000, and a warning value of 1000, while the DBReads sensor object specifies a base of 1024, a multiplier of 2, and a warning multiplier of 1.6.
-
Each sensor is sampled every 30 seconds during specified weekly, monthly, quarterly or yearly periods; samples below the base specified by the sensor object are discarded.
By default there are 63 weekly periods, each of which represents one of nine specified intervals during a particular day of the week (for example, 9:00 AM to 11:30 AM on Mondays), but you can configure your own periods.
-
To evaluate sensor samples, Health Monitor uses the sensor object parameters and, if necessary, a chart for each sensor during each period, containing previously collected samples and their mean value and the standard deviation from the mean, or sigma.
If a sensor object has maximum and warning values set, a chart is not required to evaluate samples for sensors using that object, because notifications are generated by comparing samples to those values (see Notification Rules). Under the default settings, therefore, charts are not required for DBLatency sensors.
For each sensor object that instead has multiplier values set, a chart is required. Under the default settings, therefore, charts are required for DBReads sensors. The chart for the DBReads c:\InterSystems\Cache\mgr\docbook sensor during the Monday 09:00 - 11:30 period, for example, might indicate the mean reads per minute from the DOCBOOK database during this period to be 2145, with a sigma of 141 and a highest single value of 2327.
If a chart for a sensor during a particular period is required but does not yet exist, it must be generated before samples taken during that period can be evaluated. When Health Monitor is active, therefore, each sensor is in one of two modes during any given period:
-
If a chart is required but does not exist, that sensor is automatically in analysis mode.
In analysis mode, Health Monitor simply records the samples it collects, then at the end of the period generates the required chart for the sensor. To ensure that the chart is reliable, a minimum of 13 samples must have been taken in analysis mode. Until 13 valid samples are taken within a single recurrence of a period, the sensor remains in analysis mode and no chart is generated for that period.
Note: Charts should always be generated from samples taken during normal, stable operation of the Caché instance. For example, when a Monday 09:00 - 11:30 chart does not exist, it should not be generated on a Monday holiday or while a technical problem is affecting the operation of the Caché instance.
-
If a required chart exists, or no chart is required, that sensor is in monitoring mode.
In monitoring mode, Health Monitor collects samples to evaluate against the values in the sensor object or the existing chart. To ensure that notifications are not triggered by transient abnormal samples, every six sample values are averaged together to generate one reading every three minutes, and it is these readings that are evaluated.
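The following minimal Python model illustrates the two modes. It is not Caché code. It assumes that samples are plain numbers and that a chart holds the mean, the population standard deviation and the highest value. Whether Health Monitor computes sigma over the population or the sample is an assumption here. It also assumes that each monitoring-mode reading is the mean of six consecutive samples. The function names are hypothetical.

```python
from statistics import mean, pstdev

MIN_ANALYSIS_SAMPLES = 13  # minimum samples in one period recurrence before a chart is built
SAMPLES_PER_READING = 6    # six samples are averaged into one monitoring-mode reading


def build_chart(samples):
    """Return (mean, sigma, highest), or None while the sensor stays in analysis mode."""
    if len(samples) < MIN_ANALYSIS_SAMPLES:
        return None  # too few samples in this recurrence: no chart yet
    # Assumption: sigma is the population standard deviation of the samples.
    return mean(samples), pstdev(samples), max(samples)


def to_readings(samples):
    """Average each complete group of six samples into one reading."""
    usable = len(samples) - len(samples) % SAMPLES_PER_READING  # drop a trailing partial group
    return [mean(samples[i:i + SAMPLES_PER_READING])
            for i in range(0, usable, SAMPLES_PER_READING)]


print(build_chart([2100] * 12))         # None: still in analysis mode
print(to_readings(list(range(1, 14))))  # [3.5, 9.5]
```

With the default 30-second System Monitor sampling interval, six samples span three minutes. That is why one reading is produced every three minutes.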
Sensor readings are evaluated by the appropriate subscriber class (see The System Monitor Process). When a sequence of readings meets the criteria for a notification when compared to the sensor object settings and the appropriate chart (if required), the subscriber class generates an alert or a warning by passing a notification containing text and a severity code to the system notifier, SYS.Monitor.SystemNotify.
Specifically, when three (3) consecutive readings exceed the maximum value for the sensor object, an alert (notification of severity 2) is generated; when five (5) consecutive readings exceed the warning value for the sensor object, a warning (notification of severity 1) is generated. Complete information about how the maximum and warning values are determined for each sensor object appears in the Notification Rules section, but examples are as follows:
-
The DBLatency sensor object has maximum and warning values set by default. Therefore, for the DBLatency c:\InterSystems\Cache\mgr\docbook sensor during the Monday 09:00 - 11:30 period, an alert is generated if three consecutive readings are greater than the sensor object maximum value (3000 by default).
-
The DBReads sensor object, on the other hand, has multipliers set by default, which means the maximum value is the multiplier times the greater of:
-
the mean in the chart plus three times the sigma in the chart
-
the highest value in the chart plus one sigma
So for the DBReads c:\InterSystems\Cache\mgr\docbook sensor during the same period, an alert is generated if three consecutive readings are greater than 5136. This is the default sensor object multiplier of 2 times 2568, where 2568 is the chart mean of 2145 plus three times the sigma of 141. The value 2568 is used because it is greater than 2468, the highest chart value of 2327 plus one sigma.
If the DBReads sensor object were edited to remove the multipliers, leaving it with only a base, an alert would be generated for DBReads c:\InterSystems\Cache\mgr\docbook if three consecutive readings were greater than 2568, which is the greater of:
-
the mean in the chart plus three times the sigma in the chart (2568)
-
the highest value in the chart (2327)
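The two worked examples reduce to simple arithmetic. The Python sketch below reproduces them. The function names are hypothetical. The sketch does not model how the sensor object's base value enters the calculation, because this section does not spell that out.

```python
def max_with_multiplier(chart_mean, sigma, highest, multiplier):
    """Multiplier times the greater of (mean + 3 sigma) and (highest + 1 sigma)."""
    return multiplier * max(chart_mean + 3 * sigma, highest + sigma)


def max_base_only(chart_mean, sigma, highest):
    """Greater of (mean + 3 sigma) and the highest chart value, as in the base-only example."""
    return max(chart_mean + 3 * sigma, highest)


# DBReads chart for the DOCBOOK database, Monday 09:00 - 11:30 (values from the text)
chart_mean, sigma, highest = 2145, 141, 2327
print(max_with_multiplier(chart_mean, sigma, highest, 2))  # 5136
print(max_base_only(chart_mean, sigma, highest))           # 2568
```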
Note: Because no chart is required to evaluate readings from sensors whose sensor objects have maximum and warning values specified, evaluation of these sensor readings and posting of any resulting notifications is handled by the SYS.Monitor.SystemSubscriber subscriber class, rather than the SYS.Monitor.Health.Control subscriber class (see Default System Monitor Components). As a result, notifications for these sensors are generated even when Health Monitor is not enabled (see Using ^%SYSMONMGR to Manage Health Monitor), as long as System Monitor is running.
If you want to generate notifications using values for some sensors represented by a given sensor object but using multipliers for others—for example, using values for DBLatency sensors for some databases but multipliers for others—you can do so by setting multipliers in the sensor object and manually creating charts for those for which you want to use absolute values; see Charts for more information.
When a period has recurred five times since a chart was generated for a sensor or sensor/item during that period, not including those during which an alert was generated, the readings from these five normal period recurrences are evaluated to detect a rising or shifted mean for the sensor. If the mean is rising or has shifted with 95% certainty, the chart is recalibrated—the existing chart for the sensor during that period is replaced with a chart generated from the samples taken during the most recent recurrence of the period. For example, if the number of users accessing a database is growing slowly but steadily, the mean DBReads value for that database is likely to also rise slowly but steadily, resulting in regular chart recalibration every five periods, which avoids unwarranted alerts.
Note that sensor object values (base, maximum, warning, multiplier, and warning multiplier) are never recalibrated automatically; only charts generated by Health Monitor are. Sensors whose sensor objects have maximum and warning values set do not use charts at all, so their thresholds must be adjusted manually when conditions change. For example, if the number of users accessing a database grows, the base, maximum value, and warning value for the DBLatency sensor object may require manual adjustment.
Health Monitor Elements and Extensions
Health Monitor is provided with a set of default elements that you can reconfigure and extend in various ways, as described in the following subsections:
Sensors and Sensor Objects
A Health Monitor sensor object represents one of the sensors in SYS.Monitor.SystemSensors. Each sensor object must provide a base value, and can optionally provide either a maximum value and warning value, or a multiplier and a warning multiplier; see Health Monitor Process Description and Notification Rules for information about how these values are used in evaluating sensor readings. The Health Monitor sensor objects are shown with their default parameters in the following table.
Where a sensor item is shown, the sensor object represents multiple sensors, one for each applicable item (job type, CSP server, database, or mirror); where there is no sensor item listed, the sensor object represents just one instance-wide sensor.
Sensor objects can be listed and edited (but not deleted) using the ^%SYSMONMGR utility as described in Configure Health Monitor Classes. Editing a sensor object allows you to modify one or all of its values. You can enter a base, maximum value, and warning value; a base, multiplier, and warning multiplier; or a base only.
Sensor Object | Sensor Item | Description | Base | Max | Mult | Warn | Warn Mult |
---|---|---|---|---|---|---|---|
CPUUsage | | System CPU usage (percent). | 50 | 85 | — | 75 | — |
CSPSessions | IP_address:port | Number of active CSP sessions on the listed CSP gateway server. | 100 | — | 2 | — | 1.6 |
CSPActivity | IP_address:port | Requests per minute to the listed CSP gateway server. | 100 | — | 2 | — | 1.6 |
CSPActualConnections | IP_address:port | Number of connections created on the listed CSP gateway server. | 100 | — | 2 | — | 1.6 |
CSPInUseConnections | IP_address:port | Number of currently active connections to the listed CSP gateway server. | 100 | — | 2 | — | 1.6 |
CSPPrivateConnections | IP_address:port | Number of private connections to the listed CSP gateway server. | 100 | — | 2 | — | 1.6 |
CSPUrlLatency | IP_address:port | Time (milliseconds) required to obtain a response from IP_address:port/csp/sys/UtilHome.csp. | 1000 | 5000 | — | 3000 | — |
CSPGatewayLatency | IP_address:port | Time (milliseconds) required to obtain a response from the listed CSP gateway server when fetching the metrics represented by the CSP sensor objects. | 1000 | 2000 | — | 1000 | — |
DBLatency | database_directory | Milliseconds to complete a random read from the listed mounted database. | 1000 | 3000 | — | 1000 | — |
DBReads | database_directory | Reads per minute from the listed mounted database. | 1024 | — | 2 | — | 1.6 |
DBWrites | database_directory | Writes per minute to the listed mounted database. | 1024 | — | 2 | — | 1.6 |
ECPAppServerKBPerMinute | | KB per minute sent to the ECP data server. | 1024 | — | 2 | — | 1.6 |
ECPConnections | | Number of active ECP connections. | 100 | — | 2 | — | 1.6 |
ECPDataServerKBPerMinute | | KB per minute received as ECP data server. | 1024 | — | 2 | — | 1.6 |
ECPLatency | | Network latency (milliseconds) between the ECP data server and this ECP application server. | 1000 | 3000 | — | 3000 | — |
ECPTransOpenCount | | Number of open ECP transactions. | 100 | — | 2 | — | 1.6 |
ECPTransOpenSecsMax | | Duration (seconds) of longest currently open ECP transaction. | 60 | — | 2 | — | 1.6 |
GlobalRefsPerMin | | Global references per minute. | 1024 | — | 2 | — | 1.6 |
GlobalSetKillPerMin | | Global sets/kills per minute. | 1024 | — | 2 | — | 1.6 |
JournalEntriesPerMin | | Number of journal entries written per minute. | 1024 | — | 2 | — | 1.6 |
JournalGrowthRate | | Number of KB per minute written to journal files. | 1024 | — | 2 | — | 1.6 |
LicensePercentUsed | | Percentage of authorized license units currently in use. | 50 | — | 1.5 | — | — |
LicenseUsedRate | | License acquisitions per minute. | 20 | — | 1.5 | — | — |
LockTablePercentFull | | Percentage of the lock table in use. | 50 | 99 | — | 85 | — |
LogicalBlockRequestsPerMin | | Number of logical block requests per minute. | 1024 | — | 2 | — | 1.6 |
MirrorDatabaseLatencyBytes | mirror_name | On the backup failover member of a mirror, number of bytes of journal data received from the primary but not yet applied to mirrored databases on the backup (measure of how far behind the backup’s databases are). | 2×10⁷ | — | 2 | — | 1.6 |
MirrorDatabaseLatencyFiles | mirror_name | On the backup failover member of a mirror, number of journal files received from the primary but not yet fully applied to mirrored databases on the backup (measure of how far behind the backup’s databases are). | 3 | — | 2 | — | 1.6 |
MirrorDatabaseLatencyTime | mirror_name | On the backup failover member of a mirror, time (in milliseconds) between when the last journal file was received from the primary and when it was fully applied to the mirrored databases on the backup (measure of how far behind the backup’s databases are). | 1000 | 4000 | — | 3000 | — |
MirrorJournalLatencyBytes | mirror_name | On the backup failover member of a mirror, number of bytes of journal data received from the primary but not yet written to the journal directory on the backup (measure of how far behind the backup is). | 2×10⁷ | — | 2 | — | 1.6 |
MirrorJournalLatencyFiles | mirror_name | On the backup failover member of a mirror, number of journal files received from the primary but not yet written to the journal directory on the backup (measure of how far behind the backup is). | 3 | — | 2 | — | 1.6 |
MirrorJournalLatencyTime | mirror_name | On the backup failover member of a mirror, time (in milliseconds) between when the last journal file was received from the primary and when it was fully written to the journal directory on the backup (measure of how far behind the backup is). | 1000 | 4000 | — | 3000 | — |
PhysicalBlockReadsPerMin | | Number of physical block reads per minute. | 1024 | — | 2 | — | 1.6 |
PhysicalBlockWritesPerMin | | Number of physical block writes per minute. | 1024 | — | 2 | — | 1.6 |
ProcessCount | | Number of active processes for the Caché instance. | 100 | — | 2 | — | 1.6 |
RoutineCommandsPerMin | | Number of routine commands per minute. | 1024 | — | 2 | — | 1.6 |
RoutineLoadsPerMin | | Number of routine loads per minute. | 1024 | — | 2 | — | 1.6 |
RoutineRefsPerMin | | Number of routine references per minute. | 1024 | — | 2 | — | 1.6 |
SMHPercentFull | | Percentage of the shared memory heap (generic memory heap) in use. | 50 | 98 | — | 85 | — |
ShadowConnectionsLatency | | Network latency (milliseconds) of shadow server connections to this data source. | 1000 | — | 2 | — | 1.6 |
ShadowLatency | | Network latency (milliseconds) of this shadow server’s connection to data source. | 1000 | — | 2 | — | 1.6 |
TransOpenCount | | Number of open local transactions (local and remote). | 100 | — | 2 | — | 1.6 |
TransOpenSecondsMax | | Duration (seconds) of longest currently open local transaction. | 60 | — | 2 | — | 1.6 |
WDBuffers | | Average number of database buffers updated per write daemon cycle. | 1024 | — | 2 | — | 1.6 |
WDCycleTime | | Average number of seconds required to complete a write daemon cycle. | 60 | — | 2 | — | 1.6 |
WDWIJTime | | Average number of seconds spent updating the write image journal (WIJ) per cycle. | 60 | — | 2 | — | 1.6 |
WDWriteSize | | Average number of bytes written per write daemon cycle. | 1024 | — | 2 | — | 1.6 |
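A sensor object carries either absolute maximum/warning values or relative multipliers, and this choice decides whether a chart is needed. The hypothetical Python model below shows that rule. It is not the Caché sensor object class. It assumes, following the options described for editing sensor objects, that mixing the two kinds of settings is invalid. It also assumes that a chart is needed whenever neither a maximum nor a warning value is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorObject:
    """Hypothetical model of a Health Monitor sensor object's settings."""
    name: str
    base: float
    max_value: Optional[float] = None
    warn_value: Optional[float] = None
    multiplier: Optional[float] = None
    warn_multiplier: Optional[float] = None

    def __post_init__(self):
        absolute = self.max_value is not None or self.warn_value is not None
        relative = self.multiplier is not None or self.warn_multiplier is not None
        if absolute and relative:
            raise ValueError(f"{self.name}: set max/warn values or multipliers, not both")

    @property
    def needs_chart(self) -> bool:
        # Readings can be compared directly only when absolute values are set.
        return self.max_value is None and self.warn_value is None


db_latency = SensorObject("DBLatency", 1000, max_value=3000, warn_value=1000)
db_reads = SensorObject("DBReads", 1024, multiplier=2, warn_multiplier=1.6)
license_used = SensorObject("LicensePercentUsed", 50, multiplier=1.5)
print(db_latency.needs_chart, db_reads.needs_chart, license_used.needs_chart)  # False True True
```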
Some sensors are not sampled for all Caché instances. For example, the ECP* sensors are sampled only on ECP data and application servers.
When you are monitoring a mirror member (see the “Mirroring” chapter of the Caché High Availability Guide), the following special conditions apply to Health Monitor:
-
No sensors are sampled while the mirror is restarting (for example, just after the backup failover member has taken over as primary) or if the member’s status in the mirror is indeterminate.
-
If a sensor is in analysis mode for a period and the member’s status in the mirror changes during the period, no chart is created and the sensor remains in analysis mode.
-
Only the MirrorDatabaseLatency* and MirrorJournalLatency* sensors are sampled on the backup failover mirror member.
-
All sensors except the MirrorDatabaseLatency* and MirrorJournalLatency* sensors are sampled on the primary failover mirror member.
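The mirror-member rules above amount to a filter on sensor names. The Python sketch below applies that filter. It covers mirror members only. The state labels "primary" and "backup" are hypothetical stand-ins for the member's mirror status. Any other label is treated as "restarting or indeterminate", which samples nothing.

```python
from fnmatch import fnmatchcase

MIRROR_LATENCY_PATTERNS = ("MirrorDatabaseLatency*", "MirrorJournalLatency*")


def is_mirror_latency_sensor(name):
    return any(fnmatchcase(name, pattern) for pattern in MIRROR_LATENCY_PATTERNS)


def sensors_to_sample(sensors, member_state):
    """member_state: 'primary', 'backup', or anything else (restarting, indeterminate)."""
    if member_state == "backup":
        return [s for s in sensors if is_mirror_latency_sensor(s)]
    if member_state == "primary":
        return [s for s in sensors if not is_mirror_latency_sensor(s)]
    return []  # mirror restarting or member status indeterminate: sample nothing


sensors = ["CPUUsage", "DBReads", "MirrorDatabaseLatencyBytes", "MirrorJournalLatencyTime"]
print(sensors_to_sample(sensors, "backup"))   # ['MirrorDatabaseLatencyBytes', 'MirrorJournalLatencyTime']
print(sensors_to_sample(sensors, "primary"))  # ['CPUUsage', 'DBReads']
```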
Notification Rules
Health Monitor generates an alert (notification of severity 2) if three (3) consecutive readings of a sensor during a period are greater than the sensor maximum value, and a warning (notification of severity 1) if five (5) consecutive readings of a sensor during a period are greater than the sensor warning value. The maximum and warning values depend on the settings in the sensor object and whether the applicable chart was generated by Health Monitor or created by a user, as shown in the following table.
Note also that, as described in Health Monitor Process Description:
-
When a sensor object has maximum value and warning value set, no chart is required and therefore no chart is generated, and notifications are generated even when Health Monitor is disabled.
-
When a sensor object has multiplier and warning multiplier set, or base only, a chart is required; until sufficient samples have been collected in analysis mode to generate the chart, no notifications are generated.
-
When a user-created chart exists, it does not matter what the sensor object settings are.
Sensor Object Settings | Chart Type | Sensor Maximum Value | Sensor Warning Value | Active When |
---|---|---|---|---|
base, maximum value, warning value | none | sensor object maximum value | sensor object warning value | System Monitor running |
base, multiplier, warning multiplier | generated | sensor object multiplier times greater of: chart mean plus three sigma; chart highest value plus one sigma | sensor object warning multiplier times greatest of: | System Monitor running, Health Monitor enabled |
base only | generated | greater of: chart mean plus three sigma; chart highest value | greater of: | System Monitor running, Health Monitor enabled |
(n/a if user-created chart exists) | user-created | chart alert value | chart warning value | |
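The consecutive-reading rule stated at the start of this section can be modeled in a few lines of Python. This is an illustrative sketch, not the subscriber class's code. It assumes that the maximum and warning values have already been determined from the table and that it is given the readings of one period. It also assumes that an alert takes precedence when the same run of readings would satisfy both rules.

```python
ALERT_RUN = 3    # consecutive readings above the maximum value -> alert (severity 2)
WARNING_RUN = 5  # consecutive readings above the warning value -> warning (severity 1)


def longest_run_above(readings, limit):
    """Length of the longest run of consecutive readings strictly greater than limit."""
    longest = current = 0
    for reading in readings:
        current = current + 1 if reading > limit else 0
        longest = max(longest, current)
    return longest


def severity(readings, max_value, warn_value):
    """Return 2 (alert), 1 (warning) or 0 (no notification) for one period's readings."""
    if longest_run_above(readings, max_value) >= ALERT_RUN:
        return 2
    if longest_run_above(readings, warn_value) >= WARNING_RUN:
        return 1
    return 0


# DBLatency defaults: maximum 3000, warning 1000
print(severity([3200, 3500, 3100], 3000, 1000))              # 2
print(severity([1200, 1500, 1100, 1300, 1250], 3000, 1000))  # 1
print(severity([3200, 3500, 900, 3100], 3000, 1000))         # 0
```

In the last case, the reading of 900 breaks both runs, so neither threshold is exceeded for enough consecutive readings.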
Periods
By default there are 63 recurring weekly periods during which sensors are sampled. Each of these periods represents one of the following specified intervals during a particular day of the week:
00:15 a.m. – 02:45 a.m. | 03:00 a.m. – 06:00 a.m. | 06:15 a.m. – 08:45 a.m. |
09:00 a.m. – 11:30 a.m. | 11:45 a.m. – 01:15 p.m. | 01:30 p.m. – 04:00 p.m. |
04:15 p.m. – 06:00 p.m. | 06:15 p.m. – 08:45 p.m. | 09:00 p.m. – 11:59 p.m. |
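The 63 default periods are the nine daily intervals repeated on each of the seven days. The Python sketch below enumerates them and maps a timestamp to its period. It is a model for illustration only. The intervals are converted to 24-hour form. Treating both interval boundaries as inclusive is an assumption. Times falling in the gaps between intervals (for example 02:50) belong to no default period.

```python
from datetime import datetime
from itertools import product

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# Default daily intervals from the table, in 24-hour HH:MM form.
INTERVALS = [
    ("00:15", "02:45"), ("03:00", "06:00"), ("06:15", "08:45"),
    ("09:00", "11:30"), ("11:45", "13:15"), ("13:30", "16:00"),
    ("16:15", "18:00"), ("18:15", "20:45"), ("21:00", "23:59"),
]
PERIODS = [(day, start, end) for day, (start, end) in product(DAYS, INTERVALS)]


def period_for(ts):
    """Return the (day, start, end) period containing ts, or None if ts falls in a gap."""
    hhmm = ts.strftime("%H:%M")  # zero-padded, so string comparison orders correctly
    day = DAYS[ts.weekday()]     # weekday(): Monday == 0
    for start, end in INTERVALS:
        if start <= hhmm <= end:  # assumption: both boundaries are inclusive
            return (day, start, end)
    return None


print(len(PERIODS))                             # 63
print(period_for(datetime(2024, 1, 1, 10, 0)))  # ('Monday', '09:00', '11:30')
print(period_for(datetime(2024, 1, 1, 2, 50)))  # None
```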
You can list, add and delete periods using the Configure Periods option in the Configure Health Monitor Classes submenu of the ^%SYSMONMGR utility. You can add monthly, quarterly or yearly periods as well as weekly periods.
Quarterly periods are listed in three-month increments beginning with the month specified as the start month; for example, if you specify 5 (May) as the starting month, the quarterly cycle repeats in August (8), November (11) and February (2).
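The quarterly cycle is modular arithmetic on month numbers. This short Python sketch reproduces the May example. The function name is hypothetical.

```python
def quarterly_months(start_month):
    """Months (1-12) in which a quarterly period recurs, starting from start_month."""
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be 1-12")
    # Step three months at a time, wrapping from December back to January.
    return [(start_month - 1 + 3 * k) % 12 + 1 for k in range(4)]


print(quarterly_months(5))   # [5, 8, 11, 2]
print(quarterly_months(12))  # [12, 3, 6, 9]
```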
Descriptions are optional for user-defined periods.
Charts
Health Monitor generates charts containing the readings taken for each sensor during analysis mode for each period, and the mean and sigma calculated from those readings. The mean, sigma and highest single value in the chart are used to evaluate some sensor readings, as described in Notification Rules.
The Configure Charts option of the ^%SYSMONMGR utility Configure Health Monitor Classes submenu lets you display a list of all current charts, including the mean and sigma of each. It also lets you display the details of a particular chart, including the individual readings and highest reading.
The Configure Charts option also provides two ways to customize alerting through charts:
-
By editing an existing chart, you can change the mean and/or sigma to whatever values you wish. The standard notification rules apply, but using the values you have entered.
-
You can create a chart, specifying an alert value and a warning value. When you do this, the sensor object settings no longer apply; alerts and warnings are generated based solely on the values you supply for the chart.
When listing, examining, editing or creating charts, the Item heading or prompt refers to a job type, a database directory path specifying a database, an IP address specifying a CSP gateway server, or a mirror name specifying a mirror; see Sensors and Sensor Objects for more information.
You can also programmatically build chart statistics based on a list of values with the following SYS.Monitor.Health.Chart class methods:
-
CreateChart(): Creates a chart for a specific period/sensor, evaluates the list of values, and sets the resulting mean and sigma values.
-
SetChartStats(): Evaluates the list of values and sets the resulting mean and sigma values for a specified period/sensor.
For more information, see the SYS.Monitor.Health.Chart class documentation.
A chart generated by Health Monitor, including one you have edited, can be automatically recalibrated as described in the final step of Health Monitor Process Description. In addition, all charts generated by Health Monitor, including those that have been edited, are deleted when a Caché instance is upgraded. A chart created using the Configure Charts submenu or the CreateChart() class method, however, is never automatically recalibrated or deleted on upgrade. A user-created chart therefore remains permanently associated with the selected sensor/period combination until you select the Reset Charts option within the Reset Defaults option of the Configure Health Monitor Classes submenu or select Recalibrate Charts within the Configure Charts option.
Using ^%SYSMONMGR to Manage Health Monitor
As described in Using the ^%SYSMONMGR Utility, the ^%SYSMONMGR utility lets you manage and configure System Monitor, including Health Monitor. To manage Health Monitor, change to the %SYS namespace in the Terminal, then enter the following command:
%SYS>do ^%SYSMONMGR
1) Start/Stop System Monitor
2) Set System Monitor Options
3) Configure System Monitor Classes
4) View System Monitor State
5) Manage Application Monitor
6) Manage Health Monitor
7) View System Data
8) Exit
Option?
Enter 6 for Manage Health Monitor. The following menu displays:
1) Enable/Disable Health Monitor
2) View Alerts Records
3) Configure Health Monitor Classes
4) Set Health Monitor Options
5) Exit
Option?
Enter the number of your choice or press Enter to exit the Health Monitor utility.
Health Monitor runs only in the %SYS namespace. When you start ^%SYSMONMGR in another namespace, option 6 (Manage Health Monitor) does not appear.
The options in the main menu let you perform Health Monitor tasks as described in the following table:
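Option | Description |
---|---|
1) Enable/Disable Health Monitor | Enable or disable Health Monitor. |
2) View Alerts Records | View recently generated alerts and warnings for a specific sensor or for all sensors; see View Alerts Records. |
3) Configure Health Monitor Classes | Display the Configure Health Monitor Classes submenu, which lets you configure periods, charts, and sensor objects and reset defaults; see Configure Health Monitor Classes. |
4) Set Health Monitor Options | Display the Set Health Monitor Options submenu, which lets you set the startup wait time and the alert purge time; see Set Health Monitor Options. |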
When the utility asks you to specify a single element such as a sensor, rule, period or chart, you can enter ? (question mark) at the prompt for a numbered list, then enter the number of the element you want.
All output from the utility can be displayed on the Terminal or sent to a specified device.
View Alerts Records
Choose this option to view recently generated alerts for a specific sensor or for all sensors. You can examine the details of individual alerts and warnings, including the mean and sigma of the chart and the readings that triggered the notification. (Alert records are purged after a configurable number of days; see Set Health Monitor Options for more information.)
Configure Health Monitor Classes
The options in this submenu let you customize Health Monitor, as described in the following table.
You cannot use these options to customize Health Monitor while System Monitor is running; you must first stop System Monitor, and then restart it after you have made your changes.
Option | Description |
---|---|
1) Activate/Deactivate Rules | (not in use in this release) |
2) Configure Periods | List the currently configured periods and add and delete periods. |
3) Configure Charts | Lets you list current charts and display the details of a chart, edit an existing chart, create a chart, and recalibrate charts; see Charts. |
4) Edit Sensor Objects | List the sensor objects representing the sensors in the SYS.Monitor.SystemSensors class and modify their base, maximum, warning, multiplier, and warning multiplier values. |
5) Reset Defaults | Lets you restore Health Monitor defaults; this includes the Reset Charts option described in Charts. |
Set Health Monitor Options
This submenu lets you set several Health Monitor options, as shown in the following table:
Option | Description |
---|---|
1) Set Startup Wait Time | Configure the number of minutes System Monitor waits after starting, when Health Monitor is enabled, before passing sensor readings to the Health Monitor subscriber, SYS.Monitor.Health.Control. This allows Caché to reach normal operating conditions before Health Monitor begins creating charts or evaluating readings. |
2) Set Alert Purge Time | Specify when an alert record should be purged (deleted); the default is five days after the alert is generated. |
Caché Application Monitor
Caché Application Monitor monitors a user-extensible set of metrics, maintains a persistent repository of the data it collects, and triggers user-configured alerts.
This section covers the following topics:
Application Monitor Overview
Caché Application Monitor (Application Monitor) is an extensible utility that monitors a user-selected set of system and user-defined metrics in each startup namespace configured in System Monitor. As described in Default System Monitor Components, when %SYS.Monitor.AppMonSensor, the Application Monitor sensor class, is called by System Monitor, it samples metrics, evaluates the samples, and generates its own notifications. (Unlike System Monitor and Health Monitor notifications, these are not written to the console log.) Specifically, Application Monitor does the following in each System Monitor startup namespace:
-
Starts when System Monitor starts.
-
Lets you register the provided system monitor classes (they are registered in %SYS by default).
-
Lets you activate the system and user-defined classes you want to monitor. You can activate any registered system class; you can activate any user-defined class that is present in the local namespace. For example, if you have created a user-defined class only in the USER namespace, you can activate that class only in the USER namespace.
-
Monitors each active class by sampling the metrics specified by the class. These metrics are the properties returned by the sample class that the monitor class's GetSample() method calls. For example, the %Monitor.System.LockTable class calls %Monitor.System.Sample.LockTable, which returns (among others) the properties TotalSpace, containing the total size of the lock table, and UsedSpace, containing the size of the lock table space in use. The sampled data, along with monitor and class metadata, is stored in the local namespace and can be accessed by all object and SQL methods.
-
If an alert is configured for a class and a sampled property value satisfies the evaluation expression configured in the alert, generates an email message or calls a class method, depending on which action the alert specifies. For example, you can first configure email notifications to a list of recipients, then configure an alert for the %Monitor.System.LockTable class, specifying that an email be sent when the ratio of the UsedSpace property of %Monitor.System.Sample.LockTable to the TotalSpace property is greater than .9 (90% full).
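The lock table alert condition above is a simple ratio test. The Python sketch below expresses it. The sampled properties are modeled as a plain dictionary keyed by property name. That representation, and the function name, are assumptions for illustration. Actual alerts are defined through the Manage Alerts submenu, not in Python.

```python
def lock_table_alert(sample, threshold=0.9):
    """True when UsedSpace / TotalSpace exceeds threshold (0.9 = 90% full)."""
    used, total = sample["UsedSpace"], sample["TotalSpace"]
    if total <= 0:
        return False  # guard: no meaningful ratio for an empty or unknown table size
    return used / total > threshold


print(lock_table_alert({"UsedSpace": 950_000, "TotalSpace": 1_000_000}))  # True
print(lock_table_alert({"UsedSpace": 400_000, "TotalSpace": 1_000_000}))  # False
```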
The %Monitor.System.HistorySys and %Monitor.System.HistoryPerf classes provided with Application Monitor, when activated, create and maintain a historical database of system usage and performance metrics to help you analyze system usage and performance issues over time. These classes and %Monitor.System.HistoryUser run only in %SYS and cannot be registered in other namespaces. See the “Caché History Monitor” chapter of this guide for more information about these classes and the historical database.
Using ^%SYSMONMGR to Manage Application Monitor
As described in Using the ^%SYSMONMGR Utility, the ^%SYSMONMGR utility lets you manage and configure System Monitor, including Application Monitor. The utility can be executed in any namespace, and changes made with it affect only the namespace in which it is started. You must maintain a separate Application Monitor configuration for each startup namespace you configure by starting ^%SYSMONMGR in that namespace.
Following any change you make to the Application Monitor configuration, such as activating a class, you must restart System Monitor in the namespace in which you made the change for the change to take effect.
To manage Application Monitor, enter the following command in the Terminal:
%SYS>do ^%SYSMONMGR
then enter 5 for Manage Application Monitor. The following menu displays:
1) Set Sample Interval
2) Manage Monitor Classes
3) Change Default Notification Method
4) Manage Email Options
5) Manage Alerts
6) Debug Monitor Classes
7) Exit
Option?
Enter the number of your choice or press Enter to exit the Application Monitor utility.
Manage Application Monitor
The options in the main menu let you manage Application Monitor as described in the following table:
Option | Description |
---|---|
1) Set Sample Interval | Sets the interval at which metrics are sampled; the default is 30 seconds. This setting can be overridden for an individual class by setting a class-specific interval using the 5) Set Class Sample Interval option on the Manage Monitor Classes submenu. Note: As described in Set System Monitor Options, System Monitor calls each configured sensor class, including %SYS.Monitor.AppMonSensor, every 30 seconds by default, but this setting can also be changed. If the Application Monitor sampling interval or a class-specific interval is different from the System Monitor interval, whichever interval is longer is in effect. For example, if the System Monitor interval is 30 and the Application Monitor interval is 120, all active Application Monitor classes are sampled every 120 seconds; if the System Monitor interval is 60 and the %Monitor.System.LockTable class interval is 20, the class is sampled every 60 seconds. |
2) Manage Monitor Classes | Displays the Manage Monitor Classes submenu which lets you manage system- and user-defined monitor classes in the namespace in which you are running the Application Monitor Manager. |
3) Change Default Notification Method | Lets you specify the default action for alerts when triggered. Any alerts you create use this action unless you specify otherwise. |
4) Manage Email Options | Displays the Monitor Email Options submenu which lets you enable and configure email notifications so you can specify this action in alerts. |
5) Manage Alerts | Displays the Manage Alerts submenu, which lets you create alerts for system and user-defined monitor classes. |
6) Debug Monitor Classes | Displays the Debug Monitor Classes submenu, which lets you enable and disable debugging and list errors. |
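The precedence described for Set Sample Interval reduces to a class-specific override followed by a maximum. The Python sketch below shows that rule. The function and parameter names are hypothetical. A class interval of 0 means "no class-specific interval", as stated for option 5) Set Class Sample Interval.

```python
def effective_interval(system_interval, appmon_interval=30, class_interval=0):
    """Seconds between samples of one Application Monitor class."""
    # A class-specific interval of 0 means "none", so the Application Monitor interval applies.
    requested = class_interval if class_interval > 0 else appmon_interval
    # Whichever of the System Monitor interval and the requested interval is longer is in effect.
    return max(system_interval, requested)


print(effective_interval(30, appmon_interval=120))  # 120
print(effective_interval(60, class_interval=20))    # 60
```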
Manage Monitor Classes
This submenu lets you manage system and user-defined monitor classes. Enter the number of your choice or press Enter to return to the main menu:
Option? 2
1) Activate/Deactivate Monitor Class
2) List Monitor Classes
3) Register Monitor System Classes
4) Remove/Purge Monitor Class
5) Set Class Sample Interval
6) Exit
Option?
This submenu displays a list of menu items that let you manage the system- and user-defined classes as described in the following table:
Option | Description |
---|---|
1) Activate / Deactivate Monitor Class | Application Monitor samples active classes only. This option lets you activate an inactive class or deactivate an active one. You can display a numbered list of the system and user-defined classes registered in the local namespace, including the activation state of each, by entering ? at the Class? prompt, then entering either the number or the class name. |
2) List Monitor Classes | Displays a list of the system and user-defined classes registered in the local namespace, including the activation state of each. |
3) Register Monitor System Classes | Registers all system monitor classes (except the %Monitor.System.HistorySys, %Monitor.System.HistoryPerf, and %Monitor.System.HistoryUser classes) and stores them in the local namespace. System classes must still be activated using option 1) Activate/Deactivate Monitor Class on this menu for sampling to begin. |
4) Remove/Purge Monitor Class | Removes a monitor class from the list of classes in the local namespace. You can display a numbered list of the system and user-defined classes registered in the local namespace, including the activation state of each, by entering ? at the Class? prompt, then entering either the number or the class name. Note: This option does not remove the class, but simply removes the name of the class from the list of registered classes that can be activated. To reset the list, choose option 3) Register Monitor System Classes on this menu. |
5) Set Class Sample Interval | Lets you override the default Application Monitor sampling interval, specified by the 1) Set Sample Interval option of the Manage Application Monitor menu, for a single class. The default is 0, which means the class does not have a class-specific sample interval. Note: See the description of the Set Sample Interval option for an explanation of precedence between this setting, the Set Sample Interval setting, and the System Monitor sample interval discussed in Set System Monitor Options. |
Change Default Notification Method
When you create an alert, you specify an action to be taken when it is triggered; the default choice for this action is the default notification method, set using this option. Enter the number of your choice or press Enter to return to the main menu:
Option? 3
Notify Action (0=none,1=email,2=method)? 0 =>
The choice you make with this option is used when you configure an alert to use the default notification method, as described in the following table:
Input Field | Description |
---|---|
0 | Do not take action when an alert is triggered. |
1 | Send an email message to the configured recipients when an alert is triggered. For information about configuring email, see Manage Email Options. |
2 | Call a notification method when an alert is triggered. If you select this action, the method is called with two arguments: the application name specified in the alert and a %List object containing the properties returned to the monitor class by the sample class (as described in Application Monitor Overview). When prompted, enter the full class name and method, that is, packagename.classname.method. This method must exist in the local namespace. |
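The notify action codes and the fallback to the default method can be modeled as follows. This Python sketch is illustrative only. The enum, the function names, and the use of None for "no action chosen in the alert" are assumptions. Splitting on the last dot separates packagename.classname from the method name in the form the prompt expects.

```python
from enum import IntEnum


class NotifyAction(IntEnum):
    NONE = 0    # take no action
    EMAIL = 1   # email the configured recipients
    METHOD = 2  # call packagename.classname.method


def resolve_action(alert_action, default_action):
    """Use the alert's own action if it has one (not None), else the default method."""
    return NotifyAction(default_action if alert_action is None else alert_action)


def split_method_spec(spec):
    """Split 'packagename.classname.method' into ('packagename.classname', 'method')."""
    class_name, _, method = spec.rpartition(".")
    if not class_name or not method:
        raise ValueError(f"expected packagename.classname.method, got {spec!r}")
    return class_name, method


print(resolve_action(None, 1).name)                    # EMAIL
print(split_method_spec("MyPkg.AlertHandler.Notify"))  # ('MyPkg.AlertHandler', 'Notify')
```

The name MyPkg.AlertHandler.Notify is a hypothetical placeholder.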
Manage Email Options
The options in this submenu let you configure and enable email. When email is enabled, Application Monitor sends email notifications when an alert configured for them is triggered. Enter the number of your choice or press Enter to return to the main menu:
Option? 4
1) Enable/Disable Email
2) Set Sender
3) Set Server
4) Manage Recipients
5) Set Authorization
6) Test Email
7) Exit
Option?
The options in this submenu let you manage the email notifications for the Application Monitor as described in the following table:
Option | Description |
---|---|
1) Enable / Disable Email |
Enabling email makes it possible for Application Monitor to send email notifications when alerts are triggered, if configured. Disabling email prevents Application Monitor from sending email notifications when an alert is triggered.
Note:
It is not necessary to reconfigure email options when you disable and then reenable email. |
2) Set Sender | This option is required. Enter text identifying the sender of the email. Depending on the specified outgoing mail server, this may or may not have to be a valid email account. |
3) Set Server | This option is required. Enter the name of the server that handles outgoing email for your site. If you are not sure, your IT staff should be able to provide this information. |
4) Manage Recipients |
This option displays a submenu that lets you list, add, or remove valid email addresses of recipients: 1) List Recipients, 2) Add Recipient, 3) Remove Recipient, 4) Exit. When adding or removing recipients, email addresses must be entered individually, one per line. Addresses of invalid format are rejected. |
5) Set Authorization | Lets you specify the authorization username and password if required by your email server. Consult your IT staff to obtain this information. If you do not provide entries, the authorization username and password are set to NULL. |
6) Test Email | Sends a test message to the specified recipients using the specified email server. If the attempt fails, the resulting error message may help you fix the problem. |
Manage Alerts
An alert specifies
-
a condition within the namespace that is of concern to you, defined by the values of properties sampled by a monitor class
-
an action to be taken to notify you when that condition occurs
To return to the previous example, you might create an alert specifying
-
condition: the lock table is over 90% full; that is, the value of the UsedSpace property, returned when the %Monitor.System.LockTable class calls %Monitor.System.Sample.LockTable, is more than 90% of the value of the TotalSpace property
-
action: send an email notification
The definition of a condition based on properties is called an evaluation expression; after specifying the properties of the sample class you want to use, you specify the evaluation expression. Properties are indicated in the expression by placeholders corresponding to the order in which you provide them; for example, if when creating the lock table alert you specify the UsedSpace property first and then the TotalSpace property, you would enter the evaluation expression as %1 / %2 > .9, but if you enter the properties in the reverse order, the expression would be %2 / %1 > .9.
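Because placeholders are positional, the same condition is written differently depending on the order in which you enter the properties. The Python sketch below models only that binding step. The names bind_placeholders and to_literal are hypothetical, and evaluating the bound ObjectScript expression is left to Caché. Both calls produce the same concrete expression.

```python
import re
from typing import Sequence, Union

Value = Union[int, float, str]


def to_literal(value: Value) -> str:
    """Render a sampled value as an ObjectScript literal."""
    if isinstance(value, str):
        # ObjectScript string literals escape an embedded quote by doubling it.
        return '"' + value.replace('"', '""') + '"'
    return str(value)


def bind_placeholders(expression: str, values: Sequence[Value]) -> str:
    """Replace %1, %2, ... with property values, in the order the properties were entered."""
    def substitute(match: "re.Match[str]") -> str:
        n = int(match.group(1))              # 1-based position in the property list
        if not 1 <= n <= len(values):
            raise IndexError(f"%{n} has no matching property")
        return to_literal(values[n - 1])
    # \d+ consumes the whole number, so %12 is never read as %1 followed by "2".
    return re.sub(r"%(\d+)", substitute, expression)


# UsedSpace entered first, TotalSpace second:
print(bind_placeholders("%1 / %2 > .9", [950, 1000]))   # 950 / 1000 > .9
# TotalSpace entered first, UsedSpace second:
print(bind_placeholders("%2 / %1 > .9", [1000, 950]))   # 950 / 1000 > .9
```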
When the alert menu displays, enter the number of your choice or press Enter to return to the main menu:
Option? 5
1) Create Alert
2) Edit Alert
3) List Alerts
4) Delete Alert
5) Enable/Disable Alert
6) Exit
Option?
The options in this submenu let you manage alerts for the Application Monitor as described in the following table:
Input Field | Description |
---|---|
1) Create Alert | Lets you define a new alert. For a description of the prompts and responses, see the Responses to Alert Prompts. The newly created alert is enabled by default. |
2) Edit Alert |
Lets you modify an existing alert. Enter the name of the alert to edit, or enter ? for a list of existing alerts and then enter a number or name.
Note:
You must respond to all prompts including those that you do not want to modify; that is, you must re-enter information for fields that you do not want to modify as well as the revised information for the fields you are modifying. For a description of the prompts and responses, see the Responses to Alert Prompts. |
3) List Alerts |
Displays the definitions of all alerts in the local namespace; for example:
Alert: LockTable90 USER
Action: email
Class: %Monitor.System.LockTable
Property: UsedSpace,TotalSpace
Expression: %1/%2>.9
Notify Once: True
Enabled: Yes |
4) Delete Alert |
Lets you delete an existing alert. Enter the name of the alert to delete, or enter ? for a list of existing alerts and then enter a number or name.
Note:
Each alert must be entered individually; that is, you cannot specify a series or range of alerts to delete. |
5) Enable / Disable Alert |
Enabling an alert activates it. Disabling an alert deactivates it.
Note:
It is not necessary to reconfigure alert options when you disable and then reenable an alert. |
The following table describes the valid responses to Alert prompts:
Input Field | Description |
---|---|
Alert Name? |
Enter an alphanumeric name. To display a numbered list of alert names already defined in the local namespace, enter ? at the Alert Name? prompt. |
Application? | Enter descriptive text to be passed to the email message or notification method. This text can include references to the properties you specify at the Property? prompt later in the procedure in the form %N, where %1 refers to the first property in the list of properties, %2 the second property, and so on. |
Action (0=default, 1=email, 2=method)? |
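Specifies the action to take when the alert is triggered. Enter 0 to use the default notification method (see Change Default Notification Method), 1 to send an email notification to the configured recipients, or 2 to call a notification method.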
|
Raise this alert during sampling?
or Define a trigger for this alert? |
The first prompt is displayed when you are creating an alert; the second prompt is displayed when you are editing an alert for which you entered No at the first prompt when creating it. Enter one of the following:
|
Class? |
Enter the name of a system or user-defined monitor class registered in the local namespace. To display a numbered list of registered classes in the local namespace, including their activation states, enter ? at the Class? prompt, then enter a number or name.
Note:
You can create an alert for an inactive class. An alert is not removed when the class it is configured for is removed. |
Property? |
Enter the name of a property defined in the class specified at the preceding prompt that you are using in the evaluation expression, in the descriptive text, or in both. To display a numbered list of properties defined in the named class, enter ? at the Property? prompt, then enter a number or name. Each property must be entered individually. When you are done, press Enter at an empty prompt to display the list of properties in the order in which you specified them. |
Evaluation expression? | Expression used to evaluate the properties specified at the Property? prompt. For example, in (%1 = "User") && (%2 < 100), %1 refers to the first property in the list of properties, %2 the second property, and so on. |
Notify once only? |
Enter one of the following:
|
Debugging Monitor Classes
This submenu lets you manage debugging of Application Monitor classes.
Debugging monitor classes adds the capability to capture any errors generated during the collection of sample values by user-defined Application Monitor classes.
Enter the number of your choice or press Enter to return to the main menu:
Option? 6
1) Enable Debug
2) Disable Debug
3) List Errors
4) Exit
Option?
The options in this submenu let you manage debugging for Application Monitor as described in the following table:
Input Field | Description |
---|---|
1) Enable Debug |
Lets you enable debugging. If the class is not creating sample values, then you can check to see if errors are preventing the sample values from being saved. |
2) Disable Debug |
Lets you disable debugging. |
3) List Errors |
Displays the errors captured in the local namespace. When debugging is enabled for a class using ^%SYSMONMGR, System Monitor saves the last error caught in specific methods within the class, such as %New(), %Save(), Initialize(), and GetSample(). |
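The rule that only the last error per method is kept can be sketched in Python. DebugRecorder and its methods are hypothetical names, and the sketch assumes that a failing method simply yields no sample value, which is consistent with the Enable Debug description but is not a statement about Caché internals.

```python
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class DebugRecorder:
    """Hypothetical model: remember the most recent error raised by each method."""

    def __init__(self) -> None:
        self.enabled = False                  # 1) Enable Debug / 2) Disable Debug
        self.last_error: Dict[str, str] = {}  # method name -> text of its last error

    def call(self, method_name: str, func: Callable[[], T]) -> Optional[T]:
        try:
            return func()
        except Exception as exc:
            if self.enabled:
                # Overwrite any earlier entry: only the last error per method is kept.
                self.last_error[method_name] = f"{type(exc).__name__}: {exc}"
            return None                       # no sample value is produced

    def list_errors(self) -> Dict[str, str]:
        """Corresponds to 3) List Errors."""
        return dict(self.last_error)
```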
Application Monitor Metrics
The system monitor classes included with Application Monitor call various sample classes, as shown in the following table:
For a list of properties corresponding to the sample metrics in each category, see the InterSystems Class Reference.
Similar functions that control the MONITOR facility are available through the classes in the %Monitor.System package, which also allows you to save the data as a named collection in a persistent object format. See the %Monitor.System.Sample package classes and the %Monitor.System.SystemMetrics class documentation in the InterSystems Class Reference for more details.
Generating Metrics
The %Monitor.SampleAgent class does the actual sampling, invoking the Initialize() and GetSample() methods of the metrics classes.
The %Monitor.SampleAgent.%New(n) constructor takes one argument, the name of the metrics class it is to run. It creates an instance of that class and invokes the Startup() method on that instance. Then, each time the %Monitor.SampleAgent.Collect() method is invoked, the Sample Agent invokes the Initialize() method for the class, then repeatedly invokes the GetSample() method for the class. Each time GetSample() returns $$$OK, %Monitor.SampleAgent creates a new instance of the sample class for the metrics class; the pseudocode for these operations is:
set sampler = ##class(%Monitor.SampleAgent).%New("MyMetrics.Freespace")
/* at this point, the sampler has created an instance of MyMetrics.Freespace,
and invoked its Startup method */
for I=1:1:10 { do sampler.Collect() hang 10 }
/* at each iteration, sampler calls MyMetrics.Freespace.Initialize(), then loops
on GetSample(). Whenever GetSample() returns $$$OK, sampler creates a new
MyMetrics.Sample.Freespace instance, with the sample data. When GetSample()
returns an error value, no sample is created, and sampler.Collect() returns. */
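To make the control flow of the pseudocode concrete, here is a Python model of one sampler. It is a sketch under assumptions: SampleAgentModel and FreespaceModel are hypothetical stand-ins for %Monitor.SampleAgent and MyMetrics.Freespace, the dataset values are taken from the display output in Viewing Metrics Data, and a plain dictionary replaces each persistent MyMetrics.Sample.Freespace instance.

```python
from typing import Dict, List, Optional, Tuple


class FreespaceModel:
    """Hypothetical metrics class: one sample per dataset on each run."""

    def __init__(self, datasets: Dict[str, str]) -> None:
        self.datasets = datasets
        self.started = 0
        self._pending: List[Tuple[str, str]] = []
        self.DBName: Optional[str] = None
        self.FreeSpace: Optional[str] = None

    def startup(self) -> None:
        self.started += 1                    # one-time setup

    def initialize(self) -> None:
        # Start of each run: reload the list of datasets (the "result set").
        self._pending = list(self.datasets.items())

    def get_sample(self) -> bool:
        if not self._pending:
            return False                     # plays the role of returning 0
        self.DBName, self.FreeSpace = self._pending.pop(0)
        return True                          # plays the role of returning $$$OK


class SampleAgentModel:
    """Hypothetical model of the Collect() loop described in the pseudocode."""

    def __init__(self, metrics: FreespaceModel) -> None:
        self.metrics = metrics
        self.samples: List[Dict[str, Optional[str]]] = []
        metrics.startup()                    # Startup() runs once, when the sampler is created

    def collect(self) -> int:
        self.metrics.initialize()            # Initialize() runs at the start of every Collect()
        created = 0
        while self.metrics.get_sample():
            # Each successful GetSample() produces one new sample instance.
            self.samples.append({"DBName": self.metrics.DBName,
                                 "FreeSpace": self.metrics.FreeSpace})
            created += 1
        return created


agent = SampleAgentModel(FreespaceModel({"c:\\cache51\\": "8.2MB", "c:\\cache51\\mgr\\": "6.4MB"}))
for _ in range(3):
    agent.collect()
print(len(agent.samples))  # 6
```

Initialize() reloads the dataset list on every run. Without that reload, a second Collect() would find nothing left to sample.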
Viewing Metrics Data
All sample classes are CSP-enabled; the CSP code is generated automatically when the sample class is generated. Therefore, the simplest way to view metrics is to use a web browser. For example, based on the example in Generating Metrics and assuming a web server port of 57772, the CSP URL is http://localhost:57772/csp/user/MyMetrics.Sample.Freespace.cls, which displays output similar to:
Monitor - Freespace c:\cache51\
Name of dataset: c:\cache51\
Current amount of Freespace: 8.2MB
Monitor - Freespace c:\cache51\mgr\
Name of dataset: c:\cache51\mgr\
Current amount of Freespace: 6.4MB
Alternatively, you can use the Display(metric_class) method of the %Monitor.View class; for example:
%SYS>set mclass="Monitor.Test.Freespace"
%SYS>set col=##class(%Monitor.SampleAgent).%New(mclass)
%SYS>write col.Collect()
1
%SYS>write ##class(%Monitor.View).Display(mclass)
Monitor - Freespace c:\cache51\
Name of dataset: c:\cache51\
Current amount of Freespace: 8.2MB
Monitor - Freespace c:\cache51\mgr\
Name of dataset: c:\cache51\mgr\
Current amount of Freespace: 6.4MB
The URL for a class with % (percent sign) in the name must use %25 in its place. For example, for the %Monitor.System.Freespace class, use
http://localhost:57772/csp/sys/%25Monitor.System.Freespace.cls
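The %25 rule is ordinary URL percent-encoding. Below is a hedged Python sketch using the standard urllib.parse.quote function. The helper csp_class_url is hypothetical. It assumes that the CSP application path is /csp/ followed by the lowercase namespace name, as in the two URLs in this section, and that the web server port is 57772.

```python
from urllib.parse import quote


def csp_class_url(class_name: str, app: str = "user",
                  host: str = "localhost", port: int = 57772) -> str:
    """Hypothetical helper: CSP URL for a class, with % encoded as %25."""
    # quote() turns "%" into "%25"; letters, digits and "." pass through unchanged.
    return f"http://{host}:{port}/csp/{app}/{quote(class_name)}.cls"


print(csp_class_url("MyMetrics.Sample.Freespace"))
# http://localhost:57772/csp/user/MyMetrics.Sample.Freespace.cls
print(csp_class_url("%Monitor.System.Freespace", app="sys"))
# http://localhost:57772/csp/sys/%25Monitor.System.Freespace.cls
```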
Writing User-Defined Application Monitor Classes
In addition to the provided system classes, you can write your own monitor classes, from which sample classes are generated, to monitor user application data and counters.
A monitor class is any class that inherits from the abstract Monitor class, %Monitor.Adaptor; the %Monitor.System classes are examples of such classes. To create your own user-defined monitor classes:
-
Run ^%MONAPPMGR in the namespace where you want to monitor data. Choose option 2, Manage Monitor Classes, and within that submenu, choose option 3, Register Monitor System Classes.
SAMPLES>d ^%MONAPPMGR
1) Set Sample Interval
2) Manage Monitor Classes
3) Change Default Notification Method
4) Manage Email Options
5) Manage Alerts
6) Exit
Option? 2
1) Activate/Deactivate Monitor Class
2) List Monitor Classes
3) Register Monitor System Classes
4) Remove/Purge Monitor Class
5) Set Class Sample Interval
6) Exit
Option? 3
Exporting to XML started on 06/21/2022 12:52:36
Exporting class: Monitor.Sample
Export finished successfully.
Load started on 06/21/2022 12:52:36
Loading file C:\InterSystems\SRCCTRL\mgr\Temp\t0jFhPqLkZoYAA.stream as xml
Imported class: Monitor.Sample
Compiling class Monitor.Sample
Compiling table Monitor.Sample
Compiling routine Monitor.Sample.1
Load finished successfully.
1) Activate/Deactivate Monitor Class
2) List Monitor Classes
3) Register Monitor System Classes
4) Remove/Purge Monitor Class
5) Set Class Sample Interval
6) Exit
Option?
-
Write a class that inherits from %Monitor.Adaptor. The inheritance provides persistence, parameters, properties, code generation, and a projection that generates the monitor metadata from your class definition. See the %Monitor.Adaptor class documentation for full details on this class, as well as the code you must write.
-
Compile your class. Compiling classes that inherit from %Monitor.Adaptor generates new sample classes in a subpackage of the user's class called Sample. For example, if you compile A.B.MyMetric, a new class is generated in A.B.Sample.MyMetric. You do not need to do anything with the generated class.
Important: When deleting Application Monitor classes, delete only the monitor class; do not delete generated sample classes. Use the Management Portal to delete the monitor class (for example, A.B.MyMetric) from which the sample class (for example, A.B.Sample.MyMetric) is generated; this automatically deletes both the monitor class and the generated sample class.
All sample classes are automatically CSP-enabled, so sample data for the user's metrics can be viewed by pointing a browser to A.B.Sample.MyMetric.cls. If the class has been activated, Application Monitor automatically invokes it and generates data and alerts; for information about activating monitor classes, see Manage Monitor Classes.
The SECURITYRESOURCE parameter is empty in %Monitor.Adaptor, and therefore in user classes inheriting from %Monitor.Adaptor unless explicitly modified. Code generation copies the SECURITYRESOURCE value from the user-defined class to the generated sample class. See %CSP.Page Class Parameters in Using Caché Server Pages (CSP) for information about the SECURITYRESOURCE parameter.
The following simple example retrieves the free space for each dataset in a Caché instance. For detailed instructions on creating a user-defined Application Monitor class and an alert that sends email notifications when an application error occurs, written by an InterSystems senior technical trainer and accompanied by downloadable code, see Creating a Custom Application Monitor Class and an Alert on InterSystems Developer Community.
Each sampling requests n instances of sample data objects, each instance corresponding to a dataset. In this example, each instance has only a single property, the free space available in that dataset when the sample was collected.
-
Create a class that inherits from %Monitor.Adaptor:
Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ] { }
-
Add properties that you want to be part of the sample data. They must be of %Monitor types:
-
Gauge
-
Integer
-
Numeric
-
String
For example:
Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ]
{

/// Name of dataset
Property DBName As %Monitor.String(CAPTION = "Database Name");

/// Current amount of Freespace
Property FreeSpace As %Monitor.String;

}
-
Add an INDEX parameter that tells which fields form a unique key among the instances of the samples:
Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ] { Parameter INDEX = "DBName"; }
-
Add control properties as needed, marking them [Internal] so they do not become part of the storage definition in the generated class.
Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ]
{

/// Result Set
Property Rspec As %Library.ResultSet [Internal];

}
-
Override a method named Initialize(). Initialize is called at the start of each metrics gathering run.
Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ]
{

/// Initialize the list of datasets and freespace.
Method Initialize() As %Status
{
    set ..Rspec = ##class(%Library.ResultSet).%New("SYS.Database:FreeSpace")
    do ..Rspec.Execute("*",0)
    return $$$OK
}

}
-
Override a method named GetSample(). GetSample() is called repeatedly until a status of 0 is returned. You write code to populate the metrics data for each sample instance.
Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ]
{

/// Get dataset metric sample.
/// A return code of $$$OK indicates there is a new sample instance.
/// A return code of 0 indicates there is no sample instance.
Method GetSample() As %Status
{
    // Get freespace data
    set stat = ..Rspec.Next(.sc)
    // Quit if we have done all the datasets
    if 'stat {
        Quit 0
    }
    // populate this instance
    set ..DBName = ..Rspec.Get("Directory")
    set ..FreeSpace = ..Rspec.Get("Available")
    // quit with return value indicating the sample data is ready
    return $$$OK
}

}
-
Compile the class. The class is shown below:
Class MyMetric.Freespace Extends %Monitor.Adaptor
{

Parameter INDEX = "DBName";

/// Name of dataset
Property DBName As %Monitor.String;

/// Current amount of Freespace
Property FreeSpace As %Monitor.String;

/// Result Set
Property Rspec As %Library.ResultSet [Internal];

/// Initialize routine metrics.
Method Initialize() As %Status
{
    set ..Rspec = ##class(%Library.ResultSet).%New("SYS.Database:FreeSpace")
    do ..Rspec.Execute("*",0)
    return $$$OK
}

/// Get routine metric sample.
/// A return code of $$$OK indicates there is a new sample instance.
/// Any other return code indicates there is no sample instance.
Method GetSample() As %Status
{
    // Get freespace data
    Set stat = ..Rspec.Next(.sc)
    // Quit if we have done all the datasets
    if 'stat {
        Quit 0
    }
    // populate this instance
    set ..DBName = ..Rspec.Get("Directory")
    set ..FreeSpace = ..Rspec.Get("Available")
    // quit with return value indicating the sample data is ready
    return $$$OK
}

}
-
Additionally, you can override the Startup() and Shutdown() methods. Startup() is called once when sampling begins, so you can use it to open channels or perform other one-time initialization; Shutdown() is called once when sampling ends, so you can use it to release those resources:
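Class MyMetric.Freespace Extends %Monitor.Adaptor [ ProcedureBlock ]
{

/// TCP/IP device used to send warnings (a control property, so it is marked Internal)
Property io As %String [Internal];

Method Startup() As %Status
{
    set ..io = "|TCP|2"
    set host = "127.0.0.1"
    // ^serverport is assumed to hold the port number of the listening server
    open ..io:(host:^serverport:"M"):200
    if '$test {
        return $$$ERROR($$$GeneralError, "Could not open TCP device")
    }
    return $$$OK
}

Method Shutdown() As %Status
{
    close ..io
    return $$$OK
}

}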
-
Compiling the class creates a new class, MyMetric.Sample.Freespace, in the MyMetric.Sample package:
/// Persistent sample class for MyMetric.Freespace
Class MyMetric.Sample.Freespace Extends Monitor.Sample
{

Parameter INDEX = "DBName";

Property Application As %String [ InitialExpression = "MyMetric" ];

/// Name of dataset
Property DBName As %Monitor.String(CAPTION = "");

/// Current amount of Freespace
Property FreeSpace As %Monitor.String(CAPTION = "");

Property GroupName As %String [ InitialExpression = "Freespace" ];

Property MetricsClass As %String [ InitialExpression = "MyMetric.Freespace" ];

}
Note: You should not modify this class. You may, however, inherit from it to write custom queries against your sample data.
Important: If you modify and recompile an active user-defined Application Monitor class, the class is deactivated and its class-specific sample interval override, if any, is removed; to restore it, you must activate the class, reset the sample interval if desired, and restart System Monitor. If System Monitor is running when you modify and recompile a user-defined class, all alerts based on the class are deleted.