Using Caché Monitor
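Caché Monitor monitors the Caché instance’s console log for errors and traps reported by Caché daemons and user processes and generates corresponding notifications, including email if configured. This chapter discusses the following topics: the Caché system monitoring tools, an overview of Caché Monitor, the ^MONMGR utility, and the Caché Monitor errors and traps that always generate notifications.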
Caché System Monitoring Tools
Caché provides three sets of tools for general monitoring of Caché instances, as follows:
The Management Portal provides several pages and log files that let you monitor a variety of system indicators, system performance, Caché locks, and errors and traps, as described in the “Monitoring Caché Using the Management Portal” chapter of this guide. Of these, the console log (install-dir\mgr\cconsole.log by default) is the most comprehensive, containing general messages, startup/shutdown, license, and network errors, certain operating system errors, and indicators of the success or failure of jobs started remotely from other systems, as well as alerts, warnings and messages from Caché System Monitor.
Caché Monitor, as described in this chapter, generates notifications for console log entries of a configured minimum severity and either writes them to the alerts log or emails them to specified recipients. This allows console log alerts of all types to be extracted and brought to the attention of system operators.
Caché System Monitor generates alerts and warnings related to important system status and resource usage indicators and also incorporates Caché Application Monitor and Caché System Health Monitor, which monitor system and user-defined metrics and generate alerts and warnings when abnormal values are encountered. System Monitor and Health Monitor alerts and warnings are written to the console log; Application Monitor alerts can be sent by email or passed to a specified notification method. System Monitor (including Application Monitor and Health Monitor) is managed using the ^%SYSMONMGR utility. See the “Caché System Monitor” chapter of this guide for detailed information about using System Monitor, Application Monitor and Health Monitor.
Caché Monitor Overview
Caché Monitor scans the console log at regular intervals for entries of the configured severity level and generates corresponding notifications. These notifications are either written to the alerts log or sent by email to specified recipients.
Caché writes general messages, errors and traps, and the success or failure of jobs started remotely from other systems to the console log; see Monitoring Log Files in the “Monitoring Caché Using the Management Portal” chapter of this guide for more information. In addition, Caché System Monitor writes alerts and warnings to the console log. By generating notifications based on console log contents, Caché Monitor brings these alerts to the attention of system operators.
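To make the idea of filtering console log entries by severity concrete, here is a minimal Python sketch. It is not how Caché Monitor is implemented. It assumes each console log line has the layout `timestamp (pid) severity message`; the sample lines, the `Entry` type, and the function names are hypothetical and exist only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed line layout: "MM/DD/YY-HH:MM:SS:mmm (pid) severity message".
LINE_RE = re.compile(r"^(\S+) \((\d+)\) (\d) (.*)$")


class Entry(NamedTuple):
    timestamp: str
    pid: int
    severity: int
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one console log line; return None if it does not fit the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))


def entries_at_or_above(lines, min_severity=2):
    """Return the parsed entries whose severity is at least min_severity."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.severity >= min_severity]


# Made-up sample lines; only the second message text is taken from this chapter.
sample = [
    "01/15/24-10:00:00:123 (4021) 0 Example informational text",
    "01/15/24-10:00:05:456 (4021) 2 Write to journal file has failed",
    "01/15/24-10:00:09:789 (5310) 1 Example warning text",
]

for entry in entries_at_or_above(sample):
    print(entry.pid, entry.severity, entry.message)
```

With the default minimum severity of 2, only the second sample line is selected; lowering `min_severity` to 1 also selects the warning.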
Caché Monitor does not generate a notification for every console log entry of the configured severity. When there is a series of entries from a given process within less than about an hour of each other, a notification is generated for the first entry only. For this reason, you should immediately consult the console log (and view System Monitor alerts, if applicable) on receiving a single notification from Caché Monitor. However, the console log entries listed in Caché Monitor Errors and Traps always generate notifications.
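The suppression rule described above can be sketched in a few lines of Python. This is one possible reading of the rule, not the actual Caché Monitor logic: an entry is treated as part of a series if it arrives less than a window after the previous entry from the same process, and the window is assumed to be exactly 3600 seconds. The `ALWAYS_NOTIFY` tuple is a one-item excerpt from the list in Caché Monitor Errors and Traps, and all names are hypothetical.

```python
# One-item excerpt of the messages that always generate notifications.
ALWAYS_NOTIFY = ("Write to journal file has failed",)

# "About an hour"; the exact value is an assumption.
WINDOW_SECONDS = 3600


def select_notifications(entries, window=WINDOW_SECONDS):
    """Pick the entries that would produce a notification.

    entries: iterable of (time_seconds, pid, message) tuples in time order,
    already filtered to the configured severity.
    """
    last_seen = {}  # pid -> time of that process's most recent entry
    notify = []
    for t, pid, message in entries:
        previous = last_seen.get(pid)
        in_series = previous is not None and t - previous < window
        always = any(message.startswith(prefix) for prefix in ALWAYS_NOTIFY)
        if always or not in_series:
            notify.append((t, pid, message))
        last_seen[pid] = t
    return notify


entries = [
    (0, 1, "a"),
    (600, 1, "b"),  # same process, 10 minutes later: suppressed
    (700, 2, "c"),  # different process: notified
    (900, 1, "Write to journal file has failed"),  # always notified
    (5000, 1, "d"),  # more than an hour after the previous entry: notified
]
selected = [message for _, _, message in select_notifications(entries)]
```

Because entry "b" produces no notification of its own, an operator who receives only the notification for "a" would need to open the console log to see it.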
Caché Monitor operates with the following settings by default:
Caché Monitor is continuously running when the instance is running.
The console log is scanned every 10 seconds.
Notifications are generated for console log entries of severity 2 (severe) and 3 (fatal).
Notifications are written to the alerts log.
Note:
You can view the alerts log in the Management Portal by navigating to the System Logs page (System Operation > System Logs) and selecting System Monitor Log, then using the Browse button to select the alerts.log file.
The alerts log is not created until Caché Monitor writes its first notification to the log.
You can configure and manage Caché Monitor, including changing its default settings and configuring email notifications, using the interactive ^MONMGR utility.
Using the ^MONMGR Utility
The Caché Monitor Manager (^MONMGR) utility must be executed in the %SYS namespace (the name is case-sensitive).
To start the Caché Monitor Manager, enter the following command in the Terminal:
%SYS>do ^MONMGR
The main menu appears. Enter the number of your choice or press Enter to exit the Caché Monitor Manager:
1) Start/Stop/Update MONITOR
2) Manage MONITOR Options
3) Exit

Option?
The options in the main menu let you manage Caché Monitor as described in the following table:
|1) Start / Stop / Update Monitor||Displays the Start/Stop/Update Monitor submenu which lets you manage Caché Monitor and the alerts log.|
|2) Manage MONITOR Options||Displays the Manage Monitor Options submenu which lets you manage Caché Monitor notification options (sampling interval, severity level, email).|
|3) Exit||Exits from the Caché Monitor Manager.|
Start/Stop/Update Monitor
This submenu lets you manage the operation of Caché Monitor. Enter the number of your choice or press Enter to return to the main menu:
Option? 1

1) Update MONITOR
2) Halt MONITOR
3) Start MONITOR
4) Reset Alerts
5) Exit

Option?
The options in this submenu let you manage the operation of Caché Monitor as described in the following table:
|1) Update MONITOR||Dynamically restarts Caché Monitor based on the current settings (interval, severity level, email) in Manage Monitor Options.|
|2) Halt MONITOR||Stops Caché Monitor. The console log is not scanned until Caché Monitor is started.|
|3) Start MONITOR||Starts Caché Monitor. The console log is monitored based on the current settings (interval, severity level, email) in Manage Monitor Options.|
|4) Reset ALERTS||Deletes the alerts log (if it exists).|
|5) Exit||Returns to the main menu.|
Manage Monitor Options
This submenu lets you manage Caché Monitor’s scanning and notification options. Enter the number of your choice or press Enter to return to the main menu:
Option? 2

1) Set Monitor Interval
2) Set Alert Level
3) Manage Email Options
4) Exit

Option?
The options in this submenu let you manage the operation of Caché Monitor as described in the following table:
|1) Set Monitor Interval||Lets you change the interval at which the console log is scanned. InterSystems recommends an interval no longer than the default of 10 seconds.|
|2) Set Alert Level||Lets you set the minimum severity level of console log entries that generate notifications, as follows: 1 (warning, severe, and fatal); 2 (severe and fatal); 3 (fatal only).|
|3) Manage Email Options||Lets you configure Caché Monitor email notifications using the Manage Email Options submenu.|
|4) Exit||Returns to the main menu.|
Because Caché Monitor generates a notification only for the first in a series of console log entries from a given process within about an hour, setting the alert level to 1 could mean that when a warning has generated an alerts log entry or email message, a subsequent severity 2 alert from the same process does not generate a notification. For example, a license expiration warning from Caché System Monitor could prevent a more serious shadow server disconnection alert 15 minutes later from generating an alerts log entry or email message.
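The masking effect in this example can be simulated with a short Python sketch. It rests on assumptions that go beyond the text: the suppression window is taken to be exactly 3600 seconds, entries below the configured alert level are assumed not to count toward a series, and both entries are assumed to come from the same process (the pid value 100 is made up).

```python
def notified(entries, alert_level, window=3600):
    """Return the texts of the entries that would generate a notification.

    entries: (time_seconds, pid, severity, text) tuples in time order.
    """
    last_seen = {}  # pid -> time of the last entry at or above alert_level
    out = []
    for t, pid, severity, text in entries:
        if severity < alert_level:
            continue  # below the configured level: ignored entirely (assumption)
        previous = last_seen.get(pid)
        if previous is None or t - previous >= window:
            out.append(text)
        last_seen[pid] = t
    return out


events = [
    (0, 100, 1, "license expiration warning"),
    (15 * 60, 100, 2, "shadow server disconnection alert"),
]

level1 = notified(events, alert_level=1)
level2 = notified(events, alert_level=2)
print(level1)
print(level2)
```

Under these assumptions, alert level 1 reports only the warning and the more serious alert is lost in the series, while alert level 2 reports the disconnection alert.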
Manage Email Options
The options in this submenu let you configure and enable/disable email. When email is enabled, Caché Monitor sends notifications by email; when it is disabled, notifications are written to the alerts log. Enter the number of your choice or press Enter to return to the Manage Monitor Options submenu:
Option? 3

1) Enable/Disable Email
2) Set Sender
3) Set Server
4) Manage Recipients
5) Set Authentication
6) Test Email
7) Exit

Option?
The options in this submenu let you manage the email notifications for Caché Monitor as described in the following table:
|1) Enable / Disable Email||Enabling email causes Caché Monitor to send notifications by email to the configured recipients instead of writing them to the alerts log. Disabling email causes Caché Monitor to write notifications to the alerts log. Enabling or disabling email does not affect other email settings; that is, it is not necessary to reconfigure email options when you enable or disable email.|
|2) Set Sender||Select this option to enter text indicating the sender of the email, for example Cache Monitor. The text you enter does not have to represent a valid email account. You can set this field to NULL by entering - (dash).|
|3) Set Server||Select this menu item to enter the name and port number (default 25) of the email server that handles email for your site. Consult your IT staff to obtain this information. You can set this field to NULL by entering - (dash).|
|4) Manage Recipients||Displays a submenu that lets you list, add, or remove the email addresses to which each notification is sent. Each valid email address must be added individually; when you select 2) Add Recipient, do not enter more than one address when responding to the Email Address? prompt.|
|5) Set Authentication||Lets you specify the authentication username and password if required by your email server. Consult your IT staff to obtain this information. If you do not provide entries, the authentication username and password are set to NULL. You can set the User field to NULL by entering - (dash).|
|6) Test Email||Sends a test message to the specified recipients using the specified email server.|
|7) Exit||Returns to the Manage Monitor Options submenu.|
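For readers who want to see how the settings above (sender, server and port, recipients, optional authentication) fit together in an SMTP exchange, the following Python sketch uses the standard `email` and `smtplib` modules. It is only an analogy to what ^MONMGR configures, not the mechanism Caché Monitor uses; the subject line, addresses, and function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender, recipients, text):
    """Build a plain-text message addressed to every configured recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Cache Monitor notification"  # hypothetical subject line
    msg.set_content(text)
    return msg


def send(msg, server, port=25, user=None, password=None):
    """Send msg through the given SMTP server; log in only if a user is supplied."""
    with smtplib.SMTP(server, port) as smtp:
        if user is not None:
            smtp.login(user, password)
        smtp.send_message(msg)


# Example addresses are placeholders; send() is deliberately not called here.
msg = build_notification(
    "Cache Monitor <monitor@example.com>",
    ["ops1@example.com", "ops2@example.com"],
    "Write to journal file has failed",
)
```

Building the message separately from sending it makes it easy to inspect a notification before any mail server is involved, which is roughly the role the Test Email option plays for the real configuration.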
Caché Monitor Errors and Traps
The following console log errors always generate Caché Monitor notifications:
Process halt due to segment violation (access violation).
<FILEFULL> in database %
AUDIT: ERROR: FAILED to change audit database to '%. Still auditing to '%.
AUDIT: ERROR: FAILED to set audit database to '%.
Sync failed during expansion of sfn #, new map not added
Sync failed during expansion of sfn #, not all blocks added
WRTDMN failed to allocate wdqlist...freezing system
WRTDMN: CP has exited - freezing system
Write Daemon encountered serious error - System Frozen
Insufficient global buffers - WRTDMN in panic mode
WRTDMN Panic: SFN x Block y written directly to database
Unexpected Write Error: dkvolblk returned %d for block #%d in %
Unexpected Write Error: dkswrite returned %d for block #%d in %
Unexpected Write Error: %d for block #%d in %.
Cluster crash - All Cache systems are suspended
System is shutting down poorly, because there are open transactions, or ECP failed to preserve its state
SERIOUS JOURNALING ERROR: JRNSTOP cannot open %.* Stopping journaling as cleanly as possible, but you should assume that some journaling data has been lost.
Unable to allocate memory for journal translation table
Journal file has reached its maximum size of %u bytes and automatic rollover has failed
Write to journal file has failed
Failed to open the latest journal file
Sync of journal file failed
Journaling will be disabled in %d seconds OR when journal buffers are completely filled, whichever comes first. To avoid potential loss of journal data, resolve the cause of the error (consult the Caché system error log, as described in Caché System Error Log in the “Monitoring Caché Using the Management Portal” chapter) or switch journaling to a new device.
Error logging in journal
Journaling Error x reading attributes after expansion
ECP client daemon/connection is hung
Cluster Failsoft failed, couldn't determine locksysid for failed system - all cluster systems are suspended
enqpijstop failed, declaring a cluster crash
enqpijchange failed, declaring a cluster crash
Failure during WIJ processing - Declaring a crash
Failure during PIJ processing - Declaring a crash
Error reading block – recovery read error
Error writing block – recovery write error
WIJ expansion failure: System Frozen - The system has been frozen because WIJ expansion has failed for too long. If space is created for the WIJ, the system will resume otherwise you need to shut it down with cforce
CP: Failed to create monitor for daemon termination
CP: WRTDMN has been on pass %d for %d seconds - freezing system. System will resume if WRTDMN completes a pass
WRTDMN: CP has died before we opened its handle - Freezing system
WRTDMN: Error code %d getting handle for CP monitor - CP not being monitored
WRTDMN: Control Process died with exit code %d - Freezing system
CP: Daemon died with exit code %d - Freezing system
Performing emergency Cache shutdown due to Operating System shutdown
CP: All processes have died - freezing system
cforce failed to terminate all processes
Failed to start slave write daemon
ENQDMN exiting due to reason #
Becoming primary mirror server
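If you post-process the console log yourself, message texts like those above can be matched against real log lines by turning the placeholders into wildcards. The Python sketch below assumes that `%d` and `%u` stand for integers and that a bare `%` stands for arbitrary text such as a path; the list does not spell out these meanings, so treat them as guesses. Only three of the messages are included, and the function names are hypothetical.

```python
import re

# Excerpt of the always-notify messages listed in this section.
TEMPLATES = [
    "Write to journal file has failed",
    "CP: Daemon died with exit code %d - Freezing system",
    "Unexpected Write Error: dkswrite returned %d for block #%d in %",
]


def template_to_regex(template):
    """Compile a message template into a regex, treating placeholders as wildcards."""
    pattern = ""
    for part in re.split(r"(%[du]?)", template):
        if part in ("%d", "%u"):
            pattern += r"-?\d+"  # assumed: an integer value
        elif part == "%":
            pattern += r".+"  # assumed: free text such as a database path
        else:
            pattern += re.escape(part)  # literal text from the template
    return re.compile(pattern)


PATTERNS = [template_to_regex(t) for t in TEMPLATES]


def is_always_notify(message):
    """Return True if message starts with text matching one of the templates."""
    return any(p.match(message) for p in PATTERNS)


print(is_always_notify("CP: Daemon died with exit code 3 - Freezing system"))
print(is_always_notify("Example informational text"))
```

Literal text is escaped with `re.escape`, so characters such as `#` in `block #%d` are matched as written rather than interpreted as regex syntax.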