Monitoring Ensemble
Monitoring a Production

The Management Portal provides pages to enable you to monitor a single production more closely (in contrast to the previous chapter, which describes how to monitor all namespaces). This chapter describes how to use these pages. It discusses the following topics:

General Notes
For background information, see the chapter “Concepts.”
For information on starting and stopping productions, see Managing Ensemble. Note that for a live, deployed production, InterSystems recommends that you use the auto-start option, which is described in that book.
If a production is Suspended or Troubled, see “Correcting Production Problem States.”
Using the Production Monitor
The [Ensemble] > [Production Monitor] page displays real-time status information about the currently running production in a condensed, one-page format, with links for further details. To display this page in the Management Portal, select Ensemble, Monitor, Production Monitor, and Go.
You can use this page to monitor the general health of the production in the selected namespace. The following is a partial example of what this page displays:
By default, this page is automatically updated at frequent intervals. In the left area, you can clear the Auto update check box to disable these updates.
The Production Monitor page displays real-time information provided by the Monitor Service. The Monitor Service is a business service that is implicitly included in every Ensemble production (not visible as part of its configuration). The Monitor Service continually monitors the activities of Ensemble items while a production is running, and records data about them at frequent intervals.
Input Connections
The Input Connections table (upper left) lists all incoming connections from external systems. Each entry indicates the following:
  1. Business service status
  2. Business service connection status
  3. Business service name
  4. Number of messages processed since the production started
The statuses are indicated by the cell color. The item status and the connection status cells have the following meaning:
If you hover over the name of the service, the hover text provides additional information. If you select the name of the service, the left area is updated with details and also displays the following associated links:
Output Connections
The Output Connections table (upper right) lists all outgoing connections to external systems. Each entry indicates the following:
  1. Business operation status
  2. Business operation connection status
  3. Business operation name
  4. Number of messages processed since the production started
The statuses are indicated by the cell color. The item status and the connection status cells have the following meaning:
If you select the name of the operation, the left area is updated with details and the same links as for the Input Connections table.
Queues
The Queues table (lower left) lists the status of Ensemble internal message queues and how many messages are currently waiting in each queue.
This table uses the same icons and color-coding as the Input Connections table. If you click an item in this table, the left area is updated with details and the Queue Contents link.
Event Log
The Event Log (lower right) summarizes recent entries in the Event Log.
Each entry provides an icon and color to indicate the item’s status, as follows:
If you select an item in this table, the left area is updated to show details of that Event Log entry. It also displays the Event Log link, which you can use to see the entire Event Log.
Activity Graph
The activity graph shows the message activity for the production or for a selected incoming or outgoing connection. The graph can show the message activity over a time period ranging from the previous 7 days to the previous 5 minutes. The following displays the activity graph or history of the production monitor:
You can specify the following for the activity graph:
Custom Metrics
The bottom area of the page might display one or more tables of custom metrics added by your Ensemble developers. For example:
See “Adding Business Metrics to the Production Monitor” in Developing Ensemble Productions.
Monitoring Production Queues
The [Ensemble] > [Queues] page shows the current state of all the message queues being used by the running Ensemble production in the selected namespace.
To display this page in the Management Portal, select Ensemble, Monitor, and Queues.
The table on this page has one row for each queue. The columns in this table are as follows:
To see the contents of any given queue, select the row for that queue. The active messages and queue contents for that queue are displayed. If you select an entry in the queue contents or active messages, information about the message is displayed.
You can refresh the list of queues and contents by clicking the refresh arrow. You can also specify the time period to automatically refresh the list of queues and the active messages and queue contents tables.
The Active Messages table is displayed when there are active messages in the selected queue. It has one row for each active message, which identifies the message and its state. If you select one or more messages by selecting their check boxes, you can abort or suspend the selected messages.
In the Active Messages table, you can select a message row to view the details of the selected message. The details are displayed to the right in the Header, Body, Contents, and Trace tabs. These tabs are the same as in the [Ensemble] > [Message Viewer] page; see “Viewing, Searching, and Managing Messages.”
The Queue Contents table on this page is displayed if there are messages in the selected queue. It has one row for each message in the given queue. The columns in this table are as follows:
In the Queue Contents table, you can perform the following tasks:
Diagnosing Problems with Queues
By looking at queues and jobs, you can often quickly spot a problem in the system.
When there is buildup on a queue, it usually means something needs to be repaired. Usually the most important information about queues is the destination, or “target,” of any message that has been on a queue too long. In general, when a queued message is not being sent, it is because it cannot get to its target. If you can find out what is causing the problem with the target and solve it, the queue buildup generally disappears. For example:
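The reasoning above (find the target of the messages that have waited too long) can be sketched in a few lines of Python. This is only an illustration and does not call any Ensemble API: the QueuedMessage record, the Demo.* item names, and the 60-second threshold are all hypothetical, and in practice you would read the equivalent values from the Queues page.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    """Hypothetical record for one message waiting on a queue."""
    queue: str              # name of the queue holding the message
    target: str             # configuration item the message is destined for
    seconds_waiting: float  # how long the message has been queued


def overdue_by_target(messages, max_wait_seconds=60.0):
    """Count overdue messages per target, worst target first.

    A message is treated as overdue when it has waited longer than
    max_wait_seconds (an assumed threshold, not an Ensemble setting).
    """
    counts = Counter(
        m.target for m in messages if m.seconds_waiting > max_wait_seconds
    )
    return counts.most_common()


# Hypothetical snapshot of three queues.
sample = [
    QueuedMessage("Demo.FileOperation", "Demo.FileOperation", 300.0),
    QueuedMessage("Demo.FileOperation", "Demo.FileOperation", 280.0),
    QueuedMessage("Demo.Router", "Demo.Router", 2.0),
    QueuedMessage("Demo.EmailOperation", "Demo.EmailOperation", 95.0),
]

report = overdue_by_target(sample)
for target, count in report:
    print(f"{target}: {count} overdue message(s)")
```

In this sketch, the target at the top of the report is the first place to look for the underlying problem.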
Monitoring Active Jobs
The [Ensemble] > [Currently Active Jobs] page shows the currently active jobs for the production in the selected namespace.
To display this page in the Management Portal, select Jobs on the Ensemble Monitor menu.
The table on this page has one row for each active job. The columns in this table are as follows:
Diagnosing Problems with Jobs
By looking at jobs and queues, you can often quickly spot a problem in the system.
Most jobs spend most of their time in a dequeuing state while they wait for messages. During shutdown they should become quiescent. If the job does not become quiescent during shutdown, that likely indicates a problem. If the job is constantly in a running state, that also indicates a problem, unless you expect the component to be doing a lot of processing (and it is actually completing this processing).
Jobs that are marked as dead are jobs that have been terminated for some reason and Ensemble has detected that the job is no longer present on the system. This is normally an indication of a serious problem and should not occur. Also, if Ensemble detects a dead job, it writes an error to the Event Log.
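The rules of thumb in this section can be written as a small triage function. The following Python sketch is illustrative only: the status strings ("dequeuing", "running", "quiescent", "dead") are taken from the prose above and may not match the exact text shown in the portal, and the function, its parameters, and its return labels are hypothetical.

```python
def assess_job(samples, shutting_down=False, heavy_processing_expected=False):
    """Classify a job from a sequence of status samples (oldest first).

    Returns "ok", "suspect", "serious", or "unknown". The labels are
    hypothetical; they simply mirror the guidance in the text.
    """
    statuses = [s.lower() for s in samples]
    if not statuses:
        return "unknown"

    latest = statuses[-1]

    # A dead job should not occur and normally indicates a serious problem.
    if latest == "dead":
        return "serious"

    # During shutdown, jobs are expected to become quiescent.
    if shutting_down and latest != "quiescent":
        return "suspect"

    # A job that is running in every sample is suspicious unless the
    # component is expected to be doing a lot of processing.
    constantly_running = len(statuses) > 1 and all(s == "running" for s in statuses)
    if constantly_running and not heavy_processing_expected:
        return "suspect"

    return "ok"


print(assess_job(["dequeuing", "dequeuing"]))
print(assess_job(["running", "running", "running"]))
print(assess_job(["running", "dead"]))
```

The function takes several samples rather than one because a single snapshot cannot show that a job is constantly running.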
Using the Production Configuration Page
Ensemble provides another way to view a production, the [Ensemble] > [Production Configuration] page. To access this page, select Ensemble, Configure, Production, and then click Go.
This page displays the business hosts in the production, with useful color coding as in the following example:
This page displays a circular status indicator next to each business host. If you click Legend to see the meaning of this indicator, Ensemble displays the following:
Note that the primary purpose of this page is for configuring productions as described in Configuring Ensemble Productions.
Correcting Production Problem States
If a production is Suspended or Troubled, read this section.
If the state of a production is Running, the production has been started and is operating normally. This is an acceptable state.
If the state of a production is Stopped, it is not running and all of its queues are free of synchronous messages. This is also an acceptable state.
In some cases (usually during development), you might see the Update button on this page for a production that is Running. Click this, and Ensemble updates the production to resolve the discrepancy. For an explanation, see The Update Button in Configuring Ensemble Productions.
Recovering a Suspended Production
A production acquires the Suspended status when, at the end of the shutdown sequence, some queues still contain synchronous messages.
You can start the Suspended production again to permit these messages to be processed. However, if the underlying problem is not resolved, you might acquire more synchronous messages in the queue without processing the previous messages.
Therefore, if a live, deployed Ensemble production goes into a Suspended state, contact the InterSystems Worldwide Response Center (WRC) for assistance.
If a production becomes Suspended during development, see Correcting Production Problem States in Developing Ensemble Productions. In this case, you can use a procedure that discards the messages.
Recovering a Troubled Production
A production acquires a status of Troubled if Ensemble is stopped but the production did not shut down properly. This can happen if you restarted Ensemble or rebooted the machine without first stopping the production. In this case, click the Recover button.
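The guidance in this section amounts to a lookup from production state to recommended action. The following Python sketch summarizes it; it is not part of Ensemble, and the ProductionState enum, the recommended_action function, and the live flag are hypothetical names introduced for illustration.

```python
from enum import Enum


class ProductionState(Enum):
    """The four production states discussed in this section."""
    RUNNING = "Running"
    STOPPED = "Stopped"
    SUSPENDED = "Suspended"
    TROUBLED = "Troubled"


def recommended_action(state, live=True):
    """Return the action this section recommends for a production state.

    live=True models a live, deployed production; live=False models a
    production under development.
    """
    if state in (ProductionState.RUNNING, ProductionState.STOPPED):
        return "No action needed; this is an acceptable state."
    if state is ProductionState.SUSPENDED:
        if live:
            return "Contact the InterSystems Worldwide Response Center (WRC)."
        return ("See Correcting Production Problem States in "
                "Developing Ensemble Productions.")
    if state is ProductionState.TROUBLED:
        return "Click the Recover button."
    raise ValueError(f"Unknown production state: {state!r}")


for state in ProductionState:
    print(f"{state.value}: {recommended_action(state)}")
```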