Monitoring Ensemble
Monitoring Alerts

Alerts are automatic notifications triggered when specified events occur or specified thresholds are exceeded. This chapter describes how to configure and monitor alerts.

Introduction to Alerts
Alerts provide a way for Ensemble to automatically notify users about a serious problem or condition that requires a quick resolution to ensure that the production continues operating normally. When properly configured, Ensemble generates alerts for potentially serious problems and does not generate alerts caused by normal variations in production performance.
The other monitoring features described in this document require the user to actively check for a problem. Typically, users only check for a problem after it has had a noticeable impact on the production’s performance. Once you have configured alerts to notify users automatically, alerts may make it possible to resolve issues before they have a serious impact on performance and become critical problems.
Ensemble can automatically send an alert message to users when specified thresholds are exceeded, when specified events occur, or when user code explicitly generates alerts. You can process alerts in a number of ways:
In configuring any alert notification system, it is important to calibrate the level that triggers the alert and ensure that the users being notified understand the alert and know how to respond. If the trigger level is set too high, problems may already have a significant impact on performance before Ensemble notifies users. But, if the trigger level is set too low, Ensemble sends out many notifications during the normal operation of the production and users tend to ignore these notifications and may not respond to the few critical ones among them.
Alerts are messages of type Ens.AlertRequest that can be generated by any business service, process, or operation in a production. Ensemble always stores alert messages in the log. If there is a production component named Ens.Alert, Ensemble sends all alert messages to it. Ens.Alert is typically an operation, such as EnsLib.EMail.AlertOperation, a routing process, or the Alert Manager, which has the class Ens.Alerting.AlertManager.
Choosing How to Handle Alerts
To choose how to handle alerts in your production, you should first answer the following questions:
In all cases, you should define what conditions generate alerts as described in Calibrating Alert Sensitivity. Once you have done that:
  1. If you don’t want automatic notification, you are done. Your production should not have a component named Ens.Alert. Ensemble will write alerts to the log file but will not do anything else with them.
  2. If you want to send all alerts to a single list of users by the same mechanism but do not want to track and manage alerts, add an operation with a class such as EnsLib.EMail.AlertOperation and name the component Ens.Alert. Ensemble sends all alerts to the operation, which notifies the configured users. See Configuring Simple Notifications.
  3. If you want to route some alerts to one set of users and other alerts to other users or to none, but do not want to track and manage alerts, add a routing engine to your production and name it Ens.Alert. You will also need to add alert operations, such as EnsLib.EMail.AlertOperation. See Configuring Alert Routing.
  4. Finally, if you want to manage your alerts so that you can assign alerts to users, track the status of alerts, and manage the alerts, add an Alert Manager, Notification Manager, one or more Alert operations, and, optionally, an Alert Monitor to your production. See Configuring Alert Management.
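The four options above can be summarized as a small decision function. This is only an illustrative Python sketch, not Ensemble code; the function name and its three boolean parameters are hypothetical and simply restate the list.

```python
def choose_alert_setup(notify: bool, route_selectively: bool, manage: bool) -> str:
    """Return the production setup that matches the four options above.

    notify            -- users should be notified automatically
    route_selectively -- different alerts go to different users (or to none)
    manage            -- alerts must be assigned, tracked, and resolved
    """
    if not notify:
        return "No Ens.Alert component; alerts are only written to the log"
    if manage:
        return ("Alert Manager named Ens.Alert, Notification Manager, "
                "alert operations, optional Alert Monitor")
    if route_selectively:
        return "Routing engine named Ens.Alert plus alert operations"
    return "Single alert operation named Ens.Alert"


print(choose_alert_setup(notify=True, route_selectively=True, manage=False))
```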
Calibrating Alert Sensitivity
Ideally, you would generate alerts only for conditions that require investigation and resolution, and none when the production is behaving normally. In practice, to generate alerts for all serious conditions, you typically also get some alerts from the normal variation of production execution. For example, you could set the inactivity timeout for a business service to 300 seconds. During peak hours, this business service may get many requests during a 300-second interval and, if no requests come in during that interval, it probably indicates a problem that should generate an alert. However, during off-peak hours, there may be no requests for longer periods with all systems operating correctly. If you set the system to catch the important alerts during the peak period, you will generate false positive alerts during the off-peak period. If you are using a router or the Alert Manager, you can suppress notifications for these off-peak alerts.
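The trade-off in this example can be modeled in a few lines of Python. This is a simplified sketch, not Ensemble's implementation: it assumes one alert per gap longer than the timeout, and the peak window of 08:00 to 18:00 and all function names are hypothetical.

```python
from datetime import datetime, timedelta


def inactivity_alerts(arrivals, timeout_s=300):
    """Return the times at which an inactivity alert would fire.

    Simplified model: one alert per gap between consecutive requests that
    is longer than timeout_s, raised at the moment the timeout elapses.
    """
    alerts = []
    for prev, nxt in zip(arrivals, arrivals[1:]):
        if (nxt - prev).total_seconds() > timeout_s:
            alerts.append(prev + timedelta(seconds=timeout_s))
    return alerts


def is_peak(t, start_hour=8, end_hour=18):
    """Assumed peak window; adjust to the traffic pattern of the production."""
    return start_hour <= t.hour < end_hour


def alerts_to_notify(alerts):
    """Suppress notifications for alerts raised outside the peak window."""
    return [t for t in alerts if is_peak(t)]


arrivals = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 2),
    datetime(2024, 1, 1, 10, 20),   # 18-minute gap during peak hours
    datetime(2024, 1, 1, 23, 0),
    datetime(2024, 1, 1, 23, 30),   # 30-minute gap during off-peak hours
]
all_alerts = inactivity_alerts(arrivals)
print(len(all_alerts), len(alerts_to_notify(all_alerts)))
```

In this model every alert is still generated and would still be logged; only the notification step filters out the off-peak ones, which mirrors suppressing them in a router or the Alert Manager.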
Each business service, process, and operation in a production can generate alerts when it encounters an error or exceeds a limit. You calibrate alert sensitivity by configuring the alert and error settings for each business service, process, and operation. After you configure these production settings, you should monitor the production during normal operation. If there are large numbers of false positive alerts or missed alerts for serious conditions, you should adjust the settings.
The following settings specify the conditions under which Ensemble generates alerts:
Alert On Error
Check this option if you want the component to generate an alert when it encounters an error. Other production settings control what conditions are considered errors. These settings are described later in this section.
Do not check Alert On Error on any component that is involved in delivering or processing alerts.
Alert Grace Retry Period
This is a time in seconds that the component will retry sending its output before issuing an alert. This setting is most commonly used for business operations. If the component is retrying and eventually succeeds within this time there will be no alert. Setting this to a value such as 60 seconds will suppress alerts on transient issues such as a dropped network connection that fixes itself, but won’t wait too long before alerting you to a real problem.
Inactivity Timeout
This is a time in seconds that the component will wait for a request before issuing an alert. This is typically used for business services. If you set it to a value such as 300 seconds for a busy system, you will be alerted fairly quickly if you stop receiving messages. For quieter systems, you might want a longer interval. This value applies all day, so you might get false positives at off-peak times, when traffic is much lower than normal, but you can filter these in your alert handler if necessary.
Queue Wait Alert
This is a time in seconds that a message can stay on a queue before it generates an alert. Setting this to a value such as 300 seconds would be a good starting point. For critical systems, 5 minutes might be too long, but for other less critical systems a longer interval may be appropriate.
Queue Count
Specifies the number of items on a queue. When the queue size reaches this number Ensemble generates an alert. Setting this to a value such as 10 causes an alert when a queue starts to build on a critical item indicating some delay. But some queues may get this large during normal operation and if a component receives a nightly batch of work, setting this parameter may cause unnecessary alerts.
Alert On Bad Message
In routers that provide validation, such as the HL7 router, this setting specifies that any message that fails the specified validation generates an alert. This alert is in addition to sending the original message to the bad message handler.
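The threshold settings above amount to simple comparisons. The following Python sketch illustrates them with the example values from the text (10 items, 300 seconds, 60 seconds); the class and function names are hypothetical and this is not how Ensemble evaluates the settings internally.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    count: int            # number of messages currently on the queue
    oldest_wait_s: float  # seconds the oldest message has been waiting


def queue_alerts(snap, queue_count_alert=10, queue_wait_alert_s=300):
    """Return which queue thresholds the snapshot violates."""
    reasons = []
    if snap.count >= queue_count_alert:          # queue size reaches the limit
        reasons.append("queue count")
    if snap.oldest_wait_s > queue_wait_alert_s:  # message waited too long
        reasons.append("queue wait")
    return reasons


def retry_alert(retrying_for_s, succeeded, grace_s=60):
    """Alert only if the component is still failing after the grace period."""
    return (not succeeded) and retrying_for_s > grace_s


print(queue_alerts(QueueSnapshot(count=12, oldest_wait_s=45)))
print(retry_alert(retrying_for_s=90, succeeded=False))
```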
There are many settings that control what conditions Ensemble treats as errors. If Alert On Error is checked, these settings control what conditions generate alerts. The following are some commonly used settings that control error conditions:
Stay Connected
If this is a positive integer, the connection is dropped and an error is issued after that number of seconds with no activity. Setting this to -1 means never disconnect and never cause an error on a disconnect from the other end. A large value such as 99999 means the connection will be dropped after an extremely long period of inactivity, but has the side effect of not causing an error if the other end drops the connection.
Failure Timeout
Specifies the number of seconds to retry sending a message before failing and considering it an error. Setting this property to -1 instructs Ensemble to continue to retry sending the message indefinitely and to not fail. Typically, this is the setting to use for critical messages that must be processed in FIFO order, such as HL7 messages. If Failure Timeout is set to a positive integer, the operation fails after this number of seconds, discards the current message, and attempts to process the next message.
Replycode Action
This setting for HL7 determines how the operation responds when the target application returns an error reply code. It is a critical setting that determines the operation’s behavior on failure. The values are fairly complex, but hover text help is available by clicking the ‘Replycode Action’ label on the configuration settings tab. Typically, an operation should retry indefinitely if there is an error sending the message. To ensure indefinite retries after all NACKs, set FailureTimeout to -1 and ReplyCodeAction to :?R=RF and :?E=RF. If the target application rejects the message (AR), retrying will probably have the same result, so you might want to do something different. However, applications can’t always be relied on to use the ACK codes correctly, and each application should be considered individually. One option is to disable the business operation after a rejected message, but that stops all traffic. Suspending rejected messages until the problem can be fixed allows other messages to flow but breaks the FIFO ordering.
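The difference between a positive Failure Timeout and -1 can be seen in a small simulation. This Python sketch is a simplified model under stated assumptions: the function name is hypothetical, time is simulated rather than measured, and the 5-second retry interval is an illustrative value not taken from the text above.

```python
def retry_outcome(attempt_results, failure_timeout_s, retry_interval_s=5):
    """Simulate retries against a list of per-attempt results (True = delivered).

    failure_timeout_s == -1 means retry indefinitely: the message is never
    failed, so later messages wait behind it and FIFO order is preserved.
    Returns "delivered", "failed", or "still retrying".
    """
    elapsed = 0
    for ok in attempt_results:
        if ok:
            return "delivered"
        elapsed += retry_interval_s
        if failure_timeout_s != -1 and elapsed >= failure_timeout_s:
            # With a positive timeout the operation gives up on this message
            # and moves on to the next one.
            return "failed"
    return "still retrying"


print(retry_outcome([False, False, False], failure_timeout_s=15))
print(retry_outcome([False, False, False], failure_timeout_s=-1))
```

With Alert On Error checked, the "failed" outcome is the point at which an alert would be generated, while a component that is "still retrying" is covered by the Alert Grace Retry Period instead.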
Configuring Alert Monitoring
Alerts are messages generated by production components. Ensemble automatically writes the alerts to a log file and sends them to the production component named Ens.Alert. If your production does not have a component named Ens.Alert, then Ensemble writes alerts to the log file but does not send them to any component. The component named Ens.Alert can be of any class. The most frequently used classes for Ens.Alert are:
Configuring Simple Notifications
If you can handle all the alerts via the same output mechanism and all alerts are to be sent to the same list of users, you can use an operation class as the component named Ens.Alert. To send alerts by email, use the EnsLib.EMail.AlertOperation class. To use another mechanism, you must develop a new operation. See Using a Simple Outbound Adapter Alert Processor for details.
If you are using the EnsLib.EMail.AlertOperation class, you should specify the following configuration settings:
Configuring Alert Routing
If you need to contact users via multiple output mechanisms or you need to send alerts selectively to specified users, add a business process named Ens.Alert with the EnsLib.MsgRouter.RoutingEngine class. In this case, your production must also contain a business operation for each output mechanism, and the alert processor forwards messages to those business operations.
The business process would examine the messages and forward them to different business operations, depending on the alert contents and any logic that you include.
Your logic may need to consider the following factors:
The EnsLib.MsgRouter.RoutingEngine class provides the setting Business Rule Name. If you specify this setting as the name of a routing rule set, this business host uses the logic in that rule set to forward all the messages that it receives. To use the routing rule to specify the addresses to send the alert, you can add a transformation that sets the AlertDestination property of the Ens.AlertRequest class. For an example, see the Demo.HL7.MsgRouter.Production in the ENSDEMO namespace. In this example, the routing rule sends all alerts to the email operation and uses a transform with a lookup to specify the alert destinations based on the component that created the alert. The addresses specified in the AlertDestination property are added to any addresses specified in the Recipients setting.
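The effect of a lookup-based transform on the final address list can be sketched as follows. This is illustrative Python, not the rule or transform from the demo production: the lookup table contents, the email addresses, and the function name are all hypothetical.

```python
# Hypothetical lookup table: component that raised the alert -> extra addresses.
ALERT_LOOKUP = {
    "ABC_HL7FileService": "abc-team@example.com",
    "XYZ_HL7FileService": "xyz-team@example.com",
}


def split_addresses(value):
    """Split a comma-separated address string, ignoring blanks."""
    return [a.strip() for a in value.split(",") if a.strip()]


def final_recipients(source_component, recipients_setting, lookup):
    """Combine the operation's Recipients setting with the AlertDestination
    value that the transform would set from the lookup table."""
    alert_destination = lookup.get(source_component, "")
    return split_addresses(recipients_setting) + split_addresses(alert_destination)


print(final_recipients("ABC_HL7FileService", "ops@example.com", ALERT_LOOKUP))
```

A component with no lookup entry falls back to the addresses in the Recipients setting alone.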
Configuring Alert Management
Managed alerts are persistent messages that provide a record of what problems occurred in a production, who responded to the problems, what they did to resolve the problems, and how much time it took to resolve the problems. Alert Management can notify key personnel of alerts that are not resolved promptly.
Alert management provides alert routing capability plus the tools needed to track and resolve alerts. Alert management allows you to assign an alert to a specific user, track whether the alert has been resolved or escalated, and report the time that it took to resolve the alert. Alert management can be added to a production using the Ensemble management portal including the rule and transformation editors without writing custom code. For specialized requirements, it is possible to add custom code in the alert management production components. See Adding Custom Code to Alert Management for more information.
The Alert Management framework consists of the following business services, processes, and operations:
The following figure illustrates how the components in the alert management framework are connected.
The following sections describe how to add and configure the alert management components:
Configuring Alert Management in Production Settings
The production settings Alerting Control group is used only for alert management. If you are not using alert management, you can leave these fields blank. If you are using alert management, set these fields as follows:
Defining Alert Groups
If you are sending alert notifications to different distribution lists based on the component that generated the alert, it is useful to specify what alert groups each component belongs to. For large productions with many components, it is more practical to select a subset of alerts based on alert groups than on the individual component names. You specify the alert groups for a component as a string containing a list of alert groups separated by commas. You specify a component’s alert groups in the AlertGroups property. Once you have defined an alert group for one component, you can select it using the check box for another component.
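Because the AlertGroups value is a comma-separated string, selecting alerts by group reduces to splitting that string and testing membership. The Python sketch below illustrates the idea with the component and group names used in the walkthrough later in this chapter; the function names are hypothetical.

```python
def parse_alert_groups(value):
    """Split a comma-separated AlertGroups string into a list of group names."""
    return [g.strip() for g in value.split(",") if g.strip()]


def components_in_group(component_groups, group):
    """Return the components whose AlertGroups value contains the given group."""
    return sorted(
        name
        for name, value in component_groups.items()
        if group in parse_alert_groups(value)
    )


component_groups = {
    "ABC_HL7FileService": "ABCGroup",
    "XYZ_HL7FileService": "XYZGroup",
    "Regular_FileOperation": "ABCGroup,XYZGroup",
    "Extra_Observations": "NotImportantAlertGroup",
}
print(components_in_group(component_groups, "ABCGroup"))
```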
Adding the Alert Manager and Defining Its Rule
The alert processor must be a business process named Ens.Alert. Typically, if you are using the Alert Manager, you add a business process named Ens.Alert with the class Ens.Alerting.AlertManager. If you are adding alert management to a production that already has a router alert processor, you could keep the router named Ens.Alert and have it send some or all of the alerts to the Alert Manager business process.
If you do not define a rule for the Alert Manager, it promotes all alerts to managed alerts, leaves alerts unassigned, sets the deadline based on the number of minutes specified in the Alert Action Window, and sends all managed alerts to the Notification Manager.
To use a rule for the Alert Manager, create a rule and specify the Creation Rule for Managed Alert type in the rule wizard. This creates a rule with an Ens.Alerting.Context.CreateAlert context. The alert context provides access to:
The rule can suppress promoting the alert to a Managed Alert by returning 0 or can promote the alert to a Managed Alert by returning 1.
The rule can check whether the alert is a repeat occurrence of a previous alert that is represented by a currently open managed alert. To do this, the rule uses the Ens.Alerting.Rule.FunctionSet.IsRecentManagedAlert() function. The IsRecentManagedAlert() function tests if there is a recent open managed alert that is from the same component and has the same text as the specified alert. You can optionally specify that the function adds a reoccurs action to the existing managed alert.
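The test described above (same component, same text, still open, and recent) can be modeled as follows. This Python sketch only illustrates the logic and is not the Ensemble function: the class, the field names, and the one-hour definition of "recent" are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ManagedAlert:
    source: str        # component that raised the original alert
    text: str          # alert text
    created: datetime  # when the managed alert was created
    is_open: bool = True


def is_recent_managed_alert(existing, source, text, now, window=timedelta(hours=1)):
    """Return True if an open managed alert from the same component with the
    same text was created within the window (assumed here to be one hour)."""
    cutoff = now - window
    return any(
        a.is_open and a.source == source and a.text == text and a.created >= cutoff
        for a in existing
    )


existing = [ManagedAlert("ABC_HL7FileService", "path not found",
                         datetime(2024, 1, 1, 11, 30))]
print(is_recent_managed_alert(existing, "ABC_HL7FileService", "path not found",
                              now=datetime(2024, 1, 1, 12, 0)))
```

A creation rule that returns 0 when this test is true avoids opening a second managed alert for a condition that is already being tracked.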
After you have defined the rule, you specify the rule name in the Alert Manager configuration as the CreateManagedAlertRule property. If you need to customize the Alert Manager in ways that the rule does not allow, you can implement a subclass of Ens.Alerting.AlertManager and override the OnProcessAlertRequest() method. See Adding Custom Code to Alert Management.
Adding the Notification Manager and Defining Its Data Transformation
To add the Notification Manager, add a business process with class Ens.Alerting.NotificationManager. If you do not define a data transformation, the Notification Manager sends all managed alert messages and reminders to the email addresses set in the Alert Notification List using the operation set in the Alert Notification Operation.
To use a data transformation with the Notification Manager, create a data transformation following these instructions:
Since the Notification Manager sets the target NotificationRequest before executing the data transformation, you do not need to copy any of the alert information from the source to the target. You should set the following in the target:
Your transformation should not modify target.NotificationRequest. If you add any destinations to the NotificationRequest, they are ignored.
Once you have defined the data transformation, on the production configuration page, select the Notification Manager and set the NotificationTransform to the transformation that you have defined.
If you need to customize the Notification Manager in ways that the data transformation does not allow, you can implement a subclass of Ens.Alerting.NotificationManager and override the OnProcessNotificationRequest() method. See Adding Custom Code to Alert Management.
Adding and Configuring Notification Operations
To use the email alert operation provided with Ensemble, add an operation with the class EnsLib.EMail.AlertOperation, and enter the following settings in the Basic Settings and the Additional Settings group:
Adding the Optional Alert Monitor and Defining Its Rule
The Alert Monitor is optional. It controls whether overdue alerts are escalated and whether reminder notices are sent. To add an alert monitor, add a business service with class Ens.Alerting.AlertMonitor. If you do not include the alert monitor, there are no automatic reminders and no automatic escalations.
If you do not define a rule, the alert monitor sends out reminder notices when a managed alert is still open and its NextMonitorTime has passed. It resets the NextMonitorTime by incrementing it with the number of minutes specified in the Alert Action Window production setting. It does not escalate alerts.
To use a rule for the Alert Monitor, create a rule and specify the Overdue Rule for Managed Alert type in the rule wizard. This creates a rule with an Ens.Alerting.Context.OverdueAlert context. The alert context provides access to:
If the rule returns 1, the Alert Monitor sends the managed alert to the Notification Manager. If the rule returns 0, no reminder message is sent.
If you set the New Next Action Time, the Alert Monitor sets the NextMonitorTime to the same time. If you do not set the New Next Action Time, the Alert Monitor takes its default action, which is to set the NextMonitorTime to the current time plus the number of minutes specified in the Alert Action Window production setting.
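The scheduling behavior described above is a short computation. The following Python sketch restates it; the function and parameter names are hypothetical and this is a model of the described behavior, not the Alert Monitor's code.

```python
from datetime import datetime, timedelta


def is_overdue(alert_open, next_monitor_time, now):
    """A managed alert needs attention if it is still open and its
    NextMonitorTime has passed."""
    return alert_open and now >= next_monitor_time


def next_monitor_time(now, action_window_min, new_next_action_time=None):
    """Return the NextMonitorTime the Alert Monitor would set.

    If the rule set a New Next Action Time, that time is used as is.
    Otherwise the default is the current time plus the Alert Action Window.
    """
    if new_next_action_time is not None:
        return new_next_action_time
    return now + timedelta(minutes=action_window_min)


now = datetime(2024, 1, 1, 12, 0)
print(next_monitor_time(now, action_window_min=60))
print(next_monitor_time(now, 60, new_next_action_time=datetime(2024, 1, 1, 12, 15)))
```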
If you need to customize the Alert Monitor in ways that the rule does not allow, you can implement a subclass of Ens.Alerting.AlertMonitor and override the OnProcessOverdueAlert() method. See Adding Custom Code to Alert Management.
Monitoring, Tracking, and Resolving Managed Alerts
Managed alerts are a special kind of persistent message. The Alert Manager generates a managed alert when it receives an alert that meets the specified conditions. The following figure illustrates the life cycle of a managed alert: from the component creating the alert to a user resolving the problem indicated by the alert and closing the managed alert.
This figure illustrates the life cycle of a typical managed alert:
  1. A business service, process, or operation encounters a specified condition and generates an alert.
  2. The alert is sent to the Alert Manager, which is a business process named Ens.Alert. The alert manager promotes the alert to a managed alert, assigns it to a user, and sends it to the notification manager.
  3. The Notification Manager determines what group of users to contact and how to contact them. It sends the Managed Alert to an alert operation, which sends notifications to a group.
  4. The owner of the managed alert does not solve the problem or update the managed alert.
  5. The alert monitor queries for alerts that are still open when the alert’s NextMonitorTime has been reached. It finds the managed alert. It escalates the alert and sends it to the notification manager.
  6. The Notification Manager determines what group of users to contact and how to contact them. Since the alert is escalated it sends it to a different distribution list. It sends the Managed Alert to an alert operation, which sends email to a group.
  7. In this case the user is able to solve the problem that caused the initial alert. The user updates the managed alert to close the alert.
  8. The managed alert is now inactive but contains the alert history and remains available for reports and analysis.
Acting on Alerts by Viewing My Managed Alerts
After you have received an email or other message indicating that there is a managed alert that requires your action, you can view the open managed alerts that are assigned to you or that are unassigned by selecting Ensemble, Monitor, and My Managed Alerts.
You can view My Alerts, Unassigned Alerts, or All Alerts. You can select alerts that are:
The table displays the following information about the alerts:
To display the alert details and to update the alert, select the alert in the list. The Managed Alert Details form is displayed.
The Managed Alert Details form displays the following fields. You can update the Open, Current Owner, Escalation Level, and Next Action Time fields.
Viewing Managed Alerts
The Managed Alert Viewer allows you to search and view all managed alerts that are stored in the database, including closed alerts and alerts assigned to other users, but you cannot update alerts from the Managed Alert Viewer. To access the Managed Alert Viewer, select Ensemble, View, and Managed Alerts.
You can specify the search criteria in the left pane:
If you select an alert, the alert details are displayed in a panel to the right. The alert details panel displays the same information as described in the previous section, but you cannot update any of the values. Instead of displaying the alert state as a check box, the panel displays an open alert as 1 and a closed alert as 0.
Managed Alerts Walkthrough
There is no sample in ENSDEMO that demonstrates alert management. You can start with any of the example productions and follow the instructions in this section to demonstrate alert management. As an example, this walkthrough starts with the Demo.HL7.MsgRouter.Production production. This starting point uses a routing alert processor, but you can follow the same steps with a production that does not have any alert code. In order to complete this walkthrough you must have access to an SMTP server to send email.
Note:
Because this is a walkthrough of a complex feature, it covers the important steps in the procedure but does not explicitly describe every user action. For example, it assumes that the readers know that they must enable each component in the production and respond to dialog windows.
Open a Sample Production and Delete Any Alert Processor
Open the Demo.HL7.MsgRouter.Production production.
Note:
If you want to run the example production as it is to demonstrate a routing alert processor, you need to update some settings. If you are going to replace these components with the alert management components, you can skip this step. To run any alert processor that uses email notification, you need access to an SMTP server and you need to update the following settings:
You will be modifying this production. If you want to retain the original sample, you should export the production and then import it into another namespace. To export a production, select Production Settings, Actions tab, and Export button. To import a production, select System Explorer, Classes, and the Import button. If you want to preserve your work you should not work in the ENSDEMO namespace, as it is cleared when you upgrade Ensemble with a new or maintenance release.
Select the Ens.Alert routing business process. Delete it by clicking Delete on the Actions tab. You will be adding an Alert Manager and naming it Ens.Alert in the next step.
You do not need to delete the EMailAlertOperation, because you would add the same component again in the next step.
Before you continue on the walkthrough, you should make the following preparations:
Adding the Alert Manager, Notification Manager, Alert Monitor, and Alert Operation
Add the following components to your production:
All component names in this walkthrough are arbitrary except Ens.Alert, which is a required name.
Configuring the Production
Select Production Settings and the Settings tab. The following settings are in the Alerting Control group:
For this and the other settings in this walkthrough, click Apply.
Select the EMailAlertOperation operation, and enter the following settings in the Basic Settings and the Additional Settings group:
You can leave the Alerting Control settings at their default values. You should ensure that the Alert On Error check box is not checked on the EMailAlertOperation operation or any of the alerting business processes or business operations.
If you are sending alert notifications to different distribution lists based on the component that generated the alert, it is useful to specify what alert groups each component belongs to. For large productions with many components, it is more practical to select a subset of alerts based on alert groups than on the individual component names. For this walkthrough, assign the following alert groups to the specified components:
Component               Alert Groups
ABC_HL7FileService      ABCGroup
XYZ_HL7FileService      XYZGroup
Regular_FileOperation   ABCGroup,XYZGroup
Extra_Observations      NotImportantAlertGroup
Starting the Production and Managing Alerts
You have completed adding and configuring the Managed Alert service, processes, and operation. You have not yet defined a rule or transformation so your alert management components provide their default behavior. This is:
The Alert Groups settings are not used with the default behavior. You will use them when you add rules and the transformation.
Start the production if it is not already running. In order to use the alert management system, you need to first generate alerts. One easy way to do this is to modify a file service File Path to point to a nonexistent directory. For example:
  1. Modify the ABC_HL7FileService File Path to specify a nonexistent directory and click Apply. The component should turn red.
  2. Repeat with XYZ_HL7FileService.
  3. The production should have sent email to the addresses set in Alert Notification Recipients. Check and see if the email has been delivered to this account.
  4. Select Ensemble, Monitor, and My Managed Alerts. The two managed alerts should be displayed after you select the Unassigned and Today tabs. Select an alert. You can now update by reassigning the alert to yourself or another user, by making the next action time earlier or later, by escalating the alert, or by closing the alert. Once you have updated a field, you must enter a reason before you click the Update button.
  5. If you do not close the alerts, the Alert Monitor will send a reminder message and update the NextMonitorTime when the current NextMonitorTime is reached.
  6. You can also go to the Managed Alert Viewer by selecting Ensemble, View, and Managed Alerts. This page allows you to query for alerts including alerts that are closed.
Once you have completed exploring the managed alert user interface, ensure that all managed alerts are closed. If there are any open managed alerts and the Alert Monitor is enabled, it will continue sending reminder messages.
Customizing Alert Management with Rules and Transformations
In this section you customize the Alert Manager and Alert Monitor by defining rules and customize the Notification Manager by defining a transformation.
Create an Alert Manager Rule
The Alert Manager rule controls whether a managed alert is created for an alert and can set some properties, such as the alert owner. To create a rule for the Alert Manager, follow these steps:
  1. Select Ensemble, List, and Business Rules.
  2. Click New to create a new rule for the Alert Manager.
  3. Use the rule editor to enter the following rule. Replace LabManager with a Caché username on your system.
  4. Save the rule.
  5. In the production configuration page, select Ens.Alert and set the CreateManagedAlertRule property to Demo.HL7.MsgRouter.AlertManCreationRule.
Once you have added this rule, the alert management has the following behavior:
Creating a Notification Manager Transformation
The Notification Manager data transformation controls which operations are the targets for a notification and the destination addresses for the message. To create a transformation for the Notification Manager, follow these steps:
  1. Click New to create a new Data Transformation for the Notification Manager.
  2. On the Transform tab, set the Create property to existing.
  3. Add the actions shown in the following illustration.
  4. Save and compile the transformation.
  5. On the production configuration page, select NotifyMan and set the NotificationTransform to the transformation Demo.HL7.MsgRouter.NotifyManTransform.
Once you have added this data transformation, the production has the following behavior:
Creating an Alert Monitor Rule
The Alert Monitor rule controls whether overdue alerts are escalated and whether reminder notices are to be sent. To create a rule for the alert monitor, follow these steps:
  1. Select Ensemble, List, and Business Rules.
  2. Click New to create a new rule for the Alert Monitor.
  3. Use the rule editor to enter the following rule:
  4. Save the rule.
  5. In the production configuration page, select AlertMon and set the OverdueAlertRule property to Demo.HL7.MsgRouter.AlertMonitorRule.
Once you have added this rule to the production, it has the following behavior: