Tune and Troubleshoot SAM

Important:

System Alerting and Monitoring (SAM) has been deprecated; the following documentation is provided for existing users only. Customers interested in a comprehensive view of their operational platform can access the metrics API and structured logs of InterSystems products within another observability tool. Existing users who would like assistance identifying an alternative solution should contact the WRC.

This section describes how to perform common tasks to tune the performance of System Alerting and Monitoring (SAM) version 2.0, and how to troubleshoot common problems.

Access the Management Portal for the SAM Manager

The SAM Manager is the InterSystems IRIS instance that powers the System Alerting and Monitoring application. Like other InterSystems IRIS instances, the SAM Manager provides a web application for performing maintenance and administration called the Management Portal. You can access the Management Portal for the SAM Manager at the following address:

http://<sam-domain-name>:<port>/csp/sys/UtilHome.csp

where <sam-domain-name> is the DNS name or IP address of the system SAM is running on, and <port> is the configured Nginx port (8080 by default).
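As a minimal sketch of how the address above is assembled, the following Python helper fills in the two placeholders. The function name and the example host sam.example.com are hypothetical; only the path and the default port of 8080 come from the text above.

```python
def management_portal_url(host: str, port: int = 8080) -> str:
    """Build the Management Portal address for the SAM Manager.

    host: DNS name or IP address of the system SAM is running on.
    port: configured Nginx port (8080 by default).
    """
    return f"http://{host}:{port}/csp/sys/UtilHome.csp"


# Hypothetical host name, used only for illustration.
print(management_portal_url("sam.example.com"))
```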

Actions you can perform using the Management Portal for SAM include:

Important:

The SAM Manager should not be used to develop or run any application; it is strictly for use by SAM. The directions in this section describe appropriate uses and interactions with the SAM Manager.

For a general purpose InterSystems IRIS instance, install InterSystems IRIS Community Edition.

Adjust Startup Settings

The SAM Manager initially allocates memory on startup as follows:

  • 2,000 MB of 8KB blocks for the database cache

  • 300 MB for the routine cache

This allocation should be sufficient when monitoring a modest number (30 or fewer) of InterSystems IRIS instances. If you are monitoring a large number of instances, or find that the SAM Manager is regularly using the full amount of allocated memory, you can increase these limits.

For details on adjusting these settings, see the Allocating Memory to the Database and Routine Caches topic in the System Administration Guide.
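To put the default database cache allocation in concrete terms, the arithmetic below converts a cache size in MB into a count of 8KB blocks. This is only an illustrative sketch: the function name is hypothetical, and it assumes 1 MB = 1024 KB.

```python
BLOCK_SIZE_KB = 8  # database block size stated above


def buffer_count(cache_mb: int, block_kb: int = BLOCK_SIZE_KB) -> int:
    """Number of blocks of block_kb KB that fit in cache_mb MB (1 MB = 1024 KB assumed)."""
    return cache_mb * 1024 // block_kb


# Default SAM Manager database cache: 2,000 MB of 8KB blocks.
print(buffer_count(2000))  # 256000
```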

Clear the SAM Database

System Alerting and Monitoring Community Edition has a maximum database size limit of 10 GB. If this limit is reached, SAM may exhibit unexpected behavior, and it becomes necessary to clear the database.

In the SQL page of the SAM Manager (System Explorer > SQL), enter the following command to delete all SAM metric data:

DELETE FROM %SAM.PrometheusSample

To prevent the SAM database from filling up again, consider using a different license, changing the storage location, or lowering the number of days that SAM stores metrics using the configuration settings dialog on the Monitor Clusters page.

Monitor the SAM Manager

It is possible to use System Alerting and Monitoring to monitor the SAM Manager, as the SAM Manager is itself an InterSystems IRIS instance. This allows you to keep track of whether the SAM database is at risk of filling up, and make sure the configured cache sizes are sufficient for SAM operations.

Adding the SAM Manager to a SAM cluster is the same as adding any other InterSystems IRIS instance, with the following difference:

For the IP and Port fields, specify the fully qualified DNS name and port (8080 by default) where SAM runs. You can see these values in the address bar of your browser when accessing SAM. For example, if the URL for SAM is:

http://<sam-domain-name>:<port>/api/sam/app/index.csp

Specify <sam-domain-name> in the IP field, and <port> in the Port field.

Note:

Specifying localhost in the IP field does not work; you must enter a fully qualified DNS name.
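The mapping from the browser URL to the IP and Port fields can be sketched as follows. The helper name and the example host are hypothetical. Falling back to the standard HTTP or HTTPS port when the URL omits one is an assumption of this sketch, not documented SAM behavior.

```python
from typing import Tuple
from urllib.parse import urlsplit


def cluster_fields(sam_url: str) -> Tuple[str, int]:
    """Return the (IP field, Port field) values taken from the SAM URL."""
    parts = urlsplit(sam_url)
    host = parts.hostname
    if not host:
        raise ValueError("URL does not contain a host name")
    if host == "localhost":
        # localhost does not work in the IP field; a fully qualified
        # DNS name is required.
        raise ValueError("use a fully qualified DNS name, not localhost")
    # Assumption: if the URL has no explicit port, use the scheme default.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return host, port


# Hypothetical SAM address, used only for illustration.
print(cluster_fields("http://sam.example.com:8080/api/sam/app/index.csp"))
```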

Configure SAM to use HTTPS

Beginning with version 2.0, SAM can monitor InterSystems IRIS instances over HTTPS, which is secured using SSL/TLS. Using this protocol, you can be certain that the information SAM receives about target instances is genuine.

Setting up an HTTPS connection between SAM and the instances it monitors involves the following steps:

  1. Configure an SSL/TLS port for each instance you want to monitor.

  2. Set up an ISC_SAM SSL/TLS configuration through the Management Portal for the SAM Manager. This allows the SAM Manager to fetch InterSystems IRIS alerts over an HTTPS connection.

  3. Modify the isc_prometheus.yml file to use HTTPS. This secures the collection of instance metrics from each instance by Prometheus.

  4. Update each instance in the SAM web application, specifying the SSL/TLS port you configured in step 1.

  5. In the SAM web application, enable the use of HTTPS. This directs the SAM Manager to use your HTTPS configuration to fetch alerts. When HTTPS is enabled, the SAM web portal also provides HTTPS links to the Management Portal on the page for each instance.

This section describes each successive step in greater detail.

Configure SSL/TLS for the Target Instance

To monitor an instance over SSL/TLS, you must configure the web server associated with the instance to serve the instance’s web applications (including /api/monitor) over an SSL/TLS connection. For container environments, a webgateway container provides the web server for an instance’s web applications. You must configure the web server within the webgateway container to use SSL/TLS. Refer to the documentation for your web server for guidance.

Set up the ISC_SAM SSL/TLS Configuration

In order for SAM to obtain alerts from your target instances over HTTPS, you must configure SAM to authenticate the instances as an SSL/TLS client. To do so:

  1. Place a copy of the certificate for the Certificate Authority which can authenticate your instances in the durable storage directory for your SAM iris container.

  2. Access the Management Portal for the SAM Manager.

  3. In the Management Portal, navigate to System Administration > Security > SSL/TLS Configurations.

  4. On the SSL/TLS Configurations page, select Create New Configuration.

  5. Fill out the New SSL/TLS Configuration form to specify the client-side SSL/TLS configuration appropriate for your deployment. The following parameters are required:

    • The Configuration Name must be ISC_SAM

    • The configuration Type must be Client

    • Server certificate verification must be required

    • The File containing trusted Certificate Authority certificate(s) must specify the path to the certificate file which was placed in the durable storage directory in step 1.

  6. To test your configuration, select Test. As prompted, enter the server name and the SSL/TLS port number you have specified for one of your target instances. (Select OK after each prompt.)

    After the SAM Manager attempts to establish an SSL/TLS connection at the address specified, a message at the top of the New SSL/TLS Configuration page informs you of the result.

  7. Select Save.
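If you want to confirm from outside the Management Portal that a target instance accepts a verified SSL/TLS connection, a check along the following lines may help. It is a sketch that uses only the Python standard library. It is not how the SAM Manager performs its own test, and the function names, host, port, and certificate path are hypothetical.

```python
import socket
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context that requires server certificate verification.

    ca_file: path to the trusted Certificate Authority certificate(s);
    if omitted, the system default trust store is used.
    """
    return ssl.create_default_context(cafile=ca_file)


def check_tls(host: str, port: int, ca_file: Optional[str] = None,
              timeout: float = 10.0) -> Optional[str]:
    """Open a verified TLS connection and return the negotiated protocol version.

    Raises ssl.SSLError (or a subclass) if verification fails.
    """
    ctx = build_client_context(ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Example call (hypothetical host, port, and certificate path):
# print(check_tls("iris1.example.com", 443, "/path/to/ca.pem"))
```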

Modify the isc_prometheus.yml File

SAM collects alerts using the SAM Manager, but it uses another component (Prometheus) to collect metrics. Therefore, to collect metrics over HTTPS you must also configure Prometheus to authenticate target instances as an SSL/TLS client.

Prometheus is configured by the isc_prometheus.yml file, which is located in the /config/prometheus/ directory within the SAM image. Modify the Prometheus configuration by performing the following steps:

  1. Open the isc_prometheus.yml file in a text editor.

  2. Locate the scrape_configs section.

  3. Change the value of the scheme parameter to https.

  4. Below the line which now reads scheme: https, add lines so that the start of the scrape_configs section reads as follows:

    scrape_configs:
      - job_name: 'SAM'
        metrics_path: '/api/monitor/metrics'
        scheme: https
        tls_config:
          ca_file: [pathToCACert]
    

    where [pathToCACert] is the path to a copy of the certificate file for the Certificate Authority which can authenticate your target instances.

    Note:

    You may wish to configure Prometheus further to meet the needs of your SAM deployment. Refer to the Prometheus documentation for a detailed description of your configuration options.

  5. Save your changes.
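After editing the file, a quick sanity check such as the one below can catch an obvious omission. This is a crude, line-based sketch rather than a YAML parser: it only looks for a scheme: https line and a ca_file line anywhere in the text, and the function name is hypothetical.

```python
import re


def scrape_uses_https(config_text: str) -> bool:
    """Return True if the text has both a 'scheme: https' line and a 'ca_file:' line."""
    has_scheme = re.search(r"^\s*scheme:\s*https\s*$", config_text, re.MULTILINE)
    has_ca_file = re.search(r"^\s*ca_file:\s*\S+", config_text, re.MULTILINE)
    return has_scheme is not None and has_ca_file is not None


# The start of the scrape_configs section as shown in the steps above.
sample = """\
scrape_configs:
  - job_name: 'SAM'
    metrics_path: '/api/monitor/metrics'
    scheme: https
    tls_config:
      ca_file: [pathToCACert]
"""
print(scrape_uses_https(sample))
```

To check the real file, read its contents into a string and pass that string to the function.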

Update the Instance Port in the SAM Web Application

Usually, a web server uses a different port number for HTTPS connections than it does for HTTP connections. In order for SAM to connect with a target instance over HTTPS, SAM must address the instance using the SSL/TLS port.

This means that for each instance you wish to monitor over HTTPS, you must edit the instance in the SAM web application so that it specifies the new SSL/TLS web server port. Ensure that the other connection information is correct as well. For further instructions, see Add or Edit an Instance within a Cluster.

Enable HTTPS in the SAM Configuration Settings

Once you have completed all the preceding configuration steps, simply select Use HTTPS in the Settings menu for the SAM web application and then select Save. This directs the SAM Manager to use the SSL/TLS connection you have configured to fetch alerts over HTTPS, and to provide HTTPS links to the Management Portal for each instance on the single instances pages of the SAM web portal.

Congratulations! Your System Alerting and Monitoring deployment is now communicating with your monitored instances over HTTPS.

Create Custom Alert Handlers

You can create custom alert handlers that specify additional actions for System Alerting and Monitoring to perform when an alert fires, such as sending a text or email. Setting up an alert handler is a two-step process:

  1. Write the Alert Handler

  2. Import the Alert Handler

Write the Alert Handler

To create an alert handler, you must create a class using an ObjectScript IDE. Connect this IDE to an InterSystems IRIS instance that is not part of SAM.

Important:

You cannot use the SAM Manager to create the alert handler, as SAM is not a development platform.

Instead, you must connect the IDE to a different InterSystems IRIS instance (such as the InterSystems IRIS Community Edition), and later import the alert handler into the SAM Manager.

After setting up the IDE, create a class with the following characteristics:

  • The class extends the %SAM.AbstractAlertsHandler class.

  • The class implements the HandleAlerts() class method. Within this method, specify the desired behavior when an alert fires.

When SAM detects a new alert (or multiple new alerts), SAM calls the HandleAlerts() method of all alert handlers. The HandleAlerts() method receives a %DynamicArray packet of alerts with the following format:

[
  {
    "labels":{
      "alertname":"High CPU Usage",
      "cluster":"1",
      "instance":"10.0.0.24:9092",
      "job":"SAM",
      "severity":"critical"
    },
    "annotations":{
      "description":"CPU usage exceeded the 95% threshold."
    },
    "ts": "2020-04-17 18:07:42.536"
  },
  {
    "labels":{
      "alertname":"iris_system_alert",
      "cluster":"1",
      "instance":"10.0.0.24:9092",
      "job":"SAM",
      "severity":"critical"
    },
    "annotations":{
      "description":"Previous system shutdown was abnormal, system forced down or crashed"
    },
    "ts":"2020-04-17 18:07:36.926"
  }
]

Note:

Alerts generated by an InterSystems IRIS instance are all named iris_system_alert.
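To make the packet structure easier to see, here is a small Python sketch that walks a packet of the shape shown above and groups the alerts by severity. It illustrates the JSON layout only; an actual alert handler must be written in ObjectScript as described in this section. The function name is hypothetical.

```python
import json
from collections import defaultdict

# A shortened packet in the format shown above.
SAMPLE_PACKET = """
[
  {"labels": {"alertname": "High CPU Usage", "cluster": "1",
              "instance": "10.0.0.24:9092", "job": "SAM", "severity": "critical"},
   "annotations": {"description": "CPU usage exceeded the 95% threshold."},
   "ts": "2020-04-17 18:07:42.536"},
  {"labels": {"alertname": "iris_system_alert", "cluster": "1",
              "instance": "10.0.0.24:9092", "job": "SAM", "severity": "critical"},
   "annotations": {"description": "Previous system shutdown was abnormal, system forced down or crashed"},
   "ts": "2020-04-17 18:07:36.926"}
]
"""


def group_by_severity(packet_json: str) -> dict:
    """Map each severity to a list of (alertname, instance, description) tuples."""
    groups = defaultdict(list)
    for alert in json.loads(packet_json):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        groups[labels.get("severity", "unknown")].append(
            (labels.get("alertname"), labels.get("instance"),
             annotations.get("description"))
        )
    return dict(groups)


for severity, alerts in group_by_severity(SAMPLE_PACKET).items():
    print(severity, [name for name, _, _ in alerts])
```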

Below is an example of an alert handler class. This example writes a message to the messages log (or Console Log) whenever an alert fires:

/// An example Alert Handler class, which writes messages to the messages log.
Class User.AlertHandler Extends %SAM.AbstractAlertsHandler
{

/// Called by SAM with a %DynamicArray packet of new alerts.
ClassMethod HandleAlerts(packet As %DynamicArray) As %Status
{
    // Iterate over every alert in the packet
    set iter = packet.%GetIterator()
    while iter.%GetNext(.idx, .alert) {
        // Use the alert description as the log message
        set msg = alert.annotations.description
        // Log critical alerts with severity 2 and all other alerts with severity 1
        if alert.labels.severity = "critical" {set severity = 2} else {set severity = 1}
        do ##class(%SYS.System).WriteToConsoleLog(msg, 1, severity)
    }
    quit $$$OK
}

}

Import the Alert Handler into SAM

After creating the alert handler, the next step is to import it into SAM.

  1. First, export the alert handler in XML format. How to do this depends on the IDE you are using.

  2. Next, log in to the Management Portal for the SAM Manager from a web browser, using the following address:

    http://<sam-domain-name>:8080/csp/sys/UtilHome.csp
    

    where <sam-domain-name> is the fully qualified DNS name or IP address of the system SAM is running on.

  3. Navigate to the Classes page (System Explorer > Classes).

  4. Make sure the SAM namespace is selected, then click Import. This brings up the Import Classes dialog.

  5. In the Import Classes dialog:

    • For The import file resides on, select My Local Machine.

    • For Select the path and name of the import file, click the Choose File button and select the alert handler XML file from your file system.

  6. At the bottom of the dialog, click Next, then Import. A result dialog should appear to tell you the status of your import.

After the import is complete, you have successfully added the alert handler to SAM. From now on, any time SAM detects a new alert, it calls the HandleAlerts() method of your class.

If you ever need to update an alert handler, simply repeat the steps above with the newer version. This replaces the previous version with the new one.

Improve Performance When Querying Older Metrics

SAM includes two databases: the Prometheus database (used for short-term metrics storage) and an InterSystems IRIS database (used for longer-term storage). The Prometheus database retains data for two hours in a cache optimized for rapid querying, while the InterSystems IRIS database retains the data for long term analysis.

If you frequently run queries for data older than two hours, increasing the Prometheus retention time may improve performance. Adjust this setting by changing the --storage.tsdb.retention.time flag in the docker-compose.yml file. For more information, see "Operational aspects" in the Prometheus documentation (https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects).

Troubleshoot an Unreachable Instance

There are many reasons the state of an instance could become Unreachable. This section provides several potential causes and solutions.

If none of these procedures resolves the Unreachable status, contact the InterSystems Worldwide Response Center (WRC) for further troubleshooting help.

The target instance is not outputting metrics

The /api/monitor application for the instance you are monitoring with SAM may not be outputting metrics. To determine this, use a web browser or the curl command to access the following URL:

http://<instance-host>:<port>/api/monitor/metrics

If this does not return a list of metrics, ensure that the instance is on InterSystems IRIS version 2020.1 or higher and that the /api/monitor application is configured to allow unauthenticated access, as described in the section on preparing instances for monitoring.
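The same check can be scripted. The sketch below requests the metrics URL with the Python standard library and lists the metric names found in the response, assuming the endpoint returns text in the Prometheus exposition format and allows unauthenticated access. The function names, host, and port are hypothetical.

```python
import urllib.request
from typing import List


def metrics_url(host: str, port: int) -> str:
    """Build the metrics URL for a target instance."""
    return f"http://{host}:{port}/api/monitor/metrics"


def parse_metric_names(text: str) -> List[str]:
    """Extract metric names from Prometheus exposition-format text."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name is everything before the first '{' (labels) or space (value).
        names.append(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metrics(host: str, port: int, timeout: float = 10.0) -> str:
    """GET the metrics endpoint and return the response body as text."""
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (hypothetical host and port):
# names = parse_metric_names(fetch_metrics("iris1.example.com", 52773))
# print(len(names), "metrics returned")
```

An empty list, or an HTTP error raised by fetch_metrics, suggests the instance is not outputting metrics at that address.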

The target instance has an IP address in the 172.17.x.x range

System Alerting and Monitoring may not be able to reach an instance with an IP address in the 172.17.x.x range (for example, 172.17.123.123). This is because Docker uses this IP range for its own networks.

You can resolve this issue by changing the Docker IP address range. To do this, specify a different range (e.g. 10.10.x.x) in the Docker daemon configuration file using the default-address-pools option. Refer to the Docker documentation for further help editing this file.
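To check quickly whether a target instance falls in the affected range, you can test its address against 172.17.0.0/16, which corresponds to 172.17.x.x. This is a minimal sketch using the Python standard library; the constant and function names are hypothetical.

```python
import ipaddress

# 172.17.x.x expressed as a network: the range Docker uses for its own networks.
DOCKER_RANGE = ipaddress.ip_network("172.17.0.0/16")


def conflicts_with_docker(address: str) -> bool:
    """Return True if the IP address lies in the 172.17.x.x range."""
    return ipaddress.ip_address(address) in DOCKER_RANGE


print(conflicts_with_docker("172.17.123.123"))  # True
print(conflicts_with_docker("10.0.0.24"))       # False
```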

The target instance is not responding before timeout

The /api/monitor application may be outputting metrics, but failing to respond to the GET request from Prometheus before the connection timeout (ten seconds, by default). An analysis of traffic over the network can confirm that Prometheus is ending each unsuccessful attempt to connect to the instance with a TCP FIN packet.

Restarting the instance may be sufficient to render the /api/monitor application responsive again.

Alternatively, if you have defined custom application metrics for the instance, the time required to compute these metrics may be exceeding the default scraping interval for Prometheus. While it is possible to specify larger values for the scraping interval and timeout parameters in the isc_prometheus.yml file, in this case InterSystems recommends adjusting your organization’s monitoring strategy so that SAM can reliably receive updates to all the metrics it monitors with the default frequency.
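One way to see whether an instance is close to the timeout is to time a metrics request yourself. The helper below is a sketch. It times any callable you pass in, such as a function that GETs the /api/monitor/metrics URL, and compares the elapsed time with the ten-second default mentioned above. The function name is hypothetical.

```python
import time
from typing import Callable, Tuple


def time_scrape(fetch: Callable[[], object],
                timeout_s: float = 10.0) -> Tuple[float, bool]:
    """Run fetch() once and return (elapsed seconds, finished within timeout)."""
    start = time.monotonic()
    fetch()
    elapsed = time.monotonic() - start
    return elapsed, elapsed < timeout_s


# Stand-in for a real request, used only for illustration.
elapsed, in_time = time_scrape(lambda: time.sleep(0.05))
print(f"{elapsed:.2f}s, within timeout: {in_time}")
```

Timing the request several times gives a more reliable picture than a single measurement, especially if custom application metrics are expensive to compute.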

The SAM database is full

If the SAM database fills up, instances may show up as Unreachable and stop reporting metrics. To check whether this is the case:

  1. Open the SAM Manager from a web browser, using the following address:

    http://<sam-domain-name>:8080/csp/sys/UtilHome.csp
    
  2. Navigate to the Databases page (System Operation > Databases).

  3. Select Free Space View.

  4. Check the % Free column for the SAM database to see whether the value is 0.

If the database is full, free some space by deleting data, as described in Clear the SAM Database. Once you have done so, shut down System Alerting and Monitoring using the stop.sh script, and restart it using start.sh.

To prevent this from happening again, you can lower the number of days SAM stores data using the Configuration Settings menu.

Alternatively, you may prefer to change the location where Docker stores the persistent copy of the database to accommodate the volume of data collected. To do this:

  1. Shut down SAM using the stop.sh script.

  2. Change the location where Docker stores a persistent copy of the data from the SAM IRIS container, as described in our guide to setting up SAM.

  3. Move the contents of the former storage location to the new location. The default configuration specified by the docker-compose.yml file stores monitoring data in a named volume (irisdata) located in the /var/lib/docker/volumes/ directory.

  4. Restart SAM using the start.sh script.
