Caché High Availability Guide
Using Pacemaker-based Red Hat Enterprise Linux HA with Caché
Red Hat Enterprise Linux (RHEL) release 7 includes Pacemaker as its high availability service management component. The InterSystems rgmanager/cman based agent used with previous versions of RHEL cannot be used with Pacemaker-based clusters. This appendix explains how to use the Caché Open Clustering Framework based (OCF-based) resource agent to configure Caché as a resource controlled by RHEL 7 with the Pacemaker-based High Availability Add-On (HA).

The procedures here highlight the key portions of the configuration of RHEL HA, including how to incorporate the Caché OCF-based resource agent into the cluster. Refer to your Red Hat documentation and consult with your hardware and operating system vendors on all cluster configurations.
When using Caché in a high availability environment controlled by RHEL HA:
  1. Install the hardware and operating system according to your vendor recommendations for high availability, scalability and performance; for more information, see Hardware Configuration.
  2. Configure RHEL HA with shared disks and a virtual IP address (VIP), and verify that common failures are detected and the cluster continues operating; see Configuring Red Hat Enterprise Linux HA for more information.
  3. Install the Caché OCF-based resource agent script according to the information in Installing the Caché OCF-based Resource Agent.
  4. Install Caché and your application according to the guidelines in this appendix and verify connectivity to your application through the VIP; for more information, see Installing Caché in the Cluster.
  5. Test disk failures, network failures, and system crashes, and test and understand your application’s response to such failures; for more information, see Application Considerations and Testing and Maintenance.
Hardware Configuration
Configure the hardware according to best practices for your application. In addition to adhering to the recommendations of your hardware vendor, consider the following:
Configuring Red Hat Enterprise Linux HA
Prior to installing Caché and your Caché-based application, follow the recommendations in this section when configuring RHEL 7. These recommendations assume a cluster of two identical nodes. Other configurations are possible; consult with your hardware vendor and the InterSystems Worldwide Response Center (WRC) for guidance.
Red Hat Enterprise Linux
When configuring Linux on the nodes in the cluster, use the following guidelines:
Red Hat Enterprise Linux High Availability Add-On
This document assumes RHEL 7 or later, with the HA Add-On using Pacemaker as the high availability service management component. The script and directions here apply only to Pacemaker-based HA. RHEL 6.5 includes Pacemaker-based HA and may work as well; consult with Red Hat and the InterSystems Worldwide Response Center (WRC) for guidance.
In general, you will follow these steps:
  1. Install and cable all hardware, disk and network.
  2. Configure STONITH fencing resources.
  3. Create VIP and disk resources (file system, LVM, perhaps CLVM) that include the network paths and volume groups of the shared disk.
Be sure to include the entire set of volume groups, logical volumes and mount points required for Caché and the application to run. These include those mount points required for the main Caché installation location, your data files, journal files, and any other disk required for the application in use.
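For illustration only, the following sketch shows one way to create a VIP, an LVM volume group, and a file system in a single resource group using pcs; the resource and group names, IP address, volume group, logical volume, and mount point are placeholders to be replaced with values from your environment:
    pcs resource create cacheprod_vip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24 --group cacheprod_grp
    pcs resource create cacheprod_vg ocf:heartbeat:LVM volgrpname=vg_cacheprod exclusive=true --group cacheprod_grp
    pcs resource create cacheprod_fs ocf:heartbeat:Filesystem device=/dev/vg_cacheprod/lv_cacheprod directory=/cacheprod fstype=xfs --group cacheprod_grp
Resources in a pcs group start in the order in which they are added and stop in the reverse order, so the group itself provides much of the required colocation and ordering.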
With the move to Pacemaker, the RHEL HA Add-On no longer supports qdisk or any other quorum disk. In a two-node cluster, therefore, a robust STONITH configuration is especially important to avoid a partitioned or unsynchronized cluster. Consult with Red Hat on other possibilities, such as adding a third node for quorum purposes only.
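As an illustration of STONITH configuration, a two-node cluster with IPMI-based fencing might resemble the following sketch; the fence agent (fence_ipmilan), device addresses, and credentials shown are assumptions that depend entirely on your hardware:
    pcs stonith create fence_node1 fence_ipmilan pcmk_host_list="node1" ipaddr="10.0.0.201" login="admin" passwd="secret" lanplus=1
    pcs stonith create fence_node2 fence_ipmilan pcmk_host_list="node2" ipaddr="10.0.0.202" login="admin" passwd="secret" lanplus=1
    pcs property set stonith-enabled=true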
Installing the Caché OCF-based Resource Agent
The Caché OCF-based resource agent consists of one file that must be installed on all nodes in the cluster that will run Caché resources.
A sample Caché agent script is included in a development installation of Caché, or in the Caché distribution kit in the dist/dev/cache/HAcluster/RedHat directory. This sample is sufficient for most cluster installations without modification. No development installation is required; simply follow the instructions provided here for copying the Caché agent script file to its proper locations in the cluster.
The following files are located in /installdir/dev/cache/HAcluster/RedHat after a development installation:
To copy the agent script, do the following:
  1. Copy the script to the /usr/lib/ocf/resource.d/heartbeat/ directory on all cluster nodes, changing the name of the file to Cache, as follows:
    cp installdir/dev/cache/HAcluster/RedHat/CacheOCFagent /usr/lib/ocf/resource.d/heartbeat/Cache
  2. Adjust the ownerships and permissions of the agent file on each node:
    chown root:root /usr/lib/ocf/resource.d/heartbeat/Cache 
    chmod 555 /usr/lib/ocf/resource.d/heartbeat/Cache
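To confirm that Pacemaker can see the agent on each node, you can ask pcs to display the agent metadata; if it is printed without errors, the file is in the correct location with usable permissions:
    pcs resource describe ocf:heartbeat:Cache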
You are now ready to install Caché in the cluster and configure RHEL HA to control your Caché instance(s) using the Caché agent.
Installing Caché in the Cluster
After a resource group has been created and configured, install Caché in the cluster using the procedures outlined in this section. These instructions assume that the Caché resource script has been properly located as described in the previous section, Installing the Caché OCF-based Resource Agent.
Procedures differ depending on whether you are installing only one instance of Caché or multiple instances of Caché. Installing a single instance of Caché is common in clusters dedicated to the production instance. In development and test clusters, it is common to have multiple instances of Caché controlled by the cluster software. If it is possible that you will install multiple instances of Caché in the future, follow the procedure for multiple instances.
Note:
For information about upgrading Caché in an existing failover cluster, see Upgrading a Cluster in the “Upgrading Caché” chapter of the Caché Installation Guide.
Installing a Single Instance of Caché
To install a single instance of Caché in the cluster, use the following procedure.
Note:
If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, you must use the procedure described in Installing Multiple Instances of Caché, rather than the procedure in this section.
  1. Bring the resource group online on one node. This should mount all required disks and allow for the proper installation of Caché.
    1. Check the file and directory ownerships and permissions on all mount points and subdirectories.
    2. Prepare to install Caché by reviewing the Installing Caché on UNIX® and Linux chapter of the Caché Installation Guide.
  2. Create a link from /usr/local/etc/cachesys to the shared disk. This forces the Caché registry and all supporting files to be stored on the shared disk resource you have configured as part of the resource group.
    A good choice is to use a ./usr/local/etc/cachesys/ subdirectory under your installation directory. For example, assuming Caché is to be installed in /cacheprod/cachesys/, specify the following:
    mkdir -p /cacheprod/cachesys/usr/local/etc/cachesys
    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys /usr/local/etc/cachesys
  3. Run Caché cinstall on the node with the mounted disks. Be sure the users and groups (either default or custom) are available on all nodes in the cluster, and that they all have the same UIDs and GIDs.
  4. Stop Caché and relocate all resources to the other node. Note that Pacemaker does not yet control Caché.
  5. On the second node in the cluster, create the link in /usr/local/etc/ and the links in /usr/bin for ccontrol and csession:
    mkdir -p /usr/local/etc/
    ln -s /cacheprod/cachesys/usr/local/etc/cachesys /usr/local/etc/cachesys
    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
  6. Manually start Caché using ccontrol start. Test connectivity to the cluster through the VIP. Be sure the application, all interfaces, any ECP clients, and so on connect to Caché using the VIP as configured here.
  7. Be certain Caché is stopped on all nodes.
  8. Add the Caché resource configured to control your new instance to your cluster, as follows. This example assumes the instance being controlled is named CACHEPROD. See Understanding the Parameters of the Caché Resource for information about the Instance and cleanstop options.
    pcs resource create CacheProd ocf:heartbeat:Cache Instance=CACHEPROD cleanstop=1
  9. The Caché resource (CacheProd) must be colocated with, and ordered to start after, its disk resources and optionally its VIP resource(s). To prevent unexpected stops and restarts, configure Caché and its colocated resources to prefer their current location (resource-stickiness=INFINITY) so that they never fail back after a node reboots. A sketch of these constraints follows this procedure.
  10. Verify that Caché starts in the cluster.
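The following is a minimal sketch of the colocation, ordering, and stickiness settings described in step 9, assuming the hypothetical disk and VIP resource names cacheprod_fs and cacheprod_vip used earlier; substitute the names from your own cluster:
    pcs constraint colocation add CacheProd with cacheprod_fs INFINITY
    pcs constraint order start cacheprod_fs then start CacheProd
    pcs constraint order start cacheprod_vip then start CacheProd
    pcs resource meta CacheProd resource-stickiness=INFINITY
Alternatively, adding the Caché resource to the same resource group as its disk and VIP resources provides the colocation and ordering implicitly; the stickiness meta attribute is still set separately.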
Installing Multiple Instances of Caché
To install multiple instances of Caché, use the procedure in this section.
Note:
If any Caché instance that is part of a failover cluster is to be added to a Caché mirror, the ISCAgent (which is installed with Caché) must be properly configured; see Configuring the ISCAgent in the “Mirroring” chapter of this guide for more information.
To install Caché on the first node, do the following:
  1. Bring the resource group online on one node. This should mount all required disks and allow for the proper installation of Caché.
    1. Check the file and directory ownerships and permissions on all mount points and subdirectories.
    2. Prepare to install Caché by reviewing the Installing Caché on UNIX® and Linux chapter of the Caché Installation Guide.
  2. Run Caché cinstall on the node with the mounted disks. Be sure the users and groups (either default or custom) are available on all nodes in the cluster, and that they all have the same UIDs and GIDs.
  3. The /usr/local/etc/cachesys directory and all its files must be available to all nodes at all times. To enable this, after Caché is installed on the first node, copy /usr/local/etc/cachesys to each node in the cluster. The following method preserves symbolic links during the copy process:
    cd /usr/local
    rsync -av -e ssh etc root@node2:/usr/local/
  4. Verify that ownerships and permissions on the cachesys directory and its files are identical on all nodes.
    Note:
    In the future, keep the Caché registries on all nodes in sync using ccontrol create or ccontrol update, or by copying the directory again; for example:
    ccontrol create CSHAD directory=/myshadow/ versionid=2015.1.475
  5. Stop Caché and relocate all resources to the other node. Note that Pacemaker does not yet control Caché.
  6. On the second node in the cluster, create the links in /usr/bin for ccontrol and csession, as follows:
    ln -s /usr/local/etc/cachesys/ccontrol /usr/bin/ccontrol
    ln -s /usr/local/etc/cachesys/csession /usr/bin/csession
    
  7. Manually start Caché using ccontrol start. Test connectivity to the cluster through the VIP. Be sure the application, all interfaces, any ECP clients, and so on connect to Caché using the VIP as configured here.
  8. Be certain Caché is stopped on all nodes.
  9. Add the Caché resource configured to control your new instance to your cluster, as follows. This example assumes the instance being controlled is named CACHEPROD. See Understanding the Parameters of the Caché Resource for information about the Instance and cleanstop options.
    pcs resource create CacheProd ocf:heartbeat:Cache Instance=CACHEPROD cleanstop=1
  10. The Caché resource (CacheProd) must be colocated with, and ordered to start after, its disk resources and optionally its VIP resource(s). To prevent unexpected stops and restarts, configure Caché and its colocated resources to prefer their current location (resource-stickiness=INFINITY) so that they never fail back after a node reboots.
  11. Verify that Caché starts in the cluster.
When you are ready to install a second instance of Caché within the same cluster, follow these additional steps:
  1. Configure Red Hat HA to add the disk and IP resources associated with the second instance of Caché.
  2. Bring the new resources online so the disks are mounted on one of the nodes.
  3. Be sure the users and groups to be associated with the new instance are created and synchronized between nodes.
  4. On the node with the mounted disk, run cinstall following the procedures outlined in the "Installing Caché on UNIX® and Linux" chapter of the Caché Installation Guide.
  5. Stop Caché.
  6. Synchronize the Caché registry using the following steps:
    1. On the install node run
      ccontrol list
    2. Record the instance name, version ID and installation directory of the instance you just installed.
    3. On the other node, run the following command to create the registry entry, using the information you recorded from the recently installed instance:
      ccontrol create instname versionid=vers_ID directory=installdir
  7. Add the Caché resource for this instance to your cluster as follows:
    pcs resource create instname ocf:heartbeat:Cache Instance=instname cleanstop=1
  8. The new Caché resource must be colocated with, and ordered to start after, its disk resources and optionally its VIP resource(s). To prevent unexpected stops and restarts, configure Caché and its colocated resources to prefer their current location (resource-stickiness=INFINITY) so that they never fail back after a node reboots.
  9. Verify that Caché starts in the cluster.
Understanding the Parameters of the Caché Resource
The Caché OCF-based resource agent has two parameters, Instance and cleanstop, that can be configured as part of the resource:
Application Considerations
Consider the following for your applications:
Testing and Maintenance
Upon first setting up the cluster, be sure to test that failover works as planned. This also applies any time changes are made to the operating system, its installed packages, the disk, the network, Caché, or your application.
In addition to the topics described in this section, you should contact the InterSystems Worldwide Response Center (WRC) for assistance when planning and configuring an RHEL HA cluster to control Caché. The WRC can check for any updates to the Caché agent, as well as discuss failover and HA strategies.
Failure Testing
Typical full-scale testing must go beyond a controlled service relocation. While service relocation testing is necessary to validate that the package configuration and the service scripts are all functioning properly, you should also test responses to simulated failures. Be sure to test failures such as:
Testing should include a simulated or real application load. Testing with an application load builds confidence that the application will recover in the event of actual failure.
If possible, test with a heavy disk write load; the database is at its most vulnerable during heavy disk writes. Caché handles all recovery automatically using its CACHE.WIJ and journal files, but testing a crash during an active disk write ensures that all file systems and disk devices fail over properly.
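A minimal sketch of one way to exercise both a controlled relocation and a hard node crash follows; the node name is a placeholder, and crashing a node with the sysrq trigger should be done only on test systems:
    # Controlled relocation: put the active node in standby, watch resources move, then restore it.
    pcs cluster standby node1
    pcs status
    pcs cluster unstandby node1
    # Hard crash of the active node (test systems only), simulating a sudden failure under load.
    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger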
Software and Firmware Updates
Keep software patches and firmware revisions up to date. Avoid known problems by adhering to a patch and update schedule.
Monitor Logs
Keep an eye on /var/log/pacemaker.log and the messages file in /var/log/, as well as the Caché cconsole.log files. The Caché agent resource script logs time-stamped information to these logs during cluster events.
Use the Caché console log, the Caché Monitor and the Caché System Monitor to be alerted to problems with the database that may not be caught by the cluster software. (See the chapters Monitoring Caché Using the Management Portal, Using the Caché Monitor and Using the Caché System Monitor in the Caché Monitoring Guide for information about these tools.)