Caché High Availability Guide
Mirroring
Traditional high availability and data replication solutions often require substantial capital investments in infrastructure, deployment, configuration, software licensing, and planning. Caché database mirroring is designed to provide an economical solution for rapid, reliable, robust automatic failover between two Caché instances, making it an effective enterprise high-availability solution.

Traditional availability solutions that rely on shared resources (such as shared disk) are often susceptible to a single point of failure with respect to that shared resource. Mirroring reduces that risk by maintaining independent resources on the primary and backup mirror members. Further, by utilizing logical data replication, mirroring avoids the risks associated with physical replication technologies such as SAN-based replication, including out-of-order updates and carry-forward corruption.
Combining InterSystems Enterprise Cache Protocol (ECP) with mirroring provides an additional level of availability; ECP application servers treat a mirror failover as an ECP data server restart, allowing processing to continue uninterrupted on the new primary, which greatly diminishes workflow and user disruption. Configuring the two failover mirror members in separate data centers offers additional redundancy and protection from catastrophic events.
In addition to providing an availability solution for unplanned downtime, mirroring offers the flexibility to incorporate planned downtimes (for example, Caché configuration changes, hardware or operating system upgrades, and so on) on a particular Caché system without impacting the overall Service Level Agreements (SLAs) for the organization.
Finally, in addition to the failover members, a mirror can include async members, which can be configured to receive updates from multiple mirrors across the enterprise. This allows a single system to act as a comprehensive enterprise data warehouse, allowing enterprise-wide data mining and business intelligence using InterSystems DeepSee™. An async member can also be configured for disaster recovery (DR) of a single mirror, which allows it to seamlessly take the place of one of the failover members should the need arise. A single mirror can include up to 16 members, so numerous geographically dispersed DR async members can be configured. This model provides a robust framework for distributed data replication, thus ensuring business continuity benefits to the organization; for more information, see Mirror Outage Procedures in this chapter.
This chapter discusses the following topics:
Mirroring Architecture and Planning
A mirror is a logical grouping of physically independent Caché instances simultaneously maintaining exact copies of production databases, so that if the instance providing access to the databases becomes unavailable, another can take over. A mirror can provide high availability through automatic failover, in which a failure of the Caché instance providing database access (or its host system) causes another instance to take over automatically and immediately.
This section covers the following topics:
Mirror Components
The system hosting a Caché instance configured as part of a mirror is called a mirror member. (The Caché instance itself is sometimes referred to as a mirror member.) There are two types of mirror member: failover members and async members, described in the following sections.
Two additional components support automatic failover from one failover member to the other: the ISCAgent and the arbiter, also described in this section.
Failover Mirror Members
To enable automatic failover, the mirror must contain two failover members, physically independent systems each hosting a Caché instance. At any given time, one failover instance acts as primary, providing applications with access to the databases in the mirror, while the other acts as backup, maintaining synchronized copies of those databases in readiness to take over as primary. When the primary Caché instance becomes unavailable, the backup takes over, providing uninterrupted access to the databases without risk of data loss. See Automatic Failover Mechanics for detailed information about the automatic failover process.
Failover members communicate with each other through several communication channels using several mirror member network addresses. External clients typically connect to the mirror through a virtual IP address (VIP), which is always bound to an interface on the current primary. ECP application server connections are automatically redirected to the new primary following failover, so a VIP is not required for ECP connections.
Mirror Failover Members
See Creating a Mirror for information about configuring the failover members of a mirror.
Important:
The two failover members in a mirror are assumed to be coequal; neither is preferred as primary. For this reason, primary and backup must be considered temporary designations only. If a problem is detected on the primary and the backup is available to take over, it will do so immediately, even if the problem on the primary might resolve on its own given enough time.
Because network latency between the failover members is an important factor in application performance, the relative physical locations of the failover members and the network connection between them should be chosen to minimize latency in the connection; see Network Latency Considerations for more information.
Async Mirror Members
Async members maintain asynchronous copies of mirrored databases. There are two types of async member, disaster recovery and reporting. A single mirror can include up to 16 members, so you can configure a mirror with a failover pair and up to 14 async members of either type in any combination. A mirror can even be configured with a single failover member to utilize async members without automatic failover.
Important:
Since the data on an async member is continually and asynchronously updated with changes from the mirrors to which it is connected, there is no guarantee of update synchronization or of consistent query results across queries on the async member. It is up to the application running against the async member to guarantee consistent results for queries that span changing data.
See Configure Async Mirror Members in this chapter for information about adding an async member to a mirror.
Disaster Recovery Asyncs
A mirror can provide disaster recovery capability through a disaster recovery (DR) async member, which can be manually promoted to failover member and even become primary should both failover members become unavailable due to a disaster. A promoted DR can also be useful in performing planned maintenance on, or temporarily replacing, a failover member. A DR async member can belong to one mirror only, but you can configure as many as you want in a single mirror, up to the mirror member limit of 16.
Multiple DR Async Members Connected to a Single Mirror
Note:
A DR async member is never a candidate for automatic failover, which can be from one failover mirror member to another only.
Reporting Asyncs
A reporting async mirror member maintains read-only or read-write copies of selected databases for purposes such as data mining and business intelligence, and cannot be promoted to failover member. A reporting async can belong to up to 10 mirrors, allowing it to function as a comprehensive enterprise-wide data warehouse bringing together sets of related databases from separate locations.
Single Reporting Async Member Connected to Multiple Mirrors
Single Failover Mirror Configuration
A mirror can also consist of a single failover member and one or more asyncs. This configuration does not provide high availability, but can address other needs. For example, a mirror with a single failover member, at least one DR async member, and some number of reporting asyncs can provide data security and disaster recovery while supporting data collection and warehousing. To provide high availability, the failover member can be located in an OS-level failover cluster or some other high-availability configuration (see the System Failover Strategies chapter of this guide).
Single Failover Member with Multiple Async Members
ISCAgent
A process called the ISCAgent runs on each mirror member’s host system, providing an additional means of communication between mirror members. Most importantly, the ISCAgent provides a means by which one failover member can obtain information about the other when normal communication between the two has been interrupted. The ISCAgent can send data to mirror members that have been down or disconnected. The agent is also involved in failover decisions; for example, a backup that has lost contact with both the primary instance and the arbiter can contact the primary’s ISCAgent (assuming the primary’s host system is still operating) to confirm that the primary instance is truly down before taking over.
The ISCAgent is automatically installed with Caché, if not already installed. When multiple Caché instances belonging to one or more mirrors are hosted on a single system, they share a single ISCAgent.
See the sections Automatic Failover Mechanics and Configuring the ISCAgent in this chapter for detailed information about the role and configuration of the ISCAgent.
Arbiter
The arbiter is an independent system hosting an ISCAgent with which the failover members of a mirror maintain continuous contact, providing them with the context needed to safely make failover decisions when they cannot communicate directly. A single arbiter can serve multiple mirrors, but a single mirror can use only one arbiter at a time. Use of an arbiter is not required, but is strongly recommended as it significantly increases the range of failure scenarios under which automatic failover is possible.
Mirror Failover Members and Arbiter
Configuring a system as arbiter involves minimal software installation and does not require that Caché be installed (although any system hosting a Caché instance of 2015.1 or later can be used). The arbiter uses minimal system resources and can be located on a system that is hosting other services, or even a workstation. The primary requirement concerning the arbiter is that it must be located and configured to minimize the risk of unplanned simultaneous outage of the arbiter and a single failover member; see Locating the Arbiter to Optimize Mirror Availability for more information.
Mirror Synchronization
As described in the Journaling chapter of the Caché Data Integrity Guide, journal files contain a time-sequenced record of the changes made to the databases in a Caché instance since the last backup. Within a mirror, the journal data that records a change made to a database on the primary becomes the basis for making that same change to the copy of the database on the backup and asyncs. Mirrored databases are therefore always journaled on the primary, while on the backup and on DR asyncs they are always read-only to prevent updates from other sources. Typically they are read-only on reporting asyncs as well.
When data recording global update operations (primarily Set and Kill operations) on mirrored databases is written to the journal on the primary, the journal records are transmitted to other mirror members. Once the journal records are received on the backup or async member, the operations recorded in them are performed on the databases on that member. This process is called dejournaling. (See Managing Database Dejournaling for important information about managing dejournaling on async members.)
Transfer of journal records from the primary to the backup is synchronous, with the primary waiting for acknowledgement from the backup at key points. This keeps the failover members closely synchronized and the backup active, as described in detail in Backup Status and Automatic Failover. An async, in contrast, receives journal data from the primary asynchronously. As a result, an async mirror member may sometimes be a few journal records behind the primary.
Note:
When a Caché instance becomes a member of a mirror, the following journaling changes to support mirroring occur:
See the Journaling chapter of the Caché Data Integrity Guide for general information about journaling.
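To make the synchronization model concrete, the following sketch models the flow described above: the primary treats journal data as durable only once an active backup acknowledges receipt, while async members receive the same records without any such wait. This is an illustrative Python model with hypothetical names, not InterSystems code.

    from queue import Queue, Empty

    class Backup:
        def __init__(self):
            self.active = True
            self.journal = []        # records to be dejournaled locally
            self.acks = Queue()

        def send(self, records):
            # An active backup receives journal records as they are written
            # on the primary and acknowledges receipt.
            self.journal.extend(records)
            self.acks.put(len(records))

        def wait_ack(self, timeout):
            try:
                self.acks.get(timeout=timeout)
                return True
            except Empty:
                return False

    class AsyncMember:
        def __init__(self):
            self.pending = Queue()   # applied later; may lag a few records

        def receive(self, records):
            self.pending.put(records)

    class Primary:
        def __init__(self, backup, asyncs, qos_timeout=2.0):
            self.backup, self.asyncs, self.qos_timeout = backup, asyncs, qos_timeout

        def commit_journal(self, records):
            # Synchronous leg: wait for the active backup to acknowledge
            # before considering the journal data durable.
            self.backup.send(records)
            if not self.backup.wait_ack(self.qos_timeout):
                self.backup.active = False   # revoke active status; trouble state
            # Asynchronous leg: asyncs get the same records with no wait.
            for member in self.asyncs:
                member.receive(records)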
Automatic Failover Mechanics
Mirroring is designed to provide safe automatic failover to the backup when the primary fails or becomes unavailable. This section describes the mechanisms that allow that to occur, including:
Requirements for Safe Automatic Failover
The backup Caché instance can automatically take over from the primary only if it can ensure that two conditions are met: that it has received the latest journal data from the primary, so that no data is lost, and that the primary is not operating as primary and can no longer do so without human intervention, so that the two instances cannot simultaneously act as primary.
Automatic Failover Rules
This section describes the rules that govern the automatic failover process and ensure that both automatic failover requirements are met.
Note:
The backup does not attempt to become primary under any circumstances unless all mirrored databases marked Mount Required at Startup (see Edit a Local Database’s Properties in the “Managing Caché” chapter of the Caché System Administration Guide) are mounted, activated, and caught up, and therefore ready for use on becoming primary.
Backup Status and Automatic Failover
During normal mirror operation, the journal transfer status of the backup failover member is Active, meaning that it has received all journal data from and is synchronized with the primary. (See Mirror Synchronization for information about how the databases on the failover members are synchronized using journal data and related details; see Monitoring Mirrors for information about monitoring the status of mirror members.) An active backup receives the current journal data as it is written on the primary, and the primary waits for an active backup to acknowledge receipt of journal data before considering that data to be durable. An active backup therefore satisfies the first condition for failover.
If an active backup does not acknowledge receipt of new data from the primary within the Quality of Service (QoS) Timeout, the primary revokes the backup’s active status, disconnects the backup and temporarily enters the trouble state. While in the trouble state, the primary does not commit any new journal data (perhaps causing a pause in the application), allowing time for contact to be restored or for appropriate and safe failover decisions to take place without the two members becoming unsynchronized.
When the backup reconnects to the primary, it catches up by obtaining all of the most recent journal data from the primary and acknowledging its receipt, at which point its active status is restored.
Automatic Failover When the Backup is Active
When the backup is active, it is eligible to take over as primary if it can confirm the second condition for failover—that the primary is not operating as primary and can no longer do so without human intervention. The backup can do this in one of three ways: by receiving a takeover instruction from the primary itself, by learning from the arbiter that it has also lost contact with the primary, or by obtaining confirmation from the primary’s ISCAgent that the primary instance is down.
When the primary is isolated from an active backup by a network event but the backup cannot confirm safe failover conditions in one of these ways, the backup is no longer active and is subject to the failover mechanics described in the following section.
Automatic Failover When the Backup is Not Active
A backup that is not active can attempt to contact the primary’s ISCAgent to confirm that the primary instance is down (or force it down if it is hung), and to obtain the primary’s most recent journal data from the agent. If successful on both counts, the backup can safely take over as primary. After failover, the new primary restarts the former primary so it can become the new backup.
A backup that is not active and cannot contact the primary’s ISCAgent has no way to ensure that the primary can no longer act as primary and that it has the latest journal updates from the primary, and therefore cannot take over.
The arbiter plays no role in failover mechanics when the backup is not active.
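The takeover rules described in the preceding sections reduce to a short decision procedure. The sketch below is a simplified, hypothetical model of that logic, not the actual implementation:

    from dataclasses import dataclass

    @dataclass
    class BackupView:
        is_active: bool                     # caught up and acknowledged by the primary
        primary_requested_takeover: bool    # primary shut down cleanly or detected a hang
        arbiter_lost_primary: bool          # arbiter reports it also lost the primary
        agent_confirms_primary_down: bool   # primary's ISCAgent confirmed (or forced) it down

    def may_take_over(b: BackupView) -> bool:
        # An active backup may take over when instructed by the primary itself.
        if b.is_active and b.primary_requested_takeover:
            return True
        # An active backup may take over when the arbiter confirms that it,
        # too, has lost contact with the primary.
        if b.is_active and b.arbiter_lost_primary:
            return True
        # Active or not, the backup may take over once the primary's ISCAgent
        # confirms the primary is down and supplies its latest journal data
        # (a non-active backup must catch up from the agent first).
        if b.agent_confirms_primary_down:
            return True
        # Otherwise the backup cannot ensure the primary has stopped acting
        # as primary, so automatic failover must not occur.
        return False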
Mirror Response to Various Outage Scenarios
This section summarizes the mirror’s response to outages of the failover members and arbiter in different combinations.
Note:
It is possible for an operator to temporarily bring the primary system down without causing a failover to occur (see Avoiding Unwanted Failover During Maintenance of Failover Members). This can be useful, for example, in the event the primary system needs to be brought down for a very short period of time for maintenance. After bringing the primary system back up, the default behavior of automatic failover is restored.
Several of the scenarios discussed here refer to the option of manually forcing the backup to become primary. For information about this procedure, see Unplanned Outage of Primary Failover Member Without Automatic Failover.
Automatic Failover in Response to Primary Outage Scenarios
While circumstances and details vary, there are several main primary outage scenarios under which an active backup failover member automatically takes over, as follows:
  1. A planned outage of the primary, for example for maintenance purposes, is initiated by shutting down its Caché instance.
    Automatic failover occurs because the active backup is instructed by the primary to take over.
  2. The primary Caché instance hangs due to an unexpected condition.
    Automatic failover occurs because the primary detects that it is hung and instructs the active backup to take over.
  3. The primary Caché instance is forced down or becomes entirely unresponsive due to an unexpected condition.
    Under this scenario, the primary cannot instruct the backup to take over. However, an active backup takes over either after learning from the arbiter that it has also lost contact with the primary or by contacting the primary’s ISCAgent and obtaining confirmation that the primary is down.
  4. The primary’s storage subsystem fails.
    A typical consequence of a storage failure is that the primary instance hangs due to I/O errors, in which case the primary detects that it is hung and instructs the active backup to take over (as in scenario 2). Under some circumstances, however, the behavior described under scenario 3 or scenario 5 may apply.
  5. The primary’s host system fails or becomes unresponsive.
    Automatic failover occurs if the active backup learns from the arbiter that it has also lost contact with the primary.
    If no arbiter is configured or if the arbiter became unavailable prior to the primary host failure, automatic failover is not possible; under these circumstances, manually forcing the backup to become primary may be an option.
  6. A network problem isolates the primary.
    If an arbiter is configured and both failover members were connected to it at the time of the network failure, the primary enters the trouble state indefinitely.
    If no arbiter is configured or one of the failover members disconnected from it before the network failure, automatic failover is not possible and the primary continues running as primary.
A backup that is not active (because it is starting up or has fallen behind) can take over under scenarios 1 through 4 above by contacting the primary’s ISCAgent and obtaining the most recent journal data. A backup that is not active cannot take over under scenarios 5 and 6 because it cannot contact the ISCAgent; under these circumstances, manually forcing the backup to become primary may be an option.
Important:
In versions of Caché prior to 2015.1, the backup Caché instance had no way under scenarios 5 and 6 to ensure that the primary could no longer act as primary and therefore, by default, could not take over automatically under these scenarios. In these older versions, under special conditions, automatic failover could be enabled for these scenarios by adding a user-provided IsOtherNodeDown() entry point to the user-defined ^ZMIRROR routine and clearing the Agent Contact Required for Failover setting. Starting in 2015.1, this mechanism no longer exists and is replaced by the arbiter-based failover mechanics described here. For more information on this change, see the Caché 2015.1 Upgrade Checklist.
Effect of Arbiter Outage
An outage of the arbiter has no direct effect on the availability of the mirror. However, if primary outage scenarios 5 or 6 in Automatic Failover in Response to Primary Outage Scenarios occur before the arbiter is restored, the backup cannot take over automatically.
Effect of Backup Outage
When the backup becomes unavailable, some applications may experience a brief pause (approximately the QoS timeout) before the primary can resume processing. If no arbiter is configured, or if the arbiter became unavailable prior to the backup outage, the pause may be slightly longer (about three times the QoS timeout). If a primary outage occurs before the backup is restored, the result is a total mirror outage.
Effect of Combined Primary and Arbiter Outage
The consequences of this scenario are covered in Automatic Failover in Response to Primary Outage Scenarios. In brief, if the backup can contact the primary’s ISCAgent, it takes over; if not, the result is a total mirror outage, and manual intervention to force the backup to become primary may be an appropriate option.
Effect of Combined Backup and Arbiter Outage
If the backup and arbiter become unavailable simultaneously (or nearly simultaneously), the primary remains in trouble state indefinitely, because it assumes it is isolated and the backup could therefore have become primary. The result is a total mirror outage. When the backup becomes available again it contacts the primary, which then resumes operation as primary. Alternatively, the primary can be forced to resume through manual intervention. If the backup and arbiter fail in sequence, the primary continues operating as primary, after the brief pause described in Effect of Backup Outage, because it knows the backup cannot have become primary.
Effect of Combined Primary and Backup Outage
The result of this combination is always a total mirror outage. See Unplanned Outage of Both Failover Members for available options in this situation.
Locating the Arbiter to Optimize Mirror Availability
Together, the failover members and arbiter provide the mirroring high availability solution (with the arbiter playing the least significant role). The arbiter is not a quorum mechanism, but rather supports each failover member in arbitrating automatic failover by providing context when it loses contact with the other failover member; as long as both failover members are in contact with the arbiter immediately prior to a primary outage of any kind and the backup remains in contact with the arbiter, automatic failover can occur. While failure of the arbiter does eliminate the possibility of automatic failover under some circumstances, it does not prevent the mirror from operating while a replacement is configured, or from providing automatic failover under many primary outage scenarios, for example scenarios 1 through 4 in Automatic Failover in Response to Primary Outage Scenarios.
For these reasons, the arbiter need not be any more highly available than either of the failover members are independently, but only located and configured so that the risk of unplanned simultaneous outage of the arbiter and a single failover member is minimized. (If both failover members fail, the mirror fails and the status of the arbiter does not matter, so risk of simultaneous outage of all three is not a consideration.)
Based on this requirement, InterSystems recommends that, in general, the arbiter be separated from the failover members to the same extent to which they are separated from each other. Specifically,
A single system can be configured as arbiter for multiple mirrors, provided its network location is appropriate for each.
The arbiter need not be hosted on a newly deployed or dedicated system; in fact, an existing host of well-established reliability may be preferable. A reporting async mirror member (see Reporting Asyncs) can serve as a suitable host. Hosting on a DR async, however, should be avoided, as promotion of the DR async (see Promoting a DR Async Member to Failover Member) under a maintenance or failure scenario could lead to the arbiter being hosted on a failover mirror member, an incorrect configuration.
Note:
As noted in Installing the Arbiter, any system with a running ISCAgent of version 2015.1 or later can be configured as arbiter, including one that hosts one or more instances of Caché. However, a system hosting one or more failover or DR async members of a mirror should not be configured as arbiter for that mirror.
Automatic Failover Mechanics Detailed
This section provides additional detail on the mechanics of failover.
The mirror’s response to loss of contact between the failover members or between a failover member and the arbiter is supported by the use of two different mirror failover modes, as follows:
Agent Controlled Mode
When a mirror starts, the failover members begin operation in agent controlled mode. If the arbiter is not available or no arbiter is configured, they remain in this mode. When in agent controlled mode, the failover members respond to loss of contact with each other as described in the following.
Primary’s Response to Loss of Contact
If the primary loses its connection to an active backup, or exceeds the QoS timeout waiting for it to acknowledge receipt of data, the primary revokes the backup’s active status and enters the trouble state, waiting for the backup to acknowledge that it is no longer active. When the primary receives acknowledgement from the backup or the trouble timeout (which is two times the QoS timeout) expires, the primary exits the trouble state, resuming operation as primary.
If the primary loses its connection to a backup that is not active, it continues operating as primary and does not enter the trouble state.
Backup’s Response to Loss of Contact
If the backup loses its connection to the primary, or exceeds the QoS timeout waiting for a message from the primary, it attempts to contact the primary’s ISCAgent. If the agent reports that the primary instance is still operating as primary, the backup reconnects. If the agent confirms that the primary is down or that it has forced the primary down, the backup behaves as follows:
Whether it is active or not, the backup can never automatically take over in agent controlled mode unless the primary itself confirms that it is hung or the primary’s agent confirms that the primary is down (possibly after forcing it down), neither of which can occur if the primary’s host is down or network isolated.
Note:
When one of the failover members restarts, it attempts to contact the other's ISCAgent and its behavior is as described for a backup that is not active.
Arbiter Controlled Mode
When the failover members are connected to each other, both are connected to the arbiter, and the backup is active, they enter arbiter controlled mode, in which the failover members respond to loss of contact between them based on information about the other failover member provided by the arbiter. Because each failover member responds to the loss of its arbiter connection by testing its connection to the other failover member, and vice versa, multiple connection losses arising from a single network event are processed as a single event.
In arbiter controlled mode, if either failover member loses its arbiter connection only, or the backup loses its active status, the failover members coordinate a switch to agent controlled mode and respond to further events as described for that mode.
If the connection between the primary and the backup is broken in arbiter controlled mode, each failover member responds based on the state of the arbiter connections as described in the following.
Primary Loses Connection to Backup
If the primary loses its connection to an active backup, or exceeds the QoS timeout waiting for it to acknowledge receipt of data, and learns from the arbiter that the arbiter has also lost its connection to the backup or exceeded the QoS timeout waiting for a response from the backup, the primary continues operating as primary and switches to agent controlled mode.
If the primary learns that the arbiter is still connected to the backup, it enters the trouble state and attempts to coordinate a switch to agent controlled mode with the backup through the arbiter. When either the coordinated switch is accomplished, or the primary learns that the backup is no longer connected to the arbiter, the primary returns to normal operation as primary.
If the primary has lost its arbiter connection as well as its connection to the backup, it remains in the trouble state indefinitely so that the backup can safely take over. If failover occurs, the former primary shuts down when the connection is restored.
Note:
The trouble timeout does not apply in arbiter controlled mode.
Backup Loses Connection to Primary
If the backup loses its connection to the primary, or exceeds the QoS timeout waiting for a message from the primary, and learns from the arbiter that the arbiter has also lost its connection to the primary or exceeded the QoS timeout waiting for a response from the primary, the backup takes over as primary and switches to agent controlled mode. When connectivity is restored, the new primary restarts the former primary (if it is down) or forces it down and restarts it (if it is in the trouble state), allowing it to become backup.
If the backup learns that the arbiter is still connected to the primary, it no longer considers itself active, switches to agent controlled mode, and coordinates the switch with the primary through the arbiter; the backup then attempts to reconnect to the primary.
If the backup has lost its arbiter connection as well as its connection to the primary, it switches to agent controlled mode and attempts to contact the primary’s ISCAgent per the agent controlled mechanics.
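Taken together, these responses form a small decision table keyed on which connections survive. The following sketch condenses the behavior described above for illustration; it is not product code:

    def primary_response(sees_backup: bool, sees_arbiter: bool,
                         arbiter_sees_backup: bool) -> str:
        # Primary's response in arbiter controlled mode when contact is lost.
        if sees_backup:
            return "continue as primary (no failover event)"
        if not sees_arbiter:
            return "enter trouble state indefinitely; backup may safely take over"
        if arbiter_sees_backup:
            return "trouble state; coordinate switch to agent controlled mode via arbiter"
        return "continue as primary; switch to agent controlled mode"

    def backup_response(sees_primary: bool, sees_arbiter: bool,
                        arbiter_sees_primary: bool) -> str:
        # Active backup's response in arbiter controlled mode.
        if sees_primary:
            return "remain active backup"
        if not sees_arbiter:
            return "switch to agent controlled mode; try the primary's ISCAgent"
        if arbiter_sees_primary:
            return "drop active status; switch to agent controlled mode; reconnect"
        return "take over as primary; switch to agent controlled mode"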
Mirror Responses to Lost Connections
The following illustration describes the mirror’s response to all possible combinations of lost connections in arbiter controlled mode. The first three situations represent network failures only, while the others could involve, from a failover member’s viewpoint, either system or network failures (or a combination). The descriptions assume that immediately prior to the loss of one or more connections, the failover members and arbiter were all in contact with each other and the backup was active.
The mirror's response to most combinations of connection losses in arbiter controlled mode is to switch to agent controlled mode. Therefore, once one failure event has been handled, responses to a subsequent event that occurs before all connections are reestablished are governed by the behavior described for agent controlled mode, rather than this illustration.
Mirror Responses to Lost Connections in Arbiter Mode
Preventing Automatic Failover
If you want to prevent a mirror from automatically failing over under any circumstances, the best approach is to configure a single failover member with one or more DR asyncs (see Async Mirror Members). A DR async never takes over automatically but can easily be promoted to failover member, including to primary when desired (see Promoting a DR Async Member to Failover Member).
To temporarily prevent automatic failover to backup during maintenance activity, you can temporarily demote the backup to DR async or use the nofailover option; both are described in Planned Outage Procedures, which provides procedures for performing maintenance on failover members without disrupting mirror operation.
If you require application intervention at various points in the automatic failover process, see Using the ^ZMIRROR Routine.
Mirroring Communication
This section discusses the details of communication between mirror members, including:
Network Configuration Considerations
The following general network configuration items should be considered when configuring the network between two failover members:
Mirror synchronization occurs as part of the journal write cycle on the primary failover member. It is important to allow the journal write cycle and, therefore, the mirror synchronization process to complete as soon as possible. Any delays in this process can result in performance degradation.
Note:
See Configuring a Mirror Virtual IP (VIP) for important networking requirements and considerations when using a VIP.
Network Latency Considerations
There is no hard upper limit on network latency between failover members. The impact of increasing latency differs by application. If the round trip time between the failover members is similar to the disk write service time, no impact is expected. Round trip time may be a concern, however, when the application must wait for data to become durable (sometimes referred to as a journal sync). In non-mirrored environments, the wait for data to become durable includes a synchronous disk write of journal data; in mirrored environments with an active backup, it also includes a network round trip between the failover members. Many applications never wait for data to become durable, while others wait frequently.
The mechanisms by which an application waits can include the following:
Whether the round trip time, even if relatively large, negatively affects application response time or throughput depends on the frequency with which the above occur within the application, and whether the application processes such activity in serial or in parallel.
When network latency between mirror members becomes an issue, you may be able to reduce it by fine-tuning the operating system TCP parameters that govern the maximum values of SO_SNDBUF and SO_RCVBUF, allowing the primary and backup/asyncs to establish send and receive buffers, respectively, of appropriate size, up to 16 MB. The buffer size required can be calculated by multiplying the peak bandwidth needed (see Incoming Journal Transfer Rate) by the round trip time, and roughly doubling the product for protocol overhead and future growth. For example, suppose the following conditions apply: the peak journal transfer rate is 60 MB per second, the round trip time between the failover members is 50 milliseconds (0.05 seconds), and journal data compression reduces the transmitted volume to about one third (0.33).
In this case, 60 MB * 0.05 * 0.33 * 2 ≈ 2 MB minimum buffer size. There is little reason to keep the buffer size as small as possible, so an even larger minimum could be tried in this situation without concern.
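A minimal sketch of the same rule of thumb follows, using the figures from this example; the values are assumptions to be replaced with your own measurements:

    def min_buffer_mb(peak_mb_per_sec: float, rtt_sec: float,
                      compression: float = 1.0, headroom: float = 2.0) -> float:
        """Peak bandwidth x round trip time, scaled for compression and
        roughly doubled for protocol overhead and future growth."""
        return peak_mb_per_sec * rtt_sec * compression * headroom

    # 60 MB/s peak rate, 50 ms round trip, compression to about one third:
    print(min_buffer_mb(60, 0.05, compression=0.33))  # 1.98, i.e. ~2 MB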
Journal Data Compression
When creating or editing a mirror, you can select one of three compression settings for journal data to be transmitted from the primary to the backup, and separately for journal data to be transmitted from the primary to async members, as follows:
Choosing Uncompressed is desirable if the vast majority of the volume of database updates consists of data that is already highly compressed or encrypted, where the overall efficacy of compression is expected to be very low. In that case, CPU time may be wasted on compression. Examples include compressed images, other compressed media, or data that is encrypted before it is set into the database (using Caché's data element encryption or another encryption methodology). Use of Caché database encryption or journal encryption is not a factor in selecting compression.
Both compression and SSL encryption introduce some computational overhead that affects both throughput and latency. The overhead introduced by each is similar, but when SSL encryption is used, the addition of compression can actually reduce that overhead and improve performance by reducing the amount of data that needs to be encrypted. The specifics vary by operating system, CPU architecture, and the compressibility of application data. More specifically:
Mirror Member Network Addresses
Mirror members use several network addresses to communicate with each other. These are described in this section and referred to in Sample Mirroring Architecture and Network Configurations. Note that the same network address may be used for some or all of the mirror addresses described here.
While it is optional to configure SSL/TLS for mirror communication between the addresses described here, it is highly recommended, because sensitive data passes between the failover members, and SSL/TLS provides authentication for the ISCAgent, which provides remote access to journal files and can force down the system or manipulate its virtual IP address. For more information, see Securing Mirror Communication with SSL/TLS Security.
Sample Mirroring Architecture and Network Configurations
This section describes and illustrates several sample mirroring architectures and configurations.
Some diagrams depict a disaster recovery (DR) async member and a reporting async member in a variety of locations. One or both may be omitted, multiples of each are allowed, and in general the locations depicted in different diagrams may be combined.
For purposes of illustration, sample IPv4 addresses on the organization's internal network are shown. Assume that subnets are specified by 24 bits (that is, CIDR notation a.b.c.d/24 or netmask 255.255.255.0) so addresses that are depicted on the same subnet will differ only in the fourth dot-delimited portion.
Equivalent DNS names may also be specified in place of IP addresses in the mirror configuration, except for the mirror virtual IP (VIP) address, which must be an IP address.
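Since several of the configurations below hinge on whether two addresses share a subnet, a quick check can help when planning. The following sketch uses Python's standard ipaddress module; the sample addresses are illustrative only:

    import ipaddress

    def same_subnet(addr_a: str, addr_b: str, prefix: int = 24) -> bool:
        # Two members can hold the same VIP only if they are on the same
        # subnet; with a /24 mask, addresses match when only the fourth
        # dot-delimited portion differs.
        net_a = ipaddress.ip_network(f"{addr_a}/{prefix}", strict=False)
        net_b = ipaddress.ip_network(f"{addr_b}/{prefix}", strict=False)
        return net_a == net_b

    print(same_subnet("10.1.1.11", "10.1.1.12"))  # True: same /24 subnet
    print(same_subnet("10.1.1.11", "10.1.2.11"))  # False: different subnets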
Mirroring Configurations within a Single Data Center, Computer Room, or Campus
The following diagrams illustrate a variety of mirroring configurations typical within a data center, computer room, or campus. Each diagram describes the appropriate network topology, and the relationship to the network addresses specified in the mirror configuration. Variations are described, and may be particularly applicable when mirror members reside in multiple locations within the campus.
Simple Failover Pair
This is the simplest mirror configuration. The failover members communicate with each other over a private network while external connections to them are made over a public network, optionally through a mirror virtual IP (VIP). The arbiter is on the external network (as recommended in Locating the Arbiter to Optimize Mirror Availability), but since it is always the failover members that initiate connections to the arbiter, the VIP is not involved in these connections.
The following IP addresses are used in this configuration:
Notes:
  1. A VIP requires both failover members to be on the same subnet.
  2. While not required for mirroring, the separate, private LAN for mirror communication depicted here is recommended for optimal control of network utilization. If such a LAN is not used, the mirror private addresses in the mirror configuration should be changed to use the addresses depicted on green backgrounds. Although the mirror private addresses as shown imply that the members are on the same subnet of this network, this is not required.
Failover Pair with DR and Reporting Homogeneously Connected
This configuration allows maximum functional flexibility for the DR async, allowing it to be promoted to replace a failover member that is down for maintenance or repair, in addition to providing disaster recovery capability. The promoted DR can function fully as backup or primary and participates in the VIP. The failover members and DR are on the same public-facing subnet for the VIP. Their private network addresses, if used, are accessible to one another (if not the same subnet, then by routing). Network topology and latency may place constraints on the physical separation possible between the DR and the two failover members.
The following IP addresses are used in this configuration:
Notes:
  1. All members that may hold or acquire the VIP must be on the same subnet.
  2. A separate, private LAN for mirror communication as depicted here is not required for mirroring, but is recommended for optimal control of network utilization. If such a LAN is not used, the mirror private addresses should be changed in the mirror configuration to use the addresses depicted in green. Although the depicted mirror private addresses imply that the members are on the same subnet of this network, this is not required.
  3. Since reporting members can never become primary, they make only outgoing connections on the mirror private network. Therefore that address need not be separately specified in the mirror configuration.
Failover Pair with DR and Reporting Anywhere on Campus
This configuration allows maximum flexibility in the locations of async members and the network connecting them. Since the DR in this configuration is not assumed to be on the VIP subnet, some alternative means must be used to redirect user connections to the DR during disaster recovery; for example, manually updating the DNS name to point to the DR async’s IP instead of the VIP, or configuring one of the mechanisms discussed in Redirecting Application Connections Following Failover. Additionally, since the DR member is not assumed to have connectivity to the mirror private network (if used), it can be promoted only when no failover member is in operation, and only to become primary.
The following IP addresses are used in this configuration:
Notes:
  1. Any member that is to acquire the VIP must be on the same subnet.
  2. A separate, private LAN for mirror communication is depicted here but not required. If such a LAN is not used, the mirror private addresses should be changed in the mirror configuration to use the addresses depicted in green. Although the depicted mirror private addresses imply that the failover members are on the same subnet of this network, this is not required.
Mirroring for Disaster Recovery and Reporting Only
This configuration uses mirroring to provide DR and/or reporting capabilities only. High availability is provided for the single failover member using OS failover clustering, virtualization HA or other infrastructure-level options as described in the System Failover Strategies chapter of this guide. Since mirroring is not used for automatic failover in this configuration, no VIP is depicted. If desired, a VIP can be configured for use during disaster recovery, but this requires the DR member to be on the same subnet as the failover member. Otherwise, alternative technology or procedures such as those discussed in Redirecting Application Connections Following Failover must be used to redirect user connections to the DR during disaster recovery.
The following IP addresses are used in this configuration:
Notes:
  1. A separate, private LAN for mirror communication is depicted here but not required. If such a LAN is not used, the mirror private addresses should be changed in the mirror configuration to use the addresses depicted in green. Although the depicted mirror private addresses imply that the failover members are on the same subnet of this network, this is not required.
  2. Since reporting members can never become primary, they make only outgoing connections on the mirror private network. Therefore that address need not be separately specified in the mirror configuration.
Mirroring with ECP
This diagram depicts ECP application servers added to a mirrored environment. While increasing complexity, the ECP tier allows horizontal scalability and preserves user sessions across database server failover.
The following IP addresses are used in this configuration:
Notes:
  1. ECP application servers do not use the VIP and will connect to any failover member or promoted DR member that becomes primary, so the VIP is used only for users' direct connections to the primary, if any. A VIP requires both failover members to be on the same subnet. In order for the DR member to acquire the VIP when promoted, it must also reside on the same subnet; if it does not, see Redirecting Application Connections Following Failover.
  2. The private LANs for both ECP and mirror communication shown here, while not required, are recommended for both optimal control of network utilization and ECP data privacy. Configurations with fewer networks are possible by collapsing one of the networks into another. Although the private addresses shown imply that the members are on the same subnets of these networks, the only requirement is that the addresses are routable between one another.
    When considering network layout, bear in mind that all async members require connectivity to the primary on either the primary's mirror private address or its superserver address. Thus in the depicted configuration, an async member that has access only to the green user network will not function.
  3. Since reporting members can never become primary, they make only outgoing connections on the mirror private network. Therefore that address need not be separately specified in the mirror configuration.
Mirroring Configurations For Dual Data Centers and Geographically Separated Disaster Recovery
The following diagrams depict HA and DR configurations utilizing geographical separation for recovery from disasters affecting a data center, campus, or geographic region. Reporting members are omitted from these diagrams for simplicity of illustration, but may be added in either of the locations just as depicted in the single campus configurations.
All of the following configurations require a strategy for redirecting connections to the primary when a member in the other location becomes primary. For geographically separated locations, a VIP may be difficult or impossible to configure because it requires the subnet to be stretched between the two locations. Even if configured, it may not be sufficient, as described in the paragraphs that follow. Alternative technology, hardware, or procedures, such as those discussed in Redirecting Application Connections Following Failover, provide other means of redirecting connections. Whether utilizing a stretched subnet or not, a VIP is extremely useful for automatic failover between two members within a single data center, and its use is depicted in these diagrams for that purpose.
A stretched subnet for VIP is typically useful for internal intranet applications. With it, users and systems with a connection, or VPN access, to the LAN/WAN depicted in green can access the primary in either location over its VIP.
For internet-facing applications, on the other hand, a stretched subnet for VIP does not provide a solution for connectivity in a disaster. The main data center’s DMZ presents the application's internet-facing IP address and/or DNS names as a proxy for the internal mirror VIP; in the event of a disaster, they may need to be externally transferred to the other data center. Solutions involve either sophisticated external routing or one of the techniques described in Redirecting Application Connections Following Failover, any one of which obviates the need for a stretched subnet.
Failover Pair with Local DR and Geographically Separated DR
The local DR async provides contingency for events affecting one or both of the failover members. The local DR can be promoted to replace one of the failover members that is down for maintenance or repair, or to recover from a disaster affecting both failover members. The geographically separated DR is used to recover from disasters affecting the entire main data center or campus.
The following IP addresses are used in this configuration:
Notes:
  1. See preceding discussion of VIP.
  2. When possible, making the mirror private network (if used at all) accessible to the DR data center through the data center interconnect (WAN) offers some additional functional flexibility for member J. This does not require stretching the subnet, only that the traffic on this network is routed between the data centers. In this configuration, when J is promoted, it can connect as backup to the primary in the main data center. If the DR does not have access to the mirror private network, it can be promoted only to function as primary, and that only when no failover member is in operation. The flexibility mentioned here is primarily useful in configurations in which the VIP is stretched and the application is not substantially impacted by latency between the data centers.
Failover Pair with Geographically Separated, Fully Redundant DR Environment
In the event of disaster affecting Data Center 1, two DR members in Data Center 2 are promoted, providing a completely redundant alternate HA environment. The following IP addresses are used in this configuration:
Notes:
  1. See preceding discussion of VIP. This illustration does not assume a stretched subnet; instead, upon transitioning to Data Center 2, the mirror is to be reconfigured to use a different VIP for subsequent automatic failovers within that data center. External technology, hardware, or procedures, as discussed in Redirecting Application Connections Following Failover, are then used to redirect connections to the new VIP address.
  2. When possible, giving both data centers access to the mirror private network (if used) through the data center interconnect (WAN) adds functional flexibility. This does not require stretching the subnet, only that the traffic on this network is routed between the data centers. In that configuration, a promoted DR member in one data center can connect as backup to the primary in the other. This is useful mainly in configurations in which the VIP is stretched and the application is not substantially impacted by latency between the data centers. (If the DR has no access to the mirror private network, it can be promoted only to function as primary, and that only when no failover member is operating.)
  3. In the event that Data Center 1 is completely offline and members J and K are promoted to failover members, a new arbiter can be made available in Data Center 2 and the mirror configuration can be updated with the IP address of the new arbiter. The depicted configuration is not intended to operate for extended periods with two failover members in opposite data centers; if operated in this manner, an arbiter in a separate, third location (the internet in this depiction) is recommended. See Locating the Arbiter to Optimize Mirror Availability for more details.
Geographically Separated Failover Pair
This configuration utilizes two machines in separate locations to achieve both high availability and disaster recovery needs with minimal hardware. Network latency between the failover members is an important consideration, but its impact, if any, depends on the application; see Network Latency Considerations for more information.
Mirroring does not prefer one failover member over another to act as primary, and a failover may occur as a result of any type of outage, even if the problem on the primary turns out to have been transient. Therefore, this configuration is best used with no implicit preference for the primary running in a particular data center.
Use of a VIP may or may not be possible in this configuration for reasons described in the preceding discussion. Since failover between the two data centers happens automatically, any alternative strategy employed must provide rapid and automatic redirection of users to the new primary; strategies that require manual intervention are typically not sufficient.
The following IP addresses are used in this configuration:
Notes:
  1. The arbiter is best placed in a third location in this configuration. See Locating the Arbiter to Optimize Mirror Availability for more details.
  2. A private network for mirror communication running over a data center interconnect (WAN) is depicted here but not required.
Redirecting Application Connections Following Failover or Disaster Recovery
When the backup failover member becomes primary through automatic failover or a DR async is manually promoted to primary as part of disaster recovery, some mechanism for redirecting application connections to the new primary is required. There are numerous ways to accomplish this, some of which are discussed in detail in this chapter. One solution may apply to both automatic failover and DR promotion, or solutions may be combined, for example a mirror VIP for automatic failover and DNS update for DR promotion.
Built-in Mechanisms
The following mechanisms can be included in the mirror configuration, as shown in Sample Mirroring Architecture and Network Configurations, to address application redirection:
External Technologies
The following mechanisms can be implemented in conjunction with mirroring to address application redirection:
Planning a Mirror Virtual IP (VIP)
As described in Built-in Mechanisms, when a mirror VIP is in use and a member becomes primary, the VIP is reassigned to the new primary, which allows all external clients and connections to interact with a single static IP regardless of which failover member is currently serving as primary.
During the failover process, connected clients that experience a network disconnect are able to reconnect once the backup has become primary. If a VIP is configured, the backup completes the failover only if it is successfully able to assign the VIP; otherwise, the failover process is aborted and the mirror requires manual intervention.
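On UNIX®/Linux platforms, acquiring a VIP amounts to binding the address to a network interface on the new primary and advertising the change; Caché performs the equivalent steps automatically during failover. The sketch below illustrates only the underlying mechanism, using assumed address and interface values and the Linux ip and arping utilities (it requires root privileges):

    import subprocess

    VIP, PREFIX, IFACE = "10.1.1.100", 24, "eth0"   # assumed, illustrative values

    def acquire_vip():
        # Bind the virtual IP to an interface on the new primary...
        subprocess.run(["ip", "addr", "add", f"{VIP}/{PREFIX}", "dev", IFACE],
                       check=True)
        # ...then send gratuitous (unsolicited) ARP replies so that hosts on
        # the subnet update their ARP caches and reach the new VIP holder.
        subprocess.run(["arping", "-U", "-c", "3", "-I", IFACE, VIP], check=True)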
In preparing to set up a mirror VIP, consider the following:
Important:
If you are configuring a mirror VIP on a Windows Vista, Windows 7, or Windows Server 2008 system, and clients will connect to the failover members from different subnets, you must install and start the NDISISC driver after configuring the VIP. To do so, use the following procedure:
  1. Within Windows Control Panel, open Network and Sharing Center, then select Change adapter settings to display the Network Connections panel.
  2. Right-click the network adapter (interface) matching the interface name configured for the mirror VIP and select Properties.
  3. On the Properties dialog, click Install... then select Protocol and click Add....
  4. On the next dialog, select Have Disk, then browse to and select the file install-dir\ndis\ndis.inf. Click OK and confirm that you want to install the driver.
  5. As an Administrator, issue the command sc start ndisisc on the command line to start the NDISISC driver.
Contact the InterSystems Worldwide Response Center (WRC) with any questions about installing the NDIS driver.
If one or more of a mirror’s members is a non-root Caché instance on a UNIX® or Linux system, as described in Caché Non-root Installation in the chapter “Installing Caché on UNIX® and Linux” in the Caché Installation Guide, a mirror VIP cannot be used.
If one or more of a mirror’s members is a Caché instance running on an Oracle Solaris non-global zone with ip-type=shared, a mirror VIP cannot be used.
Mirroring in a Virtualized Environment
The use of mirroring in a virtualized environment, in which the Caché instances constituting a mirror are installed on virtual hosts, creates a hybrid high availability solution combining the benefits of mirroring with those of virtualization. While the mirror provides the immediate response to planned or unplanned outages through automatic failover, virtualization HA software automatically restarts the virtual machine hosting a mirror member following an unplanned machine or OS outage. This allows the failed member to quickly rejoin the mirror to act as backup (or to take over as primary if necessary).
When a mirror is configured in a virtualized environment, the following recommendations apply:
Note:
For guidelines concerning configuration, system sizing and capacity planning in general when deploying Caché 2015.1 and later in a VMware ESXi 5.5 and later environment, written by an InterSystems senior technology architect, see InterSystems Data Platforms and performance – Part 9 Caché VMware Best Practice Guide on InterSystems Developer Community.
Limiting Access to the Backup Failover Member
While the system hosting the backup failover member of a mirror may have unused resources or capacity, and you may be tempted to run read-only queries against its mirrored databases, InterSystems recommends the best practice of dedicating the host to its role as backup mirror member only. Any mirror-related or non-mirror use of the backup can have the following effects:
For these reasons, an async member, not the backup, should be used if user activity must be offloaded from the primary.
Installing Multiple Mirror Members on a Single Host
The Caché instances that make up a mirror are typically installed on separate physical or virtual hosts, but this is not a requirement. Assuming the capacity of the system is sufficient to handle the resource loads involved without incurring reduced performance, multiple mirror members can be installed on the same host; individual circumstances will determine whether this is feasible, and how many mirror members can be cohosted.
When cohosting multiple failover members, bear in mind that failover mirroring assumes that the members are coequal; there is no preferred primary member. For this reason, the best practice when placing failover member instances on separate hosts is to make the hosts as similar as possible and roughly equal in capacity. Cohosting failover members has the potential to go outside the bounds of this model. For example, if five mirrors are created on five separate hosts and then five Caché instances on one host are added to the mirrors as second failover members, the mirrors may initially operate with primaries on separate hosts and all backups cohosted on a single system. But if there are two simultaneous or nearly simultaneous outages resulting in failover, the single system is now hosting two primaries and three backups, which may be too large a load for it to handle with adequate performance.
When cohosting multiple mirror members, ensure that each mirror uses a unique set of ports on each machine (see Mirror Member Network Addresses), and that all members of a given mirror, including those on other hosts, use the same ports. For example, two primaries running on two separate hosts might both use port 1972, but if they are both replaced by cohosted DR asyncs, as described in the previous item, the new primaries cannot both do so. If one primary uses port 1972 and the other port 1973, and these same ports are configured on the corresponding asyncs, the asyncs are ready for simultaneous promotion, and when that happens clients can access each mirror using the same ports as before the outages. In addition, each mirror must have its own VIP.
When multiple Caché instances belonging to one or more mirrors are cohosted, they share a single ISCAgent.
The cohosting of mirror members has no impact on the network location of the arbiter for each mirror, as described in Locating the Arbiter to Optimize Mirror Availability. The mirrors involved can share an arbiter or use separate arbiters, as long as the failover members and arbiter(s) are appropriately located.
Configuring Mirroring
This section provides information and procedures for setting up, configuring and managing mirrors and mirror members.
Mirror Configuration Guidelines
In order to provide a robust, economical HA solution, mirroring is designed to be adaptable to a wide range of system configurations and architectures. However, InterSystems recommends that you adhere to the following general configuration guidelines:
Installing the Arbiter
To extend automatic failover to the widest possible range of outage scenarios, as described in Automatic Failover Mechanics, InterSystems recommends that you configure an arbiter for each mirror. As detailed in Locating the Arbiter to Optimize Mirror Availability, the recommended network location for the arbiter depends on the locations of the failover members. A single system can be configured as arbiter for multiple mirrors, provided its location is appropriate for each.
To act as arbiter, a system must have a running ISCAgent process of version 2015.1 or later. Because the ISCAgent is installed with Caché, any system that hosts one or more instances of Caché version 2015.1 or later meets this requirement and can be configured as arbiter without further preparation; however, a system hosting one or more failover or DR async members of a mirror should not be configured as arbiter for that mirror.
Systems that do not host a Caché instance of version 2015.1 or later (excluding OpenVMS systems) can be prepared to act as arbiter by installing the ISCAgent using a kit for this purpose; if the system hosts a pre-2015.1 Caché instance, the kit upgrades the existing ISCAgent. To prepare such a system, download the ISCAgent installation kit appropriate to your arbiter system’s platform from InterSystems and then, to install or upgrade the ISCAgent:
Note:
There are ISCAgent installation kits for all platforms on which Caché is supported, with the exception of OpenVMS; see InterSystems Supported Platforms for a list of supported platforms. To prepare an OpenVMS system to act as arbiter, you must install a Caché instance of version 2015.1 or later.
Ensure that the ISCAgent process on the arbiter system is configured to start at system startup; see Starting and Stopping the ISCAgent for more information.
Starting the ISCAgent
A Caché instance cannot be added to a mirror as a failover or DR async member unless the ISCAgent process is running on its host system. The ISCAgent must be configured to start automatically at system startup; see Starting and Stopping the ISCAgent for more information.
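As an illustration only, on many UNIX®/Linux platforms the agent can be started and verified from a root shell along the following lines; the script path shown is an assumption based on typical installations and varies by platform:

   # start the ISCAgent now (path is platform-dependent)
   /etc/init.d/ISCAgent start
   # confirm that the agent process is running
   ps -ef | grep ISCAgent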
Securing Mirror Communication with SSL/TLS Security
To provide security within a mirror, you can configure its members to use SSL/TLS when communicating with each other. If you require SSL/TLS when creating the mirror, all members must use SSL/TLS for all communication among them.
See Creating a Mirror for information about creating a mirror with SSL/TLS security; see Editing or Removing a Failover Member for information about adding SSL/TLS security to an existing mirror.
For a single, comprehensive step-by-step guide to creating a mirror with SSL/TLS security, written by an InterSystems Senior Support Specialist, see Creating SSL-Enabled Mirror Using Public Key Infrastructure (PKI) on InterSystems Developer Community.
Important:
Use of SSL/TLS with mirroring is highly recommended. Disabling SSL/TLS for a mirror is strongly discouraged.
The use of SSL/TLS for mirror communication by a mirror member requires proper SSL/TLS setup on the system hosting the mirror member instance; see Creating and Editing SSL/TLS Configurations for a Mirror in the “Using SSL/TLS with Caché” chapter of the Caché Security Administration Guide for more information.
The use of encrypted journal files in mirroring also requires preparation; for detailed information about journal encryption, see Activating Journal Encryption in a Mirror in this chapter and the Managed Key Encryption chapter of the Caché Security Administration Guide.
Using the ^MIRROR Routine
Most mirroring configuration, management and status operations are available in the management portal and also in the ^MIRROR routine, which is executed in the %SYS namespace. However, some operations are available only in the ^MIRROR routine, including forcing the backup failover member to become the primary failover member (see Unplanned Outage of Primary Without Automatic Failover). The procedures provided in this chapter describe the management portal operation if available, but the ^MIRROR option providing the equivalent operation is always noted.
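For example, to run the routine from a Terminal session, switch to the %SYS namespace and invoke it; the menu options displayed depend on the instance’s current mirror role:

   USER>ZN "%SYS"
   %SYS>DO ^MIRROR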
Creating a Mirror
Creating a mirror involves configuring the primary failover member, typically a backup failover member (although a backup is not required), and optionally one or more async members. After the mirror is created, you can add databases to the mirror.
Important:
Before you can add a Caché instance to a mirror as a failover or async member, you must ensure that the ISCAgent process has been started, as described in Starting and Stopping the ISCAgent in this chapter.
The procedure for adding backup and async members requires an additional step if, as recommended by InterSystems, you configure the mirror to use SSL/TLS (see Securing Mirror Communication with SSL/TLS Security). When this is the case, each new member must be approved on the primary before joining the mirror.
To create and configure a mirror, use the following procedures:
After you have created the mirror and configured the failover members and optionally one or more async members, add databases to the mirror using the procedures in the Adding Databases to a Mirror section of this chapter.
Create a Mirror and Configure the First Failover Member
The following procedure describes how to create a mirror and configure the first failover member.
  1. Navigate to the [Home] > [Configuration] > [Create Mirror] page of the management portal on the first failover member and click Create a Mirror. If the option is not active, mirroring has not been enabled; first click Enable Mirror Service, then select the Service Enabled check box and click Save, then select the Create a Mirror option.
  2. On the [Home] > [Configuration] > [Create Mirror] page, enter the following information in the Mirror Information section:
    1. Mirror Name — Enter a name for the mirror.
      Note:
      Valid names must be from 1 to 15 alphanumeric characters; lowercase letters are automatically replaced with uppercase equivalents.
    2. Require SSL/TLS — Specify whether or not you want to require SSL/TLS security for all communication within the mirror (as recommended) by selecting or clearing the check box. If you select Require SSL/TLS and the instance does not already have a valid SSL/TLS configuration for mirroring, before completing the procedure you must click the Set up SSL/TLS link and create the needed SSL/TLS configuration on this member. (Instructions for creating the SSL/TLS configuration are contained in Creating and Editing SSL/TLS Configurations for a Mirror in the “Using SSL/TLS with Caché” chapter of the Caché Security Administration Guide. You can also cancel the Create Mirror procedure and navigate to the SSL/TLS Configurations page ([System Administration] > [Security] > [SSL/TLS Configurations]) from the portal home page.) If the instance does have a valid SSL/TLS configuration for mirroring, the link is Edit SSL/TLS instead, and you need not use it when selecting Require SSL/TLS unless you want to modify that configuration.
    3. Use Arbiter — Specify whether or not you want to configure an arbiter (as recommended) to enable automatic failover for the widest possible range of outage scenarios, as described in Automatic Failover Mechanics. If you select Use Arbiter, you must supply the hostname or IP address of the system you want to configure as arbiter and the port used by its ISCAgent process (2188 by default). See Locating the Arbiter to Optimize Mirror Availability and Installing the Arbiter for additional information about the arbiter.
    4. Use Virtual IP — Specify whether or not you want to use a Virtual IP address by selecting or clearing the check box. If you select Use Virtual IP, you are prompted for an IP address, Classless Inter-Domain Routing (CIDR) mask, and network interface.
      Important:
      See Configuring a Mirror Virtual IP (VIP) for requirements and important considerations before configuring a VIP.
    5. Compression Mode for Failover Members, Compression Mode for Async Members — Specify whether to compress journal data before transmission from the primary to the backup and to async members, respectively; see Journal Data Compression for more information. The default setting for both is System Selected, which optimizes for response time between the failover members and for network utilization between the primary and asyncs.
  3. Enter the following information in the Mirror Failover Information Section:
  4. Click Advanced Settings to display and edit additional mirror settings, as follows:
  5. Click Save.
Note:
You can also use the ^MIRROR routine (see Using the ^MIRROR Routine) to create a mirror. When you execute ^MIRROR on a Caché instance without an existing mirror configuration, the Enable Mirror Service option is available if mirroring is not yet enabled. Once mirroring is enabled, the Create a Mirror option is available and provides an alternative means of creating a mirror and configuring the instance as the primary failover member. The SYS.Mirror.CreateNewMirrorSet() mirroring API method can also be used for this purpose.
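As a minimal sketch of the API alternative (the argument list shown is an assumption; consult the SYS.Mirror class documentation for the exact signature and available options):

   ZN "%SYS"
   ; create a mirror with this instance as its first failover member
   ; (argument list is illustrative, not the documented signature)
   SET sc = ##class(SYS.Mirror).CreateNewMirrorSet("MIRRORA")
   IF 'sc DO $SYSTEM.Status.DisplayError(sc)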
Configure the Second Failover Member
Follow this procedure to configure the second failover member of the mirror.
  1. Navigate to [Home] > [Configuration] > [Join as Failover] in the management portal on the second failover member. If the Join as Failover option is not available, mirroring has not been enabled; first click Enable Mirror Service, then select the Service Enabled check box and click Save, then select the Join as Failover option.
  2. On the Join Mirror as Failover page, in the Mirror Information section, enter the mirror name you specified when you configured the first failover member.
  3. Enter the following information in the Other Mirror Failover Member’s Info section:
  4. Click Next to retrieve and display information about the mirror and the first failover member. In the Mirror Failover Member Information section:
  5. Click Advanced Settings to display the Quality of Service Timeout setting you specified when you configured the first failover member; this setting cannot be changed on the second failover member.
  6. Click Save.
If you configured the mirror to require SSL/TLS, you are reminded that you must complete the process of adding the second failover member to the mirror by authorizing the second failover member on the first failover member, as described in the following section.
Note:
You can also use the ^MIRROR routine (see Using the ^MIRROR Routine) to configure the second failover member. When you execute ^MIRROR on a Caché instance without an existing mirroring configuration, the Enable Mirror Service option is available if mirroring is not yet enabled. Once mirroring is enabled, the Join Mirror as Failover Member option is available and provides an alternative means of both configuring the backup failover member and adding it to the mirror. The SYS.Mirror.JoinMirrorAsFailoverMember() mirroring API method can also be used for this purpose.
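A corresponding API sketch, with the same caveat that the argument list is an assumption to be checked against the SYS.Mirror class reference:

   ZN "%SYS"
   ; join the existing mirror as the second failover member
   ; (argument list is illustrative)
   SET sc = ##class(SYS.Mirror).JoinMirrorAsFailoverMember("MIRRORA")
   IF 'sc DO $SYSTEM.Status.DisplayError(sc)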
Authorize the Second Failover Member or Async (SSL/TLS Mirrors Only)
If you configured the mirror to require SSL/TLS, an additional step is needed after you configure the second failover member or configure an async member. On the system on which you created the mirror and configured the first failover member, you must authorize the new mirror member, as follows:
  1. Navigate to the [Home] > [Configuration] > [Edit Mirror] page of the Management Portal.
  2. At the bottom of the page, a Pending New Members area lists members that have been added to the mirror. Select the members you want to authorize, click Authorize, and confirm. (The SSL certificate of the second failover member is automatically verified when the member is added.)
  3. The information in the Mirror Member Information section of the Edit Mirror page now includes the members you added. (See Mirror Member Network Addresses for information about the addresses displayed in this list.)
If you have added an async of a Caché version earlier than 2015.2, however, you must instead use the Add New Async Member button on the Edit Mirror page on the primary; see Editing or Removing a Failover Member for more information. You can also use the Add Async Member Not In Pending List option on the Mirror Configuration menu of the ^MIRROR routine to authorize an async member of a Caché version earlier than 2015.2. (If one failover member is Caché 2015.2 or later, the other must be as well, as failover members must be of the same version.)
Note:
The Authorize/Reject Pending New Members option on the Mirror Configuration menu of the ^MIRROR routine on the first failover member can also be used to authorize new failover or async members in a mirror configured to require SSL/TLS.
The SYS.Mirror.AddFailoverMember() mirroring API method can be used to authorize the second failover member in a mirror configured to require SSL/TLS, and the Config.MapMirrors.Create() API method can be used to create an authorized member (failover or backup).
For information about authorizing X.509 DN updates on members of a mirror requiring SSL/TLS (for example when a member’s certificate is replaced), see Authorizing X.509 DN Updates (SSL/TLS Only).
Review Failover Member Status in the Mirror Monitor
As described in Monitoring Mirrors, you can use the Mirror Monitor to see information about the failover members in a mirror, including their current status (role) in the mirror. Use the mirror monitor to confirm that your mirror and its failover members are now set up as intended, as follows:
  1. On the first failover member you configured, display the Mirror Monitor by navigating to the [Home] > [Mirror Monitor] page.
  2. In the Mirror Failover Member Information area, the mirror member names and network address of the two failover members are listed.
  3. The Mirror Member Status area should show the first failover member you configured as Primary in the Status column, and the second as Backup. As discussed in Mirror Member Journal Transfer and Dejournaling Status, the Journal Transfer status of the backup should be Active, and its Dejournaling status should be Caught up.
  4. In the Arbiter Connection Status area, if you configured an arbiter, its network address and agent port number are displayed. Failover Mode should be Arbiter Controlled and Connection Status should be Both failover members are connected to the arbiter; if this is not the case, the arbiter may not have been correctly installed, its ISCAgent process may not be running, or the network address or port number you supplied may be incorrect. A network problem preventing contact with the arbiter by one or both failover members could also cause the Failover Mode to be Agent Controlled.
The same information is displayed in the Mirror Monitor on the backup failover member.
Configure Async Mirror Members
For each async member you want to configure, use the following procedure. A mirror with a failover pair can include up to 14 reporting or disaster recovery (DR) async members. A single Caché instance can be a reporting async member of up to 10 mirrors, but an instance can be a DR async in one mirror only. Once you have configured an instance as either a read-only or a read-write reporting async, it can be added to other mirrors only as a reporting async member of that type. (A reporting async member’s type can be changed for all mirrors to which it belongs, however, as described in Editing the Mirror Configuration on an Async Member.)
Note:
The procedure for adding an instance to a mirror as a reporting async member is the same whether you are using the Join as Async option as described here or the Join a Mirror button on the Edit Async Configurations page as described in Editing the Mirror Configuration on an Async Member, except that the Join a Mirror button on the Edit Async Configurations page can be used only on reporting async members, as a DR async can belong to one mirror only.
  1. Navigate to [Home] > [Configuration] > [Join as Async] in the management portal; if the Join as Async option is not available, choose Enable Mirror Service and enable the service.
  2. On the Join Mirror as Async page, enter the mirror name you specified when you created the mirror at the Mirror Name prompt.
  3. Select either the primary or the backup failover member, and in the Mirror Failover Member’s Info section, enter the information for the member you selected at each of the following prompts:
    1. Agent Address on Failover System — Enter the Superserver Address you specified when you configured the selected failover member.
    2. Mirror Agent Port — Enter the ISCAgent port you specified for the selected failover member.
    3. Caché Instance Name — Enter the name of the Caché instance you configured as the selected failover member.
  4. Click Next to verify the failover member’s information and move to the Async Member Information section. In this section, enter the following information:
    1. Async Member Name — Specify a name for the async member you are configuring on this system (defaults to a combination of the system host name and the Caché instance name). Mirror member names can contain alphanumeric characters, underscores, and hyphens.
      Note:
      The mirror member name cannot be changed, and will therefore be used when a reporting async member joins additional mirrors in the future.
    2. Async Member Address — Enter the IP address or host name that external systems can use to communicate with this async member.
      Note:
      The Async Member Address you provide becomes the async member’s superserver address and mirror private address (see Mirror Member Network Addresses). If you want these to be different, for example when you want to place a DR async’s mirror private address on the mirror private network while leaving its superserver address on the external network, you can update the async’s addresses on the primary after adding it to the mirror; see Updating Mirror Member Network Addresses for more information.
    3. Agent Address — Enter the address that other mirror members attempting to contact this member’s ISCAgent will try first; see Mirror Member Network Addresses and Updating Mirror Member Network Addresses.
    4. Async Member System Type — Select one of the following types from the drop-down list. A single Caché instance can be a reporting async member of multiple mirrors, but can be a DR async member of only one mirror.
      • Disaster Recovery (DR) — This option is for a system on which read-only copies of all of the mirrored databases in a single mirror are maintained, making it possible to promote the DR async member to failover member when one of the failover members fails. See Promoting a DR Async Member to Failover Member in this chapter for more information about DR async promotion.
        Important:
        When the mirror is configured to use VIP, a disaster recovery async member must have direct TCP/IP connectivity to the failover members; see Configuring a Mirror Virtual IP (VIP) for more information.
      • Read-Only Reporting — This option is used to maintain read-only copies of the mirrored databases (or a subset of the mirrored databases) from one or more mirrors for purposes of enterprise reporting and data mining in which there is no requirement for data to be modified or added.
      • Read-Write Reporting — This option is used to maintain read-write copies of the mirrored databases (or a subset of the mirrored databases) from one or more mirrors as data sources for reporting/business intelligence operations in which data modification or the addition of data during analysis must be enabled.
    5. Set up SSL/TLS — If the mirror requires SSL/TLS and the instance does not already have a valid SSL/TLS configuration for mirroring, an error message and this link are included. Before completing the procedure, you must click the link and create the needed SSL/TLS configuration on this member. (Instructions for creating the SSL/TLS configuration are contained in Creating and Editing SSL/TLS Configurations for a Mirror in the “Using SSL/TLS with Caché” chapter of the Caché Security Administration Guide. You can also cancel the Join as Async procedure and navigate to [System Administration] > [Security] > [SSL/TLS Configurations] from the portal home page.)
    6. Edit SSL/TLS — If the mirror requires SSL/TLS and the instance does have a valid SSL/TLS configuration for mirroring, this link is displayed instead of Set up SSL/TLS; you can use it to edit the existing SSL/TLS configuration if you wish. The instance’s X.509 Distinguished Name is also displayed.
  5. Click Save.
If you configured the mirror to require SSL/TLS, you are reminded that you must complete the process of adding the async member to the mirror by authorizing the async member on the first failover member, as described in Authorize the Second Failover Member or Async (SSL/TLS Only).
Note:
You can also use the ^MIRROR routine (see Using the ^MIRROR Routine) to configure async mirror members. When you execute ^MIRROR on a Caché instance for which mirroring is enabled, the Join Mirror as Async Member (or Join Another Mirror as Async Member) option on the Mirror Configuration menu is available and provides an alternative means of configuring an async member and adding it to the mirror. The SYS.Mirror.JoinMirrorAsAsyncMember() mirroring API method can also be used to configure an async member.
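For example, a hedged sketch of the API call (the argument list is an assumption; consult the SYS.Mirror class reference for the documented signature):

   ZN "%SYS"
   ; join mirror MIRRORA as an async member (arguments are illustrative)
   SET sc = ##class(SYS.Mirror).JoinMirrorAsAsyncMember("MIRRORA")
   IF 'sc DO $SYSTEM.Status.DisplayError(sc)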
After an instance has been added to one mirror as an async member using the procedure described in this section, you can use the Join a Mirror button on the Edit Async page (see Editing the Mirror Configuration on an Async Member) to add it to additional mirrors, but as the same type of async only.
Adding Databases to a Mirror
Only a local database on the current primary failover member can be added to a mirror; it is added on the primary first, then on the backup, and then on any desired async members. All mirrored databases must be journaled.
You must add the same set of mirrored databases to both the primary and backup failover members, as well as to any DR async members; which mirrored databases you add to reporting async members depends on your reporting needs. The namespaces and global/routine/package mappings associated with a mirrored database must be the same on all mirror members, including all async members on which the database exists. The mirrored databases on the backup failover member must be mounted and caught up (see Activating and Catching up Mirrored Databases) to be able to take over as the primary in the event of a failover; the mirrored databases on a DR async member must be mounted and caught up to make it suitable for promotion to failover member.
The procedure for creating a mirrored database (that is, adding a new database containing no data) is different from that for adding an existing database to the mirror. Global operations on a database created as a mirrored database are recorded in mirror journal files from the beginning, and the mirror therefore has access to all the data it needs to synchronize the database across mirror members. But global operations on an existing database before it was added to a mirror are contained in non-mirror journal files, to which the mirror does not have access. For this reason, an existing database must be backed up on the primary failover member after it is added to the mirror and restored on the backup failover member and on any async members on which it is to be located. Once this is done, you must activate and catch up the database to bring it up to date with the primary.
Mirrored Database Considerations
Bear the following points in mind when creating and adding mirrored databases:
Create a Mirrored Database
To create a mirrored database, follow this procedure.
Note:
You can also use the ^DATABASE routine to create mirrored databases; see Creating a Mirrored Database Using the ^DATABASE Routine in this chapter.
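For example, run the routine in the %SYS namespace and follow its prompts to create the database as mirrored:

   %SYS>DO ^DATABASE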
  1. On the current primary failover member, navigate to the [Home] > [Configuration] > [System Configuration] > [Local Databases] page of the management portal, and click the Create New Database button.
  2. Follow the procedure in the Create a Local Database section of the “Configuring Caché” chapter of the Caché System Administration Guide. On the second panel, select Yes for Mirrored database? and enter a name for the database within the mirror; the default is the local database name you provided. The leading character of the mirror database name must be alphabetic or an underscore, and the rest must be alphanumeric characters, hyphens and underscores. Mirror database names are case-insensitive, thus two names cannot differ only in case; if you enter a mirror database name that is already included in the mirror, the new database cannot be added to the mirror and must be deleted. (Names of mirrored databases created under earlier versions of Caché may be stored in lowercase or mixed case, but the addition of databases with duplicate uppercase names is still precluded.)
    On an async member that belongs to more than one mirror, you must also select the mirror the database will belong to.
    Note:
    When you select Yes for Mirrored database, Journal globals is automatically locked to Yes.
  3. Confirm the procedure to create the database and add it to the mirror on the primary.
  4. On the backup failover member, and on each async member to which you want to add the mirrored database, follow the previous three steps, taking care to enter the correct mirror database name from the primary as the mirror database name on each of the other members. (The local database names do not have to match.)
    Note:
    If, on a non-primary member, you attempt to add a database that was not created as a mirrored database on the primary but was instead added to the mirror after it was created, an error message notes this and you cannot complete the operation.
Important:
If the first mirror journal file for a mirrored database has been purged from the primary, the database can no longer be created as a mirrored database on another member; instead, you must make a backup on the primary and restore it on the backup or async, as described in Add an Existing Database to the Mirror. For this reason, it is best to create the database on the backup and async members as soon as possible after creating it on the primary. (For information about when mirror journal files are purged on the primary, see Purge Journal Files in the “Journaling” chapter of the Caché Data Integrity Guide.)
Add an Existing Database to the Mirror
Use the procedure that follows to add one or more existing databases to a mirror.
Note:
If a backup failover member or async member has a different endianness than the primary failover member, you cannot use this procedure, but must instead use the procedure that follows it for copying the database’s CACHE.DAT file, inserting a step to convert the endianness of the database; see Member Endianness Considerations for more information.
The SYS.Mirror.AddDatabase() mirroring API method provides an alternative means of adding existing databases to a mirror.
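A minimal sketch of that API call (the argument shown, a database directory, is an assumption; check the SYS.Mirror class reference for the documented signature):

   ZN "%SYS"
   ; add an existing local database to the mirror on the primary
   ; (argument is illustrative)
   SET sc = ##class(SYS.Mirror).AddDatabase("/cache/mgr/mydb/")
   IF 'sc DO $SYSTEM.Status.DisplayError(sc)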
  1. On the current primary failover member, navigate to the [Home] > [Configuration] > [System Configuration] > [Local Databases] page of the management portal, and click the Add to Mirror button.
  2. From the listed databases (non-system databases not already in the mirror) select those you want to add and click Add. You must enter a name for each database within the mirror; the default is the local database name you provided. The leading character of the mirror database name must be alphabetic or an underscore, and the rest must be alphanumeric characters, hyphens and underscores. Mirror database names are case-insensitive, thus two names cannot differ only in case; if you enter a mirror database name that is already included in the mirror, the operation fails. (Names of mirrored databases created under earlier versions of Caché may be stored in lowercase or mixed case, but the addition of databases with duplicate uppercase names is still precluded.)
    To run the task in the background, select Run add in the background; if you select five or more databases, the task is automatically run in the background. Confirm the procedure to add the selected databases to the mirror on the primary.
    You can also add an individual database to the mirror by clicking its name to edit its properties and clicking the Add to Mirror <mirrorname> link, then clicking Add and confirming the procedure. (If journaling is not enabled on the database, Databases must be journaled to be mirrored is displayed in place of this link; to enable it, select Yes from the Global Journal State drop-down list.) Alternatively, the Add Mirrored Database(s) option on the Mirror Management menu of the ^MIRROR routine also lets you add an individual database. In either case, you can accept the default of a mirror database name the same as the local name, or enter a different one.
    Note:
    If a backup failover member or async member has a different endianness than the primary failover member, you must use the procedure described in Member Endianness Considerations to add the database to the backup or async member after adding it to the primary, rather than adding it on that member as described in the following steps.
  3. Once the database has been added to the mirror, back it up on the primary failover member. Review Backup Strategies, Restoring from Backup, and Mirrored Database Considerations in the “Backup and Restore” chapter of the Caché Data Integrity Guide for information about different backup techniques and the corresponding restore procedures.
    Important:
    If the database you are copying is encrypted on the primary, the key with which it is encrypted must also be activated on the backup (and asyncs, if any), or the database must be converted to use a key that is activated on the destination system using the cvencrypt utility (as described in the Converting an Encrypted Database to Use a New Key section of the “Using the cvencrypt Utility” chapter of the Caché Security Administration Guide).
    Ensuring that a mirrored database is synchronized after it is restored from backup (see the following step) requires that the journal files from the time of the backup on the primary failover member are available and online; if the relevant journal files have been purged, for example, you must make and restore a more up-to-date backup. For general information about restoring mirror journal files, see Restoring Mirror Journal Files in the “Journaling” chapter of the Caché Data Integrity Guide; for information about purging mirror journal files, see Purge Journal Files in the same chapter.
  4. On the backup failover member and each connected async member, do the following:
    1. If a local database with the same local name and database directory as the mirrored database you just added on the primary failover member does not already exist, create it.
    2. Restore the backup you made of the mirrored database on the primary, overwriting the existing database. The procedure for this depends on the restore method you are using, as follows:
      • Caché online backup restore (^DBREST routine) — This routine automatically recognizes, activates and catches up a mirrored database on the backup and async members. For more information see Mirrored Database Considerations in the “Backup and Restore” chapter of the Caché Data Integrity Guide.
        Note:
        When a mirrored database is restored on a non-primary member, the data needed to begin the automatic synchronization process may not have been sent yet. If the required data does not arrive within 60 seconds, the process begins anyway; however, the database may not catch up if the data does not arrive before it is required, in which case a message identifying the database(s) that had the problem is logged in the cconsole.log file. (During database creation this process would affect only one database, but it also applies to catching up in other situations in which multiple databases are involved.)
      • External backup restore or cold (offline) backup restore — Both of these methods require that you manually activate and catch up the mirrored databases after they are restored and mounted on the backup failover member or async member, as described in Activating and Catching up Mirrored Databases, immediately following.
As an alternative to the previous procedure, after adding an existing database to the mirror on the primary, you can copy the database’s CACHE.DAT file from the primary to the backup and async members instead of backing up and restoring the database. To do so, use this procedure:
  1. Ensure that there is a placeholder target database on the backup and each async member.
  2. On both failover members and each async member, make sure the source and target databases are not mounted (see Maintaining Local Databases in the “Managing Caché” chapter of the Caché System Administration Guide).
  3. Copy the mirrored CACHE.DAT file from the primary failover member to the database directory of the placeholder target database on the backup and each async member, overwriting the existing CACHE.DAT file.
    Note:
    If the database you are copying is encrypted on the primary, the key with which it is encrypted must also be activated on the backup (and asyncs, if any), or the database must be converted to use a key that is activated on the destination system using the cvencrypt utility (as described in the Converting an Encrypted Database to Use a New Key section of the “Using the cvencrypt Utility” chapter of the Caché Security Administration Guide).
  4. Mount the database on all members.
  5. Activate and catch up the mirrored databases on the backup failover member and async member(s) as described in Activating and Catching up Mirrored Databases in this chapter.
Note:
When you are adding an existing mirrored database to an async member, you can back up the database on (or copy the CACHE.DAT file from) the backup failover member or another async member, assuming it is fully caught up, instead of the primary. This may be more convenient, for example if the primary is in a different data center than the async on which you will be restoring the backup. Do not use a member as the source, however, unless you have a high degree of confidence in the consistency of its databases.
Activating and Catching Up Mirrored Databases
You can activate and/or catch up mirrored databases on the backup failover member and async members using the Mirror Monitor.
As noted in Add an Existing Database to the Mirror, a newly added mirrored database containing data can be automatically synchronized with the primary through use of the ^DBREST routine to restore the backup from the primary failover member. If some other method is used, the database must be activated and caught up on the backup failover member and async members.
To activate and catch up mirrored databases, do the following on the backup failover member and async members:
  1. Navigate to the [Home] > [Mirror Monitor] page.
  2. On an async member, click the Details link for the mirror containing the database(s) you want to take action on, if necessary.
  3. The Mirrored databases list shows the status of each database, as described in Using the Mirror Monitor. Among other possible statuses, Needs Catchup indicates that the Catchup operation is needed, Needs Activation indicates that both the Activate and Catchup operations are needed, and Catchup Running shows that the Catchup operation is currently running on the database.
  4. Select the Activate or Catchup link to perform an operation on a single database, or select Activate or Catchup from the Select an action drop-down and click Go to open a dialog in which you can select multiple databases (from a list of all those for which the action is appropriate) and apply the action to all of them at once. When you do this, the Activate and Catchup tasks are run in the background. When you select Catchup, databases with both Needs Activation and Needs Catchup status are displayed; both Activate and Catchup are applied to any Needs Activation databases you select.
You can also use the Mirrored databases list to mount or dismount one or more mirrored databases, or to remove one or more databases from the mirror as described in Removing Mirrored Databases from a Mirror.
Note:
The Activate or Catchup mirrored database(s) option on the Mirror Management menu in the ^MIRROR routine and the SYS.Mirror.ActivateMirroredDatabase() and SYS.Mirror.CatchupDB() mirroring API methods provide alternative means of activating/catching up mirrored databases.
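A combined sketch of these two API calls (the directory argument and exact signatures are assumptions; verify against the SYS.Mirror class reference):

   ZN "%SYS"
   ; activate, then catch up, a restored mirrored database on this member
   SET sc = ##class(SYS.Mirror).ActivateMirroredDatabase("/cache/mgr/mydb/")
   IF sc SET sc = ##class(SYS.Mirror).CatchupDB("/cache/mgr/mydb/")
   IF 'sc DO $SYSTEM.Status.DisplayError(sc)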
When you use the Mirrored databases list, the Databases page of the management portal (see the Managing Caché chapter of the Caché System Administration Guide), or the ^DATABASE routine (see the Using Character-based Security Management Routines chapter of the Caché Security Administration Guide) to mount a mirrored database, you can choose whether or not to catch up the database following the mount operation.
Editing or Removing Mirror Members
The following procedures describe how to edit or remove the mirror configuration on a mirror member, including deleting a mirror altogether, and how to remove databases from a mirror when you are not removing a mirror configuration.
Note:
Several options on the Mirror Configuration menu of the ^MIRROR routine provide alternative means for editing mirror configurations. The specific options available depend on whether the routine is used on a failover member or async member.
Clearing the FailoverDB Flag on Reporting Async Mirror Members
As described in Async Mirror Members, an async member must be of one of three types:
When a mirrored database is added to a DR or read-only reporting async, it is mounted as Read-Only, and the FailoverDB flag, which is set when the database is created on the primary, remains set on the async’s copy to keep it read-only.
When a mirrored database is added to a read-write reporting async, on the other hand, the FailoverDB flag is cleared to allow Read-Write mounting of the database. A mirrored database with the FailoverDB flag cleared can never be used as the mirror’s primary copy.
On a DR async, the FailoverDB flag can never be cleared. The flag can be manually cleared on reporting asyncs, however.
On a read-only reporting async, clearing the FailoverDB flag changes the database to read-write, which is typically not desirable. In most cases, therefore, including when you change the async type from Disaster Recovery (DR) to Read-Only Reporting (see Editing or Removing an Async Member), you can leave the FailoverDB flag set on all databases on a read-only reporting async.
When you change an async member’s type from Disaster Recovery (DR) or Read-Only Reporting to Read-Write Reporting, you are offered the option of clearing all the FailoverDB flags. Because the FailoverDB flag on a mirrored database requires it to remain read-only, you will typically want to use this option. If you want to keep one or more mirrored databases read-only on the read-write reporting async, however, you can use the individual Clear Flag links in the Mirrored Databases list to make individual databases read-write and leave the rest as read-only.
Databases added to an async member after you change its type are mounted and flagged according to the member’s new type, as previously described. The Clear FailoverDB Flags button always allows you to clear the flag from all databases at any time on either type of reporting async.
You cannot manually set the FailoverDB flag; this flag is set only when a mirrored database is added to a DR or read-only reporting async.
Removing the Mirrored Database Attribute When Removing a Mirror Member
When you remove a member from a mirror, you are always given the option of removing the mirror attribute from the mirrored databases belonging to the mirror. The consequences are as follows:
When you remove an individual database from the mirror on a backup or async member, the mirrored database attribute is automatically removed.
Editing or Removing an Async Member
  1. Navigate to the [Home] > [Configuration] > [Edit Async] page of the management portal.
  2. Use the Remove Mirror Configuration button to remove a DR async from its mirror or a reporting async from all mirrors to which it belongs and remove the instance’s mirror configuration entirely. (To remove a reporting async from a single mirror, use the Leave mirror link described later in this procedure.)
    You are given the option of removing the mirror attribute from the mirrored databases on the member; see Removing the Mirrored Database Attribute When Removing a Mirror Member for information about this decision.
    You are also given the option of removing the instance’s SSL/TLS configuration (see Securing Mirror Communication with SSL/TLS Security).
    After using the Remove Mirror Configuration button to remove the instance’s mirror configuration entirely, you must restart Caché.
    Note:
    The Remove Mirror Configuration option on the Mirror Configuration menu of the ^MIRROR routine (see Using the ^MIRROR Routine) provides an alternative method for removing an async member’s mirror configuration entirely.
  3. Use the Join a Mirror button to add a reporting async member to another mirror (it can belong to a maximum of 10); the procedure is the same as that described in Configure Async Mirror Members for adding an async member to its first mirror, except that the member name and async type (read-only or read-write) cannot be changed. This button is not available on a DR async member; to join another mirror, you must first change the Async Member System Type as described in a later step.
  4. As described in Clearing the FailoverDB Flag on Reporting Async Mirror Members, you can use the Clear FailoverDB Flags button to clear the FailoverDB flag on all mirrored databases on a read-only reporting async, or after you change the async system type from Disaster Recovery (DR) to Read-Write or Read-Only Reporting.
  5. All of the settings in the Mirror Member Information section except the mirror member name can be modified for the async member you are editing. After you have changed one or more, click Save.
  6. The Mirrors this async member belongs to list shows you all the mirrors the instance belongs to as an async member. Each entry provides three links for changes.
  7. The Mirrored Databases list shows you all mirrored databases on the async member. If the instance is a DR async member, these should include all mirrored databases on the mirror’s failover members, and the FailoverDB flag should be set on each.
  8. In a mirror that uses SSL/TLS, select Authorize Pending DN Updates (if it appears) to authorize pending DN updates from the primary so that the async can continue to communicate with the primary. See Authorizing X.509 DN Updates (SSL/TLS Only) for information about authorizing DN updates.
Editing or Removing a Failover Member
  1. Navigate to the [Home] > [Configuration] > [Edit Mirror] page of the management portal.
  2. Use the Remove Mirror Configuration button on the backup failover member to remove it from the mirror and remove the Caché instance’s mirror configuration entirely.
    Important:
    To remove a mirror entirely, begin by removing all async members from the mirror, then remove the backup failover member, and finally remove the primary failover member.
    When removing a failover member from the mirror, you are given the option of removing the mirror attribute from the mirrored databases on the member; see Removing the Mirrored Database Attribute When Removing a Mirror Member for information about this decision. This is especially significant when you are removing the primary failover member, thereby permanently deleting the mirror.
    On the backup, you are also given the option of removing the instance’s SSL/TLS configuration (see Securing Mirror Communication with SSL/TLS Security).
    You can also use the Remove Other Mirror Member button on the primary to remove the backup or an async from the mirror. You can use the Remove Other Mirror Member button on the backup to remove an async from the mirror.
    After using the Remove Mirror Configuration button or the Remove Other Mirror Member button to remove an async or backup member’s mirror configuration entirely, you must restart Caché.
    Note:
    The Remove Mirror Configuration option on the Mirror Configuration menu of the ^MIRROR routine (see Using the ^MIRROR Routine) provides an alternative method for removing a failover member’s mirror configuration entirely.
    You can temporarily stop mirroring on the backup failover member; see Stopping Mirroring on Backup and Async Members.
  3. To remove the primary failover member from the mirror and remove the mirror entirely (which you can do only if the primary is the last member remaining in the mirror), use this procedure:
    1. Use the Remove Mirror Configuration button on the Edit Mirror page; a dialog displays that lets you clear the JoinMirror flag from the instance.
    2. After clearing the flag, restart the instance.
    3. Navigate to the [Home] > [Configuration] > [Edit Mirror] page and use the Remove Mirror Configuration button again.
  4. When you add an async member of a Caché version earlier than 2015.2 to a mirror requiring SSL/TLS (see Configure Async Mirror Members), use the Add New Async Member button to authorize the X.509 DN of the new async (see Authorize the Second Failover Member or Async (SSL/TLS Mirrors Only)). The Add Async Member to Mirror wizard displays; enter the superserver address and port of the new async, click Next, and authorize the async member. You can also use the button to update the DN of a pre-2015.2 async on a 2015.2 primary (see Authorizing X.509 DN Updates (SSL/TLS Only)).
  5. In the Mirror Information section, you cannot edit the Mirror Name; the remaining settings can be modified on the primary failover member only.
  6. The Mirror Member Information section lists the member name and type, instance directory, and network addresses of each mirror member. On the primary, click a member name to update that member’s network information (except for the member’s Superserver port, which must be updated locally; see Updating Mirror Member Network Addresses).
    If the backup is currently connected to the mirror, you cannot change any network information except the backup’s Superserver port; if the backup is disconnected and network information for the primary has changed, you can update the primary’s information here so that the backup can reconnect when desired. See Updating Mirror Member Network Addresses for important information about updating the network addresses of mirror members.
  7. On the primary in a mirror that uses SSL/TLS, select the Authorize/Reject Pending New Members link (if it appears) to authorize new members so they can join the mirror, or the Authorize/Reject Pending DN Updates link (if it appears) to authorize DN updates on other members so that mirror communication can continue. On the backup, select Authorize Pending DN Updates (if it appears) to authorize pending DN updates from the primary so that the backup can continue to communicate with the primary. See Authorizing X.509 DN Updates (SSL/TLS Only) for information about authorizing DN updates.
  8. Click Save.
Remove Mirrored Databases from a Mirror
You can convert a database from mirrored to unmirrored, local use by removing it from the mirror, which you do through the Mirror Monitor (see Monitoring Mirrors for more information about the Mirror Monitor).
Note:
Alternatively, you can remove mirrored databases from a mirror by selecting the Remove mirrored database(s) option from the Mirror Management main menu list of the ^MIRROR routine (see Using the ^MIRROR Routine).
When you remove a database from a mirror on an async, the failover members are unaffected; the database remains a part of the functioning mirror. Once you have removed it from a failover member, however, it must be removed from the other failover member and any async members on which it is mirrored. To entirely remove a database from a mirror, start by removing it from the primary failover member, then the backup failover member, then any async members.
Important:
Removing a database from a mirror on the primary is a permanent action. Once a mirrored database is removed on the primary, returning it to the mirror later will require the procedures used for adding an existing database to the mirror for the first time.
To remove a database from a mirror, do the following:
  1. Navigate to the [Home] > [Mirror Monitor] page on the primary failover member.
  2. In the Mirrored databases list, click Remove in the row of the database you wish to remove from the mirror.
    If you want to remove more than one database at a time, select Remove from the Select an action drop-down and click Go to open a dialog in which you can select multiple mirrored databases and remove all of them at once.
Activating Journal Encryption in a Mirror
As described in the Managed Key Encryption chapter of the Caché Security Administration Guide, you can activate encryption of journal files for any Caché instance. When doing so, bear in mind three important considerations:
As an example, suppose a mirror consists of failover members A (current primary) and B (current backup), DR async D, and reporting asyncs R1 and R2. If you wanted to activate journal encryption within the mirror, you would follow this procedure:
  1. If the mirror does not currently require SSL/TLS security (see Securing Mirror Communication with SSL/TLS Security), configure it to do so using the procedure described in Editing or Removing a Failover Member.
  2. Select the encryption key or keys that will be used to encrypt journal files on A, B, and D. These can all be different if desired.
  3. Load and activate the selected key or keys on A, B, and D. Optionally, you can activate the keys on R1 and R2 as well, although this is not necessary unless they may be changed to DR asyncs.
  4. Activate journal encryption on A, B, D, R1, and R2 in that order.
  5. When adding a DR async to the mirror after journal encryption is activated, ensure that the journal encryption key or keys in use on A, B, and D are activated on the new DR async before it is added.
Note:
While database encryption on a mirror member requires preparation as on any system, there are no specific mirroring-related requirements for database encryption. InterSystems does recommend, however, that you encrypt a mirrored database on all mirror members for the greatest possible security. For this reason, when you add a mirrored database that is encrypted on the primary to another member without encrypting it, a warning is sent to the console log.
Configuring ECP Connections to a Mirror
If you configure one failover member of a mirror as an ECP data server, you must also configure the other failover member and any DR async members as ECP data servers with the same Maximum number of application servers setting. An ECP application server connection to such an ECP data server mirror must be explicitly configured as a mirror connection. Use the following procedure to configure a mirror as an ECP data server and configure ECP application server connections to the mirror.
  1. On each failover member and any DR async members, navigate to the [Home] > [Configuration] > [ECP Settings] page of the management portal, and configure the system as an ECP data server; for information see Configuring an ECP Data Server in the “Configuring Distributed Systems” chapter of the Caché Distributed Data Management Guide.
  2. On each ECP application server, navigate to the [Home] > [Configuration] > [ECP Settings] page, click Add Remote Data Server, and enter the following information:
    1. Server Name — Enter the instance name of the primary failover member.
    2. Host DNS Name or IP Address — Enter the IP address or host DNS name of the primary failover member. (Do not enter the mirror VIP; see Configuring a Mirror Virtual IP (VIP) for more information.)
    3. IP Port — Enter the superserver port number of the primary failover member whose IP address or host DNS name you specified in the Host DNS Name or IP Address text box.
    4. Mirror Connection — Select this check box.
  3. Click Save.
  4. Navigate to [Home] > [Configuration] > [Remote Databases] and select Create New Remote Database to launch the Database Wizard.
  5. In the Database Wizard, select the ECP data server from the Remote server drop-down list, then click Next.
  6. Select the database you want to access over this ECP channel from the list of remote databases.
    You can select both non-mirrored databases (databases listed as :ds:DB_name) and mirrored databases (databases listed as :mirror:mirror_name:mirror_DB_name).
    Note:
    A mirrored database path in the format :mirror:mirror_name:mirror_DB_name can also be used in an implied namespace extended global reference (see Extended Global References in the “Global Structure” chapter of Using Caché Globals).
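    For example, a hedged illustration of such a reference (the mirror, database, and global names are hypothetical, and the exact syntax should be verified against Using Caché Globals):

       ; read a node of ^Person from mirrored database MIRRORDB in mirror MIRRORA
       SET value = ^|":mirror:MIRRORA:MIRRORDB"|Person(1)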
Important:
A failover mirror member does not accept ECP connections that are not configured as mirror connections; an ECP data server that is not a mirror member does not accept ECP connections that are configured as mirror connections. This means that if you add an existing ECP data server to a mirror, or remove one from a mirror, the data server must be removed as a remote data server on all ECP application servers and added again using the preceding procedure, with the Mirror Connection check box selected or cleared as appropriate.
An ECP application server that is connected to a mirror must be running Caché 2010.2 or later.
After configuring ECP application servers to connect to a mirror, perform failover tests by gracefully shutting down the current primary to ensure that the application servers connect to the mirror regardless of which failover member is primary.
Configuring a Mirror Virtual IP (VIP)
As described in Planning a Mirror Virtual IP (VIP), you can configure a mirror virtual address that allows external applications to interact with the mirror using a single address, ensuring continuous access on failover.
After configuring Caché for the mirror VIP and then configuring the mirror VIP, perform failover tests by gracefully shutting down the current primary (as described in Planned Outage Procedures) to ensure that applications can continue to connect to the mirror regardless of which failover member is primary.
Important:
If one or more of a mirror’s members is a non-root Caché instance on a UNIX® or Linux system, as described in Caché Non-root Installation in the chapter “Installing Caché on UNIX® and Linux” in the Caché Installation Guide, a mirror VIP cannot be used.
If one or more of a mirror’s members is a Caché instance running on an Oracle Solaris non-global zone with ip-type=shared, a mirror VIP cannot be used.
Note:
See Promoting a DR Async Member to Failover Member for important information about promoting a DR async to primary when a VIP is in use.
Configuring Caché for a Mirror VIP
To ensure that the management portal and Caché Studio can seamlessly access the mirror regardless of which failover member is currently the primary, it is recommended that the failover members be configured to use the same superserver and web server port numbers.
ECP application servers do not use a mirror’s VIP. When adding a mirror as an ECP data server (see Configuring ECP Connections to a Mirror), do not enter the virtual IP address (VIP) of the mirror, but rather the DNS name or IP address of the current primary failover member. Because the application server regularly collects updated information about the mirror from the specified host, it automatically detects a failover and switches to the new primary failover member. For this reason, both failover members and any DR async members must be configured as ECP data servers with the same Maximum number of application servers setting. See Configuring ECP Connections to a Mirror for further ECP considerations.
When configuring one or both failover members as license servers, as described in the Managing Caché Licensing chapter of the Caché System Administration Guide, specify the actual hostname or IP address of the system you are configuring as the Hostname/IP Address; do not enter the VIP address.
Configuring a Mirror VIP
To configure a mirror VIP, you must enter the following information:
Configuring the ISCAgent
The ISCAgent runs securely on a dedicated, configurable port (2188 by default) on each mirror member. When the agent receives an incoming network connection which directs it to a mirrored instance, it executes cuxagent in that instance to escalate to the privileges necessary to administer the mirror member. If the mirror is configured to require SSL/TLS, the incoming connection is authenticated before any actions are performed.
When multiple Caché instances belonging to one or more mirrors are hosted on a single system, they share a single ISCAgent.
This section provides information on managing the ISCAgent in the following ways:
Starting and Stopping the ISCAgent
The ISCAgent, which is installed when you install or upgrade Caché, runs as user iscagent and as a member of the iscagent group by default. To acquire the group privilege, which is necessary to execute the cuxagent utility that provides it with access to a Caché instance (as described in ISCAgent), the ISCAgent must be started automatically during system startup or by a user with root privileges. Once it has assigned itself the needed user and group privileges, the ISCAgent discards all root privileges.
The ISCAgent must be configured to start automatically when the system starts on each failover and DR async mirror member. InterSystems provides platform-specific control scripts that can be added to the initialization process by a system administrator, as described in the following sections. (Consult your operating system documentation for detailed system startup configuration procedures.)
Starting the ISCAgent on UNIX® and Mac OS X Systems
On UNIX® and Mac OS X platforms, run the ISCAgent start/stop script, which is installed in the following locations, depending on the operating system:
For example, to start the ISCAgent on the IBM AIX® platform, run the following command as root: /etc/rc.d/init.d/ISCAgent start; to stop it, run the command /etc/rc.d/init.d/ISCAgent stop.
Additional ISCAgent considerations on UNIX®/Linux platforms include the following:
Starting the ISCAgent on Linux Systems
On Linux systems supporting systemd (such as SUSE Linux Enterprise Server 12, SP1 or later), the /etc/systemd/system/ISCAgent.service file is installed, providing support for management of the ISCAgent using systemd. On any such system, the following commands can be used to start, stop and display the status of the ISCAgent:
systemctl start ISCAgent.service
systemctl stop ISCAgent.service
systemctl status ISCAgent.service
To control whether the ISCAgent starts on system boot on a system that supports systemd, use the following commands:
sudo systemctl enable ISCAgent.service
sudo systemctl disable ISCAgent.service
By default, systemd services are disabled. You can use systemctl to start and stop the service on demand, even when it is disabled.
The ISCAgent.service file does not read the location of the Caché registry and shared support files from the CACHESYS environment variable (see Caché Installation Directory in the preface of the Caché Installation Guide), but instead is installed with /usr/local/etc/cachesys as the location. You can edit ISCAgent.service to specify a different registry directory if required.
On all Linux systems, the ISCAgent start/stop script described in Starting the ISCAgent on UNIX® and Mac OS X Systems is installed in /etc/init.d/ISCAgent. If systemd is not supported, use the commands described in that section to start and stop the ISCAgent.
The remainder of the information provided in Starting the ISCAgent on UNIX® and Mac OS X Systems also applies to Linux systems supporting systemd.
Important:
Although it is possible to use either the systemctl commands or the /etc/init.d/ISCAgent script on a Linux system that supports systemd, you must choose one method and use it exclusively, without switching back and forth. The ISCAgent should always be stopped using the method with which it was started.
When you upgrade Caché on such a Linux system, a running ISCAgent is automatically restarted using systemd. If you are using the /etc/init.d/ISCAgent script to manage the ISCAgent, stop the agent before performing the upgrade so that it is not automatically restarted, then restart it using the script after the upgrade.
When changing from using the /etc/init.d/ISCAgent script to using systemctl commands, before starting the agent with systemctl for the first time, do the following as root:
  1. Run the following command:
    systemctl status ISCAgent
  2. If the output from the command contains this warning:
    Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.
    run the following command:
    systemctl daemon-reload
  3. When the previous command has completed, run systemctl status ISCAgent again to confirm that the warning does not appear.
Starting the ISCAgent for Non-root Instances on UNIX®/Linux and Mac OS X Systems
Although Caché is typically installed as root, on UNIX®/Linux and Mac OS X Systems it is possible for an instance to be installed and run by another user. Non-root installation is described in Caché Non-root Installation in the “Installing Caché on UNIX® and Linux” chapter of the Caché Installation Guide.
The ISCAgent for a non-root instance is started by the installing user running the ISCAgentUser script, located in the directory defined by the CACHESYS environment variable, in the background, for example:
nohup <CACHESYS_directory>/ISCAgentUser &
Although it may not be possible to configure the ISCAgent for a non-root instance to start automatically when the system starts, automatic startup remains the preferred approach if it can be achieved. When the mirror includes two failover members, the best practice is to start the agent as soon as possible after the system boots, even if you do not intend to start Caché; this aids in recovery in certain situations, such as that described in Unplanned Outage of Both Failover Members.
Starting the ISCAgent on Microsoft Windows Systems
On Microsoft Windows systems, start the ISCAgent process as follows:
  1. In the Microsoft Windows Control Panel, select Services from the Administrative Tools drop-down list, and double-click ISCAgent to display the ISCAgent Properties window.
  2. On the Extended tab, click Start to start the ISCAgent, or Stop to stop it.
  3. On the Extended tab, select Automatic from the Startup type drop-down list.
Starting the ISCAgent on OpenVMS Systems
On OpenVMS systems, the RunAgent and StopAgent scripts are located in the instance [.BIN] subdirectory. To start the ISCAgent process, run the @RUNAGENT command from the [.BIN] subdirectory.
Customizing the ISCAgent Port Number
As described in the ISCAgent section of this chapter, the default ISCAgent port is 2188. While this is typically all that is needed, you can change the port number if required, as described in the following subsections:
Customizing the ISCAgent Port Number on UNIX®/Linux Systems
The ISCAgent process, by default, starts on port 2188. To customize the port on a UNIX®/Linux system, do the following:
  1. Create the file /etc/iscagent/iscagent.conf, or edit it if it already exists.
  2. Add the following line, replacing <port> with the desired port number:
    application_server.port=<port>
Customizing the ISCAgent Port Number on Microsoft Windows Systems
The ISCAgent process, by default, starts on port 2188. To customize the port on a Windows system, do the following:
  1. Create the file <windir>\system32\iscagent.conf, or edit it if it already exists.
  2. Add the following line, replacing <port> with the desired port number:
    application_server.port=<port>
Customizing the ISCAgent Port Number on OpenVMS Systems
The ISCAgent process, by default, starts on port 2188. To customize the port on an OpenVMS system, do the following:
  1. Create the file iscagent.conf in the instance [.BIN] subdirectory, or edit it if it already exists.
  2. Add the following line, replacing <port> with the desired port number:
    application_server.port=<port>
Customizing the ISCAgent Interface
The ISCAgent binds to the default (or configured) port on all available interfaces. While this is typically all that is needed, you can change the ISCAgent to bind to the interface serving a specific address if required. The procedure is described in the following subsections:
Customizing the ISCAgent Interface on UNIX®/Linux Systems
The ISCAgent process binds to the default (or configured) port on all available interfaces. To customize the ISCAgent to bind to the interface serving a specific address on a UNIX®/Linux system, do the following:
  1. Create the file /etc/iscagent/iscagent.conf, or edit it if it already exists.
  2. Add the following line, replacing <ip_address> with the address served by the desired interface:
    application_server.interface_address=<ip_address>
    To explicitly bind to all available interfaces (i.e., the default), specify * as follows: application_server.interface_address=*.
Customizing the ISCAgent Interface on Microsoft Windows Systems
The ISCAgent process binds to the default (or configured) port on all available interfaces. To customize the ISCAgent to bind to the interface serving a specific address on a Windows system, do the following:
  1. Create the file named <windir>\system32\iscagent.conf, or edit it if it already exists.
  2. Add the following line, replacing <ip_address> with the address served by the desired interface:
    application_server.interface_address=<ip_address>
    To explicitly bind to all available interfaces (i.e., the default), specify * as follows: application_server.interface_address=*.
Customizing the ISCAgent Interface on OpenVMS Systems
The ISCAgent process binds to the default (or configured) port on all available interfaces. To customize the ISCAgent to bind to the interface serving a specific address on an OpenVMS system, do the following:
  1. Create the file named iscagent.conf in the instance [.BIN] subdirectory, or edit it if it already exists.
  2. Add the following line, replacing <ip_address> with the address served by the desired interface:
    application_server.interface_address=<ip_address>
    To explicitly bind to all available interfaces (i.e., the default), specify * as follows: application_server.interface_address=*.
Configuring the Quality of Service (QoS) Timeout Setting
The Quality of Service Timeout (QoS timeout) setting plays an important role in governing failover member and arbiter behavior by defining the range of time, in milliseconds, that a mirror member waits for a response from another mirror member before taking action. The QoS timeout itself represents the maximum waiting time, while the minimum is one half of that. A larger QoS timeout allows the mirror to tolerate a longer period of unresponsiveness from the network or a host without treating it as an outage; decreasing the QoS timeout allows the mirror to respond to outages more quickly. Specifically, the QoS timeout affects the following situations:
See Automatic Failover Mechanics for complete and detailed information about the role the QoS timeout plays in the behavior of the failover members and the arbiter.
The default QoS timeout is 8 seconds (8000 ms), which allows for the several seconds of intermittent unresponsiveness that can occur on some hardware configurations. Typically, deployments on physical (non-virtualized) hosts with a dedicated local network can reduce this setting if a faster response to outages is required. A mirror that is upgraded from a Caché version earlier than 2015.2 may retain the previous default of 2000 ms.
The Quality of Service Timeout setting can be adjusted on the Create Mirror page or the primary failover member’s Edit Mirror page.
Note:
The QoS timeout can also be adjusted using the Adjust Quality of Service Timeout parameter option on the Mirror Configuration menu of the ^MIRROR routine (see Using the ^MIRROR Routine).
Using the ^ZMIRROR Routine
The user-defined ^ZMIRROR routine allows you to implement your own custom, configuration-specific logic and mechanisms for specific mirroring events, such as a failover member becoming primary.
The ^ZMIRROR routine contains the following entry points. All are optional; appropriate default behavior is provided for any that are omitted.
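The following is a minimal ^ZMIRROR sketch. It uses two entry points, $$CheckBecomePrimaryOK^ZMIRROR and NotifyBecomePrimary^ZMIRROR; treat the routine as an illustration of the callback pattern rather than a complete implementation, and consult the entry point descriptions for the exact interface.

ZMIRROR ; user-defined mirror event callbacks (illustrative sketch)
 quit
CheckBecomePrimaryOK() ; return 1 to allow this member to become primary, 0 to veto
 ; for example, verify that a required local resource is available here
 quit 1
NotifyBecomePrimary ; called after this member has become primary
 ; for example, notify operations staff via the console log
 do ##class(%SYS.System).WriteToConsoleLog("ZMIRROR: this member is now primary")
 quit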
Converting a Shadow to a Mirror
Mirroring provides a utility that allows you to convert a shadow source and destination and the shadowed databases mapped between them to a mirror with primary failover member, backup failover or async member, and mirrored databases. If the shadow has multiple destinations, they can all be converted to mirror members. To convert a shadowing setup to a mirror, follow this procedure:
  1. Create a mirror, making the shadow source Caché instance the primary failover member, as described in Create a mirror and configure the first failover member.
  2. Add the destination shadow Caché instance to the mirror as backup failover member, as described in Configure the second failover member, or async member, as described in Configure async mirror members.
  3. Add the shadowed databases to the mirror on the primary failover member (shadow source), as described in Add an existing database to the mirror.
  4. Back up the mirrored databases on the primary, stop shadowing, and restore the mirrored databases on the backup/async; perform the backup and restore as described in Add an existing database to the mirror.
If preferred, you can replace the final step in the preceding procedure with the following steps:
  1. Allow the backup or async (shadow destination) to catch up so that its shadow checkpoint has reached at least the first mirror journal file and the shadow is not tracking any open transactions started in a non-mirror journal file.
  2. Stop shadowing.
  3. On the backup or async (shadow destination), open Caché Terminal, enter zn "%SYS" to switch to the %SYS namespace, and enter ConvertShadowDatabases^MIRROR. The utility does the following (a sample invocation follows this list):
    1. Prompts you for the name of the mirror and the name of the shadow (for both prompts, no entry is required when the instance belongs to just one).
    2. Prompts you for the list of databases to convert and their mirror database names.
    3. Converts the specified databases to mirrored databases in the specified mirror.
    4. Prompts you to activate and catch up the converted databases (see Activating and Catching up Mirrored Databases).
    5. Prompts you to remove the converted databases from the shadow.
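For reference, the invocation described in step 3 looks like the following in Caché Terminal; the utility then walks through the prompts enumerated above.

USER>zn "%SYS"

%SYS>do ConvertShadowDatabases^MIRROR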
Managing Mirroring
This section covers topics related to managing and maintaining operational Caché mirrors.
Monitoring Mirrors
You can monitor the operation of an existing mirror using one of two methods:
Both methods display information about the operating status of a mirror and its members and the incoming journal transfer rate, as well as about mirrored database status. In addition, the Mirror Monitor lets you perform several operations on the mirrored databases.
Monitoring Mirroring Communication Processes describes the mirror communication processes that run on mirror members.
Note:
Basic mirror member information, including a link to the Mirror Monitor, also appears in the management portal home page message pane (see Management Portal Message Pane in the “Using the Management Portal” chapter of the Caché System Administration Guide).
Many database and mirror-related actions, such as mounting or dismounting a database and adding a database to or removing it from a mirror, are logged in the console log (see Monitoring Log Files in the “Monitoring Caché Using the Management Portal” chapter of the Caché Monitoring Guide).
Using the Mirror Monitor
To display the Mirror Monitor, navigate to the [Home] > [Mirror Monitor] page on any mirror member.
On a failover member, the Mirror Monitor contains the following buttons and sections:
On an async member, the Mirror Monitor contains the following buttons and sections:
Mirror Member Journal Transfer and Dejournaling Status
When a Caché instance belongs to a mirror, its member type and status, journal transfer status, and dejournaling status are displayed by the Mirror Monitor and the ^MIRROR routine Status Monitor option, as described in Monitoring Mirrors.
The following table describes the possible types and statuses displayed:
Type: Failover
  Primary: Current primary.
  Backup: Connected to primary as backup.
  In Trouble: As primary, in a trouble state due to lost connection with the backup; see Automatic Failover Mechanics for complete information about the varying circumstances under which the primary can enter a temporary or indefinite trouble state.

Type: Disaster Recovery, Read-Only Reporting, or Read-Write Reporting
  Connected: Connected to primary as async.

Type: (any of the above)
  Transition: In a transitional state that will soon change when initialization or another operation completes; this status prompts processes querying a member’s status to query again shortly. When there is no operating primary, a failover member can report this status for an extended period while it retrieves and applies journal files in the process of becoming primary.
  Synchronizing: Starting up or reconnecting after being stopped or disconnected, retrieving and applying journal files in order to synchronize the database and journal state before becoming Backup or Connected.
  Waiting: Unable to complete an action, such as becoming primary or connecting to primary; will retry indefinitely, but user intervention may be required. See console log for details.
  Stopped: Mirroring on member stopped indefinitely by user; see console log for details.
  Crashed: Mirror no longer running due to unexpected condition; see console log for details.
  Down: Displayed on other members for a member that is down or inaccessible.

Type: Indeterminate
  Not Initialized: The mirror configuration is not yet loaded or Caché is down.
Note:
The Type and Status fields contain different values for mirror members running versions of Caché prior to 2013.1.
Mirror member type and status can also be obtained using the %SYSTEM.Mirror.GetMemberType() and %SYSTEM.Mirror.GetMemberStatus() methods.
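For example, from the Terminal in the %SYS namespace (the output shown assumes a hypothetical member that is currently the primary failover member):

%SYS>write $SYSTEM.Mirror.GetMemberType()
Failover
%SYS>write $SYSTEM.Mirror.GetMemberStatus()
Primary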
For backup and async mirror members, Journal Transfer indicates whether a mirror member has the latest journal data from the primary and, if not, how far behind journal transfer is, while Dejournaling indicates whether all of the journal data received from the primary has been dejournaled (applied to the member’s mirrored databases) and, if not, how far behind dejournaling is. The following tables describe the possible statuses for these fields displayed by the Mirror Monitor and ^MIRROR. (These fields are always N/A for the primary.)
Journal Transfer status:
  Active (backup only): The backup has received the latest journal data from and is synchronized with the primary. (See Backup Status and Automatic Failover for more information about Active backup status.) Note that the backup can be Active even if its Dejournaling status is not Caught up; as long as the backup has all the needed journal files, they can be dejournaled even after it has lost its connection to the primary.
  Caught up: On the backup, indicates that the backup has received the latest journal data from the primary, but is not fully synchronized in that the primary is not waiting for it to acknowledge receipt of journal data. This status is often transient, as when the backup reconnects to the mirror. On an async, indicates that the async has received the latest journal data from and is synchronized with the primary.
  time behind: The member is a specific amount of time behind the primary, with time representing the amount of time elapsed between the timestamp of the last journal block the member received and the current time.
  Disconnected on time: The member was disconnected from the primary at the specified time.

Dejournaling status:
  Caught up: All journal data received from the primary has been dejournaled (applied to the member’s mirrored databases).
  time behind: Some journal data received from the primary has not yet been dejournaled, with time representing the amount of time elapsed between the timestamp of the last dejournaled journal block and the last journal block received from the primary.
  Disconnected on time: The member was disconnected from the primary at the specified time.
The Mirror Monitor may also display the following warnings:
Warning! Some databases need attention. At least one mirrored database is not in a normal state; databases should be checked.
Warning! Dejournaling is stopped. Dejournaling has been stopped by an operator or because of an error; see Managing Database Dejournaling.
Caught up in the Dejournaling field for an Active backup failover member, or Caught up in both the Dejournaling and Journal Transfer fields for an async member, indicates that the member has received the most recent journal data from the primary and applied the most recent global operations contained in that data. If the member is not caught up, the amount of time elapsed since generation of the most recent journal data or writing of the most recent operation on the primary is displayed instead.
Incoming Journal Transfer Rate
Below the mirror member status list on backup and async members, the rate at which journal data has arrived from the primary since the last time the Mirror Monitor was refreshed is displayed under Incoming Journal Transfer Rate for This Member.
When the Mirror Monitor page is first loaded, this area displays the text --- (will be displayed on refresh). When the page is next refreshed, either manually or because automatic refresh is set to On at the top of the page, the information displayed depends on whether the incoming journal data is compressed (see Journal Data Compression), as follows:
Mirrored Database Status
Important:
On backup and DR async members, the Missing Mirrored Databases Report on the Mirror Monitor page alerts you to any mirrored databases that are present on the primary but not on the current member. This is very important, as the backup, or a DR async if promoted to backup, cannot successfully take over in the event of a primary outage if it does not have the full set of mirrored databases. The full mirror database name of each missing database is listed. The Missing Mirrored Databases Report is not displayed if there are no missing databases.
On all members, the Mirrored Databases list on the Mirror Monitor page displays one of the following statuses for each database listed:
  Normal (primary only): The mirrored database is writable (if not a read-only database) and global updates are being journaled.
  Dejournaling (backup and async): The database has been activated and caught up and the mirror is applying journal data to the database.
  Needs Catchup: The database has been activated but not caught up yet; the user-initiated Catchup operation is needed.
  Needs Activation: The database has not been activated yet; the user-initiated Activate and Catchup operations are needed.
  Catchup Running: The user-initiated Catchup operation is running on the database.
  Dejournaling Stopped: Dejournaling has been stopped by an operator or an error; see Stopping Mirroring on Backup and Async Members and Managing Database Dejournaling.
  Database Dismounted: The database is dismounted.
  Obsolete: The mirrored database is obsolete and should be removed from the mirror.
On the primary, the Next Record to Dejournal column contains N/A if the status of the database is Normal. Otherwise, the column includes the following:
The status of a database and the operations related to it (Activate and Catchup) are discussed in Activating and Catching Up Mirrored Databases; the operations are available in the drop-down below the list. You can also use the drop-down to mount dismounted databases (but not to dismount mounted databases). You can use the Remove link or select Remove from the drop-down to remove a listed database from the mirror; see Remove Mirrored Databases from a Mirror for more information.
Using the ^MIRROR Status Monitor
The ^MIRROR routine provides a character-based mirror status monitor. The ^MIRROR Status Monitor option displays the status of the mirror members including type, status, journal transfer latency and dejournal latency (see Mirror Member Journal Transfer and Dejournaling Status). The monitor can be run on any mirror member, but running it on a failover member provides information about the arbiter configuration and about all connected async members, which running it on an async member does not.
To start the status monitor, open a Terminal window, run the ^MIRROR routine (see Using the ^MIRROR Routine) in the %SYS namespace, and select Status Monitor from the Mirror Status menu. The following is a sample of output from the monitor when run on a failover member:
Status of Mirror MIR25FEB at 17:17:53 on 02/27/2014

Member Name+Type            Status     Journal Transfer  Dejournaling
--------------------------  ---------  ----------------  --------------
MIR25FEB_A
     Failover               Primary    N/A               N/A
MIR25FEB_B
     Failover               Backup     Active            Caught up
MIR25FEB_C
     Disaster Recovery      Connected  Caught up         Caught up
MIR25FEB_D
     Read-Only Reporting    Connected  Caught up         Caught up

Arbiter Connection Status: 
     Arbiter Address: 127.0.0.1|2188 
     Failover Mode: Arbiter Controlled 
     Connection Status: Both failover members are connected to the arbiter 

Press RETURN to refresh, D to toggle database display, Q to quit,
 or specify new refresh interval <60>
When you run the status monitor on an async member, only the failover members and that async are listed, and the status of dejournaling on the async (running or stopped) is also shown, for example:
Status of Mirror MIR25FEB at 17:17:53 on 02/27/2014

Member Name+Type            Status     Journal Transfer  Dejournaling
--------------------------  ---------  ----------------  --------------
MIR25FEB_A
     Failover               Primary    N/A               N/A
MIR25FEB_B
     Failover               Backup     Active            Caught up
MIR25FEB_C
     Disaster Recovery      Connected  Caught up         Caught up
 Dejournal Status: running (process id: 12256)

Press RETURN to refresh, D to toggle database display, Q to quit,
 or specify new refresh interval <60>
By default, information about mirrored databases is not displayed. Enter d at the prompt to list information about each database in the mirror, including name, directory, status, and next record to dejournal as described in Using the Mirror Monitor, for example:
Mirror Databases:
                                                                   Last Record
Name           Directory path                          Status      Dejournaled
-------------  -----------------------------------     ----------- -----------
MIR25FEB_DB1   C:\InterSystems\20142209FEB25A\Mgr\MIR25FEB_DB1\
                                                       Active
   Current,c:\intersystems\20142209feb25a\mgr\journal\MIRROR-MIR25FEB-20140227.001,40233316
MIR25FEB_DB2   C:\InterSystems\20142209FEB25A\Mgr\MIR25FEB_DB2\
                                                       Active
   Current,c:\intersystems\20142209feb25a\mgr\journal\MIRROR-MIR25FEB-20140227.001,40233316
Monitoring Mirroring Communication Processes
There are processes that run on each system (primary and backup failover members, and each connected async member) that are responsible for mirror communication and synchronization.
For more information, see the following topics:
Mirroring Processes on the Primary Failover Member
Running the system status routine (^%SS) on the primary failover member reveals the processes listed in the following table.
Note:
The CPU, Glob, and Pr columns have been intentionally omitted from the ^%SS output in this section.
Mirroring Processes on Primary Failover Member
Device        Namespace   Routine      User/Location
/dev/null     %SYS        MIRRORMGR    Mirror Master
MDB2          %SYS        MIRRORCOMM   Mirror Primary*
192.168.1.1   %SYS        MIRRORCOMM   Mirror Svr:Rd*
The processes are defined as follows:
Each connected async member results in a new set of Mirror Master, Mirror Primary, and Mirror Svr:Rd* processes on the primary failover member.
Mirroring Processes on the Backup Failover/Async Member
Running the system status routine (^%SS) on the backup failover/async member reveals the processes listed in the following table.
Mirroring Processes on Backup Failover/Async Member
Device       Namespace   Routine     User/Location
/dev/null    %SYS        MIRRORMGR   Mirror Master
/dev/null    %SYS        MIRRORMGR   Mirror Dejour
/dev/null    %SYS        MIRRORMGR   Mirror Prefet*
/dev/null    %SYS        MIRRORMGR   Mirror Prefet*
MDB1         %SYS        MIRRORMGR   Mirror Backup
/dev/null    %SYS        MIRRORMGR   Mirror JrnRead
The processes identified in this table also appear on each connected async member:
Updating Mirror Member Network Addresses
When one or more of the network addresses of one or more mirror members (including the primary) must be updated, as described in Editing or Removing a Failover Member, this information is generally changed on the primary. When you save your changes, the primary propagates them to all connected mirror members (and to disconnected members when they reconnect). You cannot change any mirror member network addresses on a connected backup or async member, as mirror members must receive all such information from the primary. There are a few exceptions to the general case, however, as follows:
Note:
As described in Configure Async Mirror Members, the Async Member Address you provide when an async member joins a mirror becomes the async’s superserver address and mirror private address (see Mirror Member Network Addresses). If you want these to be different, for example when you want to place a DR async’s mirror private address on the mirror private network while leaving its superserver address on the external network, after adding the async to the mirror you can update its addresses as described here.
Authorizing X.509 DN Updates (SSL/TLS Only)
When you configure a mirror to use SSL/TLS, you must authorize the newly-added second failover member and each new async member on the first failover member before it can join the mirror, as described in Authorize the Second Failover Member or Async Member (SSL/TLS only). For similar reasons, when a member of a mirror using SSL/TLS updates its X.509 certificate and DN, this update must be propagated to and authorized on other members in one of the following ways:
Note:
The Authorize/Reject Pending DN Updates option (primary) or the Authorize Pending DN Updates option (backup or async) on the Mirror Configuration menu of the ^MIRROR routine can also be used to authorize X.509 DN updates.
When a mirror includes one or more async members of Caché versions earlier than 2015.2, X.509 updates cannot be propagated as above, and you must therefore use special procedures, as follows:
Promoting a DR Async Member to Failover Member
A disaster recovery (DR) async mirror member can be promoted to failover member, replacing a current failover member if two are configured or joining the current member if there is only one. For example, when one of the failover members will be down for a significant period due to planned maintenance or following a failure, you can temporarily promote a DR async to take its place (see Temporary Replacement of a Failover Member with a Promoted DR Async). During true disaster recovery, when both failover members have failed, you can promote a DR to allow it to take over production as primary failover member, accepting the risk of some data loss; see Manual Failover to a Promoted DR Async During a Disaster for more information.
When a DR async is promoted to failover member, it is paired, if possible, with the most recent primary as failover partner; when this cannot be done automatically, you are given the option of choosing the failover partner. Following promotion, the promoted member communicates with its failover partner's ISCAgent as any failover member does at startup, first to obtain the most recent journal data, then to become primary if the failover partner is not primary, or to become backup if the failover partner is primary. The promoted member cannot automatically become primary unless it can communicate with its failover partner to obtain the most recent journal data.
When promoting a DR async to failover member, there are several important considerations to bear in mind:
In some disaster recovery situations, however, the promoted DR async cannot contact any existing failover member’s agent. When this is the case, you have the option of promoting the DR with no failover partner, as described under Promotion With Partner Selected by User in this section. This means that the DR can become primary only, using only the journal data it already has and any more recent journal data that may be available on other connected mirror members, if any. When this happens, the new primary may not have all the journal data that has been generated by the mirror, and some application data may be lost. If you restart a former failover partner while a DR async promoted in this manner is functioning as primary, it may need to be rebuilt; see Rebuilding a Mirror Member for more information. Be sure to see the DR promotion procedure later in this section for details.
Note:
When the primary Caché instance is in an indefinite trouble state due to isolation from both the backup and the arbiter in arbiter controlled mode, as described in Automatic Failover Mechanics Detailed, you cannot promote a DR async to failover member.
Promotion With Partner Selected Automatically
When possible, the promoted DR async’s failover partner is selected automatically, as follows:
Promotion With Partner Selected by User
When Caché is not running on any failover member and at least one ISCAgent cannot be contacted, the promotion procedure informs you of which agents cannot be contacted and gives you the option of choosing a failover partner. To avoid the possibility of data loss, you should select the failover member that was last primary, even if its agent cannot be contacted. The results differ depending on the selection you make and ISCAgent availability, as follows:
When the failover partner is not selected automatically, the following rules apply:
Caution:
If the promoted DR async becomes primary or is forced to become primary without obtaining the most recent journal data, some global update operations may be lost and the other mirror members may need to be rebuilt (as described in Rebuilding a Mirror Member). Under some disaster recovery scenarios, however, you may have no alternative to promoting a DR async to primary without obtaining journal data. If you are uncertain about any aspect of the promotion procedure, InterSystems recommends that you contact the InterSystems Worldwide Response Center (WRC) for assistance.
To promote a DR async member to failover member, do the following:
  1. On the DR async member that you are promoting to failover member, navigate to the [Home] > [Mirror Monitor] page to display the Mirror Monitor.
  2. Click the Promote to Failover Member button at the top of the page.
  3. Follow the instructions provided by the resulting dialog boxes. In the simplest case, this involves only confirming that you want to proceed with promotion, but it may include selecting a failover partner or no partner, as described earlier in this section.
  4. If a VIP is configured for the mirror, the promoted DR async must have a network interface on the VIP’s subnet to be able to acquire the VIP in the event of becoming primary (due to manual failover or to a later outage of the primary while operating as backup).
  5. When a former failover member’s agent is available at the time a DR async is promoted, it automatically sets ValidatedMember=0 in the [MirrorMember] section of the Caché Parameter File for the Caché instance (see [MirrorMember] in the Caché Parameter File Reference). This instructs the Caché instance to obtain its new role in the mirror from the promoted DR async, rather than reconnecting to the mirror in its previous role.
    If a former failover member’s agent cannot be contacted at the time of promotion, this change cannot be made automatically. Therefore, at the earliest opportunity and before Caché is restarted on any former failover member whose agent could not be contacted at the time of promotion, you must manually set ValidatedMember=0 by editing the Caché Parameter File for the Caché instance (see the sketch following this procedure). The instructions list the former failover member(s) on which this change must be made.
    Caution:
    Restarting Caché on a mirror member whose agent was down at the time of DR async promotion without first setting ValidatedMember=0 may result in both failover members simultaneously acting as primary.
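For reference, the manual edit described in step 5 amounts to setting the following in the instance’s Caché Parameter File (cache.cpf):

[MirrorMember]
ValidatedMember=0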
Rebuilding a Mirror Member
Under some circumstances following an outage or failure, particularly if manual procedures are used to return a mirror to operation, a member’s mirrored databases may no longer be synchronized with the mirror. For example, when a backup that did not automatically take over following a primary outage is forced to become primary without the most recent journal data (see Manual Failover When the Backup Is Not Active), one or more of the mirrored databases on the former primary may be inconsistent with the new primary’s databases.
In some cases the mirror is able to reconcile the inconsistency, but in others it cannot. When a mirror member whose data is irreparably inconsistent with the mirror is restarted and attempts to rejoin the mirror, the process is halted and the following severity 2 message is written to the console log:
This member has detected that its data is inconsistent with the mirror MIRRORNAME. If the primary is
running and has the correct mirrored data, this member, including its mirrored databases, must be 
rebuilt.
This message is preceded by a severity 1 message providing detail on the inconsistency.
When this message appears in the console log, take the following steps:
  1. Confirm that the functioning mirror has the desired version of the data, and that the member reporting the inconsistency should therefore be rebuilt. This will likely be true, for example, in any case in which this message appears when you are restarting the former primary after having chosen to manually cause another member to become primary without all of the most recent journal data. If this is the case, rebuild the inconsistent member using the steps that follow.
    If you conclude instead that the member reporting the inconsistency has the desired version of the data, you can adapt this procedure to rebuild the other members.
    If you are not certain which version of the data to use or whether it is desirable to rebuild the inconsistent member, contact the InterSystems Worldwide Response Center (WRC) for help in determining the best course of action.
  2. Back up the mirrored databases on a member of the functioning mirror. You can also use an existing backup created on a member of the mirror, if you are certain that it contains the desired version of the data.
  3. Remove the inconsistent member from the mirror as described in Editing or Removing Mirror Configurations, retaining the mirrored DB attribute on the mirrored databases.
  4. Add the member to the mirror using the appropriate procedure, as described in Configure the second failover member or Configure async mirror members.
  5. Restore the mirrored databases on the member from the backup you created or selected, as described in Add an existing database to the mirror.
Stopping Mirroring on Backup and Async Members
You can temporarily stop mirroring on the backup or an async member. For example, you may want to stop mirroring on the backup member for a short time for maintenance or reconfiguration, or during database maintenance on the primary, and you might temporarily stop mirroring on a reporting async member to reduce network usage. To do so:
  1. Navigate to the [Home] > [Mirror Monitor] page for the member on which you want to stop mirroring
  2. If the member is the backup failover member, click the Stop Mirroring On This Member button.
  3. If the member is an async, click the Stop Mirroring On This Member link in the row for the mirror for which you want the async to stop mirroring. (Stopping mirroring for one mirror does not affect any other mirrors a reporting async belongs to.)
The operation takes a few seconds. When you refresh the Mirror Monitor, Stop Mirroring On This Member is replaced by Start Mirroring On This Member, which you can use to resume mirroring.
Important:
When you stop mirroring on a member, mirroring remains stopped until you explicitly start it again as described in the preceding. Neither reinitialization of the mirror nor a restart of the member starts mirroring on the member.
Note:
You can also use the mirroring SYS.Mirror.StopMirror() and SYS.Mirror.StartMirror() API methods or the ^MIRROR routine (see Using the ^MIRROR Routine) to perform these tasks.
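For example, the following ObjectScript sketch stops and later restarts mirroring through the API; the mirror name argument ("MIRRORA") is hypothetical, and the argument conventions are assumptions, so consult the SYS.Mirror class documentation for the exact signatures.

 ; temporarily stop, then later restart, mirroring on this member (sketch)
 set sc = ##class(SYS.Mirror).StopMirror("MIRRORA")
 if '$SYSTEM.Status.IsOK(sc) { do $SYSTEM.Status.DisplayError(sc) }
 ; ... perform maintenance ...
 set sc = ##class(SYS.Mirror).StartMirror("MIRRORA")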
Managing Database Dejournaling
As described in Mirror Synchronization, dejournaling is the process of synchronizing mirrored databases by applying journal data from the primary failover member to the mirrored databases on another mirror member. Although dejournaling is an automatic process during routine mirror operation, under some circumstances you may need or want to manage dejournaling using options provided by the ^MIRROR routine (see Using the ^MIRROR Routine). Because of the differences in purpose between the backup failover member, DR async members, and reporting async members, there are also some differences in dejournaling and dejournaling management, specifically in regard to interruptions in dejournaling, whether deliberate or caused by error. In addition, a user-defined filter can be applied to dejournaling for one or more of the mirrors a reporting async belongs to.
Note:
All types of mirror members continue to receive journal data even when dejournaling of one or all mirrored databases is paused.
The SYS.Mirror.AsyncDejournalStatus(), SYS.Mirror.AsyncDejournalStart(), SYS.Mirror.AsyncDejournalStop(), and SYS.Mirror.DejournalPauseDatabase() mirroring API methods can also be used to manage dejournaling.
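For example, the following sketch pauses and then resumes dejournaling for one mirror on an async member via the API; the mirror name ("MIRRORA") is hypothetical and the argument conventions are assumptions, so consult the SYS.Mirror class documentation for the exact signatures.

 ; pause dejournaling for one mirror on this async member, then resume (sketch)
 set sc = ##class(SYS.Mirror).AsyncDejournalStop("MIRRORA")
 if '$SYSTEM.Status.IsOK(sc) { do $SYSTEM.Status.DisplayError(sc) }
 ; ... perform maintenance on the mirrored databases ...
 set sc = ##class(SYS.Mirror).AsyncDejournalStart("MIRRORA")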
Managing Dejournaling on the Backup or a DR Async
Because mirrored databases on the backup failover member and DR async members should always be as close to caught up as possible, to support potential takeover as primary or use in disaster recovery, respectively, a dejournaling error pauses dejournaling for only the affected mirrored database, while dejournaling continues for the others.
For example, when there is a database write error such as <FILEFULL> on the backup or a DR async member, dejournaling of the database on which the write error occurred is automatically paused, but dejournaling of other mirrored databases continues. Dismount the database and correct the error, then remount the database and resume dejournaling by selecting the Activate or Catchup mirrored database(s) option from the Mirror Management menu of the ^MIRROR routine or catching up the database using the management portal (see Activating and Catching Up Mirrored Databases).
On a DR async, you also have the option of pausing dejournaling for all mirrored databases on the member using the Manage mirror dejournaling on async member option on the Mirror Management menu of the ^MIRROR routine. (This option is disabled on backup members.) You can use this option following a dejournaling error or for maintenance purposes. For example, if you prefer to pause dejournaling for all databases in the mirror when a dejournaling error causes dejournaling to pause for one database only, you can do the following:
  1. Select the Manage mirror dejournaling on async member option from the Mirror Management menu of the ^MIRROR routine to pause dejournaling for all databases.
  2. Dismount the problem database, correct the error, and remount the database.
  3. Select the Manage mirror dejournaling on async member option from the Mirror Management menu of the ^MIRROR routine to restart dejournaling for all databases. (This option automatically activates the database that had the error and catches it up to the same point as the most up-to-date database in the mirror.)
Note:
When you pause dejournaling on a DR async member using the Manage mirror dejournaling on async member option, dejournaling does not restart until you use the option again to restart it.
Managing Dejournaling on a Reporting Async
As described in Async Mirror Members, a reporting async member can belong to multiple mirrors. For each of these mirrors, you may want dejournaling of the databases to be continuous or you may want dejournaling to be conducted on a regular schedule, depending on the ways in which the databases are being used. For example, for a given mirror you may want to dejournal between midnight and 4:00am, allowing the databases to remain static for stable report generation over the rest of the day.
In addition, you may want different behavior for different mirrors when dismounting a database for maintenance or encountering an error during dejournaling. For one mirror, it may be most important that the database for which dejournaling is paused not fall behind the other databases in the mirror, in which case you will prefer to pause dejournaling for the entire mirror; for another, it may be most important that the databases in the mirror stay as up to date as possible, in which case you will want to pause only the database involved.
When you want to pause dejournaling for one or more mirrors on a reporting async as a one-time operation or on a regular basis, you can select the Manage mirror dejournaling on async member option from the Mirror Management menu of the ^MIRROR routine to pause dejournaling for all databases in any mirror you wish. When you want to restart dejournaling, use the Manage mirror dejournaling on async member option again. (This option is not available on backup members.)
Unlike backup and DR async members, when there is an error during dejournaling of a database on a reporting async member, dejournaling is automatically paused for all databases in that mirror. Depending on your needs and policies, you can either:
When you want to perform maintenance on a mirrored database on a reporting async member, you can simply dismount the database, then mount the database again after maintenance and use the Activate or Catchup mirrored database(s) option or the management portal to catch up the database. (If the maintenance involves several such databases, use the Mirror Monitor to perform the operation on all of them at once, as described in Activating and Catching Up Mirrored Databases. This is more efficient and less time-consuming than catching up the databases individually.)
Note:
When dejournaling pauses for a mirror on a reporting async member due to an error, the member attempts to restart dejournaling for the mirror the next time its connection to the primary is rebuilt. When you pause dejournaling for a mirror on an async member using the Manage mirror dejournaling on async member option, dejournaling for the mirror does not restart until you use the option again to restart it.
Using a Dejournal Filter on a Reporting Async
On a reporting async only, you can set a user-defined dejournal filter on a given mirror, letting you execute your own code for each journal record to determine which records are applied to the Read-Write databases in that mirror. Once you have defined a filter, you can set it on as many mirrors as you want, and you can set, change and remove filters at any time.
Note:
This functionality is intended only for highly specialized cases and for conversion of shadowing configurations making use of the equivalent functionality of shadowing. Alternatives should be carefully considered. For controlling which globals are replicated to mirror members, global mapping to non-mirrored databases provides a much simpler, lightweight solution. For monitoring updates to application databases, solutions built at the application level are typically more flexible.
A dejournal filter allows a reporting async to skip dejournaling of some of the records in a journal file received from the primary. However, this applies to Read-Write databases only—databases originally added to the mirror on a read-write reporting async, or from which the FailoverDB flag has been cleared since the database was added to the mirror as Read-Only. (See Clearing the FailoverDB Flag on Reporting Async Mirror Members for a detailed explanation of the FailoverDB flag and the mount status of mirrored databases on reporting asyncs.) If the FailoverDB flag is set on a database, which means that the database is mounted as Read-Only, the dejournal filter code still executes, but all records are always dejournaled on that database, regardless of what the filter code returns.
Important:
Setting a dejournal filter slows dejournaling for the mirror it is set on; this effect may be significant, depending on the contents of the filter.
To create a dejournal filter, extend the superclass SYS.MirrorDejournal to create a mirror dejournal filter class. The class name should begin with Z or z so that it is preserved during a Caché upgrade.
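A minimal shell for such a class is sketched below. Only the superclass and the Z naming convention come from the text above; the callback you must implement, including its exact name and signature, is defined by SYS.MirrorDejournal, so it is indicated here only as a comment.

/// Hypothetical dejournal filter shell for a reporting async; the leading
/// Z preserves the class across Caché upgrades.
Class ZMyApp.MirrorDejournalFilter Extends SYS.MirrorDejournal
{

/* Implement the filtering callback defined by SYS.MirrorDejournal here.
   It is called for each journal record, and its return value determines
   whether the record is applied to the mirror's Read-Write databases. */

}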
To set a dejournal filter on a mirror on a reporting async, navigate to the [Home] > [Configuration] > [Edit Async] page of the management portal, click the Edit Dejournal Filter link next to the desired mirror in the Mirrors this async member belongs to list, enter the name of a mirror dejournal filter class, and click Save. To remove a filter, do the same but clear the entry box before clicking Save. Whenever you add, change, or remove a dejournal filter on a mirror, dejournaling is automatically restarted for that mirror so the filter change can be applied. However, if you modify and recompile a mirror dejournal filter class, you must manually stop and restart dejournaling on all mirrors it is set on using the Manage mirror dejournaling on async member option on the Mirror Management menu of the ^MIRROR routine.
General Mirroring Considerations
This section provides information to consider, recommendations, and best-practice guidelines for mirroring. It includes the following subsections:
Mirror APIs
The SYS.Mirror class provides methods for programmatically calling the mirror operations available through the management portal and the ^MIRROR routine (see Using the ^MIRROR Routine), as well as many queries. For example, the SYS.Mirror.CreateNewMirrorSet() method can be used to create a mirror and configure the first failover member, while the SYS.Mirror.MemberStatusList() query returns a list of mirror members and the journal latency status of each. See the SYS.Mirror class documentation for descriptions of these methods.
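For example, the following sketch runs the SYS.Mirror.MemberStatusList() query from ObjectScript; the column names used in the Get() calls are assumptions for illustration.

 ; print each mirror member and its status (sketch)
 set rs = ##class(%ResultSet).%New("SYS.Mirror:MemberStatusList")
 do rs.Execute()
 while rs.Next() {
     write rs.Get("MemberName"),"  ",rs.Get("Status"),!  ; assumed column names
 }
 do rs.Close()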
If you use an external script to perform backups, you can use the %SYSTEM.Mirror class methods to verify whether a system is part of a mirror and, if so, what its role is:
$System.Mirror.IsMember() 
$System.Mirror.IsPrimary()
$System.Mirror.IsBackup()
$System.Mirror.IsAsyncMember()
$System.Mirror.MirrorName()
where:
$SYSTEM.Mirror.IsMember() returns 1 if this system is a failover member, 2 if it is an async mirror member, or 0 if it is not a mirror member.
$SYSTEM.Mirror.IsPrimary() returns 1 if this system is the primary failover member, or 0 if it is not.
$SYSTEM.Mirror.IsBackup() returns 1 if this system is the backup failover member, or 0 if it is not.
$SYSTEM.Mirror.IsAsyncMember() returns 1 if this system is an async member, or 0 if it is not.
$SYSTEM.Mirror.MirrorName() returns the name of the mirror if the instance is configured as a failover mirror member, or NULL if it is not.
You can also use %SYSTEM.Mirror.GetMemberType() and %SYSTEM.Mirror.GetMemberStatus() to obtain information about the mirror membership (if any) of the current instance of Caché and its status in that role; see Mirror Member Journal Transfer and Dejournaling Status for more information.
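For example, an external backup script might guard its mirror-sensitive steps with a check like the following sketch:

 ; decide how to proceed with an external backup based on mirror role (sketch)
 if '$SYSTEM.Mirror.IsMember() {
     write "Not a mirror member; run a standalone backup",!
 } elseif $SYSTEM.Mirror.IsPrimary() {
     write "Primary of mirror ",$SYSTEM.Mirror.MirrorName(),"; proceed with backup",!
 } else {
     write "Backup or async member; defer to site backup policy",!
 }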
External Backup of Primary Failover Member
When using the Backup.General.ExternalFreeze() method to freeze writes to a database on the primary failover member so an external backup can be performed, as described in the Backup and Restore chapter of the Caché Data Integrity Guide, ensure that the external freeze does not suspend updates for longer than the specified ExternalFreezeTimeOut parameter of Backup.General.ExternalFreeze(). If this happens, the mirror may fail over to the backup failover member, thereby terminating the backup operation in progress.
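The pattern is sketched below, assuming default parameters; Backup.General.ExternalThaw() is the companion method that resumes writes.

 ; freeze writes, take the external backup, then thaw; keep the freeze
 ; well inside ExternalFreezeTimeOut to avoid triggering failover (sketch)
 set sc = ##class(Backup.General).ExternalFreeze()
 if $SYSTEM.Status.IsOK(sc) {
     ; ... trigger the external snapshot or backup here ...
     set sc = ##class(Backup.General).ExternalThaw()
 }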
Upgrading Caché on Mirror Members
See Minimum Downtime Upgrade with Mirroring in the “Upgrading Caché” chapter of the Caché Installation Guide for considerations to take into account when upgrading Caché on a mirror member.
Database Considerations for Mirroring
This section provides information to consider when configuring and managing mirrored databases:
Caché Instance Compatibility
While mirror member systems can be of different operating systems and/or endianness, the Caché instances on all members of a mirror must be of the same character width (8-bit or Unicode) and use the same locale.
Note:
The one exception to the character width and locale requirements is that an 8-bit instance using a locale based on the ISO 8859 Latin-1 character set is compatible with a Unicode instance using the corresponding wide character locale. For example, an 8-bit primary instance using the enu8 locale is compatible with a Unicode backup instance using the enuw locale. However, an 8-bit primary instance using the heb8 locale is not compatible with a Unicode backup instance using the hebw locale, as these locales are not based on ISO 8859 Latin-1.
In addition, the failover members and any DR async member must be of the same Caché version; they can differ only for the duration of one of the upgrade procedures described in Minimum Downtime Upgrade with Mirroring in the “Upgrading Caché” chapter of the Caché Installation Guide. Once an upgraded member becomes primary, you cannot make use of the other failover member and any DR async members (and in particular cannot allow them to become the primary) until the upgrade is completed.
Mirroring does not require reporting async members to be of the same Caché version as the failover members, although application functionality may require it.
See “Supported Version Interoperability” in InterSystems Supported Platforms for information about the range of the version difference allowed between failover/DR async members and reporting async member, and allowed between failover and DR async members during an upgrade procedure.
Member Endianness Considerations
When creating a mirrored database or adding an existing database to a mirror, if a backup failover member or async member has a different endianness than the primary failover member, you cannot use the backup and restore procedure described in Add an existing database to the mirror; you must instead use the procedure in that section involving copying the database’s CACHE.DAT file. When using that procedure, insert the following step after copying the CACHE.DAT file to all non-primary members and before mounting the database on those members: on each member whose endianness differs from the primary’s, convert the copied CACHE.DAT file to the endianness of the local system (using the cvendian utility).
Creating a Mirrored Database Using the ^DATABASE Routine
You can create mirrored databases on mirror members using the ^DATABASE routine. (See ^DATABASE in the “Using Character-based Security Management Routines” chapter of the Caché Security Administration Guide for information about the routine.) You must create the new mirrored database on the primary member before creating it on other mirror members. To create a mirrored database:
  1. Run the ^DATABASE routine, and select the 1) Create a database option.
  2. Enter the directory path at the Database directory? prompt.
  3. Enter 3 (Mirror DB Name:) at the Field number to change? prompt, and enter a mirror name for the mirrored database at the Mirror DB Name? prompt.
    Note:
    If the member on which you are creating the mirrored database belongs to multiple mirrors, and the mirror listed by default is not the one to which the database belongs, enter the number of the Mirror Set Name field at the Field number to change? prompt, and choose the correct mirror name from the list. If the member is a member of only one mirror, this field cannot be changed.
  4. Modify other fields as necessary for your database, then when you have finished making changes, press Enter at the Field number to change? prompt without specifying any option.
  5. Enter the dataset name of the database at the Dataset name of this database in the configuration: prompt. This is the name that is displayed in the management portal.
  6. Enter responses to the remaining prompts until the mirrored database is created.
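For illustration, a session on the primary might look like the following (the directory, database, and mirror names are examples only, and intermediate prompts are abbreviated):

    %SYS>do ^DATABASE

    1) Create a database
       ...
    Option? 1
    Database directory? /intersystems/cache/mgr/mirrordb1
       ...
    Field number to change? 3
    Mirror DB Name? MIRRORDB1
    Field number to change? <Enter>
    Dataset name of this database in the configuration: MIRRORDB1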
When you create the mirrored databases on the backup and async members, they automatically catch up with the database you created on the primary member.
Note:
You cannot add an existing non-mirrored database to a mirror using the ^DATABASE routine; see Adding Databases to Mirror for the required procedure.
Recreating an Existing Mirrored Database Using the ^DATABASE Routine
The 10) Recreate a database option of the ^DATABASE routine lets you clear the data in an existing database without changing the database’s name or size. (See ^DATABASE in the “Using Character-based Security Management Routines” chapter of the Caché Security Administration Guide for information about the routine.) You can use this option with a mirrored database, but you must use it on every mirror member on which the database appears, in the same order in which you use the Create a database option to create a new mirrored database: on the primary first, then the backup, then any asyncs on which the database is part of the mirror.
Caution:
If you use the 10) Recreate a database option to recreate a database on the primary, you must repeat the operation on the backup and any DR asyncs in the mirror; if you do not, the database may become obsolete in the event of failover or disaster recovery. You are strongly encouraged to repeat the recreate operation on reporting asyncs as well.
Mounting/Dismounting Mirrored Databases
Mirrored databases can be mounted/dismounted on either failover member. If dismounted on the backup failover member, however, the database remains in a “stale” state until it is remounted, after which mirroring attempts to catch up the database automatically. If the required journal files are available on the primary failover member, the automatic update should succeed, but if any of the required journal files on the primary member have been purged, you must restore the database from a recent backup on the primary member.
Copying Mirrored Databases to Non-Mirrored Systems
You can copy a mirrored database to a non-mirrored system and mount it read-write on that system by doing the following:
  1. Back up the mirrored database on the primary or backup failover member and restore it on the non-mirrored system using the procedure described in Add an Existing Database to the Mirror (omit the step of manually activating and catching up the database following external backup restore or cold backup restore). Once restored, the database is still marked as mirrored and is therefore read-only.
  2. On the non-mirrored system, use the ^MIRROR routine (see Using the ^MIRROR Routine) to remove the database from the mirror by selecting Mirror Management and then Remove mirrored database and following the instructions. Following this procedure the database is mounted read-write.
Ensemble Considerations for Mirroring
This section discusses additional considerations that apply to Ensemble, including:
Creating an Ensemble Namespace with Mirrored Data
Because creating an Ensemble namespace requires database writes that enable the use of Ensemble in the new namespace, an Ensemble namespace with mappings from one or more mirrored databases must be created on the current primary mirror member and cannot be created on the backup, where mirrored databases are read-only.
How Ensemble Handles a Namespace with Mirrored Data
Ensemble examines the mappings in a namespace and determines whether that namespace contains any mappings from a mirrored database.
When you start or upgrade Ensemble, it treats the primary mirror member differently from the other mirror members, as follows:
Recommended Mirroring Configuration for Ensemble
Mirroring is intended as a high availability solution, and there should therefore be minimal extraneous activity on either of the mirror instances; that is, you should mirror all databases on any mirrored instance.
Customers sometimes choose to have “less critical” productions running on either node without having that data mirrored. Such a configuration, however, creates operational complexity that may prove difficult to maintain. Consequently, InterSystems strongly recommends that you avoid such configurations and that you instead mirror all the databases.
How Ensemble Autostart Works in a Mirrored Environment
When a mirror system starts up (at which point no member has yet become the primary failover member):
  1. Ensemble does not start any production that accesses mirrored data even if the production is specified in ^Ens.AutoStart. If the member becomes the primary instance, these productions will be started at that time.
  2. Ensemble determines whether there are any namespaces on the instance that do not access mirrored data. As described previously, InterSystems recommends that only mirrored productions be installed on a mirror member. If, however, you have installed any productions that use only non-mirrored databases, Ensemble starts the production specified in ^Ens.AutoStart in each such namespace. (This logic ensures that if you have installed a non-mirrored namespace on a mirror member, its production is started on Ensemble startup.)
Later, when the member becomes the primary failover member, Ensemble finds the namespaces that do reference mirrored data so that it can start the productions in these namespaces. If you follow InterSystems recommendations, no production accessing mirrored data should be running before an instance becomes the primary mirror member. Ensemble first checks to see if a production is already running before starting it, specifically:
  1. Ensemble determines whether the production is already running by counting the jobs running as the _Ensemble user in the namespace. If there are more than two such jobs, indicating that the production is already running, Ensemble logs a warning to the console log and does not attempt to start the production.
  2. If, as expected, the production is not running, Ensemble automatically starts the production specified in ^Ens.AutoStart.
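To see or set the production that a namespace starts automatically, you can use the Ens.Director API (a minimal sketch; the production name is illustrative):

        ; in the production's namespace: record the production to start
        ; automatically (the setting is stored in ^Ens.AutoStart)
        do ##class(Ens.Director).SetAutoStart("MyPkg.MyProduction")
        ; display the currently configured auto-start production
        write $get(^Ens.AutoStart),!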
For complete information about starting and stopping Ensemble productions, see the Starting and Stopping Productions chapter of Managing Ensemble.
Mirror Outage Procedures
Due to planned maintenance or to unplanned problems, the Caché instance on one or both of the failover members in a mirror may become unavailable. When a failover member’s Caché instance is unavailable, its ISCAgent may continue to be available, or may also be unavailable due to a host system failure. This section provides procedures for dealing with a variety of planned and unplanned outage scenarios involving instance outages or total outages of one or both failover members.
As noted in Automatic Failover Mechanics, there are two requirements for safe and successful failover from the primary failover member to the backup failover member:
In reading and using this material, you may want to refer to Automatic Failover Rules to review the rules governing automatic failover.
For information about using the Mirror Monitor to determine whether a backup failover member is active or a DR async is caught up, see Mirror Member Journal Transfer and Dejournaling Status and Monitoring Mirrors.
This section covers the following topics:
Planned Outage Procedures
To perform planned maintenance, you may need to temporarily shut down the Caché instance on one of the failover members, or the entire system hosting it. Situations in which you might do this include the following:
In this section, the term graceful shutdown refers to the use of the ccontrol stop command. For information about ccontrol, see Controlling Caché Instances in the “Using Multiple Instances of Caché” chapter of the Caché System Administration Guide.
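For example, to gracefully shut down an instance from the operating system command line (the instance name is illustrative):

    ccontrol stop MYCACHE quietly

The quietly argument performs the shutdown without interactive confirmation prompts.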
Note:
In addition to the ccontrol stop command, the SYS.Mirror API and the ^MIRROR routine can be used to manually trigger failover.
For information on shutting down the primary without triggering automatic failover, see Avoiding Unwanted Failover During Maintenance of Failover Members.
When there is no backup failover member available due to planned or unplanned failover member outage, you can promote a DR async member to failover member if desired, protecting you from interruptions to database access and potential data loss should a primary failure occur. See Temporary Replacement of a Failover Member with a Promoted DR Async for information about temporarily promoting a DR async member to failover member.
Maintenance of Backup Failover Member
When you need to take down the backup failover member Caché instance, you can perform a graceful shutdown on the backup instance. This has no effect on the functioning of the primary. When the backup instance is restarted it automatically rejoins the mirror as backup.
However, if the primary’s Caché instance is restarted while the backup’s host system is shut down, and the backup’s ISCAgent therefore cannot be contacted, the restarted instance cannot become primary, because it has no way of determining whether it was the most recent primary. When you need to shut down the backup’s host system, you can eliminate this risk using the following procedure:
  1. On the backup, navigate to the [Home] > [Mirror Monitor] page to display the Mirror Monitor, and click the Demote to DR Member button at the top of the page to demote the backup to DR async.
  2. Shut down the former backup instance and its host system, complete the maintenance work, and restart the member as a DR async.
  3. Promote the former backup from DR async to failover member, as described in Promoting a DR Async Member to Failover Member, to restore it to its original role.
If the primary is restarted after the backup has been demoted, it automatically becomes primary.
If you do not demote the backup before shutting it down, and find you do need to restart the primary Caché instance while the backup’s agent is unavailable, follow the procedures in Unplanned Outage of Both Failover Members.
Maintenance of Primary Failover Member
When you need to take down the primary failover member Caché instance or host system, you can gracefully fail over to the backup first. When the backup is active (see Mirror Synchronization), perform a graceful shutdown on the primary Caché instance. Automatic failover is triggered, allowing the backup to take over as primary.
When maintenance is complete, restart the former primary Caché instance or host system. When the Caché instance restarts, it automatically joins the mirror as backup. If you want to return the former primary to its original role, you can repeat the procedure—perform a graceful shutdown on the backup Caché instance to trigger failover, then restart it.
Avoiding Unwanted Failover During Maintenance of Failover Members
You may want to gracefully shut down the primary failover member without the backup taking over as primary (for example, when the primary will be down for only a very short time), or to prevent the backup from taking over in the event of a primary failure. You can do this in any of three ways:
Upgrade of Caché Instances in a Mirror
To upgrade Caché across a mirror, see the procedures in Minimum Downtime Upgrade with Mirroring in the “Upgrading Caché” chapter of the Caché Installation Guide.
Unplanned Outage Procedures
When a failover member unexpectedly fails, the appropriate procedures depend on which Caché instance has failed, the failover mode the mirror was in (see Automatic Failover Mechanics Detailed), the status of the other failover member instance, the availability of both failover members’ ISCAgents, and the mirror’s settings.
In reading and using this section, you may want to review Mirror Response to Various Outage Scenarios, which discusses the details of the backup’s behavior when the primary becomes unavailable.
Unplanned Outage of Backup Failover Member
When the backup failover member’s Caché instance or its host system fails, the primary continues to operate normally, although some applications may experience a brief pause (see Effect of Backup Outage for details).
When an unplanned outage of the backup occurs, correct the conditions that caused the failure and then restart the backup Caché instance or host system. When the backup Caché instance restarts, it automatically joins the mirror as backup.
Note:
If the backup fails in agent controlled mode (see Automatic Failover Rules) and the backup’s ISCAgent cannot be contacted, the primary’s Caché instance cannot become primary after being restarted, because it has no way of determining whether it was the most recent primary. Therefore, if for any reason you need to restart the primary Caché instance while the backup host system is down, you must use the procedure described in Maintenance of Backup Failover Member to do so.
Unplanned Outage of Primary Failover Member With Automatic Failover
As described in Automatic Failover Rules, when the primary Caché instance becomes unavailable, the backup can automatically take over as primary when
See Automatic Failover in Response to Primary Outage Scenarios for a detailed discussion of the situations in which automatic failover can take place.
When the backup has automatically taken over following an unplanned primary outage, correct the conditions that caused the outage, then restart the former primary Caché instance or host system. When the Caché instance restarts, it automatically joins the mirror as backup. If you want to return the former primary to its original role, perform a graceful shutdown on the backup Caché instance to trigger failover, then restart it, as described in Maintenance of Primary Failover Member.
Unplanned Outage of Primary Failover Member When Automatic Failover Does Not Occur
As described in Automatic Failover Rules, the backup Caché instance cannot automatically take over from an unresponsive primary instance when the primary’s host system, including its ISCAgent, is unavailable, and any of the following is true:
Under this scenario, there are three possible situations, each of which is listed with possible solutions in the following:
  1. The primary host system has failed but can be restarted. You can do either of the following:
  2. The primary host system has failed and cannot be restarted. You can manually force the backup to take over. The procedures for this vary depending on whether or not the backup was active when it lost its connection to the primary; there is some risk of data loss, as described in the following sections.
  3. The primary host system is running but is network isolated from the arbiter as well as the backup; see Unplanned Isolation of Primary Failover Member for procedures.
Manually Forcing a Failover Member to Become Primary
When a failover member cannot become primary you can force it to do so, but there is a risk of data loss if you do this in any situation in which the last primary could have more recent journal data than the member you are forcing. The following procedures describe how to determine and manage that risk. If you force a member to become the primary when you cannot confirm that it has the most recent journal data, the other mirror members may be unable to rejoin the mirror and need to be rebuilt (as described in Rebuilding a Mirror Member).
Caution:
Before proceeding, confirm that the primary is down and will remain down during this procedure. If you cannot confirm that, it is best to abort this procedure in order to avoid the risk that the original primary becomes available again, resulting in both members simultaneously acting as primary. If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC) for assistance.
Determining Whether the Backup Was Active Before Manually Failing Over
Assume two failover members, Caché A and Caché B. If the ^MIRROR routine confirms that the backup (Caché B) was active at the time contact with the primary (Caché A) was lost, and therefore has the most recent journal data from Caché A, you can manually fail over using a single procedure. If the connection was lost because the primary failed, this poses no risk of data loss. When multiple failures occur, however, it is possible that an active backup does not have all of the latest journal data from the primary, because the primary continued operating for some period after the connection was lost.
Determine whether the backup was active using this procedure:
  1. Confirm that Caché A is actually down (and ensure that it stays down during the entire manual failover procedure).
  2. On Caché B, run the ^MIRROR routine (see Using the ^MIRROR Routine) in the %SYS namespace in Caché Terminal.
  3. Select Mirror Management from the main menu to display the following submenu:
     1) Add mirrored database(s)
     2) Remove mirrored database(s)
     3) Activate or Catchup mirrored database(s)
     4) Change No Failover State
     5) Try to make this the primary
     6) Connect to Mirror
     7) Stop mirroring on this member
     8) Modify Database Size Field(s)
     9) Force this node to become the primary
    10) Promote Async DR member to Failover member
    11) Demote Backup member to Async DR member
    12) Mark an inactive database as caught up
    13) Manage mirror dejournaling on async member (disabled)
    14) Pause dejournaling for database(s)
  4. Select the Force this node to become the primary option. If the backup was active at the time contact was lost, a message like the following is displayed:
    This instance was an active backup member the last time it was 
    connected so if the primary has not done any work since that time,
    this instance can take over without having to rebuild the mirror 
    when the primary reconnects. If the primary has done any work
    beyond this point (file #98),
         C:\InterSystems\MyCache\mgr\journal\MIRROR-GFS-20140815.009
    then the consequence of forcing this instance to become the primary is
    that some operations may be lost and the other mirror member may need
    to be rebuilt from a backup of this node before it can join as
    a backup node again.
    Do you want to continue? <No>
    If you have access to the primary’s journal files, you can confirm that the cited file is the most recent before proceeding.
    If the backup was not active at the time contact with the primary was lost, a message like the following is displayed:
    Warning, this action can result in forcing this node to become
    the primary when it does not have all of the journal data which
    has been generated in the mirror. The consequence of this is that
    some operations may be lost and the other mirror member may need
    to be rebuilt from a backup of this node before it can join as
    a backup node again.
    Do you want to continue? <No>
Manual Failover To An Active Backup
If the Force this node to become the primary option of the ^MIRROR routine confirms that the backup was active when it lost its connection to the primary, complete the manual failover procedure as follows:
  1. Enter y at the Do you want to continue? prompt to continue with the procedure. The Force this node to become the primary option waits 60 seconds for the mirror member to become the primary. If the operation does not successfully complete within 60 seconds, ^MIRROR reports that the operation may not have succeeded and instructs you to check the console log to determine whether the operation failed or is still in progress.
  2. Once the ^MIRROR routine confirms that the backup has become primary, restart Caché A when you can do so. Caché A joins the mirror as backup when the Caché instance restarts.
Manual Failover When the Backup Is Not Active
Even when the ^MIRROR routine does not confirm that the backup (Caché B) was active at the time it lost its connection with the primary (Caché A), you can still continue the manual failover process using the following procedure, but there is some risk of data loss if you do. This risk can be minimized by copying the most recent mirror journal files from Caché A, if you have access to them, to Caché B before manual failover, as described in this procedure.
  1. If you have access to the primary’s mirror journal files, copy the most recent files to Caché B, beginning with the latest journal file on Caché B and including any later files from Caché A. For example, if MIRROR-MIRRORA-20130220.001 is the latest file on Caché B, copy MIRROR-MIRRORA-20130220.001 and any later files from Caché A. Check the files’ permissions and ownership, and change them if necessary to match the existing journal files (a sample copy sequence follows this procedure).
  2. If you accept the risk of data loss, confirm that you want to continue by entering y at the prompt; the backup becomes primary. The Force this node to become the primary option waits 60 seconds for the mirror member to become the primary. If the operation does not successfully complete within 60 seconds, ^MIRROR reports that the operation may not have succeeded and instructs you to check the console log to determine whether the operation failed or is still in progress.
  3. Once the ^MIRROR routine confirms that the backup has become primary, restart Caché A when you can do so.
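For step 1, on a UNIX® system the copy might look like the following (a minimal sketch; the host name, journal directory, owner, and mode are illustrative and should be matched to your own configuration and to the existing journal files):

    # copy the most recent mirror journal files from Caché A
    scp cacheA:/cachesys/mgr/journal/MIRROR-MIRRORA-20130220.* /cachesys/mgr/journal/
    # match the ownership and permissions of the existing journal files
    chown cacheusr:cacheusr /cachesys/mgr/journal/MIRROR-MIRRORA-20130220.*
    chmod 660 /cachesys/mgr/journal/MIRROR-MIRRORA-20130220.*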
Unplanned Isolation of Primary Failover Member
As described in Automatic Failover Mechanics, when the primary simultaneously loses contact with both the backup and the arbiter, it goes into an indefinite trouble state and can no longer operate as primary. Typically, when this occurs, the backup takes over and becomes primary. When the primary’s connection to the backup is restored, the backup forces the primary down; alternatively, you can force the primary down yourself before restoring the connection.
However, if a network event (or series of network events) causes the failover members and arbiter to all lose contact with each other simultaneously (or nearly simultaneously), there can be no primary because the backup cannot take over and the primary is no longer operating as primary. This situation is shown as the final scenario in the illustration Mirror Responses to Lost Connections in Arbiter Mode in the section Automatic Failover Mechanics Detailed. A similar situation can occur when the primary becomes isolated and the backup cannot take over because of an error.
When these circumstances occur, you have the following options:
Caution:
If you force the primary to resume operation as primary without confirming the listed conditions, you run the risk of data loss or both failover members simultaneously acting as primary. If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC) for assistance.
Unplanned Outage of Both Failover Members
When both failover members unexpectedly fail, due to the same event or to different events, the appropriate procedures depend on whether you can restart either or both of the failover members within the limits of your availability requirements. The longer the mirror can be out of operation, the more options you are likely to have.
Disaster Recovery Procedures
As described in Async Mirror Members, a disaster recovery (DR) async member maintains read-only copies of the mirrored databases, making it possible for the DR async to be promoted to failover member should the need arise. The procedure for promoting a DR async is described in Promoting a DR Async Member to Failover Member. This section discusses three scenarios in which you can use DR async promotion:
In the procedures in this section, Caché A is the original primary failover member, Caché B is the original backup, and Caché C is the DR async to be promoted.
Manual Failover to a Promoted DR Async During a Disaster
When the mirror is left without a functioning failover member, you can manually fail over to a promoted DR async. The following procedures cover the scenarios under which this is an option:
Caution:
If you cannot confirm that the primary failover member Caché instance is really down, and there is a possibility that the instance will become available, do not manually fail over to another mirror member. If you do manually fail over and the original primary becomes available, both failover members will be simultaneously acting as primary.
Note:
When the primary Caché instance is in an indefinite trouble state due to isolation from both the backup and the arbiter in arbiter controlled mode, as described in Automatic Failover Mechanics Detailed, you cannot promote a DR async to failover member.
DR Promotion and Manual Failover with No Additional Journal Data
In a true disaster recovery scenario, in which the host systems of both failover members are down and their journal files are inaccessible, you can promote the DR async member to primary without obtaining the most recent journal data from the former primary. This is likely to result in some data loss. If the host systems of the failover members are accessible, use one of the procedures in DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent or DR Promotion and Manual Failover with Journal Data from Journal Files instead, as these allow the promoted DR async to obtain the most recent journal data before becoming primary, minimizing the risk of data loss.
Once you have promoted a DR async that is not participating in the mirror VIP to primary, you must make any needed changes to redirect users and applications to the new primary (see Redirecting Application Connections Following Failover or Disaster Recovery) before completing the procedures provided in this section.
Note:
A promoted DR async does not attempt to become primary unless all mirrored databases marked Mount Required at Startup (see Edit a Local Database’s Properties in the “Managing Caché” chapter of the Caché System Administration Guide) are mounted, activated, and caught up, and therefore ready for use on becoming primary.
Caution:
Promoting a DR async to primary without the most recent journal data from the former primary is likely to result in the loss of some global update operations, and the other mirror members may need to be rebuilt (as described in Rebuilding a Mirror Member). If you are uncertain whether this procedure is appropriate, contact the InterSystems Worldwide Response Center (WRC) for assistance.
To promote a DR async (Caché C) to primary without obtaining the most recent journal data, do the following.
  1. Promote Caché C to failover member without choosing a failover partner. Caché C becomes the primary without any additional journal data.
  2. When the host systems of the former failover members (Caché A and Caché B) become operational, at the earliest opportunity and before restarting Caché, set ValidatedMember=0 in the [MirrorMember] section of the Caché Parameter File for the Caché instance on each member (see [MirrorMember] in the Caché Parameter File Reference, and the parameter file sketch following this procedure). This instructs the Caché instance to obtain its new role in the mirror from the promoted DR async, rather than reconnecting in its previous role. The promotion instructions note that this change is required.
    Caution:
    Failure to set ValidatedMember=0 may result in two mirror members simultaneously acting as primary.
  3. Restart Caché on each former failover member.
    1. If the member joins the mirror as DR async when Caché restarts, no further steps are required. Any journal data that was on the failed member but not on the current primary has been discarded.
    2. If the member cannot join the mirror when Caché restarts, as indicated by the console log message referring to inconsistent data described in Rebuilding a Mirror Member, the most recent database changes on the member are later than the most recent journal data present on Caché C when it became primary. To resolve this, rebuild Caché A as described in that section.
  4. After Caché A and Caché B have rejoined the mirror, you can use the procedures described in Temporary Replacement of a Failover Member with a Promoted DR Async to return all of the members to their former roles. If either Caché A or Caché B restarted as backup, start with a graceful shutdown of Caché C when the backup is active to fail over to the backup; if Caché A and Caché B both restarted as DR async, promote one of them to backup and then perform the graceful shutdown on Caché C. Promote the other former failover member to backup, then restart Caché C as DR async.
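As noted in step 2 of this procedure, the ValidatedMember change is a one-line edit to the [MirrorMember] section of each former failover member’s Caché Parameter File (cache.cpf), made while Caché is down; for example:

    [MirrorMember]
    ...
    ValidatedMember=0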
DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent
If the host system of Caché A is running, but the Caché instance is not and cannot be restarted, you can use the following procedure to update the promoted Caché C with the most recent journal data from Caché A after promotion through Caché A’s ISCAgent.
  1. Promote Caché C, choosing Caché A as the failover partner. Caché C is promoted to failover member, obtains the most recent journal data from Caché A’s agent, and becomes primary.
  2. Restart the Caché instance on Caché A, which rejoins the mirror as backup.
  3. After Caché A has rejoined the mirror and become active, you can use the procedures described in Temporary Replacement of a Failover Member with a Promoted DR Async to return all of the members to their former roles, starting with a graceful shutdown of Caché C, followed by setting ValidatedMember=0 in the [MirrorMember] section of the Caché Parameter File for Caché B (see [MirrorMember] in the Caché Parameter File Reference), restarting Caché B as DR async, promoting Caché B to backup, and restarting Caché C as DR async.
Note:
If Caché A’s host system is down, but Caché B’s host system is up although its Caché instance is not running, run the ^MIRROR routine on Caché B as described in Manual Failover To An Active Backup to determine whether Caché B was an active backup at the time of failure. If so, use the preceding procedure but select Caché B as failover partner during promotion, allowing Caché C to obtain the most recent journal data from Caché B’s ISCAgent.
DR Promotion and Manual Failover with Journal Data from Journal Files
If the host systems of both Caché A and Caché B are down but you have access to Caché A’s journal files, or Caché B’s journal files and console log are available, you can update Caché C with the most recent journal data from the primary before promotion, using the following procedure.
  1. Update Caché C with the most recent journal files from Caché A or Caché B as follows:
  2. Promote Caché C to failover member without choosing a failover partner. Caché C becomes the primary.
  3. When the problems with Caché A and Caché B have been fixed, at earliest opportunity and before restarting Caché, set ValidatedMember=0 in the [MirrorMember] section of the Caché Parameter File for the Caché instance on each member (see [MirrorMember] in the Caché Parameter File Reference). The promotion instructions note that this change is required. Once you have done this, restart Caché on each member, beginning with Caché A (the member that was most recently the primary).
    1. If the member joins the mirror as backup or DR async when Caché restarts, no further steps are required. Any journal data that was on the failed member but not on the current primary has been discarded.
    2. If the member cannot join the mirror when the Caché instance restarts, as indicated by the console log message referring to inconsistent data described in Rebuilding a Mirror Member, the most recent database changes on the member are later than the most recent journal data present on Caché C when it became the primary. To resolve this, rebuild the member as described in that section.
  4. In most cases, the DR async system is not a suitable permanent host for the primary failover member. After Caché A and Caché B have rejoined the mirror, use the procedures described in Temporary Replacement of a Failover Member with a Promoted DR Async to return all of the members to their former roles. If either Caché A or Caché B restarted as backup, start with a graceful shutdown of Caché C when the backup is active to fail over to the backup; if Caché A and Caché B both restarted as DR async, promote one of them to backup and then perform the graceful shutdown on Caché C. Promote the other former failover member to backup, then restart Caché C as DR async.
Planned Failover to a Promoted DR Async
If you have included one or more DR asyncs in a mirror to provide disaster recovery capability, it is a good idea to regularly test this capability through a planned failover to each DR async. To perform this test, or when you want to fail over to a DR async for any other reason (such as a planned power outage in the data center containing the failover members), use the following procedure:
  1. Promote Caché C to failover member; because Caché A is available, you are not asked to choose a failover partner. Caché C becomes backup and Caché B (if it exists) is demoted to DR async.
    Note:
    If the mirror contains only one failover member to start with, the procedure is the same; you are not asked to choose a failover partner, and Caché C becomes backup, so that the mirror now has two failover members.
  2. When Caché C becomes active (see Backup Status and Automatic Failover), perform a graceful shutdown on Caché A. Automatic failover is triggered, allowing Caché C to take over as primary.
  3. After any testing you might want to perform on Caché C, restart Caché A, which automatically joins the mirror as backup.
  4. When Caché A becomes active, perform a graceful shutdown on Caché C to fail over to Caché A.
  5. Promote Caché B (if it exists) to failover member; it becomes backup.
  6. Restart the Caché instance on Caché C, which automatically joins the mirror in its original role as DR async.
A DR async that does not have network access to the mirror private addresses of the failover members, as described in Sample Mirroring Architecture and Network Configurations, can be promoted only to function as primary, and this should be done only when no other failover member is in operation. When this is the case, therefore, the preceding procedure is not appropriate. Instead, follow this procedure:
  1. Perform a graceful shutdown on Caché B, if it exists, so that only Caché A is functioning as failover member (primary).
  2. When Caché C is caught up (see Mirror Member Journal Transfer and Dejournaling Status), perform a graceful shutdown on Caché A.
  3. Promote Caché C to primary, as described in DR Promotion and Manual Failover with Journal Data from Primary’s ISCAgent. During this procedure, the new primary contacts the former primary’s ISCAgent to confirm that it has the most recent journal data.
  4. After any testing you might want to perform on Caché C, shut it down.
  5. Restart Caché A; it automatically becomes primary.
  6. Restart Caché B (if it exists); due to Caché C’s promotion, it joins as DR async.
  7. Promote Caché B to backup.
  8. Restart Caché C, which automatically joins the mirror in its original role as DR async.
Note:
In both of the procedures in this section, if Caché B does not exist, that is, the mirror consists of primary and asyncs only, Caché C when restarted becomes backup. Demote it to DR async as described in Maintenance of Backup Failover Member.
Temporary Replacement of a Failover Member with a Promoted DR Async
Some of the procedures described in Planned Outage Procedures and Unplanned Outage Procedures involve temporary operation of the mirror with only one failover member. While it is not necessary to maintain a running backup failover member at all times, it does protect you from interruptions to database access and potential data loss should a primary failure occur. For this reason, when only the primary is available due to planned or unplanned failover member outage, you can consider temporarily promoting a DR async member to backup failover member. Before doing so, however, consider the following:
Note:
Before using this option, review the discussion of failover partner selection and the requirement to set ValidatedMember=0 on former failover members whose agent cannot be contacted at the time of promotion in Promoting a DR Async Member to Failover Member.
If you need to perform planned maintenance on Caché B, the current backup failover member (see Maintenance of Backup Failover Member), you can do the following:
  1. Promote Caché C, a DR async that is caught up (see Mirror Member Journal Transfer and Dejournaling Status). Caché C automatically becomes backup, and Caché B is demoted to DR async.
  2. Shut down Caché B’s Caché instance or host system and complete the planned maintenance.
  3. Restart Caché B, which joins the mirror as DR async.
  4. When Caché B is caught up, promote it to failover member, returning it to its original role as backup. Caché C is automatically demoted to DR async, its original role.
If you need to perform planned maintenance on Caché A, the current primary failover member (see Maintenance of Primary Failover Member), you can do the following:
  1. When Caché B is active (see Mirror Synchronization), perform a graceful shutdown on Caché A. Automatic failover is triggered, allowing Caché B to take over as primary.
  2. Promote Caché C, a DR async that is caught up. Caché C automatically becomes backup.
  3. Complete the planned maintenance on Caché A, shutting down and restarting the host system if required.
  4. Restart the Caché instance on Caché A, which joins the mirror as DR async.
  5. When Caché A is caught up, promote it to failover member; it becomes backup, and Caché C is automatically demoted, returning it to its original role.
  6. When Caché A becomes active, perform a graceful shutdown on Caché B. Automatic failover is triggered, returning Caché A to its original role.
  7. Restart the Caché instance on Caché B, which joins the mirror in its original role.
If you have had an unplanned outage of Caché B, or automatically or manually failed over to Caché B due to an unplanned outage of Caché A (see Unplanned Outage Procedures), you can do the following:
  1. Promote Caché C, a DR async that is caught up. Caché C automatically becomes backup.
  2. Restart the failed failover member. If the failed member’s ISCAgent could not be contacted when the DR async was promoted, you must at earliest opportunity and before restarting Caché set ValidatedMember=0 in the [MirrorMember] section of the Caché Parameter File for the Caché instance (see [MirrorMember] in the Caché Parameter File Reference). The promotion instructions note that this change is required. When you restart the former failover member’s Caché instance, it joins the mirror as DR async.
  3. When the restarted failover member is caught up, promote it to failover member; it becomes backup, and Caché C is automatically demoted to DR async, its original role.
  4. If you want the failover members to exchange their current roles, when the backup becomes active perform a graceful shutdown on the current primary, triggering automatic failover. Restart the other failover member; it joins the mirror as backup.