docs.intersystems.com
Home  /  System Administration  /  Data Integrity Guide  /  Cluster Journaling


Data Integrity Guide
Cluster Journaling
[Back]  [Next] 
InterSystems: The power behind what matters   
Search:  


This chapter contains information about journaling on ECP-based shared-disk clustered systems in InterSystems IRIS™. It discusses the following topics:
For related information, see the following chapters:
Journaling on Clusters
Journaling is necessary for cluster failover to bring the databases up to date and to use transaction processing. Each node in a cluster maintains its own journal files, which must be accessible to all other nodes in the cluster to ensure recoverability. A cluster session ID (CSI) is the time when the session begins, that is, the cluster start time, which is stored in the header of every journal file on a clustered system.
In addition to the information journaled on a nonclustered system, the following specifics apply to clustered systems:
The journal files on clustered systems are organized using the following:
Important:
The default location of the journal files is in the install-dir\Mgr\journal directory of the InterSystems IRIS instance. However, InterSystems recommends placing journal files on separate storage devices from those on which database files and the CACHE.WIJ file are located. See Journaling Best Practices in the “Journaling” chapter of this guide for more information about separating journal storage.
Caution:
Do not stop journaling in a cluster environment, although it is possible to do so with the ^JOURNAL routine. If you do, the recovery procedure is vulnerable until the next backup.
Cluster Journal Log
Journal files used by members of a cluster are logged in a file, CACHEJRN.LOG, located in the cluster pre-image journal, or PIJ directory. It contains a list of journal files maintained by nodes while they are part of the cluster. Journal files maintained by a node when it is not part of the cluster may not appear in the cluster journal log; for more information, see Journal History Log in the “Journaling” chapter of this guide.
Following is an example of part of a cluster journal log:
0,_$1$DRA1:[TEST.50.MGR.JOURNAL]20030913.004
1,_$1$DKA0:[TEST.50.MGR.JOURNAL]20030916.002
0,_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.001
0,_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.002
1,_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030916.002
1,_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030916.003
The first value in each comma-delimited row is the cluster system number (CSN) of the node to which the journal file, the second field, belongs. The log is useful for locating journal files of all the members of a cluster, especially the members that have left the cluster. The CSN of a node may change when it restarts.
When a node joins the cluster its current journal file is added to the journal log. Processes that start journaling or switch journal files also add entries. The log is used in a cluster journal restore and by the journal dump utility.
Cluster Journal Sequence Numbers
InterSystems recommends locking globals in cluster-mounted databases if you require a record of the sequence of updates. This is the tool InterSystems IRIS uses to record the time-sequencing of updates in the journal files of the cluster nodes. If cluster failover occurs, the journals of all the nodes can be applied in the proper order. Updates may not be dejournaled in the same order as they originally occurred, but they are valid with respect to the synchronization guaranteed by the Lock command and $Increment function.
To restore clustered databases properly from the journal files, the updates must be applied in the order they occurred. The cluster journal sequence number, which is part of every journal entry of a database update, and the cluster session ID, which is part of the journal header, provide a way to sequence transactions from the journal files of all the cluster members. For each cluster session, the sequence number starts at 1 and can go as high as 18446744073709551619 (that is, 2**64-1).
The master node of a cluster maintains a master copy of the sequence number, which is incremented during database mounting, and with each use of Lock and $Increment. The master value of the sequence propagates to one or all cluster nodes, depending on the type of operation.
The sequence number is used by cluster journal restore, which operates on the assumption that the sequence number increases monotonically during a cluster session.
At the end of a backup, the cluster journal sequence number on all cluster nodes is incremented to be higher than the previous master cluster journal sequence number. Then a journal marker bearing the new cluster journal sequence number is placed in the current local journal file. In a sense, the journal marker serves as a barrier of cluster journal sequence numbers, separating the journaled database updates that are covered in the backup from those that are not. Following the restore of the backup, cluster journal restore can start from the cluster journal sequence number of the journal marker and move forward.
You can also set your own journal markers using the ^JRNMARK utility. See Setting Journal Markers on a Clustered System for details.
Cluster Failover
The InterSystems IRIS cluster failover process protects the integrity of data on other cluster nodes when one cluster member fails. It allows the remaining cluster members to continue to function. The following conditions must be met for cluster failover to work successfully:
If a cluster member fails, the cluster master executes cluster failover. If the master is the failing node, the cluster member that least recently joined the cluster becomes the new master and executes failover. Cluster failover consists of two phases.
In the first phase, the cluster master does the following:
If an error occurs during this phase, the cluster crashes and further Cluster Recovery must take place.
In the second phase, the cluster master does the following:
During failover, the journals from all cluster members are applied to the database and any incomplete transactions are rolled back. If cluster failover completes successfully, there is no database degradation or data loss from the surviving nodes. There is only minimal data loss (typically less than the last second) not visible to other cluster members from the failing node.
If failover is unsuccessful, the cluster crashes and you must shut down all cluster nodes before restarting the cluster. See the Failover Error Conditions section for more details.
Cluster Recovery
Recovery occurs when InterSystems IRIS stops on any cluster member. The procedure changes depending on how InterSystems IRIS stops. During successful failover, the recovery procedure is fairly straightforward and automatic. If, however, a clustered system crashes, the recovery is more complex.
Following a cluster crash, a clustered InterSystems IRIS node cannot be restarted until all of the nodes in the cluster have been stopped. If a cluster member attempts InterSystems IRIS startup or tries to join the cluster by cluster-mounting a disk before all other cluster members have been stopped, the following message is displayed:
ENQ daemon failed to start because cluster is crashed.
Once all members are stopped, start each node. The first node that starts runs the InterSystems recovery procedure if it detects there was an abnormal system shutdown. This node becomes the new cluster master. Cluster members that are not the cluster master are frequently referred to as slave nodes.
The Recovery daemon (RCVRYDMN) performs recovery on the surviving or new master node based on whether the crashed node was the master or a slave. To enable recovery, the node’s databases and WIJ files must be accessible cluster-wide. The master is responsible for managing the recovery cluster-wide based on the WIJ file on each node.
When a slave crashes, the Recovery daemon on the master node does the following:
  1. It uses the journal information provided by the WIJ files to apply journal files on all cluster nodes (including the one that crashed) from a common starting point, as with all cluster journal restores.
  2. It rolls back all incomplete transactions on the crashed system. Again, the rollbacks are journaled, this time on the host system of the Recovery daemon. For this reason, if you restore journal files, it is safer to do a cluster journal restore than a stand-alone journal restore, as the rollback of an incomplete transaction in one node's journal may be journaled on another node.
When the (former) master crashes, the Recovery daemon does the following:
  1. As in step 1 above, it applies the journal files on all the cluster nodes.
  2. In between the two steps above, it adjusts the cluster journal sequence number on its host system, which is the new master, so that it is higher than that of the last journaled entry on the crashed system, which was the old master. This guarantees the monotonic increasing property of cluster journal sequence in cluster-wide journal files.
  3. As above, all incomplete transactions are rolled back on the crashed system.
    If the last remaining node of a cluster crashes, restarting the first node of the cluster involves cluster journal recovery, which includes rolling back any transactions that were open (uncommitted) at the time of crash.
Cluster Restore
Typically, journal files are applied after backups are restored to bring databases up to date or up to the point of a crash. If nodes have not left or joined a cluster since the last backup, you can restore the journal files starting from the marker corresponding to the backup. If one or more nodes have joined the cluster since the last backup, the restore is more complex.
A node joins a cluster either when it restarts with a proper configuration or cluster mounts a database after startup (as long as you properly set up other parameters, such as the PIJ directory, at startup). To make journal restore easier in the latter case, switch the journal file on the node as soon as it joins the cluster.
For each node that has joined the cluster since the last backup of the cluster:
  1. Restore the latest backup of the node.
  2. If the backup occurred before the node joined the cluster, restore the private journal files from where the backup ends, up to the point when it joined the cluster. (You can make this easier by switching the journal file when the node joins the cluster.)
  3. Restore the latest cluster backup.
  4. Restore the cluster journal files starting from where the backup ends.
See Cluster Journal Restore for detailed information about running the utility.
This procedure works well for restoring databases that were privately mounted on nodes before they joined the cluster and then are cluster-mounted after that node joined the cluster. It is based on the following assumptions:
In more complicated scenarios, these assumptions may not be true. The first assumption becomes false if, say, rather than centralizing backups of all cluster-mounted databases on one node, you configure each node to back up selected cluster-mounted databases along with its private databases.
In this case, you may have to take a decentralized approach by restoring one database at a time. For each database, the restore procedure is essentially the same:
  1. Restore the latest backup that covers the database.
  2. Restore the private journal files up to the point when the node joined the cluster, if it postdates the backup.
  3. Restore the cluster journal files from that point forward.
Caution:
Even if the database has always been privately mounted on the same node, it is safer to restore the cluster journal files than to apply only the journal files of that node. If the node crashed or was shut down when it was part of the cluster, open transactions on the node would have been rolled back by and journaled on a surviving node of the cluster. Restoring the cluster journal files ensures that you do not miss such rollbacks in journal files of other nodes.
InterSystems does not recommend or support the scenario where a node joins and leaves the cluster multiple times.
Failover Error Conditions
When a cluster member fails, the other cluster members notice a short pause while failover occurs. In rare situations, some processes on surviving cluster nodes may receive <CLUSTERFAILED> errors. You can trap these errors with the $ZTRAP error-trapping mechanism.
Cluster failover does not work if one of the following is true:
If failover is unsuccessful, the cluster crashes and the following message appears at the operator's console:
****** InterSystems IRIS : CLUSTER CRASH - ALL InterSystems IRIS SYSTEMS ARE SUSPENDED ******
The other cluster members freeze when InterSystems IRIS processes reach a Set or Kill command.
Examine the failover log, which is contained in the console log (normally cconsole.log in the manager’s directory) of the cluster master, to see the error messages generated during the failover process.
If a cluster member that failed attempts startup, or a node tries to join the cluster by cluster-mounting a database while the cluster is in failover, the following message is displayed:
The cluster appears to be attempting to recover from the failure of one or more 
members at this time. Waiting 45 seconds for failover to complete...
A period (.) appears every five seconds until the active recovery phase completes. The mount or startup then proceeds.
If the cluster crashes during this time, the following message is displayed:
ENQ daemon failed to start because cluster is crashed.
See the Cluster Recovery section for an explanation of what happens when a cluster crashes.
Tools and Utilities
The following tools and utilities are helpful in cluster journaling processes:
Cluster Journal Restore
The cluster journal restore procedure allows you to start or end a restore using the journal markers placed in the journal files by an InterSystems IRIS backup. You can run a cluster journal restore either as part of a backup restore or as a stand-alone procedure.
InterSystems IRIS includes an entry point to the journal restore interface for performing specific cluster journal restore operations. From the %SYS namespace, run the following:
 Do CLUMENU^JRNRESTO
This invokes a menu that includes the following options:
Perform a Cluster Journal Restore
The first option of the cluster journal restore menu allows you to run the general journal restore on a clustered or nonclustered system. It is the equivalent of running ^JRNRESTO and answering Yes to the Cluster journal restore? prompt.
%SYS>Do ^JRNRESTO

This utility uses the contents of journal files
to bring globals up to date from a backup.
Replication is not enabled.

Restore the Journal? Yes => Yes

Cluster journal restore? Yes
You are asked to describe the databases to be restored from the journal and the starting and ending points of the restore. The starting and ending points can be based on a backup, on a set of journal markers, at a cluster start, or any arbitrary point in the journal.
The interface prompts for directory information, including any redirection specifics, and whether all databases and globals are to be processed. For example:
Directory: _$1$DKB300:[TEST.CLU.5X]
Redirect to Directory: _$1$DKB300:[TEST.CLU.5X] => _$1$DKB300:[TEST.CLU.5X]
                                              --> _$1$DKB300:[TEST.CLU.5X]
Restore all globals in _$1$DKB300:[TEST.CLU.5X]? Yes => Yes

Directory:
For each directory you enter you are asked if you want to redirect. Enter the name of the directory to which to restore the dejournaled globals. If this is the same directory, enter the period (.) character or press Enter.
Also specify for each directory whether you want to restore all journaled globals. Enter Yes or press Enter to apply all global changes to the database and continue with the next directory. Otherwise, enter No to restore only selected globals.
At the Global^ prompt, enter the name of the specific globals you want to restore from the journal. You may select patterns of globals by using the asterisk (*) to match any number of characters and the question mark (?) to match any single character. Enter ?L to list the currently selected list of globals.
When you have entered all your selected globals, press Enter at the Global^ prompt and enter the next directory. When you have entered all directories, press Enter at the Directory prompt. Your restore specifications are displayed as shown in this example:
Restoring globals from the following clustered datasets:
 1. _$1$DKB300:[TEST.CLU.5X]   All Globals

Specifications for Journal Restore Correct? Yes => Yes
Updates will not be replicated
Verify the information you entered before continuing with the cluster journal restore. Answer Yes or press Enter if the settings are correct; answer No to repeat the process of entering directories and globals.
Once you verify the directory and global specifications, the Main Settings menu of the cluster journal restore setup process is displayed with the current default settings, as shown in the following example:
Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DKB400:[TEST.5X]CACHEJRN.LOG
   with NO redirections of journal files
2. To START restore at the beginning of cluster session <20030319 15:35:37>
3. To STOP restore at sequence #319 of cluster session <20030319 15:35:37>
      134388,_$1$DRA2:[TEST.5Y.JOURNAL]20030320.005
4. To SWITCH journal file before journal restore
5. To DISABLE journaling the dejournaled transactions

Select an item to modify ('Q' to quit or ENTER to accept and continue):
From this menu you may choose to modify any of the default values of the five settings by entering its menu item number:
After each modification, the Main Settings menu is displayed again, and you are asked to verify the information you entered before the restore begins. The following is an example of how the menu may look after several changes:
Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DKB400:[TEST.5Y.MGR]CACHEJRN.TXT
   with redirections of journal files
      _$1$DKB400:[TEST.5X.JOURNAL] -> _$1$DRA2:[TEST.5X.JOURNAL]
      _$1$DKB400:[TEST.5Y.JOURNAL] -> _$1$DRA2:[TEST.5Y.JOURNAL]
      _$1$DKB400:[TEST.5Z.JOURNAL] -> _$1$DRA2:[TEST.5Z.JOURNAL]
2. To START restore at the journal marker located at
      138316,_$1$DKB400:[TEST.5X.JOURNAL]20030401.001
          -> _$1$DRA2:[TEST.5X.JOURNAL]20030401.001
3. To STOP restore at the journal marker located at
      133232,_$1$DKB400:[TEST.5X.JOURNAL]20030401.003
          -> _$1$DRA2:[TEST.5X.JOURNAL]20030401.003
4. NOT to SWITCH journal file before journal restore
5. To DISABLE journaling the dejournaled transactions

Select an item to modify ('Q' to quit or ENTER to accept and continue):
Start journal restore? 
Press Enter to accept the settings and continue. If you are using the journal log of the current cluster, you are informed that the restore will stop at the currently marked journal location and asked if you want to start the restore.
Select an item to modify ('Q' to quit or ENTER to accept and continue):

To stop restore at currently marked journal location
  offset 134168 of _$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008

Start journal restore?
Enter Yes to begin the cluster journal restore. Once the restore finishes your system is ready for activity.
Enter No to go back to the main menu where you can continue to make changes to the cluster journal restore setup or enter Q to abort the cluster journal restore. After aborting the cluster journal restore, you can run a private journal restore or abort the restore process entirely.
Select an item to modify ('Q' to quit or ENTER to accept and continue): Q
Run private journal restore instead? No

[Journal restore aborted]

Replication Enabled
Change the Source of the Restore
The first item on the Main Settings menu contains the information required to find the journal files for all the cluster members. The information has two elements:
By default, the restore uses the cluster journal log associated with the current clustered system. If you are running the restore on a nonclustered system, you are prompted for a cluster journal log before the main menu is displayed.
Choose this option to restore the journal files on a different InterSystems IRIS cluster from the one that owns the journal files. You can either:
Note:
The option to redirect the journal files is available only if the specified cluster journal log is not that of the current cluster.
Identify the Cluster Journal Log
The Journal File Information menu displays the cluster journal log file to be used in the cluster journal restore. If the journal files on the original cluster are not accessible to the current cluster, copy them to a location accessible to the current cluster and specify how to locate them by entering redirect information.
Enter I to identify the journal log used by the original cluster.
Select an item to modify ('Q' to quit or ENTER to accept and continue): 1

Cluster Journal Restore - Setup - Journal File Info
[I]dentify an existing cluster journal log to use for the restore
   Current: _$1$DRA1:[TEST.50]CACHEJRN.LOG
  - OR -
[C]reate a cluster journal log by specifying where journal files are


Selection ( if no change): I

                              *** WARNING ***
        If you specify a cluster journal log different from current one, you
        may need to reenter info on journal redirection, restore range, etc.

Enter the name of the cluster journal log ( if no change)
=> cachejrn.txt

Cluster Journal Restore - Setup - Journal File Info
[I]dentify an existing cluster journal log to use for the restore
   Current: _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
[R]edirect journal files in _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
  - OR -
[C]reate a cluster journal log by specifying where journal files are

You must redirect journal files if the journal files being restored are not in their original locations, as specified in the cluster journal log. To redirect the journal files listed in the cluster journal log, provide the original and current locations when prompted. You may give a full or partial directory name as an original location. All original locations with leading characters that match the partial name are replaced with the new location. An example of redirecting files follows:
Selection ( if no change): R
Journal directories in _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
      _$1$DRA1:[TEST.50.MGR.JOURNAL]
      _$1$DRA1:[TEST.5A.MGR.JOURNAL]
      _$1$DRA1:[TEST.5B.MGR.JOURNAL]
Enter the original and current locations of journal files (? for help)
Journal files originally from: _$1$DRA1:
are currently located in: _$1$DRA2:
    _$1$DRA1:[TEST.50.MGR.JOURNAL] -> _$1$DRA2:[TEST.50.MGR.JOURNAL]
    _$1$DRA1:[TEST.5A.MGR.JOURNAL] -> _$1$DRA2:[TEST.5A.MGR.JOURNAL]
    _$1$DRA1:[TEST.5B.MGR.JOURNAL] -> _$1$DRA2:[TEST.5B.MGR.JOURNAL]
Journal files originally from:

Cluster Journal Restore - Setup - Journal File Info
[I]dentify an existing cluster journal log to use for the restore
   Current: _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
[R]edirect journal files in _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
  - OR -
[C]reate a cluster journal log by specifying where journal files are

Selection ( if no change):

This example shows the choice of an alternative cluster journal log, CACHEJRN.TXT, which contains a list of journal files originally located on _$1$DRA1:. These files are redirected to be retrieved from their new location, _$1$DRA2:, during the restore.
When you have finished entering the redirection information, press Enter to return to the Main Settings menu.
Journal redirection assumes a one-to-one or many-to-one relationship between source and target directory locations. That is, journal files from one or multiple original directories may be located in one new location, but not in multiple new locations. To restore from journal files that are in multiple new locations, create a cluster journal log that specifies where to locate the journal files.
Create a Cluster Journal Log
If the journal files on the original cluster are not accessible to the current cluster, create a cluster journal log that specifies the locations of the journal files. The files in the specified locations must all be part of the cluster. Copy them to a location accessible to the current cluster and specify how to locate them by entering redirect information.
Selection ( if no change): C

                              *** WARNING ***
        If you specify a cluster journal log different from current one, you
        may need to reenter info on journal redirection, restore range, etc.

Enter the name of the cluster journal log to create (ENTER if none) => 
cachejrn.txt
How many cluster members were involved? (Q to quit) => 3
For each cluster member, enter the location(s) and name prefix (if any) of the 
journal files to restore --
Cluster member #0                       Journal File Name Prefix:
  Directory: _$1$DRA1:[TEST.50.MGR.JOURNAL]
  Directory:
Cluster member #1                       Journal File Name Prefix:
  Directory: _$1$DRA1:[TEST.5A.MGR.JOURNAL]
  Directory:
Cluster member #2                       Journal File Name Prefix:
  Directory: _$1$DRA1:[TEST.5B.MGR.JOURNAL]
  Directory:
This example shows the creation of a cluster journal log, CACHEJRN.TXT, for a cluster with three members whose journal files were originally located on _$1$DRA1:.
The next menu contains the additional option to redirect the journal files in the cluster journal log you created:
Cluster Journal Restore - Setup - Journal File Info
[I]dentify an existing cluster journal log to use for the restore
   Current: _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
[R]edirect journal files in _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
  - OR -
[C]reate a cluster journal log by specifying where journal files are

Selection ( if no change):

Enter R to redirect the files as described in Identify the Cluster Journal Log. When finished entering redirect information, press Enter to return to the Main Settings menu.
Change the Starting Point of the Restore
The second and third items on the Main Settings menu specify the range of restore—where in the journal files to begin restoring and where to stop. The starting point information contains the starting journal file and sequence number for each cluster member. The default for where to begin is determined in the following order:
Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
   with redirections of journal files
      _$1$DRA1:[TEST.50.MGR.JOURNAL] -> _$1$DRA2:[TEST.50.MGR.JOURNAL]
      _$1$DRA1:[TEST.5A.MGR.JOURNAL] -> _$1$DRA2:[TEST.5A.MGR.JOURNAL]
      _$1$DRA1:[TEST.5B.MGR.JOURNAL] -> _$1$DRA2:[TEST.5B.MGR.JOURNAL]
2. To START restore at the end of last restored backup 
   _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK
      134120,_$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
          -> _$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
3. To STOP restore at the end of the cluster journal log
4. To SWITCH journal file before journal restore
5. To DISABLE journaling the dejournaled transactions

Select an item to modify ('Q' to quit or ENTER to accept and continue): 2

Cluster Journal Restore - Setup - Where to Start Restore
1. At the beginning of a cluster session
2. At a specific journal marker
3. Following the restore of backup _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK  (*)
   i.e., at the journal marker located at
      134120,_$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008

Selection ( if no change): 1

To start journal restore at the beginning of cluster session ...
1. 20030904 09:47:01
2. 20031002 13:19:12
3. 20031002 13:26:40
4. 20031002 13:29:10
5. 20031002 13:51:31
6. 20031002 13:58:57
7. 20031002 14:29:42
8. 20031002 14:33:55
9. 20031002 14:35:48

=> 5

Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
   with redirections of journal files
      _$1$DRA1:[TEST.50.MGR.JOURNAL] -> _$1$DRA2:[TEST.50.MGR.JOURNAL]
      _$1$DRA1:[TEST.5A.MGR.JOURNAL] -> _$1$DRA2:[TEST.5A.MGR.JOURNAL]
      _$1$DRA1:[TEST.5B.MGR.JOURNAL] -> _$1$DRA2:[TEST.5B.MGR.JOURNAL]
2. To START restore at the beginning of cluster session <20031002 13:51:31>
3. To STOP restore at the end of the cluster journal log
4. To SWITCH journal file before journal restore
5. To DISABLE journaling the dejournaled transactions

Select an item to modify ('Q' to quit or ENTER to accept and continue): 2

Cluster Journal Restore - Setup - Where to Start Restore
1. At the beginning of a cluster session (*): <20031002 13:51:31>
2. At a specific journal marker
3. Following the restore of backup _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK

Selection ( if no change): 2

To start restore at a journal marker location (in original form)
journal file: _$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
offset: 134120

You have chosen to start journal restore at
      134120,_$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
the journal location by the end of backup _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK

The submenu varies slightly based on the current settings. For example, if no backup restore was performed, the submenu for specifying the beginning of the restore does not list option 3 to restore from the end of last backup. In a submenu, the option that is currently chosen is marked with an asterisk (*).
Change the Ending Point of the Restore
By default, the restore ends at either the current journal location, if the current system is associated with the selected cluster journal log, or the end of the journal log. The submenu for option 3 is similar to that for option 2:
Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
   with redirections of journal files
      _$1$DRA1:[TEST.50.MGR.JOURNAL] -> _$1$DRA2:[TEST.50.MGR.JOURNAL]
      _$1$DRA1:[TEST.5A.MGR.JOURNAL] -> _$1$DRA2:[TEST.5A.MGR.JOURNAL]
      _$1$DRA1:[TEST.5B.MGR.JOURNAL] -> _$1$DRA2:[TEST.5B.MGR.JOURNAL]
2. To START restore at the end of last restored backup 
   _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK
      134120,_$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
          -> _$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
3. To STOP restore at the end of the cluster journal log
4. To SWITCH journal file before journal restore
5. To DISABLE journaling the dejournaled transactions

Select an item to modify ('Q' to quit or ENTER to accept and continue): 3

Cluster Journal Restore - Setup - Where to Stop Restore
1. At the end of a cluster session
2. At the end of _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
3. At a specific journal marker
This is the menu you would see if the journal log is the one for the current cluster:
Select an item to modify ('Q' to quit or ENTER to accept and continue): 3

Cluster Journal Restore - Setup - Where to Stop Restore
1. At the end of a cluster session
2. At current journal location (*)
3. At a specific journal marker
The submenu varies slightly based on the current settings. For example, depending whether or not the journal log is the one for current cluster, option 2 in the menu for specifying the end of the restore would be either the current journal location or the end of the journal log. In a submenu, the option that is currently chosen is marked with an asterisk (*).
Toggle the Switching Journal File Setting
The fourth menu item specifies whether to switch the journal file before the restore. If you select this item number, the value is toggled between the values To SWITCH and NOT to SWITCH the journal file; the menu is displayed again with the new setting:
Select an item to modify ('Q' to quit or ENTER to accept and continue): 4

Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
   with redirections of journal files
      _$1$DRA1:[TEST.50.MGR.JOURNAL] -> _$1$DRA2:[TEST.50.MGR.JOURNAL]
      _$1$DRA1:[TEST.5A.MGR.JOURNAL] -> _$1$DRA2:[TEST.5A.MGR.JOURNAL]
      _$1$DRA1:[TEST.5B.MGR.JOURNAL] -> _$1$DRA2:[TEST.5B.MGR.JOURNAL]
2. To START restore at the end of last restored backup 
   _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK
      134120,_$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
          -> _$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
3. To STOP restore at the end of the cluster journal log
4. NOT to SWITCH journal file before journal restore
5. To DISABLE journaling the dejournaled transactions

The default is to switch the journal file before the restore. This provides a clean start so that updates that occur after the restore are in new journal files.
Toggle the Disable Journaling Setting
The fifth menu item specifies whether to disable journaling of the dejournaled transactions during the restore. If you select this item, the value is toggled between the values DISABLE and NOT to DISABLE journaling the dejournaled transactions; the menu is redisplayed with the new setting.
Select an item to modify ('Q' to quit or ENTER to accept and continue): 5

Cluster Journal Restore - Setup - Main Settings
1. To LOCATE journal files using cluster journal log
      _$1$DRA1:[TEST.50.MGR]CACHEJRN.TXT
   with redirections of journal files
      _$1$DRA1:[TEST.50.MGR.JOURNAL] -> _$1$DRA2:[TEST.50.MGR.JOURNAL]
      _$1$DRA1:[TEST.5A.MGR.JOURNAL] -> _$1$DRA2:[TEST.5A.MGR.JOURNAL]
      _$1$DRA1:[TEST.5B.MGR.JOURNAL] -> _$1$DRA2:[TEST.5B.MGR.JOURNAL]
2. To START restore at the end of last restored backup 
   _$1$DRA1:[TEST.50.MGR]CLUFULL.BCK
      134120,_$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
          -> _$1$DRA1:[TEST.50.MGR.JOURNAL]20031002.008
3. To STOP restore at the end of the cluster journal log
4. NOT to SWITCH journal file before journal restore
5. NOT to DISABLE journaling the dejournaled transactions
For better performance, the default setting is to disable journaling the dejournaled transactions.
Note:
If you choose not to disable journaling, the dejournaled transactions are journaled only if they otherwise meet the normal criteria for being journaled.
Generate a Common Journal File
The user interface for this option is similar to the first with additional questions about the contents and format of the output file. However, instead of restoring the journal files, this option produces a common-format journal file that can be read by the ^%JREAD utility on an InterSystems IRIS system that does not support cluster journal restores or on another platform such as DSM.
^JCONVERT provides the same functionality if you answer Yes to the Cluster Journal Convert? question.
The second option produces a single common-format output file from the cluster journal files. It calls the ^JCONVERT utility, which takes a journal file from a single system and writes it out in a common format to be read by the %JREAD routine. This is useful for restoring journal files across versions of InterSystems IRIS where the journal files are not compatible (for example, as part of an “almost rolling” upgrade) or as part of failing back to an earlier release. You can also use this option to write the journal file in a format that can be loaded into another platform such as DSM.
 
Cluster Journal Restore Menu
--------------------------------------------------------------
1) Cluster journal restore
2) Generate common journal file from specific journal files
3) Cluster journal restore after backup restore
4) Cluster journal restore corresponding to InterSystems IRIS backups
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
 
Enter choice (1-4) or [E]xit/[H]elp? 2
 
Perform a Cluster Journal Restore after a Backup Restore
Option three restores the journal files after an InterSystems IRIS backup has been restored. This is similar to the restore performed by the incremental backup restore routine, ^DBREST, after a cluster backup restore when there is no way to run it independently of restoring a backup. (To restart the journal restore, for example.) One difference between this option and restoring using ^DBREST is that this option does not start with the list of databases contained in the backup; you must enter the database list.
The routine offers to include all currently cluster-mounted databases in the restore, but if it is being run after restoring a backup, the databases restored by the backup are then privately mounted unless you change the mount state. (The restore mounts them privately and leaves them privately mounted when it is finished.) It starts with the markers recorded in the journal files by the backup and ends with the end of the journal data.
Perform a Cluster Journal Restore Based on InterSystems IRIS Backups
The fourth menu option restores the journal files using journal markers that were added by an InterSystems IRIS backup to specify the starting point and, optionally, the end point. It is similar to option three except that it uses backups which have been performed to designate where to start rather than backups which have been restored. Functionally they are the same; both options use a marker which has been placed into the journal file by an InterSystems IRIS backup as the starting point. The difference is in the list of choices of where to start.
Journal Dump Utility
On an InterSystems IRIS clustered system, the ^JRNDUMP routine displays the cluster session ID (cluster startup time) of a journal file instead of the word JRNSTART. The ^JRNDUMP routine displays a list of records in a journal file, showing the cluster session ID along with the journal file sizes.
The utility lists journal files maintained by the local system as well as journal files maintained by other systems of the InterSystems IRIS cluster, in the order of cluster startup time (cluster session ID) and the first and last cluster journal sequence numbers of the journal files. Journal files created by ^JRNSTART are marked with an asterisk (*). Journal files that are no longer available (purged, for example) are marked with D (for deleted). Journal file names are displayed with indentions that correspond to their CSN, that is: no indention for journal files from system 0, one space for system 1, two spaces for system 2, etc.
Sample output from the cluster version of ^JRNDUMP follows. By default, The level-1 display on a clustered system is quite different from the nonclustered one:
 FirstSeq   LastSeq   Journal Files

 Session 20030820 11:02:43
    0          0 D /bench/test/cache/50a/mgr/journal/20030820.003
    0          0   /bench/test/cache/50b/mgr/journal/20030820.004
 Session 20030822 10:55:46
    3          3   /bench/test/cache/50b/mgr/journal/20030822.001

(N)ext,(P)rev,(G)oto,(E)xamine,(Q)uit =>

Besides a list of journal files from every cluster node, (even the dead ones), there are cluster session IDs and the first and the last cluster journal sequence numbers of each journal file. A cluster session ID (the date-time string following Session) is the time the first node of the cluster starts. A cluster session ends when the last node of the cluster shuts down. Files from different nodes are shown with different indention: no indentation for the node with CSN 0, one space for the node with CSN 1, and so on. The CSN of a node uniquely identifies the node within the cluster at a given time. The files labeled D have most likely been deleted from their host systems.
The previous version of ^JRNDUMP for clusters is available as OLD^JRNDUMP, if you prefer that output.
Startup Recovery Routine
The following is the help display of the startup recovery routine, ^STURECOV:
%SYS>Do ^STURECOV
 
Enter error type (? for list) [^] => ?
 
Supported error types are:
     JRN - Journal restore and transaction rollback
     CLUJRN - Cluster journal restore and transaction rollback
 
Enter error type (? for list) [^] => CLUJRN
 
 
Cluster journal recovery options
--------------------------------------------------------------
1) Display the list of errors from startup
2) Run the journal restore again
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
 
Enter choice (1-8) or [E]xit/[H]elp? H
--------------------------------------------------------------
Before running ^STURECOV you should have corrected the
errors that prevented the journal restore or transaction rollback
from completing. Here you have several options regarding what
to do next.
 
Option 1: The journal restore and transaction rollback procedure
     tries to save the list of errors in ^%SYS(). This is not always
     possible depending on what is wrong with the system. If this
     information is available, this option displays the errors.
Option 2: This option performs the same journal restore and
     transaction rollback which was performed when the system was
     started. The amount of data is small so it should not be
     necessary to try and restart from where the error occurred.
Option 3 is not enabled for cluster recovery
Option 4: This lets you dismount a database. Generally this
     would be used if you want to let users back on a system but
     you want to prevent them from accessing a database which still
     has problems (^DISMOUNT utility).
Option 5: This lets you mount a database (^MOUNT utility).
Option 6: This lets you edit the database structure (^REPAIR utility).
Option 7: This lets you validate the database structure (^INTEGRIT utility).
Option 8 is not enabled for cluster recovery. Shut the system
  down using the bypass option with ccontrol stop and then start it
  with ccontrol start. During startup answer YES when asked if you
  want to continue after it displays the message related to errors
  during recovery.
 
Press <enter> continue
 
Cluster journal recovery options
--------------------------------------------------------------
1) Display the list of errors from startup
2) Run the journal restore again
4) Dismount a database
5) Mount a database
6) Database Repair Utility
7) Check Database Integrity
--------------------------------------------------------------
H) Display Help
E) Exit this utility
--------------------------------------------------------------
 
Enter choice (1-8) or [E]xit/[H]elp?
Setting Journal Markers on a Clustered System
To set a journal marker effective cluster-wide, use the following routine
$$CLUSET^JRNMARK(id,text,swset)
Where.... is...
id Marker ID (for example, -1 for backup)
text Marker text (for example, “timestamp” for backup)
swset 1 — if the switch that inhibits database reads and writes (switch 10) has been set cluster-wide (and locally) by the caller. The caller is responsible for clearing it afterwards.
0 — if the switch has not been set. The routine takes care of setting and clearing the switch properly
Note that switch 10 must be set locally and cluster-wide to ensure the integrity of the journal marker. If successful, the routine returns the location of the marker — the offset of the marker in the journal file and the journal file name — delimited by a comma. Otherwise, it returns an error code (<=0) and error message, also delimited by a comma.
Cluster Journal Information Global
The global node ^%SYS("JRNINFO") is where cluster journal information is maintained. It is indexed by current cluster session ID and is recreated every time the cluster restarts. This allows you to modify or delete the cluster journal log (presumably after deleting the journal files) between two cluster sessions, as the update algorithm assumes that you do not alter the cluster journal log during a cluster session.
The ^%SYS("JRNINFO") global has three subcomponents:
Here is a sample of ^%SYS("JRNINFO") contents:
^%SYS("JRNINFO",1032803946,"jrninfo")=16
^%SYS("JRNINFO",1032803946,"jrninfo",
  "_$1$DKA0:[TEST.50.MGR.JOURNAL]20030916.002")=1,2,1031949277,160,160
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030913.004")=0,1,1031949277,3,3
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.001")=0,3,1031949277,292,292
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.002")=0,4,1032188507,3,417
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.003")=0,7,1032188507,3,422
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.004")=0,8,1032197355,3,4
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.005")=0,9,1032197355,3,7
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.006")=0,10,1032197355,3,10
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.007")=0,11,1032197355,3,17
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.008")=0,12,1032197355,3,17
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030918.001")=0,13,1032197355,3,27
  "_$1$DRA1:[TEST.50.MGR.JOURNAL]20030923.001")=0,15,1032803946,3,133
  "_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030916.002")=1,5,1032188507,3,3
  "_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030916.003")=1,6,1032188507,131,131
  "_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030923.001")=1,14,1032197355,39,39
  "_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030923.002")=1,16,1032803946,3,3

^%SYS("JRNINFO",1032803946,"jrninfor",0,
            1)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030913.004
            3)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.001
            4)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.002
            7)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.003
            8)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.004
            9)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.005
           10)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.006
           11)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.007
           12)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030916.008
           13)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030918.001
           15)=_$1$DRA1:[TEST.50.MGR.JOURNAL]20030923.001

^%SYS("JRNINFO",1032803946,"jrninfor",1,
                    2)=_$1$DKA0:[TEST.50.MGR.JOURNAL]20030916.002
                    5)=_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030916.002
                    6)=_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030916.003
                   14)=_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030923.001
                   16)=_$1$DRA1:[TEST.5A.MGR.JOURNAL]20030923.002

^%SYS("JRNINFO",1032803946,"seqinfo",1031949277,3,3,0,1)=
^%SYS("JRNINFO",1032803946,"seqinfo",1031949277,160,160,1,2)=
^%SYS("JRNINFO",1032803946,"seqinfo",1031949277,292,292,0,3)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032188507,3,3,1,5)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032188507,3,417,0,4)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032188507,3,422,0,7)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032188507,131,131,1,6)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032197355,3,4,0,8)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032197355,3,7,0,9)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032197355,3,10,0,10)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032197355,3,17,0,11)=
                                                       12)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032197355,3,27,0,13)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032197355,39,39,1,14)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032803946,3,3,1,16)=
^%SYS("JRNINFO",1032803946,"seqinfo",1032803946,3,133,0,15)=