Caché Data Integrity Guide
Data Consistency on Multiple Systems

When mirroring, shadowing, or other mechanisms are used to maintain a copy of data on another system, you may want to check the consistency of that data between the two systems. DataCheck provides this checking and includes provisions to recheck transient discrepancies.

This chapter discusses the following topics:
DataCheck Overview
DataCheck provides a mechanism to compare the state of data on two systems — the DataCheck source and the DataCheck destination — to determine whether or not they match. All configuration, operational controls and results of the check are provided on the destination system; the source system is essentially passive.
On the instance of Caché that is to act as the DataCheck destination, you must create a DataCheck destination configuration. You can create multiple destination configurations on the same instance, which you can configure to check data against multiple source systems (or configure them to check different data against a single source). If DataCheck is being used to check the consistency of a shadow system, it is recommended that the Caché instance serving as the destination of shadowing also be configured as the DataCheck destination. If you are using DataCheck to check the consistency of a mirror, see DataCheck for Mirror Configurations for more details.
The following subsections describe DataCheck topics in more detail:
DataCheck Queries
The destination system submits work units called DataCheck “queries” to the source system. Each query specifies a database, an initial global reference, a number of nodes, and a target global reference. Both systems calculate an answer by traversing the specified number of global nodes starting with the initial global reference, and hashing the global keys and values. If the answers match, the destination system records the results and resubmits the query with a larger number of nodes and the initial global reference advanced; if they don't match, the query is resubmitted with a smaller number of nodes until the discrepancy is isolated down to the configured minimum query size.
You can display information about the queries submitted by the destination system using the View Queries option of the View Details submenu of the ^DATACHECK routine, including the globals that remain to be processed (or global ranges if subscript include/exclude ranges are used), and the active queries currently being worked on by DataCheck.
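The query cycle described above can be modeled in a few lines of Python. This is a minimal sketch, not DataCheck's implementation. It assumes that integer dictionary keys stand in for global nodes, that the destination picks the end of each range by counting nodes in its own copy, and that the query size doubles after a match and halves after a mismatch (this chapter does not state the actual growth and shrink factors). The names range_hash and check are hypothetical.

```python
import hashlib
from bisect import bisect_left


def range_hash(nodes, start, end):
    """Hash the keys and values of every node with start <= key < end.

    end=None means "through the last node".
    """
    digest = hashlib.sha256()
    for key in sorted(nodes):
        if key >= start and (end is None or key < end):
            digest.update(repr((key, nodes[key])).encode())
    return digest.hexdigest()


def check(source, dest, min_size=32):
    """Return a list of (start, end, state) ranges; end is exclusive."""
    results = []
    if not source and not dest:
        return results
    dest_keys = sorted(dest)
    start = min(list(source) + list(dest))
    size = min_size
    while True:
        # The destination chooses the end of the range by counting its own nodes.
        i = bisect_left(dest_keys, start)
        end = dest_keys[i + size] if i + size < len(dest_keys) else None
        matched = range_hash(source, start, end) == range_hash(dest, start, end)
        if not matched and size > min_size:
            size = max(min_size, size // 2)  # shrink and retry from the same start
            continue
        state = "Matched" if matched else "Unmatched"
        if results and results[-1][2] == state:
            results[-1] = (results[-1][0], end, state)  # merge adjacent ranges
        else:
            results.append((start, end, state))
        if end is None:
            return results
        start = end
        if matched:
            size *= 2  # grow after a match


# Two copies of 1000 nodes that differ in a single value.
source = {key: "value" for key in range(1000)}
dest = dict(source)
dest[500] = "changed"
results = check(source, dest)
```

With a minimum query size of 32, the sketch isolates the single changed node to the 32-node range from key 480 up to but not including key 512, and reports everything before and after it as matched.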
DataCheck Jobs
The answer to each query is calculated by DataCheck worker jobs running on both the source system and the destination system. The number of worker jobs is determined by the dynamically tunable performance settings of the destination system; for more information, see Performance Considerations in this chapter.
In addition to the worker jobs, there are other jobs on each system. The following additional jobs run on the destination system:
The following additional jobs run on the source system:
DataCheck Results
The results of the check list global subscript ranges with one of the following states:
You can view the results from the current check and the final results from the last check on the destination system; for more information, see the SYS.DataCheck.RangeList class. For all subscript ranges within DataCheck, the beginning of a range is inclusive and the end exclusive. See Specifying Globals and Subscript Ranges to Check in this chapter for information about subscript ranges.
The following provides a sample check result:
c:\InterSystems\cache\mgr\mirror2 ^XYZ       Unmatched
        ^XYZ --Matched--> ^XYZ(3001,4)
        ^XYZ(3001,4) --Unmatched--> ^XYZ(5000)
        ^XYZ(5000) --Matched--> [end]
This result indicates that the nodes in the range starting at ^XYZ up to but not including ^XYZ(3001,4) are matched, while there is at least one discrepancy in the range of nodes from ^XYZ(3001,4) up to but not including ^XYZ(5000). The nodes in the range from ^XYZ(5000) to the end are matched.
The minimum number and frequency of discrepancies in the unmatched range depend on the minimum query size (see Performance Considerations). For example, if the minimum query size is set to the default of 32 in this case, there is at least one discrepancy every 32 nodes from ^XYZ(3001,4) until ^XYZ(5000); if there were a sequence within this range of more than 32 nodes without a discrepancy, it would appear in the results as a separate matched range.
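If results in the textual form shown in the sample are captured for review, the unmatched ranges can be pulled out with a short script. The following is a sketch that assumes only the line layout of the sample above (start, state between "--" and "-->", end); the function name parse_ranges is hypothetical, and the SYS.DataCheck.RangeList class remains the documented way to access results programmatically.

```python
import re

# One range per line: "<start> --<State>--> <end>"
RANGE_LINE = re.compile(r"^\s*(?P<start>.+?) --(?P<state>\w+)--> (?P<end>.+?)\s*$")


def parse_ranges(text):
    """Return (start, state, end) for every line that looks like a range."""
    ranges = []
    for line in text.splitlines():
        match = RANGE_LINE.match(line)
        if match:
            ranges.append((match["start"], match["state"], match["end"]))
    return ranges


# Raw string so the backslashes in the database path are kept as-is.
sample = r"""
c:\InterSystems\cache\mgr\mirror2 ^XYZ       Unmatched
        ^XYZ --Matched--> ^XYZ(3001,4)
        ^XYZ(3001,4) --Unmatched--> ^XYZ(5000)
        ^XYZ(5000) --Matched--> [end]
"""

ranges = parse_ranges(sample)
unmatched = [(start, end) for start, state, end in ranges if state == "Unmatched"]
```

Here unmatched holds the single range from ^XYZ(3001,4) up to but not including ^XYZ(5000); the summary line naming the database and global is skipped because it does not contain a range arrow.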
DataCheck Workflow
During the check, data may be changing and transient discrepancies may be recorded. Rechecking may be required to eliminate these transient discrepancies. The destination system has a workflow that defines a strategy for how to check the globals.
A typical workflow begins with the “Check” phase as phase #1. (Phase #1 should always be defined as the logical starting point of the check cycle, since it is used by the workflow timeout and the Start dialog of the ^DATACHECK routine to indicate a "reset" from beginning, as described in the next section.) At the beginning of this phase, the current set of results are saved as the last completed results and a new set of active results is established. DataCheck makes an initial pass through all globals specified for inclusion in the check.
Following the Check phase, the “Recheck Discrepancies” phase is typically specified with the desired number of iterations. Each iteration rechecks all unmatched ranges in an effort to eliminate transient discrepancies.
As each phase of the workflow is completed, DataCheck moves to the next phase. The workflow is implicitly restarted from phase #1 after the last phase is complete. The “Stop” phase shuts down all DataCheck jobs and the “Idle” phase causes DataCheck to wait for you to manually specify the next phase.
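The phase sequencing described in this section can be pictured as a small state machine. The sketch below is an illustration only: the phase names follow this chapter, but the function walk_workflow and its cycles parameter are hypothetical, and "Idle" is modeled simply as the point where the walk ends and waits for the user.

```python
def walk_workflow(phases, cycles=1):
    """Return the (phase number, phase name, iteration) steps a workflow visits.

    phases is a list of (name, iterations) pairs; phase numbers start at 1.
    The walk wraps back to phase #1 after the last phase, up to `cycles` times,
    and ends early when it reaches a Stop or Idle phase.
    """
    trace = []
    for _ in range(cycles):
        for number, (name, iterations) in enumerate(phases, start=1):
            if name in ("Stop", "Idle"):
                trace.append((number, name, None))
                return trace
            for iteration in range(1, iterations + 1):
                trace.append((number, name, iteration))
    return trace


# Check once, recheck discrepancies ten times, then stop.
stopping = walk_workflow([("Check", 1), ("RecheckDiscrepancies", 10), ("Stop", 1)])

# Without a Stop or Idle phase the workflow restarts from phase #1.
looping = walk_workflow([("Check", 1), ("RecheckDiscrepancies", 2)], cycles=2)
```

In the second call the fourth step is phase #1 again, which mirrors the implicit restart after the last phase.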
Starting/Stopping/Reconnecting DataCheck
You can stop and start DataCheck at any time; when you start DataCheck, it resumes the workflow from where it left off. In addition, you can specify a different workflow phase to follow the current phase and/or abort the current phase at any time.
If, during a check, DataCheck is stopped, becomes disconnected, or pauses due to mirroring, the routine reports why the system was stopped, what phase it stopped in, and what it will do when it starts (for example, resume processing, move to the next phase, change phase due to user request or restart at phase #1 due to workflow timeout). If, upon starting, DataCheck is going to resume processing the current phase or make a transition to any phase other than phase #1, you are offered the option of restarting at phase #1, as in the following example:
Option? 4

Configuration Name: test

State:  Stopped due to Stop Requested
Current Phase: 1 - Check
Workflow Phases:
  1 - Check
  2 - RecheckDiscrepancies, Iterations=10
  3 - Stop
  (restart)
Workflow Timeout: 432000
New Phase Requested: 2
Abort Current Phase Requested

DataCheck is set to abort the current phase and transition to phase #2.

You may enter RESTART to restart at phase #1

Start Datacheck configuration 'test'? (yes/no/restart)
In cases in which DataCheck becomes disconnected and reconnects only after an extended period, it may be more desirable to restart from phase #1 of the workflow instead. For example, if the systems were disconnected for several weeks in the middle of a check and then the check is resumed, the results are of questionable value, having been collected in part from several weeks prior and in part from the present time. The workflow has a Timeout property that specifies the time, in seconds, within which DataCheck may resume a partially completed workflow phase. If the timeout is exceeded, DataCheck restarts from phase #1 the next time it reaches the running state. The default value is five days (432000 seconds), based on the assumption that a large amount of data is checked by this DataCheck configuration and the check may take hours or days to complete normally; a smaller value may be preferable for configurations that complete a check in a shorter amount of time. A value of zero means no timeout.
Note:
As noted, you should define phase #1 to be the logical starting point of the check cycle, since it is used by the workflow timeout and the Start dialog of the ^DATACHECK routine to indicate a "reset" from beginning, as shown in the previous example.
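The timeout rule can be written out as a short decision function. This is a sketch of the rule as stated in this section, not DataCheck's code: the function name phase_on_start is hypothetical, and measuring the elapsed time from the moment the phase was interrupted is an assumption.

```python
DEFAULT_TIMEOUT = 5 * 24 * 60 * 60  # five days in seconds (432000)


def phase_on_start(current_phase, seconds_since_interrupted, timeout=DEFAULT_TIMEOUT):
    """Return the phase number to run when DataCheck reaches the running state.

    A timeout of zero means no timeout, so the current phase is always resumed.
    """
    if timeout and seconds_since_interrupted > timeout:
        return 1  # too stale: restart the check cycle from phase #1
    return current_phase


after_one_hour = phase_on_start(2, 60 * 60)
after_two_weeks = phase_on_start(2, 14 * 24 * 60 * 60)
no_timeout = phase_on_start(2, 14 * 24 * 60 * 60, timeout=0)
```

A configuration that normally completes a check in minutes could pass a much smaller timeout, so that even a short outage causes a clean restart from phase #1.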
DataCheck for Mirror Configurations
Upon creating a DataCheck destination configuration, if the system is a member of a mirror (see the Mirroring chapter of the Caché High Availability Guide), you are given the option to configure DataCheck to check the mirrored data. If you choose this option, you need only select the mirror member to act as the DataCheck source, and the rest of the configuration is automatic.
When a check begins, all mirrored databases are included in the check; you do not have to map databases individually. You can specify which globals are checked or exclude entire databases, as described in Specifying Globals and Subscript Ranges to Check. A mirror-based DataCheck configuration cannot be used to check non-mirrored databases, but a separate non-mirrored DataCheck configuration can be created for such purposes.
This section discusses the following topics:
Planning DataCheck within the Mirror
Each DataCheck destination configuration connects to one source mirror member. Although the source member should not be changed, additional DataCheck configurations can be created to check against more than one source mirror member (or to check different sets of data from the same source).
This section includes the following member-specific subsections:
Checking Data Between Failover Members
When checking between failover mirror members, the check is typically run with the backup failover member configured as the DataCheck destination for the following reasons:
Whenever DataCheck loses its connection to the source, it retries the connection, waiting indefinitely for the source machine to become available again. However, if a mirror-based DataCheck configuration is started on a destination that is not the primary failover member, and that member later becomes the primary, DataCheck stops rather than automatically trying to reconnect. This prevents DataCheck from unintentionally running on the primary. For more information about reconnecting, see Starting/Stopping/Reconnecting DataCheck in this chapter.
Checking Data on Async Members
When mirror-based DataCheck is checking between a failover member and an async member, the async member is typically the destination. The reasons are the same as those given for checking between failover members (see Checking Data Between Failover Members), but the primary one is that the results of the check should be stored on the async member during disaster recovery.
When there are two failover members, it is often desirable to create one DataCheck destination configuration on an async member for each of the two failover members as sources. The ^DATACHECK routine offers to create both for you, and offers settings for how they behave with respect to which of the two is the primary failover member.
Each DataCheck configuration has a setting to govern how it behaves based on the source failover member’s status as the primary member. The settings are:
Note:
For information about reconnecting after a pause, see Starting/Stopping/Reconnecting DataCheck in this chapter.
For DataCheck configurations that are run manually (on demand) by a system administrator, these settings may not be of particular importance; they are more important for DataCheck configurations that are run continuously (or nearly so).
Any member may check another member without any particular relation. For example, if an async member is being used to check both failover members, it could also be used as the source of a check for other async members, thus avoiding the need to have any other async members check against the failover members.
Selecting Globals to Check
All mirrored databases that exist when DataCheck is run are checked automatically; for information about controlling which globals and databases are checked, see Specifying Globals and Subscript Ranges to Check in this chapter.
DataCheck Setup Procedure
You can set up DataCheck destination systems with the ^DATACHECK routine and enable DataCheck source systems through the Management Portal. To set up a new DataCheck system, do the following:
  1. Create a new destination system.
  2. Set up/edit destination system configurations, as follows:
    1. For non-mirror-based configurations, specify the hostname/IP address, superserver port, and optional SSL configuration for the TCP connection to the source system.
      For mirror-based configurations, specify the mirror member you want to check.
    2. For non-mirror-based configurations, specify the set of databases to be checked and their corresponding paths on the source system.
      For mirror-based configurations, all mirrored databases are included.
    3. Optionally, specify global selection masks and subscript ranges for fine-grained control over which databases, globals, and global ranges to include or exclude. For more information, see Specifying Globals and Subscript Ranges to Check in this chapter.
    4. Optionally, adjust the dynamically tunable settings to control the performance and system resource consumption for the check. For more information, see Performance Considerations in this chapter.
    5. Optionally, modify the workflow specifying the strategy for the check. For more information, see DataCheck Workflow in this chapter.
  3. Enable the %Service_DataCheck service on the source system. For more information, see Enabling the DataCheck Service in this chapter.
  4. Start the destination system, which controls the checking.
  5. Monitor the status of the check, as follows:
Enabling the DataCheck Service
Use the Management Portal from the Caché instance running on the source system to enable the data checking service and, optionally, restrict connections:
  1. Navigate to the Services page (System Administration > Security > Services) of the Management Portal.
  2. Click %Service_DataCheck in the list of service names to edit the data checking service properties.
  3. Select the Service enabled check box. Before clicking Save, you may want to first restrict which IP addresses can connect to this database source. If so, perform the next step, and then click Save.
    Note:
    When configured to check a mirror, DataCheck uses SSL if the mirror is set to use SSL (for more information, see DataCheck for Mirror Configurations in this chapter). The DataCheck service, however, does not automatically restrict access only to mirror members. If you wish to restrict DataCheck connections from other systems, you must configure the Allowed Incoming Connections for the %Service_DataCheck service.
  4. Optionally, to restrict access to the service, in the Allowed Incoming Connections box (which displays previously entered server addresses), click Add to add an IP Address. Repeat this step until you have entered all permissible addresses.
    You may delete any of these addresses individually by clicking Delete in the appropriate row, or click Delete All to remove all addresses, therefore allowing connections from any address.
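The effect of the Allowed Incoming Connections list can be summarized as a simple rule: an empty list admits any address, and a non-empty list admits only the addresses it contains. The sketch below models that rule for illustration; it is not the service's implementation, the function name connection_allowed is hypothetical, and it assumes each entry is a single literal IP address.

```python
import ipaddress


def connection_allowed(client, allowed):
    """Return True if a client address may connect under the given allow list."""
    if not allowed:
        return True  # no addresses configured: connections from any address
    client_address = ipaddress.ip_address(client)
    return any(client_address == ipaddress.ip_address(entry) for entry in allowed)


open_service = connection_allowed("10.0.0.6", [])
listed = connection_allowed("10.0.0.5", ["10.0.0.5", "10.0.0.9"])
unlisted = connection_allowed("10.0.0.6", ["10.0.0.5", "10.0.0.9"])
```

This makes the consequence of Delete All explicit: removing every address does not block the service, it opens it to all addresses.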
Specifying Globals and Subscript Ranges to Check
DataCheck lets you specify global names and subscript ranges to include in or exclude from checking using the options detailed in the following.
Note:
Only literal values are accepted as global names and subscripts when specifying global and subscript ranges.
^DATACHECK Routine
You can use the ^DATACHECK routine (in the %SYS namespace) to configure and manage the data checking. To obtain Help at any prompt, enter ?.
To start the ^DATACHECK routine, do the following:
  1. Enter the following commands in the Caché Terminal:
    ZNSPACE "%SYS"
    %SYS>do ^DATACHECK
    
  2. The main menu is displayed. Enter the number of your choice or press Enter to exit the routine:
    1) Create New Configuration
    2) Edit Configuration
    3) View Details
    4) Start
    5) Stop
    6) Delete Configuration
    7) Incoming Connections to this System as a DataCheck Source
    
    Option? 
    
    Note:
    For options 2 through 6, if you created multiple destination systems, a list is displayed so that you can select the destination system on which to perform the action.
    The main menu lets you select DataCheck tasks to perform as described in the following table:
    Option: Description
    1) Create New Configuration: Prompts for the name of a new DataCheck destination system configuration via the Create New Configuration prompt.
    2) Edit Configuration: Displays the Edit Configuration submenu.
    3) View Details: Displays the View Details submenu.
    4) Start: Starts/restarts the destination system. If you are restarting, it resumes from where you stopped it.
    5) Stop: Stops the destination system. If you restart the destination system after stopping it, it resumes from where you stopped it.
    6) Delete Configuration: Deletes the specified destination system configuration.
    7) Incoming Connections to this System as a DataCheck Source: This option must be selected on a source system.
Create New Configuration
This submenu lets you configure the destination system. When you select this option, the following prompt is displayed:
Configuration Name: 
If you are creating a DataCheck configuration on a system that is not a mirror member, the Edit Settings submenu is displayed, and you complete the configuration manually as described in Editing DataCheck Configurations on Non-mirror-based Systems.
If you are creating a DataCheck configuration on a system that is a mirror member, you are prompted for additional information that is dependent upon whether or not you want to base the data checking on mirroring. Choosing to configure DataCheck that is not based on mirroring displays the Edit Settings submenu, which you use to complete the configuration manually as described in Editing DataCheck Configurations on Non-mirror-based Systems. However, choosing to configure DataCheck based on mirroring restricts data checking to mirrored databases, and subsequent prompts are dependent on whether the destination system is a failover or async mirror member; for more information, see DataCheck for Mirror Configurations in this chapter.
Edit Configuration
This submenu lets you modify the destination system configurations. The options in the submenu differ depending on whether you are editing mirror-based or non-mirror-based configurations. For more information, see the following subsections:
Editing DataCheck Configurations on Non-mirror-based Systems
On a non-mirror-based system, when you select this option, the following prompts are displayed:
Configuration Name: dc_test
 
1) Import Settings from a Shadow   (static)
2) Connection Settings             (static)
3) Database Mappings               (static)
4) Globals to Check                (dynamic)
5) Performance Settings            (dynamic)
6) Manage Workflow                 (dynamic)

Option? 
Note:
In edit mode, if you created multiple destination systems, a list is displayed so that you can select a destination system to edit. In addition, before you edit the settings for options 1 through 3, you must stop the system.
Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you configure the destination system as described in the following table:
Option: Description
1) Import Settings from a Shadow: If you are using DataCheck to verify that the source and destination of a shadowing system are synchronized, this option lets you import settings from an existing shadowing system.
2) Connection Settings: Information to connect to the source system.
3) Database Mappings: Lets you add, delete, or list database mappings on the source and destination systems.
4) Globals to Check: Globals to check or exclude from checking. For more information, see Specifying Globals and Subscript Ranges to Check in this chapter.
5) Performance Settings: Adjusts system resources (throttle) used and/or granularity with which DataCheck isolates discrepancies (minimum query size). For more information, see Performance Considerations in this chapter.
6) Manage Workflow: Manages the order of workflow phases. For more information, see DataCheck Workflow in this chapter.
Editing Mirror-based DataCheck Configurations
On a mirror-based system, the following submenu is displayed:
Configuration Name: MIRRORSYS2_MIRRORX201112A_1
 
1) Globals to Check     
2) Performance Settings 
3) Manage Workflow      
4) Change Mirror Settings (Advanced) 

Option? 
Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you configure the destination system as described in the following table:
Option: Description
1) Globals to Check: Globals to check or exclude from checking. For more information, see Specifying Globals and Subscript Ranges to Check in this chapter.
2) Performance Settings: Adjusts system resources (throttle) used and/or granularity with which DataCheck isolates discrepancies (minimum query size). For more information, see Performance Considerations in this chapter.
3) Manage Workflow: Manages the order of workflow phases. For more information, see DataCheck Workflow in this chapter.
4) Change Mirror Settings (Advanced)
View Details
This submenu lets you monitor the status of the destination system, as well as view detailed information about the queries that are running and the results of data checking:
System Name: dc_test
 
1) View Status
2) View Results
3) View Queries
4) View Log

Option? 
Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you view information about the destination system as described in the following table:
Option: Description
1) View Status: Displays information about the selected destination system, including performance metrics for the DataCheck worker jobs, percentage of queries completed in the current phase, and the number of discrepancies recorded in this phase.
2) View Results: Displays the results for the selected destination system. For more information, see DataCheck Results in this chapter.
3) View Queries: Displays information about the queries submitted by the selected destination system (see DataCheck Queries). This includes the globals that remain to be processed (or global ranges if subscript include/exclude ranges are used), and indicates the active queries currently being worked on by DataCheck. A summary count is displayed at the end of the list.
4) View Log: Displays the selected destination system log file.
Note:
When ^DATACHECK is run against the two copies of a mirrored database on two mirror member instances, and that database is experiencing the rapid setting and killing of a whole global, the View Status option can display results that appear inconsistent with those of the View Results option. For example, View Status may report unmatched answers while View Results does not list the globals that caused them (because further passes resolved the discrepancies). In addition, displayed answer counts can be larger than the actual number of globals within the instance (as displayed in the Management Portal, and as actually reported in the results).
When View Status shows Answers Rcvd having a non-zero unmatched value but discrepancies having a zero value, this is indicative of transient globals, not a data issue.
Incoming Connections to this System as a DataCheck Source
This submenu lets you view information about the source system:
1) List Source Systems
2) View Log

Option? 
Enter the number of your choice or press ^ to return to the previous menu. The options in this submenu let you view information about the source system as described in the following table:
Option: Description
1) List Source Systems: Displays information about the DataCheck source system.
2) View Log: Displays the source system log file.
Special Considerations for Data Checking
Review the following special considerations when using DataCheck:
Performance Considerations
While data checking is useful to ensure consistency of databases on multiple systems, it consumes resources on both the source and destination systems. This could negatively impact performance of other processes on either system, depending on load and the configured DataCheck settings. DataCheck includes controls to help you manage performance.
The throttle is an integer between 1 and 10 that controls how much of the available system resources (CPU, disk I/O, database cache) DataCheck may use. The throttle value can be changed at any time, to take effect immediately; for example, the value can be increased during periods when the system load is otherwise expected to be light, and decreased during periods when system load is heavy. This is useful for checks that are expected to run for an extended period of time. (The DataCheck routine can also be stopped during periods of high load; upon being restarted, it automatically resumes at the point in the check at which it was stopped.)
The characteristics of every system are different, but the following general descriptions of throttle values apply:
The View Status option on the ^DATACHECK View Details submenu shows performance metrics for the DataCheck worker jobs, helping you understand performance characteristics and how they relate to the throttle setting.
The implementation of the throttle may differ over time as software and hardware characteristics evolve.
The minimum query size represents the minimum number of global nodes that a query is allowed to traverse; in other words, it determines the minimum size of the range of global nodes to which DataCheck isolates discrepancies. Lower values help locate discrepancies more easily, while higher values significantly improve the speed of the check through unmatched sections. For example, if the minimum query size were set to 1 (not recommended), each discrepant node could be reported as a separate unmatched range, or at least as a range of all unmatched globals, precisely identifying the discrepancies but greatly impacting performance; if the minimum query size were set to 1000 (also not recommended), one or more discrepancies would be reported as a range of at least 1000 unmatched nodes, making it difficult to find them, but the check would be much faster. The default is 32, which is small enough to allow for relatively easy visual inspection of the global nodes in a range using the Management Portal (see the Managing Globals chapter of Using Caché Globals) while not greatly impacting performance.
Security Considerations
The destination system stores subscript ranges for globals that it has checked and is checking (results and queries). (See Specifying Globals and Subscript Ranges to Check in this chapter.) This subscript data is stored in the ^SYS.DataCheck* globals in the %SYS namespace (in the CACHESYS database by default). Global values are not stored; only subscripts are stored. These global subscripts from other databases that are stored in the %SYS namespace may contain sensitive information that may not otherwise be visible to some users, depending on the security configuration. Therefore, some special care is needed in secured deployments.
Use of the ^DATACHECK routine, including the ability to configure, start, and stop, requires both %Admin_Operate:Use privilege and Read/Write privilege (Write for configuring a check, Read for all other tasks) on the database containing the ^SYS.DataCheck* globals which, by default, is CACHESYS. The configuration and results data stored in the ^SYS.DataCheck* globals can be viewed and manipulated outside of the routine by anyone with sufficient database privileges.
For any secure deployment in which %DB_CACHESYS:Read privilege is given to users that should not have access to DataCheck data, you can add a global mapping to the %SYS namespace to map ^SYS.DataCheck* globals to a separate database other than CACHESYS. This database can be assigned a new resource name; read permission for the resource can then be restricted to those roles authorized to use DataCheck.
The ability for another destination system to connect to this system as a source is governed by this system's %Service_DataCheck service. This service is disabled by default on new installations and can be configured with a list of allowed IP addresses. For more information, see Enabling the DataCheck Service in this chapter.
For encryption of the communication between the two systems, the destination system can be configured to use SSL to connect to the source. See Configuring the Caché Superserver to Use SSL/TLS in the “Using SSL/TLS with Caché” chapter of the Caché Security Administration Guide for details.