Introduction to Data Integrity
The integrity of the data in every InterSystems IRIS® database is protected from the consequences of instance and system failure by the features described in this guide.
InterSystems IRIS write image journaling technology protects against structural integrity failures, while logical integrity is maintained by InterSystems IRIS journaling and transaction processing. Together with a backup strategy, journaling provides rapid recovery from physical integrity failures.
This chapter explains how structural integrity is maintained and how to verify it. The remaining chapters cover the following topics:
Write image journaling technology protects against structural integrity failures due to system crashes.
Backup and restore strategies used together with journaling enable rapid recovery from physical integrity failures.
Journaling and transaction processing maintain logical integrity and, used together with backup and restore, enable rapid recovery from physical integrity failures.
The DataCheck utility lets you compare the state of data that exists on two systems to determine whether they are consistent.
Fundamental Data Integrity Protection
In general, there are two different levels at which integrity can be viewed:
Structural database integrity, or physical integrity, refers to the contents of the database blocks on disk. To have structural integrity, the database blocks must be self-consistent and the globals traversable. Structural integrity during a system crash is maintained by InterSystems write image journal (WIJ) technology, as described in the chapter “Write Image Journaling and Recovery”, and InterSystems IRIS's internal algorithms.
Logical integrity refers to the data represented by the globals within the database, and encompasses the self-consistency of the data created by the application, its transactional integrity, and its being up-to-date with the real world. Logical integrity during a system crash is maintained by InterSystems IRIS journaling (see the “Journaling” chapter) and transaction processing. (Other aspects of logical integrity are under the control of application code, through proper use of interlocks, transactions, and other mechanisms specific to the programming paradigm that the application employs.)
Automatic WIJ and journal recovery are fundamental components of the InterSystems “bulletproof” database architecture that protects InterSystems IRIS databases from system failures.
Integrity Verification and Recovery Mechanisms
Although system crashes alone cannot lead to a loss of integrity, there is always a possibility that a storage device will fail catastrophically, sustain physical damage, or be tampered with. In that case, the integrity of the database, WIJ and journals can become compromised. To compensate for such disasters, InterSystems IRIS provides the following features:
Tools for checking the structural integrity of databases, described in Verifying Structural Integrity in this chapter.
Backup mechanisms, as described in the chapter “Backup and Restore”.
Journaling-based logical data replication for automatic failover and disaster recovery through mirroring, described in the “Mirroring” chapter of the High Availability Guide.
DataCheck, a tool for checking the consistency of data between multiple systems when technologies such as mirroring maintain a replicated copy of data, described in the chapter “Data Consistency on Multiple Systems”.
Verifying Structural Integrity
An integrity check lets you verify the structural integrity (see Fundamental Data Integrity Protection) of a set of databases, or subset of globals within the databases.
The benefits of running an integrity check are as follows:
Integrity check can be integrated into your backup strategy to ensure that at the time of backup, the copy of the database was intact and that no errors were introduced during the backup itself, as discussed in External Backup in the “Backup and Restore” chapter.
Integrity check can detect corruption before users encounter it, giving time to make a plan before users are impacted.
Regular integrity checks provide a means by which the origin of any structural integrity problems that are found can be more accurately pinpointed in time, increasing the likelihood of identifying the root cause.
An integrity check lets you verify the integrity of all globals in selected databases, or of selected globals stored in a single specified database. You can run an integrity check from the Management Portal or using the ^Integrity utility in a Terminal window. This section covers the following topics:
Integrity Check False Positives
Running an integrity check on a volatile database may result in the false reporting of database integrity errors due to ongoing database updates.
When an integrity check is executed from the Management Portal or by the Task Manager, as described in Checking Database Integrity Using the Management Portal, it runs in the background, and automatically retests any globals in which errors are detected. Output from an integrity check that includes this automatic second pass reports on errors in the following manner:
If an error was detected in a global in the first pass but not in the second pass, the first error is assumed to be a false positive and no error is reported.
If the error detected in a global in the second pass differs from the error detected in the first pass, only the second-pass error is reported, with the text These errors in global <global_name> differ from the errors prior to the retry.
If the same error is detected in a global in both passes, the error is reported with the message When retried the errors in global <global_name> remained unchanged.
Integrity checks executed manually using the ^Integrity utility or one of the entry points described in Checking Database Integrity Using the ^Integrity Utility do not retest globals reporting errors on the first pass. If errors are returned, repeat the check for that particular database.
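For example, assuming the first-pass errors were reported for the database in /data/db1/ (an illustrative path), a targeted recheck of just that database might look like the following sketch, using the CheckList^Integrity entry point described later in this chapter:

```objectscript
 ; Recheck a single database after a first-pass error report.
 ; The directory path is illustrative; substitute the affected database's directory.
 set dirs=$listbuild("/data/db1/")
 set sc=$$CheckList^Integrity(,dirs)
 do Display^Integrity(,2)                     ; flags=2: display only errors
 kill ^IRIS.TempIntegrityOutput(+$job)
```

If the recheck reports the same errors, the problem is likely a genuine integrity issue rather than a false positive.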
Generally, for an integrity check run on an active system, errors that are not repeated in a second pass are false positives, while errors that persist in a second pass represent actual integrity problems. The latter must be investigated, and the former may merit investigation as well, depending on the level of activity, the number of errors, and the extent to which false positives have previously occurred. The nature of your investigation will depend on your level of expertise and past experience of false positives. Steps you can take include:
Running the integrity check again, if possible during a period of lower system activity.
Running an integrity check on a restored copy of the most recent backup.
Examining the range of data in question for clues to the root problem.
Contacting the InterSystems Worldwide Response Center (WRC) for assistance.
The problem of false positives can be avoided by integrating integrity checks into your standard backup procedures, such as those described in External Backup in the “Backup and Restore” chapter of this guide, so that databases are checked immediately after taking a snapshot of the logical disk volume on which they reside, in isolation from production as described in Isolating Integrity Check.
Integrity Check Output
In addition to reporting any errors it encounters, the integrity check reports on the number of blocks in each global and the percentage of those blocks that is in use, breaking this information down by block level as well. For example, the following is a portion of the output of an integrity check on a DATA database populated with 20,000 users:
File Name: c:\intersystems\20182555dec15a\mgr\integ.txt

IRIS Database Integrity Check - Report Created 01/25/2018 10:41:16
System: BBINSTOCK6440  Configuration: 20182555DEC15A

No Errors were found.

Full Listing of Databases Checked

Directory: C:\InterSystems\20182555DEC15A\Mgr\DATA\
0 globals with errors found

Global: Aviation.AircraftD    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (6% full)
 Data Level:           # of blocks=64     512kb (87% full)
 Total:                # of blocks=65     520kb (85% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:15

Global: Aviation.AircraftI    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=4      32kb (83% full)
 Total:                # of blocks=5      40kb (67% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:15

Global: Aviation.Countries    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=1      8kb (52% full)
 Total:                # of blocks=2      16kb (26% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:15

Global: Aviation.CrewI    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (1% full)
 Data Level:           # of blocks=5      40kb (90% full)
 Total:                # of blocks=6      48kb (75% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:15

Global: Aviation.EventD    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (41% full)
 Data Level:           # of blocks=377    3,016kb (78% full)
 Big Strings:          # of blocks=776    6,208kb (72% full) # = 479
 Total:                # of blocks=1,154  9,232kb (74% full)
 Elapsed Time = 0.1 seconds, Completed 01/25/2018 10:41:15

Global: Aviation.EventI    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=3      24kb (77% full)
 Total:                # of blocks=4      32kb (58% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:15

...

Global: ROUTINE    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (1% full)
 Data Level:           # of blocks=6      48kb (78% full)
 Total:                # of blocks=7      56kb (67% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:16

Global: SYS    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=1      8kb (0% full)
 Total:                # of blocks=2      16kb (0% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:16

Global: Data.CompanyD    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=1      8kb (35% full)
 Total:                # of blocks=2      16kb (17% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:16

Global: Data.CompanyI    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=1      8kb (9% full)
 Total:                # of blocks=2      16kb (4% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:16

Global: Data.PersonD    0 errors found
 Top/Bottom Pnt Level: # of blocks=1      8kb (0% full)
 Data Level:           # of blocks=5      40kb (81% full)
 Total:                # of blocks=6      48kb (67% full)
 Elapsed Time = 0.0 seconds, Completed 01/25/2018 10:41:16

...
When run from the Management Portal, the report begins with a listing of errors and warnings generated by the integrity check, if any. When run using the ^Integrity utility, the error summary is provided at the end of the output.
Checking Integrity Using the Management Portal
To check the integrity of selected databases, or of selected globals stored in a single database, navigate to the Databases page of the Management Portal (Home > System Operation > Databases) and use the following procedure:
Click Integrity Check to display a list of databases.
Select the appropriate check boxes for the databases you want to check.
If you want to check globals stored in a single database, select only the database that contains the globals you want to check, then click Select Globals to display a list of globals stored in the selected database. Select the globals you want to check, then click Save; if you do not select any globals from the list, all globals in the selected database are checked.
If you want to stop checking the integrity of the database(s) upon encountering an error, select the Stop after any error check box.
Click OK to begin the integrity check. The integrity check process runs in the background.
Click Integrity Log to view the output from the most recent integrity check run using the portal. The path of this file and its contents are automatically displayed.
Integrity Check is also one of the default system background tasks in the Task Manager. You can schedule multiple integrity checks if you wish, for example of different databases at different times. See Using the Task Manager in the “Managing InterSystems IRIS” chapter of the System Administration Guide for more information about scheduling system tasks.
If a database is mounted but was never added to, or has been deleted from, the Management Portal database configuration (see Configuring Databases in the System Administration Guide), the database is not included in the list of databases displayed by the Integrity Check function.
Interactive Integrity Check Using the ^Integrity Utility
You can run a manual integrity check using the ^Integrity utility by opening a Terminal window, switching to the %SYS namespace, and entering do ^Integrity. This is similar to running an integrity check from the Databases page of the Management Portal, except that, as noted in Integrity Check False Positives, the ^Integrity utility cannot recheck globals for which it finds errors in the first pass before completing and reporting its results, and it is therefore important to recheck globals for which errors are reported to eliminate false positives. (The Management Portal integrity check also distributes the integrity check across multiple jobs, instead of running a single job like the ^Integrity utility.)
You can also use the following ^Integrity entry points interactively:
Do CheckPointer^Integrity asks for a directory and a pointer block at which to start checking.
Do Exclude^Integrity asks for a list of databases to exclude from checking; entering ? displays a list of mounted databases.
Integrity Check API
The following ^Integrity entry points are available for programmatic use:
Do CheckList^Integrity(outputglobal,dirlist,stopafteranyerror,listofglolist,maxproc) runs an integrity check in the background and stores the results, including information and warnings. Use of the parameters is as follows:
outputglobal Specifies a global in which to store the results. If the parameter is omitted from the call, the results are stored in ^IRIS.TempIntegrityOutput(+$JOB).
dirlist Specifies a $LIST of all the directories you want to check; if omitted, all directories are checked.
stopafteranyerror Specifies the integrity check’s behavior on error: if 1 is specified, checking of a directory stops when an error is found, if 0 (the default), checking continues on error.
listofglolist Specifies a $LIST of $LISTs of global names, one for each directory specified in dirlist; using this parameter you could, for example, check all oddDEF globals in all directories by specifying $LB($LB("oddDEF")). If there are fewer elements in listofglolist than in dirlist, the last element in the former is used for the remaining directories in the latter.
maxproc Specifies the maximum number of parallel processes to be used, with a default of 8. If the specified value is less than 1, the number of cores on the host system is used, and if it is 1, the integrity check is done in the foreground.
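As a sketch of how dirlist and listofglolist combine, the following checks only the oddDEF global in each of two directories (the paths are illustrative); because listofglolist has fewer elements than dirlist, its single element is reused for the second directory:

```objectscript
 set dirs=$listbuild("/data/db1/","/data/db2/")
 set glos=$listbuild($listbuild("oddDEF"))    ; one $LIST of globals, reused for both directories
 set sc=$$CheckList^Integrity(,dirs,,glos)
 do Display^Integrity()
 kill ^IRIS.TempIntegrityOutput(+$job)
```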
When called as a function ($$CheckList^Integrity), the entry point returns a %Status value sc that can be evaluated as follows:
If $system.Status.IsOK(sc) is returned, the integrity check ran and found no errors.
If the return status contains the single error code $$$IntegrityCheckErrors, that is, $system.Status.GetErrorCodes(sc)=$$$ERRORCODE($$$IntegrityCheckErrors), the integrity check ran successfully but found errors.
If neither of the preceding is returned, a problem occurred that may have prevented integrity check from returning complete results. Note that in this case sc may contain more than one error code, and one of those may be $$$IntegrityCheckErrors.
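The three outcomes can be distinguished along these lines (a sketch; it assumes the error-code macros such as $$$IntegrityCheckErrors are available to the calling code):

```objectscript
 set sc=$$CheckList^Integrity(,dblist)
 if $system.Status.IsOK(sc) {
     write "Integrity check ran and found no errors",!
 } elseif $system.Status.GetErrorCodes(sc)=$$$ERRORCODE($$$IntegrityCheckErrors) {
     write "Integrity check ran but found errors",!
 } else {
     ; a problem may have prevented complete results; sc may hold several codes
     do $system.Status.DisplayError(sc)
 }
```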
Do Display^Integrity(integritout,flags,dirsum) displays the results of an integrity check; use of the parameters is as follows:
integritout Specifies the name of the global in which the results of the integrity check were stored (by CheckList^Integrity); if not specified, it defaults to ^IRIS.TempIntegrityOutput(+$JOB).
flags Determines which messages are displayed, according to the following values:
0 displays all messages.
1 displays only errors and warnings.
2 displays only errors.
If not specified, all messages are displayed.
dirsum If specified and not 0, the display includes a summary of blocks for each directory scanned.
The following is an example checking three databases using five processes. Omitting the dirlist parameter here would check all databases instead. (Note that evaluating sc, as described above, is not required to display the results.)
 set dblist=$listbuild("/data/db1/","/data/db2/","/data/db3/")
 set sc=$$CheckList^Integrity(,dblist,,,5)
 do Display^Integrity()
 kill ^IRIS.TempIntegrityOutput(+$job)
These entry points are supported for legacy use only:
Do Silent^Integrity(logfilename,dirlist) starts a background process that does an integrity check on selected or all databases and puts the output in a file specified by the logfilename parameter. The optional dirlist parameter specifies a $LIST of databases to check; if not specified, all databases are checked. This is the equivalent of running an integrity check from the Databases page of the Management Portal.
Do SilentGlobalCheck^Integrity(logfilename,dir,gbllist) starts a background process that does an integrity check on selected globals in a selected database and puts the output in a file specified by the logfilename parameter. The required dir parameter identifies the database that contains the globals you want to check; the required gbllist parameter specifies a $LIST of one or more globals to check. This is the equivalent of choosing Select Globals when running an integrity check from the Databases page of the Management Portal.
Do Query^Integrity(logfilename,outdevice) does not run an integrity check, but outputs the contents of the file specified by the logfilename parameter, the results saved from a previous run, on the device specified in the optional parameter outdevice. Examples of outdevice are the current device (the default), a printer, another display device, or another operating system file name (to which logfilename is copied).
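As a sketch of the legacy entry points used together (the log file path is illustrative):

```objectscript
 ; Check all databases in a background process, logging to a file.
 do Silent^Integrity("/tmp/integ.log")
 ; Later, after the background process completes, display the saved
 ; results on the current device.
 do Query^Integrity("/tmp/integ.log")
```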
Tuning Integrity Check Performance
Because an integrity check must read every block of the globals being checked (if not already in buffers) in an order dictated by each global’s structure, the operation takes a substantial amount of time and can utilize much of the storage subsystem’s bandwidth. The optimal balance between the speed of an integrity check and its performance impact depends on why and when you are running it. For example, when running integrity check in response to a disaster involving storage corruption you probably want to get the results as soon as possible, whereas when it is run concurrently with production you may want to minimize its impact on the storage subsystem.
Integrity check is capable of reading as fast as the storage subsystem allows. The number of integrity check processes in use times the maximum number of concurrently active asynchronous reads allowed for each process (8 by default) is the upper limit on the number of concurrent reads overall, but the average may be half that. You can compare this estimate to the capabilities of the storage subsystem to determine the optimal number of processes. For example, with storage striped across 20 drives and the default 8 concurrent reads per process, five or more processes may be needed to capture the full capacity of the storage subsystem (5*8/2=20). (Note that assignment to processes is on a per-global basis, so a given global is always checked by just one process.)
The recommended approach to adjusting integrity check performance is as follows:
The first step is to choose the best method for launching the integrity check, as follows.
The CheckList^Integrity entry point to launch the integrity check provides the greatest control over performance, as it lets you specify the number of processes, and is therefore the most straightforward approach to tuning integrity check performance. For information about the maxproc argument to CheckList^Integrity, which specifies the number of parallel processes to use in the integrity check, see Integrity Check API.
The Management Portal Integrity Check option and the Integrity Check system background task in the Task Manager use multiple processes, but do not provide the same control as CheckList^Integrity. The Management Portal option, the Task Manager task, and the SYS.Database.IntegrityCheck() API call all select a number of processes equal to the number of CPU cores, which is typically comparatively large. These interfaces also perform a complete recheck of any global that reported an error in an effort to identify false positives caused by concurrent updates. This recheck is in addition to the false positive mitigation built into the integrity check algorithms, and may be unwanted due to the time required.
Other launching methods, such as the ^Integrity routine in the Terminal or the Silent^Integrity entry point, perform the integrity check in a single process, and are therefore not useful when results are needed quickly. They do have the advantage of outputting their results to a file or the terminal immediately, making them visible to the user while the integrity check is ongoing.
An integrity check process walks through the pointer blocks of a global, one at a time, validating each against the contents of the data blocks it points to. The data blocks are read with asynchronous I/O to enable multiple concurrently active read requests for the storage subsystem to process, with validation performed as each read completes. By default, the maximum number of concurrent read requests is 8. Bear in mind the following points:
Because the number of processes in use times the maximum number of concurrent reads per process is the upper limit on the overall number of concurrent reads, changing the number of parallel integrity check processes is typically the first adjustment to make. However, changing the maximum concurrent reads parameter can provide additional control. Further, when the integrity check is confined to a single process (for example when there is one extremely large global or other external constraints), tuning this parameter is the primary means of adjusting performance.
The benefits of increasing this parameter are limited by the storage subsystem’s capacity to process concurrent reads. Higher values have no benefit if databases are stored on a single local drive, whereas a storage array with striping across dozens of drives can process dozens of reads concurrently. The benefits are also limited by compute time, among other factors.
To adjust the maximum number of concurrent reads, open the Terminal and, in the %SYS namespace, display the current value by entering write $$GetAsyncReadBuffers^Integrity(), then change it if desired by entering do SetAsyncReadBuffers^Integrity(value); the maximum is 128. The change takes effect when the next global is checked. (This setting does not persist through a restart of the instance.)
There is a similar parameter to control the maximum size of each read when blocks are contiguous on disk (or nearly so). This parameter is needed less often, although systems with high storage latency or databases with larger block sizes might benefit from fine tuning. The commands for this parameter are write $$GetAsyncReadBufferSize^Integrity() and do SetAsyncReadBufferSize^Integrity(value); the value is set in units of 64 KB, so a value of 1 sets 64 KB as the maximum, 4 sets 256 KB, and so on, with a maximum of 512 (32,768 KB). The default is 0, which lets the instance select the value; currently it selects 1, for 64 KB.
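Putting the two pairs of commands together, a Terminal session in the %SYS namespace that raises both limits might look like the following (the values chosen are illustrative, not recommendations):

```objectscript
 write $$GetAsyncReadBuffers^Integrity()      ; display current max concurrent reads per process
 do SetAsyncReadBuffers^Integrity(16)         ; allow up to 16 concurrent reads (maximum is 128)
 write $$GetAsyncReadBufferSize^Integrity()   ; display current max read size, in 64 KB units
 do SetAsyncReadBufferSize^Integrity(4)       ; 4 x 64 KB = 256 KB maximum read
```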
Isolating Integrity Check
Many sites run regular integrity checks directly on the production system. This has the advantage of simplicity, but in addition to concerns about integrity check’s impact on storage bandwidth, concurrent database update activity can sometimes lead to false positive errors (despite the built-in mitigation measures), which means that errors reported from an integrity check on a production system must be evaluated and/or rechecked by an administrator.
To avoid false positives, you can isolate integrity checks from production by mounting storage snapshots or backup images on another host and having an isolated InterSystems IRIS instance run integrity checks on them. If the storage is also isolated from production, integrity check performance can be maximized and the result obtained as quickly as possible without any concern about the impact on production storage. This approach is suitable for arrangements in which integrity check is used to validate backups; a validated backup effectively validates the production databases as of the time the backup was made. Cloud and virtualization platforms can also make it easier to establish a usable isolated environment from a snapshot.