Purging Production Data
This page describes how and why to purge production data.
Introduction
For each production running in a given namespace, InterSystems IRIS may write entries to the event log, message warehouse, business process log, business rule log, and I/O archive log for the namespace. Since the entries can accumulate over time and consume large amounts of disk space, InterSystems IRIS enables you to purge outdated entries if you have appropriate permissions (see Controlling Access to Management Portal Functions).
You can do so manually; that is, you can purge production data on an ad hoc basis. You can also schedule regular purges. Typically, you perform manual purges on systems that you are using for development and testing, and you set up scheduled purges for live systems.
Purging generates journaling. If you purge a large volume of data, the resultant journaling can consume a large amount of disk space. To conserve disk space, you can purge smaller amounts of data and review the storage impacts before purging additional data.
First-time Purges
Because purging a large volume of data can generate a large amount of journaling, you can adopt the following approach the first time you purge management data to conserve disk space:
-
Switch to the namespace where you want to purge data.
-
Navigate to the Interoperability > Manage > Purge Management Data page.
-
Set the purge parameters so that a relatively small amount of data is purged.
For example, you can set Do not purge most recent to a relatively large number. For more information, see Settings for Purging Data.
Caution: Purges are irreversible and can lead to unintentionally orphaned data or the loss of unresolved requests. Consequently, InterSystems recommends that you carefully review the description of each setting before proceeding.
-
Click Start Purge.
-
Gradually decrease the Do not purge most recent value and purge additional data until you have purged a sufficient amount of data.
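For messages, the stepwise approach above can also be scripted with the message purge API described later on this page. The following is a minimal sketch under stated assumptions, not a definitive procedure: the class name Demo.PurgeSketch, the method name, and the 90/15/30 day values are hypothetical, and the call purges only messages, not the other record types shown on the Purge Management Data page.

```objectscript
Class Demo.PurgeSketch Extends %RegisteredObject
{

/// Purge messages in several passes, keeping fewer days on each pass.
/// The start, step, and target values are illustrative assumptions.
ClassMethod StepwisePurge(startDays As %Integer = 90, stepDays As %Integer = 15, targetDays As %Integer = 30) As %Status
{
    Set status = $$$OK
    For days = startDays:-stepDays:targetDays {
        // pKeepIntegrity and pBodiesToo keep their defaults (1 and 0).
        Set status = ##class(Ens.Util.MessagePurge).Purge(.deleted, days)
        If $SYSTEM.Status.IsError(status) {
            Write !, "Purge failed: ", $SYSTEM.Status.GetErrorText(status)
            // Leave the loop on the first error.
            Quit
        }
        Write !, "Keeping ", days, " days: deleted ", deleted, " messages"
    }
    Return status
}

}
```

To review the journal and disk impact between passes, you could instead call the method with startDays equal to targetDays and run it once per step.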
Purging Data Manually
The Purge Management Data page enables you to purge entries in the event log, message warehouse, business process log, business rule log, and I/O archive log all at one time for a given namespace. The page displays information about the entries in a table with the following columns:
-
Record Type — Indicates the type of production data associated with the row. Each row contains one type of artifact that running productions produce on an ongoing basis: Event Log, Messages, Business Processes, Business Rule Log, I/O Log, or Managed Alerts.
-
Count — Shows the total number of entries of a given Record Type stored for the production. You can use the Count value to decide whether it is worthwhile to purge the entries and if so, how many days’ worth of records to keep.
-
Deleted — After you click Start Purge and the system completes the purge process, shows the total number of entries of a given Record Type that were purged.
Additionally, the Purge Criteria area displays the default settings that your system administrator configured for manual purges.
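If you want a comparable figure outside the Management Portal, one option is to count the rows in the Ens.MessageHeader table, which this page names in connection with the message purge API. The sketch below assumes that you have the SELECT privilege on that table; the class and method names are hypothetical, and it covers message headers only.

```objectscript
Class Demo.PurgeCount Extends %RegisteredObject
{

/// Write the number of message headers stored in the current namespace.
ClassMethod ShowMessageCount()
{
    Set rs = ##class(%SQL.Statement).%ExecDirect(, "SELECT COUNT(*) AS Total FROM Ens.MessageHeader")
    If rs.%SQLCODE < 0 {
        // A negative SQLCODE indicates that the query failed.
        Write !, "Query failed: ", rs.%Message
        Return
    }
    If rs.%Next() {
        Write !, "Message headers stored: ", rs.%Get("Total")
    }
}

}
```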
To purge production data manually, do the following:
-
Switch to the namespace where you want to purge data.
-
Navigate to the Interoperability > Manage > Purge Management Data page.
-
If you have appropriate permissions, modify the settings in the Purge Criteria area as needed.
For more information, see Settings for Purging Data.
Caution: Purges are irreversible and can lead to unintentionally orphaned data or the loss of unresolved requests. Consequently, InterSystems recommends that you carefully review the description of each setting before proceeding.
-
Click Start Purge.
The system immediately purges the persistent store using the settings in the Purge Criteria area. The page uses a background job to perform purges, and reports the results of the last-run purge, including a status code, or a notice if the background job is running or has failed to run. After the purge, the Deleted column displays the number of records that were purged.
The Start Purge button is disabled while a purge is being executed in a given namespace.
Purging using the Management Portal removes at most 500 unused nodes of the associated bitmap index at a time. If you are purging a large number of messages, this process will leave unused nodes in the index global, which can take up space. To remove these global nodes, you can run additional purges via the message purge API.
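As a rough illustration of such a follow-up purge, the lines below call the Purge() method of Ens.Util.MessagePurge (documented later on this page) with a larger bitmap chunk limit than the Management Portal uses. The value 5000 is an arbitrary assumption chosen for illustration; larger values mean more index nodes are scanned, which takes longer.

```objectscript
 // Arguments: days to keep (7), keep integrity (1), bodies too (0),
 // bitmap chunk limit (5000, illustrative; the Management Portal uses 500).
 Set status = ##class(Ens.Util.MessagePurge).Purge(.deleted, 7, 1, 0, 5000)
 If $SYSTEM.Status.IsError(status) { Do $SYSTEM.Status.DisplayError(status) }
 Write !, "Messages deleted: ", $Get(deleted)
```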
Purging Data on a Schedule
The Task Scheduler Wizard enables you to schedule purges for the following types of production data separately or all at one time for a given namespace:
-
Events
-
Messages
-
Business processes
-
Rule logs
-
I/O logs
-
Host monitor data
-
Managed alerts
To purge data automatically at regular intervals, do the following:
-
Navigate to System Operation > Task Manager, and then select New Task.
-
Fill in the following fields:
-
Task name — Specify a name for the purge task.
-
Namespace to run task in — Select the namespace where you want to purge data.
-
Task type — Select Ens.Util.Tasks.Purge.
Various settings for purging data appear.
-
Modify the settings for purging data as needed.
For more information, see Settings for Purging Data.
Caution: Purges are irreversible and can lead to unintentionally orphaned data or the loss of unresolved requests. Consequently, InterSystems recommends that you carefully review the description of each setting before proceeding.
-
Specify other options as needed.
For more information, see Using the Task Manager.
-
Click Finish.
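Before scheduling the task, it can be useful to try the same settings once by hand. The sketch below instantiates the task class named above and sets the properties that this page lists for the Task Scheduler Wizard (NumberOfDaysToKeep, KeepIntegrity, BodiesToo). Calling OnTask() directly is an assumption based on how task definition classes generally work, not something this page states; the class name Demo.PurgeTaskSketch and the value 30 are hypothetical. The purge runs immediately in the current namespace and is irreversible.

```objectscript
Class Demo.PurgeTaskSketch Extends %RegisteredObject
{

/// Run the purge task logic once, outside the Task Manager.
ClassMethod RunOnce() As %Status
{
    Set task = ##class(Ens.Util.Tasks.Purge).%New()
    Set task.NumberOfDaysToKeep = 30   // illustrative value
    Set task.KeepIntegrity = 1         // skip messages in incomplete sessions
    Set task.BodiesToo = 0             // purge headers only
    // TypesToPurge is left at its default value.
    Return task.OnTask()
}

}
```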
Settings Applicable to Data Purges
This section describes the settings that affect data purges. Each setting is specific to the namespace in which it is set. Users with appropriate permissions can modify these settings.
Include message bodies
Where to modify this setting:
-
Purge Management Data page
-
Purge Data Settings page
-
Task Scheduler Wizard page (as BodiesToo)
Default: disabled
Specifies whether to purge message bodies in addition to message headers (which are always purged) during a purge operation.
If this setting is enabled, InterSystems IRIS purges message headers and their corresponding message bodies. If this setting is disabled, InterSystems IRIS purges only message headers and retains any corresponding message bodies.
The system verifies that body classes exist and are persistent before purging them.
If InterSystems IRIS purges only message headers, the system may accumulate large quantities of message bodies. You cannot delete the retained message bodies from the Management Portal. You can delete them only programmatically. Consequently, InterSystems recommends that you consider your disk space and workflow when you configure the Include message bodies or BodiesToo setting.
Additionally, when InterSystems IRIS purges a message body, it does not necessarily delete all the object-valued properties of the message body. The system deletes only objects that have a serial or child relationship to the message body. You must delete other objects manually by defining a delete trigger or implementing the %OnDelete() method in the message body class, as appropriate. For more information about object-valued properties, see Defining and Using Object-Valued Properties.
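The following sketch illustrates the %OnDelete() option for a message body that references an independently stored object. All class and property names (Demo.Msg.OrderRequest, Demo.Msg.Attachment, Attachment, Data) are hypothetical, and this is one possible shape rather than a prescribed implementation. Note that %OnDelete() applies to deletes made through the object interface; a delete trigger is the counterpart for SQL deletes.

```objectscript
/// Hypothetical persistent class that is neither serial nor a child.
Class Demo.Msg.Attachment Extends %Persistent
{

Property Data As %Stream.GlobalBinary;

}

/// Hypothetical message body class.
Class Demo.Msg.OrderRequest Extends Ens.Request
{

/// Plain reference, so purging the message body does not delete it automatically.
Property Attachment As Demo.Msg.Attachment;

/// Delete the referenced attachment when this message body is deleted.
ClassMethod %OnDelete(oid As %ObjectIdentity) As %Status [ Private, ServerOnly = 1 ]
{
    Set status = $$$OK
    Set msg = ..%Open(oid, , .status)
    If $IsObject(msg) {
        // Get the stored ID of the referenced object without swizzling it.
        Set attachmentId = msg.AttachmentGetObjectId()
        If attachmentId '= "" {
            Set status = ##class(Demo.Msg.Attachment).%DeleteId(attachmentId)
        }
    }
    Return status
}

}
```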
Purge only completed sessions
Where to modify this setting:
-
Purge Management Data page
-
Purge Data Settings page
-
Task Scheduler Wizard page (as KeepIntegrity)
Default: enabled
Specifies whether to skip messages that are part of incomplete sessions during the purge process.
If this setting is enabled, when InterSystems IRIS encounters a message that meets the age criterion for purging, but is in an incomplete session, the system does not purge the message header or body. An incomplete session corresponds to any session that includes a message with a status other than Complete, Error, Aborted, or Discarded.
If you enable the Purge only completed sessions or KeepIntegrity setting, InterSystems IRIS executes a query that reviews all the messages (including business process instances) in each relevant session to identify any incomplete sessions. Consequently, enabling this setting can increase the amount of time required to complete a purge operation.
Preserving session-level integrity supports long-running business processes. InterSystems recommends that you consider whether you need to support long-running business processes and whether your system contains insignificant old messages in incomplete sessions when you configure the Purge only completed sessions or KeepIntegrity setting.
Purge operations can include messages associated with long-running system processes, such as workflow processes. If you disable this setting, carefully review the Do not purge most recent value to ensure that you do not purge critical system data.
Description
Where to modify this setting:
-
Purge Management Data page
-
Purge Data Settings page
This setting is shown in the same places as Include message bodies and Purge only completed sessions and is intended to explain those settings.
Default: "Include message bodies" is OFF because some Productions may use message objects that are part of a larger environment and not transitory. "Purge only completed sessions" is ON to preserve messages not yet completely processed.
If you modify either of the two settings that it describes, edit Description as needed.
Do not purge most recent
Where to modify this setting:
-
Purge Management Data page
-
Purge Data Settings page
-
Task Scheduler Wizard page (as NumberOfDaysToKeep)
Default: 7
Specifies how many days’ worth of records to keep. The count of days includes today.
If you set the value to 0 (zero), InterSystems IRIS does not keep any records and purges all the entries that exist at the time of the purge operation. If you set the value to 1, InterSystems IRIS retains only the messages generated on the current day, according to local server time.
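The day arithmetic can be checked with a few lines at the Terminal. This is only a reading of the rule stated above (the count includes today, in local server time), not the product's internal computation; the variable names are illustrative.

```objectscript
 // 7 is the documented default; change it to try other values.
 Set daysToKeep = 7
 // +$HOROLOG is today's date as a day count, in local server time.
 Set oldestKeptDay = +$HOROLOG - daysToKeep + 1
 Write !, "Oldest day retained: ", $ZDATE(oldestKeptDay, 3)
```

With daysToKeep set to 1, the oldest retained day is today. With 0, it falls on tomorrow, which matches the statement that no records are kept.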
TypesToPurge
Where to modify this setting:
-
Task Scheduler Wizard page
Default: Events
Specifies the types of records to purge.
Where to modify this setting:
-
Interoperability Settings page
Default: disabled.
Specifies whether to omit the old values from the journal file when purging data. If this setting is enabled, the journal file contains only the delete instruction, and it is not possible to roll back the changes. If this setting is disabled, the journal file includes both the new and the old values, as usual. See Defining Interoperability Settings.
Using the Message Purge API
InterSystems IRIS also provides a utility method that enables you to purge messages, with finer control over how different kinds of messages are purged. This method is in the Ens.Util.MessagePurge class:
ClassMethod Purge(Output pDeletedCount As %Integer,
    pDaysToKeep As %Integer = 7,
    pKeepIntegrity As %Boolean = 1,
    pBodiesToo As %Boolean = 0,
    pBitmapChunkLimit As %Integer = 500,
    ByRef pExtendedOptions As %String) As %Status
To use this method, you must have the SELECT privilege on the Ens.MessageHeader table.
This method can use the Work Queue Manager to split the purge into multiple parallel batches. This option is enabled only if TypesToPurge is Messages; see the previous section.
This method has the following arguments:
pDeletedCount
This argument, which is returned as output, indicates how many messages were deleted.
pDaysToKeep
See Do not purge most recent (NumberOfDaysToKeep) in the previous section.
pKeepIntegrity
See Purge only completed sessions (KeepIntegrity) in the previous section.
pBodiesToo
See Include message bodies (BodiesToo) in the previous section.
pBitmapChunkLimit
Specifies the maximum number of nodes of the associated bitmap index to examine and then potentially delete. The system deletes only nodes that are unused, that is, nodes that no longer map to any records in the table.
Each node of the bitmap index global corresponds to 64000 records in the associated table, and it is time-consuming to scan a node to make sure it is unused. Consequently, the API does not by default scan all the nodes of the index global.
pExtendedOptions
This argument, which must be passed by reference, is a multidimensional array that specifies some or all of the following additional options:
-
pExtendedOptions("LimitToConfigItems") is a comma-separated list of configuration names of production hosts. If this option is specified, the purge is limited to messages sent by or received by the business hosts in the list.
-
pExtendedOptions("WQCategory") causes the purge to use the Work Queue Manager, using the given category; if the given category does not exist, the default category is used.
When using the Work Queue Manager, the message purge is split into batches, using separate SQL queries that select messages by their creation time stamps. For example:
-
One batch will purge messages from time stamp A (inclusive) to time stamp B (exclusive).
-
Another batch will purge messages from time stamp B (inclusive) to time stamp C (exclusive).
-
Yet another batch will purge messages from time stamp C (inclusive) to time stamp D (exclusive).
You can specify the batch size either by specifying the number of messages in a batch or by specifying a span of time (in minutes). The default is a batch of 100000 messages.
-
pExtendedOptions("WQBatchSize") controls the size of the batch, if WQBatchPeriodMinutes is not defined and you are using WQCategory. This setting specifies the number of messages in a batch (exclusive of completeness or configuration item name requirements). The minimum count is 10000.
-
pExtendedOptions("WQBatchPeriodMinutes") provides an alternative way to control the size of the batch, by specifying the span of time (in minutes) that each batch covers. This option is used if WQBatchSize is not defined and you are using WQCategory. See the comments for pExtendedOptions("WQCategory").
-
pExtendedOptions("StartDateTime") and pExtendedOptions("DoNotDeleteEndDateTime") are UTC time stamps that specify the start and the end of the time range to delete. Specifically, if a message has a time stamp that is greater than or equal to pExtendedOptions("StartDateTime") and also less than pExtendedOptions("DoNotDeleteEndDateTime"), the message is purged (if the other criteria apply).
pExtendedOptions("StartDateTime") and pExtendedOptions("DoNotDeleteEndDateTime") override pDaysToKeep.
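As a hedged illustration of the time-range options, the lines below purge messages whose time stamps fall in January 2024 (UTC). The dates are arbitrary, and the time stamp format shown (YYYY-MM-DD HH:MM:SS) is an assumption; confirm the expected format in the class reference before relying on it.

```objectscript
 // Start is inclusive; the end value is exclusive.
 // The timestamp format is an assumption and the dates are illustrative.
 Set opts("StartDateTime") = "2024-01-01 00:00:00"
 Set opts("DoNotDeleteEndDateTime") = "2024-02-01 00:00:00"
 // pDaysToKeep is omitted because the time range overrides it.
 Set status = ##class(Ens.Util.MessagePurge).Purge(.deleted,,,,,.opts)
 If $SYSTEM.Status.IsError(status) { Do $SYSTEM.Status.DisplayError(status) }
 Write !, "Messages deleted: ", $Get(deleted)
```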
For example, suppose that we want to purge all messages sent or received by the business hosts Service1, Process1, and Operation1. To do this, we could call the purge method as follows:
set myArray("LimitToConfigItems")="Service1,Process1,Operation1"
set status=##class(Ens.Util.MessagePurge).Purge(.DeleteCount,,,,,.myArray)
Then the DeleteCount variable would contain the number of messages deleted.
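A similar call can enable the Work Queue Manager options described earlier. In this sketch the category name PurgeWork is hypothetical (per the description of WQCategory, the default category is used if the named one does not exist), and the batch size and the 30-day retention value are illustrative.

```objectscript
 Kill opts
 // "PurgeWork" is a hypothetical Work Queue Manager category name.
 Set opts("WQCategory") = "PurgeWork"
 // 20000 messages per batch (illustrative; the documented minimum is 10000).
 Set opts("WQBatchSize") = 20000
 // Keep 30 days, preserve incomplete sessions, purge headers only.
 Set status = ##class(Ens.Util.MessagePurge).Purge(.deleted, 30, 1, 0, 500, .opts)
 If $SYSTEM.Status.IsError(status) { Do $SYSTEM.Status.DisplayError(status) }
 Write !, "Messages deleted: ", $Get(deleted)
```

Checking the returned status, as shown here, is also advisable for the LimitToConfigItems example above, because the deleted count alone does not indicate whether the purge completed without error.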