Monitoring Processes Using ^PERFSAMPLE
This topic describes the ^PERFSAMPLE utility, a tool for analyzing InterSystems IRIS® data platform processes. The utility samples process activity on a live system and presents an easily navigable breakdown of the sampled activity, which can provide insights into your system. For example, you may discover application bottlenecks by checking ECP requests, or identify overall system bottlenecks by reviewing the types of wait events.
To get started, run ^PERFSAMPLE from the %SYS namespace on an InterSystems IRIS instance of interest:
USER>set $namespace = "%SYS"
%SYS>do ^PERFSAMPLE
For more general information about InterSystems processes, see Controlling InterSystems IRIS Processes.
The following message appears as soon as you run ^PERFSAMPLE:
This utility performs high frequency sampling of processes on the system, analyzing and counting data points in different ways to understand where processes are spending most of their time. On ECP Data Servers, this also offers sampling of the current request being processed and the states of the ECPSvrW daemons doing the processing.
1) Sample Local Process Activity
2) Sample ECP Server Requests
Option?
If the instance has no incoming ECP connections from ECP clients (application servers), option 1 above is automatically selected.
You are then prompted to enter the following information:
What processes or ECP connections to sample.
Whether to ignore samples where the process is in any of the following states: READ, READW, EVTW, HANG, SLCT, SLCTW, and RUNW. When sampling ECP connections, only events where the ECPSvrW process is non-idle are recorded.
Selecting YES (the default) reduces the number of events that ^PERFSAMPLE records. When monitoring many processes, this speeds analysis and uses less memory.
Number of samples to collect per second.
Total number of seconds to collect samples.
In the Terminal, the prompts look like:
Enter a list of PIDs, * for all, or ? for ^%SS display: *
Ignore samples where the process appears idle (READ, HANG, etc)? Yes =>
Sample rate per second: 1000 =>
Number of seconds to sample: 30 =>
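As a rough planning aid, the sample rate and duration put a ceiling on how many samples a run can produce. The sketch below is illustrative only: it assumes one sample per selected process per tick, which this topic does not confirm, and the function name and the process count of 20 are hypothetical. Actual counts may be lower, particularly when idle samples are ignored.

```python
def max_samples(rate_per_sec: int, seconds: int, n_processes: int) -> int:
    """Upper bound on collected samples.

    Assumption (not confirmed by the utility's documentation):
    each selected process is sampled once per tick.
    """
    return rate_per_sec * seconds * n_processes


# Rate and duration taken from the example prompts; 20 processes is hypothetical.
print(max_samples(1000, 30, 20))  # prints 600000
```

Treat the result as an upper bound when judging how much memory and analysis time a run might need.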
Examining and Analyzing Samples
After collecting the samples, you may view an analysis. An analysis is a summary of one or more dimensions, or components, of the sampled processes. That is, an analysis sorts the sampled information according to the selected dimensions.
This section includes the following:
An example of using a predefined analysis.
Information about creating a custom analysis.
A description of the available analysis dimensions.
Use the following keys to navigate within the analyzer:
|Up arrow or U||Move the selector up|
|Down arrow or D||Move the selector down|
|Right arrow or Enter||Select the current item|
|Left arrow or Backspace||Go back to the previous level|
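|C||Cycle through the count displays: percent of total samples, raw sample count, percent of qualifying samples, and average number of concurrent jobs|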
|N or CTRL-D||Next page (if multiple pages)|
|P or CTRL-U||Previous page (if multiple pages)|
The main landing page looks like:
— PERFSAMPLE for Local Process Activity. 1.710949s at 11/17/2020 15:58:31
28479 samples | CPULoad* 0.91
Multiple jobs included: 1290 samples per job
----------------------------'?' for help----------------------------
Select an analysis to view:
New Analysis (press '+' any time)
Using CPU? -> PID -> Process State
Using CPU? -> Routine -> Namespace -> Process State
Process State -> Routine -> PID
Kernel Wait State -> Routine -> PID
Predefined Analysis Example
Below is an example of an analysis that begins with the Process State dimension.
In this example, ^PERFSAMPLE found 76755 samples of processes in a sample-able state (non-idle if the option to ignore idle was selected) out of 319994 total samples:
– PERFSAMPLE for Local Process Activity. 3.89s at 11/17/2020 16:59:59
76755 events in 319994 samples [24.0 %-total] | CPULoad* 8.22
Multiple jobs included: 2191 samples per job
----------------------------'?' for help----------------------------
Process State [24.0 %-total]
GGET [8.46 %-total]
RUN [5.88 %-total]
GDEF [3.16 %-total]
GSETW [1.63 %-total]
BSETW [1.21 %-total]
GDEFW [1.18 %-total]
GGETW [0.931 %-total]
SEMW [0.685 %-total]
GSET [0.311 %-total]
LOCKW [0.144 %-total]
LOCK [0.0644 %-total]
INCRW [0.0641 %-total]
BSET [0.0513 %-total]
Initially, the values appear as a percent of the total number of samples. The most common Process State value sampled in this case was GGET, which represents 8.46% of the total 319994 samples.
Pressing c cycles through how this count is displayed. For example, you can display the above information as a raw count of samples:
Process State  > GGET  RUN  GDEF 
You can also view the information as a percent of qualifying samples (in this case, samples that had a non-idle Process State):
Process State [24.0 %-total]
> GGET [35.3 %-subset]
  RUN [24.5 %-subset]
  GDEF [13.2 %-subset]
Finally, you can view the average number of jobs concurrently in each state:
Process State [24.0 %-total]
GGET [12.4 jobs]
RUN [8.59 jobs]
GDEF [4.62 jobs]
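The count displays appear to be related by simple arithmetic. The sketch below reproduces the %-subset and jobs figures from the %-total figures in the example screen. The formulas are inferred from the numbers shown, not taken from the utility's documentation, and the function names are hypothetical.

```python
# Figures copied from the example screen in this section.
TOTAL_SAMPLES = 319994      # all samples taken
QUALIFYING_EVENTS = 76755   # samples in a sample-able state
SAMPLES_PER_JOB = 2191      # samples taken per job


def pct_of_subset(pct_total: float) -> float:
    """Convert a %-total figure to a %-subset figure (inferred formula)."""
    qualifying_fraction = QUALIFYING_EVENTS / TOTAL_SAMPLES  # about 0.24
    return pct_total / qualifying_fraction


def avg_jobs(pct_total: float) -> float:
    """Estimate average concurrent jobs in a state (inferred formula)."""
    n_jobs = TOTAL_SAMPLES / SAMPLES_PER_JOB  # about 146 jobs sampled
    return n_jobs * pct_total / 100


for state, pct in [("GGET", 8.46), ("RUN", 5.88), ("GDEF", 3.16)]:
    print(f"{state}: {pct_of_subset(pct):.1f} %-subset, {avg_jobs(pct):.2f} jobs")
```

The results agree with the displayed values to within rounding, which suggests that the jobs figure is a state's share of all samples scaled by the number of jobs sampled.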
Selecting GGET with the Right Arrow key moves to the next dimension, ordering the values of that dimension for samples where the first dimension had value GGET. You can navigate freely between the dimensions using the arrow keys.
Creating a Custom Analysis
Select the New Analysis option from the main landing page to create a custom analysis. You can also create a custom analysis using one of these shortcuts:
|+ key||Add a dimension to the current analysis (when in an analysis)|
|* key||Begin a new analysis with the current item as the first dimension.|
Adding a new analysis brings you to the following screen:
New Analysis:
Specify a comma-delimited list of dimensions upon which to analyze samples. For example, "state,ns,rou" means first count each unique state the sampled processes were in; then for each state, count the namespace from the samples in that state; and finally for each state->namespace pair, count each unique routine name. In other words, report on routines by namespace by state.
The following dimensions are available:
cpu - Using CPU? (process state indicates expected CPU use)
ns - Namespace (current namespace)
pid - PID (process ID)
rou - Routine (name of current routine)
state - Process State (process state string, e.g. GSETW)
trace - Kernel Trace (alternative to 'state' w/ kernel-level detail)
waits - Kernel Wait State (kernel-level condition that delayed the process)
wtrace - Reverse Kernel Trace (revese kernel trace, stop at any wait state)
Enter dimension list:
From here, enter the list of dimensions you would like to analyze as described by the prompt. Once you press Enter, you may navigate the analysis as described above.
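The nested counting that a dimension list such as "state,ns,rou" describes can be pictured with a small sketch. This is a conceptual illustration only, not how ^PERFSAMPLE is implemented: the sample records, routine names, and the analyze function are all hypothetical.

```python
from collections import Counter

# Hypothetical sample records; the keys mirror the dimension keywords.
samples = [
    {"state": "GGET", "ns": "USER", "rou": "RouA"},
    {"state": "GGET", "ns": "USER", "rou": "RouA"},
    {"state": "GGET", "ns": "APP", "rou": "RouB"},
    {"state": "RUN", "ns": "USER", "rou": "RouA"},
    {"state": "GSETW", "ns": "APP", "rou": "RouC"},
]


def analyze(records, dimensions):
    """Return {value: (count, breakdown of the remaining dimensions)}.

    Values are ordered from most to least frequently sampled.
    """
    if not dimensions:
        return {}
    first, rest = dimensions[0], dimensions[1:]
    counts = Counter(r[first] for r in records)
    return {
        value: (count, analyze([r for r in records if r[first] == value], rest))
        for value, count in counts.most_common()
    }


tree = analyze(samples, "state,ns,rou".split(","))
count, by_namespace = tree["GGET"]
print(count, by_namespace["USER"][0])  # samples in GGET, then GGET within USER
```

Each level narrows the set of samples to those matching the selected value, which corresponds to selecting an item and moving to the next dimension in the analyzer.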
The dimensions for analyses are described within the ^PERFSAMPLE tool. This section provides some additional information.
cpu - Using CPU? (process state indicates expected CPU use)
Note:
The yes or no value for cpu is not a true measure of on-cpu time, but an estimate. ^PERFSAMPLE infers CPU use from the process state, and InterSystems IRIS state tracking may not directly correlate to CPU use.
If the process is waiting for the OS scheduler to make the CPU available to it as a result of (instantaneous or persistent) over-utilization of the CPU, this can also lead to inaccuracy in cpu.
ns - Namespace (current namespace)
pid - PID (process ID)
rou - Routine (name of current routine)
state - Process State (process state string, e.g. GSETW)
waits - Kernel Wait State (kernel-level condition that delayed the process). See the following section for more information.
In general, the following dimensions are only useful when troubleshooting with the InterSystems Worldwide Response Center (WRC):
trace - Kernel Trace (alternative to 'state' w/ kernel-level detail)
wtrace - Reverse Kernel Trace (revese kernel trace, stop at any wait state)
The trace and wtrace dimensions have a hierarchical organization. Selecting an ancestor, denoted with an ellipsis (...), moves down a hierarchy level. Selecting a non-ancestral item goes to the next dimension of analysis. The h key toggles between this hierarchical view and a flattened view. Pressing the a key on an ancestor aggregates subsequent dimensions for all its descendants.
The waits Dimension
The waits dimension is null if the process was not found to be waiting on anything internal to the IRIS kernel. A non-null value indicates a condition which required the process to wait (to block internally).
It's important to note that these are internal conditions leading to the process waiting outside of the application's direct control. As such, waiting due to a conflicting LOCK command, a $SYSTEM.Event, and the like do not count here.
Nonetheless, many of the values, particularly the more common ones, are things that the application can influence indirectly. For example, if samples show that a key application process is often waiting for diskio, the process is waiting to read database blocks from disk and could possibly benefit from parallelization, prefetching, or more database cache. Similarly, a process that samples show is often waiting on inusebufwt is encountering database block collisions that may need investigation at the application level (with the help of the ^BLKCOL utility).
The waits dimension takes the following mnemonic values, which are subject to change in the future:
diskio: waiting for database physical block read
inusebufwt: waiting due to block collision (^BLKCOL utility may help identify application cause)
expand: waiting for database expansion
ecpwait: waiting for an answer from the ECP server
jrniowait: no space in journal buffers, waiting for journal I/O
jrnsyncblk: waiting for journal data to be committed
jrnlckwait: waiting to access journal buffer
mirrorwait: waiting for active backup mirror member
mirrortrouble: blocked due to mirror trouble state
globwait: waiting because of an internal condition blocking global updates
aiowait: waiting for asynchronous disk I/O to complete
wdqwait: waiting for a write cycle to complete
freebuf: global buffers are completely exhausted and waiting for database writes
gfownwait: access to database is blocked
resenqXYZ: waiting on an internal resource XYZ
While many of these correspond to a canonical process state that includes the W letter flag (e.g. GSETW, GORDW, etc), not all do; diskio is a very common example. Conversely, not all cases of the W state flag have an internal reason reflected here (e.g. LOCKW, as mentioned above).