Monitoring Processes Using ^PERFSAMPLE

This topic describes the ^PERFSAMPLE utility, a tool for analyzing InterSystems IRIS® data platform processes. The utility samples process activity on a live system and presents an easily navigable breakdown of the sampled activity, which can show where processes spend most of their time. For example, you may discover application bottlenecks by checking ECP requests, or identify overall system bottlenecks by reviewing the types of wait events.

To get started, run ^PERFSAMPLE from the %SYS namespace on an InterSystems IRIS instance of interest:

USER>set $namespace = "%SYS"
 
%SYS>do ^PERFSAMPLE

For more general information about InterSystems processes, see Controlling InterSystems IRIS Processes.

Collecting Samples

The following message appears as soon as you run ^PERFSAMPLE:

This utility performs high frequency sampling of processes on the system,
analyzing and counting data points in different ways to understand where
processes are spending most of their time. On ECP Data Servers, this also
offers sampling of the current request being processed and the states of
the ECPSvrW daemons doing the processing.

1) Sample Local Process Activity
2) Sample ECP Server Requests

Option?

If the instance has no incoming ECP connections from ECP clients (application servers), option 1 above is automatically selected.

You are then prompted to enter the following information:

  1. What processes or ECP connections to sample.

  2. Whether to ignore samples where the process is in any of the following states: READ, READW, EVTW, HANG, SLCT, SLCTW, and RUNW. When sampling ECP connections, only events where the ECPSvrW process is non-idle are recorded.

    Selecting YES (the default) reduces the number of events that ^PERFSAMPLE records. When monitoring many processes, this speeds analysis and uses less memory.

  3. Number of samples to collect per second.

  4. Total number of seconds to collect samples.

In the Terminal, the prompts look like:

Enter a list of PIDs, * for all, or ? for ^%SS display: *
Ignore samples where the process appears idle (READ, HANG, etc)?
  Yes =>
Sample rate per second: 1000 =>
Number of seconds to sample: 30 =>
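
The sample rate and duration together determine how much data is collected for each process. The following Python sketch is illustrative arithmetic only, not part of ^PERFSAMPLE. It assumes that the sampler keeps up with the requested rate, which may not hold on a busy system, and the function names are hypothetical. The second calculation uses the elapsed time and samples-per-job figures from the landing page example shown later in this topic; the rate requested for that run is not stated.

```python
def expected_samples_per_process(rate_per_second: int, seconds: float) -> int:
    """Upper bound on samples per process, assuming the sampler
    achieves the requested rate for the whole collection period."""
    return int(rate_per_second * seconds)


def achieved_rate(samples_per_job: int, elapsed_seconds: float) -> float:
    """Samples per second per job actually observed in a finished run."""
    return samples_per_job / elapsed_seconds


# Default answers from the prompts above: 1000 samples/second for 30 seconds.
print(expected_samples_per_process(1000, 30))   # 30000

# Landing page example: 1290 samples per job in 1.710949 seconds.
print(round(achieved_rate(1290, 1.710949)))     # 754
```

When all processes are sampled, the total grows with the number of processes, which is one reason that ignoring idle samples reduces memory use.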

Examining and Analyzing Samples

After collecting the samples, you may view an analysis. An analysis is a summary of one or more dimensions, or components, of the sampled processes. That is, an analysis sorts the sampled information according to the selected dimensions.

Use the following keys to navigate within the analyzer:

Up arrow or U: Move the selector up
Down arrow or D: Move the selector down
Right arrow or Enter: Select the current item
Left arrow or Backspace: Go back to the previous level
C: Cycle through the following count displays:
  • Percent of total
  • Raw count
  • Percent of the current subset
  • (If multiple jobs are sampled) Average number of jobs found concurrently in this state
N or CTRL-D: Next page (if multiple pages)
P or CTRL-U: Previous page (if multiple pages)
Q: Quit ^PERFSAMPLE

The main landing page looks like:

— PERFSAMPLE for Local Process Activity. 1.710949s at 11/17/2020 15:58:31
28479 samples | CPULoad* 0.91
Multiple jobs included: 1290 samples per job
----------------------------'?' for help----------------------------
Select an analysis to view:
New Analysis (press '+' any time)
Using CPU? -> PID -> Process State
Using CPU? -> Routine -> Namespace -> Process State
Process State -> Routine -> PID
Kernel Wait State -> Routine -> PID

Predefined Analysis Example

Below is an example of an analysis that begins with the Process State dimension.

In this example, ^PERFSAMPLE found 76755 samples of processes in a sample-able state (non-idle if the option to ignore idle was selected) out of 319994 total samples:

– PERFSAMPLE for Local Process Activity. 3.89s at 11/17/2020 16:59:59
76755 events in 319994 samples [24.0 %-total] | CPULoad* 8.22
Multiple jobs included: 2191 samples per job
----------------------------'?' for help----------------------------
Process State [24.0 %-total]
GGET [8.46 %-total]
RUN [5.88 %-total]
GDEF [3.16 %-total]
GSETW [1.63 %-total]
BSETW [1.21 %-total]
GDEFW [1.18 %-total]
GGETW [0.931 %-total]
SEMW [0.685 %-total]
GSET [0.311 %-total]
LOCKW [0.144 %-total]
LOCK [0.0644 %-total]
INCRW [0.0641 %-total]
BSET [0.0513 %-total]

Initially, the values appear as a percent of the total number of samples. The most common Process State value sampled in this case was GGET, which represents 8.46% of the total 319994 samples.

Pressing c cycles through how this count is displayed. For example, you can display the above information as a raw count of samples:

Process State [76755]
> GGET [27083]
RUN [18823]
GDEF [10121]

You can also view the information as a percent of qualifying samples (in this case, samples that had a non-idle Process State):

Process State [24.0 %-total]
> GGET [35.3 %-subset]
RUN [24.5 %-subset]
GDEF [13.2 %-subset]

Finally, you can view the average number of jobs found concurrently in each state:

Process State [24.0 %-total]
GGET [12.4 jobs]
RUN [8.59 jobs]
GDEF [4.62 jobs]
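
The four count displays are different views of the same raw counts. The following Python sketch reproduces the figures above from the raw counts and the header values (319994 total samples, 76755 events, 2191 samples per job). The formula for the average number of jobs (raw count divided by samples per job) is an inference that matches the numbers shown, not a documented definition, and the function names are hypothetical.

```python
TOTAL_SAMPLES = 319994    # all samples taken
EVENTS = 76755            # samples in a qualifying (non-idle) state
SAMPLES_PER_JOB = 2191    # from the "Multiple jobs included" header line

raw_counts = {"GGET": 27083, "RUN": 18823, "GDEF": 10121}


def sig3(value: float) -> float:
    """Round to three significant digits, as the displays above appear to do."""
    return float(f"{value:.3g}")


def displays(count: int) -> dict:
    """Derive the other three count displays from a raw count."""
    return {
        "pct_total": sig3(100 * count / TOTAL_SAMPLES),
        "pct_subset": sig3(100 * count / EVENTS),
        # Assumption: average concurrent jobs = raw count / samples per job.
        "avg_jobs": sig3(count / SAMPLES_PER_JOB),
    }


for state, count in raw_counts.items():
    print(state, displays(count))
# GGET {'pct_total': 8.46, 'pct_subset': 35.3, 'avg_jobs': 12.4}
# RUN {'pct_total': 5.88, 'pct_subset': 24.5, 'avg_jobs': 8.59}
# GDEF {'pct_total': 3.16, 'pct_subset': 13.2, 'avg_jobs': 4.62}
```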

Selecting GGET with the Right Arrow key moves to the next dimension, ordering the values of that dimension for samples where the first dimension had value GGET. You can navigate freely between the dimensions using the arrow keys.

Creating a Custom Analysis

Select the New Analysis option from the main landing page to create a custom analysis. You can also create a custom analysis using one of these shortcuts:

+ key: Add a dimension to the current analysis (when in an analysis).
* key: Begin a new analysis with the current item as the first dimension.

Adding a new analysis brings you to the following screen:

New Analysis:

Specify a comma-delimited list of dimensions upon which to analyze samples.
For example, "state,ns,rou" means first count each unique state the sampled
processes were in; then for each state, count the namespace from the samples
in that state; and finally for each state->namespace pair, count each unique
routine name. In other words, report on routines by namespace by state.

The following dimensions are available:
cpu - Using CPU? (process state indicates expected CPU use)
ns - Namespace (current namespace)
pid - PID (process ID)
rou - Routine (name of current routine)
state - Process State (process state string, e.g. GSETW)
trace - Kernel Trace (alternative to 'state' w/ kernel-level detail)
waits - Kernel Wait State (kernel-level condition that delayed the process)
wtrace - Reverse Kernel Trace (revese kernel trace, stop at any wait state)

Enter dimension list:

From here, enter the list of dimensions you would like to analyze as described by the prompt. Once you press Enter, you may navigate the analysis as described above.
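
Conceptually, a dimension list such as "state,ns,rou" describes a nested count: samples are grouped by the first dimension, then by the second within each group, and so on. The following Python sketch illustrates that idea on a few made-up sample records. It is not how ^PERFSAMPLE is implemented; the record layout, routine names, and function names are all hypothetical.

```python
from collections import Counter


def analyze(samples, dimension_list):
    """Count samples for every prefix of the dimension list,
    e.g. (state,), (state, ns), and (state, ns, rou)."""
    dims = [d.strip() for d in dimension_list.split(",")]
    counts = Counter()
    for sample in samples:
        for depth in range(1, len(dims) + 1):
            counts[tuple(sample[d] for d in dims[:depth])] += 1
    return counts


def drill_down(counts, prefix=()):
    """List the values of the next dimension under a prefix, most frequent first."""
    level = {
        key[-1]: n
        for key, n in counts.items()
        if len(key) == len(prefix) + 1 and key[:len(prefix)] == prefix
    }
    return sorted(level.items(), key=lambda item: -item[1])


# Hypothetical samples; real samples are collected by ^PERFSAMPLE itself.
samples = [
    {"state": "GGET", "ns": "USER", "rou": "RoutineA"},
    {"state": "GGET", "ns": "USER", "rou": "RoutineA"},
    {"state": "GGET", "ns": "APP", "rou": "RoutineB"},
    {"state": "RUN", "ns": "USER", "rou": "RoutineA"},
]

counts = analyze(samples, "state,ns,rou")
print(drill_down(counts))                    # [('GGET', 3), ('RUN', 1)]
print(drill_down(counts, ("GGET",)))         # [('USER', 2), ('APP', 1)]
print(drill_down(counts, ("GGET", "USER")))  # [('RoutineA', 2)]
```

Calling drill_down with a longer prefix corresponds to selecting an item in the analyzer to move to the next dimension.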

Analysis Dimensions

The dimensions for analyses are described within the ^PERFSAMPLE tool. This section provides some additional information.

  • cpu - Using CPU? (process state indicates expected CPU use)

    Note:

    The yes or no value for cpu is not a true measure of on-cpu time, but an estimate. ^PERFSAMPLE infers CPU use from the process state, and InterSystems IRIS state tracking may not directly correlate to CPU use.

    If the process is waiting for the OS scheduler to make the CPU available to it as a result of (instantaneous or persistent) over-utilization of the CPU, this can also lead to inaccuracy in cpu.

  • ns - Namespace (current namespace)

  • pid - PID (process ID)

  • rou - Routine (name of current routine)

  • state - Process State (process state string, e.g. GSETW)

  • waits - Kernel Wait State (kernel-level condition that delayed the process). See the following section for more information.

In general, the following dimensions are only useful when troubleshooting with the InterSystems Worldwide Response Center (WRC):

  • trace - Kernel Trace (alternative to 'state' w/ kernel-level detail)

  • wtrace - Reverse Kernel Trace (reverse kernel trace, stop at any wait state)

The trace and wtrace dimensions have a hierarchical organization. Selecting an ancestor, denoted with an ellipsis (...), moves down a hierarchy level. Selecting a non-ancestral item goes to the next dimension of analysis. The h key toggles between this hierarchical view and a flattened view. Pressing the a key on an ancestor aggregates subsequent dimensions for all its descendants.

The waits Dimension

The waits dimension is null if the process was not found to be waiting on anything internal to the IRIS kernel. A non-null value indicates a condition which required the process to wait (to block internally).

It's important to note that these are internal conditions, outside of the application's direct control, that cause the process to wait. As such, waits caused by a conflicting LOCK command, a $SYSTEM.Event, and the like do not count here.

Nonetheless, many of the values, particularly the more common ones, are things that the application can influence indirectly. For example, if samples show that a key application process is often waiting for diskio, this indicates that the process is waiting to read database blocks from disk and could possibly benefit from parallelization, prefetching, or more database cache. Similarly, a process that samples show is often waiting on inusebufwt is encountering database block collisions that may need investigation at the application level (with the help of the ^BLKCOL utility).

This dimension takes on the following mnemonic values, which are subject to change in the future:

  • diskio: waiting for database physical block read

  • inusebufwt: waiting due to block collision (^BLKCOL utility may help identify application cause)

  • expand: waiting for database expansion

  • ecpwait: waiting for an answer from the ECP server

  • jrniowait: no space in journal buffers, waiting for journal I/O

  • jrnsyncblk: waiting for journal data to be committed

  • jrnlckwait: waiting to access journal buffer

  • mirrorwait: waiting for active backup mirror member

  • mirrortrouble: blocked due to mirror trouble state

  • globwait: waiting because of an internal condition blocking global updates

  • aiowait: waiting for asynchronous disk I/O to complete

  • wdqwait: waiting for a write cycle to complete

  • freebuf: global buffers are completely exhausted and waiting for database writes

  • gfownwait: access to database is blocked

  • resenqXYZ: waiting on an internal resource XYZ

Note:

Many of these correspond to a canonical process state that includes the W letter flag (e.g. GSETW, GORDW, etc.), but not all do; diskio is a very common example. Conversely, not all cases of the W state flag have an internal reason reflected here (e.g. LOCKW, as mentioned above).
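
When reviewing waits values outside the tool, for example in notes taken during an analysis, a small lookup table can be convenient. The following Python sketch copies the descriptions from the list above. The helper name is hypothetical, and the handling of resenq values assumes that the resource name directly follows the resenq prefix. Because the mnemonics are subject to change, unknown values are reported rather than treated as errors.

```python
WAIT_DESCRIPTIONS = {
    "diskio": "waiting for database physical block read",
    "inusebufwt": "waiting due to block collision",
    "expand": "waiting for database expansion",
    "ecpwait": "waiting for an answer from the ECP server",
    "jrniowait": "no space in journal buffers, waiting for journal I/O",
    "jrnsyncblk": "waiting for journal data to be committed",
    "jrnlckwait": "waiting to access journal buffer",
    "mirrorwait": "waiting for active backup mirror member",
    "mirrortrouble": "blocked due to mirror trouble state",
    "globwait": "waiting because of an internal condition blocking global updates",
    "aiowait": "waiting for asynchronous disk I/O to complete",
    "wdqwait": "waiting for a write cycle to complete",
    "freebuf": "global buffers are completely exhausted and waiting for database writes",
    "gfownwait": "access to database is blocked",
}


def describe_wait(value):
    """Return a description for a waits value (hypothetical helper)."""
    if not value:
        # A null value means no kernel-internal wait was found.
        return "not waiting on anything internal to the kernel"
    if value.startswith("resenq"):
        # Assumption: the resource name follows the prefix directly.
        return "waiting on an internal resource " + value[len("resenq"):]
    return WAIT_DESCRIPTIONS.get(value, "unrecognized value: " + value)


print(describe_wait("diskio"))   # waiting for database physical block read
print(describe_wait(""))         # not waiting on anything internal to the kernel
```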
