Monitoring Processes Using ^PERFSAMPLE
This topic describes the ^PERFSAMPLE utility, a tool for analyzing InterSystems IRIS® data platform processes.
The utility performs high-frequency sampling of selected processes on the system and analyzes the data to determine where the processes are spending most of their time. It presents an easily navigable breakdown of the sampled activity, which can provide insights into your system. For example, you may discover application bottlenecks by checking ECP requests, or identify overall system bottlenecks by reviewing the types of wait events.
To get started, run ^PERFSAMPLE in the %SYS namespace:
USER>set $namespace = "%SYS"
%SYS>do ^PERFSAMPLE
Collecting Samples
The following message appears as soon as you run ^PERFSAMPLE:
This utility performs high frequency sampling of processes on the system,
analyzing and counting data points in different ways to understand where
processes are spending most of their time. On ECP Data Servers, this also
offers sampling of the current request being processed and the states of
the ECPSvrW daemons doing the processing.
1) Sample Local Process Activity
2) Sample ECP Server Requests
Option?
If the instance has no incoming ECP connections from ECP clients (application servers), option 1 above is automatically selected.
Then specify the following information at the prompts:
-
Specify the processes or ECP connections to sample.
-
If you are sampling local processes, the prompt looks like this:
Enter a list of PIDs (comma separated), * for all, or ? for ^%SS display.
If you enter ?, the routine displays information on all the currently running processes, with the process ID in the first column, as in the following partial example:
Process  Device  Namespace  Routine  CPU,Glob   Pr  User/Location
7428                        CONTROL  0,0        8
1668                        WRTDMN   184,2501   9
5168                        GARCOL   0,0        8
3512                        JRNDMN   1912,401   8
5800                        EXPDMN   0,0        8
...
-
If you are sampling ECP connections, the routine first lists the available connections alphabetically (with connection numbers) as in the following example, and then prompts for input:
The following clients are connected:
Conn# Client
9: DBSVR:APP1:IRIS
10: DBSVR:APP2:IRIS
2: DBSVR:APP3:IRIS
1: DBSVR:APP4:IRIS
4: DBSVR:APP5:IRIS
Enter *, list of connection numbers (comma separated) or clients.
For ECP connections, you can enter * to select all, a connection number, a client name, or a comma-separated list of connection numbers and/or client names.
For a given client name, you can specify the fully qualified name, such as MYSVR:MYAPPHOST1:IRIS. If any one of the three pieces uniquely identifies the connection (as is typically true of the client host name, the middle piece), you can abbreviate the name to just that piece.
-
Specify whether to ignore samples where the process is in any of the following states: READ, READW, EVTW, HANG, SLCT, SLCTW, and RUNW. When sampling ECP connections, only events where the ECPSvrW process is non-idle are recorded.
Selecting YES (the default) reduces the number of events that ^PERFSAMPLE records. When monitoring many processes, this speeds analysis and uses less memory.
-
Specify the number of samples to collect per second.
-
Specify the total number of seconds to collect samples.
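The client-name abbreviation rule for ECP connections described above can be sketched in Python. This is only an illustration of the rule as stated, not the utility's actual matching code: the function name `resolve_client` is hypothetical, exact case-sensitive matching is an assumption, and the connection table is copied from the example listing.

```python
# Connection table copied from the example listing (Conn# -> client name).
CLIENTS = {
    9: "DBSVR:APP1:IRIS",
    10: "DBSVR:APP2:IRIS",
    2: "DBSVR:APP3:IRIS",
    1: "DBSVR:APP4:IRIS",
    4: "DBSVR:APP5:IRIS",
}


def resolve_client(token, clients):
    """Return the connection number that `token` identifies.

    A token matches a client if it equals the fully qualified name or
    any one of its three colon-separated pieces. The token is accepted
    only if exactly one connection matches (hypothetical helper; exact,
    case-sensitive comparison is an assumption).
    """
    matches = [
        number
        for number, name in clients.items()
        if token == name or token in name.split(":")
    ]
    if len(matches) != 1:
        raise ValueError(f"{token!r} matches {len(matches)} connections")
    return matches[0]


print(resolve_client("APP3", CLIENTS))             # middle piece is unique
print(resolve_client("DBSVR:APP1:IRIS", CLIENTS))  # fully qualified name
```

In this listing, only the middle piece distinguishes the connections: `DBSVR` and `IRIS` each match all five clients, so the sketch rejects them as ambiguous.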
The following shows an example:
Enter a list of PIDs, * for all, or ? for ^%SS display: *
Ignore samples where the process appears idle (READ, HANG, etc)?
Yes =>
Sample rate per second: 1000 =>
Number of seconds to sample: 30 =>
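To make the effect of the idle-sample option concrete, the following Python sketch filters a list of samples by process state. The state list comes from the prompt description above; the sample records and the `drop_idle` helper are hypothetical and are not part of ^PERFSAMPLE.

```python
# States treated as idle when the "ignore idle" option is YES,
# as listed in the prompt description.
IDLE_STATES = {"READ", "READW", "EVTW", "HANG", "SLCT", "SLCTW", "RUNW"}


def drop_idle(samples):
    """Keep only samples whose process state is not an idle state."""
    return [s for s in samples if s["state"] not in IDLE_STATES]


# Hypothetical sample records, for illustration only.
samples = [
    {"pid": 7428, "state": "HANG"},
    {"pid": 1668, "state": "GGET"},
    {"pid": 5168, "state": "READ"},
    {"pid": 3512, "state": "RUN"},
]

kept = drop_idle(samples)
print(len(kept), "of", len(samples), "samples recorded")
```

Fewer recorded events is what makes later analysis faster and lighter on memory when many processes are sampled.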
Examining and Analyzing Samples
After collecting the samples, you may view an analysis. An analysis is a summary of one or more dimensions, or components, of the sampled processes. That is, an analysis sorts the sampled information according to the selected dimensions.
This section includes the following:
-
An example of using a predefined analysis.
-
Information about creating a custom analysis.
-
A description of the available analysis dimensions.
Use the following keys to navigate within the analyzer:
Key Input | Navigation Action |
---|---|
Up arrow or U | Move the selector up |
Down arrow or D | Move the selector down |
Right arrow or Enter | Select the current item |
Left arrow or Backspace | Go back to the previous level |
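C | Cycle through the count displays: percent of total samples, raw sample count, percent of qualifying samples, and average number of concurrent jobs |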
N or CTRL-D | Next page (if multiple pages) |
P or CTRL-U | Previous page (if multiple pages) |
Q | Quit ^PERFSAMPLE |
The main landing page looks like:
— PERFSAMPLE for Local Process Activity. 1.710949s at 11/17/2020 15:58:31
28479 samples | CPULoad* 0.91
Multiple jobs included: 1290 samples per job
----------------------------'?' for help----------------------------
Select an analysis to view:
New Analysis (press '+' any time)
Using CPU? -> PID -> Process State
Using CPU? -> Routine -> Namespace -> Process State
Process State -> Routine -> PID
Kernel Wait State -> Routine -> PID
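The header figures on the landing page can be related with some simple arithmetic. The Python sketch below rests on two assumptions that the page itself does not confirm: that the time in the first line (1.710949s) is the elapsed sampling time, and that "samples per job" is an average across the sampled jobs.

```python
# Figures copied from the example landing page header.
total_samples = 28479
samples_per_job = 1290
elapsed_seconds = 1.710949  # assumed to be the elapsed sampling time

# Approximate number of jobs that contributed samples.
jobs = total_samples / samples_per_job

# Approximate samples collected per job per second.
effective_rate = samples_per_job / elapsed_seconds

print(f"about {jobs:.1f} jobs, about {effective_rate:.0f} samples/s per job")
```

Under these assumptions, the achieved per-job rate can be compared with the sample rate requested at the prompt.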
Predefined Analysis Example
Below is an example of an analysis that begins with the Process State dimension.
In this example, ^PERFSAMPLE found 76755 samples of processes in a sample-able state (non-idle if the option to ignore idle was selected) out of 319994 total samples:
– PERFSAMPLE for Local Process Activity. 3.89s at 11/17/2020 16:59:59
76755 events in 319994 samples [24.0 %-total] | CPULoad* 8.22
Multiple jobs included: 2191 samples per job
----------------------------'?' for help----------------------------
Process State [24.0 %-total]
GGET [8.46 %-total]
RUN [5.88 %-total]
GDEF [3.16 %-total]
GSETW [1.63 %-total]
BSETW [1.21 %-total]
GDEFW [1.18 %-total]
GGETW [0.931 %-total]
SEMW [0.685 %-total]
GSET [0.311 %-total]
LOCKW [0.144 %-total]
LOCK [0.0644 %-total]
INCRW [0.0641 %-total]
BSET [0.0513 %-total]
Initially, the values appear as a percent of the total number of samples. The most common Process State value sampled in this case was GGET, which represents 8.46% of the total 319994 samples.
Pressing c cycles through how this count is displayed. For example, you can display the above information as a raw count of samples:
Process State [76755]
> GGET [27083]
RUN [18823]
GDEF [10121]
You can also view the information as a percent of qualifying samples (in this case, samples that had a non-idle Process State):
Process State [24.0 %-total]
> GGET [35.3 %-subset]
RUN [24.5 %-subset]
GDEF [13.2 %-subset]
Finally, you can view the average number of jobs concurrently in each state:
Process State [24.0 %-total]
GGET [12.4 jobs]
RUN [8.59 jobs]
GDEF [4.62 jobs]
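The four count displays are different views of the same raw counts. The Python sketch below reproduces the figures shown above from the raw sample counts. The percent formulas follow directly from the descriptions; the jobs formula (raw count divided by samples per job) is inferred from the example numbers and is an assumption, not documented behavior. The function name `count_displays` is hypothetical.

```python
TOTAL_SAMPLES = 319994    # all samples collected
QUALIFYING = 76755        # samples with a non-idle Process State
SAMPLES_PER_JOB = 2191    # from the "samples per job" header line

RAW_COUNTS = {"GGET": 27083, "RUN": 18823, "GDEF": 10121}


def count_displays(count):
    """Return the four displays for one raw count."""
    return {
        "raw": count,
        "pct_total": 100.0 * count / TOTAL_SAMPLES,
        "pct_subset": 100.0 * count / QUALIFYING,
        # Assumption: inferred from the example output, not documented.
        "avg_jobs": count / SAMPLES_PER_JOB,
    }


for state, count in RAW_COUNTS.items():
    d = count_displays(count)
    print(
        f"{state:5s} [{d['raw']}] "
        f"[{d['pct_total']:.3g} %-total] "
        f"[{d['pct_subset']:.3g} %-subset] "
        f"[{d['avg_jobs']:.3g} jobs]"
    )
```

Formatting to three significant digits matches the precision of the example displays.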
Selecting GGET with the Right Arrow key moves to the next dimension, ordering the values of that dimension for samples where the first dimension had value GGET. You can navigate freely between the dimensions using the arrow keys.
Creating a Custom Analysis
Select the New Analysis option from the main landing page to create a custom analysis. You can also create a custom analysis using one of these shortcuts:
Key Input | Shortcut |
---|---|
+ key | Add a dimension to the current analysis (when in an analysis) |
* key | Begin a new analysis with the current item as the first dimension |
Adding a new analysis brings you to the following screen:
New Analysis:
Specify a comma-delimited list of dimensions upon which to analyze samples.
For example, "state,ns,rou" means first count each unique state the sampled
processes were in; then for each state, count the namespace from the samples
in that state; and finally for each state->namespace pair, count each unique
routine name. In other words, report on routines by namespace by state.
The following dimensions are available:
cpu - Using CPU? (process state indicates expected CPU use)
ns - Namespace (current namespace)
pid - PID (process ID)
rou - Routine (name of current routine)
state - Process State (process state string, e.g. GSETW)
trace - Kernel Trace (alternative to 'state' w/ kernel-level detail)
waits - Kernel Wait State (kernel-level condition that delayed the process)
wtrace - Reverse Kernel Trace (revese kernel trace, stop at any wait state)
Enter dimension list:
From here, enter the list of dimensions you would like to analyze as described by the prompt. Once you press Enter, you may navigate the analysis as described above.
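The nested counting that a dimension list such as "state,ns,rou" describes can be sketched in Python. This is a minimal illustration of the idea only: the `analyze` function and the sample records are hypothetical and do not reflect how ^PERFSAMPLE stores or processes its samples.

```python
from collections import Counter


def analyze(samples, dims):
    """Count samples for each prefix of the dimension list.

    Returns one Counter per dimension level; level i is keyed by the
    tuple of values for dims[0..i].
    """
    levels = [Counter() for _ in dims]
    for sample in samples:
        path = ()
        for depth, dim in enumerate(dims):
            path += (sample[dim],)
            levels[depth][path] += 1
    return levels


# Hypothetical samples, for illustration only.
SAMPLES = [
    {"state": "GGET", "ns": "APP", "rou": "Orders"},
    {"state": "GGET", "ns": "APP", "rou": "Orders"},
    {"state": "GGET", "ns": "APP", "rou": "Billing"},
    {"state": "GGET", "ns": "USER", "rou": "Report"},
    {"state": "RUN", "ns": "APP", "rou": "Orders"},
]

by_state, by_state_ns, by_state_ns_rou = analyze(SAMPLES, "state,ns,rou".split(","))

for path, count in by_state_ns_rou.most_common():
    print(" -> ".join(path), count)
```

Each level corresponds to one screen of the analyzer: selecting a value at one level narrows the next level to the samples that share that value.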
Analysis Dimensions
The dimensions for analyses are described within the ^PERFSAMPLE tool. This section provides some additional information.
-
cpu - Using CPU? (process state indicates expected CPU use)
Note: The yes or no value for cpu is not a true measure of on-CPU time, but an estimate. ^PERFSAMPLE infers CPU use from the process state, and InterSystems IRIS state tracking may not directly correlate to CPU use.
If the process is waiting for the OS scheduler to make the CPU available to it as a result of (instantaneous or persistent) over-utilization of the CPU, this can also lead to inaccuracy in cpu.
-
ns - Namespace (current namespace)
-
pid - PID (process ID)
-
rou - Routine (name of current routine)
-
state - Process State (process state string, e.g. GSETW)
-
waits - Kernel Wait State (kernel-level condition that delayed the process). See the following section for more information.
In general, the following dimensions are only useful when troubleshooting with the InterSystems Worldwide Response Center (WRC):
-
trace - Kernel Trace (alternative to 'state' w/ kernel-level detail)
-
wtrace - Reverse Kernel Trace (reverse kernel trace, stop at any wait state)
The trace and wtrace dimensions have a hierarchical organization. Selecting an ancestor, denoted with an ellipsis (...), moves down a hierarchy level. Selecting a non-ancestral item goes to the next dimension of analysis. The h key toggles between this hierarchical view and a flattened view. Pressing the a key on an ancestor aggregates subsequent dimensions for all its descendants.
The waits Dimension
The waits dimension is null if the process was not found to be waiting on anything internal to the InterSystems IRIS kernel. A non-null value indicates a condition which required the process to wait (to block internally).
It's important to note that these are internal conditions that cause the process to wait outside of the application's direct control. As such, waiting due to a conflicting LOCK command, a $SYSTEM.Event, and the like does not count here.
Nonetheless, many of the values, particularly the more common ones, are things that the application can influence indirectly. For example, if samples show that a key application process is often waiting for diskio, the process is waiting to read database blocks from disk and might benefit from parallelization, prefetching, or more database cache. Similarly, a process that samples show is often waiting on inusebufwt is encountering database block collisions that may need investigation at the application level (with the help of the ^BLKCOL utility). This dimension takes the following mnemonic values, which are subject to change in the future:
-
diskio: waiting for database physical block read
-
inusebufwt: waiting due to block collision (^BLKCOL utility may help identify application cause)
-
expand: waiting for database expansion
-
ecpwait: waiting for an answer from the ECP server
-
jrniowait: no space in journal buffers, waiting for journal I/O
-
jrnsyncblk: waiting for journal data to be committed
-
jrnlckwait: waiting to access journal buffer
-
mirrorwait: waiting for active backup mirror member
-
mirrortrouble: blocked due to mirror trouble state
-
globwait: waiting because of an internal condition blocking global updates
-
aiowait: waiting for asynchronous disk I/O to complete
-
wdqwait: waiting for a write cycle to complete
-
freebuf: global buffers are completely exhausted and waiting for database writes
-
gfownwait: access to database is blocked
-
resenqXYZ: waiting on an internal resource XYZ
Many of these correspond to a canonical process state that includes the W letter flag (for example, GSETW or GORDW), but not all do; diskio is a very common example. Conversely, not all cases of the W state flag have an internal reason reflected here (for example, LOCKW, as mentioned above).
Save Analysis
After viewing the samples, you may save them for future analysis. To do so, press the Left Arrow from the analysis landing page. This returns you to the initial Collecting Samples page, but with the additional option to Save Samples to File. Select this option and enter the desired filename, such as perfsample001.txt. ^PERFSAMPLE saves the file to the install-dir\mgr directory.
To open a saved analysis, launch ^PERFSAMPLE using the LOAD tag and specify the file to open. For example:
USER>set $namespace = "%SYS"
%SYS>do LOAD^PERFSAMPLE
File: C:\MyIRIS\mgr\perfsample001.txt
^PERFSAMPLE loads the file, allowing you to analyze and examine the saved samples.
See Also
-
Controlling InterSystems IRIS Processes (general information on InterSystems processes)
-
^BLKCOL utility (for monitoring block collisions, which occur when a process is forced to wait for access to a block)