Monitoring Caché Using the cstat Utility

This appendix provides an overview of how to use the cstat utility. It is intended as an introduction for new users and a reference for experienced users.

Important:

When using this utility, you should consult with the InterSystems Worldwide Response Center (WRC) for guidance about specifying appropriate cstat options and assistance in interpreting the data produced by the utility.

cstat is a C executable that is distributed with Caché. It is a diagnostic tool for system level problems, including Caché hangs, network problems, and performance issues. When run, cstat attaches to the shared memory segment allocated by Caché at start time, and displays InterSystems’ internal structures and tables in a readable format. The shared memory segment contains the global buffers, lock table, journal buffers, and a wide variety of other memory structures which need to be accessible to all Caché processes. Processes also maintain their own process private memory for their own variables and stack information. The basic display-only options of cstat are fast and non-invasive to Caché.

Caution:

More advanced (undocumented) options may alter shared memory and should be used with care. These advanced options should be used only at the direction of InterSystems Support personnel; for information, contact the InterSystems Worldwide Response Center (WRC).

This appendix contains the following sections covering cstat: Basics of Running cstat, Running cstat with Options, and Viewing cstat Output.

Basics of Running cstat

In the event of a system problem, the cstat report is often the most important tool that InterSystems has to determine the cause of the problem. Use the following guidelines to ensure that the cstat report contains all of the necessary information:

Run cstat at the time of the event.
Use the Diagnostic Report task or CacheHung script unless directed otherwise by InterSystems support personnel.
Check the contents of the cstat report to ensure it is valid.

Since cstat is a separate executable file included with Caché, it is run outside of Caché, at an operating system prompt. Therefore, the details of running it depend on the operating system, as described in Running cstat on Windows and Running cstat on UNIX®.

Running cstat with no options is uncommon, but doing so produces a basic report equivalent to running it with the following default options:

-f (global module flags)
-p (PID table)
-q (semaphores)

For information about cstat options, see Running cstat with Options.

Running cstat on Windows

The cstat executable is located in the Caché instance’s \bin directory. Starting with a Windows command prompt running as Administrator, you can run it as follows:

C:\>cd install-dir\bin

C:\install-dir\bin>cstat

If you run cstat from a directory other than the instance’s \bin or \mgr, you must include the -s argument to specify the location of the \mgr directory. For example:

C:\Users>\install-dir\bin\cstat -s\install-dir\mgr

Running cstat on UNIX®

The cstat executable is located in the Caché instance’s Bin directory. You can run it from another directory, but unless you are in the instance’s mgr or Bin directory, you must include the -s argument to specify the location of the mgr directory. Starting with a UNIX® command prompt, running as root, change to the Bin directory or the mgr directory and run the cstat command as follows:

bash-3.00$ ./cstat

From the Caché installation directory, the command would be as follows:

bash-3.00$ ./bin/cstat -smgr

You can also invoke cstat via the ccontrol command, which can be run from any directory as shown in the following example:

bash-3.00$ ccontrol stat Cache_Instance_Name

where Cache_Instance_Name is the name of the Caché instance on which you are running cstat.
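The rule about when -s is needed can be sketched in Python. This is a minimal illustration, not part of Caché: the function name cstat_command and the install directory /opt/cache are hypothetical, and the lowercase bin and mgr directory names are taken from the UNIX® command example above.

```python
from pathlib import PurePosixPath


def cstat_command(install_dir, cwd):
    """Build a cstat argument list, adding -s only when it is required.

    Assumption: the instance layout is install_dir/bin and install_dir/mgr,
    as in the UNIX example; adjust the names for your own installation.
    """
    install = PurePosixPath(install_dir)
    here = PurePosixPath(cwd)
    cstat = str(install / "bin" / "cstat")
    # From the instance's bin or mgr directory, -s can be omitted.
    if here in (install / "bin", install / "mgr"):
        return [cstat]
    # Anywhere else, -s must point at the mgr directory (no space after -s).
    return [cstat, "-s" + str(install / "mgr")]


# /opt/cache is a placeholder install directory used only for illustration.
print(cstat_command("/opt/cache", "/opt/cache/mgr"))
print(cstat_command("/opt/cache", "/tmp"))
```

The function only builds the argument list; it does not run cstat, so it can be tried on a machine without a Caché instance.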

Running cstat with Options

Running cstat without options produces a basic report. Generally, you run cstat to obtain specific information. To specify the information you want, add or subtract options, as follows:

To include (turn on) an option, specify a flag followed by a 1 (or other level).
To exclude (turn off) an option, specify a flag followed by a 0.

For example, to include the Global File Table (GFILETAB) section in the cstat report, use the -m1 option:

C:\MyCache\Bin\cstat -m1

or, to turn off the default basic options, use the -a0 option:

C:\CACHE\Bin\cstat -a0

Many options have more detailed levels than 0 and 1. These additional levels are described as having “bits,” which are displayed in decimal as powers of two and control specific types of information about the option. For example, the basic -p option, which displays the PID table, is turned on with a 1; however, using a 2 adds a swcheck column, a 4 adds a pstate column, and so on. These bits can be combined by adding their values; for example, if you want to see the information displayed by both the 2 and 4 bits, specify -p6. To ask for all bits, use -1, as follows:

bash-3.00$ ./cstat -p-1
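The bit arithmetic described above can be checked with a short Python sketch. The helper names (combine_bits, split_bits, option) are hypothetical and exist only for this illustration; the two -p bit values are the ones named in the text.

```python
# Bit values for the -p option as named in the text above.
P_SWCHECK = 2  # adds the swcheck column
P_PSTATE = 4   # adds the pstate column


def combine_bits(*bits):
    """OR individual bit values together into a single option level."""
    level = 0
    for bit in bits:
        level |= bit
    return level


def split_bits(level):
    """Break a non-negative option level back into its power-of-two bits."""
    return [1 << i for i in range(level.bit_length()) if level & (1 << i)]


def option(flag, level):
    """Format a flag letter and a level as a cstat option, for example -p6."""
    return f"-{flag}{level}"


print(option("p", combine_bits(P_SWCHECK, P_PSTATE)))  # -p6
print(split_bits(6))                                   # [2, 4]
print(option("p", -1))                                 # -p-1 (all bits)
```

Decomposing a level with split_bits is also a quick way to read an option string someone else wrote, such as a level of 3 (bits 1 and 2).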

In addition, multiple flags can be combined in a single cstat command. For example, the following command turns off the basic options, then turns on all bits for the global module flags and PID table, as well as a detailed level for the GFILETAB:

bash-3.00$ ./cstat -a0 -f-1 -p-1 -m3
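As an illustration, the combined command above can be assembled from a mapping of flags to levels. This is a sketch under assumptions: build_cstat_args is a hypothetical helper, and it simply keeps the order in which the flags are given so that -a0 stays first, as in the example.

```python
def build_cstat_args(levels):
    """Turn a {flag: level} mapping into cstat options, keeping insertion order."""
    return [f"-{flag}{level}" for flag, level in levels.items()]


# Turn off the basic options, then request all bits for the global module
# flags and the PID table, plus a detailed level for GFILETAB.
args = build_cstat_args({"a": 0, "f": -1, "p": -1, "m": 3})
print(" ".join(["./cstat"] + args))  # ./cstat -a0 -f-1 -p-1 -m3
```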

It is common for cstat commands to have many flags when you start diagnosing a complex problem; however, the options that make modifications are typically used alone. For example, the -d option requests a process dump; before using this option, you might run cstat with multiple options to identify the process to dump, but when using -d, typically no other options are selected.

The cstat Options table describes the options that you can use with the cstat command.

Note:

For assistance in interpreting the data produced by the cstat options described in this table, contact the InterSystems Worldwide Response Center (WRC).

cstat Options

Option	Description
–a[0/1]	Displays “all” information as described in the Running cstat with Options section of this chapter.
-b[bits]	Displays information about global buffer descriptor blocks (BDBs). You can specify a combination of the following bits: 1 (all) 2 (cluster) 4 (ECP server) 8 (ECP client) 16 (block contents) 64 (check block integrity) 128 (block and LRU summary) Note: See also -l. Running cstat -b64 may require extra time.
–c[bits]	Displays counters, which are statistics on system performance. You can specify a combination of the following bits: 1 (global) 2 (network) 4 (lock) 8 (optim) 16 (terminal) 32 (symtab) 64 (journal) 128 (disk i/o) 256 (cluster) 262144 (bshash) 2097152 (job cmd) 4194304 (sem) 8388608 (async disk i/o) 16777216 (fsync) 33554432 (obj class) 67108864 (wd) 134217728 (bigstr) 268435456 (swd) 536870912 (sort) 1073741824 (symsave) 2147483648 (freeblkpool)
–d[pid,opt]	Creates dump of Caché processes. You can specify the following options: 0 (full); default 1 (partial)
–e[0/1/2]	Displays the Caché system error log (see Caché System Error Log in the “Monitoring Caché Using the Management Portal” chapter); –e2 displays additional process information (in hex).
–f[bits]	Displays global module flags. You can specify a combination of the following bits: 1 (basic) 64 (resources) 128 (with detail) 256 (account detail) 512 (incstrtab) 1024 (audit)
–g[0/1]	Displays ^GLOSTAT information; for information see the “Gathering Global Activity Statistics Using ^GLOSTAT” chapter in this document.
–h	Displays cstat usage information.
–j[0/1/2/3/4/5/6]	Displays the journal system master structure, which lists information about journaling status. –j32 displays mirror server information.
-k	Displays information about prefetch daemons used by the $PREFETCHON function; see $PREFETCHON in the Caché ObjectScript Reference.
–l[bits]	Displays information about least recently used (LRU) global buffer descriptor block (BDB) queue, but not the contents of the BDBs. You can specify a combination of the following bits: 1 (all) 2 (cluster) 4 (ECP server) 8 (ECP client) 16 (block contents) 32, but not 1 (most recently used (MRU) order) Note: See also –b.
–m[0/1/3/4/8/16]	Displays Global File Table (GFILETAB), which contains information about all databases, listed by SFN, that have been mounted since the instance of Caché started up. You can specify a combination of the following bits: 3 (additional details) 4 (volume queues) 8 (disk device id table) 16 (systems remotely mounting this database)
–n[0/1]	Displays information about network structures and local/remote SFN translations; cstat -n-1 also displays namespace structures.
-o1	Clears the resource statistics displayed by cstat -c to reestablish a base situation without rebooting Caché. No output is produced.
-p[bits]	Displays information about processes that are running in Caché. The information is obtained from the process ID table (PIDTAB). You can specify a combination of the following bits: 2 (swcheck) 4 (pstate and %SS) 5 (NT mailbox locks); Windows only 8 (js sum) 16 (js list) 32 (grefcnt info) 64 (gstatebits) 128 (gstate summary) 256 (jrnhib) 512 (transaction summary) 1024 (pidflags) 2048 (pgbdbsav); additionally dumps pgshared table 4096 (freeblk table)
–q[0/1]	Displays information about hibernation semaphores.
-s[dir]	Specifies the location of the instance’s mgr directory when running the cstat command from a directory other than the mgr or bin directory.
-t[seconds]	Runs cstat repeatedly in a loop, once every [seconds] seconds, until halted. Only the global module flags section is displayed, as when -f1 is specified.
–u[bits]	Displays information about Caché locks stored in the lock table (see Monitoring Locks in the “Monitoring Caché Using the Management Portal” chapter of this guide). You can specify a combination of the following bits: 1 (summary) 2 (waiters) 4 (intermediate) 8 (detail) 16 (watermark) 32 (buddy memory) 64 (resource info)
-v1	Verifies that the cstat executable and the Caché executable associated with the shared memory segment that cstat attaches to are from the same version; if they are not, cstat does not run.
–w[bits]	Displays information about BDBs in write daemon queues.
–B[0/1]	Displays, in hex, the contents of blocks held in GBFSPECQ.
–C[0/1]	Displays configuration information for inter-job communication (IJC) devices.
-D[secs],[msecs][,0]	Displays resource statistics over an interval of ‘secs’ seconds, sampling block collisions every ‘msecs’ milliseconds. Note: The resource information is the same as that displayed by -c. The ^BLKCOL utility, described in the “Monitoring Block Collisions Using ^BLKCOL” chapter of this guide, provides more detailed information about block collisions.
–E[bits]	Displays status of cluster on platforms that support clustering. You can specify a combination of the following bits: 1 (vars) 2 (write daemon locks) 4 (enqinuse) 8/16 (allenq)
–G[bdb]	Displays, in hex, the contents of the global buffer descriptors and the global buffer for a specific buffer descriptor block (BDB). Note: Same as –H except that the information is displayed by BDB.
–H[sfn],[blk]	Displays, in hex, the contents of the global buffer descriptors and the global buffer for a specific system file number (sfn) and block number (blk) pair. Note: Same as –G except that the information is displayed by system file number and block number pair. The block must be in the buffer pool.
–I[0/1]	Displays the incremental backup data structures.
-L[0/1]	Displays the license. Note: Same as ^CKEY and the %SYSTEM.License.CKEY method.
–M[0/1]	Displays the mailbox log. Note: Disabled by default. A special build is required to capture and log the mailbox messages; additional logging may be required.
–N[value]	Displays ECP network information. You can specify a combination of the following values: 1 (client) 2 (server) 4 (client buffers) 8 (server buffers) 16 (client buffers, in detail) 32 (user jobs awaiting answer) 64 (server answer buffers details) 128 (request global) 256 (server send answer buffer details; not -1) 1024 (dump server received request buffers) 2048 (client trans bitmap) 4096 (client GLO Q) 8192 (request global reference dump, in hex) 65536 (ECP blocks downloaded to clients) 131072 (client released request buffer details; not -1)
-R[value]	Displays information about routine buffers in use (or changing), class control blocks (CCB), and least recently used (LRU) queues. You can specify a combination of the following values: 1 (routine buffers in use) 4 (RCT – changed routine table) 8 (RCT detail) 16 (0x10=all routine buffers) 32 (0x20=LRU Q) 64 (0x40=all CCB’s) 128 (0x80=invalidated CCB’s) 0x100 (invalidated subclasses) 0x200 (buffer address) 0x400 (buffer descriptors) 0x800 (procedure table and cached routines buffer number) 0x1000 (process cached routine names) 0x2040 (CCB’s and CCB details) 0x4000 (cls NS cache) 0x6000 (cls NS cache details) 0x8000 (validate shm cls cache) 0x10000 (dump all class hierarchy) 0x20000 (dump all class hierarchy details) 0x40000 (dump process class and routine statistics) 0x80000 (process cached class names)
-S[bits]	Displays information about the cause of a hang based on a self diagnosis of whether or not the system is hung. You can specify a combination of the following bits: 1 (display diagnosis) 2 (partial process dump for suspect jobs) 4 (full process dump for first suspect job and partial dumps for other suspect jobs) Note: In a cluster, this option should be run on all cluster members.
–T[0/1]	Displays hex values of many in-memory tables, including National Language Settings (NLS) tables.
–V[pid]	Displays variables that are part of the process memory structures; of limited value unless you have access to the source code. Note: Windows only. Run from the directory that contains the pid.dmp file.
-W	Performs the same function as the Backup.General.ExternalThaw() classmethod, and may be used to resume the write daemon after Backup.General.ExternalFreeze() has been called in cases in which a new Caché session cannot be started. (See External Backup in the “Backup and Restore” chapter of the Caché Data Integrity Guide for information on the use of these methods.) This option will not unfreeze the write daemon from any hang or suspension caused by anything other than a backup. Use of this option is recorded in the console log.
–X[0/1]	Displays the contents of the device translation table. It is organized by device number and shows both the numeric and plaintext class identifiers.

Viewing cstat Output

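cstat data can be viewed immediately (via a terminal) or redirected to an output file (see cstat Text File in this appendix) for later analysis. The most common methods for viewing the data are a cstat text file, the Diagnostic Report task, the CacheHung script, and the ^pButtons utility, each described in the sections that follow.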

Note:

When Caché is forcibly shut down, cstat is run in order to capture the current state of the system. The output is added to the console log as part of the emergency shutdown procedure.

cstat Text File

cstat reports can be redirected to a file instead of the terminal, which might be useful if you want to collect a set of cstat options that are not provided by one of the Caché tools (Diagnostic Report Task, CacheHung Script, ^pButtons Utility) or if you are having trouble running those tools.
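One way to capture a report in a file is sketched below in Python. It is an illustration only: save_report is a hypothetical helper, and the demonstration runs a stand-in command instead of cstat so that it works without a Caché instance. In real use you would pass the cstat command line (for example, the executable path followed by the options you need).

```python
import os
import subprocess
import sys
import tempfile


def save_report(argv, out_path):
    """Run argv, writing its stdout and stderr to out_path; return the exit code."""
    with open(out_path, "w") as out:
        result = subprocess.run(argv, stdout=out, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in for a cstat command line (assumption: cstat is not available here).
demo_argv = [sys.executable, "-c", "print('placeholder cstat report')"]
report_path = os.path.join(tempfile.mkdtemp(), "cstat_report.txt")
exit_code = save_report(demo_argv, report_path)
print(exit_code, report_path)
```

Checking the exit code and the size of the file afterward is a simple way to follow the earlier guideline of confirming that a report is valid before sending it on.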

Diagnostic Report Task

The Diagnostic Report task creates an HTML log file containing both basic and advanced information, which can be used by the InterSystems Worldwide Response Center (WRC) to resolve system problems. For information about the Diagnostic Report task, including the cstat options that it uses, see the “Using the Caché Diagnostic Report” chapter in this guide.

Note:

The Diagnostic Report task cannot be run on a hung system; if your system is hung, see CacheHung Script in this appendix.

CacheHung Script

The CacheHung script is an OS tool used to collect data on the system when a Caché instance is hung. The name of the script, which is located in the install-dir/Bin directory, is platform-specific, as specified in the following table:

Platform	Script name
Microsoft Windows	CacheHung.cmd
UNIX®/Linux	CacheHung.sh

The CacheHung script should be run with Administrator privileges. Like the Diagnostic Report task, the CacheHung script runs cstat twice, 30 seconds apart, in case the status is changing, and bundles the reports into an HTML file together with the other collected data. The cstat reports taken from CacheHung use the following options:

cstat -e2 -f-1 -m-1 -n3 -j5 -g1 -L1 -u-1 -v1 -p-1 -c-1 -q1 -w2 -E-1 -N65535

CacheHung also runs a third cstat using only the -S2 option, which it writes to a separate section of output called “Self-Diagnosis.” The -S2 option causes suspect processes to leave mini-dumps; therefore, running CacheHung is likely to collect information about the specific processes responsible for the hang, whereas simply forcing the instance down does not collect this information.

In addition, CacheHung generates cstat output files that are often very large, in which case they are saved to separate txt files. Remember to check for these files when collecting the output.
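The two-samples-30-seconds-apart pattern described above can be illustrated with a small Python sketch. This is not the CacheHung script and does not reproduce its behavior; sample_twice and fake_run are hypothetical names, and the demonstration uses a fake collector and a no-op sleep so that it finishes immediately. Use CacheHung itself unless InterSystems support personnel direct otherwise.

```python
import time

# Option string quoted from the CacheHung description above.
CACHEHUNG_OPTIONS = (
    "-e2 -f-1 -m-1 -n3 -j5 -g1 -L1 -u-1 -v1 -p-1 -c-1 -q1 -w2 -E-1 -N65535"
).split()


def sample_twice(run_once, interval_seconds=30, sleep=time.sleep):
    """Call run_once, wait, then call it again; return both results."""
    first = run_once()
    sleep(interval_seconds)
    second = run_once()
    return first, second


# Demonstration with a fake collector instead of a real cstat run.
calls = []


def fake_run():
    calls.append(len(calls) + 1)
    return f"report {len(calls)}"


reports = sample_twice(fake_run, sleep=lambda seconds: None)
print(reports)                 # ('report 1', 'report 2')
print(len(CACHEHUNG_OPTIONS))  # 15
```

Taking two samples makes it possible to compare them and see whether the state of the system is changing or stuck.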

^pButtons Utility

The ^pButtons utility collects detailed performance data about a Caché instance and the platform on which it is running. It runs inside Caché for a configurable amount of time, collects samples over that interval, and generates a report when it finishes. For information about the ^pButtons utility, including the cstat options that it uses, see the “Monitoring Performance Using ^pButtons” chapter in this guide.