ECP Connection States
In an operating cluster, an ECP connection can be in one of the following states:
State | Description |
---|---|
Not Connected | The connection is defined but has not been used yet. |
Connection in Progress | The connection is in the process of establishing itself. This is a transitional state that lasts only until the connection is established. |
Normal | The connection is operating normally and has been used recently. |
Trouble | The connection has encountered a problem. If possible, the connection automatically corrects itself. |
Disabled | The connection has been manually disabled by a system administrator. Any application making use of this connection receives a <NETWORK> error. |
The following sections describe each connection state as it relates to application servers or the data server:
Application Server Connection States
The following entries describe the application server side of each of the connection states. The numerical values provided allow you to determine the connection state indicated in a log message; for example, the following message refers to the Application Server Trouble State:
01/28/24-00:00:11:859 (6552) 2 [SYSTEM MONITOR] ECPClientState Alert: ECP reports Clients state 6
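As an illustration of reading the numeric state out of an alert like the one above, here is a minimal Python sketch. The regular expression is modeled only on that sample line, so the actual log format may vary. The function names and the `KNOWN_STATES` table are hypothetical, and the table lists only the one value the text confirms (6 for Trouble).

```python
import re
from typing import Optional

# Pattern modeled on the sample alert line; an assumption, not a documented format.
_ALERT = re.compile(r"ECPClientState Alert: ECP reports Clients state (\d+)")

# Only the code confirmed by the text is listed; other codes are deliberately omitted.
KNOWN_STATES = {6: "Trouble"}


def parse_ecp_client_state(line: str) -> Optional[int]:
    """Return the numeric client state from an alert line, or None if absent."""
    match = _ALERT.search(line)
    return int(match.group(1)) if match else None


def describe_state(code: int) -> str:
    """Map a state code to a name when known, otherwise echo the raw code."""
    return KNOWN_STATES.get(code, f"state {code} (look up in the state entries)")


sample = ("01/28/24-00:00:11:859 (6552) 2 [SYSTEM MONITOR] "
          "ECPClientState Alert: ECP reports Clients state 6")
code = parse_ecp_client_state(sample)
if code is not None:
    print(describe_state(code))
```

Lines that do not contain the alert text yield `None`, so the same function can be run over a whole log file without pre-filtering.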
The node is in the state of being initialized (very rare), or has not yet been initialized.
An application server-side ECP connection starts out in the Not Connected state. In this state, there are no ECP daemons for the connection. If an application server process makes a network request, daemons are created for the connection and the connection enters the Connection in Progress state.
In the Connection in Progress state, a network daemon exists for the connection and actively tries to establish a connection to the data server; when the connection is established, it enters the Normal state. While the connection is in the Connection in Progress state, the user process must wait for up to 20 seconds for it to be established. If the connection is not established within that time, the user process receives a <NETWORK> error.
The application server ECP daemon attempts to create a new connection to the data server in the background. If no connection is established within 20 minutes, the connection returns to the Not Connected state and the daemon for the connection goes away.
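The two time limits described above (a 20-second wait for the user process and a 20-minute limit on background attempts) can be pictured with a small Python sketch. This is an interpretation of the text rather than product code: the function name is hypothetical, and treating the limits as inclusive boundaries is an assumption.

```python
from typing import Optional, Tuple

USER_WAIT_SECONDS = 20            # how long the user process waits (from the text)
DAEMON_GIVE_UP_SECONDS = 20 * 60  # how long background attempts continue (from the text)


def in_progress_outcome(seconds_to_connect: Optional[float]) -> Tuple[str, str]:
    """Return (what the waiting user process sees, resulting connection state).

    seconds_to_connect is None when the data server never answers.
    Inclusive boundaries are an assumption made for this sketch.
    """
    if seconds_to_connect is not None and seconds_to_connect <= USER_WAIT_SECONDS:
        return ("request proceeds", "Normal")
    if seconds_to_connect is not None and seconds_to_connect <= DAEMON_GIVE_UP_SECONDS:
        # The user process has already given up, but the daemon connected later.
        return ("<NETWORK> error", "Normal")
    return ("<NETWORK> error", "Not Connected")
```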
If a connection attempt made in the Connection in Progress state fails, the connection enters a failed state. This state persists for a few seconds before transitioning to the Application Server Not Connected state.
An ECP connection is marked Disabled if an administrator declares that it is disabled. In this state, no daemons exist and any network requests that would use that connection immediately receive <NETWORK> errors.
After a connection completes, it enters the Normal (data transfer) state. In this state, the application server-side daemons exist and actively send requests and receive answers across the network. The connection stays in the Normal state until the connection becomes unworkable or until the application server or the data server requests a shutdown of the connection.
If the connection from the application server to the data server encounters problems, the application server ECP connection enters the Trouble state. In this state, application server ECP daemons exist and actively try to restore the connection. An underlying TCP connection may or may not still exist. The recovery method is similar whether the underlying TCP connection is reset and must be recreated or simply stops working temporarily.
During the application server Time to wait for recovery timeout (default of 20 minutes), the application server attempts to reconnect to the data server to perform ECP connection recovery. During this interval, existing network requests are preserved, but the originating application server-side user process blocks new network requests, waiting for the connection to resume. If the connection returns within the Time to wait for recovery timeout, it returns to the Normal state and the blocked network requests proceed.
For example, if a data server goes offline, any application server connected to it has its state set to Trouble until the data server becomes available. If the problem is corrected gracefully, the connection's state reverts to Normal; if the connection does not recover from the Trouble state, its state reverts to Not Connected.
Applications continue running until they require network access. All locally cached data is available to the application while the server is not responding.
Transitional recovery states are part of the Trouble state. If there is no current TCP connection to the data server, and a new connection is established, the application server and data server engage in a recovery protocol which flushes the application server cache, recovers transactions and locks, and returns to the Normal state.
Similarly, if the data server shuts down, either gracefully or as a result of a crash, and then restarts, it enters a short period (approximately 30 seconds) during which it allows application servers to reconnect and recover their existing sessions. Once again, the application server and the data server engage in the recovery protocol.
If connection recovery is not complete within the Time to wait for recovery timeout, the application server gives up on connection recovery. Specifically, the application server returns errors to all pending network requests and changes the connection state to Not Connected. If it has not already done so, the data server rolls back all the transactions and releases all the locks from this application server the next time this application server connects to the data server.
If the recovery is successful, the connection returns to the Normal state and the blocked network requests proceed.
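The application server transitions described in this section can be summarized as a small state table. The Python sketch below is only a reading aid: the event names (`network_request`, `recovery_timeout`, and so on) and the `FAILED` label are hypothetical, and only transitions stated in the text are included. Events that the text does not cover leave the state unchanged.

```python
from enum import Enum


class AppState(Enum):
    NOT_CONNECTED = "Not Connected"
    IN_PROGRESS = "Connection in Progress"
    FAILED = "Connection Failed"  # label chosen here for the short-lived failed state
    NORMAL = "Normal"
    TROUBLE = "Trouble"
    DISABLED = "Disabled"


# (current state, hypothetical event name) -> next state, per the text above
TRANSITIONS = {
    (AppState.NOT_CONNECTED, "network_request"): AppState.IN_PROGRESS,
    (AppState.IN_PROGRESS, "connected"): AppState.NORMAL,
    (AppState.IN_PROGRESS, "attempt_failed"): AppState.FAILED,
    (AppState.IN_PROGRESS, "gave_up_after_20_min"): AppState.NOT_CONNECTED,
    (AppState.FAILED, "few_seconds_elapsed"): AppState.NOT_CONNECTED,
    (AppState.NORMAL, "connection_problem"): AppState.TROUBLE,
    (AppState.TROUBLE, "recovered"): AppState.NORMAL,
    (AppState.TROUBLE, "recovery_timeout"): AppState.NOT_CONNECTED,
}


def next_state(state: AppState, event: str) -> AppState:
    """Apply one event; an administrator can disable the connection at any time."""
    if event == "admin_disable":
        return AppState.DISABLED
    # Events not covered by the text leave the state unchanged.
    return TRANSITIONS.get((state, event), state)


def run(events, start: AppState = AppState.NOT_CONNECTED) -> AppState:
    """Fold a sequence of events over the transition table."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

For instance, `run(["network_request", "connected", "connection_problem", "recovery_timeout"])` ends in `AppState.NOT_CONNECTED`, which matches the case where recovery does not complete within the Time to wait for recovery timeout.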
Data Server Connection States
The following sections describe the data server side of each of the connection states:
When an ECP server instance starts up, all incoming ECP connections are in an initial “unassigned” Free state and are available for connections from any application server that is listed in the connection access control list. If a connection from an application server previously existed and has since gone away, but does not require any recovery steps, the connection is placed in the “idle” Free state. The only difference between these two states is that in the idle state, this connection block is already assigned to a particular application server, rather than being available for any application server that passes the access control list.
In the data server Normal state, the application server connection is normal. At any point in the processing of incoming connections, whenever the application server disconnects from the data server (except as part of the data server’s own shutdown sequence), the data server rolls back any pending transactions and releases any incoming locks from that application server, and places the application server connection in the “idle” Free state.
If the application server is not responding, the data server shows a Trouble state. If the data server crashes or shuts down, it remembers the connections that were active at the time of the crash or shutdown. After restarting, the data server waits for a brief time (usually 30 seconds) for application servers to reclaim their sessions (locks and open transactions). If an application server does not complete recovery during this recovery interval, all pending work on that connection is rolled back and the connection is placed in the “idle” Free state.
The data server connection is in a recovery state for a very short time while the application server is reclaiming its session. The data server keeps the application server connection in the Trouble state for the Time interval for Troubled state timeout (default is 60 seconds) so that the application server can reclaim the connection; if it does not, the data server releases the application server's resources (rolls back all open transactions and releases locks) and then sets the state to Free.
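The data server's handling of an unresponsive application server can be sketched in the same spirit. The function and field names below are hypothetical. The sketch also assumes that a successful reclaim returns the connection to the Normal state, which the text implies but does not state outright.

```python
from typing import Dict, Optional, Union

TROUBLE_TIMEOUT_SECONDS = 60.0  # default "Time interval for Troubled state" (from the text)


def data_server_outcome(
    reclaimed_after: Optional[float],
    timeout: float = TROUBLE_TIMEOUT_SECONDS,
) -> Dict[str, Union[str, bool]]:
    """Describe the data server side once the Trouble state is resolved.

    reclaimed_after is the number of seconds until the application server
    reclaims its connection, or None if it never does.
    """
    if reclaimed_after is not None and reclaimed_after <= timeout:
        # Assumption: a successful reclaim puts the connection back in Normal.
        return {"state": "Normal", "rolled_back": False, "locks_released": False}
    # Timeout expired: open transactions are rolled back and locks are released.
    return {"state": "Free", "rolled_back": True, "locks_released": True}
```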