Case Study: F5 Load Balancer and TCP Idle Timer / fastL4 Profile
This case study describes a problem in which a client connects to a server and then waits for a report to complete before retrieving it. The report took longer than 5 minutes to complete, and the TCP session remained idle whilst the client waited. Eventually the TCP connection was dropped.
Packet traces were taken at the client, the server and intermediate points, including an F5 load balancer that was simply acting as a router. Analysis of the traces revealed some interesting things.
The TCP 3-way handshake completed and set up the TCP session. The client then sent an HTTP GET request (TCP segment length 734 bytes) to submit the data, which was received by a client-side firewall. The firewall forwarded it onwards towards the server, in the direction of an F5 load balancer …
BUT: the HTTP GET didn’t seem to arrive at the F5. The server-side firewall, however, DID receive the GET and forwarded it on to the application server, which then sent an ACK back to the client, and that ACK DID go via the F5. Huh?
It was initially thought that the F5 saw an ACK for a TCP segment that it hadn’t seen, so it sent a RST packet in both directions to tear down the TCP session. This was confusing, because the TCP session went through the F5 yet the HTTP GET request seemingly bypassed it while still arriving at the server. Cue a bit of head-scratching and furrowed brows, because the theory made no sense: it didn’t explain the delay before the reset, and there clearly wasn’t any asymmetry because the ACK came back via the same path. So why the reset?
Further investigation (i.e. Googling the F5 Knowledge Base) revealed that a fastL4 profile might explain the absence of the HTTP GET request, because tcpdump on the F5 sometimes doesn’t catch all of the packets (see SOL1433 and, when PVA chips are used, SOL6546).
BUT the article also revealed that the fastL4 profile has a tuneable TCP idle timer, 5 minutes by default, after which it sends a RST in both directions. This is exactly what was happening.
Furthermore, the packet capture showed that the RST packet from the F5 came after 5 minutes (300 seconds) of inactivity.
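To show how that 300-second gap can be spotted without eyeballing the trace, here is a minimal sketch. It assumes the per-packet timestamps of one TCP session have already been exported from the capture as seconds relative to the first packet; the function name and the sample timestamps are made up for illustration and are not taken from the real trace.

```python
from typing import List, Tuple


def find_idle_gaps(timestamps: List[float],
                   threshold_s: float = 300.0) -> List[Tuple[float, float, float]]:
    """Return (previous_packet, next_packet, gap) for every gap >= threshold_s."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each packet with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap >= threshold_s:
            gaps.append((earlier, later, gap))
    return gaps


# Hypothetical session: handshake and GET within the first second,
# then silence until a RST turns up 300 seconds after the last packet.
packets = [0.0, 0.25, 0.5, 1.0, 301.0]
print(find_idle_gaps(packets))  # [(1.0, 301.0, 300.0)]
```

A gap that lands on the configured idle timeout, followed immediately by a RST, is a strong hint that a middlebox timer is involved rather than either end-point.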
ABOUT F5 LTM PROFILES:
Forwarding virtual servers allow traffic to connect through the F5 LTM to specific destinations. They have an attribute called a fastL4 profile that defines the settings for layer 2-4 traffic:
- Connection Idle Timeout of 300 seconds – If an established session does not send a packet within this time, the session is timed out on the LTM.
- Reset on Timeout – When a session times out, TCP resets are sent to client and server to terminate the connection.
- Loose Initiation disabled by default – With this setting disabled, a TCP session can only be established by a proper TCP handshake, with the initial packet having the SYN flag set.
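The way these three settings interact can be sketched as a toy model. This is not an F5 API or configuration syntax; the class and method names are invented, and the exact boundary behaviour (expiring at exactly 300 seconds) is an assumption made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class FastL4Model:
    """Toy model of the three fastL4 settings described above (hypothetical)."""
    idle_timeout_s: int = 300
    reset_on_timeout: bool = True
    loose_initiation: bool = False

    def accepts_first_packet(self, syn_flag_set: bool) -> bool:
        # With Loose Initiation disabled, only a SYN may create a new session.
        return syn_flag_set or self.loose_initiation

    def on_idle(self, idle_s: float) -> str:
        # Decide what happens to a session that has been silent for idle_s seconds.
        if idle_s < self.idle_timeout_s:
            return "keep"
        if self.reset_on_timeout:
            return "expire and send RST to both sides"
        return "expire silently"


defaults = FastL4Model()
print(defaults.on_idle(120))                # keep
print(defaults.on_idle(300))                # expire and send RST to both sides
print(defaults.accepts_first_packet(False)) # False
```

With the defaults, a report that takes longer than 5 minutes with no traffic in either direction ends in a reset, which matches what the capture showed.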
Our F5 had a fastL4 profile on the Common partition, and the default values were set.
IN MY OPINION it would certainly be worth modifying this to disable Reset on Timeout and to enable Loose Initiation. However, sometimes this change might not be approved (which happened in my case), so the solution was to increase the idle timeout in the fastL4 profile to a value above that of the end-points. Rather than guess the value or rely on Googling the defaults, it is always good to check the actual settings in case they have been modified:
How to determine the TCP Socket timeout:
[dstilogon1@ukcmutacd ~]$ sudo cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
[dstilogon1@ukcmutacd ~]$
7200 seconds = 2 hours
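Once the end-point value is known, the comparison against the load balancer's idle timeout is simple arithmetic. The sketch below reads a setting from /proc and checks whether a middlebox timer would expire before the first keep-alive probe is sent. Both function names are made up for this example, and the check assumes the application actually enables keep-alives on its sockets.

```python
from pathlib import Path


def read_tcp_setting(name: str, base: str = "/proc/sys/net/ipv4") -> int:
    """Read one integer TCP setting, for example tcp_keepalive_time."""
    return int(Path(base, name).read_text().strip())


def middlebox_outlives_keepalive(middlebox_idle_s: int, keepalive_time_s: int) -> bool:
    """True if the first keep-alive probe is sent before the middlebox timer expires."""
    return middlebox_idle_s > keepalive_time_s


# Values from this case study: 300 s on the F5, 7200 s on the Linux server.
print(middlebox_outlives_keepalive(300, 7200))   # False: the F5 resets first
print(middlebox_outlives_keepalive(7500, 7200))  # True: a probe arrives in time
```

On a Linux host, read_tcp_setting("tcp_keepalive_time") returns the live value, so the check can use what is actually configured rather than the documented default.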
How to determine the socket connection up time on Linux:
1. Get the PID of the process that owns the socket with netstat.
[dstilogon1@ukcmutacd ~]$ sudo netstat -plan | grep 129.0.52.74
tcp 0 52 172.23.185.41:22 129.0.52.74:24937 ESTABLISHED 27014/sshd
[dstilogon1@ukcmutacd ~]$
2. Check process details with ps.
[dstilogon1@ukcmutacd ~]$ sudo ps -eo uid,pid,etime | grep 27014
0 27014 13:26
[dstilogon1@ukcmutacd ~]$
The above values are UID PID ELAPSED
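The ELAPSED column uses the ps etime format, [[DD-]hh:]mm:ss, which is awkward to compare with a timeout expressed in seconds. A small helper (the name is hypothetical) can do the conversion. Bear in mind that it reports how long the process has been running, which only approximates the socket's age when the process was started for that connection, as with a per-session sshd.

```python
def etime_to_seconds(etime: str) -> int:
    """Convert a ps etime string ([[DD-]hh:]mm:ss) to whole seconds."""
    days = 0
    if "-" in etime:
        # An optional day count is separated from the time by a hyphen.
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(etime_to_seconds("13:26"))  # 806
```

So the sshd process above had been up for 806 seconds, comfortably past a 300-second idle timer, which is fine as long as the session was not silent for that long.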
Determining The Number of TCP Connections for Each IP Address:
[dstilogon1@ukcmutacd ~]$ netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n
1 129.0.52.74
1 23.61.255.225
1 Address
1 servers)
[dstilogon1@ukcmutacd ~]$
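The "Address" and "servers)" rows in that output are the fifth words of the two netstat header lines, which the pipeline counts along with the real connections. As a sketch of the same count with the headers skipped, the function below parses netstat -ntu text. The function name is invented, and in the sample the second connection's local and remote ports are made up.

```python
from collections import Counter


def count_remote_hosts(netstat_output: str) -> Counter:
    """Count connections per remote address, ignoring netstat header lines."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with the protocol name; header rows do not.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        # Drop the port, which follows the last colon of the foreign address.
        host = fields[4].rsplit(":", 1)[0]
        counts[host] += 1
    return counts


sample = """Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0     52 172.23.185.41:22        129.0.52.74:24937       ESTABLISHED
tcp        0      0 172.23.185.41:45678     23.61.255.225:80        ESTABLISHED
"""
for host, n in sorted(count_remote_hosts(sample).items()):
    print(n, host)
```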
To verify the TCP Socket (per-session) Idle Timer on the end-points (Windows client and Linux server):
TCP settings can be found in /proc/sys/net/ipv4. Here are some other tuneable values:
tcp_keepalive_probes : Number of unacknowledged KEEPALIVE probes sent before the connection is considered dead and reset.
tcp_keepalive_time : Idle time in seconds before the first KEEPALIVE probe is sent. The default is 7200 (2 hours).
tcp_syn_retries : Number of times the initial SYN is retransmitted when establishing a TCP connection (outbound connections).
tcp_retries1 : Number of retransmissions on an established connection before TCP reports a suspected problem to the network layer.
tcp_fin_timeout : Number of seconds an orphaned connection stays in FIN_WAIT_2 waiting for the final FIN before the socket is closed. (DDoS protection)
You can change the values by writing to the files in /proc/sys/net/ipv4 or by using sysctl.
To make it permanent add it to /etc/sysctl.conf.
The configuration might not contain these:
[dstilogon1@ukcmutacd ~]$ sudo cat /etc/sysctl.conf | grep net.ipv4
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_syncookies = 1
[dstilogon1@ukcmutacd ~]$
So you can add them. THE DEFAULT VALUES ARE:
# vi /etc/sysctl.conf
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_syn_retries = 5
#
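To check which of these keys a given sysctl.conf already sets, a small parser is enough. This is a sketch only: the function names are hypothetical, and it handles just the simple "key = value" lines and comment lines seen in the files above.

```python
from typing import Dict, List

WANTED = [
    "net.ipv4.tcp_fin_timeout",
    "net.ipv4.tcp_retries1",
    "net.ipv4.tcp_keepalive_probes",
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_syn_retries",
]


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and comment lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_keys(text: str, wanted: List[str] = WANTED) -> List[str]:
    """Return the wanted keys that the configuration text does not set."""
    present = parse_sysctl_conf(text)
    return [key for key in wanted if key not in present]


current = """net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_syncookies = 1
"""
print(missing_keys(current))  # all five keys, since none of them is set yet
```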
If you need to alter the timeouts on TCP sockets, write a new value to /proc/sys/net/ipv4/tcp_keepalive_time.
This is the number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours).
For example, to set the value to 2400 seconds:
echo 2400 > /proc/sys/net/ipv4/tcp_keepalive_time
You can make changes to the /proc filesystem permanent using /etc/sysctl.conf.
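Because the system-wide value only affects sockets that have SO_KEEPALIVE enabled, an application that owns the connection can also set keep-alive per socket. The following is a minimal sketch: the function name and the 240/30/5 values are illustrative choices, picked so that the first probe goes out before a 300-second idle timer would expire. The TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are not available on every platform, so the code checks for them first.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 240,
                     interval_s: int = 30, probes: int = 5) -> None:
    """Turn on keep-alive and, where supported, override the timers per socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of tcp_keepalive_time, _intvl and _probes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
sock.close()
```

This only helps when you control the application code; in this case study the fix was made on the F5 instead.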
HOW TO DETERMINE TCP IDLE TIMER ON WINDOWS:
Microsoft Windows TCP Idle Timer:
KeepAliveTime
Key: Tcpip\Parameters
Value Type: REG_DWORD—time in milliseconds
Valid Range: 1–0xFFFFFFFF
Default: 7,200,000 (two hours)
Description: The parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission. Keep-alive packets are not sent by default. This feature may be enabled on a connection by an application.
All of the TCP/IP parameters are registry values located under the registry key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
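On the Windows client the value can be checked without opening regedit. The sketch below uses Python's standard winreg module and treats a missing KeepAliveTime value as "the documented default applies". The function names are made up, and on anything other than Windows the read simply returns None.

```python
import sys
from typing import Optional

DEFAULT_KEEPALIVE_MS = 7_200_000  # two hours, the documented default
PARAMETERS_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def read_keepalive_time_ms() -> Optional[int]:
    """Return KeepAliveTime from the registry, or None if it is not set."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PARAMETERS_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "KeepAliveTime")
            return int(value)
    except FileNotFoundError:
        return None  # value absent, so the default is in effect


def effective_keepalive_seconds(configured_ms: Optional[int]) -> float:
    """Convert the configured value (or the default) from milliseconds to seconds."""
    ms = DEFAULT_KEEPALIVE_MS if configured_ms is None else configured_ms
    return ms / 1000


print(effective_keepalive_seconds(read_keepalive_time_ms()))
```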
Adapter-specific values are listed under subkeys for each adapter identified by the adapter’s globally unique identifier (GUID).
To determine the GUID value for an adapter corresponding to a LAN connection in the Network Connections folder, do the following:
1. Open the Network Connections folder and note the name of the LAN connection, such as “Local Area Connection.”
2. Click Start, click Run, type regedit.exe, and then click OK.
3. Use the tree view (the left pane) of the Registry Editor tool to open the following key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\{4D36E972-E325-11CE-BFC1-08002BE10318}
4. Under this key are one or more keys for the globally unique identifiers (GUIDs) corresponding to the installed LAN connections. Each of these GUID keys has a Connection subkey. Open each of the GUID\Connection keys and look for the Name setting in the contents pane whose value matches the name of your LAN connection from step 1.
5. When you have found the GUID\Connection key that contains the Name setting that matches the name of your LAN connection, write down or otherwise note the GUID value.
Depending on whether the system or adapter is DHCP-configured or static override values are specified, parameters may have both DHCP and statically configured values. If any of these parameters are changed using the registry editor, a restart of the system is generally required for the change to take effect. A restart is usually not required if values are changed using the Network Connections folder.