{"id":2146,"date":"2015-03-06T10:42:55","date_gmt":"2015-03-06T10:42:55","guid":{"rendered":"http:\/\/mccltd.net\/blog\/?p=2146"},"modified":"2015-07-30T19:39:02","modified_gmt":"2015-07-30T18:39:02","slug":"case-study-f5-load-balancer-and-tcp-idle-timer-fastl4-profile","status":"publish","type":"post","link":"http:\/\/darenmatthews.com\/blog\/?p=2146","title":{"rendered":"Case Study: F5 Load Balancer and TCP Idle Timer \/ fastL4 Profile"},"content":{"rendered":"<p><strong>This describes a problem<\/strong> whereby a client connects to a server then waits for a report to complete before retrieving it.\u00a0 The report took longer than 5 minutes to complete and the TCP session remained idle whilst the client waited.\u00a0 After a while the TCP connection dropped.<\/p>\n<p>Packet traces were taken at the client, the server and intermediate points, including an F5 load balancer that simply acted as a router. The analysis of the packet traces revealed some interesting things.<\/p>\n<p>The TCP 3-way handshake completed and set up the TCP session.\u00a0 The client then sent an HTTP GET request (TCP segment length 734 bytes) to submit the data, which was received by a client-side firewall.\u00a0 The firewall forwarded it towards the server, in the direction of an F5 load balancer &#8230;<\/p>\n<p>BUT: The HTTP GET <strong>didn\u2019t <span style=\"text-decoration: underline;\">seem<\/span> to arrive at the F5<\/strong>.\u00a0 The server-side firewall, however, DID receive the GET and forwarded it on to the application server, which then sent an ACK back to the client &#8211; which <strong>DID go via the F5<\/strong>. Huh?<\/p>\n<p>It was initially thought that the F5 had seen an ACK for a TCP segment that it hadn\u2019t seen, so it sent a RST in both directions to tear down the TCP session. 
This was confusing, because the TCP session went through the F5, yet the HTTP GET request <em>seemingly<\/em> bypassed the F5 but <em>did<\/em> arrive at the server. After a bit of\u00a0head-scratching and furrowed brows, the theory still made no sense: it did not explain the delay before the reset, and there clearly wasn&#8217;t any asymmetry anyway, because the ACK came back via the same path. So why the reset? <!--more--><\/p>\n<p>Further investigation (i.e. Googling the F5 Knowledge Base) revealed that a fastL4 profile might explain the absence of the HTTP GET request, because tcpdump on the F5 sometimes doesn&#8217;t catch all of the packets (<a title=\"SOL14335\" href=\"https:\/\/support.f5.com\/kb\/en-us\/solutions\/public\/14000\/300\/sol14335\" target=\"_blank\">see SOL14335<\/a>, and, for when PVA chips are used, <a title=\"SOL6546\" href=\"https:\/\/support.f5.com\/kb\/en-us\/solutions\/public\/6000\/500\/sol6546.html\" target=\"_blank\">see SOL6546<\/a>).<\/p>\n<p>BUT the article also revealed that the fastL4 profile has a tuneable TCP idle timer of 5 minutes, after which it sends a RST in both directions.\u00a0 This is exactly what was happening:<\/p>\n<p><a href=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-Reset-diagram.jpg\"><img loading=\"lazy\" class=\"aligncenter size-full wp-image-2147\" src=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-Reset-diagram.jpg\" alt=\"F5-Reset-diagram\" width=\"1083\" height=\"545\" srcset=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-Reset-diagram.jpg 1083w, http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-Reset-diagram-300x150.jpg 300w, http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-Reset-diagram-1024x515.jpg 1024w\" sizes=\"(max-width: 1083px) 100vw, 1083px\" \/><\/a><\/p>\n<p>Furthermore, the packet capture showed that the RST packet from the F5 was sent after 5 minutes (300 seconds) of inactivity:<\/p>\n<p><a 
href=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/wshark.jpg\"><img loading=\"lazy\" class=\"aligncenter size-full wp-image-2148\" src=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/wshark.jpg\" alt=\"wshark\" width=\"699\" height=\"298\" srcset=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/wshark.jpg 699w, http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/wshark-300x127.jpg 300w\" sizes=\"(max-width: 699px) 100vw, 699px\" \/><\/a><\/p>\n<p><strong>ABOUT F5 LTM PROFILES:<\/strong><\/p>\n<p>Forwarding virtual servers allow traffic to connect through the F5 LTM to specific destinations. They have an attribute called a fastL4 profile that defines the settings for layer 2-4 traffic:<\/p>\n<ul>\n<li><strong>Connection Idle Timeout of 300 seconds<\/strong> \u2013 If an established session does not send a packet within this time, the session is timed out on the LTM.<\/li>\n<li><strong>Reset on Timeout<\/strong> \u2013 When a session times out, TCP resets are sent to the client and the server to terminate the connection.<\/li>\n<li><strong>Loose Initiation<\/strong> disabled by default \u2013 With this setting disabled, a TCP session can only be established by a proper TCP handshake, with the initial packet having the SYN flag set.<\/li>\n<\/ul>\n<p>Our F5 had a fastL4 profile on the Common partition and the default values were set:<\/p>\n<p><a href=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-fastL4.jpg\"><img loading=\"lazy\" class=\"aligncenter size-full wp-image-2149\" src=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-fastL4.jpg\" alt=\"F5-fastL4\" width=\"768\" height=\"953\" srcset=\"http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-fastL4.jpg 768w, http:\/\/darenmatthews.com\/blog\/wp-content\/uploads\/2015\/03\/F5-fastL4-241x300.jpg 241w\" sizes=\"(max-width: 768px) 100vw, 768px\" 
\/><\/a><\/p>\n<p><strong>IN MY OPINION<\/strong> it would certainly be worth modifying this to <em><span style=\"text-decoration: underline;\"><strong>disable Reset on Timeout and to enable Loose Initiation<\/strong><\/span><\/em>. However, sometimes this change might not be approved (which happened in my case), so the solution was to increase the idle timeout in the fastL4 profile to a value above that of the end-points.\u00a0 Rather than guess the value or rely on Googling the defaults, it is always good to check the actual settings in case they have been modified:<\/p>\n<p><strong>How to determine the TCP Socket timeout:<\/strong><\/p>\n<p>[dstilogon1@ukcmutacd ~]$ sudo cat \/proc\/sys\/net\/ipv4\/tcp_keepalive_time<br \/>\n7200<br \/>\n[dstilogon1@ukcmutacd ~]$<\/p>\n<p>7200 seconds = 2 hours<\/p>\n<p><strong>How to determine the socket connection up time on Linux:<\/strong><\/p>\n<p>1. Get the PID of the process that owns the socket with netstat.<br \/>\n[dstilogon1@ukcmutacd ~]$ sudo netstat -plan | grep 129.0.52.74<br \/>\ntcp\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0\u00a0\u00a0\u00a0\u00a0 52 172.23.185.41:22\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 129.0.52.74:24937\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ESTABLISHED 27014\/sshd<br \/>\n[dstilogon1@ukcmutacd ~]$<\/p>\n<p>2. 
Check process details with ps.<br \/>\n[dstilogon1@ukcmutacd ~]$ sudo ps -eo uid,pid,etime | grep 27014<br \/>\n0 27014\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 13:26<br \/>\n[dstilogon1@ukcmutacd ~]$<\/p>\n<p>The columns above are UID, PID and ELAPSED (time since the process started, here 13 minutes 26 seconds).<\/p>\n<p><strong>Determining The Number of TCP Connections for Each IP Address:<\/strong><\/p>\n<p>[dstilogon1@ukcmutacd ~]$ netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n<br \/>\n1 129.0.52.74<br \/>\n1 23.61.255.225<br \/>\n1 Address<br \/>\n1 servers)<br \/>\n[dstilogon1@ukcmutacd ~]$<\/p>\n<p><strong>To verify the TCP Socket (per-session) Idle Timer on the end-points (Windows client and Linux server):<\/strong><\/p>\n<p>On Linux, TCP settings can be found under \/proc\/sys\/net\/ipv4. Here are some other tuneable values:<\/p>\n<p>tcp_keepalive_probes : Number of unacknowledged KEEPALIVE probes sent before the connection is considered dead.<br \/>\ntcp_keepalive_time : Number of seconds a connection must be idle before the first KEEPALIVE probe is sent. The default is 7200 (2 hours).<br \/>\ntcp_syn_retries : Number of times the initial SYN is retransmitted for a TCP connection establishment (outbound connections).<br \/>\ntcp_retries1 : Number of retransmissions on an established connection before TCP reports a suspected problem to the network layer.<br \/>\ntcp_fin_timeout : Number of seconds an orphaned connection waits in the FIN-WAIT-2 state for the final FIN before the socket is closed. (DDoS protection)<\/p>\n<p>You can change the values by updating the files in \/proc\/sys\/net\/ipv4 or by using sysctl.<br \/>\nTo make a change permanent, add it to \/etc\/sysctl.conf.<\/p>\n<p>The configuration file might not contain these settings:<br \/>\n[dstilogon1@ukcmutacd ~]$ sudo cat \/etc\/sysctl.conf\u00a0 | grep net.ipv4<br \/>\nnet.ipv4.ip_forward = 0<br \/>\nnet.ipv4.conf.default.rp_filter = 1<br \/>\nnet.ipv4.conf.default.accept_source_route = 0<br \/>\nnet.ipv4.tcp_syncookies = 1<br \/>\n[dstilogon1@ukcmutacd ~]$<\/p>\n<p>So you can add them. 
The default values are:<\/p>\n<p># vi \/etc\/sysctl.conf<br \/>\nnet.ipv4.tcp_fin_timeout = 60<br \/>\nnet.ipv4.tcp_retries1 = 3<br \/>\nnet.ipv4.tcp_keepalive_probes = 9<br \/>\nnet.ipv4.tcp_keepalive_time = 7200<br \/>\nnet.ipv4.tcp_syn_retries = 5<br \/>\n#<\/p>\n<p>If you need to alter the timeouts on TCP sockets, modify \/proc\/sys\/net\/ipv4\/tcp_keepalive_time to set a new value.<\/p>\n<p>This value is the number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are only sent when the SO_KEEPALIVE socket option is enabled. The default value is 7200 seconds (2 hours).<\/p>\n<p>For example, to set the value to 2400 seconds:<br \/>\necho 2400 &gt; \/proc\/sys\/net\/ipv4\/tcp_keepalive_time<\/p>\n<p>You can make changes to the \/proc filesystem permanent using \/etc\/sysctl.conf.<\/p>\n<p><strong>HOW TO DETERMINE TCP IDLE TIMER ON WINDOWS:<\/strong><\/p>\n<p>Microsoft Windows TCP Idle Timer:<\/p>\n<p>KeepAliveTime<\/p>\n<p>Key: Tcpip\\Parameters<\/p>\n<p>Value Type: REG_DWORD (time in milliseconds)<\/p>\n<p>Valid Range: 1\u20130xFFFFFFFF<\/p>\n<p>Default: 7,200,000 (two hours)<\/p>\n<p>Description: The parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission. Keep-alive packets are not sent by default. 
This feature may be enabled on a connection by an application.<\/p>\n<p>All of the TCP\/IP parameters are registry values located under the registry key:<\/p>\n<p>HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters<\/p>\n<p>Adapter-specific values are listed under subkeys for each adapter identified by the adapter&#8217;s globally unique identifier (GUID).<\/p>\n<p>To determine the GUID value for an adapter corresponding to a LAN connection in the Network Connections folder, do the following:<\/p>\n<ol>\n<li>Open the Network Connections folder and note the name of the LAN connection, such as &#8220;Local Area Connection.&#8221;<\/li>\n<li>Click Start, click Run, type regedit.exe, and then click OK.<\/li>\n<li>Use the tree view (the left pane) of the Registry Editor tool to open the following key: HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\Network\\{4D36E972-E325-11CE-BFC1-08002BE10318}<\/li>\n<li>Under this key are one or more keys for the globally unique identifiers (GUIDs) corresponding to the installed LAN connections. Each of these GUID keys has a Connection subkey. Open each of the GUID\\Connection keys and look for the Name setting in the contents pane whose value matches the name of your LAN connection from step 1.<\/li>\n<li>When you have found the GUID\\Connection key that contains the Name setting that matches the name of your LAN connection, write down or otherwise note the GUID value.<\/li>\n<\/ol>\n<p>Depending on whether the system or adapter is DHCP-configured or static override values are specified, parameters may have both DHCP and statically configured values. If any of these parameters are changed using the registry editor, a restart of the system is generally required for the change to take effect. 
A restart is usually not required if values are changed using the Network Connections folder.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This describes a problem whereby a client connects to a server then waits for a report to complete before retrieving it.\u00a0 The report took longer than 5 minutes to complete and the TCP session remained idle whilst the client waited.\u00a0 After a while the TCP connection dropped. Packet traces were taken at the client, server [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[84],"tags":[80],"_links":{"self":[{"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2146"}],"collection":[{"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2146"}],"version-history":[{"count":13,"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2146\/revisions"}],"predecessor-version":[{"id":2211,"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/2146\/revisions\/2211"}],"wp:attachment":[{"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2146"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/darenmatthews.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}