Parameter Description

Reference

net.ipv4.tcp_congestion_control

See BBR.

net.ipv4.tcp_rmem

Contains three values that represent the minimum, default and maximum size of the TCP socket receive buffer.

The minimum represents the smallest receive buffer size guaranteed, even under memory pressure. The minimum value defaults to 1 page or 4096 bytes.

The default value represents the initial size of a TCP sockets receive buffer. This value supersedes net.core.rmem_default used by other protocols. The default value for this setting is 87380 bytes. It also sets the tcp_adv_win_scale and initializes the TCP window size to 65535 bytes.

The maximum represents the largest receive buffer size automatically selected for TCP sockets. This value does not override net.core.rmem_max. The default value for this setting is somewhere between 87380 bytes and 6M bytes based on the amount of memory in the system.

The recommendation is to use the maximum value of 16M bytes or higher (kernel level dependent) especially for 10 Gigabit adapters.

net.ipv4.tcp_wmem

Similar to the net.ipv4.tcp_rmem this parameter consists of 3 values, a minimum, default, and maximum.

The minimum represents the smallest receive buffer size a newly created socket is entitled to as part of its creation. The minimum value defaults to 1 page or 4096 bytes.

The default value represents the initial size of a TCP sockets receive buffer. This value supersedes net.core.rmem_default used by other protocols. It is typically set lower than net.core.wmem_default. The default value for this setting is 16K bytes.

The maximum represents the largest receive buffer size for auto-tuned send buffers for TCP sockets. This value does not override net.core.rmem_max. The default value for this setting is somewhere between 64K bytes and 4M bytes based on the amount of memory available in the system.

The recommendation is to use the maximum value of 16M bytes or higher (kernel level dependent) especially for 10 Gigabit adapters.

net.ipv4.tcp_max_tw_buckets

Specifies the maximum number of sockets in the “time-wait” state allowed to exist at any time. If the maximum value is exceeded, sockets in the “time-wait” state are immediately destroyed and a warning is displayed. This setting exists to thwart certain types of “Denial of Service” attacks. Care should be exercised before lowering this value. When changed, its value should be increased, especially when more memory has been added to the system or when the network demands are high and environment is less exposed to external threats.

The default value is 262,144. When network demands are high and the environment is less exposed to external threats the value can be increased to 450,000.

net.ipv4.tcp_fin_timeout

This parameter determines the length of time an orphaned (unreferenced) connection will wait before it is aborted at the local end. This parameter is especially helpful for when something happens to the remote peer which prevents or excessively delays a response. Since each socket used for connections consumes approximately 1.5K bytes of memory, the kernel must pro-actively abort and purge dead or stale resources.

The default value for this parameter is typically 60 (seconds).

[root@kvmhost ~] # sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_fin_timeout = 60

For workloads or systems that generate or support high levels of network traffic, it can be advantageous to more aggressively reclaim dead or stale resources. For these configurations, it is recommended to reduce this value to below 10 (seconds).

Recommendations

Cloudflare (High BDP HTTP requests)

For high bandwidth-delay product sessions, the maximum amount of data on the network at any time (equiv. BDP) is large.

Therefore, a large TCP receive window must be used, which is prone to introduce latency spikes.

The goal is to open the throughput floodgates for high BDP connections while simultaneously ensuring very low HTTP request latency, and Cloudflare achieved it with kernel patching and the following parameters.

net.ipv4.tcp_rmem = 8192 262144 536870912
net.ipv4.tcp_wmem = 4096 16384 536870912
net.ipv4.tcp_adv_win_scale = -2
net.ipv4.tcp_collapse_max_bytes = 6291456
net.ipv4.tcp_notsent_lowat = 131072

Reference: https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency