The network subsystem is complex, and its tuning depends heavily on the system's use case and on external factors such as software clients or hardware components (switches, routers, or gateways) in your network. The Linux kernel aims more at reliability and low latency than at low overhead and high throughput. Settings other than the defaults can mean less security but better performance.
Most modern networking is based on the TCP/IP protocol and a socket interface for communication; for more information about TCP/IP, see Book “Reference”, Chapter 13 “Basic networking”. The Linux kernel handles the data it receives or sends via the socket interface in socket buffers. These kernel socket buffers are tunable.
Since kernel version 2.6.17, full autotuning with a 4 MB maximum buffer size is available. This means that manual tuning usually does not improve networking performance considerably. It is often best not to touch the following variables or, at least, to check the outcome of tuning efforts carefully.
If you update from an older kernel, it is recommended to remove manual TCP tunings in favor of the autotuning feature.
The special files in the /proc file system can modify the size and behavior of kernel socket buffers; for general information about the /proc file system, see Section 2.6, “The /proc file system”. Find networking related files in:
/proc/sys/net/core
/proc/sys/net/ipv4
/proc/sys/net/ipv6
General net variables are explained in the kernel documentation (linux/Documentation/sysctl/net.txt). Special ipv4 variables are explained in linux/Documentation/networking/ip-sysctl.txt and linux/Documentation/networking/ipvs-sysctl.txt.
In the /proc file system, for example, you can set the Maximum Socket Receive Buffer and the Maximum Socket Send Buffer either for all protocols (in core) or for the TCP protocol only (in ipv4), thus overriding the setting for all protocols.
/proc/sys/net/ipv4/tcp_moderate_rcvbuf
If /proc/sys/net/ipv4/tcp_moderate_rcvbuf is set to 1, autotuning is active and the buffer size is adjusted dynamically.
/proc/sys/net/ipv4/tcp_rmem
Three values that set the minimum, initial (default), and maximum size of the Memory Receive Buffer per connection. They define the actual memory usage, not only the TCP window size.
/proc/sys/net/ipv4/tcp_wmem
The same as tcp_rmem, but for the Memory Send Buffer per connection.
/proc/sys/net/core/rmem_max
Set to limit the maximum receive buffer size that applications can request.
/proc/sys/net/core/wmem_max
Set to limit the maximum send buffer size that applications can request.
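Before changing any of these variables, it helps to record their current values. The following is a minimal, read-only sketch in POSIX shell that prints the five settings above. The function name show_buffer_settings and the PROC_NET variable are hypothetical; PROC_NET only exists so the sketch can be pointed at a copy of the directory tree and defaults to the real /proc/sys/net.

```shell
# Read-only sketch: print the socket buffer settings described above.
# PROC_NET is a hypothetical override; by default the real /proc tree is used.
PROC_NET=${PROC_NET:-/proc/sys/net}

show_buffer_settings() {
    for f in ipv4/tcp_moderate_rcvbuf ipv4/tcp_rmem ipv4/tcp_wmem \
             core/rmem_max core/wmem_max; do
        # Skip files that are missing or unreadable (for example, in a container).
        [ -r "$PROC_NET/$f" ] || continue
        # tcp_rmem and tcp_wmem hold three values: min, default, and max (bytes).
        printf '%-24s %s\n' "$f" "$(cat "$PROC_NET/$f")"
    done
}

# Usage: show_buffer_settings
```

Because the sketch only reads files, it is safe to run as a normal user.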
Via /proc, it is possible to disable TCP features that you do not need (all TCP features are switched on by default). For example, check the following files:
/proc/sys/net/ipv4/tcp_timestamps
TCP time stamps are defined in RFC1323.
/proc/sys/net/ipv4/tcp_window_scaling
TCP window scaling is also defined in RFC1323.
/proc/sys/net/ipv4/tcp_sack
TCP selective acknowledgments (SACKs) are defined in RFC 2018.
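To see which of these features are currently active, a small read-only sketch can translate the file contents into on/off. It assumes, as the kernel's ip-sysctl documentation describes, that 0 disables a feature and any non-zero value enables it (tcp_timestamps accepts more than one non-zero mode on newer kernels). The function name show_tcp_features and the PROC_NET override are hypothetical.

```shell
# Read-only sketch: report whether the TCP features listed above are enabled.
# PROC_NET is a hypothetical override; by default the real /proc tree is used.
PROC_NET=${PROC_NET:-/proc/sys/net}

show_tcp_features() {
    for name in tcp_timestamps tcp_window_scaling tcp_sack; do
        f="$PROC_NET/ipv4/$name"
        [ -r "$f" ] || continue
        value=$(cat "$f")
        # 0 disables the feature; any other value enables it.
        if [ "$value" = 0 ]; then
            echo "$name: off"
        else
            echo "$name: on ($value)"
        fi
    done
}

# Usage: show_tcp_features
```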
Use sysctl to read or write variables of the /proc file system. sysctl is preferable to cat (for reading) and echo (for writing), because it can also load settings from /etc/sysctl.conf (sysctl -p), so those settings survive reboots reliably. With sysctl, you can easily read all variables and their values; as root, use the following command to list TCP-related settings:
> sudo sysctl -a | grep tcp
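A value written with sysctl -w or echo is lost at the next reboot unless it is also stored in /etc/sysctl.conf. The following minimal sketch records one key in that file and replaces any earlier entry for the same key, so repeated runs do not pile up duplicates. The function name persist_sysctl and the SYSCTL_CONF override are hypothetical; the file name /etc/sysctl.conf and sysctl -p are the ones mentioned above.

```shell
# Sketch: store "key = value" in sysctl.conf so the setting survives reboots.
# SYSCTL_CONF is a hypothetical override; it defaults to /etc/sysctl.conf.
SYSCTL_CONF=${SYSCTL_CONF:-/etc/sysctl.conf}

persist_sysctl() {
    key=$1
    value=$2
    if [ -f "$SYSCTL_CONF" ]; then
        # Keep every line except an existing assignment to the same key
        # (whitespace around the key and the "=" is ignored).
        awk -v k="$key" '{ f = $0; sub(/=.*/, "", f); gsub(/[[:space:]]/, "", f) }
                         f != k' "$SYSCTL_CONF" > "$SYSCTL_CONF.tmp" &&
            mv "$SYSCTL_CONF.tmp" "$SYSCTL_CONF"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$SYSCTL_CONF"
}

# Usage (as root), for example to keep receive buffer autotuning enabled:
#   persist_sysctl net.ipv4.tcp_moderate_rcvbuf 1
#   sysctl -p
```

Keeping the edit idempotent matters because the file is read top to bottom, and a stale duplicate further down would silently win.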
Tuning network variables can affect other system resources such as CPU or memory use.
Before starting with network tuning, it is important to isolate network bottlenecks and network traffic patterns. There are certain tools that can help you with detecting those bottlenecks.
The following tools can help you analyze your network traffic: netstat, tcpdump, and wireshark. Wireshark is a network traffic analyzer.
The Linux firewall and masquerading features are provided by the Netfilter kernel modules. This is a highly configurable, rule-based framework. If a rule matches a packet, Netfilter accepts or denies it, or takes a special action (“target”) defined by the rule, such as address translation.
Netfilter can take many packet properties into account. Thus, the more rules are defined, the longer packet processing may take. Advanced connection tracking can also be rather expensive and thus slow down overall networking.
When the kernel queue becomes full, all new packets are dropped, causing existing connections to fail. The “fail-open” feature lets you temporarily disable packet inspection and maintain connectivity under heavy network traffic. For reference, see https://home.regit.org/netfilter-en/using-nfqueue-and-libnetfilter_queue/.
For more information, see the home page of the Netfilter and iptables project, https://www.netfilter.org.
Modern network interface devices can move so many packets that the host can become the limiting factor for achieving maximum performance. To keep up, the system must be able to distribute the work across multiple CPU cores.
Some modern network interfaces can help distribute the work to multiple CPU cores by implementing multiple transmit and receive queues in hardware. Others, however, are equipped with only a single queue, and the driver must deal with all incoming packets in a single, serialized stream. To work around this issue, the operating system must “parallelize” the stream to distribute the work across multiple CPUs. On openSUSE Leap, this is done via Receive Packet Steering (RPS). RPS can also be used in virtual environments.
RPS computes a hash for each data stream from its IP addresses and port numbers. This hash ensures that packets of the same data stream are sent to the same CPU, which helps to increase performance.
RPS is configured per network device receive queue and interface. The configuration file names match the following scheme:
/sys/class/net/<device>/queues/<rx-queue>/rps_cpus
<device> stands for the network device, such as eth0 or eth1.
<rx-queue> stands for the receive queue, such as rx-0 or rx-1.
If the network interface hardware only supports a single receive queue, only rx-0 exists. If it supports multiple receive queues, there is an rx-N directory for each receive queue.
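To see which receive queues a device has and what each rps_cpus file currently contains, you could loop over the rx-N directories. This is a minimal read-only sketch following the path scheme above; the function name show_rps and the SYSFS_NET override (defaulting to /sys/class/net) are hypothetical.

```shell
# Read-only sketch: print the rps_cpus bitmap of every receive queue of a device.
# SYSFS_NET is a hypothetical override; it defaults to /sys/class/net.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

show_rps() {
    for f in "$SYSFS_NET/$1"/queues/rx-*/rps_cpus; do
        # If the glob matched nothing, it stays literal: skip it.
        [ -e "$f" ] || continue
        queue=${f%/rps_cpus}      # strip the file name
        queue=${queue##*/}        # keep only the rx-N directory name
        printf '%s %s\n' "$queue" "$(cat "$f")"
    done
}

# Usage: show_rps eth0
```

On an interface with a single queue, the output has exactly one line, for rx-0.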
These configuration files contain a CPU bitmap in hexadecimal; on systems with more than 32 CPUs, it is written as a comma-delimited list of 32-bit groups. By default, all bits are set to 0. With this setting, RPS is disabled, and the CPU that handles the interrupt therefore also processes the packet queue.
To enable RPS and allow specific CPUs to process packets for the receive queue of the interface, set their positions in the bitmap to 1. For example, to enable CPUs 0-3 to process packets for the first receive queue of eth0, set the bit positions 0-3 to 1 in binary: 00001111. This representation then needs to be converted to hex, which results in F in this case. Set this hex value with the following command:
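> echo "f" | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
Using sudo tee is necessary here: with sudo echo "f" > file, the redirection is performed by your unprivileged shell and fails.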
If you wanted to enable CPUs 8-15:
1111 1111 0000 0000 (binary)
  15   15    0    0 (decimal)
   F    F    0    0 (hex)
The command to set the hex value of ff00 would be:
> echo "ff00" | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
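The binary-to-hex conversion shown above can be scripted. The following minimal sketch assumes a contiguous CPU range given as first and last CPU number; the function name cpu_range_mask is hypothetical. For CPUs numbered 32 and higher, it splits the mask into comma-separated 32-bit words (most significant first), which is the format the kernel uses for CPU bitmaps.

```shell
# Sketch: print the rps_cpus bitmap (hex) that enables CPUs FIRST..LAST.
# Words of 32 bits are separated by commas, most significant word first.
cpu_range_mask() {
    first=$1
    last=$2
    w=$(( last / 32 ))          # index of the highest 32-bit word needed
    mask=""
    while [ "$w" -ge 0 ]; do
        word=0
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            # Set the bit for this CPU if it falls into the current word.
            if [ $(( cpu / 32 )) -eq "$w" ]; then
                word=$(( word | (1 << (cpu % 32)) ))
            fi
            cpu=$(( cpu + 1 ))
        done
        if [ -z "$mask" ]; then
            mask=$(printf '%x' "$word")             # highest word: no padding
        else
            mask="$mask,$(printf '%08x' "$word")"   # lower words: 8 hex digits
        fi
        w=$(( w - 1 ))
    done
    printf '%s\n' "$mask"
}

# Usage: cpu_range_mask 8 15 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
```

For the two examples above, cpu_range_mask 0 3 prints f and cpu_range_mask 8 15 prints ff00.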
On NUMA machines, best performance can be achieved by configuring RPS to use the CPUs on the same NUMA node as the interrupt for the interface's receive queue.
On non-NUMA machines, all CPUs can be used. If the interrupt rate is high, excluding the CPU handling the network interface can boost performance. The CPU being used for the network interface can be determined from /proc/interrupts. For example:
> sudo cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
...
 51:  113915241          0          0          0      Phys-fasteoi   eth0
...
In this case, CPU0 is the only CPU processing interrupts for eth0, since only CPU0 contains a non-zero value.
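With many CPUs, reading these counters by eye gets tedious. The following minimal sketch uses awk and assumes the usual /proc/interrupts layout: a header row with one column per CPU, then one row per IRQ whose last field is the device name, or a per-queue name starting with the device name and a hyphen, as some drivers use. Naming varies by driver, and the function name irq_cpus is hypothetical.

```shell
# Sketch: list the CPUs that have handled interrupts for a network device.
# $1: device name, for example eth0
# $2: optional path to the interrupts file (default: /proc/interrupts)
irq_cpus() {
    awk -v dev="$1" '
        # The header row has one field per CPU column.
        NR == 1 { ncpu = NF; next }
        # Match the device itself or per-queue names that start with "<dev>-".
        $NF == dev || index($NF, dev "-") == 1 {
            # Counters start in field 2 (field 1 is the IRQ number).
            for (i = 2; i <= ncpu + 1; i++)
                if ($i + 0 > 0) print "CPU" (i - 2)
        }' "${2:-/proc/interrupts}" | sort -u
}

# Usage: irq_cpus eth0
```

For the example output above, the sketch would print only CPU0.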
On x86 and AMD64/Intel 64 platforms, irqbalance can be used to distribute hardware interrupts across CPUs. See man 1 irqbalance for more details.