The tuned utility is a simple package which allows you to configure default "tunes" using various pre-defined profiles (or you can make a custom profile). Most of these tunes are pretty minor, but some can make a little bit of a difference, depending on the exact profile. For example, the "throughput-performance" profile does things like set the default readahead for block devices (disks) to 4MB, up from the Ubuntu default of 128K. Interestingly, Red Hat seems to default to 4MB readahead. This can improve backup throughput of modes that read from attached block devices, like hotadd or Direct SAN mode, but is unlikely to be a major increase. In my lab I've seen maybe a 15-20% increase in single stream backups with this tune, but not much if you are reading multiple VMs/disks.
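If you just want to check or experiment with the readahead value by itself, blockdev is the quickest way (the device name below is only an example, substitute your own):

Code: Select all
# readahead is reported and set in 512-byte sectors: 256 = 128K, 8192 = 4MB
blockdev --getra /dev/sdb
blockdev --setra 8192 /dev/sdb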
The other thing the throughput-performance tune does is set the CPU governor to performance mode and lock the clocks at maximum frequency (basically the equivalent of the BIOS "performance" settings for the CPU), so be aware that it can use more energy and generate more heat.
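You can check which governor is currently active with something like:

Code: Select all
# show the scaling governor for every core (exact path can vary a bit with the cpufreq driver)
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor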
It also tunes the kernel scheduler to make it less likely to preempt already running tasks, which helps throughput while increasing latency (constantly switching between tasks is less efficient than giving each task a slightly longer time slice and switching less often). It also tunes the memory and swappiness settings toward more typical "server" workloads. In general this tune is likely good for overall throughput, but, other than the readahead tweak, the rest is of pretty minimal benefit for the typical Veeam workload.
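If you're curious exactly what the profile changes, the built-in profiles are just small config files you can read, for example:

Code: Select all
# built-in tuned profiles normally live under /usr/lib/tuned/<profile>/
cat /usr/lib/tuned/throughput-performance/tuned.conf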
Tuned also has a network-throughput profile which mostly just increases the maximum rmem/wmem values, and, honestly, this seems not very useful unless you are at >25Gb speeds, or perhaps in some odd case where the proxy and repo are connected via high speed links with higher latency. I think this tune is a little outdated at this point because modern Linux systems set the rmem/wmem values based on available memory, and, even on my fairly small lab systems, this tune actually sets the "default" values slightly smaller than the out-of-box defaults, which is crazy, although admittedly it does set the maximum size to the absolute maximum value. Personally I've been unable to measure any benefit from this tune.
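Before bothering with this profile it's worth checking what your system already uses out of the box (the tcp values are min/default/max in bytes):

Code: Select all
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max net.core.wmem_max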
Being designed as shared use systems, Linux systems are, for the most part, tuned for fairness between processes, not for absolute maximum performance of any single process or network stream, and that usually does a pretty good job of balancing the load across all of the different Veeam processes. The only tuned profiles I recommend are throughput-performance, or virtual-guest (which includes throughput-performance) if running on a VM, but other than the readahead values I've found very little difference overall.
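Applying a profile is just a couple of commands:

Code: Select all
tuned-adm list
tuned-adm profile throughput-performance
tuned-adm active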
Regarding the rx/tx buffers, my guess is this is referring to the rx/tx buffers allocated to the device driver, which can be read and configured with ethtool:
Code: Select all
# ethtool -g ens192
Ring parameters for ens192:
Pre-set maximums:
RX: 4096
RX Mini: 2048
RX Jumbo: 4096
TX: 4096
Current hardware settings:
RX: 1024
RX Mini: 128
RX Jumbo: 256
TX: 512
You can see in this case that both my RX and TX buffers can be set to a maximum of 4096, but the defaults are TX 512 and RX 1024. Indeed, increasing these can improve throughput on 10Gb and faster networks. Tuning these is easy, just use the following command as an example (this sets both rings to their pre-set maximums; swap in your own interface name):
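Code: Select all
# set both ring buffers to their pre-set maximums (ens192 is my lab interface)
ethtool -G ens192 rx 4096 tx 4096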
If things improve, you can make the changes persistent by just adding these commands to rc.local. Well, there are probably better ways, but that's the "easy" way and should work for all distros, while exactly how to persist ethtool options otherwise varies by distro. You can always Google "make ethtool settings persistent" for your distro to find the preferred distro-specific way.
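For example, a minimal /etc/rc.local along these lines should do it (assuming the ens192 interface from above; just remember the file has to be executable, and on systemd distros the rc-local service may need to be enabled):

Code: Select all
#!/bin/sh
# /etc/rc.local -- re-apply the ring buffer sizes at every boot
ethtool -G ens192 rx 4096 tx 4096
exit 0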
Also, if at all possible, always use jumbo frames. This really increases throughput for speeds of 10Gb or greater, both by giving the system a lot less to do and by increasing the maximum possible efficiency of the TCP streams from 94% to 99%. So many environments ask me about performance but then don't use jumbo frames, even though it's the single biggest thing you can do to improve throughput on >=10Gb networks, as well as for improving NBD mode performance with VMware. Admittedly, do it right: don't mix standard and jumbo frame devices on the same layer-2 network/VLAN, and make sure your equipment supports it, but it's 2021, hopefully everything out there supports it well enough at this point.
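If you want to test it by hand, something like this works (ens192 and the target address are just placeholders, and you'd persist the MTU through your distro's normal network config afterwards):

Code: Select all
# temporarily raise the MTU on the interface
ip link set ens192 mtu 9000
# verify jumbo frames pass end-to-end: 8972 payload + 20 IP header + 8 ICMP header = 9000, -M do forbids fragmentation
ping -M do -s 8972 <repo-or-proxy-IP>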
The only other tuning example I've seen be useful with Linux proxies is to increase the default NFS readahead when using Direct NFS. Linux defaults to a pretty conservative value of 128K, which is probably great for most workloads but not so great for Veeam, which typically reads in 1MB chunks. I've seen performance improve by 30-40% for a single stream VMDK backup just by increasing the readahead to 2MB instead. Since Veeam automatically mounts and dismounts NFS volumes when using Direct NFS, and there doesn't appear to be any way to globally force NFS readahead (maybe somebody can tell me I'm wrong on that), the best way I've found is to use a udev rule to set the readahead on the virtual block device that is automatically created whenever any NFS share is mounted. I just drop the following line into /etc/udev/rules.d/99-nfs-readahead.rules
Code: Select all
SUBSYSTEM=="bdi", ACTION=="add", PROGRAM="/bin/awk -v bdi=$kernel 'BEGIN{ret=1} {if ($4 == bdi) {ret=0}} END{exit ret}' /proc/fs/nfsfs/volumes", ATTR{read_ahead_kb}="2048"
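To sanity check that the rule actually kicks in, reload udev and look at the backing bdi device after a share gets mounted (the /mnt/nfs path is just an example; mountpoint -d prints the device number, which is also the bdi name for NFS mounts):

Code: Select all
# pick up the new rule without rebooting
udevadm control --reload
# after the next NFS mount, this should report 2048
cat /sys/class/bdi/$(mountpoint -d /mnt/nfs)/read_ahead_kb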
Good luck and feel free to share any of your own tips on Linux proxies.