fg_pi
Lurker
Posts: 2
Liked: never
Joined: Jul 17, 2024 5:07 pm

Slow Backup Copy job from hardened repository to service provider

Post by fg_pi »

I recently submitted a support request regarding a slow Backup Copy Job from our on-premises Linux Hardened (Scale-out) Repository to a third-party Veeam Cloud Connect service provider. We reached a suitable resolution, but the technician recommended posting here. The case ID is 07324845.

[moderator: included correct case ID]

We have the copy interval set to 24 hours (i.e. the BCJ runs once per day). Many times the job would finish only minutes to hours before the copy interval expired, and occasionally it would fail to complete all VMs before the interval ended.

The primary culprit was one of the disks attached to a file server with a high daily delta/churn of data. The normal backup job would often process ~175+ GB for this disk, but when the BCJ ran, it processed the disk at only 3 MB/s. The other disks on this VM (and on other VMs) ran at approximately the same speed, sometimes a bit faster or slower, but no other disk on any VM approached the 175 GB size. This disk often took 16-20 hours (sometimes more, when the day's changes exceeded 200 GB) to process, leaving a very small margin within the backup copy interval window. The reported bottleneck was always Network, for all 42 VMs in the BCJ.
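
(For anyone following along, the back-of-the-envelope math for why that one disk consumed most of the window, using nothing but the numbers above:)

Code: Select all

# ~175 GB of changed data at ~3 MB/s, expressed in hours
echo '175 * 1024 / 3 / 3600' | bc -l    # ~16.6 hours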

During troubleshooting, I found (via tcptrack) that there were only 5 streams per VM to the Service Provider (so when the last remaining VM with the large disk was still being processed after all other VMs had finished, there were only 5 outbound connections), and each of these streams/connections never went above 1 MB/s, which tracked with the ~3 MB/s speed reported in the VBR console. We "resolved" the issue by increasing the number of upload streams (set in Global Network Traffic Rules) from the default (5) to the maximum (100). This reduced the BCJ run time from 16-20+ hours to ~3-5 hours. Interestingly, the job summary still shows the primary bottleneck as Network.
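
(For reference, the per-connection figures came from watching the repository's outbound connections with something along the lines of the command below; the interface name and address are placeholders, not our actual values:)

Code: Select all

# Live per-connection throughput for traffic to the Service Provider
sudo tcptrack -i eth0 dst host x.x.x.x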

In the support case, I pointed out several times that the individual streams/connections never exceeded 1 MB/s, but the technician offered no comment or explanation. This VBR implementation is unique within our organization: it is the only one running a Nutanix AHV cluster and the only one using a Linux Hardened Repository. We have four other, smaller facilities (on older, slower hardware) running VBR; they all run ESXi with simple NAS (SMB) repositories. The BCJs at those facilities point to the same Service Provider, and we have experienced no slowness there (we are limited only by internet bandwidth), without changing the number of upload streams from the default.

It appears there is a limitation (each stream being capped at 1 MB/s) either in the Nutanix AHV implementation or with the Linux Hardened Repository running the transport service (or a combination of the two).

Our setup is as follows:
VBR v. 12.1.2.172
Veeam Backup for Nutanix AHV (proxy) v. 5.1.0.7
Nutanix AHV (AOS) v. 6.5.5.5
Linux Hardened Repository: Ubuntu 22.04.4 LTS
Storage: HPE/Nimble SAN; volumes mounted on LHR via iSCSI

Again, we have worked around what appears to be some kind of limitation by increasing the number of upload streams to the maximum allowed (100), but it seems performance could be improved further if each stream could transfer faster than 1 MB/s (or if we could run more than 100 concurrent streams). I'm happy to answer questions or provide screenshots or any additional information that might be helpful.
david.domask
Veeam Software
Posts: 2685
Liked: 620 times
Joined: Jun 28, 2016 12:12 pm

Re: Slow Backup Copy job from hardened repository to service provider

Post by david.domask »

Hi fg_pi,

Thank you for the detailed explanation of the situation you were experiencing and the resolution you identified.

There is no intentional hard cap on the speed of each connection stream; the setting just controls how many simultaneous TCP upload streams will be used.

Can I ask, is it feasible to test with iperf between the two repositories involved in the Backup Copy job, using the -P flag?

10 upload streams: iperf.exe -c 192.168.0.1 -p 9999 -n 1G -P 10
5 upload streams: iperf.exe -c 192.168.0.1 -p 9999 -n 1G -P 5
1 upload stream: iperf.exe -c 192.168.0.1 -p 9999 -n 1G

Those are a few client-side examples, and they should mimic the behavior of changing the upload streams value that you identified. Does iperf also show similar per-stream caps, or is it significantly faster?
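
Since the source repository is Linux, the same tests can be run with iperf3; a rough equivalent is sketched below (the address and port are just the placeholders from the examples above, and the destination side needs a listener running first).

Code: Select all

# Destination side (run first): iperf3 listener on the agreed port
iperf3 -s -p 9999

# Source repository side, mirroring the examples above
iperf3 -c 192.168.0.1 -p 9999 -n 1G -P 10    # 10 upload streams
iperf3 -c 192.168.0.1 -p 9999 -n 1G -P 5     # 5 upload streams
iperf3 -c 192.168.0.1 -p 9999 -n 1G          # 1 upload stream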
David Domask | Product Management: Principal Analyst
fg_pi
Lurker
Posts: 2
Liked: never
Joined: Jul 17, 2024 5:07 pm

Re: Slow Backup Copy job from hardened repository to service provider

Post by fg_pi »

Thank you for the prompt response. During our investigation, we actually did an end-to-end iperf3 test from our transport server to the Service Provider's VCC server. We ran a few tests and tweaked some of the parameters; however, we didn't run a test with 5 or 10 streams. I also confirmed that neither side has enabled any throttling or traffic shaping.

Please see below for some of the results we saw:

1 stream:

Code: Select all

user@server:~$ iperf3 -c x.x.x.x
Connecting to host x.x.x.x, port 5201
[  5] local a.b.c.d port 58446 connected to x.x.x.x port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.24 MBytes  10.4 Mbits/sec   27   37.1 KBytes
[  5]   1.00-2.00   sec   941 KBytes  7.71 Mbits/sec    3   41.3 KBytes
[  5]   2.00-3.00   sec   941 KBytes  7.71 Mbits/sec    4   41.3 KBytes
[  5]   3.00-4.00   sec   753 KBytes  6.17 Mbits/sec    6   31.4 KBytes
[  5]   4.00-5.00   sec   941 KBytes  7.71 Mbits/sec    5   25.7 KBytes
[  5]   5.00-6.00   sec   376 KBytes  3.08 Mbits/sec    5   22.8 KBytes
[  5]   6.00-7.00   sec   941 KBytes  7.71 Mbits/sec    0   42.8 KBytes
[  5]   7.00-8.00   sec  1.10 MBytes  9.25 Mbits/sec    1   41.3 KBytes
[  5]   8.00-9.00   sec  1.10 MBytes  9.25 Mbits/sec    6   28.5 KBytes
[  5]   9.00-10.00  sec   941 KBytes  7.71 Mbits/sec    0   48.5 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.14 MBytes  7.67 Mbits/sec   57             sender
[  5]   0.00-10.04  sec  8.75 MBytes  7.31 Mbits/sec                  receiver

iperf Done.


2 streams:

Code: Select all

user@server:~$ iperf3 -b 1G -P 2 -c x.x.x.x
Connecting to host x.x.x.x, port 5201
[  5] local a.b.c.d port 41420 connected to x.x.x.x port 5201
[  7] local a.b.c.d port 41422 connected to x.x.x.x port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1002 KBytes  8.20 Mbits/sec   11   35.6 KBytes
[  7]   0.00-1.00   sec   699 KBytes  5.72 Mbits/sec   17   22.8 KBytes
[SUM]   0.00-1.00   sec  1.66 MBytes  13.9 Mbits/sec   28
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   640 KBytes  5.24 Mbits/sec    5   27.1 KBytes
[  7]   1.00-2.00   sec   502 KBytes  4.11 Mbits/sec    5   20.0 KBytes
[SUM]   1.00-2.00   sec  1.12 MBytes  9.35 Mbits/sec   10
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   628 KBytes  5.15 Mbits/sec    7   17.1 KBytes
[  7]   2.00-3.00   sec   502 KBytes  4.11 Mbits/sec    5   22.8 KBytes
[SUM]   2.00-3.00   sec  1.10 MBytes  9.26 Mbits/sec   12
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   376 KBytes  3.08 Mbits/sec    4   17.1 KBytes
[  7]   3.00-4.00   sec   627 KBytes  5.14 Mbits/sec   10   20.0 KBytes
[SUM]   3.00-4.00   sec  1004 KBytes  8.22 Mbits/sec   14
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   502 KBytes  4.11 Mbits/sec    5   11.4 KBytes
[  7]   4.00-5.00   sec   627 KBytes  5.14 Mbits/sec    5   17.1 KBytes
[SUM]   4.00-5.00   sec  1.10 MBytes  9.25 Mbits/sec   10
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   376 KBytes  3.08 Mbits/sec    6   8.55 KBytes
[  7]   5.00-6.00   sec   502 KBytes  4.11 Mbits/sec    2   17.1 KBytes
[SUM]   5.00-6.00   sec   878 KBytes  7.19 Mbits/sec    8
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   376 KBytes  3.08 Mbits/sec    3   21.4 KBytes
[  7]   6.00-7.00   sec   376 KBytes  3.08 Mbits/sec    5   17.1 KBytes
[SUM]   6.00-7.00   sec   753 KBytes  6.17 Mbits/sec    8
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   502 KBytes  4.11 Mbits/sec    2   22.8 KBytes
[  7]   7.00-8.00   sec   627 KBytes  5.14 Mbits/sec    7   22.8 KBytes
[SUM]   7.00-8.00   sec  1.10 MBytes  9.25 Mbits/sec    9
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   502 KBytes  4.11 Mbits/sec    3   20.0 KBytes
[  7]   8.00-9.00   sec   753 KBytes  6.17 Mbits/sec    0   41.3 KBytes
[SUM]   8.00-9.00   sec  1.23 MBytes  10.3 Mbits/sec    3
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   627 KBytes  5.14 Mbits/sec    3   27.1 KBytes
[  7]   9.00-10.00  sec  1004 KBytes  8.22 Mbits/sec    6   31.4 KBytes
[SUM]   9.00-10.00  sec  1.59 MBytes  13.4 Mbits/sec    9
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.40 MBytes  4.53 Mbits/sec   49             sender
[  5]   0.00-10.16  sec  5.12 MBytes  4.23 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  6.07 MBytes  5.09 Mbits/sec   62             sender
[  7]   0.00-10.16  sec  5.88 MBytes  4.85 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  11.5 MBytes  9.63 Mbits/sec  111             sender
[SUM]   0.00-10.16  sec  11.0 MBytes  9.08 Mbits/sec                  receiver

iperf Done.

Code: Select all

user@server:~$ iperf3 -P 2 -c x.x.x.x
Connecting to host x.x.x.x, port 5201
[  5] local a.b.c.d port 55694 connected to x.x.x.x port 5201
[  7] local a.b.c.d port 55698 connected to x.x.x.x port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.11 MBytes  9.34 Mbits/sec   10   34.2 KBytes
[  7]   0.00-1.00   sec   556 KBytes  4.55 Mbits/sec    5   20.0 KBytes
[SUM]   0.00-1.00   sec  1.66 MBytes  13.9 Mbits/sec   15
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   753 KBytes  6.17 Mbits/sec    3   28.5 KBytes
[  7]   1.00-2.00   sec   640 KBytes  5.24 Mbits/sec    1   25.7 KBytes
[SUM]   1.00-2.00   sec  1.36 MBytes  11.4 Mbits/sec    4
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   753 KBytes  6.17 Mbits/sec    4   35.6 KBytes
[  7]   2.00-3.00   sec   502 KBytes  4.11 Mbits/sec    3   22.8 KBytes
[SUM]   2.00-3.00   sec  1.23 MBytes  10.3 Mbits/sec    7
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   878 KBytes  7.19 Mbits/sec    1   37.1 KBytes
[  7]   3.00-4.00   sec   565 KBytes  4.62 Mbits/sec    3   22.8 KBytes
[SUM]   3.00-4.00   sec  1.41 MBytes  11.8 Mbits/sec    4
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   816 KBytes  6.68 Mbits/sec    3   27.1 KBytes
[  7]   4.00-5.00   sec   627 KBytes  5.14 Mbits/sec    1   29.9 KBytes
[SUM]   4.00-5.00   sec  1.41 MBytes  11.8 Mbits/sec    4
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   941 KBytes  7.71 Mbits/sec    0   44.2 KBytes
[  7]   5.00-6.00   sec   502 KBytes  4.11 Mbits/sec   11   17.1 KBytes
[SUM]   5.00-6.00   sec  1.41 MBytes  11.8 Mbits/sec   11
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   878 KBytes  7.19 Mbits/sec    4   37.1 KBytes
[  7]   6.00-7.00   sec   627 KBytes  5.14 Mbits/sec    3   24.2 KBytes
[SUM]   6.00-7.00   sec  1.47 MBytes  12.3 Mbits/sec    7
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   753 KBytes  6.17 Mbits/sec    7   18.5 KBytes
[  7]   7.00-8.00   sec   502 KBytes  4.11 Mbits/sec    4   22.8 KBytes
[SUM]   7.00-8.00   sec  1.23 MBytes  10.3 Mbits/sec   11
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   502 KBytes  4.11 Mbits/sec    1   31.4 KBytes
[  7]   8.00-9.00   sec   627 KBytes  5.14 Mbits/sec    1   31.4 KBytes
[SUM]   8.00-9.00   sec  1.10 MBytes  9.25 Mbits/sec    2
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   627 KBytes  5.14 Mbits/sec    8   28.5 KBytes
[  7]   9.00-10.00  sec   753 KBytes  6.17 Mbits/sec    3   25.7 KBytes
[SUM]   9.00-10.00  sec  1.35 MBytes  11.3 Mbits/sec   11
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  7.85 MBytes  6.59 Mbits/sec   41             sender
[  5]   0.00-10.26  sec  7.75 MBytes  6.34 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  5.76 MBytes  4.83 Mbits/sec   35             sender
[  7]   0.00-10.26  sec  5.62 MBytes  4.60 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  13.6 MBytes  11.4 Mbits/sec   76             sender
[SUM]   0.00-10.26  sec  13.4 MBytes  10.9 Mbits/sec                  receiver

iperf Done.
The 1- and 2-stream tests do seem to align with the per-stream limit of roughly 1 MB/s that I was seeing with tcptrack.
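
(Converting the single-stream iperf3 figure from megabits to megabytes makes the match explicit:)

Code: Select all

# 7.67 Mbits/sec (the single-stream sender result above) in MB/s
echo '7.67 / 8' | bc -l    # ~0.96 MB/s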

It may be possible to re-engage with the SP to run the test with 5 or 10 streams, but that would require a bit of coordination. I'm curious where that bottleneck comes in (perhaps something our/their firewall is doing to "fragment" the data?), or whether it's just a symptom of sending the data over the public internet. If anything, it would be great if we could increase the maximum number of streams beyond 100; even with it set to 100, the transport server only hits ~20% CPU and ~50% RAM (4 vCPU, 8 GB RAM).
david.domask
Veeam Software
Posts: 2685
Liked: 620 times
Joined: Jun 28, 2016 12:12 pm

Re: Slow Backup Copy job from hardened repository to service provider

Post by david.domask »

Hi fg_pi,

Thank you for the detailed results, they are very helpful. May I ask, though: is this speed expected for a single stream? As we can see, it does not look like a Veeam issue; in general, each connection is capped at around 1 MB/s or less.

If the connection should be able to sustain a higher bandwidth than 1 MB/s, then I don't think this is about the number of streams Veeam is using; it looks like something in between is indeed bottlenecking.
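
For what it's worth, that kind of per-stream ceiling is consistent with plain TCP behavior: sustained throughput per connection is roughly bounded by the congestion window divided by the round-trip time, and your iperf3 output shows congestion windows mostly in the 20-45 KByte range alongside frequent retransmits. As a rough illustration only (the 40 ms RTT below is an assumed figure, not a value from your tests):

Code: Select all

# throughput <= cwnd / RTT: a 40 KByte window over an assumed 40 ms RTT, in MB/s
echo '40 / 1024 / 0.040' | bc -l    # ~0.98 MB/s per stream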

I would actually advise against trying to increase the number of streams further, as each additional stream consumes a port, and you can easily exhaust the available ports this way.
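
As a quick illustration, this is the kind of check you could run on the Linux repository while a job is active (the provider address is a placeholder):

Code: Select all

# Source-port range available for outbound connections (typically 32768-60999 by default)
sysctl net.ipv4.ip_local_port_range

# Count established outbound connections to the Service Provider
ss -Htn state established dst x.x.x.x | wc -l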

I would review the iperf results with your Provider and do further testing; from my perspective, it looks like something outside of the applications, since even iperf shows the same behavior.
David Domask | Product Management: Principal Analyst
