I recently submitted a support request regarding a slow Backup Copy Job from our on-premises Linux Hardened (Scale-out) Repository to a third-party Veeam Cloud Connect service provider. We reached a suitable resolution, but the technician recommended posting here. The case ID is 07324845.
[moderator: included correct case ID]
We have the copy interval set to 24 hours (i.e. the BCJ runs once per day); many times the job would finish only minutes to hours before the copy interval expired, and occasionally it would fail to complete all VMs before the interval expired.
The primary culprit was one of the disks attached to a file server that has a high delta/churn of data each day. Often, the normal backup job would process ~175+ GB for this disk. When the BCJ ran, it was processing the disk at only 3 MB/s. The other disks on this VM (and other VMs) ran at approximately the same speed, sometimes a bit faster or slower, but no other disk on any other VM approached the 175 GB size. This disk often took 16-20 hours (and sometimes more, if the size that day was over 200 GB) to process, leaving a very small margin within the backup copy interval window. The reported bottleneck was always Network, for all 42 VMs in the BCJ.
During troubleshooting, I found (via tcptrack) that there were only 5 streams per VM to the Service Provider (so when the last remaining VM with the large disk was still being processed after all other VMs had finished, there were only 5 outbound connections), and each of these streams/connections never went above 1 MB/s -- this tracked with the ~3 MB/s speed reported in the VBR console. We "resolved" the issue by increasing the number of upload streams (set in Global Network Traffic Rules) from the default (5) to the maximum (100). This reduced the BCJ time from 16-20+ hours to ~3-5 hours. Interestingly, the job summary still shows the primary bottleneck is the network.
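(For anyone wanting to reproduce the check, below is a minimal sketch of the kind of tcptrack invocation described above, run on the repository/transport server. The interface name is a placeholder, and 6180 is the default Veeam Cloud Connect gateway port -- an assumption here, so adjust it to whatever port your provider's gateway actually listens on.)

Code:
# Watch live TCP connections and per-connection throughput to the cloud gateway.
# eth0 is a placeholder interface; 6180 is the default VCC gateway port (assumption).
sudo tcptrack -i eth0 port 6180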
In the support case, I noted multiple times that the individual streams/connections never went above 1 MB/s, but the technician offered no comment or explanation. This VBR implementation is unique within our organization in that it's the only one running a Nutanix AHV cluster, and also the only one using a Linux Hardened Repository. We have four other smaller facilities (on older, slower hardware) running VBR; they all run ESXi with simple NAS (SMB) repositories. The BCJs at those facilities point to the same Service Provider, and we have experienced no slowness issues there (limited only by internet bandwidth) without changing the number of upload streams from the default.
It appears there is a limitation (each stream being capped at 1 MB/s) either in the Nutanix AHV implementation or with the Linux Hardened Repository running the transport service (or a combination of the two).
Our setup is as follows:
VBR v. 12.1.2.172
Veeam Backup for Nutanix AHV (proxy) v. 5.1.0.7
Nutanix AHV (AOS) v. 6.5.5.5
Linux Hardened Repository: Ubuntu 22.04.4 LTS
Storage: HPE/Nimble SAN; volumes mounted on LHR via iSCSI
Again, we have worked around what appears to be some kind of limitation by increasing the number of upload streams to the maximum allowed (100), but it seems performance could be further improved if each stream could transfer faster than 1 MB/s (or if we could run more than 100 concurrent streams). I'm happy to answer questions and provide screenshots or any additional information that might be helpful.
Re: Slow Backup Copy job from hardened repository to service provider
Hi fg_pi,
Thank you for the detailed explanation of the situation you were experiencing and the resolution you identified.
There is no intentional hard cap on the speed of each connection stream; the setting just controls how many simultaneous TCP upload streams will be used.
Can I ask, would it be feasible to test with iperf between the two repositories involved in the Backup Copy job, using the -P flag?
10 upload streams: iperf.exe -c 192.168.0.1 -p 9999 -n 1G -P 10
5 upload streams: iperf.exe -c 192.168.0.1 -p 9999 -n 1G -P 5
1 upload stream: iperf.exe -c 192.168.0.1 -p 9999 -n 1G
Those would be a few client-side examples, and changing the -P value should mimic the behavior of changing the upload streams value that you identified -- does iperf also show similar caps per stream, or is it significantly faster?
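(Since the sending side here is a Linux repository, the equivalent iperf3 invocations would look roughly like the sketch below; the address and port are the same placeholders as above, and the Service Provider side would need a matching listener started first.)

Code:
# On the provider / receiving side (placeholder port):
iperf3 -s -p 9999

# On the Linux repository / sending side (placeholder address and port):
iperf3 -c 192.168.0.1 -p 9999 -n 1G -P 10   # 10 parallel upload streams
iperf3 -c 192.168.0.1 -p 9999 -n 1G -P 5    # 5 parallel upload streams
iperf3 -c 192.168.0.1 -p 9999 -n 1G         # single stream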
David Domask | Product Management: Principal Analyst
Re: Slow Backup Copy job from hardened repository to service provider
Thank you for the prompt response. During our investigation, we actually ran an end-to-end iperf3 test from our transport server to the Service Provider's VCC server. We ran a few tests and tweaked some of the parameters; however, we didn't run a test with 5 or 10 streams. I also confirmed that neither side had enabled any throttling or traffic shaping.
Please see below for some of the results we saw:
1 stream:
Code:
user@server:~$ iperf3 -c x.x.x.x
Connecting to host x.x.x.x, port 5201
[ 5] local a.b.c.d port 58446 connected to x.x.x.x port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.24 MBytes 10.4 Mbits/sec 27 37.1 KBytes
[ 5] 1.00-2.00 sec 941 KBytes 7.71 Mbits/sec 3 41.3 KBytes
[ 5] 2.00-3.00 sec 941 KBytes 7.71 Mbits/sec 4 41.3 KBytes
[ 5] 3.00-4.00 sec 753 KBytes 6.17 Mbits/sec 6 31.4 KBytes
[ 5] 4.00-5.00 sec 941 KBytes 7.71 Mbits/sec 5 25.7 KBytes
[ 5] 5.00-6.00 sec 376 KBytes 3.08 Mbits/sec 5 22.8 KBytes
[ 5] 6.00-7.00 sec 941 KBytes 7.71 Mbits/sec 0 42.8 KBytes
[ 5] 7.00-8.00 sec 1.10 MBytes 9.25 Mbits/sec 1 41.3 KBytes
[ 5] 8.00-9.00 sec 1.10 MBytes 9.25 Mbits/sec 6 28.5 KBytes
[ 5] 9.00-10.00 sec 941 KBytes 7.71 Mbits/sec 0 48.5 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 9.14 MBytes 7.67 Mbits/sec 57 sender
[ 5] 0.00-10.04 sec 8.75 MBytes 7.31 Mbits/sec receiver
iperf Done.
2 streams:
Code:
user@server:~$ iperf3 -b 1G -P 2 -c x.x.x.x
Connecting to host x.x.x.x, port 5201
[ 5] local a.b.c.d port 41420 connected to x.x.x.x port 5201
[ 7] local a.b.c.d port 41422 connected to x.x.x.x port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1002 KBytes 8.20 Mbits/sec 11 35.6 KBytes
[ 7] 0.00-1.00 sec 699 KBytes 5.72 Mbits/sec 17 22.8 KBytes
[SUM] 0.00-1.00 sec 1.66 MBytes 13.9 Mbits/sec 28
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 640 KBytes 5.24 Mbits/sec 5 27.1 KBytes
[ 7] 1.00-2.00 sec 502 KBytes 4.11 Mbits/sec 5 20.0 KBytes
[SUM] 1.00-2.00 sec 1.12 MBytes 9.35 Mbits/sec 10
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 628 KBytes 5.15 Mbits/sec 7 17.1 KBytes
[ 7] 2.00-3.00 sec 502 KBytes 4.11 Mbits/sec 5 22.8 KBytes
[SUM] 2.00-3.00 sec 1.10 MBytes 9.26 Mbits/sec 12
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 376 KBytes 3.08 Mbits/sec 4 17.1 KBytes
[ 7] 3.00-4.00 sec 627 KBytes 5.14 Mbits/sec 10 20.0 KBytes
[SUM] 3.00-4.00 sec 1004 KBytes 8.22 Mbits/sec 14
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 502 KBytes 4.11 Mbits/sec 5 11.4 KBytes
[ 7] 4.00-5.00 sec 627 KBytes 5.14 Mbits/sec 5 17.1 KBytes
[SUM] 4.00-5.00 sec 1.10 MBytes 9.25 Mbits/sec 10
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 376 KBytes 3.08 Mbits/sec 6 8.55 KBytes
[ 7] 5.00-6.00 sec 502 KBytes 4.11 Mbits/sec 2 17.1 KBytes
[SUM] 5.00-6.00 sec 878 KBytes 7.19 Mbits/sec 8
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-7.00 sec 376 KBytes 3.08 Mbits/sec 3 21.4 KBytes
[ 7] 6.00-7.00 sec 376 KBytes 3.08 Mbits/sec 5 17.1 KBytes
[SUM] 6.00-7.00 sec 753 KBytes 6.17 Mbits/sec 8
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.00-8.00 sec 502 KBytes 4.11 Mbits/sec 2 22.8 KBytes
[ 7] 7.00-8.00 sec 627 KBytes 5.14 Mbits/sec 7 22.8 KBytes
[SUM] 7.00-8.00 sec 1.10 MBytes 9.25 Mbits/sec 9
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.00-9.00 sec 502 KBytes 4.11 Mbits/sec 3 20.0 KBytes
[ 7] 8.00-9.00 sec 753 KBytes 6.17 Mbits/sec 0 41.3 KBytes
[SUM] 8.00-9.00 sec 1.23 MBytes 10.3 Mbits/sec 3
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.00-10.00 sec 627 KBytes 5.14 Mbits/sec 3 27.1 KBytes
[ 7] 9.00-10.00 sec 1004 KBytes 8.22 Mbits/sec 6 31.4 KBytes
[SUM] 9.00-10.00 sec 1.59 MBytes 13.4 Mbits/sec 9
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.40 MBytes 4.53 Mbits/sec 49 sender
[ 5] 0.00-10.16 sec 5.12 MBytes 4.23 Mbits/sec receiver
[ 7] 0.00-10.00 sec 6.07 MBytes 5.09 Mbits/sec 62 sender
[ 7] 0.00-10.16 sec 5.88 MBytes 4.85 Mbits/sec receiver
[SUM] 0.00-10.00 sec 11.5 MBytes 9.63 Mbits/sec 111 sender
[SUM] 0.00-10.16 sec 11.0 MBytes 9.08 Mbits/sec receiver
iperf Done.
Code:
user@server:~$ iperf3 -P 2 -c x.x.x.x
Connecting to host x.x.x.x, port 5201
[ 5] local a.b.c.d port 55694 connected to x.x.x.x port 5201
[ 7] local a.b.c.d port 55698 connected to x.x.x.x port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.11 MBytes 9.34 Mbits/sec 10 34.2 KBytes
[ 7] 0.00-1.00 sec 556 KBytes 4.55 Mbits/sec 5 20.0 KBytes
[SUM] 0.00-1.00 sec 1.66 MBytes 13.9 Mbits/sec 15
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 753 KBytes 6.17 Mbits/sec 3 28.5 KBytes
[ 7] 1.00-2.00 sec 640 KBytes 5.24 Mbits/sec 1 25.7 KBytes
[SUM] 1.00-2.00 sec 1.36 MBytes 11.4 Mbits/sec 4
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 753 KBytes 6.17 Mbits/sec 4 35.6 KBytes
[ 7] 2.00-3.00 sec 502 KBytes 4.11 Mbits/sec 3 22.8 KBytes
[SUM] 2.00-3.00 sec 1.23 MBytes 10.3 Mbits/sec 7
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 878 KBytes 7.19 Mbits/sec 1 37.1 KBytes
[ 7] 3.00-4.00 sec 565 KBytes 4.62 Mbits/sec 3 22.8 KBytes
[SUM] 3.00-4.00 sec 1.41 MBytes 11.8 Mbits/sec 4
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 816 KBytes 6.68 Mbits/sec 3 27.1 KBytes
[ 7] 4.00-5.00 sec 627 KBytes 5.14 Mbits/sec 1 29.9 KBytes
[SUM] 4.00-5.00 sec 1.41 MBytes 11.8 Mbits/sec 4
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 941 KBytes 7.71 Mbits/sec 0 44.2 KBytes
[ 7] 5.00-6.00 sec 502 KBytes 4.11 Mbits/sec 11 17.1 KBytes
[SUM] 5.00-6.00 sec 1.41 MBytes 11.8 Mbits/sec 11
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-7.00 sec 878 KBytes 7.19 Mbits/sec 4 37.1 KBytes
[ 7] 6.00-7.00 sec 627 KBytes 5.14 Mbits/sec 3 24.2 KBytes
[SUM] 6.00-7.00 sec 1.47 MBytes 12.3 Mbits/sec 7
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.00-8.00 sec 753 KBytes 6.17 Mbits/sec 7 18.5 KBytes
[ 7] 7.00-8.00 sec 502 KBytes 4.11 Mbits/sec 4 22.8 KBytes
[SUM] 7.00-8.00 sec 1.23 MBytes 10.3 Mbits/sec 11
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.00-9.00 sec 502 KBytes 4.11 Mbits/sec 1 31.4 KBytes
[ 7] 8.00-9.00 sec 627 KBytes 5.14 Mbits/sec 1 31.4 KBytes
[SUM] 8.00-9.00 sec 1.10 MBytes 9.25 Mbits/sec 2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.00-10.00 sec 627 KBytes 5.14 Mbits/sec 8 28.5 KBytes
[ 7] 9.00-10.00 sec 753 KBytes 6.17 Mbits/sec 3 25.7 KBytes
[SUM] 9.00-10.00 sec 1.35 MBytes 11.3 Mbits/sec 11
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 7.85 MBytes 6.59 Mbits/sec 41 sender
[ 5] 0.00-10.26 sec 7.75 MBytes 6.34 Mbits/sec receiver
[ 7] 0.00-10.00 sec 5.76 MBytes 4.83 Mbits/sec 35 sender
[ 7] 0.00-10.26 sec 5.62 MBytes 4.60 Mbits/sec receiver
[SUM] 0.00-10.00 sec 13.6 MBytes 11.4 Mbits/sec 76 sender
[SUM] 0.00-10.26 sec 13.4 MBytes 10.9 Mbits/sec receiver
iperf Done.
The 1- and 2-stream tests do seem to align with the per-stream limits I was seeing with tcptrack, at +/- 1 MB/s.

It may be possible to re-engage with the SP to try running the test with 5 or 10 streams, but it would require a bit of coordination. I'm curious about where that bottleneck comes in (perhaps something our/their firewall is doing to "fragment" the data?) or whether it's just a symptom of sending the data over the public internet. If anything, it would be great if we could increase the maximum number of streams beyond 100 -- even with it set at 100, the transport server is only hitting ~20% CPU and ~50% RAM (4 vCPU, 8 GB RAM).
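(A rough sanity check on why each stream tops out around 1 MB/s: per-connection TCP throughput is approximately bounded by the congestion window divided by the round-trip time, and the iperf output above shows the window being held down to roughly 20-40 KB by steady retransmissions. The RTT below is an assumed figure for illustration only -- measure the real value with ping to the SP gateway.)

Code:
# Approximate per-stream ceiling = congestion window / RTT.
# With a ~40 KB window (from the iperf output above) and an assumed 40 ms RTT:
echo "scale=2; 40 / 0.040 / 1024" | bc    # ~0.97 MB/s, i.e. roughly 8 Mbit/s per stream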
Re: Slow Backup Copy job from hardened repository to service provider
Hi fg_pi,
Thank you for the detailed results, these are very helpful. May I ask though, is this speed expected for you with a single stream? As we can see, it looks like it's not about Veeam; in general, each connection is capped at around 1 MB/s or less.
If the connection should be able to sustain a higher bandwidth than 1 MB/s, then I don't think this is about the number of streams Veeam is using; it looks like something in between is indeed the bottleneck.
I actually advise against trying to increase streams further, as each additional stream consumes a port, and you can easily exhaust the available ports this way.
I would review the iperf test with your Provider and do further testing, as from my perspective it looks like it's something outside of the applications, since even iperf is showing the same results.
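(As a side note on the port-exhaustion point above, a rough way to gauge the headroom on the Linux repository/transport server is sketched below. Port 6180 is the default Cloud Connect gateway port and is an assumption here -- adjust to whatever the provider actually uses.)

Code:
# Ephemeral port range available for outbound connections
sysctl net.ipv4.ip_local_port_range

# Count established connections to the cloud gateway (6180 assumed)
ss -tn state established '( dport = :6180 )' | wc -l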
David Domask | Product Management: Principal Analyst