I've been trying to tune this deployment and just can't figure it out. The repos are Dell servers with 12 cores and 32 GB of RAM, running Ubuntu 20.04 on RAID 6 with 8-10 drives. They use XFS with fast clone enabled, per the hardened setup guide.
3 sites:
Azure: this is where the B&R server runs, with 4 cores and 16 GB of memory. Resource usage is always low.
Site 1: Hyper-V cluster, SOBR1 with repo1, SOBR2 with repo2 (SOBR2 also has a capacity extent to Wasabi)
Site 2: DR Hyper-V cluster, SOBR3 with repo3
The problem is long delays while copy jobs run from SOBR1 to SOBR2, which ultimately impacts the copy jobs from SOBR2 to SOBR3. A job spends a huge amount of time finalizing a single VM that is sitting at 99%. It can be finalizing for an hour with nothing else happening during that time. And these are just copy jobs, not reads from a production VM. When I look at repo1 and repo2 with tools like dstat, they show almost no IO, CPU, memory, or network usage, until the job bursts for a moment and actually copies. They just sit there idle, "finalizing" forever. One job copies backups for 37 average-size VMs. On the throughput meter, when it actually decides to copy, it flies by in a minute, jumping up to 500 MB/s, then sits at 0 KB/s for 30 minutes. Eventually things start moving again.
What's odd is that during these long stretches of nothing, none of the servers involved appear to be doing anything. Super low load, just sitting there taking their time. I am considering moving Veeam B&R out of Azure and back to Site 2 and giving it more resources, but like I said, during all this it's sitting there at 10% CPU usage and maybe 8 GB of RAM used.
So I don't know whether this is a B&R thing, and whether moving that server out of Azure and bumping it to 64 GB and 8 cores would help. I can't find any bottlenecks anywhere.
I don't have a ticket for this, but was just looking for any tips or insights people might have.
Thanks.
-
- Enthusiast
- Posts: 96
- Liked: 13 times
- Joined: Oct 05, 2010 3:27 pm
- Full Name: Rob Miller
- Contact:
Re: Terribly slow copy jobs
As an example, it will copy Hard disk 1 from a VM in the job in 47 seconds. Then it will sit on the finalizing step for 20 minutes before moving on, with no detectable IO going on anywhere.
And even though this repo is set to 8 concurrent jobs, I will have this 1 VM in the copy sitting at 99%, and finalizing for 20 minutes, while the next 7 are all at 0% "storage initialized". All the rest are pending. It seems odd though that it doesn't process 8 jobs at once. It sits forever on 0% "storage initialized" on those 7 while the first one is sitting on "finalizing".
When you put this all together, a copy every 6 hours for these 37 VMs right next to each other takes almost the full 6 hours to complete due to all the false delays. It's only actually copying for maybe 20 minutes out of that time.
All the while, there is no activity happening anywhere. It's like artificial pauses are being introduced.
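To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above (a 6-hour interval, about 20 minutes of actual copying, 37 VMs). The function names are made up for illustration, and the per-VM figure assumes the VMs are effectively processed one at a time, which is what the "finalizing" behavior looks like but is not confirmed.

```python
# Figures quoted in the post; everything else here is illustrative.
WINDOW_MIN = 6 * 60   # the copy runs every 6 hours
ACTIVE_MIN = 20       # estimated time actually spent moving data
VM_COUNT = 37         # VMs in the copy job


def duty_cycle(active_min: float, window_min: float) -> float:
    """Fraction of the window during which data is actually being copied."""
    return active_min / window_min


def idle_per_vm(active_min: float, window_min: float, vm_count: int) -> float:
    """Average idle minutes per VM, assuming VMs are handled one after another."""
    return (window_min - active_min) / vm_count


if __name__ == "__main__":
    print(f"time spent copying: {duty_cycle(ACTIVE_MIN, WINDOW_MIN):.1%}")
    print(f"average idle time per VM: "
          f"{idle_per_vm(ACTIVE_MIN, WINDOW_MIN, VM_COUNT):.1f} min")
```

With these inputs the job is moving data for roughly 5.6% of the window and idling a bit over 9 minutes per VM on average, which lines up with the long "finalizing" waits seen on individual VMs.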
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Terribly slow copy jobs
Hi Rob,
As far as I can see, you've already done some testing, but so far it hasn't revealed any bottlenecks at the infrastructure level. The symptoms you describe are also uncommon, so this does not look like a known issue. I would not count on finding a permanent solution on the forum, since the chances of guessing the root cause are not high. The best way to deal with this is to open a support request and ask our engineers to examine the debug logs to clarify what is going on while the backup copy idles for a long time at 0 KB/s. Please share the support case ID for our reference.
Thanks!
-
- Enthusiast
- Posts: 96
- Liked: 13 times
- Joined: Oct 05, 2010 3:27 pm
- Full Name: Rob Miller
- Contact:
Re: Terribly slow copy jobs
OK, I already have a case open for another issue, so I will bring it up with them. Thanks.