Resource scheduler lock incapacitates VBR

Post by **mdiver** » Sep 06, 2022 11:15 am

In a customer environment we have the following setup:

100 Hyper-V hosts (on-host proxies, 2 threads each = 200 theoretical proxy threads, aka virtual disks to be backed up in parallel)

~900 VMs to be backed up daily and copied to a second fire zone - plus some SQL/Oracle log shipping

2 SOBRs consisting of 6 Windows extents each, with 4 threads per extent (24 repo threads per SOBR, aka 24 VMs to be backed up in parallel)

~40 primary backup jobs to one of the SOBRs

3 backup copy jobs from the primary to the other SOBR

The environment was sized according to ADO/VBRAD guidelines and for many months ran flawlessly and with good performance, with up to 24 VMs being backed up in parallel.

But in December last year for the first time, and now again last week, we observed the following:
Backup performance suddenly became very bad. Only 3-5 proxy threads were handled at the same time, though in theory the repo should be able to accommodate at least 24 VMs, depending of course on the number of vdisks each VM carries.
During the issue, backup copy jobs were slowly "dripping" to the other SOBR. Primary backup jobs could only back up 2-3 VMs at the same time, violating SLAs heavily.
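
To put those numbers side by side, here is a rough back-of-the-envelope sketch of the theoretical concurrency versus what we observed. It assumes a simplified model in which every disk processed in parallel needs one proxy thread and one repo thread; the variable and function names are made up for illustration and are not anything VBR exposes.

```python
# Sizing figures taken from the setup described above.
hyperv_hosts = 100
threads_per_onhost_proxy = 2
extents_per_sobr = 6
threads_per_extent = 4

# Theoretical slot counts.
proxy_slots = hyperv_hosts * threads_per_onhost_proxy        # 200
repo_slots_per_sobr = extents_per_sobr * threads_per_extent  # 24


def max_parallel_tasks(proxy_slots: int, repo_slots: int) -> int:
    """Simplified assumption: a task needs one proxy slot AND one repo slot,
    so the smaller pool is the bottleneck."""
    return min(proxy_slots, repo_slots)


def utilisation(observed_tasks: int, capacity: int) -> float:
    """Fraction of the theoretical capacity that was actually in use."""
    return observed_tasks / capacity


bottleneck = max_parallel_tasks(proxy_slots, repo_slots_per_sobr)  # 24

# Observed during the incident: only 3-5 tasks at the same time.
for observed in (3, 5):
    share = utilisation(observed, bottleneck)
    print(f"{observed} of {bottleneck} slots busy = {share:.1%} of capacity")
```

So in this model the repo is the limiting factor by a wide margin, and during the incident we were using only a small fraction even of that.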

For some reason it looks as if the resource scheduler was in a lockdown state, unable to distribute the threads in a timely manner.

Together with Veeam support (#05594646), no solution was found other than stopping the jobs and rebooting VBR. After that, all the resources were available again and the backups ran as fast as before.

The core reason was estimated to be the backup copy jobs not freeing up resources (repo threads) due to undefined scheduler issues with overlapping primary and backup copy jobs.
As a workaround, we suggested using the job schedules to lock the backup copy jobs out of the estimated primary backup window, to separate the resource consumption.
In theory, VBR should be able to handle them side by side, as backup copy has lower priority than primary backup.
Support was not able to determine the root cause, nor could they provide measures to prevent the issue from happening.
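
For anyone wanting to sanity-check such a schedule separation, a minimal sketch follows. It only tests whether two daily time windows overlap, including windows that run past midnight. The window times are hypothetical placeholders, not our real schedule, and nothing here talks to VBR.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def to_intervals(start: time, end: time) -> list[tuple[int, int]]:
    """Turn a daily window into half-open minute intervals within one day.

    A window whose end is not after its start is treated as wrapping past
    midnight (so start == end means the whole day).
    """
    s = start.hour * 60 + start.minute
    e = end.hour * 60 + end.minute
    if s < e:
        return [(s, e)]
    return [(s, MINUTES_PER_DAY), (0, e)]


def windows_overlap(a: tuple[time, time], b: tuple[time, time]) -> bool:
    """True if the two daily windows share at least one minute."""
    return any(
        s1 < e2 and s2 < e1
        for s1, e1 in to_intervals(*a)
        for s2, e2 in to_intervals(*b)
    )


# Hypothetical windows, for illustration only.
primary_window = (time(20, 0), time(6, 0))    # primary backups overnight
copy_window_ok = (time(7, 0), time(19, 0))    # copy jobs during the day
copy_window_bad = (time(5, 0), time(19, 0))   # starts before primaries end

print(windows_overlap(primary_window, copy_window_ok))   # False
print(windows_overlap(primary_window, copy_window_bad))  # True
```

This only verifies the schedule on paper. A job that starts inside its window and overruns can still collide with the primary backups.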

Has anyone else seen something similar?

Thanks.

Post by **RomanK** » Sep 06, 2022 11:46 am

Hello mdiver,

> Support was not able to determine the root cause

Did you know that you can escalate the ticket (the talk to manager button within the support portal) if you are not satisfied with the quality or time of response?

I can see that #05594646 is marked as closed and the issue as resolved. To me, rebooting VBR doesn't look like a solution. I think you could ask to reopen this case, or create a new one, and request a detailed root cause analysis.

Thanks
