possible SureBackup bottlenecks

Post by **pirx** » Nov 05, 2021 11:01 am

We are struggling to implement our first SureBackup job in a small cluster. We tried to verify 3-4 VMs in parallel, but this leads to timeout failures. Most of the time the same VMs fail, not different ones in each run. When those VMs are verified with the troubleshooting option, or if only 1 VM is allowed to run, the test is successful. We already extended the timeout to 1200 seconds (20 minutes).

05.11.2021 08:29:41          Waiting for OS to boot for up to 1200 seconds (stable IP algorithm)...
05.11.2021 08:29:41          Note: Will proceed to the next step at 05.11.2021 08:49:41 or earlier
05.11.2021 08:49:41 Error    Results: Cannot detect VM starting because of timeout
05.11.2021 08:49:41          Error: Results: OS did not boot in the allotted time

We have a SOBR on Linux XFS with Apollos, physical mount hosts, and 10GbE. No other jobs are running at that time. On Linux I used historical data to check extent latency, and it does not show more than ~8 ms. I'll try to check this in real time next, but I can't imagine that the repository performance is too low to start 3-4 VMs in a decent time.
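
For the real-time latency check mentioned above, here is a minimal sketch of one way to sample it on a Linux repository. This is an assumption about how such a check could be done, not the method used in this thread: it reads `/proc/diskstats` twice and divides the extra milliseconds spent on I/O by the extra I/Os completed, which is roughly what `iostat` reports as await. The function names and the 5-second interval are illustrative only.

```python
import time


def parse_diskstats(text):
    """Return {device: (ios_completed, ms_spent)} from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 11:
            continue  # skip short or malformed lines
        name = fields[2]
        reads, ms_reading = int(fields[3]), int(fields[6])
        writes, ms_writing = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, ms_reading + ms_writing)
    return stats


def average_latency_ms(before, after):
    """Average milliseconds per completed I/O between two snapshots, per device."""
    result = {}
    for name, (ios_after, ms_after) in after.items():
        if name not in before:
            continue
        ios_before, ms_before = before[name]
        delta_ios = ios_after - ios_before
        if delta_ios > 0:  # idle devices are left out instead of dividing by zero
            result[name] = (ms_after - ms_before) / delta_ios
    return result


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s seconds apart and compare them."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return average_latency_ms(before, after)


if __name__ == "__main__":
    for device, latency in sorted(sample().items()):
        print(f"{device}: {latency:.2f} ms per I/O")
```

Running this on the extent while the SureBackup job is powering on its VMs would show whether latency during the parallel boot differs from the historical average.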

A colleague opened case 05102692 for this.

Post by **Gostev** » Nov 05, 2021 11:52 am

Yeah, you can just open the VM console from the SureBackup session and see that the VM booted up correctly (and likely very fast with the storage you have). This will limit the required troubleshooting to detection issues (VMware Tools, networking, etc.).

Post by **pirx** » Nov 05, 2021 12:20 pm

That's the thing: we see in the console that the VM is not booting, with no real progress. Let's see what support can find out. It's just strange that this depends so much on the number of VMs. When it is the only VM, it boots pretty quickly and the test is successful.

Post by **Gostev** » Nov 05, 2021 1:06 pm

This would point to lack of RAM on the mount server.

Post by **pirx** » Nov 05, 2021 4:55 pm

I doubt that: the server has 128 GB RAM and the VMs that fail are rather small. With longer timeouts we now have 2 VMs that still fail: one that only fails with parallel processing, and one (vROps) that always fails. In troubleshooting mode I can see that the vROps VM boots without any problems and very quickly. I can log in and ping the gateway (helper appliance), but the job still fails with "not reachable". SureBackup is really a tough one.

Post by **Gostev** » Nov 06, 2021 1:04 am

There's nothing tough about SureBackup. But it does not carry magic that allows for establishing network connections to unreachable hosts.

We just need to understand why this particular machine is unreachable when others are not. Should not be hard to troubleshoot, I hope...

Post by **pirx** » Nov 18, 2021 3:44 pm

After some weeks of debugging... We were trying to find the root cause of why the vROps VM always fails. The problem is that the SureBackup job starts the ping test immediately once VMware Tools are running and an IP is displayed in the vSphere Client. However, this VM is still not reachable for a couple of minutes after that (maybe iptables rules, I don't know). We added a larger timeout of 20 minutes, but this timeout is not honored because it only applies to the time until VMware Tools are up. So once VMware Tools are up, there is no way to tell Veeam that it should wait a little longer. The only option here is to disable the ping test. The VM is reachable after ~8-10 minutes.
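
To make the missing behavior concrete, here is a minimal sketch of a ping check that keeps retrying for a grace period after the guest reports an IP, instead of failing on the first attempt. It is not how SureBackup works internally and uses no Veeam API. `ping_once`, `wait_until_reachable`, and the default values are hypothetical names and numbers chosen for illustration, and `ping_once` assumes the Linux `ping` utility with its `-c` and `-W` options.

```python
import subprocess
import time


def ping_once(host, wait_s=2):
    """Send one ICMP echo with the Linux ping utility; True if a reply arrived."""
    completed = subprocess.run(
        ["ping", "-c", "1", "-W", str(wait_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0


def wait_until_reachable(probe, grace_s=600.0, interval_s=15.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Retry probe() until it succeeds or roughly grace_s seconds have passed.

    Returns the seconds waited on success, or None if the grace period ran out.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    deadline = start + grace_s
    while True:
        if probe():
            return clock() - start
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

With something like `wait_until_reachable(lambda: ping_once("192.0.2.10"))` (a placeholder address), a VM that only answers after 8-10 minutes would still pass as long as the grace period is longer than that delay. The same loop could be run by hand against the VM in troubleshooting mode to measure how long it really stays unreachable after VMware Tools come up.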

And we still have 2 VMs that fail, and thus the whole job fails, even though we have disabled the heartbeat and ping tests because we know both VMs have issues. And I'm pretty sure that all our backup jobs contain some VMs like those (jobs have 100-150 VMs).

So for me, SureBackup is a tough one where only magic can help prevent running into errors.
