- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
Re: Health check on large backups
The real world is almost always more complex than best practices. The easy solution would be migrating repositories to SSD (effectively unlimited IO), but for most customers that's far too expensive for multi-hundred-TB repositories.
In this particular case, the blocking health check was running on an inbound backup copy job (~50 TB total), so it's less time-critical than the backup jobs. Under normal circumstances, backup jobs complete in about an hour and backup copies in 2-3 hours (partially in parallel with the backups; the main bottleneck is the intersite bandwidth cap). But with datasets this large, at some point it becomes too hard to split them into smaller jobs.
If the health check runs asynchronously from the main processing in v12, some slowdown might be acceptable. After all, the main objective is not to block the primary backup jobs. Once that goal is met, health checks can run for much longer; even the old synchronous read behavior might be acceptable in most cases.
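One way to picture "async from main processing" is a job that reports completion as soon as its backup data is written, with the health check scheduled in the background so the next job is not held up behind it. The following is a minimal asyncio sketch of that decoupling only, assuming hypothetical `backup_job`, `health_check` and `run_job` coroutines where sleeps stand in for real work; it is not a description of how Veeam v12 actually schedules health checks.

```python
import asyncio


async def backup_job(name: str, log: list) -> None:
    await asyncio.sleep(0.01)  # hypothetical stand-in for the actual backup
    log.append(f"{name}: backup done")


async def health_check(name: str, log: list) -> None:
    await asyncio.sleep(0.05)  # hypothetical stand-in for long verification reads
    log.append(f"{name}: health check done")


async def run_job(name: str, log: list, background: list) -> None:
    await backup_job(name, log)
    # Schedule the health check without awaiting it, so this job (and the
    # next one in the chain) is not blocked by the verification reads.
    # Keeping the task in `background` also prevents it from being garbage-collected.
    background.append(asyncio.create_task(health_check(name, log)))
    log.append(f"{name}: job returned")


async def main() -> list:
    log, background = [], []
    for name in ("job1", "job2"):
        await run_job(name, log, background)
    await asyncio.gather(*background)  # health checks finish on their own time
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

In this sketch both jobs return before either health check finishes, which is the property the post is after: a slower health check only delays itself, not the backup chain.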
- Chief Product Officer
- Posts: 31805
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Health check on large backups
We will test lowering the I/O priority of the health check process and go from there based on the results. The change itself looks pretty simple, and we might be able to add it as an option. Apparently we already have such an option in Veeam Agent for Windows (I always thought it only affected the process priority, but apparently we set both the process and I/O priorities to Low there).
Thanks a lot for bringing this up!
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
Re: Health check on large backups
If I had a nickel for every Veeam bug/feature deficiency/obscure edge case, I'd have quite a few dollars by now.
Simply lowering the IO priority is not that simple, as it presumes that the storage system has nothing else to do (e.g. the current queue depth for normal-priority processes is 0). If you have some other relatively low-IO workload running, the health check is still heavily choked.
Coming back to this particular case, I eventually had to revert the IO priority to normal because the background S3 offload was generating enough IO to keep the health check limping along at such low throughput that it would likely have taken weeks to complete. That's why I suggested periodically switching between the two priorities: if the storage has nothing else to do, switching would likely have no effect, but otherwise the health check would regularly back off and give other tasks a chance to make progress. Naive and not very elegant, but maybe your engineers can come up with something better.
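The alternating-priority idea can be illustrated with a toy model. This is a minimal sketch under deliberately simplified assumptions, not how Veeam or any real OS I/O scheduler behaves: the storage serves a fixed number of I/Os per tick (`capacity`), low-priority I/O only gets what normal-priority work leaves over (strict priority), and two normal-priority streams split the array evenly. The function names `health_check_share`, `priority_at` and `run` are hypothetical.

```python
# Toy model (hypothetical): strict-priority storage vs. a duty-cycled health check.
LOW, NORMAL = "low", "normal"


def health_check_share(capacity: int, background_demand: int, priority: str) -> int:
    """I/Os per tick granted to a health check that always has reads queued.

    Assumes the background task (e.g. an S3 offload) runs at normal priority.
    """
    if priority == LOW:
        # Strict priority: low-priority I/O only gets the leftover capacity.
        return max(0, capacity - background_demand)
    # Same priority: assume an even split; any share the background task
    # does not use flows to the health check.
    return capacity - min(background_demand, capacity // 2)


def priority_at(tick: int, period: int, normal_ticks: int) -> str:
    """Duty cycle: NORMAL for the first `normal_ticks` of every `period`, LOW otherwise."""
    return NORMAL if tick % period < normal_ticks else LOW


def run(capacity: int, background_demand: int, ticks: int, policy) -> int:
    """Total health-check I/Os over `ticks`, where policy(tick) picks the priority."""
    return sum(
        health_check_share(capacity, background_demand, policy(t)) for t in range(ticks)
    )


policies = {
    "always low": lambda t: LOW,
    "always normal": lambda t: NORMAL,
    "1-in-4 normal": lambda t: priority_at(t, period=4, normal_ticks=1),
}

for bg in (100, 0):  # saturating background load vs. an idle array
    print(bg, {name: run(100, bg, 100, p) for name, p in policies.items()})
```

With a saturating background load (capacity 100, background demand 100, 100 ticks), a permanently low-priority health check gets 0 I/Os, a permanently normal one gets 5000, and the 1-in-4 duty cycle gets 1250 while the background task keeps the whole array 3 ticks out of 4. With an idle array (background demand 0), all three policies give the health check the full 10000, matching the point that switching has no effect when the storage has nothing else to do.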
- Chief Product Officer
- Posts: 31805
- Liked: 7298 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Health check on large backups
At this point, anything more complex than just setting a lower I/O priority would definitely have to wait until after V12.
- Service Provider
- Posts: 372
- Liked: 120 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
Re: Health check on large backups
Then this sounds like something to put behind an undocumented registry key, like many other edge cases (whichever way the default goes).