DonZoomik
Service Provider
Posts: 368
Liked: 120 times
Joined: Nov 25, 2016 1:56 pm
Full Name: Mihkel Soomere

Re: Health check on large backups

Post by DonZoomik »

The real world is almost always more complex than best practices... The easy solution would be migrating repositories to SSD (effectively unlimited IO), but that's far too expensive for most customers with multi-hundred-TB repositories.
In this particular case, the blocking health check was running on an inbound backup copy job (~50TB total), so it's less time-critical than the backup jobs. Under normal circumstances backup jobs complete in about an hour and backup copies in 2-3h (partially in parallel with the backups, mostly bottlenecked by the intersite bandwidth cap). But as the datasets are large, at some point it gets too hard to cut them down into smaller jobs.

If the health check runs asynchronously from main processing in v12, some slowdown might be acceptable. After all, the main objective is to not block the primary backup jobs. Once that goal is accomplished, health checks can run for much longer - even the old sync read behavior might be acceptable in most cases.
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Health check on large backups

Post by Gostev »

We will test lowering the I/O priority of the health check process and go from there based on the results. The change itself looks pretty simple and we might be able to add that as an option. Apparently we already have such an option in the Veeam Agent for Windows (I always thought it only affected the process priority, but apparently we set both process and I/O priorities to Low there).

Thanks a lot for bringing this up!
DonZoomik

Re: Health check on large backups

Post by DonZoomik »

If I had a nickel for every Veeam bug/feature deficiency/obscure edge-case, I'd have quite a few dollars by now. :lol:

Simply lowering the IO priority is not that simple, as it presumes that the storage system has nothing else to do (e.g. the current queue depth for normal-priority processes is 0). If you have some other, even relatively low-IO, stuff running, the health check is still heavily choked.
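To illustrate the starvation effect, here is a toy model in Python. It is only a sketch under a strong assumption: a low-priority request is dispatched only in a tick where the normal-priority queue depth is 0. The function name and the tick-based model are made up for illustration and do not describe how Veeam or any OS I/O scheduler is actually implemented.

```python
def low_priority_ticks(normal_queue_depths):
    """Count ticks in which a low-priority request could be dispatched.

    Assumption of this toy model: low-priority I/O is only served in a
    tick where no normal-priority request is queued (depth == 0).
    """
    return sum(1 for depth in normal_queue_depths if depth == 0)


# Hypothetical workloads, one queue-depth sample per tick.
idle = [0] * 10                 # storage has nothing else to do
trickle = [1] * 10              # light but constant normal-priority IO
bursty = [8, 0, 0, 0, 0] * 2    # heavy bursts with idle gaps in between

print(low_priority_ticks(idle), low_priority_ticks(trickle), low_priority_ticks(bursty))
```

In this model a light but constant trickle starves the low-priority task completely, while a much heavier but bursty load still leaves it most of the ticks.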
Coming back to this particular case, I eventually had to revert the IO priority to normal because the background S3 offload was generating enough IO to keep the health check limping along at such low throughput that it would likely have taken weeks to complete. That's why I suggested occasionally switching between the two priorities: the switching would likely have no effect if the storage has nothing else to do, but otherwise the health check would back down regularly to give other tasks a chance to make progress. Naive and not very elegant, but maybe your engineers can come up with something better.
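A rough Python sketch of that switching idea, again as a toy model and not anything Veeam implements. Assumptions: time is split into ticks, the health check alternates between a fixed number of low-priority ticks and a fixed number of normal-priority ticks, it always gets one request through at normal priority, and at low priority it only gets through when the normal-priority queue depth is 0. All names and numbers are hypothetical.

```python
from itertools import cycle


def priority_schedule(low_ticks, normal_ticks):
    """Endlessly repeat: `low_ticks` at low priority, then `normal_ticks` at normal."""
    return cycle(["low"] * low_ticks + ["normal"] * normal_ticks)


def health_check_progress(normal_queue_depths, low_ticks, normal_ticks):
    """Count ticks in which the health check gets a request served.

    Toy assumptions: at normal priority it always makes progress; at low
    priority it only makes progress when no normal-priority IO is queued.
    """
    schedule = priority_schedule(low_ticks, normal_ticks)
    progress = 0
    for depth, priority in zip(normal_queue_depths, schedule):
        if priority == "normal" or depth == 0:
            progress += 1
    return progress


trickle = [1] * 100  # constant light IO from another task, e.g. an offload
idle = [0] * 100     # storage has nothing else to do

always_low = health_check_progress(trickle, low_ticks=1, normal_ticks=0)
switching = health_check_progress(trickle, low_ticks=9, normal_ticks=1)
switching_idle = health_check_progress(idle, low_ticks=9, normal_ticks=1)
print(always_low, switching, switching_idle)
```

With a constant trickle of other IO, the always-low variant never progresses in this model, while the 9:1 switching variant gets a guaranteed tenth of the ticks. On idle storage the switching makes no difference.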
Gostev

Re: Health check on large backups

Post by Gostev »

Anything more complex than just setting a lower I/O priority would have to wait until after V12 for sure at this point.
DonZoomik

Re: Health check on large backups

Post by DonZoomik »

Then it sounds like something to put behind an undocumented registry key, like many edge cases (whichever the default would be).