Backup of enterprise applications (Microsoft stack, IBM Db2, MongoDB, Oracle, PostgreSQL, SAP)
JGranden
Novice
Posts: 9
Liked: 13 times
Joined: Jun 07, 2022 7:36 pm
Full Name: Jesse Granden

[Feature Request] MS SQL Plugin make -Parallelism deterministic: Case # 07245426

Post by JGranden » 2 people like this post

Follow-up feature request (FR) discussion for Case # 07245426.

TL;DR: The -Parallelism parameter may be reduced to the number of task slots available on the repository, e.g. -Parallelism=8 may be reduced to 6, 2, or 1 depending on other tasks.

Additional thoughts regarding the feature request for guaranteeing parallelism:

The two motivating factors for guaranteeing a specific number of backup streams are:
1. Ensure backups complete within a designated backup window. This avoids negative production impacts.
2. Ensure restores can complete within a specific amount of time. This allows us to hit our RTO.

For a multi-TB database, both of these points become crucial: we've seen backup times go from ~2 hours to ~12 hours, with analogous potential increases in restore time. Having a 10+ hour variance in RTO is unacceptable.

The behavior we want differs from job to job:
  • Sometimes we are OK with fewer streams, even 1.
  • Sometimes we want to wait for slots to become available.
  • Sometimes we would be OK with fewer than requested as long as we get a set minimum.
  • Sometimes we want to fail if a backup can't be taken with the requested number of streams.
(Failing a backup job fast has advantages for monitoring and/or taking alternate actions, e.g. taking a DIFF instead of a FULL.)

Making the -Parallelism parameter behave as documented would be a breaking change.
Two additional parameters are needed to accomplish this without changing existing behavior:

-MinParallelism -- Required minimum number of streams to run the job (defaults to 1).
-ResourceWaitTimeout -- Minutes to wait for resources to become available (defaults to infinite); after that, error out.

Adding these two parameters would give customers the ability to ensure they stay within their backup window and hit specific RTO objectives. Existing behavior could remain unchanged, as it is equivalent to -MinParallelism=1 and -ResourceWaitTimeout=infinite.

Alternatively, you could modify the existing Parallelism parameter to actually do what it is documented to do, but we would still need the new timeout parameter.

Some special cases, like specifying -MinParallelism=48 when the repo only has 10 slots, should fail fast instead of waiting forever.
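To make the proposed semantics concrete, here is a minimal sketch of how the two parameters could interact. It is not Veeam code: `acquire_streams`, `free_slots`, `total_slots`, and `InsufficientTaskSlots` are hypothetical names invented for illustration, and the polling loop is only an assumption about how waiting might work.

```python
import math
import time
from typing import Callable


class InsufficientTaskSlots(RuntimeError):
    """Hypothetical error: the required minimum number of streams is unavailable."""


def acquire_streams(
    requested: int,                       # models -Parallelism
    free_slots: Callable[[], int],        # hypothetical probe for free repo task slots
    total_slots: int,                     # task slot limit configured on the repo
    min_parallelism: int = 1,             # models the proposed -MinParallelism
    wait_timeout_min: float = math.inf,   # models the proposed -ResourceWaitTimeout
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the number of streams to use, or raise if the minimum cannot be met."""
    if not 1 <= min_parallelism <= requested:
        raise ValueError("min_parallelism must be between 1 and requested")

    # Fast fail: the repo can never satisfy the minimum, so waiting is pointless.
    if min_parallelism > total_slots:
        raise InsufficientTaskSlots(
            f"minimum of {min_parallelism} streams exceeds repo limit of {total_slots}"
        )

    deadline = clock() + wait_timeout_min * 60  # stays infinite for the default
    while True:
        available = free_slots()
        if available >= min_parallelism:
            # Take as many streams as possible, up to the requested number.
            return min(requested, available)
        if clock() >= deadline:
            raise InsufficientTaskSlots("insufficient repository task slots are available")
        sleep(poll_interval_s)
```

Under these assumptions, the four cases listed earlier map onto parameter values: the defaults (minimum 1, infinite wait) reproduce today's behavior; a minimum between 1 and the requested count gives "fewer is OK as long as we get a set minimum"; a minimum equal to the requested count with an infinite wait gives "wait for slots"; and the same minimum with a timeout of 0 gives "fail if the requested streams are not available right now".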

Final thoughts:
As an enterprise DBA, I need consistency and predictability for backup jobs (and hence control over parallelism). It's much easier for me to troubleshoot backup job failures than random variations in backup time caused by the complex interplay of whatever else happens to be using task slots at the time I kick off a SQL backup. A simple error message such as "insufficient repository task slots are available" would probably have eliminated the need for this support case.
PetrM
Veeam Software
Posts: 3626
Liked: 608 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic

Re: [Feature Request] MS SQL Plugin make -Parallelism deterministic: Case # 07245426

Post by PetrM »

Hi Jesse,

You made so many interesting requests today! There is no error in this case because I thought it would be better to have a slow backup than to not have a backup at all. Looks like you have the opposite point of view. I noted your request but I cannot comment on ETA.

May I ask what the reason is for the issue with the task slots on the repository? Why were there not enough? Is there a chance to use a dedicated repository for plug-in backups?

Thanks!
JGranden

Re: [Feature Request] MS SQL Plugin make -Parallelism deterministic: Case # 07245426

Post by JGranden » 1 person likes this post

I've been saving up requests while we've been onboarding Veeam. I promise it will slow down after today :)

These VBR hosts are dedicated to SQL. Our main issue was that our standard backup job kicked off everything at approximately the same time (oops!). We then got confusing results: specifying 8 streams resulted in "waits for backup resources" followed by 1, 2, 3, 4, or sometimes 6 or 8 streams actually being used.

This Veeam environment has ~120 SQL Servers spread evenly across 2 VBR servers, each hosting 5 repos (dedicated volume per repo), so about 10-12 SQL Servers per repo. We want to run 8 backup streams on a handful of multi-TB databases, and fewer streams on our smaller databases. The Wasabi SOBR tiering jobs kick off and tie up slots as well.

The first thing we did was to re-schedule things to avoid multiple SQL servers backing up to the same repo at the same time. This made the biggest improvement.
Also, we increased the number of task slots to 12 per repo (to allow for the big 8-stream backups plus some extra for SOBR tiering). This has resolved our issues.

At first glance, we could potentially have 60 tasks running concurrently, but it's usually more like 8-12, and we're getting good CPU utilization on the VBR servers, so we're happy.
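For anyone sizing a similar setup, the numbers above work out roughly as follows. This is only back-of-the-envelope arithmetic using the figures from this thread, and it reads the 60 concurrent tasks as a per-VBR-server figure (5 repos x 12 slots), which is an assumption.

```python
# Figures taken from this thread; variable names are for illustration only.
sql_servers = 120
vbr_servers = 2
repos_per_vbr = 5
slots_per_repo = 12
large_backup_streams = 8

repos_total = vbr_servers * repos_per_vbr
servers_per_repo = sql_servers // repos_total

# Upper bound on concurrent tasks per VBR server (assumed reading of "60").
max_tasks_per_vbr = repos_per_vbr * slots_per_repo

# Slots left on a repo for smaller backups and SOBR tiering
# while one 8-stream backup is running.
headroom_per_repo = slots_per_repo - large_backup_streams

# How many 8-stream backups a single repo can run at full parallelism at once.
large_backups_per_repo = slots_per_repo // large_backup_streams

print(servers_per_repo, max_tasks_per_vbr, headroom_per_repo, large_backups_per_repo)
```

With 12 slots per repo, only one 8-stream backup fits on a repo at full parallelism at a time, which is why staggering the schedule mattered as much as raising the slot count.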

So, in summary: a sub-optimal job scheduling configuration combined with not enough task slots.
PetrM

Re: [Feature Request] MS SQL Plugin make -Parallelism deterministic: Case # 07245426

Post by PetrM »

It's crystal clear now, thanks again. By the way, I saw a similar question a few weeks ago. I agree with all the arguments; it fully makes sense to fine-tune our task scheduling algorithms in the future.

Thanks!