maanlicht
Enthusiast
Posts: 32
Liked: 6 times
Joined: Apr 05, 2023 1:06 pm
Full Name: maanlicht

BackupCopy jobs fail after 6x onsite backups

Post by maanlicht

Dear Veeam community,

The locations I have to protect with my B&R environment are very remote, poorly connected sites. Veeam has been great at making use of these high-latency connections. Even so… the laws of physics dictate that the initial offsite copy will take several weeks to complete. This is perfectly acceptable and understandable under these conditions.
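For a rough sense of why "several weeks" is realistic, the sketch below estimates seeding time from backup size and link speed. The 2 TB size, the 10 Mbit/s link, and the flat 80% efficiency factor are hypothetical example values, not figures reported in this thread; real throughput on high-latency links will vary.

```python
def seeding_days(full_size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate days needed to push an initial full backup over a WAN link.

    Assumptions: decimal units (1 GB = 1e9 bytes, 1 Mbit/s = 1e6 bit/s) and a
    flat efficiency factor standing in for protocol overhead and contention.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = full_size_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    # Hypothetical example: 2 TB full backup over a 10 Mbit/s link
    print(f"{seeding_days(2000, 10):.1f} days")
```

Doubling the data or halving the effective throughput doubles the estimate, so a multi-terabyte full over a few Mbit/s quickly runs into weeks.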

While the BackupCopy job is running, it gets briefly ‘paused’ because the daily onsite job takes priority and locks the files. Every time this happens the BackupCopy job halts… nicely waits for the onsite job to finish… and then continues. This repeats every day. However, after six interruptions the BackupCopy job consistently fails with the error: ‘Source restore point is locked by another job’

I ran some experiments, and it seems the job fails after a fixed number of continuations. If I halve the frequency of the primary job, the BackupCopy job runs twice as long, but it still fails at the 6th interruption. This seems to be by design, and I observe it in multiple B&R instances.

As a workaround I can simply disable the primary job, but I feel it’s unacceptable to go without new backups for weeks.
My question to the community or any Veeam tech is: is there a way to change the number of retries before the copy job fails?

PS: As a side note, the same thing happens at sites with frequent short internet outages. Copy jobs tolerate exactly 6 outages before the BackupCopy job fails. This appears to be the same mechanism.
PPS: The job schedules are set to retry 3 times.
Mildur
Product Manager
Posts: 10976
Liked: 3014 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland

Re: BackupCopy jobs fail after 6x onsite backups

Post by Mildur

Hi Maanlicht

Before I check the technical details: have you considered seeding the copy job? Seeding should be much faster than waiting several weeks for the job to finish.
https://helpcenter.veeam.com/docs/backu ... ml?ver=120

Best,
Fabian
Product Management Analyst @ Veeam Software

Re: BackupCopy jobs fail after 6x onsite backups

Post by maanlicht

Yes, I have, and it's a valid workaround in some use cases. However, for some of these locations shipping hard disks has proven impractical and expensive, and it also takes several weeks, especially in remote areas of Africa. Unfortunately, DHL doesn't go everywhere.
tyler.jurgens
Veeam Software
Posts: 441
Liked: 260 times
Joined: Apr 11, 2023 1:18 pm
Full Name: Tyler Jurgens

Re: BackupCopy jobs fail after 6x onsite backups

Post by tyler.jurgens

This is what I've done in the past for copying across slow links.

Backup Jobs - Set to forever forward incremental. Make the retention policy long enough for the backup copy job to finish, so that the original full backup isn't merged/removed before the backup copy job has completed at least the initial full backup.
Backup Copy Jobs - Make sure these are in "immediate" mode. Set the retention policy to many days (and no GFS). I like immediate mode backup copy jobs because it doesn't restart periodically, making it easier for that looooooong time seeding to happen. This also means once it finishes with the full backup, it will start working through the incrementals.
Do not use Reverse Incrementals!

Once you get that fully seeded, you can adjust your retention policies on your Backup Jobs and Backup Copy Jobs to the retention policy you like. Enable synthetic fulls and GFS as required. The goal of the above is to limit any impact on the backup or backup copy jobs during the initial seeding process. Since Veeam can modify the retention policy 'on the fly', you can always fix it later and Veeam will generate the new chain to match what you set after, cleaning up the old chain once your new chain is built.
Tyler Jurgens
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @explosive.cloud

Re: BackupCopy jobs fail after 6x onsite backups

Post by maanlicht

@tjurgens-s2d
Yes, I can confirm the method you describe is indeed the best practice for slow BackupCopy jobs. I had to learn these lessons the hard way; hopefully someone in the future can benefit from this experience. However, note that the setup described in the original post already has these optimizations implemented. Even so, the copy jobs still get interrupted when the primary job runs, and they only tolerate 6 interruptions.
Once interrupted, the job restarts and gets stuck in an infinite retry loop.

Re: BackupCopy jobs fail after 6x onsite backups

Post by tyler.jurgens

I've only seen the copy job get interrupted due to the primary job running and merge operations happening. Are you certain the backup job is still creating incremental files and not merging the oldest into the full backup?

Re: BackupCopy jobs fail after 6x onsite backups

Post by maanlicht

@tjurgens-s2d
I think we are talking about the same thing here. The interruptions by the primary job that you mention are exactly what I mean. Veeam only tolerates 6 of these interruptions before the copy job fails. In practice this means a copy job cannot run for longer than 6 days if the primary backup runs daily.

Re: BackupCopy jobs fail after 6x onsite backups

Post by tyler.jurgens

Right. So push the retention policy on your primary backup job out so that it keeps more incremental backups (no synthetic fulls, no active fulls, no GFS). Make it something crazy, like an additional 90 days on top of your existing retention. You want to avoid that interruption ever happening, because once it starts, your job will never succeed. It can't succeed once the interruptions start, no matter how many retries you allow, because the original full backup no longer exists to be copied: it gets transformed into a new VBK when the incrementals start merging. For example, if you have a 30-day forever forward backup job, make it a 120-day forever forward backup job.
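The retention reasoning above can be sketched as a simple check. This is a simplified model, not Veeam's internal retention logic: it assumes a forever forward incremental chain with retention counted in restore points, one backup run per interval, and that the first merge into the original full happens on the run that would push the chain past its retention. The function names and the 45-day seeding estimate are hypothetical.

```python
import math


def days_until_first_merge(retention_points: int, interval_days: float = 1.0) -> float:
    """Days after the initial full until the chain first exceeds retention.

    Assumption: the full is restore point 1, each run adds one incremental,
    and the merge that rewrites the full happens on the run that would
    create restore point retention_points + 1.
    """
    return retention_points * interval_days


def original_full_survives(retention_points: int, seed_days: float,
                           interval_days: float = 1.0) -> bool:
    """True if the initial copy finishes before the original full is rewritten."""
    return seed_days < days_until_first_merge(retention_points, interval_days)


def minimum_retention(seed_days: float, interval_days: float = 1.0) -> int:
    """Smallest retention (in restore points) that outlives the seeding period."""
    return math.floor(seed_days / interval_days) + 1


if __name__ == "__main__":
    seed_days = 45  # hypothetical seeding estimate in days
    for retention in (30, 120):
        print(retention, original_full_survives(retention, seed_days))
    print("minimum:", minimum_retention(seed_days))
```

Adding generous headroom, as in the 30 → 120 day example, keeps a wide margin over this minimum in case the seeding estimate turns out to be optimistic.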

Fix the retention on your backup job *after* your backup copy job has caught up, assuming it can at least push the incremental delta across the wire in between backup job runs (e.g., if you run your backup daily, make sure the backup copy of that incremental can complete within that day).
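The condition that each incremental must fit between backup runs can be checked with the same kind of arithmetic. Sketch only: the delta sizes, the 10 Mbit/s link, and the flat 80% efficiency factor are hypothetical, and real incremental sizes vary from day to day.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to move size_gb over the link (decimal units, flat efficiency factor)."""
    bits = size_gb * 1e9 * 8
    return bits / (link_mbps * 1e6 * efficiency) / 3600


def incremental_fits(delta_gb: float, link_mbps: float,
                     interval_hours: float = 24.0, efficiency: float = 0.8) -> bool:
    """True if one incremental can be copied before the next backup job run."""
    return transfer_hours(delta_gb, link_mbps, efficiency) <= interval_hours


if __name__ == "__main__":
    for delta in (5, 100):  # hypothetical daily change sizes in GB
        print(delta, round(transfer_hours(delta, 10), 1), incremental_fits(delta, 10))
```

If the daily delta does not fit, the copy job falls further behind every day regardless of retention, so this is worth checking before shrinking retention back down.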
mcz
Veteran
Posts: 948
Liked: 223 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria

Re: BackupCopy jobs fail after 6x onsite backups

Post by mcz

btw, a stopped BCJ (backup copy job) should continue on the next run from the point where it stopped, assuming the source restore point still exists...
Contact:

Re: BackupCopy jobs fail after 6x onsite backups

Post by Mildur

Hello @maanlicht

I talked to our QA team. We do indeed have a limit on task retries when source backup files are locked.

In version 12, a periodic backup copy session will continue copying a restore point until it is successfully copied. If the copy job is interrupted by the source job due to locked backup files, the copy job will pause the copy session until the backup files are released. If this interruption occurs multiple times, the copy job will fail, and a notification will be sent out.

This behaviour is intentional as it serves the purpose of informing the backup administrator that there was insufficient time to copy restore points between the scheduled times of the copy job.
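The pause/resume/fail behaviour described above can be illustrated with a small simulation. This is a hypothetical model, not Veeam's implementation: it assumes the copy starts right after a primary run, makes progress only while the source files are unlocked, counts every primary run as one interruption, and fails on interruption number max_retries + 1. With max_retries=5 (the default listed in the registry key) that is the sixth interruption, matching what was reported at the start of this thread. The 2-hour lock duration and 500-hour workload are arbitrary example values.

```python
from dataclasses import dataclass


@dataclass
class CopyOutcome:
    succeeded: bool
    interruptions: int
    elapsed_hours: float  # hours since the copy session started


def simulate_copy(work_hours: float, interval_hours: float = 24.0,
                  lock_hours: float = 2.0, max_retries: int = 5) -> CopyOutcome:
    """Hypothetical model of a copy session paused by a locking primary job.

    work_hours is the unlocked transfer time the initial copy needs.
    """
    if not 0 <= lock_hours < interval_hours:
        raise ValueError("lock_hours must be in [0, interval_hours)")
    window = interval_hours - lock_hours  # copy time between primary runs
    remaining = work_hours
    elapsed = 0.0
    interruptions = 0
    while remaining > window:
        # Copy until the next primary run starts and locks the files.
        remaining -= window
        elapsed += window
        interruptions += 1
        if interruptions > max_retries:
            return CopyOutcome(False, interruptions, elapsed)
        # Pause while the primary job holds the lock, then resume.
        elapsed += lock_hours
    return CopyOutcome(True, interruptions, elapsed + remaining)


if __name__ == "__main__":
    print(simulate_copy(500))                     # daily primary job
    print(simulate_copy(500, interval_hours=48))  # primary every 2 days
    print(simulate_copy(500, max_retries=30))     # raised retry limit
```

In this model, halving the primary job frequency roughly doubles the time until failure but the failure still lands on the sixth interruption, which is the pattern observed in the experiments earlier in the thread.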

To configure the maximum number of retries, you can use a registry key. Please don't set it too high. If you are only affected during the initial copy, please remove the key after your first backup has been copied successfully.

Path: HKLM\SOFTWARE\Veeam\Veeam Backup and Replication
Name: BackupSyncMaxRetriesPerOib
Type: DWORD
Value: x (Default 5)
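Rather than picking an arbitrarily large value, one rough approach is to estimate how many interruptions the initial copy will see and add a little headroom. The sketch below assumes the copy only progresses while files are unlocked (interval minus lock time per cycle), that every primary run is one interruption, and that the registry value equals the number of tolerated interruptions (default 5, failing on the 6th). Those assumptions, the helper names, and the 500-hour workload are hypothetical; only the key path and value name come from the post above.

```python
import math

# Key and value name as posted above; the .reg format needs the full hive name.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication"
VALUE_NAME = "BackupSyncMaxRetriesPerOib"


def retries_needed(work_hours: float, interval_hours: float = 24.0,
                   lock_hours: float = 2.0, headroom: int = 0) -> int:
    """Estimate interruptions the initial copy will see, plus optional headroom.

    Hypothetical model: progress only while files are unlocked and one
    interruption per primary run.
    """
    window = interval_hours - lock_hours
    if window <= 0:
        raise ValueError("lock_hours must be shorter than interval_hours")
    return max(0, math.ceil(work_hours / window) - 1) + headroom


def reg_file(value: int) -> str:
    """Render a .reg file that sets the DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"{VALUE_NAME}"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    n = retries_needed(500, headroom=3)  # hypothetical 500 h seeding workload
    print(n)
    print(reg_file(n))
```

The output can be saved with `open(path, "w", encoding="utf-16", newline="\r\n")` (regedit itself exports this format as UTF-16 with CRLF line endings) and imported with `reg import`, or the value can be set directly with `reg add "HKLM\SOFTWARE\Veeam\Veeam Backup and Replication" /v BackupSyncMaxRetriesPerOib /t REG_DWORD /d 25 /f`. Once the first copy has succeeded, `reg delete "HKLM\SOFTWARE\Veeam\Veeam Backup and Replication" /v BackupSyncMaxRetriesPerOib /f` removes it again.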

Re: BackupCopy jobs fail after 6x onsite backups

Post by maanlicht

Ooh that is exactly what I was looking for! Thanks a lot! Go Veeam!