Backups stopping offload

Post by **pirx** » Feb 26, 2024 8:44 am

# 07148962

I've seen this before in the last few years but now during migration from AWS S3 to Wasabi it's happening more often. In SOBR settings I've enabled copy mode. My understanding and observation is that once a backup job starts and a restore point is created, it immediately gets offloaded to capacity tier. But this is not always working. Regularly the offloads are stopped with an error by its backup job.

23.02.2024 13:27:51 :: Processing xxxx yyyyy Error: Stopped by job 'xxxxxx' (Backup)

Backup job starts at 13:00.... first offload starts at 13:xx... a few minutes later offload fails as it is stopped by backup job.

Usually we have backup, copy and offload tasks. All of them are fighting for resources and file locks. AFAIK backup and copy have higher prio than offload, hence backup and copy stop offload.

Because of that I set blackout windows on capacity tier: for the first x hours after backup/copy, no offload is allowed to run. That makes the offload copy mode kind of useless. It's worse now, as we have initial offloads to Wasabi running for days, so offloads fail with an error in any case, either because they are stopped by the blackout window or by the backup job.

There is no resource limit on repos. Support tells me this is how it works and that this cannot be configured in any way to work better.

To summarize:

- I'd like to be able to somehow configure SOBR capacity tier copy mode to not be stopped by backup/copy jobs. If this is a resource issue, I'd like to be able to configure a minimum of resources/slots that can be used all the time for offload. Maybe offload can be paused instead of stopped?

- I think it does not make sense to report that as an error, but I discussed this 3 years ago, and following Veeam's logic, it counts as an error (object-storage-as-a-backup-target-f52/u ... 73727.html)

Feb 26, 2024 12:48 pm

AFAIK backup and copy have higher prio than offload, hence backup and copy stops offload

You are correct. Offload jobs have the lowest priority, and any repository task slots that become available will be allocated to backup jobs and restore jobs (for example) before they are allocated to offload jobs.
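To illustrate the ordering described above, here is a minimal sketch of priority-based slot allocation. It is not Veeam's scheduler: the numeric priorities, the task names, and the `allocate_slots` helper are all assumptions made up for the example, and the position of copy jobs between backup and offload is only what was suggested earlier in the thread.

```python
import heapq
from itertools import count

# Hypothetical priorities (lower number is served first); not Veeam's real values.
PRIORITY = {"backup": 0, "restore": 0, "copy": 1, "offload": 2}

def allocate_slots(waiting, free_slots):
    """Grant free repository task slots by priority, FIFO within one priority."""
    arrival = count()
    heap = [(PRIORITY[kind], next(arrival), name) for name, kind in waiting]
    heapq.heapify(heap)
    granted = []
    while heap and len(granted) < free_slots:
        _, _, name = heapq.heappop(heap)
        granted.append(name)
    return granted

# Offload tasks arrived first but are still served last.
waiting = [
    ("offload-1", "offload"),
    ("backup-A", "backup"),
    ("copy-B", "copy"),
    ("offload-2", "offload"),
]
print(allocate_slots(waiting, 2))  # ['backup-A', 'copy-B']
```

With only two free slots, both offload tasks keep waiting even though `offload-1` was first in line, which matches the starvation effect discussed in this thread.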

Thanks for the posting.

Steve

Post by **pirx** » Feb 27, 2024 5:55 pm

That still does not explain why copy mode for offload is failing sometimes and offload tasks are stopped immediately after they started.

Backup starts - first RPs are created - offload copy mode starts for those RPs - the same backup job terminates the offload tasks

Today I got feedback that at least one issue I witnessed and uploaded logs for is a bug and a hotfix will be prepared.

We have a history with offload bugs and hotfixes since we started using it nearly 4 years ago. At our main site I'm not able to use copy mode without an 8-hour blackout window on capacity tier. With this it's not really a copy mode anymore. And we still have regular offload mode limited to run just once every 24h. Without those settings, we simply had too many offload errors.

Post by **Ivan239** » Mar 02, 2024 4:28 pm

A backup job does have a higher priority than an offload job, but in the case of copy mode, stopping the offload by a backup job is not an error and should be displayed as a green message.

The general stopping logic looks like this: if a backup job sees that the lock it requires on storage has been captured by an offload job, it sends a stop request to the offload task.
The offload task, on receiving such a signal, stops, suppresses the error if it is running in copy mode, and retries, again trying to get a lock on the storage (thereby waiting for the end of the backup activity).
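The stop/suppress/retry logic described above can be sketched roughly as follows. This is only an illustration of the control flow, not the product's code: `StoppedByJob`, `run_offload`, the message tuples and the attempt limit are all hypothetical, and a real implementation would block on the storage lock between attempts instead of retrying immediately.

```python
class StoppedByJob(Exception):
    """Hypothetical signal: a backup job asked the offload task to stop."""

def run_offload(transfer, copy_mode, max_attempts=3):
    """Run transfer(attempt); in copy mode a stop is suppressed and retried."""
    messages = []
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(attempt)
        except StoppedByJob as stop:
            if not copy_mode:
                # Outside copy mode the stop surfaces as an error.
                messages.append(("error", str(stop)))
                return "failed", messages
            # Copy mode: log it as informational and try for the lock again.
            messages.append(("info", f"{stop}; retrying"))
            continue
        messages.append(("info", "offload completed"))
        return "success", messages
    messages.append(("error", "retries exhausted"))
    return "failed", messages

def transfer(attempt):
    # Simulated backup activity: the first attempt is stopped, the second passes.
    if attempt == 1:
        raise StoppedByJob("Stopped by job 'xxxxxx' (Backup)")

print(run_offload(transfer, copy_mode=True))
print(run_offload(transfer, copy_mode=False))
```

In copy mode the first call ends in `success` with only informational messages, while the same stop outside copy mode is reported as an error.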

There were 2 bugs in this logic (yes, it’s sad, 2 bugs in one place). The error suppression scope was too small and if a stop occurred at the time of resource scheduling, we did not suppress the error.
Also, if a stop occurred while the agent was executing a command (stop signal sent to the agent), it used the wrong type of stop reason and the error was not suppressed. The private fix should fix both bugs.
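For readers who want to picture the first bug, here is a small sketch of what "error suppression scope too small" can look like. It is a generic illustration under my own assumptions, not the actual offload code: all function names and the `StoppedByJob` exception are invented, and the second bug (wrong stop reason type from the agent) is not modeled.

```python
class StoppedByJob(Exception):
    """Hypothetical stop signal raised when a backup job requests a stop."""

def schedule_resources(stop_at):
    if stop_at == "scheduling":
        raise StoppedByJob("stopped during resource scheduling")

def transfer_blocks(stop_at):
    if stop_at == "transfer":
        raise StoppedByJob("stopped during transfer")

def offload_narrow_scope(stop_at):
    # Bug pattern: scheduling runs outside the suppressing try block.
    schedule_resources(stop_at)
    try:
        transfer_blocks(stop_at)
    except StoppedByJob:
        return "suppressed"
    return "completed"

def offload_wide_scope(stop_at):
    # Fix pattern: both phases are covered by the suppression.
    try:
        schedule_resources(stop_at)
        transfer_blocks(stop_at)
    except StoppedByJob:
        return "suppressed"
    return "completed"

def outcome(offload, stop_at):
    """Report what the job session would show for a stop at the given phase."""
    try:
        return offload(stop_at)
    except StoppedByJob as stop:
        return f"error: {stop}"

print(outcome(offload_narrow_scope, "scheduling"))  # error: stopped during resource scheduling
print(outcome(offload_wide_scope, "scheduling"))    # suppressed
```

Both variants behave the same when the stop arrives during transfer; they only differ when it arrives while resources are being scheduled.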

I hope there are no more errors in this logic, but if they do appear when offload jobs are stopped by backup activity, this is definitely not normal behavior and should be escalated to R&D. If there are offload logs, it's relatively easy to understand why exactly the suppress/retry logic in the offload job doesn't work.

Post by **veremin** » Mar 05, 2024 5:07 pm

The R&D team has reviewed the case and has requested the support team to escalate the ticket accordingly. It appears that you may be experiencing a known issue, for which we are fortunately already equipped with a fix.

Once the case is escalated and the issue is confirmed, you will be promptly provided with the fix.

Thanks!

Post by **pirx** » Mar 06, 2024 6:29 pm

Well, I already received a fix a couple of days ago and installed it just today. What I find interesting is that we have had this issue for years, and at some point a few years back support recommended that we use blackout periods so that this does not happen. Is this a relatively new patch?

Post by **pirx** » Mar 07, 2024 8:13 am

I installed the hotfix yesterday, but at least one offload was still stopped by its backup job. This might be a corner case, as the backup job for this offload has also been having issues for some days now (case 07162880). This is not directly related to this post, but it's again an example of old backup data not being properly removed from capacity tier. We have a retention of 10 weeks and 14 days of immutability, but there are still backups for this VM from 2023-11. Even with block generation this cannot be explained. Even worse, only the last 5 backups (the latest chain) are on performance tier, but 2 more chains are on capacity tier....
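As a rough sanity check of the dates, here is the arithmetic as a small sketch. It deliberately takes the most generous reading, where retention, immutability and block generation simply add up, which overstates the real limit. The 10-day block generation allowance and the helper name are assumptions for the example, not values taken from our configuration.

```python
from datetime import date, timedelta

def oldest_expected_backup(today, retention_weeks, immutability_days, block_generation_days):
    """Upper bound on backup age, treating all three windows as additive."""
    window = timedelta(weeks=retention_weeks,
                       days=immutability_days + block_generation_days)
    return today - window

# 10 weeks retention, 14 days immutability, assumed 10 days block generation.
cutoff = oldest_expected_backup(date(2024, 3, 7), 10, 14, 10)
print(cutoff)                      # 2023-12-04
print(date(2023, 11, 30) < cutoff)  # True: even late-November data is too old
```

Even under this generous bound nothing older than early December should remain, so backups from 2023-11 are outside every window.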

This has been an issue for us since we started using offloading ~4 years ago. After one year we had a case that was open for months; in the end we had to clean up capacity tier, as there was >50TB of outdated data. The data was not visible in Veeam in any view. Similar things still happen: most of the time backups are just not deleted according to retention and are prolonged over and over.

[PICTURES REMOVED BY ADMINISTRATOR]

Post by **veremin** » Mar 07, 2024 5:31 pm

Just to update:

We currently have a dedicated support engineer who is handling escalated cases. He is working closely with the R&D team to confirm the root causes of reported problems and provide the corresponding fixes, or ensure that the previously provided ones are working as expected.

Thanks!
