Discussions related to using object storage as a backup target.
jshapiro
Enthusiast
Posts: 35
Liked: 2 times
Joined: Jun 23, 2011 3:11 pm
Full Name: Jonathan Shapiro
Contact:

Offloading Jobs stuck at 99% for Days

Post by jshapiro »

Hello:

I have a SOBR using Wasabi for the capacity tier. Recently, I swapped in a new storage bucket with object lock to set up immutability for my backups. Initial offloading took about a week to get my most recent backup chains into the capacity tier. What I've noticed, now that I'm mostly caught up, is that general offloading jobs still kick off and seem to progress nicely till they hit 99%. They seem essentially done, but they hang for days at 99% without transferring any more data, and I don't know what the system is doing. The jobs do close out eventually, but I'm left with more general offloading jobs kicking off and hanging for a similar amount of time. I'm on v12. What is going on? What logs should I check? Should I cancel these long running jobs?
Mildur
Product Manager
Posts: 8735
Liked: 2294 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by Mildur »

Hi Jonathan

I strongly recommend opening a case with our customer support. We cannot solve such issues over this forum.
Without a case number this topic may be deleted by a moderator.

Best,
Fabian
Product Management Analyst @ Veeam Software

Re: Offloading Jobs stuck at 99% for Days

Post by jshapiro »

Thanks. I opened a case. Case #05968616

Re: Offloading Jobs stuck at 99% for Days

Post by jshapiro »

5 days later, and all I heard was that my ticket would be sent to the object storage group. Nobody from that group contacted me.
EWMarco
Service Provider
Posts: 39
Liked: 7 times
Joined: Feb 20, 2023 9:28 am
Full Name: Marco Glavas
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by EWMarco »

All I can tell you right now is that we see similar things.

Re: Offloading Jobs stuck at 99% for Days

Post by EWMarco »

I think it's a grave oversight that some things are not displayed in the job transcripts, like checkpoint cleanups. You only get to see a message when one fails. And since some of them take a day or more, they usually get interrupted by the next backup cycle.

I have no idea what that does to data integrity, but I'm assuming we keep losing literal days on our offloading to things like this.

Re: Offloading Jobs stuck at 99% for Days

Post by Mildur »

Hi @jshapiro

I'm very sorry that you had to wait for two days.
Please let me know if it happens again.

You can also use the Escalate to Support Management option:
https://www.veeam.com/kb2320

Hi @EWMarco
Indeed. Session details don't display everything. There are a lot of background tasks which are only visible in our debug logs.
If you see the same issue, please open a support case and provide me with the case number. Thank you.

Best,
Fabian
Product Management Analyst @ Veeam Software

Re: Offloading Jobs stuck at 99% for Days

Post by jshapiro »

I finally heard back from support, and their log analysis possibly revealed something. They noticed that when I upgraded from version 11 to 12, I was set in Direct Mode for connection type to the capacity tier. Support told me to change that to Connect through a gateway server, and then select my preferred gateway(s). In my case, I selected my storage server so it could offload directly to Wasabi over the Internet. I made this change late yesterday, and at the time I did it, I had two offload jobs stuck at 99%. Both wrapped up into the night, and another one started at 2:00 AM this morning and finished very quickly. Maybe this was the issue. Right now, my Veeam server is idle. I will continue to watch it.

Re: Offloading Jobs stuck at 99% for Days

Post by Mildur »

Hi Jonathan

Thanks for the update.

Best,
Fabian
Product Management Analyst @ Veeam Software

Re: Offloading Jobs stuck at 99% for Days

Post by jshapiro »

Seemed to be better for a couple of days, but now I have some offload jobs stuck at 99% again. I updated the ticket notes.
slide999
Novice
Posts: 5
Liked: never
Joined: Sep 25, 2014 9:40 pm
Full Name: Kevin
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by slide999 »

Same exact issue. Opening a ticket. Case #06003117.
veremin
Product Manager
Posts: 20284
Liked: 2258 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by veremin »

Kindly post its number here, once it's opened. This way we can follow and assist the investigation. Thanks!
nathanrsafti
Influencer
Posts: 18
Liked: 10 times
Joined: Nov 07, 2022 4:48 pm
Full Name: Nathan
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by nathanrsafti »

Been seeing the same thing here ever since upgrading to V12 and switching S3 repo to Gateway per case 05878826. Offloads stuck at 99%, usually takes 24hrs+ to remove a single checkpoint from S3. Everything eventually goes through but very very slow. Seems to be a toss-up whether or not the offloads will complete quickly or stall every night.
TonioRoffo
Enthusiast
Posts: 53
Liked: 5 times
Joined: Jun 18, 2009 2:27 pm
Full Name: Yves Smolders
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by TonioRoffo »

I've got the same going on with V11.

A small server is being offloaded to Wasabi, and something takes a long time for very small incrementals.

The deltas are truly only in the megabyte range (500 MB to a few gigs at most). Usually the offload completes within minutes, sometimes it takes up to half an hour, and one even took 10 hours.

In the logs I have repetitions of this:

[19.04.2023 10:54:51.040] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:54:51.040] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:54:54.739] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:54:54.739] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:54:54.739] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
[19.04.2023 10:54:55.037] < 17344> srv | Command successfully processed, elapsed: 0.3020
[19.04.2023 10:54:55.037] < 17344> srv |
[19.04.2023 10:54:55.037] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:54:55.037] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:54:57.184] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:54:57.184] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:54:57.184] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
[19.04.2023 10:54:57.353] < 17344> srv | Command successfully processed, elapsed: 0.1650
[19.04.2023 10:54:57.353] < 17344> srv |
[19.04.2023 10:54:57.353] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:54:57.353] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:55:10.296] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:55:10.296] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:55:10.296] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
[19.04.2023 10:56:00.946] < 17344> srv | Command successfully processed, elapsed: 50.6590
[19.04.2023 10:56:00.946] < 17344> srv |
[19.04.2023 10:56:00.946] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:56:00.946] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:56:01.942] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:56:01.942] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:56:01.942] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.

Edit: about to open a case

Re: Offloading Jobs stuck at 99% for Days

Post by TonioRoffo »

Opened a case #06018563
TheJourney
Influencer
Posts: 22
Liked: 4 times
Joined: Dec 10, 2009 8:44 pm
Full Name: Sam Journagan
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by TheJourney »

Same issue, using Wasabi as well. Guess I'll open a ticket...
jrogers_winsor
Service Provider
Posts: 2
Liked: never
Joined: Jun 29, 2021 8:22 pm
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by jrogers_winsor »

Same issue, but using Cloudian storage. I'll be opening a ticket as soon as I'm allowed to do so by Veeam, but in the meantime, watching this thread like a hawk...

Re: Offloading Jobs stuck at 99% for Days

Post by jshapiro »

The Veeam engineer gave me some regedits to apply to the Veeam server to optimize for Wasabi. I applied these, and offload jobs seemed to run better for a few weeks. They are once again getting stuck at 99% for days and stacking up. I just opened another ticket. Here are the regedits I applied:

New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication\' -Name 'S3ConcurrentTaskLimit' -Value "10" -PropertyType DWORD -Force

New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication\' -Name 'S3RequestTimeoutSec' -Value "900" -PropertyType DWORD -Force

New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication\' -Name 'S3RequestRetryTotalTimeoutSec' -Value "9000" -PropertyType DWORD -Force
DeanCTS
Service Provider
Posts: 19
Liked: 2 times
Joined: May 27, 2021 3:48 am
Full Name: Dean Anderson
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by DeanCTS »

jshapiro wrote: Apr 03, 2023 4:59 pm Seemed to be better for a couple of days, but now I have some offload jobs stuck at 99% again. I updated the ticket notes.
How is it looking now that you've applied the regedit optimizations for Wasabi? Any major improvements?
thebdur
Lurker
Posts: 1
Liked: never
Joined: Apr 26, 2023 5:32 pm
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by thebdur »

I'm having this same issue. I applied the same registry keys and the offloads do go through eventually, but they take an abnormally long time. The upload of the data takes 5-15 minutes, then the job sits at 99% for all VMs for an hour or more.
RonanD
Lurker
Posts: 1
Liked: never
Joined: May 17, 2023 11:28 am
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by RonanD »

I think your job stays blocked on "cleaning".

I had the same problem, and it was solved in case #04958240:

[HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication]
"StgIndexCleanupTaskSize"=hex(b):00,28,00,00,00,00,00,00
"StgIndexEnableCache"=dword:00000001
"StgIndexUploadTaskSize"=hex(b):00,28,00,00,00,00,00,00


StgIndexCleanupTaskSize : 2800
StgIndexEnableCache : 1
StgIndexUploadTaskSize : 2800

You need more RAM, and you can double the values to speed up the process.
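
A note on reading these values: in a .reg file, `hex(b):` marks a REG_QWORD, stored as eight bytes in little-endian order. The short Python sketch below decodes the byte list from the post and encodes a doubled value back into the same notation. The helper names are made up for illustration; what unit the setting counts in, and what value suits a given amount of RAM, is not stated in this thread and should be confirmed with support.

```python
def decode_reg_qword(hex_b: str) -> int:
    """Decode the byte list of a .reg `hex(b):` value (REG_QWORD, little-endian)."""
    raw = bytes(int(part, 16) for part in hex_b.split(","))
    if len(raw) != 8:
        raise ValueError("a REG_QWORD is exactly 8 bytes")
    return int.from_bytes(raw, byteorder="little")


def encode_reg_qword(value: int) -> str:
    """Encode an integer as the comma-separated byte list used after `hex(b):`."""
    return ",".join(f"{b:02x}" for b in value.to_bytes(8, byteorder="little"))


value = decode_reg_qword("00,28,00,00,00,00,00,00")
print(value, hex(value))           # 10240 0x2800

doubled = value * 2
print(encode_reg_qword(doubled))   # 00,50,00,00,00,00,00,00
```

So the "2800" listed in the post matches the hexadecimal reading of the value (0x2800), which is 10240 in decimal, and doubling it gives 0x5000 (20480).
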

Re: Offloading Jobs stuck at 99% for Days

Post by jshapiro »

I stripped out one of the three registry tweaks the original Veeam engineer provided for Wasabi S3 storage. The one I removed, S3ConcurrentTaskLimit, capped concurrent S3 tasks at 10, which made offloading run far too slowly because not enough offload threads were running. I opened another support ticket with Veeam to report that offload jobs were back to getting stuck at 99%. The job would stall on specific VMs within the backup job while deleting checkpoints. At this stage, it could take days for some of them to process, and there wasn't much visual feedback that anything was happening. Anyway, the solution was to update Veeam to 12.0.0.1420_20230412. This update includes a number of fixes, including one for jobs taking a long time to delete checkpoints. With the S3ConcurrentTaskLimit tweak removed, and with the update, things have been good.
orty229
Novice
Posts: 6
Liked: never
Joined: May 21, 2023 4:36 pm
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by orty229 »

Any updates on this issue for those of us on 11a?

Re: Offloading Jobs stuck at 99% for Days

Post by veremin »

The R&D team believes that the original issue was caused by unoptimized enumeration logic existing in v12 prior to the latest build (12.0.0.1420 P20230412).

Some of the object storage repository operations (offload, rescan, checkpoint deletion, etc.) relied on that mechanism and experienced performance degradation as a result.

The build 12.0.0.1420 P20230412 enhanced the procedure dramatically and got rid of extensive requests in a few places.

We recommend updating to the latest product version and seeing whether it solves the problem.

Thanks!
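
As a rough illustration of the recommendation above, the following Python sketch compares an installed build string against the fixed build named in this thread. It assumes build strings look like the ones quoted here (`12.0.0.1420 P20230412` or `12.0.0.1420_20230412`), and it treats a build with no patch suffix as unpatched. The function names are hypothetical, and how to read the installed build from a server is left out.

```python
import re
from typing import NamedTuple


class Build(NamedTuple):
    version: tuple   # e.g. (12, 0, 0, 1420)
    patch_date: int  # e.g. 20230412, or 0 when no patch suffix is present


# Accepts "12.0.0.1420", "12.0.0.1420 P20230412" and "12.0.0.1420_20230412".
BUILD_RE = re.compile(r"^(\d+(?:\.\d+){3})(?:[ _]P?(\d{8}))?$")

def parse_build(text: str) -> Build:
    m = BUILD_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized build string: {text!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    patch_date = int(m.group(2)) if m.group(2) else 0
    return Build(version, patch_date)


# The build this thread names as containing the enumeration fix.
FIXED = parse_build("12.0.0.1420 P20230412")


def needs_enumeration_fix(installed: str) -> bool:
    """True for a v12 build older than the fixed build named in the thread."""
    build = parse_build(installed)
    # Per the thread, the affected code does not exist before v12,
    # so other major versions are reported as not needing this fix.
    return build.version[0] == 12 and build < FIXED


print(needs_enumeration_fix("12.0.0.1420"))            # True
print(needs_enumeration_fix("12.0.0.1420_20230412"))   # False
```

A `False` result for a v11 build only means this particular fix does not apply; it says nothing about other causes of slow offloads on that version.
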

Re: Offloading Jobs stuck at 99% for Days

Post by veremin »

orty229 wrote: May 24, 2023 1:49 pm Any updates for the issue with us with 11a?
The issue reported in this thread is caused by a part of the code that does not exist in pre-v12 product versions. So even if the symptoms are similar, the causes must be completely different.

So I suggest you create your ticket (and forum thread as well) and provide the debug logs for further investigation to a support engineer.

Thanks!

Re: Offloading Jobs stuck at 99% for Days

Post by DeanCTS »

RonanD wrote: May 17, 2023 11:38 am I think your job stay block on "cleaning"

i have had the same problem, solved on this case Case #04958240

[HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication]
"StgIndexCleanupTaskSize"=hex(b):00,28,00,00,00,00,00,00
"StgIndexEnableCache"=dword:00000001
"StgIndexUploadTaskSize"=hex(b):00,28,00,00,00,00,00,00


StgIndexCleanupTaskSize : 2800
StgIndexEnableCache : 1
StgIndexUploadTaskSize : 2800

You need more RAM and can double the value for increase the process
Could you let us know what those numbers are based on? A specific amount of RAM? What should I configure for 8 virtual cores and 16 GB of RAM?

Re: Offloading Jobs stuck at 99% for Days

Post by orty229 »

veremin wrote: May 24, 2023 3:35 pm The R&D team believes that both the original issue was caused by unoptimized enumeration logic existing prior to 12.0.0.1420 P20230412.

Some of the object storage repository operations (offload, rescan, checkpoint deletion, etc.) relied on that mechanism and experienced performance degradation as a result.

The build 12.0.0.1420 P20230412 enhanced the procedure dramatically and got rid of extensive requests in a few places.

We recommend updating to the latest product version and seeing whether it solves the problem.

Thanks!
So 11a is not supported now?

Re: Offloading Jobs stuck at 99% for Days

Post by veremin »

Kindly read my latest response. I believe I said the opposite:
The issue reported in this thread is caused by a part of the code that does not exist in pre-v12 product versions. So even if the symptoms are similar, the causes must be completely different.

So I suggest you create your own ticket (and forum thread as well) and provide the debug logs for further investigation to a support engineer.
Thanks for understanding.

Re: Offloading Jobs stuck at 99% for Days

Post by orty229 »

So will the "unoptimized enumeration logic existing prior to 12.0.0.1420 P20230412" be fixed in 11a, which is supported? Or is it only a v12 issue? I think some v11 users have reported the same thing when pushing to Wasabi.
Gostev
Chief Product Officer
Posts: 31561
Liked: 6725 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Offloading Jobs stuck at 99% for Days

Post by Gostev »

As Vladimir explains, the issue discussed in this thread was first introduced in V12 and it is fixed in V12 P20230412. If you have a similar issue with V11a, please open a support case for investigation, as this would be something totally unrelated to OP's issue.