-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Offloading Jobs stuck at 99% for Days
Hello:
I have a SOBR using Wasabi for the capacity tier. Recently, I swapped in a new storage bucket with object lock to set up immutability for my backups. The initial offload took about a week to get my most recent backup chains into the capacity tier. Now that I'm mostly caught up, I've noticed that general offloading jobs still kick off and progress nicely until they hit 99%. They seem essentially done, but they hang at 99% for days without transferring any more data, and I don't know what the system is doing. The jobs do close out eventually, but then more offloading jobs kick off and hang for a similar amount of time. I'm on v12. What is going on? What logs should I check? Should I cancel these long-running jobs?
-
- Product Manager
- Posts: 9848
- Liked: 2610 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Hi Jonathan
I strongly recommend opening a case with our customer support; we cannot solve such issues over the forum.
Without a case number this topic may be deleted by a moderator.
Best,
Fabian
Product Management Analyst @ Veeam Software
-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Thanks. I opened a case. Case #05968616
-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Re: Offloading Jobs stuck at 99% for Days
5 days later, and all I heard was that my ticket would be sent to the object storage group. Nobody from that group contacted me.
-
- Service Provider
- Posts: 48
- Liked: 7 times
- Joined: Feb 20, 2023 9:28 am
- Full Name: Marco Glavas
- Contact:
Re: Offloading Jobs stuck at 99% for Days
All I can tell you right now is that we see similar things.
-
- Service Provider
- Posts: 48
- Liked: 7 times
- Joined: Feb 20, 2023 9:28 am
- Full Name: Marco Glavas
- Contact:
Re: Offloading Jobs stuck at 99% for Days
I think it's a grave oversight that some things, like checkpoint cleanups, are not displayed in the job transcripts. You only see a message when one fails. And since some of them take a day or more, they usually get interrupted by the next backup cycle.
I have no idea what that does to data integrity, but I assume we keep losing literal days of offloading to things like this.
-
- Product Manager
- Posts: 9848
- Liked: 2610 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Hi @jshapiro
I'm very sorry that you had to wait for two days.
Please let me know if it happens again.
You can also use the Escalate to Support Management option:
https://www.veeam.com/kb2320
Hi @EWMarco
Indeed. Session details don't display everything. There are a lot of background tasks which are only visible in our debug logs.
If you see the same issue, please open a support case and provide me with the case number. Thank you.
Best,
Fabian
Product Management Analyst @ Veeam Software
-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Re: Offloading Jobs stuck at 99% for Days
I finally heard back from support, and their log analysis may have revealed something. They noticed that when I upgraded from version 11 to 12, the connection type to the capacity tier was set to Direct mode. Support told me to change that to "Connect through a gateway server" and then select my preferred gateway(s). In my case, I selected my storage server so it could offload directly to Wasabi over the Internet. I made this change late yesterday, when I had two offload jobs stuck at 99%. Both wrapped up during the night, and another one started at 2:00 AM this morning and finished very quickly. Maybe this was the issue. Right now, my Veeam server is idle. I will continue to watch it.
-
- Product Manager
- Posts: 9848
- Liked: 2610 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Hi Jonathan
Thanks for the update.
Best,
Fabian
Product Management Analyst @ Veeam Software
-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Seemed to be better for a couple of days, but now I have some offload jobs stuck at 99% again. I updated the ticket notes.
-
- Novice
- Posts: 5
- Liked: never
- Joined: Sep 25, 2014 9:40 pm
- Full Name: Kevin
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Same exact issue. Opening a ticket: Case #06003117.
-
- Product Manager
- Posts: 20439
- Liked: 2310 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Kindly post its number here, once it's opened. This way we can follow and assist the investigation. Thanks!
-
- Influencer
- Posts: 20
- Liked: 16 times
- Joined: Nov 07, 2022 4:48 pm
- Full Name: Nathan
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Been seeing the same thing here ever since upgrading to V12 and switching S3 repo to Gateway per case 05878826. Offloads stuck at 99%, usually takes 24hrs+ to remove a single checkpoint from S3. Everything eventually goes through but very very slow. Seems to be a toss-up whether or not the offloads will complete quickly or stall every night.
-
- Enthusiast
- Posts: 56
- Liked: 6 times
- Joined: Jun 18, 2009 2:27 pm
- Full Name: Yves Smolders
- Contact:
Re: Offloading Jobs stuck at 99% for Days
I've got the same going on with V11.
A small server is being offloaded to Wasabi, and the offload takes a long time for very small incrementals.
The deltas are truly only in the megabyte range (500 MB to a few GB at most). Usually the offload completes within minutes, but sometimes it takes up to half an hour, and one even took 10 hours.
In the logs I have repetitions of this:
[19.04.2023 10:54:51.040] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:54:51.040] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:54:54.739] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:54:54.739] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:54:54.739] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
[19.04.2023 10:54:55.037] < 17344> srv | Command successfully processed, elapsed: 0.3020
[19.04.2023 10:54:55.037] < 17344> srv |
[19.04.2023 10:54:55.037] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:54:55.037] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:54:57.184] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:54:57.184] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:54:57.184] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
[19.04.2023 10:54:57.353] < 17344> srv | Command successfully processed, elapsed: 0.1650
[19.04.2023 10:54:57.353] < 17344> srv |
[19.04.2023 10:54:57.353] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:54:57.353] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:55:10.296] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:55:10.296] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:55:10.296] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
[19.04.2023 10:56:00.946] < 17344> srv | Command successfully processed, elapsed: 50.6590
[19.04.2023 10:56:00.946] < 17344> srv |
[19.04.2023 10:56:00.946] < 17344> srv | Waiting for the next server command.
[19.04.2023 10:56:00.946] < 17344> srv | _______________________________________________________________________________
[19.04.2023 10:56:01.942] < 17344> srv | retrieved command: 154 (HandleRemoteArchClient(154))
[19.04.2023 10:56:01.942] < 17344> arh | Cleaning up storage blocks in archive
[19.04.2023 10:56:01.942] < 17344> arh | Using local client for archive repository '6996b00e-2a4b-43cd-8288-bfcb79831333'.
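For what it's worth, the slow iterations in a log like the one above can be spotted by pulling out the "elapsed" values. This is just a hypothetical parsing sketch of my own, assuming the exact line format shown:

```python
import re

# Match the per-command timing lines, e.g. "Command successfully processed, elapsed: 50.6590"
ELAPSED = re.compile(r"elapsed:\s*([\d.]+)")

def slow_iterations(log_lines, threshold_sec=10.0):
    """Return elapsed times (in seconds) for commands slower than the threshold."""
    times = [float(m.group(1)) for line in log_lines for m in ELAPSED.finditer(line)]
    return [t for t in times if t > threshold_sec]

sample = [
    "[19.04.2023 10:54:55.037] < 17344> srv | Command successfully processed, elapsed: 0.3020",
    "[19.04.2023 10:56:00.946] < 17344> srv | Command successfully processed, elapsed: 50.6590",
]
print(slow_iterations(sample))  # -> [50.659]
```

In my log, most "Cleaning up storage blocks in archive" commands finish in well under a second, but some take 50+ seconds, which adds up over thousands of iterations.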
Edit: about to open a case
-
- Enthusiast
- Posts: 56
- Liked: 6 times
- Joined: Jun 18, 2009 2:27 pm
- Full Name: Yves Smolders
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Opened a case #06018563
-
- Influencer
- Posts: 22
- Liked: 4 times
- Joined: Dec 10, 2009 8:44 pm
- Full Name: Sam Journagan
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Same issue, using Wasabi as well. Guess I'll open a ticket...
-
- Service Provider
- Posts: 2
- Liked: never
- Joined: Jun 29, 2021 8:22 pm
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Same issue, but using Cloudian storage. I'll be opening a ticket as soon as I'm allowed to do so by Veeam, but in the meantime, watching this thread like a hawk...
-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Re: Offloading Jobs stuck at 99% for Days
The Veeam engineer gave me some registry edits to apply to the Veeam server to optimize for Wasabi. After I applied them, offload jobs seemed to run better for a few weeks, but they are once again getting stuck at 99% for days and stacking up. I just opened another ticket. Here are the registry edits I had applied:
New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication\' -Name 'S3ConcurrentTaskLimit' -Value "10" -PropertyType DWORD -Force
New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication\' -Name 'S3RequestTimeoutSec' -Value "900" -PropertyType DWORD -Force
New-ItemProperty -Path 'HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication\' -Name 'S3RequestRetryTotalTimeoutSec' -Value "9000" -PropertyType DWORD -Force
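For context on how these three values might relate (my own reading of the names, not documented Veeam behavior): the 9000-second total retry budget would allow roughly ten full-length 900-second request attempts, alongside 10 concurrent S3 tasks:

```python
# Values from the support-provided registry keys above.
s3_settings = {
    "S3ConcurrentTaskLimit": 10,            # parallel S3 tasks
    "S3RequestTimeoutSec": 900,             # per-request timeout
    "S3RequestRetryTotalTimeoutSec": 9000,  # total retry budget per request
}

# Rough upper bound on full-length attempts before the retry budget runs out.
max_attempts = (s3_settings["S3RequestRetryTotalTimeoutSec"]
                // s3_settings["S3RequestTimeoutSec"])
print(max_attempts)  # -> 10
```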
-
- Service Provider
- Posts: 19
- Liked: 2 times
- Joined: May 27, 2021 3:48 am
- Full Name: Dean Anderson
- Contact:
-
- Novice
- Posts: 3
- Liked: 2 times
- Joined: Apr 26, 2023 5:32 pm
- Contact:
Re: Offloading Jobs stuck at 99% for Days
I'm having this same issue. I applied the same registry keys, and the offloads do go through eventually, but they take an abnormally long time. The upload of the data takes 5-15 minutes, then it sits at 99% for all VMs for an hour or more.
-
- Lurker
- Posts: 1
- Liked: never
- Joined: May 17, 2023 11:28 am
- Contact:
Re: Offloading Jobs stuck at 99% for Days
I think your job stays blocked on "cleaning".
I had the same problem; it was solved in Case #04958240:
[HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication]
"StgIndexCleanupTaskSize"=hex(b):00,28,00,00,00,00,00,00
"StgIndexEnableCache"=dword:00000001
"StgIndexUploadTaskSize"=hex(b):00,28,00,00,00,00,00,00
StgIndexCleanupTaskSize : 2800
StgIndexEnableCache : 1
StgIndexUploadTaskSize : 2800
You need more RAM, and you can double the values to speed up the process.
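If I read the .reg fragment correctly, the hex(b) values are little-endian QWORDs, so the "2800" shown is the hexadecimal form (0x2800 = 10240 decimal). A quick sketch to decode the exported bytes:

```python
# Decode the little-endian QWORD bytes from the .reg export above,
# e.g. "StgIndexCleanupTaskSize"=hex(b):00,28,00,00,00,00,00,00
raw = bytes([0x00, 0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00])
value = int.from_bytes(raw, "little")
print(hex(value), value)  # -> 0x2800 10240
```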
-
- Enthusiast
- Posts: 35
- Liked: 2 times
- Joined: Jun 23, 2011 3:11 pm
- Full Name: Jonathan Shapiro
- Contact:
Re: Offloading Jobs stuck at 99% for Days
I stripped out one of the three registry tweaks the original Veeam engineer provided for Wasabi S3 storage: the one limiting the S3 concurrent task limit to 10. That setting caused offloading to run far too slowly, because not enough offload threads were running. I also opened another support ticket with Veeam because offload jobs were back to getting stuck at 99%. A job would get stuck on specific VMs within the backup job while deleting checkpoints. At that stage, it could take days for some of them to process, and there wasn't much visual feedback that anything was happening. Anyway, the solution was to update Veeam to 12.0.0.1420_20230412, which includes a number of fixes, including one for jobs taking a long time to delete checkpoints. With the S3 concurrent task limit tweak removed and the update applied, things have been good.
-
- Novice
- Posts: 6
- Liked: never
- Joined: May 21, 2023 4:36 pm
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Any updates on this issue for those of us on 11a?
-
- Product Manager
- Posts: 20439
- Liked: 2310 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Offloading Jobs stuck at 99% for Days
The R&D team believes that the original issue was caused by unoptimized enumeration logic existing in v12 prior to the latest build (12.0.0.1420 P20230412).
Some of the object storage repository operations (offload, rescan, checkpoint deletion, etc.) relied on that mechanism and experienced performance degradation as a result.
The build 12.0.0.1420 P20230412 improved the procedure dramatically and eliminated excessive requests in a few places.
We recommend updating to the latest product version and seeing whether it solves the problem.
Thanks!
-
- Product Manager
- Posts: 20439
- Liked: 2310 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Offloading Jobs stuck at 99% for Days
The issue reported in this thread is caused by code that did not exist in pre-v12 product versions. So even if the symptoms are similar, the causes must be completely different.
So I suggest you create your own ticket (and forum thread as well) and provide the debug logs to a support engineer for further investigation.
Thanks!
-
- Service Provider
- Posts: 19
- Liked: 2 times
- Joined: May 27, 2021 3:48 am
- Full Name: Dean Anderson
- Contact:
Re: Offloading Jobs stuck at 99% for Days
RonanD wrote: ↑May 17, 2023 11:38 am
I think your job stays blocked on "cleaning".
I had the same problem; it was solved in Case #04958240:
[HKEY_LOCAL_MACHINE\SOFTWARE\Veeam\Veeam Backup and Replication]
"StgIndexCleanupTaskSize"=hex(b):00,28,00,00,00,00,00,00
"StgIndexEnableCache"=dword:00000001
"StgIndexUploadTaskSize"=hex(b):00,28,00,00,00,00,00,00
StgIndexCleanupTaskSize : 2800
StgIndexEnableCache : 1
StgIndexUploadTaskSize : 2800
You need more RAM, and you can double the values to speed up the process.
Could you let us know what those numbers are based on? A specific amount of RAM? What should I configure in the case of 8 virtual cores and 16 GB of RAM?
-
- Novice
- Posts: 6
- Liked: never
- Joined: May 21, 2023 4:36 pm
- Contact:
Re: Offloading Jobs stuck at 99% for Days
veremin wrote: ↑May 24, 2023 3:35 pm
The R&D team believes that the original issue was caused by unoptimized enumeration logic existing prior to 12.0.0.1420 P20230412.
Some of the object storage repository operations (offload, rescan, checkpoint deletion, etc.) relied on that mechanism and experienced performance degradation as a result.
The build 12.0.0.1420 P20230412 improved the procedure dramatically and eliminated excessive requests in a few places.
We recommend updating to the latest product version and seeing whether it solves the problem.
Thanks!
So is 11a not supported now?
-
- Product Manager
- Posts: 20439
- Liked: 2310 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Offloading Jobs stuck at 99% for Days
Kindly read my latest response; I believe I said the opposite:
The issue reported in this thread is caused by code that did not exist in pre-v12 product versions. So even if the symptoms are similar, the causes must be completely different.
So I suggest you create your own ticket (and forum thread as well) and provide the debug logs to a support engineer for further investigation.
Thanks for understanding.
-
- Novice
- Posts: 6
- Liked: never
- Joined: May 21, 2023 4:36 pm
- Contact:
Re: Offloading Jobs stuck at 99% for Days
So will the "unoptimized enumeration logic existing prior to 12.0.0.1420 P20230412" be fixed in 11a, which is still supported? Or is it only a v12 issue? I kind of think some v11 users have reported the same thing with Wasabi.
-
- Chief Product Officer
- Posts: 31836
- Liked: 7328 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Offloading Jobs stuck at 99% for Days
As Vladimir explains, the issue discussed in this thread was first introduced in V12 and it is fixed in V12 P20230412. If you have a similar issue with V11a, please open a support case for investigation, as this would be something totally unrelated to OP's issue.