Discussions related to using object storage as a backup target.
bg.ranken
Expert
Posts: 123
Liked: 21 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender

Offload to Archive Jobs Stuck in Stopping State

Post by bg.ranken »

Case #07231435

I believe I have found a bug that's reproducible in the latest version (12.1.1.156).

We added the capacity tier to our current SOBR to upload data into Azure. While this data was uploading (it will take us many weeks to finish), we also added the Archive Tier. However, the storage account was missing permissions to allow access from the vnet where the archiver appliances are deployed, so the appliances were returning HTTP error code 403, AuthorizationFailure.

This is not the actual bug, though. The bug is in how the Offload to Archive jobs handle this error: they get stuck. For any servers that are still being uploaded by the SOBR Tiering job, the Offload to Archive job gets stuck on the "Locking backup chains on Capacity Tier" step. Servers that have finished uploading show the error and are marked as failed, but the other servers stay on "Locking backup chains on Capacity Tier" until the upload from the SOBR Tiering job is finished or canceled. If you attempt to manually stop the Offload to Archive job, the VM in the job stays in the Stopping state indefinitely, still on the "Locking backup chains on Capacity Tier" step, until the SOBR Tiering job is finished or stopped. The logs appear to show thousands of "Item [] is locked by running session" errors every minute, so the job seems to enter some loop that prevents the stop command from taking effect.
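
For anyone who wants to check their own logs for the same loop, here is a rough Python sketch that counts the "is locked by running session" message per minute. The message text is what I saw in our logs; the timestamp layout the pattern expects (a leading `[DD.MM.YYYY HH:MM:SS...]`) is an assumption, and the function name is just something I made up, so adjust both to your files.

```python
import re
from collections import Counter

# Assumption: each log line starts with a bracketed timestamp such as
# "[DD.MM.YYYY HH:MM:SS.mmm]". Change this pattern if your logs differ.
TIMESTAMP = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2})")

# Message fragment quoted in the post above.
NEEDLE = "is locked by running session"


def locked_per_minute(lines):
    """Count lock messages per minute, keyed by 'DD.MM.YYYY HH:MM'."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue  # not the message we are looking for
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of a job log (for example via `open(path, encoding="utf-8", errors="replace")`) gives one counter entry per minute; sustained counts in the thousands would match the loop described above.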

The reason this is a problem is that the Offload to Archive job has already spun up multiple proxy appliances in Azure, and new Offload to Archive jobs continue to start every 8 hours. So you can end up with dozens (or hundreds) of Archiving Proxies within Azure, all running and all generating costs. Since manually stopping a job has no effect, the only solution is to stop the SOBR Tiering job. That allows all the currently running Offload to Archive jobs to either fail out or stop successfully, which lets them remove all the proxy appliances within Azure.

I created a ticket for this, but we resolved it ourselves before getting a response because we couldn't wait: we had gotten up to over 50 proxy appliances and costs were starting to pile up. I uploaded logs to the case in case anyone from R&D wants to review them, and I can confirm this is repeatable.

The actual bug should be fixed so that the job can be stopped manually without needing to stop the SOBR Tiering job. Beyond that, I think that if a job has no actively tiering VMs when the next Offload to Archive job is due to start, and it is only waiting to lock backup chains, it should cancel itself and release all the proxy appliances. The next job will re-create them anyway and queue the same VM for tiering. This would save costs, as I could see large uploads in normal operations causing the same behavior.
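
To make the suggestion concrete, here is a rough sketch of the check I have in mind, written in Python. None of these names correspond to anything in the actual product; `ArchiveJob`, `VmState`, `should_self_cancel` and `on_next_interval` are all hypothetical and only illustrate the decision.

```python
from dataclasses import dataclass
from enum import Enum, auto


class VmState(Enum):
    WAITING_FOR_LOCK = auto()  # sitting on "Locking backup chains on Capacity Tier"
    TRANSFERRING = auto()      # actively moving data to the archive tier
    FINISHED = auto()          # completed or failed


@dataclass
class ArchiveJob:
    # Hypothetical model of one Offload to Archive job.
    vm_states: list
    proxies_deployed: int


def should_self_cancel(job: ArchiveJob) -> bool:
    """True when nothing is transferring and at least one VM only waits on a lock."""
    any_active = any(s is VmState.TRANSFERRING for s in job.vm_states)
    any_waiting = any(s is VmState.WAITING_FOR_LOCK for s in job.vm_states)
    return any_waiting and not any_active


def on_next_interval(job: ArchiveJob) -> str:
    """Run when the next scheduled job is due; returns the action taken."""
    if should_self_cancel(job):
        job.proxies_deployed = 0  # stand-in for releasing the proxy appliances
        return "cancelled"
    return "kept"
```

A job with VMs that are actually transferring is left alone; only a job that is purely waiting on locks gives up its proxies, since the next scheduled job would pick up the same VMs anyway.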
Mildur
Product Manager
Posts: 8775
Liked: 2311 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland

Re: Offload to Archive Jobs Stuck in Stopping State

Post by Mildur »

Hi Randall

Thank you for reporting.
Let's give our support team some time to confirm the bug. We will escalate it to RnD if required.

Best,
Fabian
Product Management Analyst @ Veeam Software