Agent-based backup of Windows, Linux, Mac, AIX and Solaris machines.
bg.ranken
Expert
Posts: 122
Liked: 21 times
Joined: Feb 18, 2015 8:13 pm
Full Name: Randall Kender

Windows Agent Backup Job Stuck

Post by bg.ranken »

I think I found a potential bug in V11 that I'm hoping can be reported somewhere. This isn't the first time it has happened.

One of our agent backup jobs appears to have gotten stuck on one of its servers at the "Creating VSS snapshot" step. The task logs on the B&R server looked like this:

[01.04.2021 11:02:09] <69> Info         Checking remote agent is alive...
[01.04.2021 11:04:09] <69> Info         Checking remote agent is alive...
[01.04.2021 11:06:09] <69> Info         Checking remote agent is alive...
[01.04.2021 11:08:09] <69> Info         Checking remote agent is alive...
[01.04.2021 11:10:09] <69> Info         Checking remote agent is alive...

This had been going on for about 20 hours. We tried to cancel the job, but it just entered the Stopping state and never actually stopped. The task log around the stop request looked like this:

[01.04.2021 11:42:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:44:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:46:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:48:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:50:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:52:10] <69> Info         Stop signal has been received
[01.04.2021 11:52:10] <69> Info         Stop signal has been received
[01.04.2021 11:52:10] <69> Info         Stop signal has been received
[01.04.2021 11:52:10] <69> Info         Stop signal has been received
[01.04.2021 11:52:10] <69> Info         [EpAgentSource] Stopping managed backup session '105d746b-e1f7-49f1-97f4-9b8204a45d80' with reason 'Unknown'
[01.04.2021 11:52:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:54:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:56:10] <69> Info         Checking remote agent is alive...
[01.04.2021 11:58:10] <69> Info         Checking remote agent is alive...
[01.04.2021 12:00:10] <69> Info         Checking remote agent is alive...
[01.04.2021 12:02:10] <69> Info         Checking remote agent is alive...
[01.04.2021 12:04:10] <69> Info         Checking remote agent is alive...

The job never stopped, eventually reaching a 40-hour run time, with the task log continuously reporting "Checking remote agent is alive...". It was still in the Stopping state the following night, which meant the other servers in the job did not get backed up that night either. More worrisome, there is no notification of any kind when a job gets into this state and manual intervention is required.
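Since there's no built-in notification for this state, one workaround could be a small watchdog that scans a copy of the task log and flags a job whose tail has shown nothing but "Checking remote agent is alive..." for too long. Below is a minimal sketch, assuming the line layout shown in the excerpts above and that the timestamps are DD.MM.YYYY (that ordering is my assumption). The function names and the 2-hour threshold are hypothetical and not part of any Veeam tooling; how you obtain the log and raise the alert is left out.

```python
import re
from datetime import datetime, timedelta

# Matches lines like:
# [01.04.2021 11:02:09] <69> Info         Checking remote agent is alive...
LINE_RE = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2})\]\s+<\d+>\s+\w+\s+(.*)$")
HEARTBEAT = "Checking remote agent is alive..."


def heartbeat_only_duration(lines):
    """Return the time span covered by the trailing run of heartbeat-only lines."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Assumption: the date part is day.month.year.
            ts = datetime.strptime(m.group(1), "%d.%m.%Y %H:%M:%S")
            entries.append((ts, m.group(2).strip()))
    # No parsed lines, or the last message is something else: not stalled.
    if not entries or entries[-1][1] != HEARTBEAT:
        return timedelta(0)
    # Walk backwards to the first heartbeat of the trailing run.
    i = len(entries) - 1
    while i > 0 and entries[i - 1][1] == HEARTBEAT:
        i -= 1
    return entries[-1][0] - entries[i][0]


def is_stuck(lines, threshold=timedelta(hours=2)):
    """Hypothetical check: True if only heartbeats were logged for >= threshold."""
    return heartbeat_only_duration(lines) >= threshold
```

Run against the second excerpt above, the trailing run starts right after the "[EpAgentSource] Stopping managed backup session" line, so it would report 12 minutes; against a 20-hour stall it would report roughly 20 hours and trip the threshold.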

We did open a case (04733596) and were initially told to restart the Veeam services to clear out the job. While I know that fixes the issue, since we've had to do it in the past, it seemed like too much of a nuclear option, so I called in. One of the techs showed me how to find the PID of the backup process in the task log and kill it via Task Manager, and that eventually let the job finish stopping.

So what I'm trying to say is that this looks like a bug in the latest version of V11, since we have the CU installed. At the very least, if you manually cancel a job, or a backup window terminates it, the job should be able to stop without anyone having to kill the process.

Can someone look into this and get it submitted as an actual bug?
HannesK
Product Manager
Posts: 14314
Liked: 2889 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Windows Agent Backup Job Stuck

Post by HannesK »

Hello,
please understand that the forum is run by product management.

Bugs are submitted via support. If you ask the support engineer to give you a bug number, then you can be sure the bug is in the system.

According to the case comments, your resources were overloaded because too many tasks were configured. So it might be a configuration issue.

Best regards,
Hannes
bg.ranken

Re: Windows Agent Backup Job Stuck

Post by bg.ranken »

Thanks for the reply Hannes.

Yes, we have followed the recommendations from the case to adjust performance. However, from a resource perspective the server was running no other tasks when the agent job was canceled, so the job's failure to cancel still seems like a bug. The support rep asked me to create a post here to gain visibility.
Dima P.
Product Manager
Posts: 14415
Liked: 1576 times
Joined: Feb 04, 2013 2:07 pm
Full Name: Dmitry Popov
Location: Prague

Re: Windows Agent Backup Job Stuck

Post by Dima P. »

Randall,

Thank you for your post. We'll discuss your case with the QA and support teams.
bg.ranken

Re: Windows Agent Backup Job Stuck

Post by bg.ranken »

Thank you!
scrat
Influencer
Posts: 19
Liked: 2 times
Joined: Jul 26, 2022 5:14 pm

Re: Windows Agent Backup Job Stuck

Post by scrat »

Hello,

Is there a solution to this issue? I have a similar case: the job doesn't stop and stays at "Checking remote agent is alive..."

Thanks,
Regards.
scrat

Re: Windows Agent Backup Job Stuck

Post by scrat »

Solved: I had to configure the hosts file on the agent machines so they could resolve the master server.
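For anyone hitting the same thing: a hosts file (`/etc/hosts` on Linux, `%SystemRoot%\System32\drivers\etc\hosts` on Windows) maps an IP address to one or more hostnames, one mapping per line, with `#` starting a comment. Below is a minimal sketch that checks whether a given server name has an entry in hosts-file text. The server name `vbr01.example.local` and the address `192.0.2.10` are hypothetical placeholders, and which exact name the agent needs to resolve is not stated in the thread.

```python
def resolve_from_hosts(name, hosts_text):
    """Return the first IP mapped to `name` in hosts-file text, or None."""
    target = name.lower()  # hostname matching is case-insensitive
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]  # IP first, then hostnames/aliases
        if target in (n.lower() for n in names):
            return ip
    return None


# Hypothetical entry pointing the agent at the backup server.
EXAMPLE_HOSTS = """
# Added so the agent can resolve the backup server
192.0.2.10    vbr01.example.local    vbr01
"""
```

This only inspects the file's text. To confirm the OS resolver actually picks the entry up, `socket.gethostbyname("vbr01.example.local")` can be run on the agent machine.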