We've got a customer Veeam Agent backing up over a remote satellite link. Most of the time the job behaves properly, but we've now had a couple of occasions where the job got stuck at 99% and had to be manually terminated (or the machine restarted) and then re-run.
This concerns us because we have no obvious indication that the job has stalled until we start getting backup overdue warnings in VSPC. We've been through support (Case #6070057) and they suggested submitting a feature request, so here it is.
FEATURE REQUEST - Can something be added to both Veeam Agent and VBR to detect when a backup or duplicate job has made no progress for an extended period of time, and then send a notification or even offer an option to automatically terminate and restart the job?
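A rough Python sketch of the kind of check being requested, just to make the idea concrete. It is not based on any Veeam API: `StallDetector`, `observe`, and the 60-minute threshold are made-up names and values, and how the progress percentage would actually be read from the job is left open.

```python
from datetime import datetime, timedelta


class StallDetector:
    """Flags a job as stalled when its progress has not changed for too long.

    Hypothetical helper: nothing here calls Veeam. The caller supplies the
    progress readings from wherever they happen to be available.
    """

    def __init__(self, max_idle: timedelta):
        self.max_idle = max_idle
        self._last_progress = None  # last progress value seen
        self._last_change = None    # time that value was first seen

    def observe(self, progress: float, now: datetime) -> bool:
        """Record a progress reading; return True if the job looks stalled."""
        if self._last_progress is None or progress != self._last_progress:
            # First reading, or progress moved: reset the idle clock.
            self._last_progress = progress
            self._last_change = now
            return False
        return now - self._last_change >= self.max_idle


if __name__ == "__main__":
    detector = StallDetector(max_idle=timedelta(minutes=60))
    start = datetime(2024, 1, 1, 22, 0)
    # (minutes since start, percent complete), sample values only
    for minutes, percent in [(0, 50), (30, 99), (60, 99), (90, 99)]:
        stalled = detector.observe(percent, start + timedelta(minutes=minutes))
        print(minutes, percent, stalled)
```

The point of tracking "time since progress last changed" rather than total runtime is that a slow satellite link can make a healthy job run long, while a job sitting at 99% with no movement is the case that needs a notification.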
						- 
				GrandAdmiral
- Service Provider
- Posts: 19
- Liked: 6 times
- Joined: May 23, 2016 1:56 pm
- Full Name: Benjamin Chennells-Webb
- Contact:
- 
				BackupBytesTim
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
- Contact:
Re: FEATURE REQUEST - Detect stalled backup task
I have also seen "stalled" backup jobs, though to my knowledge not on a satellite connection, and I agree such a feature would be useful.
Unfortunately, to my understanding (and Veeam can correct me if I'm wrong), the receiving VBR server and its associated repositories have no control over the job, so they could not restart or even terminate it. Control like that comes from the VSPC only, but the VSPC has no knowledge of job status until the job finishes, so there'd be no server-side knowledge that "this job has been at 99% for 9 hours", for instance.
Also unfortunately, in my case the only resolution was restarting the Backup Agent, as no attempt at properly stopping the running job actually worked. Fortunately I could do that remotely from the VSPC via the Management Agent.
It's not a perfect solution, but there's a VSPC alarm for a job running longer than a configured "maximum" time, which has been helpful in spotting some such stalled jobs. Unfortunately it also gets triggered if a job starts and the agent then goes offline, because the VSPC server thinks the job is still running until the agent comes back online and reports a different status. Still, if you're not using it already and the job has a fairly consistent runtime, it might be a good short-term solution.
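To illustrate why a maximum-runtime alarm alone can't tell a stalled job from an offline agent, here is a rough Python sketch. It is purely illustrative and says nothing about how the VSPC works internally: `classify_job` and every parameter are made-up names, and it assumes the monitoring side knows when the agent last reported in and when the progress value last changed.

```python
from datetime import datetime, timedelta


def classify_job(now, started, last_progress_change, last_agent_report,
                 max_runtime, max_idle, max_silence):
    """Return a rough status for a running job (all inputs are hypothetical).

    "agent-offline": the agent has not reported recently, so runtime and
                     progress data cannot be trusted.
    "stalled":       the agent is reporting but progress has not moved.
    "overdue":       progress is moving but the usual runtime was exceeded.
    "running":       nothing suspicious.
    """
    if now - last_agent_report > max_silence:
        return "agent-offline"
    if now - last_progress_change > max_idle:
        return "stalled"
    if now - started > max_runtime:
        return "overdue"
    return "running"


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 7, 0)
    status = classify_job(
        now=now,
        started=now - timedelta(hours=9),
        last_progress_change=now - timedelta(hours=8),
        last_agent_report=now - timedelta(minutes=2),
        max_runtime=timedelta(hours=4),
        max_idle=timedelta(hours=1),
        max_silence=timedelta(minutes=15),
    )
    print(status)
```

Checking agent silence first is the design choice that matters here: it keeps an agent that simply dropped off the network from being reported as a stuck job.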
- 
				GrandAdmiral
- Service Provider
- Posts: 19
- Liked: 6 times
- Joined: May 23, 2016 1:56 pm
- Full Name: Benjamin Chennells-Webb
- Contact:
Re: FEATURE REQUEST - Detect stalled backup task
Sounds right, but I was suggesting this feature be added to the client-side VBR or Agent software rather than the Cloud Provider side.

We're the customer's IT provider as well as Service Provider, so this request was more relevant to the former perspective than the latter (since at the end of the day it's a customer-side problem). Even an email notification would be acceptable.
- 
				BackupBytesTim
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
- Contact:
Re: FEATURE REQUEST - Detect stalled backup task
That does make sense then. Since the VSPC Management Agent already reports the Backup Agent status to the VSPC server, to the extent of "Running", that could probably be a good place to implement it. I'd be concerned about implementing it as a feature in the Backup Agent exclusively, as it's not unusual for us to see the Backup Agent itself appear stalled. Being a service provider, I have limited access to the source computers of a backup, but based on various symptoms and on which fixes have worked, sometimes the only option I've had is restarting the Backup Agent service, for instance when the Backup Agent wouldn't respond to attempts to stop a job.
Of course, implementing it as part of the Backup Agent software would also allow its use where the VSPC Management Agent isn't installed, such as where there's no service provider, so I can also see the appeal in that.
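The "try to stop the job, and restart the agent service only if that fails" escalation described above could be sketched in Python roughly as follows. None of this is a real Veeam call: `recover_stalled_job` is a made-up name, and `stop_job`, `job_is_running`, and `restart_agent` are hypothetical callables that whoever runs it would have to supply. The grace period and poll interval are arbitrary defaults.

```python
import time
from typing import Callable


def recover_stalled_job(stop_job: Callable[[], None],
                        job_is_running: Callable[[], bool],
                        restart_agent: Callable[[], None],
                        grace_seconds: float = 300.0,
                        poll_seconds: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Try a normal stop first; restart the agent service only if that fails.

    Every callable is supplied by the caller (hypothetical hooks).
    Returns "stopped" if the job ended after the stop request,
    or "restarted" if the agent had to be restarted.
    """
    stop_job()
    waited = 0.0
    while waited < grace_seconds:
        if not job_is_running():
            return "stopped"
        sleep(poll_seconds)
        waited += poll_seconds
    # One last check after the grace period before escalating.
    if not job_is_running():
        return "stopped"
    restart_agent()
    return "restarted"
```

Keeping the restart as a last resort, wherever the watchdog ends up living, avoids bouncing the service for a job that would have stopped cleanly given a few minutes.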