-
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
Automatic Retries
I'd like to submit a feature request for the "Workstation" license on Windows Agents to have automatic retries like the "Server" license has. Since it's already a fully developed feature, I wouldn't think it'd be hard to implement, and it would certainly help a lot of my customers if the agent retried without waiting until the next scheduled job. We get a lot of failures due to connection issues that get resolved seconds or minutes later. If someone runs the job again immediately, it works fine, but that takes a person's time, and it'd be nice if Veeam could just do it automatically like other backup software does.
-
- Product Manager
- Posts: 15598
- Liked: 3445 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
Re: Automatic Retries
Hello,
that feature is already available (just not configurable). Workstation edition retries automatically every 10 minutes for the next 23 hours.
Best regards,
Hannes
-
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
Re: Automatic Retries
That sounds like what I want, but if a connection issue results in a failed job with an error message, does it not retry then? It seems to not be retrying in my experience.
I have 3 separate support cases open for various connection issues. Basically, when the connection is lost, Veeam gives a variety of different error messages (possibly depending on the cause of the lost connection). Unless this is another place where Veeam's "retry" procedure is not the same as what happens when a job starts normally, it seems it is not actually trying again after the error message is produced, because it works fine if I run it manually after the error message appears.
For slightly easier troubleshooting: if an error is produced and the job is listed as Failed, with a "Last Run" time showing when the job ran, should a retry cause the "Last Run" time to be updated, or will it still just show the time when the job originally ran? If it should be updated to show the last retry time, then I can easily confirm that it is not actually retrying.
-
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
Re: Automatic Retries
To consolidate things a bit for diagnosing purposes, I'll follow up your post from veeam-cloud-providers-forum-f34/connect ... 87778.html over here.
Thanks for checking on that. Most of the error messages I see, which I deduce are all just some form of lost connection, say either "Task failed unexpectedly", "Job aborted due to server termination", or some sort of DNS lookup failure.
Basically, in trying to understand what's going on without access to Veeam's source code or anything, my first assumption is that, as you mentioned, the mid-job "retry" process is not identical to the new-job "start" process, and that most of the issues either occur during the "retry" process or occur because the "retry" process isn't running.
Because I have seen at least a couple of times where a job basically stalled and was perpetually retrying and failing (based on reading the logs), I'm assuming that most of my issues come in where there's a connection error but then no retry is attempted at all. The issues that I know occur during an actual "retry" process are not very common; usually my issue is that, whether it retried or not, at some point an error occurred and it stopped retrying (or possibly never retried even once).
Unfortunately, support was unable to troubleshoot any of my connection issues using the logs provided by the Veeam software and, in every case I actually opened, requested additional information from the customer (there were more occurrences of problems, but I stopped opening new cases as I wasn't expecting support to do anything different). More info on that here: veeam-service-provider-console-f42/agen ... 87823.html
If we do believe there may be some issue with the retries simply not occurring when the connection is lost, then I'm good waiting a bit to see what you find. For the time being, I just run through and click "Start" on a bunch of jobs each day when I see the last scheduled run failed. For now, that's a simpler solution than contacting the 6 affected customers to get them to download a script, run it with an added path argument specifying where to save the output, and then upload the resulting file to us somewhere. If our customers were bigger businesses with full-time IT staff, that probably wouldn't be as much of a challenge, but most of our customers are small businesses without any proper IT staff, and we have no software to access their computers to do something like running a script whenever we want.
I also considered the possibility that somewhere in Veeam's error handling process, the software skips the normal retry procedure for certain error messages. So maybe the bigger issue is that, despite all of these looking like a temporary loss of internet connection, Veeam thinks there's a bigger reason for the error, which could also explain why it seems to spit out one of several messages at random when the connection is lost. It never says "Connection lost"; it's always something else like "Job aborted due to server termination", even though nothing on our servers that we know of is invoking any sort of "Stop" or "End" command on jobs. To my knowledge, there's nothing on a VCC server that actually ends a job. I've asked about this before, because I've had computers where the backup agent was online but management was offline, so they continued backing up for several days after being removed from a customer's account in the VSPC, and the best answer I got was to temporarily revoke access for the account performing the backup job.
I'll add that part of my reluctance to work with all the customers to resolve the issues is that it does always, literally every single time, work fine when I just click "Start" on a failed job. So even if gathering additional information from the customer's environment shows something like the computer losing its connection, ultimately Veeam should be retrying after a lost connection, so it shouldn't matter if something else on the computer indicates it was completely disconnected from the internet. The same goes for any other scenario: whatever happened doesn't persist. Whether the connection was interrupted by security software, the Veeam service crashing, or a network connection being lost, the cause wouldn't matter if Veeam actually tried again after the error was produced. But as I mentioned, it seems to just sit there until the next scheduled backup time, which for some of our customers may be 2 days or more away (some only back up every other day, and some don't back up on weekends). While I encourage backing up as regularly as possible, to keep each incremental backup small and the data as up to date as possible, some customers have very slow upload speeds, and a running backup job interferes with their ability to work.
In any case, I look forward to getting it figured out. Thanks for looking into it.
-
- Product Manager
- Posts: 15598
- Liked: 3445 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
Re: Automatic Retries
Thanks for the summary. We are putting the different cases together in R&D, but it will probably take some time to come up with an answer.
-
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
Re: Automatic Retries
I've been having additional issues, including one today: a DNS error that occurred on a VBR server (VM backup job) and actually said "this is usually a temporary issue", but did not retry. Any update on diagnosing the issue? If it's still being worked on, that's okay. Just making sure it doesn't get forgotten.
-
- Product Manager
- Posts: 15598
- Liked: 3445 times
- Joined: Sep 01, 2014 11:46 am
- Full Name: Hannes Kasparick
- Location: Austria
Re: Automatic Retries
Hello,
no real news. The log investigations did not show anything that matches the descriptions from the forums. The number of cases where topics are mixed is also making things complicated, so it will take time.
Sorry for the delay,
Hannes
-
- Service Provider
- Posts: 507
- Liked: 124 times
- Joined: Apr 29, 2022 2:41 pm
- Full Name: Tim
Re: Automatic Retries
Okay, I can certainly upload new logs if it helps to isolate anything. Almost every day I see this sort of connection issue, which resolves itself but is then not followed by an automatic retry of the backup. So while I have no way to actively reproduce the problem, I can certainly get logs fairly quickly if needed.