I have a job that keeps failing to back up to my cloud provider with the error message shown below. After my ticket was escalated to Tier 2, I was told there is a time limit of 7 days before the backup job is canceled (see the engineer's notes below). I am also having an issue with the job resuming where it left off: the job will restart, delete the files, and make a fresh backup. Since the job won't resume and there is no workaround for the time limit, I can't back up my server!
Note added: 11/24/2020 9:38:57 PM :: Error: Application is shutting down.
Failed to upload disk.
Agent failed to process method {DataTransfer.SyncDisk}.
Exception from server: Application is shutting down.
Unable to retrieve next block transmission command. Number of already processed blocks: [75473927].
Failed to download disk '26aaf4e7-c773-4c00-9293-af13da80a992'.
[17.11.2020 21:34:09] <01> Info Job session is running in full mode
[17.11.2020 21:34:01] <01> Info Starting job. IsRetryMode: 'False'
[24.11.2020 21:34:28] <22> Info Job progress: '50%', '20,565,667,028,480' of '40,558,210,784,768' bytes, object '0' of '1'
[24.11.2020 21:34:34] <10> Error Failed to connect to agent's endpoint '127.0.0.1:2500'. Host: 'STOR01'.
We have a hard-coded timeout of 7 days, and you are reaching it before the job can complete. The resume functionality will help with this, but the better option is the possibility of seeding.
[b]Because at this rate the job won't complete for over 2 weeks, so seeding would be better if that's possible.
However, if that's not possible, we will have to rely on the retry. This hard-coded timeout cannot be increased.[/b]
Can you please clarify which Veeam Agent for Windows version you are using? How large is the source data set, and what is the average throughput during backup?
here is a time limit of 7 days before the backup job is canceled
That's correct, a single job run is currently limited to 7 days.
I am also having an issue with the job resuming where it left off.
Resume works within a single job run and is triggered by the backup job's retry logic. Have you noticed any retries being performed during the 7-day job run? Thanks!
I am doing a 30 TB backup job with Veeam Agent 4.0.1.2169 and an average throughput of 35 MB/s. What is the purpose of the limit? It takes a problem we don't have and doesn't even solve it. As far as resuming goes, how would that take place when it hits the 7-day limit? As a matter of fact, I've had another reason for the job to fail two days in, and the auto resume process did not take place. I've sent so many log samples I'm tired of it; I just need it fixed. REMOVE THE HARD LIMIT!
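To see why this job can never finish inside the limit, here is a quick back-of-the-envelope estimate using the figures quoted in this thread (the total byte count from the job progress log line and the ~35 MB/s average throughput the poster reports):

```python
# Rough ETA for the full backup, using figures quoted in this thread.
TOTAL_BYTES = 40_558_210_784_768   # total size from the job progress log line
THROUGHPUT_BPS = 35 * 10**6        # ~35 MB/s average, as reported by the poster

seconds = TOTAL_BYTES / THROUGHPUT_BPS
days = seconds / 86_400            # 86,400 seconds per day
print(f"Estimated full backup duration: {days:.1f} days")
# → roughly 13.4 days, nearly double the 7-day hard limit
```

This lines up with the Tier 2 note above ("the job won't complete for over 2 weeks") and explains why the job was at 50% after exactly 7 days.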
We are researching the possibility of adding a registry value to override this limit, but the problem won't go away - the entire backup set looks too large for the average throughput. Any chance you can set up a local repository on the agent side and use backup copy to cloud connect instead (i.e. backup copy to a Veeam B&R server located at the cloud connect site)?
kubimike wrote: ↑Nov 25, 2020 11:11 pm
Takes a problem we don't have and doesn't even solve it.
The problem it solves is the backup job just hanging forever, thus never sending notifications, with the user not knowing they have no backups running as a result. If a patient does not come out of their hospital ward after 7 days, you really don't want to be assuming they are fine. 7 days is when we say "enough is enough".
@Dima P. I think we can simply use a different timeout for full backups.
Not sure what you're talking about; the job never hangs. You can see from the VBR console that it's running and not stuck. This needs to get fixed, I can't run backups. The only reason it fails is because of the time limit. From what I can see, this issue has been going on since 2017.
Dima P. wrote: ↑Nov 26, 2020 8:15 pm
We are researching the possibility of adding a registry value to override this limit, but the problem won't go away - the entire backup set looks too large for the average throughput. Any chance you can set up a local repository on the agent side and use backup copy to cloud connect instead (i.e. backup copy to a Veeam B&R server located at the cloud connect site)?
The problem isn't speed to the internet, it's the underlying storage. So if I did do it locally it probably won't finish; from what I can see, a full will take 12 days.
kubimike wrote: ↑Nov 28, 2020 12:58 pm
not sure what you're talking about the job never hangs.
It may not hang in your or most other infrastructures, but some infrastructures have issues which may cause the job to hang (or slow down to a crawl so that it never completes). Also, while you may not have such issues now, something can break a year later in your environment, causing the job to hang then.
kubimike wrote: ↑Nov 28, 2020 12:58 pm
you can see from the vbr console it's running and not stuck.
I'm happy that you realize the importance of checking on your jobs periodically, but many people simply never open the VBR console again after the initial setup... up until they need to restore.
I agree that this hard limit is very frustrating in some scenarios. We have had similar issues with this hard-coded timeout when managing agents via VSPC (not VBR) over low-upload connections, without the option to seed the backups.
Unfortunately in this instance resume does not work, instead the initial incomplete backup data is deleted from CloudConnect when the job restarts and it starts from scratch all over again (exactly as kubimike noted happens with VBR).
If there was an override key to push it past the hard-coded timeout of 7 days that would be great - we could do the initial full and then remove the key. Better yet, just let the job keep running but raise a warning that the job is still running after 7 days, like you can do in VSPC (this would be my preference, as you don't have to fail, apply a reg key, and start all over again).
Another option would be to fix the resume functionality so that it can actually resume after an artificial timeout. If it has uploaded 500GB and timed out, then fail, restart the backup but keep the existing 500GB of blocks already sent and start sending the rest/changes etc (same as backup copy from VBR).
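The resume behavior being asked for above could, in principle, look something like the following sketch. This is purely illustrative: the function name, the set of "already uploaded" block IDs, and the callback are all hypothetical, and this is not Veeam's actual implementation.

```python
# Illustrative sketch of block-level resume: blocks already confirmed on the
# target are skipped on restart instead of being deleted and re-sent.
# All names here are hypothetical, not actual Veeam internals.

def resume_upload(blocks, uploaded_ids, send_block):
    """Upload only the blocks whose IDs are not already confirmed remotely.

    blocks       -- iterable of (block_id, data) pairs for the whole backup
    uploaded_ids -- set of block IDs the target already holds (e.g. the 500 GB
                    sent before the timeout); mutated as new blocks land
    send_block   -- callable that actually transmits one block
    Returns the number of blocks transmitted in this run.
    """
    sent = 0
    for block_id, data in blocks:
        if block_id in uploaded_ids:
            continue                    # already on the target: keep, don't re-send
        send_block(block_id, data)      # transmit only the missing block
        uploaded_ids.add(block_id)      # record it so a later restart can skip it
        sent += 1
    return sent
```

In the 500 GB scenario described above, a restart would pass the IDs of the blocks already received into `uploaded_ids`, so only the remainder (plus any changes) would be transmitted, much like a backup copy job from VBR picks up where it left off.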
Your post was deleted because it was redundant: all it did was ask me again to provide an update, while quoting the previous post in its entirety. Since I provided an update, there was no reason to keep the post, especially due to it containing a huge repeat quote. We're just trying to keep this forum clean and easily readable. Thanks!
Let me make it more clear: will running the job in standalone mode allow resume to work? I can't do my backups, I pay for support, I'm asking for help, and you delete my post. It was not a redundant post.
kubimike wrote: ↑Dec 02, 2020 3:58 pm
I can't do my backups, I pay for support Im asking for help
Well, as explained when you click New Topic, this is not a support forum. It has a different purpose, and it is maintained accordingly to match this specific purpose. If you have some technical issues doing your backups and need help, you should open a support case (which is the service you're paying Veeam for). Thanks!
Now you should just wait for our support to assist you with the issue.
I'm sorry that I did not see a support case ID in your post (this is important information for us to have); perhaps it was buried in that big quote, because it was certainly not in the post itself, which only had about 3 words including my nickname.
Support can't offer any more help. I'm coming to you; I've come to you before to solve issues, and you solved the ReFS issue. I can't do backups, and I've now asked several questions which you still haven't answered. This ticket has been open for a very long time. I have unprotected data that needs backup. Let's get the registry key to address this problem.
Yes, as both myself and Dima have already said above, we will certainly consider implementing the registry key in future releases, based on your feedback.
I'm sorry that we cannot address your need instantly, but this is the reality of how software development cycles work. I suggest we don't keep beating a dead horse.
I think you should read my support ticket; your helpdesk person thinks that's not the case. Before I waste time running it for 7 days again, I'd like to know.
I've reviewed the case details and will discuss them with the support folks - some of the information indeed might be a little confusing. The current timeout of 7 days applies to all VAW jobs and does not depend on license or management mode. This timeout shuts down the retry functionality as well, so resume does not work either (resume works within the retry cycle of the backup job).
Mike,
Can you please clarify the following:
1. Is that a single 30 TB disk / volume?
2. Are you planning to continue incremental backup for this machine or, by any chance, this is a one-time full backup?
In the meantime, a couple of options to consider:
1. Any chance you can access this machine directly from the Veeam B&R server? If yes, you can try file-level backup (NAS backup, or a file to tape job to a VTL in case you don't have an actual tape library) and split the content across multiple jobs to make the backup size less impactful.
2. Another option might be to split the content of the volume across multiple file-level agent jobs; additionally, you can exclude content that does not need to be protected (and script the jobs to run one after another).
As for the timeout behavior change, it's planned for the next major version.
I can second your frustration @kubimike, as we have been dealing with the 7-day hard limit since the very beginning. In fact, because we didn't know about it, it took us a good 4 months to initially figure out how to cleverly back up our data, and that's even after seeding many TB in anticipation of avoiding lengthy backups. This limit has caused a lot of grief and wasted time. Here's a kicker for ya: we have gotten somewhere between 95% and 99% through a number of large backups, only for the jobs to fail in the very last few minutes, and as you know, when this happens you have to restart from the very beginning. Talk about infuriating!
I am now facing an issue with the changed block tracking mechanism of the software, and it's causing a slowdown which also makes us hit that 7-day limit. I haven't been able to back up our files for a couple of weeks now because of this.
I sure hope they can accommodate the needs of users who need more than 7 days. It can't be that hard to implement an override.
I am now facing an issue with the Change Block Mechanisms of the software and it's causing a slow down which also causes us to hit that 7-day limit. Haven't been able to backup our files for a couple weeks now because of this.
Any chance you have a case ID to share for a review?
Update: we've discussed the issue with the support folks and it might be possible to create an isolated fix to adjust the limit, so it looks like we will have a workaround which can be requested via the support team. Stay tuned.