"Please recreate the job" so tired of this. Robust engine ?

Post by **MB-NS** » Jan 18, 2012 10:05 am

Hello,

I'm not accustomed to doing this, but I'm growing very tired of some recurring issues with VBR and of how support deals with them.
So I'm sorry if this sounds a bit like venting (it partially is), but I hope that things can improve with this feedback.

We have several environments where the issues arose (VBR4, 5 and even 6).
Some cases are identical, some others are slightly different, but MY core issue is that we found ourselves, far too often, forced to recreate backups from scratch because the current backup files are corrupted.

This can happen when :
- disk space runs out on the target repository ("failed to delete oib" message). It has happened every time we ran out of space.
- when the synthetic FULL (incremental mode) fails for whatever reason (not only when running out of disk space)

Possible consequences seen :
- incremental doesn't work anymore
- incremental still works but synthetic full keeps failing (which in turn results in... running out of disk space)

The support answer is as infuriating as the issue itself (we opened something like 5-6 cases on these issues and they were almost always handled the same way) :
- first they tell us very quickly "please recreate the job". We can tell that very little thought was put into this suggestion (for instance, no logs were requested). They clearly want to get rid of the case, hoping that we will readily apply their instructions.
- when we insist and ask for further investigation, they do it (ask for more logs and so on), but the answer is still "please recreate the job" in the end.

Recreating the job is a PITA for several reasons :
- recreating the job means a new FULL, which can take hours or days depending on the volumes being backed up. And I'm not even talking about backups over WAN links. It is not always without consequences for the VM being backed up.
- if we don't have the space to hold a new FULL alongside the old backups, we have to DELETE our current backups (most of the time, even if the chain is corrupted, old recovery points are still usable, so there IS a loss)

I still don't understand how we can have a backup chain where recovery points are still usable, BUT nothing can be done to repair the backup chain without recreating it from scratch.

My opinion right now is that the backup files architecture lacks robustness and resiliency. Some suggestions :
- disk space running out : create an option to stop jobs BEFORE the disk space runs out, leaving the last session failed of course, but the backup files in a clean state at least. This shouldn't be hard at all since you already check and warn on low disk space
- make the synthetic FULL work better. We had a lot more issues with incremental jobs using synthetic fulls than with reverse incremental jobs.

Post by **Joe Nocella** » Jan 19, 2012 5:52 pm

This has been my general impression of the way tech support cases are handled by Veeam. I've opened 4 cases since owning the product. In every case I've either found the solution on my own or learned to live with the situation the way it was. This is probably the result of a company that grew too big too fast, with customer service playing catch-up. I've been surprised by the apparent lack of complaint threads on this forum. I suspect they are censored rather quickly.

Post by **Gostev** » Jan 20, 2012 3:23 pm

Joe Nocella wrote: This is probably the result of a company that grew too big too fast, with customer service playing catch-up. I've been surprised by the apparent lack of complaint threads on this forum.

The real reason is that the satisfaction rate with our support was 93% as of Q4 2011 (according to customer responses in the feedback form sent when a case is closed), which is an extremely high number indeed (at my previous company, one of the software gorillas, 75% was considered a great result).

Additionally, the satisfaction ratio has actually improved from 89% in 2010, when we first started to track this metric. So our growth does not negatively affect the quality of support, as you are suggesting.

Joe Nocella wrote: I suspect they are censored rather quickly.

This is not so, we do not censor our forums. You can easily find some topics with complaints around here.
http://forums.veeam.com/search.php?keyw ... rt+manager

Post by **Gostev** » Jan 20, 2012 4:38 pm

@MB-NS I agree, and I recommend that you simply do not close the support case until the cause is actually confirmed and explained to you. Otherwise, the issue will never get to the higher support tiers and R&D. Recreating the job is indeed the easiest way to close the support case, but also the fastest way to resolve the issue, which I am guessing is why it is often recommended. I am not involved with support, nor do I have much insight into our support org. I would definitely recommend requesting a callback from our support manager (one of the options on the feedback form when the case is closed) and letting them know what can be improved. Our Director of Support often makes these calls himself.

Regarding your technical suggestions:
1. It is impossible to predict whether the remaining disk space will be enough to store the entire backup file, so this is not really feasible to implement. Also, with incremental backup, failing to write the latest backup file cannot affect existing backup files in any way, so I am not really sure what the issue is here.
2. I am all for making the product work better, however it does not look like there are currently any known issues with synthetic fulls. Of course, we will address any reproducible bugs promptly, and for this it is important to always try to get down to the bottom of the issue with the support. No support case can be closed until you agree to close it anyway.

Thanks!

Post by **gcballard** » Jan 20, 2012 8:48 pm

I have had issues that have been frustrating and had to recreate jobs, although this was not nearly as problematic for me. I have had some bizarre and frustrating issues but found support to be as helpful as was possible. In some instances, there was actually a hardware failure that I didn't know about (fibre card took a holiday). The root cause of virtually all of my tickets has not been Veeam, but rather things like problems w/ vCenter, SAN performance (SATA disks), too many jobs running at one time, etc.

I will probably complain later, but Veeam does better than almost any company I have worked with, with the possible exception (IMHO) of Compellent and Barracuda. YMMV

Post by **toabama** » Jan 23, 2012 9:52 am

I have also had one frustrating answer from customer support. When I complained that the GUI in Veeam v6 is slow and flickering, the answer from support was "reinstall". And when I complained that "reinstall" is not our way of handling errors, they told me it is the fastest way but that there will be a patch... but when?
I think Veeam 6 has gone out to users/customers too fast, without proper testing. Now the platform is unreliable and not working that well.
Version 5 worked very well and we did not have that many problems that were related to Veeam B&R.
Hope for new patches to get the platform stable!

Post by **Joe Nocella** » Jan 23, 2012 1:55 pm

Gostev wrote: The real reason is that the satisfaction rate with our support was 93% as of Q4 2011 (according to customer responses in the feedback form sent when a case is closed), which is an extremely high number indeed (at my previous company, one of the software gorillas, 75% was considered a great result).

Additionally, the satisfaction ratio has actually improved from 89% in 2010, when we first started to track this metric. So our growth does not negatively affect the quality of support, as you are suggesting.
This is not so, we do not censor our forums. You can easily find some topics with complaints around here.
http://forums.veeam.com/search.php?keyw ... rt+manager

I have about twenty emails from case #5166166, opened 1/11/12, several of which have gone unanswered, that say otherwise. Tired of it.

Jan 23, 2012 9:30 pm

Hi Joe, you should ask your engineer to arrange a callback from support management. Until you tell them there is a problem with how your support case is being handled, no one will know! They cannot monitor every single support case, of course; this is physically impossible. Thanks.

Post by **MB-NS** » Jan 27, 2012 9:53 am

Gostev wrote: @MB-NS I agree, and I recommend that you simply do not close the support case until the cause is actually confirmed and explained to you. Otherwise, the issue will never get to the higher support tiers and R&D. Recreating the job is indeed the easiest way to close the support case, but also the fastest way to resolve the issue, which I am guessing is why it is often recommended. I am not involved with support, nor do I have much insight into our support org. I would definitely recommend requesting a callback from our support manager (one of the options on the feedback form when the case is closed) and letting them know what can be improved. Our Director of Support often makes these calls himself.

I will do that, thanks for the suggestion.

Gostev wrote: Regarding your technical suggestions:
1. It is impossible to predict whether the remaining disk space will be enough to store the entire backup file, so this is not really feasible to implement. Also, with incremental backup, failing to write the latest backup file cannot affect existing backup files in any way, so I am not really sure what the issue is here.

I wasn't actually talking about predicting whether the remaining space is enough.
It is much simpler : if there is less than XX GB of disk space remaining, simply abort the job. XX would be defined by the user, and the whole thing would be optional of course.
Running out of disk space does affect jobs when it happens during a synthetic full (for example, there might be a synthetic full on job A and an incremental on job B running simultaneously ; if job B fills the disk up, the synthetic full on job A fails too and might permanently corrupt the backup chain).
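To make the idea concrete, here is a minimal sketch of such a check in Python. It is not how Veeam works internally; the function names, the repository path and the 50 GB floor are all assumptions made up for illustration. It only shows that the check needs no prediction of backup size, just a comparison of current free space against a user-defined floor.

```python
import shutil

GIB = 1024 ** 3  # bytes per GB (binary)


def has_enough_free_space(free_bytes: int, min_free_gb: float) -> bool:
    """Return True when free space is at or above the user-defined floor."""
    return free_bytes >= min_free_gb * GIB


def check_repository(path: str, min_free_gb: float) -> bool:
    """Check the volume that holds `path` (hypothetical repository folder)."""
    free_bytes = shutil.disk_usage(path).free
    return has_enough_free_space(free_bytes, min_free_gb)


if __name__ == "__main__":
    # Hypothetical values: current directory as the repository, 50 GB floor.
    if check_repository(".", min_free_gb=50):
        print("Enough free space, job may continue")
    else:
        print("Aborting job: free space is below the configured floor")
```

Such a check would have to run before the job starts and again periodically while it writes, since another job (job B in the example above) can fill the disk in the meantime.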

Gostev wrote: 2. I am all for making the product work better, however it does not look like there are currently any known issues with synthetic fulls. Of course, we will address any reproducible bugs promptly, and for this it is important to always try to get down to the bottom of the issue with the support. No support case can be closed until you agree to close it anyway.

I agree with that, only problem is that until now we had to insist a lot to get the support to investigate. I'm not sure something can be done for past cases (broken backups were deleted since), but I'll make sure we investigate until everything is explained in the future.

Post by **ian0x0r** » Jan 30, 2012 11:29 am

This is an annoying problem (looking at the same issue right now in V5). Other than causes mentioned above, I’ve found changing transport methods can break the job as well. Going from Agent based to agentless mode is not necessarily a good idea. If upgrading to V6 and you retain legacy replica jobs and implement proxies with different transport methods, the same thing can happen.

To be fair, when I first saw this issue, just after V5 came out, Veeam support spent a week looking over the logs. They pointed out that the cause was the change of transport method, and then concluded that the job had to be recreated.

What I don’t understand is why, when the OIB error occurs, it looks for a VRB file that does not exist. The job could have failed for any reason prior to the OIB error. Is it possible to detect the job failure, fall back to the previous VRB file (the one with the date stamp in its file name) and continue from that point? I guess with V6 this is no longer applicable, hence the fix to the issue as far as replication is concerned.

Thanks,

Ian

Post by **JohannesBrodersen** » Jun 19, 2012 9:52 am

I've seen this so many times I've stopped counting. I'm mainly using reversed incrementals on a target holding up to 4 TB. The job will run for a couple of hours and then halt with some error. The following job will then fail with the (in)famous message "failed to delete oib....".

This has been a problem in Veeam since version 5, if not longer. How come Veeam hasn't got a mechanism that can scan the VRBs and VBKs and fix the problem, or at least tell what's wrong with the file? I agree with MB-NS that it could be useful to have a "storage-minimum" alert/option.

Recreating the job is not a permanent solution since the window for backup is limited. So every time I recreate the job I will need a backup window for at least 21 hours.
We have several customers who do not dare to switch from their current backup products to Veeam when these problems occur in the POC.

If Veeam can solve this problem it will be a win-win situation for everyone.

thanks

/JB
