Jobs gone wild: not respecting RPO

Post by **bhagen** » Aug 26, 2020 4:21 pm

Case # 04341808

I'm posting this for posterity, as support is completely baffled and cannot fix this issue. Maybe this will help somebody here.

All 10 backup jobs at a remote site are set up the same way: forever forward incremental, 32 restore points. No synthetic fulls. Backup files health check once a month. Remove deleted items after 5 days (or 14 days in some cases).

5 of the jobs work fine and have 32 restore points.

The other 5 jobs have almost 170 restore points!
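A check along the following lines could flag this kind of drift before the repository fills up. It is only a minimal sketch: the `JobStatus` record, the job names, and the `slack` allowance are hypothetical, and the restore point counts would have to come from whatever reporting you already have. It does not call any Veeam API.

```python
from dataclasses import dataclass


@dataclass
class JobStatus:
    name: str              # hypothetical job name
    retention_limit: int   # restore points configured on the job
    restore_points: int    # restore points actually present in the chain


def jobs_over_retention(jobs, slack=1):
    """Return the jobs holding more restore points than limit + slack.

    slack is an assumed allowance for a chain that briefly holds an
    extra point before the oldest increment is merged.
    """
    return [j for j in jobs if j.restore_points > j.retention_limit + slack]


# Illustrative numbers taken from the post: 32 configured, about 170 on disk.
jobs = [
    JobStatus("job-01", retention_limit=32, restore_points=32),
    JobStatus("job-02", retention_limit=32, restore_points=170),
]

for job in jobs_over_retention(jobs):
    print(f"{job.name}: {job.restore_points} restore points, limit {job.retention_limit}")
```

Run on a schedule, a report like this would have surfaced the five runaway jobs long before the free space alert did.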

I've followed https://www.veeam.com/kb1990 and https://helpcenter.veeam.com/docs/backu ... ml?ver=100, but I cannot reduce the number of restore points no matter what I try.

Due to me (embarrassingly) not noticing this issue until my jobs started complaining about space issues on the repository, I didn't have enough space to simply do active fulls and wait 32 days to delete the old chains, so I had to cobble together another repository and create new jobs for each of the ones that have gone wild, and point those jobs to the other repository. (I've since set the repository free space alert from 10% to 20%.) Once I meet my RPOs, I'll delete the original chains from disk. Then I'll need to make new jobs again (because I'm not going to move 45TB worth of jobs!), point them to the original repository, wait 32 days, then delete the "other" ones from the temp repository.

The support guys are great, and we have tried several things to fix this, but...they are still totally baffled, and wouldn't have believed me if they hadn't seen it with their own eyes. And since there's "no possible way" that a forever forward job would ever add more restore points than demanded by the job, there's no fix for when it actually happens.

So, I guess I'll be closing the case, shrugging my shoulders, and hoping that the next issue I have with Veeam won't also be "impossible" and therefore have no fix.

Post by **Gostev** » Aug 26, 2020 4:47 pm

bhagen wrote (Aug 26, 2020 4:21 pm): "So, I guess I'll be closing the case, shrugging my shoulders"

Really, you should never do this. What if this is a legitimate bug? Just walking away will leave it unfixed, and no one wins from this...

Whenever you feel the case is stuck, just ask for it to be escalated. Note that they will do it regardless, according to the standard workflow, if they are not able to resolve it within a certain time limit, unless of course you agree to close the case before the mandatory escalation time comes.

I actually just looked up your case. It's not Veeam Support that is "completely baffled", but one specific Tier 1 guy. Veeam Support also includes Tier 2 and Tier 3 folks, with actual Veeam R&D as the last resort. And these guys simply don't know the word "impossible".

Post by **bhagen** » Aug 26, 2020 4:54 pm

I told him on day 1 that he simply needed to text you, @Gostev, and you'd have the answer! Somehow he wasn't willing to do that...

I'll ask him to escalate, then. Thanks!

Post by **Gostev** » Aug 26, 2020 4:56 pm

Oh. That, on the other hand, was a bad suggestion indeed: not something I can scale to, with approx. 1000 support cases opened daily.

Post by **bhagen** » Aug 26, 2020 5:00 pm

Ha! Indeed!

Post by **bhagen** » Aug 27, 2020 8:22 pm

Thanks so much @gostev; I got emails from a manager and a tier 2 tech. A Webex session and 2 lines of SQL code fixed my issues. Too bad the tier 1 guy and I had already destroyed and "deleted from disk" 3 of the 5 chains that went out of control; but at least the remaining 2 are fixed, and I'm slowly bringing them back down from 170 restore points to the 32 I need.

Case will be closed shortly!

Post by **ChrisGundry** » Sep 02, 2020 11:53 am

I am finding myself increasingly frustrated with Veeam support interactions over the last few years...

I tend to run into this issue a lot, where Tier 1 says things like "that's not possible" or "the product is working as intended/by design", when in fact it used to work perfectly fine before we did the latest update, or until last week when it randomly stopped working for no reason I can determine.

It is especially frustrating when Tier 1 says "working by design/as intended" about something that makes zero sense. If it was working as intended you wouldn't be talking to me… But regardless, I would expect support to see that what they are suggesting (that it is working as intended) is not logical when applied to the situation! In a recent case we had Veeam ONE business views wiping their contents, then re-creating the contents each time the view was updated. During that process we had hundreds of alarms trigger for various things, because they were no longer excluded whilst the view was empty during the rebuild. This was apparently "working as intended"… That is not logical, is it? It is also not logical that it hadn't done this for the last 2 years, but the day we updated Veeam ONE it stopped working… I had to battle with support because they had "checked with engineering and it is working as intended". Eventually, after several weeks of constant alerts and pushing support, I got the answer back that I was right in the first instance: "there was a fault and they have now found it". Later on we got a hotfix for the issue.

It would be nice if Tier 1 would appreciate that sometimes we admins (especially the Veeam certified ones) do know what we are talking about, and do know that the job was working fine before the latest update etc.

Post by **Gostev** » Sep 02, 2020 12:39 pm

T1 growth problems!

Our (R&D) big ask is to never accept "bad" answers and keep escalating! Otherwise, we end up with a bug remaining in the product, and no one wins from this.

And yes, I know sometimes it can be too tedious and frustrating to "break through the wall", depending on the support engineer on the case. For such situations, we recommend using the Talk to a Manager functionality in the Customer Portal. This guarantees that the case will be reviewed by a very experienced T2/T3 engineer.

Good idea to look at the certification level of the person opening a case though... I'll share this feedback with the support management for consideration.

Post by **ChrisGundry** » Sep 02, 2020 12:51 pm

Thanks for the reply Gostev (hope you enjoyed Italy!). 'Talk to a manager' is usually the way I 'break through the wall', to be fair; it is just frustrating that it seems to be required more and more in most of my cases.

A suggestion, if I can: one other thing I find very frustrating is the relay we have to have between support and 'engineering' (which I assume, in some of my cases at least, is actually R&D). I often find myself repeating myself 5+ times in various ways, trying to get the message across about why it is not working as intended. Usually the T2+ engineer 'gets it' and progresses the case to 'engineering'. At that point the case is still with the T2 engineer, but they are just relaying updates from 'engineering'/R&D, or chasing for updates after I chase them for updates. Is there a better way? Can we talk to 'engineering' directly?

I also find that if I say something like "that doesn't make sense, why is it done that way, can't you do it this way, or add option/button x to allow for y" in regard to a product issue, they normally say "post it on the forum". I already have a case open and am talking to a Veeam engineer who has access to 'engineering'/R&D, so why can't they raise a request/query? Instead I have to re-write everything and post it on a forum?

Thanks!

Post by **Gostev** » Sep 02, 2020 2:25 pm

No problems, Chris!

1. Talking directly to engineering is not possible, unfortunately. They are not trained to interact directly with clients, nor do all of them speak English to start with. Besides, the reality is that the moment their contacts are known to a customer, many will start to bypass support and reach out directly, which will turn the support and development processes into absolute chaos. So, we really need to do our best to "isolate" them. I personally struggle with this issue a lot actually, because I end up copied on sales conversations sometimes, and after 12 years of this happening, it now translates into a continuous stream of customers reaching out directly for all sorts of reasons, or even simply copying me on their support cases, making me receive every single update to the case. It's a pure Inbox disaster!

2. Support suggests using the forums because we (PM) push them to do this, as we want to have a direct conversation with the user. If we just get a request to add a checkbox, it is very unlikely we will add it based on this ask alone, because we hate checkboxes! Usually, we need to hear what a user is actually trying to achieve, as we're often able to propose a better, automated solution. In fact, many of our best features are the direct result of us following this approach to feature requests. Besides, we want to hear opinions not just from a single customer, but also from all others having the same issue. And this often results in some REALLY good conversations between customers directly, which again helps us to distill the best approach to addressing the given feature request.

Having said that, our customer support does have a system in place to capture requests from users who are not comfortable going to the forum, and they present the top requests from this system to us periodically.

Thanks!

Post by **ChrisGundry** » Sep 03, 2020 8:03 am

Thanks for the reply!
