Host-based backup of VMware vSphere VMs.
kevin.boddy
Service Provider
Posts: 155
Liked: 11 times
Joined: Jan 30, 2018 3:24 pm
Full Name: Kevin Boddy
Contact:

CDP is failing but not reporting a failure

Post by kevin.boddy »

Hi,

I had a strange one recently where I have two VMs in a CDP policy. After some upgrades on the target ESXi hosts, one VM started showing a warning and was failing to remove a restore point on the target. This carried on for two days. The SLA for the policy still said 100%, but the VM wasn't replicating.

Why is there no alert for this kind of issue?

The next issue was that, looking into the logs, the Veeam server was trying to connect to an ESXi host and failing with a certificate error. I moved the CDP replica VM to another host, put the original ESXi host into maintenance mode, and removed it from the VMware cluster. The CDP policy still kept trying to use that one host for the CDP. I even tried rebooting the VBR server, thinking that if all the services restarted, the CDP coordinator would see that the target VM is on another ESXi host now.

Nothing I did made it work. The only option I had was to delete the replica and restart the CDP policy. This then re-replicated the VM and everything is working again.

What did I do wrong?
How do I get Veeam to update which target ESXi host it's using for the CDP?
If a host fails in your VMware cluster, is that going to cause a problem with your CDP replication?

Thanks
Kevin
HannesK
Product Manager
Posts: 14322
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: CDP is failing but not reporting a failure

Post by HannesK »

Hello,
"Why is there no alert for this kind of issue?"
That sounds like a technical issue. Please provide a support case ID for this issue, as requested when you click New Topic.

Without a case number, the topic will eventually be deleted by moderators.

"What did I do wrong?"
Maybe nothing (assuming that you pointed the CDP policy to the cluster and not to the host). Hard to say. Support should be able to answer that question.

Thank you,
Hannes
PS: support can only help if you upload logs https://www.veeam.com/kb1832
veremin
Product Manager
Posts: 20284
Liked: 2258 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: CDP is failing but not reporting a failure

Post by veremin »

Agree with Hannes here: the described behavior, with the stalled restore point removal and stuck CDP replication, does not look expected. If you provide us with the support ticket, we can ensure that the case is given due attention by both the support and R&D teams. Thanks!

Re: CDP is failing but not reporting a failure

Post by kevin.boddy »

I've logged a case #05854706.

I posted in the forums first because I wasn't sure if this was expected behavior. It takes a lot of time to create a case and to prepare and upload logs. It's easier to post a quick comment in the forums; at least then you can get an idea of where to go next.

This isn't the first or second time we've had these kinds of weird CDP issues either, where jobs are failing but not reporting that anything is wrong.

Re: CDP is failing but not reporting a failure

Post by veremin »

Based on the description, it does not seem like expected behavior. To investigate these types of issues, we always require debug logs and potentially access to the environment, which cannot be handled via forum correspondence.

Thanks, Kevin, for sharing the support case number, we will review it internally.

Re: CDP is failing but not reporting a failure

Post by kevin.boddy »

Hi,

I'm struggling with language and comprehension issues on this case. Is the next course of action an escalation request?

Thanks
Kevin

Re: CDP is failing but not reporting a failure

Post by kevin.boddy »

So this is what I've learned so far from support.

1. A 0% SLA with failures being logged about not being able to enable the CDP policy is not classified as a failure of the CDP policy.
2. Replicas don't seem to ever be rescanned, so if something changes in your environment (I/O filter issues, host failure), you have to manually go and rescan all the replicas.

Surely this cannot be correct?

How can a 0% SLA for a CDP policy for 2 days be considered nothing more than a warning? The job log even says "enabling CDP with errors", "failed to enable CDP", and "failed to process disks". How is that not a failure that I should be notified about?
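
To make the SLA point concrete, here is a minimal Python sketch of an independent check. It assumes one plausible definition of SLA (the share of a time window during which the newest restore point was no older than the RPO), which is not necessarily how Veeam computes it. The function names, the thresholds, and the restore-point timestamps are all hypothetical, and nothing here calls a Veeam API.

```python
from datetime import datetime, timedelta


def sla_percent(restore_points, window_start, window_end, rpo):
    """Percentage of the window in which some restore point was within the RPO.

    Assumed definition (not Veeam's): a restore point taken at time t
    "covers" the interval [t, t + rpo]. The SLA is the covered share
    of [window_start, window_end].
    """
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    covered = timedelta(0)
    cursor = window_start  # time before the cursor has already been counted
    for point in sorted(restore_points):
        start = max(point, cursor)
        end = min(point + rpo, window_end)
        if end > start:
            covered += end - start
            cursor = end
    return 100.0 * (covered / (window_end - window_start))


def classify(sla, warn_below=100.0, error_below=95.0):
    """Map an SLA percentage to a severity. Thresholds are illustrative."""
    if sla < error_below:
        return "error"
    if sla < warn_below:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical scenario: replication stalled an hour before a two-day window.
    window_start = datetime(2023, 1, 1)
    window_end = window_start + timedelta(days=2)
    stalled = [window_start - timedelta(hours=1)]
    sla = sla_percent(stalled, window_start, window_end, timedelta(minutes=15))
    print(sla, classify(sla))
```

Under this definition, a replica that stopped receiving restore points before the window began scores 0% and is classified as an error rather than a warning.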

Why does Veeam not re-scan replicas automatically when enabling or disabling a CDP policy or even rebooting the VBR server? I know it has a re-scan interval for the rest of the added infrastructure like backup proxies, vCenter servers etc. Why not replicas?

Re: CDP is failing but not reporting a failure

Post by veremin »

The information does not seem correct. We are currently investigating the case internally. I will keep the topic updated.

Re: CDP is failing but not reporting a failure

Post by veremin » 1 person likes this post

Upon further investigation, it turned out that:

1. You didn't receive the CDP notification emails originally because the global notification settings were not configured at that moment.

2. The issue was caused by I/O filter misbehavior (originating from manual infrastructure reconfiguration). In this case, we recommend going through the I/O filter management wizard to re-apply the settings. Virtual infrastructure as well as backup components get rescanned and queried automatically on a regular basis (CDP components included).

Having said that, if you are not satisfied with the level of the support provided, you can always escalate the ticket using the "Talk to Manager" button.

Thanks!

Re: CDP is failing but not reporting a failure

Post by kevin.boddy »

Hi,

I have requested to talk to a manager.

1. Global notifications are enabled and have always been enabled. I have repeated myself and provided screenshots to prove that we are getting the notifications and yet I still get told that I am wrong and the notifications are not enabled.

2. The I/O filter misbehavior was not caused by manual infrastructure reconfiguration. Changes to the infrastructure were only attempted AFTER it was discovered that the CDP policy was not working. Those manual changes were not detected by Veeam after waiting an hour, or even after a complete VBR server reboot, so I don't know what the "regular basis" is, but the rescan clearly doesn't happen after a reboot or every hour.

Making any I/O filter changes to a production cluster can cause all kinds of issues, because the hosts have to be placed into maintenance mode before the I/O filter can be removed and re-installed. I also already tried putting the host into maintenance mode and removing it from the cluster, and noted in the task view of the ESXi host that the I/O filter was uninstalled and then re-installed when the host was removed and then placed back into the cluster. No further I/O filter reconfiguration should have been required, yet it still did not work.

So far I have had no feedback on what the cause of the original I/O filter driver issue was. It's now been 10 days since the case was opened, and all I have is that my global notifications are not enabled.

Thanks
Kevin

Re: CDP is failing but not reporting a failure

Post by veremin »

According to the debug logs, global notification settings were not enabled at the time of failure. Unfortunately, the screenshots provided so far do not prove otherwise.

As to I/O filter misbehavior, let the escalation engineer take a deeper look at the issue and see what might have been the root cause.

Thanks!

Re: CDP is failing but not reporting a failure

Post by kevin.boddy »

Hi,

I am still working with the original engineer. The escalation engineer did not take over the case.

The engineer has checked my system and confirmed global notification settings are enabled. The CDP policy notifications appear to be the issue. They have a mind of their own.
I have warning notifications turned off, yet I still get warning notifications for CDP policies.
Sometimes the RPO violation is flagged as a warning and sometimes as an error, for the same VM with the same RPO issue in the same CDP policy.

Maybe there is some undocumented behavior for the CDP policy notifications. I don't know, but it's not right.
I need consistent, correct notifications to make sure my customers' VMs are protected as per the SLA.
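
As an illustration of what "consistent" would mean here, the following Python sketch maps a replication lag to a severity using only the lag and the RPO, so the same inputs always give the same result, and then applies the enabled notification levels as a separate filter. This is a hypothetical illustration, not Veeam's notification logic: `rpo_severity`, `should_notify`, and the `error_factor` threshold are invented names and values.

```python
def rpo_severity(lag_seconds, rpo_seconds, error_factor=2.0):
    """Deterministic severity for a replication lag.

    Assumed rule (illustrative only): within the RPO is "ok", up to
    error_factor times the RPO is "warning", anything beyond is "error".
    """
    if rpo_seconds <= 0:
        raise ValueError("rpo_seconds must be positive")
    if lag_seconds <= rpo_seconds:
        return "ok"
    if lag_seconds < rpo_seconds * error_factor:
        return "warning"
    return "error"


def should_notify(severity, enabled_levels):
    """Send a notification only for severities the operator has enabled."""
    return severity in enabled_levels


if __name__ == "__main__":
    enabled = {"error"}  # warnings turned off, as in the setup described above
    for lag in (10, 20, 30):
        severity = rpo_severity(lag, rpo_seconds=15)
        print(lag, severity, should_notify(severity, enabled))
```

With warnings disabled, a lag in the warning band produces no notification, and a given lag never flips between warning and error.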

The engineer hasn't looked into the I/O filter issue yet. Hopefully we'll still get to that.

Thanks
Kevin
FrenchBlue
Expert
Posts: 107
Liked: 19 times
Joined: Mar 18, 2021 6:04 pm
Contact:

Re: CDP is failing but not reporting a failure

Post by FrenchBlue » 1 person likes this post

Hello,

I would suggest deleting all CDP jobs and replicas, upgrading to V12 (including CDP proxies), upgrading the I/O filters, recreating the CDP jobs, and seeing if it is fixed.
CDP had serious flaws in V11. We're hoping it will be more stable in V12 (it is supposed to be); we should know soon.

Re: CDP is failing but not reporting a failure

Post by veremin »

Kevin, we will be in touch with your support engineer to provide assistance from the R&D team (should the technical issue be confirmed). Thanks!

Re: CDP is failing but not reporting a failure

Post by kevin.boddy »

Second CDP case opened: #05886924. Same problem again. It seems to break any time I need to apply ESXi security patches.
Still waiting to find out what the ultimate cause of the problem is.