Host-based backup of VMware vSphere VMs.
wasc
Service Provider
Posts: 24
Liked: never
Joined: Jan 11, 2012 4:22 pm
Full Name: Alex
Contact:

Planned Failover with CDP Replica - is there data loss?

Post by wasc »

Hi,

When using normal replica jobs, Veeam has a very nice planned failover feature which we can use to move our workloads to another site prior to performing maintenance.
We're very confident in this process, because it:
- Shuts down the source VMs for you
- Replicates the remaining changes, ensuring zero data loss
- Powers up the target VMs
As a result, we find this feature invaluable, as it conducts a full failover with zero data loss.

However, we've moved some critical VMs over to the new CDP replica policy for the 10-second RPO, and there is no planned failover option for these VMs.
Worse, when we click the only option (the "Failover now" button), the replica VM powers up, but the source VM stays on.
In addition, if we power down the source VM, the CDP job stops, stating that the VM needs to be powered on to continue. However, we have no idea whether the job copied the last of the data over, which is what would give us the zero data loss guarantee.

Can anyone advise:
a) Is there any data loss if we power down the source and wait for the CDP job to stop (due to the VM needing to be on)? Or is there a chance the CDP job could stop before copying the last of the data?
b) Is there any best-practice guide on the steps to take to perform a planned failover with CDP-protected jobs? In testing, we:
- Shut down the source
- Wait for the CDP job to stop naturally due to the VM being off
- Initiate "Failover now" within Veeam

Is this process ok? Are there any extra steps or checks we should perform to ensure data safety?
veremin
Product Manager
Posts: 20270
Liked: 2252 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin
Contact:

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

If you fail over the replica VM to the latest restore point (the default selection), the backup server will select the latest restore point available at the moment of failover. That means that even if you open the failover wizard, drink a cup of coffee and then press "failover to the latest restore point", we will choose the latest restore point created in the background by the constantly running CDP policy.

You can create a sort of planned failover with PowerShell:

- Get the replica VM
- Disable its CDP policy (or add the VM to the policy exclusion list, or stop the source VM, or both)
- Fail over the replica VM

If you need assistance with the scripting process, kindly create a separate thread in our PowerShell subforum.
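
For readers who want the order of operations spelled out, the three steps above can be sketched as a small driver. This is an illustration in Python, not Veeam's PowerShell API: the three callables are hypothetical placeholders for whatever script or cmdlet performs each step, and stopping the source VM before disabling the policy is an assumed ordering.

```python
from typing import Callable, List


def planned_cdp_failover(
    stop_source_vm: Callable[[], bool],
    disable_cdp_policy: Callable[[], bool],
    fail_over_replica: Callable[[], bool],
) -> List[str]:
    """Run the three steps in order and stop at the first one that fails.

    Each callable is a hypothetical wrapper around the real action and
    must return True on success. Returns the names of the completed steps.
    """
    steps = [
        ("stop source VM", stop_source_vm),
        ("disable CDP policy", disable_cdp_policy),
        ("fail over replica", fail_over_replica),
    ]
    completed: List[str] = []
    for name, action in steps:
        if not action():
            # Abort so the replica is never started after a failed earlier step.
            raise RuntimeError(f"step failed: {name}; completed: {completed}")
        completed.append(name)
    return completed
```

Stopping at the first failed step matters here: failing over the replica while the source VM is still running is exactly the situation described in the original post.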

Thanks!
PetrM
Veeam Software
Posts: 3229
Liked: 520 times
Joined: Aug 28, 2013 8:23 am
Full Name: Petr Makarov
Location: Prague, Czech Republic
Contact:

Re: Planned Failover with CDP Replica - is there data loss?

Post by PetrM »

I guess a failover plan might be an option as well: just add the VMs protected by CDP to the plan.

Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

But a failover plan does not stop the source VMs during its execution. Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by wasc »

Sorry for the slow reply all - just got back from Annual Leave.

Thanks for all the replies. I do have some further questions.
veremin wrote: Jun 30, 2021 4:07 pm
If you fail over replica VM to the latest restore point (default selection), backup server will try to select the latest available restore point, meaning even if you open a failover wizard, drink a cup of coffee and then press "failover to the latest restore point", we will choose the latest restore point that has been created in the background by constantly running CDP policy.

So, with this, is there a guarantee that there is no data loss? When the VM is powered off, the CDP job stops replicating, so I just want to clarify - can we guarantee that the final sync has all the data copied?
To explain this a bit better, let's say the CDP job is set to a 15-second RPO:
0 seconds - Veeam copies a restore point
5 seconds - Shutdown command is initiated
10 seconds - VM completes shutdown
15 seconds - Veeam notices the VM is shut down and stops the CDP job. Does it do one more sync (meaning no data loss), or does it simply stop the CDP job (meaning we've lost 15 seconds)?
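
The two outcomes in that timeline can be written as a small calculation. This is a simplified model of the question, not a description of how CDP actually behaves: it assumes writes stop once the VM has finished shutting down, and the function and parameter names are made up for illustration.

```python
def unreplicated_window_s(
    last_restore_point_s: float,
    shutdown_complete_s: float,
    final_sync: bool,
) -> float:
    """Seconds of source writes missing from the replica after shutdown.

    Assumption: the VM writes nothing after shutdown_complete_s, so only
    the interval between the last restore point and the end of shutdown
    can be lost, and only if no final sync takes place.
    """
    if final_sync:
        return 0.0
    return max(0.0, shutdown_complete_s - last_restore_point_s)


# Numbers from the example timeline: restore point at 0 s, VM off at 10 s.
with_sync = unreplicated_window_s(0, 10, final_sync=True)
without_sync = unreplicated_window_s(0, 10, final_sync=False)
```

With the numbers from the example, the model gives 0 seconds with a final sync and 10 seconds without one, since nothing is written between second 10 and second 15.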

veremin wrote: Jun 30, 2021 4:07 pm
You can create sort of planned failover with PowerShell:

- Get replica VM
- Disable its CDP policy (or add VM to policy exclusion list or stop source VM or both)
- Fail over replica VM

Thanks!
So this sounds interesting, although I apologise - I'm not quite understanding the logic here. Can you expand further? Are you suggesting that the VM is part of both a replica and a CDP job?
Or that disabling the CDP policy will enable the Failover VM feature?
I'm not quite getting how, once we've disabled CDP for the VM, we can then do a planned failover.

Unless the logic is more like:
- Get replica VM and shut it down
- Wait for shutdown to complete and CDP job to complete (to ensure all data is replicated)
- Disable CDP policy
- Presumably the 'failover' feature becomes available at this point (it is normally greyed out when CDP is in place), so we trigger a failover?

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

When the VM is powered off, the CDP job stops replicating, so I just want to clarify - can we guarantee that the final sync has all the data copied?
Theoretically, you can lose a small amount of changes left in the I/O filter that is attached to the source VM: when the source VM goes down, so does the I/O filter. But we are talking about 1 MB of changes at most - this is the maximum amount of data the I/O filter can temporarily hold before sending it further. Thanks!
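
To put the 1 MB figure in perspective, a rough upper bound on the number of writes that could still be sitting in the I/O filter can be estimated as follows. This is a back-of-the-envelope sketch: the average write size is a hypothetical input, 1 MB is taken as 1 MiB, and it assumes the buffer is completely full at the moment the VM goes down.

```python
# "1 MB at max" from the post above; interpreting it as 1 MiB is an assumption.
IO_FILTER_BUFFER_BYTES = 1024 * 1024


def max_writes_at_risk(
    avg_write_bytes: int,
    buffer_bytes: int = IO_FILTER_BUFFER_BYTES,
) -> int:
    """Upper bound on whole writes that fit in a completely full buffer."""
    if avg_write_bytes <= 0:
        raise ValueError("avg_write_bytes must be positive")
    return buffer_bytes // avg_write_bytes


# Hypothetical workload: 8 KiB writes.
writes_8k = max_writes_at_risk(8 * 1024)
```

For a workload of 8 KiB writes this bound works out to 128 writes; smaller writes raise the count, but the amount of data at risk stays capped at the buffer size.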
mcz
Veeam Legend
Posts: 835
Liked: 172 times
Joined: Jul 19, 2016 8:39 am
Full Name: Michael
Location: Rheintal, Austria
Contact:

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz »

Just had a similar situation and wanted to create a thread when I saw that there was an existing one...

The thing is, if you've got a database on the source side, you could potentially lose transactions. Yeah, there might be a transaction log backup, but then you'd have to restore it on the CDP replica before you could continue. I would also find it nice if you could be sure that the latest changes were replicated. Maybe leave the VM powered on after OS shutdown (if that's possible), or maybe it's enough to disconnect from the network, do a CDP replication pass and trigger the shutdown then?

One thing I'd like to mention is the time it takes until the CDP job is stopped: after starting the failover, it really took minutes until the CDP job was stopped for that particular VM. Failover (replica start) won't kick in until the CDP job has finished, so we had a longer downtime than you'd expect when doing all the RTO calculations.

Thanks for the feedback, PMs.

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

One thing I'd like to mention is the time it takes until the CDP job is stopped: After starting the failover, it really took minutes until the cdp job was stopped for that particular vm - failover (replica start) won't kick in until cdp job has finished and so we had a longer downtime than you'd expect when doing all the RTO calculations.
That does not look expected - it should have taken seconds instead. If you can reproduce the issue outside of your working hours, collect the required logs and open a ticket with us, so we can double-check it internally.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz »

Thanks Vladimir,

just had a situation where I needed CDP so it suited well:

Image

Obviously, CDP failover is quick, but stopping CDP takes ages. We've got an RPO of 3 minutes and I even waited 3+ minutes after the last replica pass - it doesn't help. At a certain point I suspended the VM in the hope that CDP would get stopped by that action, and some time later the CDP job was stopped and the failover took place.

Have you got an idea why stopping CDP would take that long (timeouts, etc.)? Tbh, I have never observed it to be that quick (CDP job actions)...

Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Can you collect logs from both the backup and vCenter servers, create a ticket, attach them to it and share the ticket number here? I will ask the QA team to check the case and see what exactly might affect the CDP stoppage time. Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz » 1 person likes this post

Hi Vladimir,

case no is 05270063

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Thank you, Michael, I've passed the information to the QA team. Will let you know if we find something interesting or need additional information.

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin » 1 person likes this post

The R&D team has just tracked this issue internally and is currently investigating the root cause of the long shutdown request processing. Will keep you updated.

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Update: the information provided within the collected dumps was not enough, so the support engineer will reach out to you soon (if he hasn't already) and ask you to install a special utility that allows us to log the corresponding process activity into a separate file.

The R&D team will then analyze this file and try to find the root cause. Will keep you posted.

Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Hi, @mcz,

Michael, the support engineer has not heard back from you. Any chance you can provide him with an answer any time soon?

Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz » 1 person likes this post

Hi Vladimir, we had kind of a communication failure last week, as the remote session was not held at the requested time. I'm very busy today but will try to repeat it tomorrow. Thanks for the notification!

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Thank you, Michael, no time pressure or anything - we were just interested in collecting more information regarding the slow CDP failover.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz » 1 person likes this post

Okay Vladimir, now it's getting weird... I just had a remote session with the engineer and now we can't reproduce the issue! I had another case where the CDP alarm had been triggering for weeks, and today it was suddenly working correctly. I would bet that these two cases are linked to one another. I made some changes to the network adapters on the Veeam components, but none of them could explain it - I'll check the config to see if we can revert it...

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

At least now everything is working as expected :) But please do provide your findings to the support engineer: the more information we have about a sporadic issue, the better.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz »

Interesting news here... I've reverted one change: the preferred network. The Veeam server and the CDP proxies each have 2 NICs on different networks, but direct communication without port filters is only possible on one network. Is it possible that when the Veeam server tries to stop the CDP job on the proxy (and fails due to blocked ports), it waits for a certain amount of time and then uses a different strategy to reach the proxy?
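
One way to check this kind of suspicion is to time a TCP connection from each local address to the proxy. The following is a minimal sketch in Python using only the standard library; the host, port and source address in the usage note are placeholders, not actual Veeam values.

```python
import socket
import time
from typing import Optional, Tuple


def probe(
    host: str,
    port: int,
    source_ip: Optional[str] = None,
    timeout: float = 3.0,
) -> Tuple[bool, float]:
    """Try one TCP connection, optionally bound to a specific local address.

    Returns (reachable, seconds_elapsed). A blocked path typically shows up
    as reachable=False with an elapsed time close to the timeout.
    """
    # Port 0 lets the OS pick any free local port on the chosen address.
    source = (source_ip, 0) if source_ip else None
    start = time.monotonic()
    try:
        with socket.create_connection(
            (host, port), timeout=timeout, source_address=source
        ):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts and unroutable addresses.
        reachable = False
    return reachable, time.monotonic() - start
```

For example, calling `probe("proxy.example", 12345, source_ip="10.0.0.5")` once per local NIC address shows which network answers immediately and which one only fails after the full timeout.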

I'll try to create the dump for further analysis.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz » 1 person likes this post

OK, I was able to collect the dump while the issue occurred. Hopefully this will be enough to find the root cause, as I can't reproduce it every time (last time, it was very quick again after being very slow on the first run). Curious what the outcome will be.

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

The last time the dumps were not enough, but let's see how it goes. I will keep you posted.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz » 1 person likes this post

Thanks Vladimir.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz »

Okay, I've received a fix today, so obviously it was a bug? What was the reason for it to occur? Thanks

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Version 11a introduced a bug that affects CDP failover speed: under certain circumstances, issues with the source proxy (unavailability or similar) might result in degraded failover performance.

For future readers: if you are experiencing similar symptoms, kindly reach out to our support team, let them review the debug logs and apply the 378749 fix to your environment (if necessary). Also, don't forget to share the case ID here.

Thanks!

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz »

Vladimir, I've installed the patch, but it didn't help in my case. I've already informed the engineer, so I guess further investigation is needed. Just for your information.

Re: Planned Failover with CDP Replica - is there data loss?

Post by mcz » 1 person likes this post

Hi Vladimir,

just had a nice remote session with my engineer and two guys from R&D and QA. By working perfectly together, the issue has been found, and it is quite simple: there's no IP cache implemented, and in our case we have 4 network adapters. When the correct preferred network is not used, the Veeam server tries to connect to the CDP proxy on different interfaces, or better said, it just iterates through them. If a connection fails (e.g. when it is blocked by firewalls or anything similar), Veeam uses the next adapter, but of course there are many TCP retransmissions and waiting intervals in between, and eventually an attempt fails due to a max retry count or a timeout.

Because many connections are being initiated AND there is no IP/adapter cache, it adds up to an extreme amount of time compared to the 'normal case'. We did a POC by simply enabling the traffic on that problematic adapter, and it was just like the lift-off of a rocket ship! I don't know if something will be improved on the Veeam side; the developer mentioned that a special hint in the logs might be useful for similar cases in the future, but I don't know whether the whole connection approach will be improved. Thanks Vladimir, for me this case is kind of done.
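
The effect described here can be approximated with a little arithmetic. The sketch is a toy model, not Veeam's connection logic: the timeout, the per-connection cost and the number of connections are hypothetical values chosen only to show how the delays add up with and without an address cache.

```python
from typing import Sequence


def total_connect_time_s(
    adapters_reachable: Sequence[bool],
    connections: int,
    timeout_s: float,
    connect_s: float = 0.05,
    cache: bool = False,
) -> float:
    """Total time spent establishing `connections` connections.

    adapters_reachable lists, in the order they are tried, whether each
    adapter address can reach the proxy. Every blocked adapter tried before
    the first working one costs a full timeout.
    """
    if connections <= 0:
        return 0.0
    if True not in adapters_reachable:
        raise ValueError("no reachable adapter")
    blocked_before_success = list(adapters_reachable).index(True)
    first_attempt = blocked_before_success * timeout_s + connect_s
    if cache:
        # Only the first connection pays for the blocked adapters.
        return first_attempt + (connections - 1) * connect_s
    # Without a cache, every connection iterates through the adapters again.
    return connections * first_attempt


# Hypothetical scenario: 4 adapters, the first two are blocked by a firewall.
adapters = [False, False, True, True]
no_cache = total_connect_time_s(adapters, connections=20, timeout_s=21.0)
with_cache = total_connect_time_s(adapters, connections=20, timeout_s=21.0, cache=True)
```

With two blocked adapters tried first, a 21-second timeout and 20 connections, the model gives about 841 seconds without a cache and about 43 seconds with one, which matches the "minutes instead of seconds" symptom in spirit.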

Re: Planned Failover with CDP Replica - is there data loss?

Post by veremin »

Thank you, Michael, for coming back and providing the perfect summary of the issue experienced - it is definitely beneficial for future readers who come across a similar problem.

And we will think about how this behavior might be improved in one of the next product versions.