-
- Service Provider
- Posts: 24
- Liked: never
- Joined: Jan 11, 2012 4:22 pm
- Full Name: Alex
- Contact:
Planned Failover with CDP Replica - is there data loss?
Hi,
When using normal replica jobs, Veeam has a very nice planned failover feature which we can use to move our workloads to another site prior to performing maintenance.
We're very confident using this process, as it:
- Shuts down the source vm's for you
- Replicates the remaining changes, ensuring zero data loss
- Powers up the target VM's
As a result, we find this feature invaluable as it perfectly conducts a full failover with zero data loss.
However, we've moved some critical VMs over to the new CDP replica policy for the 10-second RPO, and there is no planned failover option for these VMs.
Worse, when we click the only option (Failover now button), the replica VM powers up, but the source VM stays on.
In addition, if we power down the source VM, the CDP job stops, stating the VM needs to be powered on to continue. However, we have no idea whether the job copied the last of the data over, giving us the zero data loss guarantee.
Can anyone advise:
a) Is there any data loss if we power down the source and wait for the CDP job to stop (due to the VM needing to be on)? Or is there a chance the CDP job could stop before copying the last of the data?
b) Is there any guide or best practice on the steps to take to perform a planned failover with CDP protected jobs? In testing, we:
- shut down source
- wait for CDP job to stop naturally due to VM being off
- Initiate "Failover now" within Veeam
Is this process ok? Are there any extra steps or checks we should perform to ensure data safety?
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
If you fail over replica VM to the latest restore point (default selection), backup server will try to select the latest available restore point, meaning even if you open a failover wizard, drink a cup of coffee and then press "failover to the latest restore point", we will choose the latest restore point that has been created in the background by constantly running CDP policy.
You can create a sort of planned failover with PowerShell:
- Get replica VM
- Disable its CDP policy (or add VM to policy exclusion list or stop source VM or both)
- Fail over replica VM
If you need assistance with scripting process, kindly create a separate thread in our PowerShell subforum.
Thanks!
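For illustration only, the order of the steps above can be sketched as follows. This is a minimal sketch in Python rather than PowerShell, and every callable in it (`get_replica`, `stop_source_vm`, `disable_cdp_policy`, `fail_over`) is a hypothetical placeholder that you would have to back with real Veeam or vSphere calls; no actual cmdlet names are implied. It picks the "stop source VM and disable the policy" variant of the second step.

```python
from typing import Callable, List


def planned_cdp_failover(
    vm_name: str,
    get_replica: Callable[[str], object],
    stop_source_vm: Callable[[str], None],
    disable_cdp_policy: Callable[[str], None],
    fail_over: Callable[[object], None],
) -> List[str]:
    """Run the steps in order and return the names of the steps performed.

    All four callables are hypothetical placeholders, not Veeam APIs.
    """
    performed: List[str] = []

    # 1. Get the replica VM that belongs to the source VM.
    replica = get_replica(vm_name)
    performed.append("get_replica")

    # 2. Stop the source VM and disable its CDP policy
    #    (one of the variants listed above).
    stop_source_vm(vm_name)
    performed.append("stop_source_vm")
    disable_cdp_policy(vm_name)
    performed.append("disable_cdp_policy")

    # 3. Fail over the replica VM.
    fail_over(replica)
    performed.append("fail_over")

    return performed
```

Passing the operations in as callables keeps the ordering logic separate from whatever tooling performs each step, so the sequence can be dry-run with stubs before it touches production VMs.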
-
- Veeam Software
- Posts: 3649
- Liked: 610 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
I guess that failover plan might be an option as well, just add VMs protected by CDP to the plan.
Thanks!
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
But failover plan during its execution does not stop the source VMs. Thanks!
-
- Service Provider
- Posts: 24
- Liked: never
- Joined: Jan 11, 2012 4:22 pm
- Full Name: Alex
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Sorry for the slow reply all - just got back from Annual Leave.
Thanks for all the replies. I do have some further questions.
veremin wrote: ↑Jun 30, 2021 4:07 pm
If you fail over replica VM to the latest restore point (default selection), backup server will try to select the latest available restore point, meaning even if you open a failover wizard, drink a cup of coffee and then press "failover to the latest restore point", we will choose the latest restore point that has been created in the background by constantly running CDP policy.
So, with this, is there a guarantee that there is no data loss? When the VM is powered off, the CDP job stops replicating, so I just want to clarify - can we guarantee that the final sync has all the data copied?
To explain this a bit better, let's say the CDP job is set to a 15-second RPO.
0 seconds - Veeam copies a restore point
5 seconds - Shutdown command is initiated
10 seconds - VM completes shutdown
15 seconds - Veeam notices the VM is shut down and stops the CDP job. Does it do one more sync (meaning no data loss), or does it simply shut down the CDP job (meaning we've lost 15 seconds)?
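To put numbers on the timeline above, here is a minimal Python sketch. It assumes the worst case, in which nothing is replicated after the last restore point (whether that is really the case is exactly the question), and the function name is made up for illustration. Under that assumption the writes at risk are those made between the last restore point and the moment the VM finishes shutting down, which in this example would be 10 seconds rather than the full 15.

```python
def unreplicated_window_s(last_restore_point_s: float, shutdown_complete_s: float) -> float:
    """Seconds of writes at risk if nothing is replicated after the last restore point."""
    if shutdown_complete_s < last_restore_point_s:
        raise ValueError("shutdown cannot complete before the last restore point")
    # The VM can only write until it is fully shut down, so the window ends there.
    return shutdown_complete_s - last_restore_point_s


# Timeline from the post: restore point at 0 s, shutdown completes at 10 s.
window = unreplicated_window_s(last_restore_point_s=0, shutdown_complete_s=10)
print(window)  # prints 10
```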
So this sounds interesting, although I apologise - I'm not quite understanding the logic here. Can you expand further? Are you suggesting that the VM is part of both a replica and a CDP job?
Or that disabling the CDP policy will enable the Failover VM feature?
I'm not quite getting how, once we've disabled CDP for the VM, we can then do a planned failover.
Unless the logic is more like;
- Get replica VM and shut it down
- Wait for shutdown to complete and CDP job to complete (to ensure all data replicated)
- Disable CDP policy
- Presumably the 'failover' feature becomes available at this point (is normally greyed out when CDP is in place), so we trigger a failover?
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
"When the VM is powered off, the CDP job stops replicating, so I just want to clarify - can we guarantee that the final sync has all the data copied?"
Theoretically, you can lose a small amount of changes left in the I/O filter that is attached to the source VM: when the source VM goes down, so does the I/O filter. But we are talking about 1 MB of changes at most, which is the maximum amount of data the I/O filter can temporarily hold before sending it further. Thanks!
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Just had a similar situation and wanted to create a thread when I saw that there was an existing one...
The thing is, if you've got a database on the source side, you could potentially lose transactions. Yeah, there might be a transaction backup, but then you'd have to restore it on the CDP replica before you could continue. I would also find it nice if you could be sure that the latest changes were replicated. Maybe leave the VM powered on after OS shutdown (if that's possible), or maybe it's enough to disconnect it from the network, do a CDP replication pass and trigger the shutdown then?
One thing I'd like to mention is the time it takes until the CDP job is stopped: After starting the failover, it really took minutes until the cdp job was stopped for that particular vm - failover (replica start) won't kick in until cdp job has finished and so we had a longer downtime than you'd expect when doing all the RTO calculations.
Thanks for the feedback, PM's.
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
"One thing I'd like to mention is the time it takes until the CDP job is stopped: After starting the failover, it really took minutes until the cdp job was stopped for that particular vm - failover (replica start) won't kick in until cdp job has finished and so we had a longer downtime than you'd expect when doing all the RTO calculations."
This does not look expected - it should have taken seconds instead. If you can reproduce the issue outside of your working hours, collect the required logs and open a ticket with us, so that we can double-check it internally.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Thanks Vladimir,
just had a situation where I needed CDP so it suited well:
Obviously, CDP failover is quick, but stopping CDP takes ages. We've got an RPO of 3 minutes and I even waited 3+ minutes since the last replica pass - it doesn't help. At a certain point I suspended the VM in the hope that CDP would get stopped by that action, and some time later the CDP job was stopped and the failover took place.
Have you got an idea why stopping CDP would take that long (timeouts, etc.)? Tbh, I have never observed it to be that quick (CDP job actions)...
Thanks!
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Can you collect logs from both backup and vCenter servers, create a ticket, attach them to it and share the ticket number here? I will ask QA team to check the case and see what exactly might affect CDP stoppage time. Thanks!
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Hi Vladimir,
case no is 05270063
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Thank you, Michael, I have passed the information to the QA team. I will let you know if we find something interesting or need additional information.
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
R&D team have just tracked this issue internally and currently are investigating the root cause of long shutdown request processing. Will keep you updated.
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Update: the information provided within the collected dumps was not enough, so the support engineer will reach out to you soon (if he hasn't already) and ask you to install a special utility that allows us to log the corresponding process activity into a separate file.
R&D team will analyze this file then and try to find the root cause. Will keep you posted.
Thanks!
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Hi, @mcz,
Michael, support engineer has not heard back from you, any chance you can provide him with an answer any time soon?
Thanks!
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Hi Vladimir, we had kinda communication failure last week as the remote session was not held at the requested time. I'm very busy today but will try to repeat it tomorrow. Thanks for the notification!
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Thank you, Michael, no time pressure or something - we were just interested to collect more information regarding slow CDP failover.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Okay Vladimir, now it's getting weird... I just had a remote session with the engineer and now we can't reproduce the issue! I had another case where the CDP alarm had been triggered for weeks, and today it was suddenly working correctly. I would bet that these two cases are linked to one another. I made some changes to the network adapters on the Veeam components, but none of them could explain it - I'll check the config to see if we could revert it...
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
At least now everything is working as expected. But please do provide your findings to the support engineer: the more information we have about a sporadic issue, the better.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Interesting news here... I've reverted one change: the preferred network. The Veeam server and the CDP proxies each have 2 NICs connected to different networks, but direct communication without port filters is only possible on one network. Is it possible that when the Veeam server tries to stop the CDP job on the proxy (and fails due to blocked ports), it waits for a certain amount of time and then uses a different strategy to reach the proxy?
I'll try to create the dump for further analysis.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
OK, I was able to collect the dump while the issue occurred. Hopefully this will be enough to find the root cause, as I can't reproduce it all the time (last time, it was very quick again after being very slow on the first run). Curious what the outcome will be.
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
The last time the dumps were not enough, but let's see how it goes. I will keep you posted.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Thanks Vladimir.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Okay, I've received a fix today, obviously it was a bug??? What was the reason for it to occur? Thanks
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
11a introduced a bug that affects CDP failover speed: under certain circumstances, issues with the source proxy (unavailability or similar) might result in degraded failover performance.
For future readers: if you are experiencing similar symptoms, kindly reach out to our support team, let them review the debug logs and apply the 378749 fix to your environment (if necessary). Also, don't forget to share the case ID here.
Thanks!
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Vladimir, I've installed the patch, but it didn't help in my case. I've already informed the engineer, so I guess further investigation is needed. Just for your information.
-
- Veeam Legend
- Posts: 945
- Liked: 222 times
- Joined: Jul 19, 2016 8:39 am
- Full Name: Michael
- Location: Rheintal, Austria
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Hi Vladimir,
I just had a nice remote session with my engineer and two guys from R&D and QA. By working together we found the issue, and it is quite simple: there is no IP cache implemented, and in our case we have 4 network adapters. Without the correct preferred network, the Veeam server tries to connect to the CDP proxy on different interfaces, or rather it just iterates through them. If a connection fails (e.g. when it is blocked by a firewall or anything similar), Veeam moves on to the next adapter, but of course there are many TCP retransmissions and waiting intervals in between before the attempt eventually fails due to a max retry or a timeout.
Because many connections are initiated AND there is no IP/adapter cache, it adds up to an extreme amount of time compared to the 'normal case'. We did a POC by simply enabling the traffic on the problematic adapter, and it was like the lift-off of a rocket ship! I don't know whether anything will be improved on the Veeam side; the developer mentioned that a special hint in the logs might be useful for similar cases in the future, but I don't know whether the whole connection approach will be improved. Thanks Vladimir, for me this case is kind of done.
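To show why this adds up, here is a rough back-of-the-envelope sketch in Python. It is not Veeam's actual connection logic: the function name, the adapter order, the number of connections and the timeout value are all hypothetical, chosen only to illustrate the effect of retrying blocked adapters on every connection versus remembering the working one.

```python
from typing import Sequence


def total_connect_delay_s(
    adapters_reachable: Sequence[bool],
    connections: int,
    timeout_s: float,
    cache_working_adapter: bool = False,
) -> float:
    """Estimate the time spent waiting on blocked adapters.

    adapters_reachable lists the adapters in the order they are tried;
    True means a connection on that adapter succeeds.
    """
    # Number of blocked adapters tried before the first reachable one.
    # Raises ValueError if no adapter is reachable at all.
    blocked_before_success = list(adapters_reachable).index(True)
    delay_per_connection = blocked_before_success * timeout_s

    if cache_working_adapter:
        # Only the first connection pays the penalty; later ones reuse the result.
        return delay_per_connection
    # Without a cache, every connection iterates through the blocked adapters again.
    return delay_per_connection * connections


# Hypothetical numbers: 4 adapters, the first 3 blocked, 10 connections, 20 s timeout.
adapters = [False, False, False, True]
print(total_connect_delay_s(adapters, connections=10, timeout_s=20.0))
print(total_connect_delay_s(adapters, connections=10, timeout_s=20.0, cache_working_adapter=True))
```

With these made-up numbers the uncached case waits ten times longer than the cached one, which matches the pattern of a stop request taking minutes instead of seconds.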
-
- Product Manager
- Posts: 20450
- Liked: 2318 times
- Joined: Oct 26, 2012 3:28 pm
- Full Name: Vladimir Eremin
- Contact:
Re: Planned Failover with CDP Replica - is there data loss?
Thank you, Michael, for coming back and providing the perfect summary of the issue you experienced - it is definitely beneficial for future readers who come across a similar problem.
And we will think how this behavior might be improved in one of the next product versions.