Host-based backup of VMware vSphere VMs.
ihussain
Influencer
Posts: 14
Liked: never
Joined: Feb 03, 2018 10:11 pm
Full Name: I Hussain

Failback after Restore vs Quick Migration after Permanent Failover

Post by ihussain »

Hi

I have a VM recovery scenario that I would like some expert input on.

VM01 is a large VM (>5TB) which resides on the prod site (10.0.0.1/24) and is backed up locally. Due to the size of the vmdks (and their individually required retention policies) backup jobs are split into 3 smaller jobs instead of 1 big job.
This VM01 is also replicated to a DR site (10.0.99.0/24) twice daily.
VM01 has been accidentally deleted from the prod site and is now running from the replica failover VM01_replica at the DR site.
There is a 100Meg VPN link between the 2 sites and all backup & replica jobs for VM01 have been stopped in the interim.

In order to minimise further disruption and data loss, what is the best and most efficient way to get VM01 running on the prod site again?

1). Restore VM01 entirely from the multiple backup restore points at the prod site and then initiate the replica failback from the DR site. Finally, re-enable and re-run the stopped backup & replication jobs.

Or

2). Initiate the Permanent Failover on VM01_replica at the DR site and then carry out a Veeam Quick Migration from the DR site to get VM01 running again on the vCenter at the prod site. Finally, re-enable and re-run the stopped backup & replication jobs.

Or

3). Any other suggestions?

Please can someone advise on this? Thanks!
HannesK
Product Manager
Posts: 14840
Liked: 3086 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Failback after Restore vs Quick Migration after Permanent Failover

Post by HannesK »

Hello,
"Due to the size of the vmdks (and their individually required retention policies) backup jobs are split into 3 smaller jobs instead of 1 big job."
I recommend fixing the infrastructure performance instead of implementing such workarounds. VMs 10x bigger than yours are no problem with proper configuration & hardware.

I assume that "100Meg" means 100Mbit/s, so a full transfer would take around 120h without any compression. I assume that you can use the Veeam WAN accelerator in "High bandwidth mode" https://helpcenter.veeam.com/docs/backu ... ml?ver=100
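As a rough check on the 120h figure, here is a minimal sketch of the arithmetic. It assumes the link really is 100 Mbit/s, that the whole link is available to the transfer, and that nothing is compressed or deduplicated. The function name and the `efficiency` parameter are illustrative and not part of any Veeam tooling.

```python
def transfer_hours(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_bytes over a link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead or a shared link; 1.0 means the full line rate is used.
    """
    if link_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 3600


TB = 10**12   # decimal terabyte
TIB = 2**40   # binary tebibyte, closer to what datastores usually report
LINK = 100e6  # 100 Mbit/s

hours_tb = transfer_hours(5 * TB, LINK)    # about 111 hours
hours_tib = transfer_hours(5 * TIB, LINK)  # about 122 hours

print(f"5 TB  at 100 Mbit/s: {hours_tb:.1f} h")
print(f"5 TiB at 100 Mbit/s: {hours_tib:.1f} h")
```

Both results land in the same range as the 120h estimate, which works out to roughly five days. Since the VM is described as larger than 5TB, the real figure depends on the actual used size and on how much compression or WAN acceleration saves.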

1. I have no experience with that "split job scenario", but it should work, yes. The calculation of changes will take some time.

2. I would go for option 3 instead of Quick Migration.

I would go for option 3 because it sounds like you never tested that scenario and it looks like the "safest" and "most predictable" way without knowing any details about your infrastructure performance.

3. Do a permanent failover. Create a new replication job that points from the DR site to the production site. Wait 120 hours or less for the initial replication to complete, then do a planned failover. That way you have zero data loss. Then create new backup and replication jobs, because the VM ID (MoRef ID) has changed and re-using the old jobs is complicated (please use the forum search to see which options are available).

Best regards,
Hannes

Re: Failback after Restore vs Quick Migration after Permanent Failover

Post by ihussain »

Thanks Hannes

Would option 1 still entail data loss, even when failing the replica back onto the VM01 restored from backup at the prod site?

Can you elaborate more on option 3 please?
When doing a permanent failover, would VM01_replica become VM01 and adopt its IP settings?
After the permanent failover, where should the new replication job be created and run from? The VBR server at the prod site or the one at the DR site?

Re: Failback after Restore vs Quick Migration after Permanent Failover

Post by HannesK »

Reading these questions, I have the feeling that we are talking about a production environment. I recommend involving someone (a Veeam partner, for example) who has worked with replication before. Or at least I recommend that you try out failover and failback with a test VM.

As far as I see, you did not commit failover yet. That means that you are running on snapshots on the target side and you might run out of disk space on the VMware datastore. Please follow the user guide and do not manually delete the snapshots!

1. I'm not sure whether you are really asking for "failback" or whether you are talking about "undo failover". "undo failover" causes data loss, yes. Please check the user guide https://helpcenter.veeam.com/docs/backu ... ml?ver=100

3. Failover already adopted the IP settings according to your replication job configuration. I'm talking about https://helpcenter.veeam.com/docs/backu ... ml?ver=100

Hmm, I did not see that you have two backup servers. If you really have two backup servers, then it depends on your design (which I cannot guess). The new replication job in the scenario I had in mind (one backup server) takes the VM from the DR site as its source. Once everything is done, you can reverse the direction again.

Re: Failback after Restore vs Quick Migration after Permanent Failover

Post by ihussain »

Yes, that's correct, there are 2 VBR servers: a physical one at prod (for backup jobs and backup copy jobs only) and a virtual one at DR (for replica jobs only). The virtual VBR server at the DR site is hosted on the same ESXi server that VM01_replica is running on.

So after "committing" the failover (via permanent failover) are you saying to replicate this back to the vCenter on the prod site?

Re: Failback after Restore vs Quick Migration after Permanent Failover

Post by HannesK »

Okay, then the backup server that is responsible for the replicas sounds like "the right one". From a performance perspective, I assume that you have a proxy server for replication tasks on the prod site.

I repeat my recommendation to try it out with a small machine to get used to the software (or ask somebody for help who knows your environment).
"So after "committing" the failover (via permanent failover) are you saying to replicate this back to the vCenter on the prod site?"
Yes. As I said: I only recommend that because it seems to be the safest way for me to avoid full disk or performance issues in a production environment.

The "normal" way would be "failback to production" (to a new location, because you deleted the original VM) https://helpcenter.veeam.com/docs/backu ... ml?ver=100 . But I did not want to recommend that, because you are still running on snapshots. And during failback, you would run on snapshots for another 5 days. I have no idea how much free space you have and which other things might be untested in your environment. That's why I went for the "safest way".
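To make the free-space concern concrete, here is a rough sketch of how one might sanity-check datastore headroom before running on snapshots for several more days. It assumes snapshot delta files grow roughly linearly with the amount of changed data, which is a simplification. The daily change rate, free space, and safety factor below are made-up placeholders, not values from this thread.

```python
def snapshot_growth_gb(daily_change_gb: float, days: float) -> float:
    """Rough linear estimate of snapshot delta growth over a period.

    Simplification: treats every day's changed data as new delta growth,
    so it tends to overestimate when the same blocks are rewritten.
    """
    return daily_change_gb * days


def has_headroom(free_gb: float, daily_change_gb: float, days: float,
                 safety_factor: float = 1.5) -> bool:
    """True if free space covers the estimated growth times a safety margin."""
    return free_gb >= snapshot_growth_gb(daily_change_gb, days) * safety_factor


# Hypothetical numbers: 100 GB of changes per day, 5 more days on snapshots.
growth = snapshot_growth_gb(daily_change_gb=100, days=5)
print(f"Estimated delta growth: {growth:.0f} GB")
print("1000 GB free is enough:", has_headroom(1000, 100, 5))
print(" 600 GB free is enough:", has_headroom(600, 100, 5))
```

The real change rate would have to come from the environment itself, for example from the size of recent incremental restore points.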

Re: Failback after Restore vs Quick Migration after Permanent Failover

Post by ihussain »

Thanks Hannes this has been most helpful.

Disk space is plenty.
It sounds like an initial replication of VM01 from DR site to prod site (with high bandwidth WAN acceleration mode) and then planned failover is the way to go.