I am testing disaster recovery / business continuity and have several questions about the failback process for critical infrastructure such as vCenter and the Domain Controllers.
Production Site: 2 x virtual DCs, physical Veeam B&R with local repo, virtual Veeam Proxy, ESXi Hosts, and vCenter
DR Site (Co-Lo): 1 ESXi Host, no vCenter, virtual Veeam B&R for replication initiation and DR orchestration
Networking: both sites are connected via a 100 Mbps Site-to-Site VPN with NATed overlapping IP subnets, which means both sides use the same IP subnets/scheme. I don't want to Re-IP, and this also allows testing at the DR site without affecting Production, since I can power on the same DCs on both sides and they can't see each other.
Backups and Replications at DR Site: I have backups of all the VMs via a Backup Copy Job, and replicas of the critical VMs on the standalone DR Host via a Replication from Backup Job.
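To picture the NATed overlapping subnets described above, here is a minimal sketch of a one-to-one (same host offset) translation between a real subnet and the NATed range the other site would use to reach it. The subnets 10.0.10.0/24 and 172.16.10.0/24 and the helper name to_nat_address are made up for illustration; the actual ranges and NAT rules depend on how the VPN devices are configured.

```python
import ipaddress

def to_nat_address(real_ip, real_net, nat_net):
    """Map an address in real_net to the same host offset in nat_net.

    Assumes a 1:1 static NAT between two subnets of equal size
    (hypothetical setup, not taken from the original post).
    """
    ip = ipaddress.ip_address(real_ip)
    real = ipaddress.ip_network(real_net)
    nat = ipaddress.ip_network(nat_net)
    if real.prefixlen != nat.prefixlen:
        raise ValueError("real and NAT subnets must be the same size")
    if ip not in real:
        raise ValueError(f"{ip} is not inside {real}")
    # Keep the host part, swap the network part.
    offset = int(ip) - int(real.network_address)
    return nat.network_address + offset

# Hypothetical example: a DR-side DC at 10.0.10.25 as seen from Production.
print(to_nat_address("10.0.10.25", "10.0.10.0/24", "172.16.10.0/24"))
```

With a mapping like this, anything at Production that needs to reach a DR-side VM would have to be pointed at the NATed address rather than the VM's real one.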
- Testing went fine as all I had to do was to execute the "Failover Plan" at the DR site when the Production Site got vaporized by alien ships.
- At the DR site, I have the following relevant VMs running during a disaster situation: DCs and LOB VMs.
- I am not failing over the Production vCenter because it doesn't know or care about the DR Site's stand-alone host.
- The alien ships have since been destroyed by the almighty humans, and I have restored my hosts, storage, and network at the Production Site.
- Site-to-Site VPN is restored and Veeam B&R at DR Site can talk to Production Hosts.
My questions (assume that I have lost everything at Production Site):
1) How do I restore the Production vCenter? From the look of it, and after testing different scenarios, it seems I would have to do a regular VM restore directly to a Production Host across the WAN. I know this will take time, but the vCenter VM is not that big.
2) If I fail over the Production vCenter to the DR Site during the Failover process, I cannot fail it back to Production, because there is no vCenter and there are no DCs over there for the Veeam B&R at the DR site to authenticate against and access the infrastructure. The Production Veeam Proxy is useless for the same reason. I know I can Failback directly to the host, but there are issues. See below.
3) Let's say I have successfully completed a regular VM restore of the vCenter VM to Production over the WAN. What about the DCs that are running at the DR Site? How do I fail them back? They are now live. The vCenter at the Production Site does not have those DCs running, so Veeam B&R cannot log in to initiate Failback. Remember that I have a NATed overlapping VPN, so the Production Site cannot see the DR Site unless I tell them to use the NATed IPs.
4) So I tried to work around the problem in Question #2 above. From the Veeam B&R at the DR Site, I added the Production Hosts to the Veeam Backup Infrastructure and did a Failback to a different location so I could pick the Host where the original DCs live, but I kept getting the error "Object reference not set to an instance of an object."
5) Is it possible to replicate via the vCenter infrastructure and then fail back by targeting the hosts directly, especially just for the vCenter and DC VMs? That way, vCenter is not required during Failback.
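For Question #1, a rough way to judge "it will take time" is to estimate the restore duration over the 100 Mbps VPN. The sketch below is back-of-the-envelope arithmetic only: the VM sizes and the efficiency factor (to account for VPN and protocol overhead) are assumptions, not measured values, and compression or deduplication in the restore path is ignored.

```python
def estimate_transfer_hours(size_gb, link_mbps=100.0, efficiency=0.7):
    """Estimate hours needed to move size_gb (decimal GB) over the link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable; 0.7 is a guess, not a measurement.
    """
    if size_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("invalid size, link speed, or efficiency")
    megabits = size_gb * 8 * 1000          # 1 GB = 8000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600

# Hypothetical vCenter VM sizes; substitute the real restore size.
for size_gb in (50, 100, 200):
    hours = estimate_transfer_hours(size_gb)
    print(f"{size_gb} GB over 100 Mbps: about {hours:.1f} h")
```

If the estimate is acceptable for the vCenter VM, a plain restore across the WAN stays a workable answer to Question #1; for larger VMs the same arithmetic shows how quickly the WAN becomes the bottleneck.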
To sum up my issues:
1) Failback vCenter: Solved by restoring the vCenter VM from backup at the DR Site directly to a Production Host. It takes time, but the vCenter VM is not big.
2a) Failback Domain Controllers normally via vCenter: How can I fail back the DCs running at the DR Site that are live with users? Veeam B&R will complain that it cannot log in to vCenter at the Production Site because the Domain Controllers are not there. I cannot do a regular VM Restore from backup because the backups would not contain the live data from the DR site.
2b) Failback Domain Controllers directly to Production Hosts: Does not work, as explained in Question #4 above.
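The circular dependency behind 2a can be made concrete with a small dependency graph. This is only a model of the situation as described in this post, not of how Veeam works internally: the step names are made up, and the second graph assumes that a failback targeting an ESXi host directly can use host-local credentials and therefore does not need vCenter or the DCs. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter, CycleError

def failback_order(depends_on):
    """Return a valid step order, or None if the steps depend on each other in a loop."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None

# 2a: failback through vCenter. Each key lists the steps it needs first.
via_vcenter = {
    "vcenter_restored": set(),
    "vcenter_login": {"vcenter_restored", "dcs_at_production"},
    "dc_failback": {"vcenter_login"},
    "dcs_at_production": {"dc_failback"},
}

# 2b: failback straight to a host (assumption: no vCenter login needed).
direct_to_host = {
    "vcenter_restored": set(),
    "dc_failback": set(),
    "dcs_at_production": {"dc_failback"},
    "vcenter_login": {"vcenter_restored", "dcs_at_production"},
}

print("via vCenter:", failback_order(via_vcenter))
print("direct to host:", failback_order(direct_to_host))
```

In this model the vCenter path has no valid order because logging in needs the DCs and the DCs need the login, while the direct-to-host path does, which is why getting 2b to work (or finding another way to break the loop) matters.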