Using Failover for a Datacenter Move

Post by **mgratla** » Nov 19, 2014 2:37 pm

This has been scratching a few heads around our place, so I'm reaching out to the collective.

I also decided to shove this in the general forum because I have a feeling it's going to touch on a number of different topics. Mods, feel free to move as you see fit.

Technical info for reference: we're a VMware 5.0 U1 (vCenter 5.1a) shop with two datacenters, active and DR failover. IPs are not spanned between sites; instead, addressing is mirrored from the third octet on. For example, 10.10.50.12 in production is mirrored to 10.20.50.12 in DR: only the second octet changes, the last two octets stay the same, and the third octet is our VLAN identifier.
One more detail: we use a distributed virtual switch (DVS) within VMware, not a standard VMware switch.
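The mirrored addressing scheme boils down to swapping one octet. A minimal Python sketch, assuming the second octet is the only per-site difference (10 for production and 20 for DR, taken from the example above) and that the third (VLAN) and fourth octets never change; `mirror_ip` and `SITE_OCTET` are hypothetical names, and the table would need your real site values:

```python
import ipaddress

# Assumption: the second octet identifies the site, per the example above
# (10.10.50.12 in production mirrors to 10.20.50.12 in DR).
SITE_OCTET = {"prod": 10, "dr": 20}


def mirror_ip(address: str, from_site: str, to_site: str) -> str:
    """Return the mirrored address at to_site, keeping octets 1, 3 and 4."""
    octets = ipaddress.IPv4Address(address).packed  # 4 bytes; validates input
    if octets[1] != SITE_OCTET[from_site]:
        raise ValueError(f"{address} does not belong to site {from_site!r}")
    mirrored = bytes([octets[0], SITE_OCTET[to_site], octets[2], octets[3]])
    return str(ipaddress.IPv4Address(mirrored))


if __name__ == "__main__":
    print(mirror_ip("10.10.50.12", "prod", "dr"))  # 10.20.50.12
    print(mirror_ip("10.20.50.12", "dr", "prod"))  # 10.10.50.12
```

Because the mapping is symmetric, the same function serves failover and failback; only the site arguments swap.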

Right now, all of our production VMs are replicated with Veeam 7 to our failover datacenter in the usual way. Every replication job has a network remap rule (because each site has its own DVS) and re-IP rules.

We just got told our primary datacenter is moving. Good for us: we can fail over to our DR site!

The even better news is that we're not failing back onto new hardware. It's being forklifted, dropped into the new place and re-cabled. Failback would normally be a cinch.

The bad news: We don't know how long that forklift and re-cable is going to take because some circuits will need to be provisioned.

That means if we fail over, we could be running off snapshots for a couple of weeks, maybe longer. Maybe a month or two.

That's not a good idea, so "permanent failover" looks like the way to go: once the failover succeeds, commit it and consolidate all those snapshots.

Then comes failing back. We'd have to:
- recreate a couple of hundred replica jobs, each with a large number of re-IP and network remap rules
- map them to the original VMs and sync
- do a permanent failover, then delete those jobs
- find the original jobs, remove the machines from each job's "exclude" list, and resume normal replication

That last part is going to be a huge time hog. How the heck can we speed it up? Is there a script we could write that builds the jobs as a mirror image of what we have now? Sure, we could run off a snapshot for a few days, or a long weekend, but a month? VMware isn't going to be happy, so we see permanent failover as our only option.

Post by **tsightler** » Nov 19, 2014 6:10 pm

How many VMs are we talking about? I ask mainly because you say you have hundreds of jobs, so one thing I'd look at is whether there are ways to reduce the number of jobs. That said, I'd probably tackle this with scripting. It might also be a good time to look at the infrastructure and see whether there are organizational changes that would make scripting this much easier.

For example (and this is just one quick thought to get the conversation going), you could use a custom attribute (or, with v8, vCenter tags) to hold information about each VM's job, networks, IPs, and so on. A simple script could take your current jobs and populate that information (job name, primary DC network, DR DC network, etc.), and a "counter" script could then use it to create "failback" jobs from scratch, or take a simple parameter (-primary2dr vs -dr2primary) and create the jobs automatically in whichever direction is needed.
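A minimal Python sketch of that "record once, generate in either direction" idea. It assumes the per-VM metadata has already been read back from the custom attributes into a dict; the attribute keys, the DVS-qualified network names, and `build_jobs` are all hypothetical, and the output is plain data that a separate step would still have to turn into real Veeam jobs:

```python
from collections import defaultdict

# Hypothetical per-VM metadata, as it might be read back from custom
# attributes. Keys and values are illustrative assumptions.
INVENTORY = {
    "VM-SERVER-A1": {"job": "Job-A", "primary_net": "DVS1/010-net", "dr_net": "DVS2/010-net"},
    "VM-SERVER-A2": {"job": "Job-A", "primary_net": "DVS1/020-net", "dr_net": "DVS2/020-net"},
    "VM-SERVER-B1": {"job": "Job-B", "primary_net": "DVS1/030-net", "dr_net": "DVS2/030-net"},
}


def build_jobs(inventory: dict, direction: str) -> dict:
    """Group VMs into job specs, orienting network maps for the direction.

    direction is "primary2dr" or "dr2primary"; the returned dicts are plain
    data describing each job, not Veeam objects.
    """
    if direction not in ("primary2dr", "dr2primary"):
        raise ValueError(f"unknown direction: {direction}")
    if direction == "primary2dr":
        src_key, dst_key = "primary_net", "dr_net"
    else:
        src_key, dst_key = "dr_net", "primary_net"

    jobs = defaultdict(lambda: {"vms": [], "network_map": {}})
    for vm, attrs in sorted(inventory.items()):
        job = jobs[attrs["job"]]
        job["vms"].append(vm)
        job["network_map"][attrs[src_key]] = attrs[dst_key]
    return dict(jobs)


if __name__ == "__main__":
    for name, spec in build_jobs(INVENTORY, "dr2primary").items():
        print(name, spec["vms"], spec["network_map"])
```

Because the direction is just a choice of which attribute is the source and which is the target, the same metadata serves both the failover and the failback run.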

Of course, you could also write the job configuration out to an Excel spreadsheet or something similar, edit the spreadsheet with the new failback information, and have a script take it as input and recreate the jobs.
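For the spreadsheet route, a flat layout with one row per rule keeps the file easy to edit by hand. A minimal export sketch using Python's standard `csv` module; the in-memory `JOBS` list stands in for whatever a real script would read from Veeam (not shown), and the job name and column names are assumptions:

```python
import csv
import io

# Hypothetical in-memory view of current jobs; a real script would gather
# this from Veeam. Field names are illustrative assumptions.
JOBS = [
    {
        "name": "Job-A",
        "re_ip": [("10.104.10.*", "10.108.10.*"), ("10.104.20.*", "10.108.20.*")],
        "network_map": [("DVS1/010-net", "DVS2/010-net"), ("DVS1/020-net", "DVS2/020-net")],
    },
]

FIELDS = ["job", "rule_type", "source", "target"]


def export_rules(jobs, stream) -> None:
    """Write one CSV row per re-IP or network remap rule."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for job in jobs:
        for rule_type, key in (("re_ip", "re_ip"), ("network", "network_map")):
            for source, target in job[key]:
                writer.writerow({"job": job["name"], "rule_type": rule_type,
                                 "source": source, "target": target})


if __name__ == "__main__":
    buffer = io.StringIO()
    export_rules(JOBS, buffer)
    print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("rules.csv", "w", newline="")` so the `csv` module controls line endings.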

Alternatively, basing your jobs on something like folders might make the task a lot easier, provided you're not already using folders for other things such as security. It's really hard to make sweeping recommendations about what might work without more specific details about exactly what your infrastructure looks like.

Post by **mgratla** » Dec 17, 2014 3:26 pm

Great, thanks. Sorry, I never got the notification for this.

I was wrong about "hundreds of jobs": it's hundreds of settings, clicks, and re-IP and network remap rules. I was trying to illustrate the size of the task ahead.

Here's our environment:

Two datacenters; each DC has two Veeam proxies and one "Veeam" server. Everything is on v7; v8 isn't approved yet.
All VMware, and all port groups come from a DVS.

An example job:

4 VMs from DC1 to DC2

Re-IP rules:
- 10.104.10.* to 10.108.10.*
- 10.104.20.* to 10.108.20.*

Network remap rules:
- (DVS1) 010-net to (DVS2) 010-net
- (DVS1) 020-net to (DVS2) 020-net

Reverse Incremental
Optimize for WAN target
14 Retention points
Daily backups at 10pm
Application aware processing (Ignore failures)
Guest file system indexing

We would want a script to capture all of that information, then create a job with the settings reversed.
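One way to capture everything in the example job is a small immutable record whose reverse is derived rather than re-entered. A minimal Python sketch; `ReplicaJobSpec`, its field names, the job name, and `reverse_direction` are hypothetical and do not mirror Veeam's own object model. Only the direction, the job name, the VM list, and the rule pairs change on reversal; the remaining settings are copied as-is:

```python
from dataclasses import dataclass, replace
from typing import Tuple

Rule = Tuple[str, str]  # (source, target)


def _flip(rules: Tuple[Rule, ...]) -> Tuple[Rule, ...]:
    """Swap source and target in every (source, target) pair."""
    return tuple((target, source) for source, target in rules)


@dataclass(frozen=True)
class ReplicaJobSpec:
    """Plain-data description of a replication job (hypothetical model)."""
    name: str
    source_site: str
    target_site: str
    vms: Tuple[str, ...]
    re_ip_rules: Tuple[Rule, ...]
    network_map: Tuple[Rule, ...]
    backup_mode: str = "reverse incremental"
    optimize_for: str = "WAN target"
    restore_points: int = 14
    schedule: str = "daily 22:00"
    app_aware: str = "ignore failures"
    guest_indexing: bool = True

    def reverse_direction(self, name: str, vms: Tuple[str, ...]) -> "ReplicaJobSpec":
        """Return a copy pointing the other way; other settings are kept.

        vms must name the machines on the new source side (for failback,
        the DR replicas), since VM names differ between sites.
        """
        return replace(
            self,
            name=name,
            source_site=self.target_site,
            target_site=self.source_site,
            vms=tuple(vms),
            re_ip_rules=_flip(self.re_ip_rules),
            network_map=_flip(self.network_map),
        )


if __name__ == "__main__":
    job = ReplicaJobSpec(
        name="Job-A", source_site="DC1", target_site="DC2",
        vms=("VM-SERVER-A1",),
        re_ip_rules=(("10.104.10.*", "10.108.10.*"), ("10.104.20.*", "10.108.20.*")),
        network_map=(("DVS1/010-net", "DVS2/010-net"), ("DVS1/020-net", "DVS2/020-net")),
    )
    print(job.reverse_direction("Job-A-failback", ("VM-SERVER-A1_replica",)))
```

Reversing twice gives back the original spec, which is a cheap sanity check before creating anything for real.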

Normally my PowerShell gut tells me to 'throw all this into variables' so you can drop them in wherever you need them. Or export the current jobs to a CSV, swap the source and target columns for the re-IP and network remap rules, and then create the jobs from that new CSV file.
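The "reverse the columns" step is mechanical. A minimal Python sketch, assuming the rules live in a CSV with `job`, `rule_type`, `source`, and `target` columns (a hypothetical layout); creating the jobs from the resulting file is left to Veeam's own tooling and is not shown:

```python
import csv
import io

# Assumed column layout: job, rule_type, source, target (one row per rule).
SAMPLE = """job,rule_type,source,target
Job-A,re_ip,10.104.10.*,10.108.10.*
Job-A,network,DVS1/010-net,DVS2/010-net
"""


def reverse_rules(src, dst) -> None:
    """Copy a rules CSV from src to dst, swapping the source and target columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row["source"], row["target"] = row["target"], row["source"]
        writer.writerow(row)


if __name__ == "__main__":
    out = io.StringIO()
    reverse_rules(io.StringIO(SAMPLE), out)
    print(out.getvalue())
```

Every other column passes through untouched, so settings stored alongside the rules survive the reversal.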

Post by **mgratla** » Dec 17, 2014 7:38 pm

Sorry, forgot to add:

Our VMs start out with names like VM-SERVER-A1, but when replicated, the replica gets _replica appended to its name, e.g. VM-SERVER-A1_replica.

The failback job would also need to replicate in reverse, from the _replica machine back to the original.
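The naming convention makes the failback source and target derivable from the original VM name. A minimal Python sketch, assuming the suffix is always exactly `_replica` and that the original VMs keep their names after the move; `replica_name`, `original_name`, and `failback_pairs` are hypothetical helpers:

```python
REPLICA_SUFFIX = "_replica"  # suffix from the naming convention described above


def replica_name(original: str) -> str:
    """Name of the DR replica for an original VM."""
    return original + REPLICA_SUFFIX


def original_name(replica: str) -> str:
    """Strip the suffix to recover the original VM name."""
    if not replica.endswith(REPLICA_SUFFIX):
        raise ValueError(f"{replica!r} does not look like a replica")
    return replica[: -len(REPLICA_SUFFIX)]


def failback_pairs(originals):
    """(source, target) pairs for reverse replication: replica -> original."""
    return [(replica_name(vm), vm) for vm in originals]


if __name__ == "__main__":
    for source, target in failback_pairs(["VM-SERVER-A1", "VM-SERVER-A2"]):
        print(f"{source} -> {target}")
```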
