Discussions specific to the VMware vSphere hypervisor
bhagen
Enthusiast
Posts: 95
Liked: 8 times
Joined: Feb 23, 2017 10:26 pm

Real-world replication and DR failover examples?

Post by bhagen » Apr 08, 2019 11:54 pm

We have two VMware 6.7u1 clusters: one production, and one I just built that's empty. Each is running vSAN. We run VBR 9.5u4. Our goal is to use VBR to replicate 60 VMs (about 100TB) from production to "empty", permanently fail over, then reverse the replication, so that our current production cluster becomes our DR cluster, and the new "empty" cluster becomes the production cluster.

Once all that is working, we then plan to shut down the DR cluster and move it to our DR building.

And the ultimate goal, of course, is the ability to spin up the DR cluster in case the production cluster (or the building that cluster is in) becomes a smoking hole.

The Veeam documentation for this scenario is pretty sparse, and very high-level...almost theoretical. I'm looking for resources that deep dive into real-world implementations of this type of setup.

If you know of a white paper, case study, or training, please drop a link!

HannesK
Veeam Software
Posts: 4044
Liked: 497 times
Joined: Sep 01, 2014 11:46 am
Location: Austria

Re: Real-world replication and DR failover examples?

Post by HannesK » Apr 09, 2019 9:41 am

Hello,
I'm not sure which information is missing in the Help Center, but in general customers use planned failover for this and schedule it according to business preferences. It is more or less "planning downtime".

I assume that you have the same network settings everywhere, as you did not mention anything.

Are you using one vCenter? Are you also doing backups, or only replication with Veeam?

Best regards,
Hannes

bdufour
Expert
Posts: 198
Liked: 29 times
Joined: Nov 01, 2017 8:52 pm
Full Name: blake dufour

Re: Real-world replication and DR failover examples?

Post by bdufour » Apr 09, 2019 2:52 pm

It really depends on the bandwidth you have between the sites that will handle the replication traffic. That's going to be the biggest factor. Getting a full backup seed at the 'empty' site will likely help you do this in a reasonable amount of time.
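
To put the bandwidth point in rough numbers, here is a minimal Python sketch that estimates how long an initial full transfer would take. The 100 TB figure comes from the original post; the link speeds and the 70% efficiency factor are assumptions for illustration, and the function name `transfer_days` is hypothetical. It ignores compression, dedupe, and change rate, so treat the result as a ballpark only.

```python
def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate days needed to move data_tb (decimal terabytes) over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of the link that replication
    traffic can actually use (protocol overhead, other traffic).
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_bits = data_tb * 1e12 * 8               # TB -> bytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency     # usable bits per second
    return total_bits / usable_bps / 86_400       # seconds -> days


if __name__ == "__main__":
    # Hypothetical link speeds; 100 TB is the size quoted in the thread.
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbit/s: {transfer_days(100, mbps):.1f} days")
```

If the estimate for the inter-site link comes out in weeks or months, that is a strong argument for seeding the replicas while both clusters sit on the same local network.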


Re: Real-world replication and DR failover examples?

Post by bhagen » Apr 25, 2019 10:28 pm

HannesK wrote:
Apr 09, 2019 9:41 am
Hello, not sure which information is missing in helpcenter, but in general customers use planned failover for this and schedule on the business preferences. More or less "planning downtime".

As I mentioned in my original post, "The Veeam documentation for this scenario is pretty sparse, and very high-level...almost theoretical. I'm looking for resources that deep dive into real-world implementations of this type of setup."

HannesK wrote:
Apr 09, 2019 9:41 am
I assume that you have the same network settings everywhere, as you did not mention anything.

Yes, networking is configured; we'll simply need to change gateways using the Veeam failover plan when we fail to the DR site.

HannesK wrote:
Apr 09, 2019 9:41 am
Are you using one VCenter? Are you doing also backups, or only replication with Veeam?

Yes, a single vcenter. We'll replicate to the DR cluster, then do backups of the replicated VMs to minimize I/O on the production cluster.

I'm still not finding any real-world examples of using Veeam to replicate a vmware environment to a DR site. No whitepapers, no Veeam case studies. I'd really like to see a few examples of this in the real world...


Re: Real-world replication and DR failover examples?

Post by HannesK » Apr 26, 2019 9:11 am

About the Veeam documentation: it has around 100 pages in the PDF. I assume that you already tested your desired scenario with a test VM, so feel free to ask what's missing.

About the whitepapers: I guess that's because the technology is about 10 years old and works more or less like normal backup. You will also not find any deep dive whitepaper for a normal backup job.

bhagen wrote:
We'll replicate to the DR cluster, then do backups of the replicated VMs to minimize I/O on the production cluster.

This is where it becomes interesting for me (it was not mentioned in the initial post): backing up powered-off, replicated VMs. This is not a good idea for the following reasons:
  • Additional license usage
  • No file indexing possible
  • No SQL point-in-time recovery
  • No CBT
  • More complex restore mechanisms


Re: Real-world replication and DR failover examples?

Post by bhagen » Apr 26, 2019 5:18 pm

Those 100 pages have all the *technical* information I need; but they don't adequately answer the "why" questions...so I suppose I should have just asked the specific questions that I couldn't find answers to. :-) I'll do that now:

1. What is a "normal" or "standard" way to set up replication jobs for 60 VMs: one job per VM? One job per "application tier" (Exchange, SQL, etc.)? One job per OS version (for better dedupe, like backup jobs...or does that even play into replication)? A combination of these? Some other way that I'm missing? Why would I use a particular method over another?

2. We're already running 13 nightly backup jobs that cover all 60 vms, and we're wanting to run secondary jobs to an offsite repository. That's a lot of jobs running overnight. What is a "normal" or "standard" way to schedule replication jobs when there are already backup jobs in play? Replicate during working hours? Replicate after hours, when backups are running? How badly do replication jobs affect performance of veeam jobs and/or production VMs? I'm sure a lot of this answer will be based on our RPOs, and the fact that I will be "seeding" the DR cluster while that cluster is onsite, but then doing the incremental replication jobs to that cluster once it's moved to the DR site (which is connected to our main site via L2 connection, and is therefore in the same subnet as our main site). So I'm curious about our options here, and why we would choose one over another.

That's probably enough for now. :-)

Oh...thank you for the information about backing up the replicas as opposed to the production vms. That's very valuable info...and though it's not what I was hoping to hear, at least now I know it and will have to adjust our expectations accordingly.

foggy
Veeam Software
Posts: 18295
Liked: 1569 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Real-world replication and DR failover examples?

Post by foggy » Apr 28, 2019 9:15 pm

1. You may select any of the mentioned approaches, but another typical one is based on VM importance: you create a separate job for the most critical VMs and allow it to finish first, then group the other VMs by the same criterion. The dedupe factor doesn't play a role for replication (there isn't even a dedupe option in the job settings).

2. This depends on whether you're considering replication from backups or from production VMs. The first approach doesn't affect production VMs at all, and jobs can be run during working hours; you will just need to stagger backup copy jobs with replication ones accordingly, since both use backups as a source.
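
The importance-based grouping from point 1 can be sketched in a few lines of Python. This is only an illustration of the planning step, not anything Veeam provides: the VM names, the priority numbers, and the `Replication-Tier<N>` job naming are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: (vm_name, priority), where 1 = most critical.
VMS = [
    ("sql01", 1),
    ("exch01", 1),
    ("web01", 2),
    ("file01", 2),
    ("test01", 3),
]


def group_by_priority(vms):
    """Return replication job definitions ordered from most to least critical.

    Each entry is (job_name, [vm_names]); the list order is the order in
    which the jobs should be scheduled so critical VMs finish first.
    """
    tiers = defaultdict(list)
    for name, priority in vms:
        tiers[priority].append(name)
    return [
        (f"Replication-Tier{priority}", sorted(names))
        for priority, names in sorted(tiers.items())
    ]


if __name__ == "__main__":
    for job_name, members in group_by_priority(VMS):
        print(f"{job_name}: {', '.join(members)}")
```

The same ordering could double as the boot order of a failover plan, since the most critical tier is listed first.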

skrause
Expert
Posts: 439
Liked: 91 times
Joined: Dec 08, 2014 2:58 pm
Full Name: Steve Krause

Re: Real-world replication and DR failover examples?

Post by skrause » Apr 29, 2019 2:23 pm

On point number 1:
Since you have already categorized your workloads into 13 jobs, you could probably just use those same categorizations for your replication jobs. You can have per VM (and even virtual disk) destination settings inside the job. You can even choose to seed some VMs from backups or existing VMs while having others just do an initial full run. Make sure if you decide to seed jobs that you give them ample time to calculate digests on their first run (which may be longer than if you just pushed all the data from production depending on the VM.)

On point number 2:
It sounds like you are not implementing Backup Copy jobs and want to run separate Backup jobs instead. If at all possible, you should use Backup Copy jobs for getting your data to the secondary location. This will cut down on your production impact; if you have ample bandwidth, this can even be set up to sync during "business hours" when your backup jobs would not be running. If you don't want the backup jobs and replication jobs to interfere with one another, you could (after the initial seeding of the replica) "chain" the replication jobs to run after the backup job completes. While job chaining is, in general, not an ideal practice, this seems like a case where it would be useful.

If you are replicating your vCenter and want to fail it over, you will need to have that replication job set up using the individual source/destination hosts directly, not the vCenter, to manage the job. Otherwise you can't orchestrate the failover with a failover plan, as Veeam has nothing to talk to. You may want to look into getting another vCenter license for the "DR" location, depending upon your RTO and RPO SLAs, as getting the vCenter up and running properly would have to be your first task in a failover situation. I think you might even be able to use vCenter HA to do this, but I have not read up a whole lot on how (or even if) it works in a multi-site configuration. The couple thousand a year in licensing costs for a second vCenter server is worth the simplification of the DR process, IMO.
Steve Krause
Veeam Certified Architect


Re: Real-world replication and DR failover examples?

Post by bhagen » Apr 29, 2019 6:20 pm

Thanks @foggy and @skrause!

It sounds like doing replication jobs from backup copy jobs that reside on a backup server other than our main backup server is something I should investigate, so I will.

I like the idea of grouping replication jobs by importance; I think that would also work well in the event of a full-scale failover. I'll experiment with that as well.

Good point about the vCenter appliance replication/failover. I do have another vCenter license specifically for a DR stack, but bought it so long ago that I'd forgotten about it until you mentioned it @skrause. So yes, now I need to investigate vCenter HA, and how to make that work in a failover scenario.

This will give me something to work on this week, now that I have all my vms (except for my veeam server and my vcsa's) over on our new cluster.

Thanks for the tips!


Re: Real-world replication and DR failover examples?

Post by skrause » Apr 29, 2019 7:00 pm

You can't run Backup Copy jobs on a different B&R server from the one that the original jobs are being run on (I learned that one from experience). You could use a Backup Copy job's backup files (.vbk) as a source for replica seeding on another server if you add the repository and import the existing backups to it. But you would want to make sure the copy job does not sync data while you are running your replica seeding.

In any situation, the best practice is to have the B&R server that runs your replication jobs and failover plans in a different location than your production workloads. That way, in the case of a failover, you are able to go straight to failing over your production workloads rather than having to first get Veeam back up and running.


Re: Real-world replication and DR failover examples?

Post by bhagen » Apr 29, 2019 7:18 pm

I would put my backup copy jobs on a different NAS, not run them from a different VBR server. Then run replication jobs from that NAS to the DR vmware cluster.

To your point, if my VBR server is running in the DR site, then it would be running even if my main site went down.

So:
1. Run the VBR server in the DR site
2. Back up production VMs in the main site to a backup NAS in the main site
3. Run Backup Copy jobs from the main site backup NAS to the DR site backup NAS - no hit on production I/O
4. Replicate production VMs from the DR site backup NAS to the DR site vSphere cluster - no hit on production I/O


Re: Real-world replication and DR failover examples?

Post by skrause » Apr 29, 2019 8:18 pm

That would work.

I personally prefer to have my replicas run (after seeding) from my production VMs as a source since the impact on workloads that are not highly transactional is usually unnoticeable. It also means that there is only one "point of failure" for replicas being current: only the replication job state matters for my immediate failover solution rather than 3 jobs (Backup, Backup Copy, Replication).

It also makes setting up the jobs going back the other direction easier after a failover because all I have to do is use the pre-failover VM as the seeded target in the job going back the other way.

I do have a very fast low-latency link between my primary and secondary data centers though, so YMMV.
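
One way to compare the two designs discussed above (direct replication from production versus the Backup, Backup Copy, Replication chain) is to estimate how stale the replica can get in the worst case. The sketch below uses a deliberately simplified model that is not stated in the thread: each stage runs on its own fixed interval and reads the latest finished output of the previous stage, so the worst-case age is the sum of each stage's interval plus run time. All intervals and durations are hypothetical.

```python
def worst_case_age_hours(stages):
    """Upper bound on replica age for a pipeline of periodic jobs.

    stages is a list of (interval_hours, duration_hours) tuples, one per
    job in the chain. Assumes the stages are scheduled independently, so
    each stage can add up to one full interval plus its own run time.
    """
    return sum(interval + duration for interval, duration in stages)


# Hypothetical schedules, for illustration only.
direct = [(4, 1)]                       # replicate from production every 4 h
chained = [(24, 3), (24, 2), (24, 2)]   # backup -> backup copy -> replication

if __name__ == "__main__":
    print("direct :", worst_case_age_hours(direct), "hours")
    print("chained:", worst_case_age_hours(chained), "hours")
```

Chaining the jobs so each starts when the previous one finishes would tighten the bound for the three-job design, at the cost of the scheduling coupling mentioned earlier in the thread.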
