Site-to-site backup & replication; architecture question

glennsantacruz · Mar 08, 2010 8:57 pm

Our environment consists of two sites, each running ESX4.0u1 (Enterprise), each attached to a FC SAN (separated by a small WAN link), and each site running production VMs. In addition, each site also has an off-SAN Linux backup JBOD server as a backup target.

SiteA
  VM-A-1 (prod)
  VM-A-2 (prod)
  VM-A-3

SiteB
  VM-B-1 (prod)
  VM-B-2

Goal: Back up each facility "locally" to allow for "fast" restoration of data; also replicate "prod" VMs between facilities to allow for failover in case of complete site failure. For example (using the above-mentioned sites and VMs): if VM-A-1 became corrupt (or a file were inadvertently removed), we would restore the given VM/file from site-local backups (and not incur any WAN overhead retrieving the missing data). On the other hand, suppose SiteB were to fail completely: we'd recover to SiteA by failing over the replica VMs. In that case, we should see:

SiteA
  VM-A-1 (prod)
  VM-A-2 (prod)
  VM-A-3
  VM-B-1 (prod-replica)

Conversely, if SiteA were to fail completely, we would have:

SiteB
  VM-B-1 (prod)
  VM-B-2
  VM-A-1 (prod-replica)
  VM-A-2 (prod-replica)
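
Both failover listings follow one rule: the surviving site keeps its own VMs and additionally runs replicas of the failed site's "prod" VMs. A minimal Python sketch of that rule, using the VM names from the listings (the `SITES` structure and the `after_failure` function are illustrative only and not part of any Veeam API):

```python
# Inventory from the listings: site -> {VM name: is it a "prod" VM?}
SITES = {
    "SiteA": {"VM-A-1": True, "VM-A-2": True, "VM-A-3": False},
    "SiteB": {"VM-B-1": True, "VM-B-2": False},
}

def after_failure(failed_site, sites=SITES):
    """Return the VM list expected at the surviving site once failed_site is lost."""
    # Exactly two sites are assumed, so exactly one survivor remains.
    (survivor,) = [name for name in sites if name != failed_site]
    # Local VMs keep running, labelled as in the listings.
    result = [f"{vm} (prod)" if prod else vm
              for vm, prod in sites[survivor].items()]
    # Only prod VMs are replicated, so only they reappear as replicas.
    result += [f"{vm} (prod-replica)"
               for vm, prod in sites[failed_site].items() if prod]
    return result
```

Calling `after_failure("SiteB")` yields the four-entry SiteA listing; `after_failure("SiteA")` yields the SiteB one.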

In short, our goals are to:

a) leverage vStorage API for backups, to help decrease backup window & overall load
b) provide site-local backup storage to allow for local restoration due to corruption/human error
c) provide site-to-site redundancy via replication

Could you please comment on the following approaches and let me know whether I understand the Veeam product correctly:

1) Single Veeam B&R server (located at SiteA, the "big" site). This server would:
  a) Do "local" backups via SAN mode (or Virtual Appliance)
  b) Perform replication to SiteB
  c) Perform backup of SiteB, leveraging network agents within the ESX service console and the Linux backup target server
  d) Require its local SQL database to be replicated regularly to SiteB
  e) On SiteA failure, require buildout of a single Veeam B&R server in SiteB, followed by DB restore (and subsequent VM recovery)
2) Two Veeam B&R servers, one at each site. Each server would:
  a) Do "local" backups via SAN mode (or Virtual Appliance)
  b) Perform replication to the target site (B goes to A, A goes to B)
  c) Have its local SQL database replicated to the other site (bi-directional replication, one database per Veeam B&R server)
  d) On SiteA failure, follow option 1 above
  e) On SiteB failure, follow option 1 above
  f) After either site failure, leave two Veeam B&R servers in a single site

I suspect that either approach would work (please do help me understand if I'm wrong in thinking this); I'm really trying to design a sensible approach to our environment with this product, and want to know the benefits and drawbacks to each approach. Using the above two options (surely there are more - can you comment?), I would suspect that option 1 is preferable from a "simplistic" standpoint, but option 2 is preferred from a performance standpoint (since it doesn't require network agents).

Vitaliy S. · Mar 09, 2010 12:31 pm

Hello Glenn,

I would recommend the 2nd scenario; it gives you a more reliable disaster recovery approach:

a. works equally well in both scenarios
b. you can use the replica seeding option here, so the job transmits only changes over the WAN to each site (higher speed, less load on the link)
c. that's exactly what some of our customers do. However, please note that even without the Veeam SQL DB you will be able to restore your backups (using the Import button) and fail over to your replicated VM in its most current state (using vSphere Client).
d. yes, you may either perform a restore or fail over to site B (failover is less time-consuming, so it is preferable in an emergency).
e. the same as above
f. yes; also please note that with a backup server on each site you will be able to manage the backup deployments easily via the Enterprise Manager console

Thank you!

Bunce · Mar 11, 2010 1:20 pm

Great post - the exact architecture and queries I was going to ask about. I imagine this is a pretty common requirement!

With 2(c), are you proposing to use native SQL methods of replication (e.g. log shipping) or Veeam replication of the entire VM?

What worries me about Veeam backups of SQL is that while they may be 'consistent', you don't get the native restoration features offered by normal SQL backups (such as point-in-time restores).

Cheers,
Andrew

Vitaliy S. · Mar 11, 2010 2:22 pm

Andrew,

2(c) refers to native SQL Server backup options, as you are not able to use VSS in the backup/replica job for the Veeam SQL DB itself.

glennsantacruz · Mar 11, 2010 2:57 pm

For our evaluation and testing purposes, we chose simply to back up the local SQL server for each Veeam B&R server and push those backups across the wire. Since Veeam doesn't (yet?) play well in a multiple-instance environment for SQL Server, we're configured with a local SQL Express instance for each B&R server.

Basically, after every replication job, we run an external script that:

a) issues request to local SQLExpress instance to perform full backup
b) if the full backup is produced without error, send it across the WAN to the secondary Veeam B&R server

In this fashion, each server has a current copy of the other's database. In the event of site failure, we clone the "good" B&R server (with a customization template to take care of the sysprep/newsid/etc.), and the cloned server needs nothing more than a quick SQLExpress database restoration.
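
A minimal Python sketch of the two-step script described in this post, for illustration only. The original script is not shown in the thread, so everything concrete here is an assumption: the instance name `.\SQLEXPRESS`, the database name `VeeamBackup`, the local `.bak` path, and the remote share are all hypothetical placeholders. The backup is issued through `sqlcmd`, whose `-b` switch makes it return a non-zero exit code when the batch fails, which is what gates step b).

```python
import ntpath
import shutil
import subprocess

def backup_command(instance, database, bak_path):
    """Build the sqlcmd invocation for a full database backup."""
    # Assumes bak_path and database contain no quote or bracket characters.
    query = f"BACKUP DATABASE [{database}] TO DISK = N'{bak_path}' WITH INIT"
    # -E: Windows authentication; -b: non-zero exit code if the batch fails.
    return ["sqlcmd", "-S", instance, "-E", "-b", "-Q", query]

def backup_and_ship(instance, database, bak_path, remote_dir,
                    run=subprocess.run, copy=shutil.copy2):
    """Step a) take a full backup; step b) ship it only if a) succeeded."""
    result = run(backup_command(instance, database, bak_path))
    if result.returncode != 0:
        return False  # backup failed: leave the remote copy untouched
    # Build the Windows-style destination path by hand so the result
    # is the same regardless of the platform running this sketch.
    target = remote_dir.rstrip("\\") + "\\" + ntpath.basename(bak_path)
    copy(bak_path, target)
    return True
```

A post-job hook could then call, for example, `backup_and_ship(r".\SQLEXPRESS", "VeeamBackup", r"C:\Backup\VeeamBackup.bak", r"\\siteb-veeam\dbdrop")`. The `run` and `copy` parameters exist only so the logic can be exercised without a SQL Server or a WAN share.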

Bunce · Mar 11, 2010 11:30 pm

Interesting - must admit I hadn't looked into the Veeam B&R SQL issue. I was going to run it off our primary SQL box (itself using native log-shipping/replication to our D/R site). Good point re 'sidding'.

It would be interesting to know how others handle network addressing in a failover situation at their D/R site. Presumably none of us are using SRM (since we're using Veeam to replicate), so we need to employ other methods of reconfiguring our VMs.

I guess it depends on the D/R site itself. If it's an exact mirror of the primary site, the existing IP/subnet architecture could possibly be reused, with routing changed downstream to point to this site.

Our D/R site, however, is a live site (one of our branch offices), and so already has its own unique subnet range. This offers a few alternatives, such as re-IPing our VMs and waiting for DNS to update, or running an exact mirror in a separate network at that site and re-routing traffic from all our sites to go via our branch office.

What's everyone else doing?

glennsantacruz · Mar 12, 2010 2:45 pm

We're in a similar environment: SiteA (production) in a /19 subnet, SiteB (D/R) in a separate /19; both are tied together via MPLS, so the overall network is "aware" of those subnets and routing is fairly well-established. We try to leverage hostnames as much as possible, so moving a machine from one site to the other is typically handled via DNS changes; our current D/R script re-IPs the recovered machines (via DHCP or manually, as needed). For Windows machines that are not DHCP-driven, take a look at "netsh"; you can script changes to IP information instead of configuring them manually via the GUI, over and over again. Very convenient.
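
As an illustration of the netsh approach, here is a small Python sketch that generates the two `netsh interface ip` commands for a static re-IP. The adapter name and all addresses are made-up examples (only the /19 prefix length comes from this thread), and the function name `netsh_commands` is hypothetical. It only builds the command strings; running them on the recovered Windows guest is left to whatever D/R script you already have.

```python
import ipaddress

def netsh_commands(nic, address_with_prefix, gateway, dns_server):
    """Build netsh commands that set a static IPv4 address and DNS server."""
    iface = ipaddress.ip_interface(address_with_prefix)
    # Catch the common mistake of pairing a SiteB address with a SiteA gateway.
    if ipaddress.ip_address(gateway) not in iface.network:
        raise ValueError(f"gateway {gateway} is outside {iface.network}")
    return [
        # Trailing 1 is the gateway metric.
        f'netsh interface ip set address name="{nic}" static '
        f'{iface.ip} {iface.netmask} {gateway} 1',
        f'netsh interface ip set dns name="{nic}" static {dns_server}',
    ]
```

For example, `netsh_commands("Local Area Connection", "10.20.5.17/19", "10.20.0.1", "10.20.0.10")` expands the /19 into the dotted netmask 255.255.224.0 that netsh expects.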

For Veeam replication, though, we have a separate issue -- our SiteA uses different network labels than SiteB, so we have an extra step in replication failover. We use Veeam to initiate the failover (let it do the magic it wants, snapshot-wise, etc.), then we manually shut down the replica to make the necessary VMX changes (network label, RAM), then restart it. Since we're manually intervening here, it's a good place to run the netsh scripts for IP changes (just prior to shutdown / VMX change).

We've given some consideration to "flopping" the entire /19 subnet over to SiteB, via router changes. But that gets tricky if you intend to recover anything back to the primary site (after it becomes available). In short, we found it easier (in our current size) to handle each VM by hand... I'm sure once we grow to more than 200 or so VMs, we'll begin rethinking that strategy... How does VMware Site Recovery Manager do this?

Gostev · Mar 12, 2010 3:47 pm

We should definitely have support for such a scenario by the time you grow to 200 VMs... this feature is on my high-priority features list.

Bunce · Mar 12, 2010 8:40 pm

glennsantacruz wrote: For Veeam replication, though, we have a separate issue -- our SiteA uses different network labels than SiteB, so we have an extra step in replication failover.

By network labels, do you mean the VM network names listed against the adapter in a VM?

That's another point I hadn't considered. If the VM is brought up at the DR site and those networks don't exist, it obviously won't have connectivity. Also, quite likely it could be on a different VLAN.

From my understanding, this is the type of stuff that SRM automates. You set up a number of tasks that need to be sequenced, and they are carried out automatically.

glennsantacruz · Mar 12, 2010 9:44 pm

Yes, that's exactly what I'm describing: different network labels/names between the sites. Our production site has different VLANs ( and we chose to label the VM networks according to the underlying VLAN involved -- i.e. "VM Network 10", "VM Network 20", etc. ). So a server in SiteA may need label "VM Network 20" but the proper label in SiteB might be "DR 10-25".

I'm not familiar with SRM from a technical perspective, but I do understand a little of their marketing; it sounds like the very product to solve these types of issues. Alternatively, Veeam could support property maps in replication jobs... and that would be very nice (hint, hint, nudge, nudge, wink, wink), covering not only network labels but also VMDK datastore locations, RAM sizing, etc.
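
Until something like that exists, the manual VMX edit can be scripted. The following Python sketch applies a hand-maintained property map to the text of a replica's .vmx file; it is an illustration under assumptions, not a Veeam or VMware feature. It assumes the usual `key = "value"` line format, with network labels stored under `ethernetN.networkName` and RAM under `memsize`. The `NETWORK_MAP` name and the `remap_vmx` function are hypothetical; the two labels are the ones mentioned in this post.

```python
import re

# Hypothetical property map: SiteA network label -> SiteB network label.
NETWORK_MAP = {"VM Network 20": "DR 10-25"}

def remap_vmx(vmx_text, network_map, memsize_mb=None):
    """Return VMX text with network labels remapped and, optionally, RAM resized."""
    out = []
    for line in vmx_text.splitlines():
        m = re.match(r'\s*([\w.:]+)\s*=\s*"(.*)"\s*$', line)
        if m:
            key, value = m.group(1), m.group(2)
            new_value = value
            if re.fullmatch(r"ethernet\d+\.networkName", key, re.IGNORECASE):
                # Labels without a mapping are left as they are.
                new_value = network_map.get(value, value)
            elif key.lower() == "memsize" and memsize_mb is not None:
                new_value = str(memsize_mb)
            if new_value != value:
                line = f'{key} = "{new_value}"'
        out.append(line)  # untouched lines pass through byte-for-byte
    return "\n".join(out)
```

The replica should be powered off before its .vmx is rewritten, as in the manual procedure, and keeping a copy of the original file makes failback easier.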
