Host-based backup of VMware vSphere VMs.
Rangler
Novice
Posts: 3
Liked: never
Joined: Jun 12, 2012 6:20 pm
Full Name: Keith Sutherland
Contact:

Large replication question (Putting the feelers out)

Post by Rangler »

Hi All,

Will be setting up a demo of Veeam B&R soon, but wanted to put the feelers out in case someone with a similar setup to mine can vouch for speed, reliability and my thinking :)

Currently I have a VMware 4.1 farm consisting of around 25 HP BL465c G7 blades, fully populated (24 cores), attached to an F-Class 3PAR storage system hosting around 450 Windows VMs: a mix of Exchange, SQL, Citrix VDIs, DCs & F&P.
I have a 100Mbit connection to our dedicated DR site, which comprises roughly 9 HP G6 blades and an HP EVA 6400.

Ideally I would like to replicate around 20 of my core VMs, which is around 40TB in total, of which possibly 200GB changes on a daily basis.
Currently we use a product called Double-Take to replicate a couple of DCs, Exchange boxes, the SQL cluster, etc., but this product has never been used in anger so is untested.

I've a couple of questions.
Firstly, has anyone else out there got a similar setup working with Veeam successfully? If so, what have been your thoughts on the whole process, and have there been any issues?
Secondly, even though I only want to replicate 20-ish of my VMs, I assume I still have to purchase a licence for each socket in my HP cluster?

Thanks in advance for any thoughts, posts etc.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Large replication question (Putting the feelers out)

Post by dellock6 »

Uhm, 12-core AMD processors require Tier B licensing from Veeam, so for sure licensing 50 sockets is going to be a huge expense.
I do not know which level of license you have on your vSphere environment, but if you have DRS you can try a "should" VM-to-Host affinity rule, and license only the hosts those VMs are running on.
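
For illustration only, a rough pyVmomi sketch of such a "should" rule is below (everything here - the vCenter address, credentials, cluster, host and VM names - is a placeholder; the same rule can of course be created in the vSphere Client or with PowerCLI):

# Sketch: pin a group of core VMs to a subset of (Veeam-licensed) hosts with a
# DRS "should run on" VM-to-Host rule. All names below are hypothetical.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

def find(vimtype, name):
    # Return the first inventory object of the given type with the given name.
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

cluster = find(vim.ClusterComputeResource, "ProdCluster")
licensed_hosts = [find(vim.HostSystem, n) for n in ("esx01", "esx02")]
core_vms = [find(vim.VirtualMachine, n) for n in ("EXCH01", "SQL01")]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.VmGroup(name="CoreVMs", vm=core_vms)),
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.HostGroup(name="LicensedHosts", host=licensed_hosts)),
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(operation="add", info=vim.cluster.VmHostRuleInfo(
            name="CoreVMs-should-stay-on-LicensedHosts",
            enabled=True,
            mandatory=False,   # "should", not "must": DRS/HA can still move the VMs if needed
            vmGroupName="CoreVMs",
            affineHostGroupName="LicensedHosts")),
    ],
)
cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
Disconnect(si)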

About the setup, it sounds good. I do not have exactly the same design at any customer, but its general lines are similar to many designs I've seen.

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
J1mbo
Veteran
Posts: 261
Liked: 29 times
Joined: May 03, 2011 12:51 pm
Full Name: James Pearce
Contact:

Re: Large replication question (Putting the feelers out)

Post by J1mbo »

Rangler wrote:I have a 100Mbit connection to our dedicated DR site...I would like to replicate around 20 of my core VMs...possibly 200GB changes on a daily basis....Currently we use a product called Double-Take to replicate a couple of DCs, Exchange boxes, the SQL cluster, etc., but this product has never been used in anger so is untested.
As said, affinity rules will be needed to control costs. A 100Mbps line should be easily enough for the delta rate, although seeding is probably the way to go given the underlying data size.
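
A quick back-of-envelope check supports this (assuming the figures quoted above: ~200GB/day of change, ~40TB in total, a 100Mbit/s link, and ignoring compression and other traffic):

# Back-of-envelope transfer times - all inputs are the estimates quoted in this thread.
LINK_BPS = 100e6              # 100 Mbit/s WAN link
DAILY_DELTA_BYTES = 200e9     # ~200 GB of changed data per day
SEED_BYTES = 40e12            # ~40 TB to replicate in total

delta_hours = DAILY_DELTA_BYTES * 8 / LINK_BPS / 3600
seed_days = SEED_BYTES * 8 / LINK_BPS / 86400

print(f"Daily delta: ~{delta_hours:.1f} h on the wire")     # ~4.4 h - fits a nightly window
print(f"Initial sync over the WAN: ~{seed_days:.0f} days")  # ~37 days - hence seed locally and ship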

Double-Take has a horrific impact on disk I/O at the primary in my experience; our Exchange server IOPS dropped by 70% when we replaced DT with Veeam. And of course the DT failover method is cumbersome at best. That said, with Exchange 2010 I'd argue it's better to use a cross-site DAG instead of either.

You might consider a LAN extension to the DR site to avoid re-addressing, if it's only 20 core VMs. Also, replicating DCs is never ideal unless the plan is to fail over all DCs serving the domain. It could be better to deploy dedicated DCs at the DR site instead (of course, consider the FSMO role placement).
lobo519
Veteran
Posts: 315
Liked: 38 times
Joined: Sep 29, 2010 3:37 pm
Contact:

Re: Large replication question (Putting the feelers out)

Post by lobo519 »

J1mbo wrote: Also, replicating DCs is never ideal unless the plan is to fail over all DCs serving the domain. It could be better to deploy dedicated DCs at the DR site instead (of course, consider the FSMO role placement).
There is a lot of discussion out there about backing up/replicating DCs and, to be honest, I still don't feel that I fully understand the proper way to restore. If you had two domain controllers and replicated the one that holds all the FSMO roles, would you still need the second DC to properly fail over/restore from a disaster?
Gostev
Chief Product Officer
Posts: 31457
Liked: 6647 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Large replication question (Putting the feelers out)

Post by Gostev »

Yes, of course you do. Domain controllers are designed to stop the NETLOGON service if they cannot contact any replication partner within a certain time after restore. I don't remember how long exactly, though.
mbreitba
Enthusiast
Posts: 85
Liked: 8 times
Joined: Jun 11, 2012 3:17 pm
Contact:

Re: Large replication question (Putting the feelers out)

Post by mbreitba »

Replication targets do not impact your licensing. You only pay licensing on your source hosts. If you set up affinity rules as mentioned above, you only have to purchase licensing for the hosts that those VMs live on. Not sure on the size of those VMs, but I would guess that you'd be able to get away with licensing half of your infrastructure with a little bit of affinity wizardry.

As far as that much data goes, it's probably feasible, depending upon how many snaps you want to maintain. Remember that you cannot maintain more than 32 snaps of a VM. If you're looking at 200GB of daily change, that could be a boatload of retained data if you want to keep more than a few days' worth. For your purposes, I would look at taking more frequent snapshots and doing 2-3 days' retention. If you're using this for DR, you're not going to want to go back to data that is two weeks old anyway, right?

By running maybe 4-6 snaps per day, you'll reduce the peak network bandwidth needed by spreading that load out over the day. The downside is that you'll impact production storage 4-6 times per day doing the replication, rather than just once. If your primary storage is up to the task, I'd look at doing it that way. Other than that, I don't see anything wrong with your config; just remember to pre-seed that data, or it'll take you a month to get the first run over there.
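
As a rough sketch of that maths (assuming the 200GB/day change rate and 100Mbit link quoted earlier, plus the 32-point cap mentioned above; real numbers will differ with compression/dedupe):

# Rough retention and per-pass numbers - all figures are the assumptions quoted in this thread.
RUNS_PER_DAY = 6          # replication passes per day
RETENTION_DAYS = 3        # how far back you want to be able to fail over
DAILY_DELTA_GB = 200      # changed data per day (OP's estimate)
LINK_MBPS = 100           # WAN link speed
MAX_RESTORE_POINTS = 32   # cap mentioned above

restore_points = RUNS_PER_DAY * RETENTION_DAYS                 # 18, well under the cap
per_run_gb = DAILY_DELTA_GB / RUNS_PER_DAY                     # ~33 GB moved per pass
per_run_hours = per_run_gb * 8e9 / (LINK_MBPS * 1e6) / 3600    # ~0.75 h on the wire per pass

print(f"{restore_points} restore points, ~{per_run_gb:.0f} GB and ~{per_run_hours:.1f} h per pass")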
Rangler
Novice
Posts: 3
Liked: never
Joined: Jun 12, 2012 6:20 pm
Full Name: Keith Sutherland
Contact:

Re: Large replication question (Putting the feelers out)

Post by Rangler »

Thanks for the info.

Currently just set up a test lab with a 100Mbit connection; going to do a full backup of our Exchange servers, then wait 24 hours and see what the incremental size turns out to be.
Have also asked our network guy what the Double-Take replicated data usage to our DR is, which will give me a closer figure on what the daily delta is likely to be.

Won't bother replicating DCs, as we have two at our DR plus probably another 50 or so dotted around the globe.
It's the SQL side of things which might kill us; however, the DBAs have separate log shipping jobs over to DR, so as long as we have an image of the SQL boxes and the DBAs have done their job right by keeping the maintenance backup plans and log shipping up to date, we really should only lose 15-30 minutes of production time.

Got a meeting with different parts of the company to get an agreed statement of what they need to keep the business trading, so I can get the VM list together and then look at affinity on the various hosts.
Seeding/initial backup shouldn't be an issue, as I have a couple of spare SANs I can back up to and ship over to DR.

Curious what kind of speed increase I can expect if I install a Veeam proxy into a VM on the cluster?

Cheers in advance.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Large replication question (Putting the feelers out)

Post by dellock6 »

If you mean the proxy at the DR site, it's not about "IF" you install it, you HAVE TO install it :)
Also, a proxy in a VM at the replication target is even better, since it can use hot-add for writing to the DR storage.

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
mbreitba
Enthusiast
Posts: 85
Liked: 8 times
Joined: Jun 11, 2012 3:17 pm
Contact:

Re: Large replication question (Putting the feelers out)

Post by mbreitba »

dellock6 wrote:If you mean the proxy at the DR site, it's not about "IF" you install it, you HAVE TO install it :)
Also, a proxy in a VM at the replication target is even better, since it can use hot-add for writing to the DR storage.

Luca.
Actually, my testing has indicated that hot-add on the remote side for replication is terrible; network mode is much preferred. Hot-add seems to dramatically increase the load on the target storage. Veeam Engineering is looking into this and has reproduced it, but has not gotten back to me on a resolution. Select network mode for the transport type, or you'll see your remote SAN crushed.
dellock6
Veeam Software
Posts: 6137
Liked: 1928 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Large replication question (Putting the feelers out)

Post by dellock6 »

Uhm, this is strange, because in my deployments hot-add, when I was able to use it (customer constraints, usually), has given better results than network mode.
Keep us updated on the findings from the Veeam engineers, I'm interested.

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
davidb1234
Expert
Posts: 162
Liked: 15 times
Joined: Nov 15, 2011 8:47 pm
Full Name: David Borden
Contact:

Re: Large replication question (Putting the feelers out)

Post by davidb1234 »

I just finished testing, and hot-add at the DR site is waaay faster than network mode. By double. Not sure why you would say your DR SAN is crushed; something must not be configured correctly.

Using FC SAN on both sides, a 100Mbit WAN link from prod to the DR site, and gigabit switches and network connections. Hot-add is way faster than network mode for replication. I have a virtual Veeam proxy on both ends using hot-add mode because network mode just wasn't cutting it.

I am running the replication jobs from a Veeam server in the DR site rather than the production site. I keep the metadata repository on the actual virtual Veeam proxy at the production site to keep the metadata close to the source data on the SAN.
mbreitba
Enthusiast
Posts: 85
Liked: 8 times
Joined: Jun 11, 2012 3:17 pm
Contact:

Re: Large replication question (Putting the feelers out)

Post by mbreitba »

Very interesting. Working with a 1Gbit WAN and using hot-add mode with a proxy on both sides caused excessive queue depth and horrible amounts of read I/O. Switching to network mode completely resolved the issue for me. Must be some sort of special case.
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Large replication question (Putting the feelers out)

Post by tsightler »

We are aware of the issue with hotadd causing large amounts of extra I/O on the target side, but so far it doesn't appear to be widespread; rather, it seems specific to certain configurations, although we haven't yet determined the exact situations where it happens. I've been able to reproduce this issue in my personal lab using a simple iSCSI target, but it doesn't seem to occur when using local disks (at least I can't tell that it does). One of the most interesting things is that the I/O does not show up in the vCenter graphs, but if you monitor the storage you can see it easily.
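
As a generic illustration of that kind of monitoring (standard Windows PhysicalDisk counters; the counter paths and file name are only examples, and array-specific perfmon counters such as the EVA hooks mentioned later in this thread would be used the same way):

# Log disk queue depth and read/write rates while a replication pass runs, so the
# extra I/O that the vCenter graphs miss shows up. Example counters/paths only.
import subprocess

counters = [
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\PhysicalDisk(_Total)\Disk Read Bytes/sec",
    r"\PhysicalDisk(_Total)\Disk Write Bytes/sec",
]
# typeperf: sample every 5 s, 720 samples (one hour), written to CSV for graphing.
subprocess.run(["typeperf", *counters, "-si", "5", "-sc", "720",
                "-f", "CSV", "-o", "hotadd_io.csv"], check=True)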

Can you share with us your target storage configuration (model, connectivity) and also how you are monitoring? That would be useful. We do have an ongoing investigation into this issue; I'm actually running some tests in my lab environment right now on this very case. In the interim, network mode is a workaround for customers that see the extra I/O when using hotadd.

BTW, I've only seen this extra load with incrementals; the initial full always seems OK with hotadd. Are you seeing the same behavior?
Rangler
Novice
Posts: 3
Liked: never
Joined: Jun 12, 2012 6:20 pm
Full Name: Keith Sutherland
Contact:

Re: Large replication question (Putting the feelers out)

Post by Rangler »

Glad this has generated interest.

Just finished chatting to our network guy, and it appears that Double-Take traffic currently goes down a 1Gbit fibre to our DR site, while normal day-to-day traffic travels across the 100Mbit connection.
So I have even more bandwidth to play with than I thought :)

Should know the Exchange delta sizes tonight and SQL tomorrow, so should have better figures to do the sums with.

Regards and thanks.
bdoellefeld
Influencer
Posts: 22
Liked: never
Joined: Nov 02, 2011 5:19 pm
Full Name: Bill Doellefeld
Location: Colorado, USA
Contact:

Re: Large replication question (Putting the feelers out)

Post by bdoellefeld »

tsightler wrote: Can you share with us your target storage configuration (model, connectivity) and also how you are monitoring? That would be useful. We do have an ongoing investigation into this issue; I'm actually running some tests in my lab environment right now on this very case. In the interim, network mode is a workaround for customers that see the extra I/O when using hotadd.
I don't readily see another thread specifically about this problem, although I've seen some references to it. (Admin: please move if needed | OP: sorry..) I ran into the same problem seen by mbreitba, with huge I/O on the target side, and I see it overwhelm the storage. I've had to force NBD at the target. In my case I only noticed it over time, after upgrading from v5 and moving my "legacy" jobs over to new jobs. I assume I did not notice right away because v5 jobs couldn't use hotadd at the target(?)

My configuration is: Veeam proxy on both sides, EVA 6400 source to an EVA 4400 target, 30 Mbit WAN link. Monitoring using the EVA hooks in perfmon. Same as mbreitba, I see very excessive queue depth and very high reads.
bdoellefeld
Influencer
Posts: 22
Liked: never
Joined: Nov 02, 2011 5:19 pm
Full Name: Bill Doellefeld
Location: Colorado, USA
Contact:

Re: Large replication question (Putting the feelers out)

Post by bdoellefeld »

tsightler wrote: BTW, I've only seen this extra load with incrementals, the initial full always seems OK with hotadd. Are you seeing the same behavior?
Forgot to add... I can confirm seeing this as well. It flies on the full, and the incremental just cranks out I/O. An incremental that takes maybe 15 minutes over NBD goes to hours using hotadd and leaves you scratching your head.
bdoellefeld
Influencer
Posts: 22
Liked: never
Joined: Nov 02, 2011 5:19 pm
Full Name: Bill Doellefeld
Location: Colorado, USA
Contact:

Re: Large replication question (Putting the feelers out)

Post by bdoellefeld »

tsightler: was curious if you ever came across any resolution to this issue (hotadd on the target side causing an abnormal amount of target I/O).
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Large replication question (Putting the feelers out)

Post by tsightler »

I have worked with our development team; we have gathered many logs, and they have performed their own testing as well. Unfortunately, at this point there is no resolution. This appears to be a VMware issue and not really related to Veeam. The problem still occurs with B&R 6.5, at least in the latest build that I've tested. For now the only workaround is to use network mode on the target, although ideally I'd like customers to open support cases and provide their configuration and storage logs.