nra1223
Novice
Posts: 3
Liked: never
Joined: Jan 13, 2011 4:49 pm
Full Name: Nick Ashbrook
Contact:

Replication High Network Utilization

Post by nra1223 »

I'll just start off by saying that I'm generally happy with the product.

Here's the setup:
I have 2 sites with an ESXi 4.1 host at each site, each with a dedicated, directly attached RAID 10 array of eight 15K SAS disks (so I/O is not a bottleneck, at least according to IOMeter).

That RAID 10 is set up as a datastore. I have a VEEAM backup server on each of these hosts and present approximately half the disks (500G) as a virtual disk to the server to give me superfast backups via the virtual appliance mode (I know, no HA or DRS, small price to pay). It's working well.

The other half of the datastore is used for replication from the opposite site. I have CBT enabled on the servers I am backing up and replicating (VEEAM Backup verified this, and I can see it in the vmdx as well). These machines are running on a Clariion CX500 presented as an iSCSI target via Celerra. The throughput is fine and not a bottleneck, as I can push it to 900Mb/s without issue.

I/O load is low, around 100-150 at any given time throughout the day (each datastore can do about 3K Random).

So, I kept reading about people getting 200+ MB/s processing on their replications, and mine is only about 70MB/s. I decided to log on to my VEEAM server and watch a replication in real time. What I saw was that the processing speed matched the network utilization (and wasn't close to maxing out our 1G Layer 2 connection). It's basically running a full backup as far as I can see (I verified throughput in vSphere and Wireshark). I watched it copy the entire VM instead of only the changes via CBT. What am I missing?
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication High Network Utilization

Post by Gostev »

Using full ESX as the replication target (as opposed to ESXi) would reduce the traffic by more than 3 times, because we are able to use a temporary local agent running in the target ESX service console. The agent gets the incremental change data and rebuilds the replica VMDK locally. This is why full ESX is the recommended target for replication over WAN.

On the other hand, with ESXi on the target we can no longer leverage the service console agent, so the replica VMDK rebuild happens over the WAN, which increases the traffic significantly due to the handling of rollback (previous restore point) data.
nra1223
Novice
Posts: 3
Liked: never
Joined: Jan 13, 2011 4:49 pm
Full Name: Nick Ashbrook
Contact:

Re: Replication High Network Utilization

Post by nra1223 »

Thanks for the quick reply Gostev.

Are there any plans to deal with the fact that the Service Console will no longer exist in future releases of ESX?

I ran the replication 5 minutes after a previous replication. The network utilization I saw was a consistent 70 Mb/s over an 8.5 minute period for a 32G vmdk. I was a bit off on my calculation, as I believe it is about 6GB of total data transferred. There was not that amount of change made (it's a new machine that I built for testing and there are no apps). Any reason why that amount of data would need to be transferred?
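
As a quick sanity check on these figures (an editorial sketch, not part of the original post): the thread quotes a rate of 70 in both megabytes and megabits per second, and the total moved in 8.5 minutes differs by a factor of eight between the two readings. The function name `transferred_gb` is hypothetical, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
def transferred_gb(rate, seconds, unit):
    """Return decimal gigabytes moved at `rate` for `seconds`.

    `unit` is "Mb/s" (megabits per second) or "MB/s" (megabytes per second).
    """
    bytes_per_second = {
        "Mb/s": rate * 1_000_000 / 8,  # 8 bits per byte
        "MB/s": rate * 1_000_000,
    }[unit]
    return bytes_per_second * seconds / 1_000_000_000


duration = 8.5 * 60  # 8.5 minutes, in seconds

as_megabits = transferred_gb(70, duration, "Mb/s")
as_megabytes = transferred_gb(70, duration, "MB/s")

print(f"70 Mb/s for 8.5 min: {as_megabits:.1f} GB")
print(f"70 MB/s for 8.5 min: {as_megabytes:.1f} GB")
```

Read as megabits, the transfer is roughly 4.5 GB, the same order as the "about 6GB" estimate. Read as megabytes, it is roughly 35.7 GB, which is close to the size of the whole 32G vmdk and would explain the initial impression of a full copy.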
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication High Network Utilization

Post by Gostev »

Yes, we are planning to enhance replication architecture in the next release to better support ESXi replication target.
I am not sure where this high network utilization you are observing is coming from if there were no changes to source VM disks.
m.novelli
Veeam ProPartner
Posts: 566
Liked: 103 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy
Contact:

Re: Replication High Network Utilization

Post by m.novelli »

Gostev wrote:Yes, we are planning to enhance replication architecture in the next release to better support ESXi replication target.
Are you talking about Veeam 5.0.2, Veeam 5.1 or Veeam 6.0?
I have a big customer really complaining about slow replication to an ESXi Disaster Recovery site :(

Many thanks,

Marco
nra1223
Novice
Posts: 3
Liked: never
Joined: Jan 13, 2011 4:49 pm
Full Name: Nick Ashbrook
Contact:

Re: Replication High Network Utilization

Post by nra1223 »

I'm running 5.0.1 of Backup and Replication. I wonder if the CBT block sizes that it is using are too large, because it looks like a lot of data is being copied over. Definitely not consistent with what I would normally see in RecoverPoint.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Replication High Network Utilization

Post by tsightler »

Yes, the block sizes default to 1MB, and with WAN target mode they are 256KB, which is still too large. WAN optimization is a big win, but it is really just covering up for what is really a poor replication technology. That sounds overly harsh; what I mean is that Veeam's replication technology is built upon their same reverse incremental backup engine. This is a solid, reliable engine, but the technology was originally developed to run on the service console, which had limited memory and resources, so large block sizes helped compensate for this. Also, I suspect that Veeam probably targets local replication much more than WAN replication, in which case bandwidth is generally sufficient while the service console's usable resources were minimal. There are also significant limits on what can be done when writing via ESXi, since there is no target-side agent. Fixing this requires an entirely new approach.

Not only that, but as virtualization has gone mainstream, VMs have grown bigger and many more active systems are now virtualized, so these larger blocks are an even bigger problem for Veeam's replication implementation, especially in WAN situations. Veeam is well aware of this, and I'm hoping to see big improvements going forward. I have no inside information regarding new versions, but I would say Veeam 6.x is likely the target for such improvements.
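
To illustrate why block size matters (an editorial sketch under stated assumptions, not Veeam's actual implementation): if every block containing a change is copied whole, small scattered writes are amplified to the full block size. The workload below is hypothetical (1,000 writes of 4 KB spread evenly over a 32 GB disk, each assumed to fall inside a single block), and `bytes_to_transfer` is an illustrative name.

```python
def bytes_to_transfer(changed_offsets, block_size):
    """Bytes sent when every block containing a changed offset is copied whole."""
    dirty_blocks = {offset // block_size for offset in changed_offsets}
    return len(dirty_blocks) * block_size


KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Hypothetical workload: 1,000 small writes spread evenly across the disk.
disk_size = 32 * GB
writes = 1000
changed_offsets = [i * (disk_size // writes) for i in range(writes)]
actual_change = writes * 4 * KB  # bytes that really changed

# Compare the two block sizes mentioned in the post above.
for label, block_size in [("1 MB", 1 * MB), ("256 KB", 256 * KB)]:
    sent = bytes_to_transfer(changed_offsets, block_size)
    print(f"{label} blocks: {sent / MB:.0f} MB sent for {actual_change / MB:.1f} MB changed")
```

Under these assumptions, about 3.9 MB of real change costs 1,000 MB at 1MB blocks and 250 MB at 256KB blocks. Real workloads cluster their writes, so the amplification is usually smaller, but the direction of the effect is the same.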
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication High Network Utilization

Post by Gostev »

m.novelli wrote:Are you talking about Veeam 5.0.2 , Veeam 5.1 or Veeam 6.0?
This release has only a codename at this time, with no version number or timelines I could share publicly, except that the release will happen sometime in 2011.
m.novelli wrote:I have a big customer really complaining about slow replica to ESXi Disaster Recovery site :(
Tom is 100% correct above. I would recommend using full ESX as the replication target for now when replicating over slow WAN links. ESXi is fine as a target for local replication (over 1Gb/100Mb), but for remote replication you want traffic compression and a target agent, which are only possible with full ESX in the target site.
hagepat
Veeam ProPartner
Posts: 15
Liked: never
Joined: Aug 31, 2010 1:17 pm
Contact:

Re: Replication High Network Utilization

Post by hagepat »

Hi,

Is it not possible to run a temporary local agent on the vMA (vSphere Management Assistant)? That should speed up replication.
I mean, there are other hardware vendors that need an agent, for instance APC PowerChute Network Shutdown. This agent is installed on the vMA and makes connections to the ESXi host possible. Is it not possible for the development team to create a construction like this?
J1mbo
Veteran
Posts: 261
Liked: 29 times
Joined: May 03, 2011 12:51 pm
Full Name: James Pearce
Contact:

Re: Replication High Network Utilization

Post by J1mbo »

Gostev wrote:for remote replication you want traffic compression and target agent, which are only possible with full ESX in the target site.
Hi Gostev, could you expand on this? I may have misunderstood this as I thought there was no compression for replication (see here, "deduplication and compression are not used for replicated VMs"). Many thanks!
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Replication High Network Utilization

Post by Vitaliy S. »

Hi James,

That's true, but Anton was referring to the second part of the sentence you've quoted, meaning that rollbacks (restore points) are compressed and deduped.

Thanks.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Replication High Network Utilization

Post by tsightler »

Also, if you are using full ESX there is some simple network level compression between the Veeam server and the ESX service console agent. Not to mention the fact that you save bandwidth, since the read/write/write cycle required for writing the rollback happens locally on the ESX server rather than through the ESXi management agent.
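
The read/write/write cycle can be put into a rough traffic model (an editorial sketch; the model and the names `wan_bytes` and `compression_ratio` are assumptions for illustration, not a description of Veeam internals). It assumes push replication with the Veeam server at the source site: without a target agent, the old-block read, the rollback write, and the new-block write each cross the WAN, while with an agent only the new data does.

```python
def wan_bytes(changed_bytes, agent_on_target, compression_ratio=1.0):
    """Estimate WAN traffic for one replication pass under a simplified model.

    compression_ratio is compressed size / original size (hypothetical value,
    only applied when a target-side agent is available).
    """
    if agent_on_target:
        # Only the new data crosses the WAN. The agent reads the old block,
        # writes it to the rollback file, and writes the new block locally.
        return changed_bytes * compression_ratio
    # No agent: read old block + write rollback + write new block, all remote.
    return changed_bytes * 3


changed = 1_000_000_000  # 1 GB of changed blocks (illustrative figure)

esxi_target = wan_bytes(changed, agent_on_target=False)
esx_target = wan_bytes(changed, agent_on_target=True, compression_ratio=0.8)

print(f"ESXi target: {esxi_target / 1e9:.1f} GB over the WAN")
print(f"ESX target:  {esx_target / 1e9:.1f} GB over the WAN")
```

With any compression at all, the ratio between the two cases exceeds 3, which is in line with the "more than 3 times" reduction quoted earlier in the thread. The 0.8 ratio is a placeholder, not a measured value.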
J1mbo
Veteran
Posts: 261
Liked: 29 times
Joined: May 03, 2011 12:51 pm
Full Name: James Pearce
Contact:

Re: Replication High Network Utilization

Post by J1mbo »

Thanks for the replies. So where Veeam B&R is running at the remote site, is there a difference in WAN traffic between replicating TO ESX vs. ESXi?
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Replication High Network Utilization

Post by tsightler »

There should be no difference if the Veeam B&R server is on the target side pulling. The best performance in terms of bandwidth is always push replication to full ESX, but pull replication to ESX or ESXi is the next best option; however, with this method there is no network compression unless you use a third-party WAN accelerator.
J1mbo
Veteran
Posts: 261
Liked: 29 times
Joined: May 03, 2011 12:51 pm
Full Name: James Pearce
Contact:

Re: Replication High Network Utilization

Post by J1mbo »

Thanks, so there would be compression with push? It's odd that Support advised that push wasn't a recommended configuration when I was at the design phase; indeed, it seemed to be pretty unstable in that configuration, hence contacting support in the first place.
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication High Network Utilization

Post by Gostev »

Only with ESX target.
tsightler
VP, Product Management
Posts: 6035
Liked: 2860 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Replication High Network Utilization

Post by tsightler »

It's critical to distinguish between ESX and ESXi on the target. If full ESX is the target, then yes, push replication will have some minimal network compression. This is by far my favorite mode if full ESX is available as the replication target. Make sure to provide full SSH credentials for the ESX host to get the benefit of this as this will allow Veeam to push a small, run-time agent to the ESX host to efficiently communicate with the Veeam server.

However, if using ESXi on the target, then push via WAN is not recommended at all.
J1mbo
Veteran
Posts: 261
Liked: 29 times
Joined: May 03, 2011 12:51 pm
Full Name: James Pearce
Contact:

Re: Replication High Network Utilization

Post by J1mbo »

Thanks. Unless I've missed something (highly possible!), this is an area that could do with a lot more detail, for example some scenarios, in the product documentation (in my opinion).
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication High Network Utilization

Post by Gostev »

Since we were too lazy to write the documentation :wink: we decided to address this by simply eliminating any differences. All these ESX vs. ESXi and push vs. pull specifics are going away in v6, because it has a universal ESXi-centric architecture, which works equally well for any host type and does not care about backup server placement.
J1mbo
Veteran
Posts: 261
Liked: 29 times
Joined: May 03, 2011 12:51 pm
Full Name: James Pearce
Contact:

Re: Replication High Network Utilization

Post by J1mbo »

Super!
jstoner
Lurker
Posts: 1
Liked: never
Joined: Sep 24, 2011 8:43 am
Full Name: Jeramie Stoner
Contact:

Re: Replication High Network Utilization

Post by jstoner »

Wondering when v6 will be available, as well as where the "10x" replication improvement is coming from...did you implement HyperIP's technology into the product?

I've been battling poor replication performance for months now. Everything is configured according to recommendations and we've already implemented HyperIP. I'll have almost a week's worth of successful replica jobs, and then one day, for no obvious reason, a single VM takes days to complete...even though the VRB file is small. The currently running job has a VRB that is only 2GB and hasn't changed in size since Wednesday. Currently running job:

Processing rate: 369 KB/s
Backup mode: NBD with changed block tracking
Start time: 9/20/2011 4:54:34 PM

Previous job:

Processing rate: 52 MB/s
Backup mode: NBD with changed block tracking
Start time: 9/19/2011 6:01:23 PM
End time: 9/19/2011 8:49:51 PM

It's like it's reading the entire vmdk from the remote side or something. Please tell me v6 will fix this and that it is coming out soon.

Wouldn't be nearly as critical if I could run a local backup while this replica job trickles. Someone else has said it before, but ideally the replica jobs would simply xfer backup job data, (local backup job runs and completes>remote replica job starts and copies deltas of backup job...not affecting production servers or the ability to run another local backup) instead of having to effectively run 2 backups.
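
The job figures in this post can be checked with a little arithmetic (an editorial sketch; it assumes the reported processing rate reflects data actually moved, which may not hold, and uses binary units for the 2GB VRB). It computes how long the previous job ran and how long a 2 GB delta would take at the slow job's reported rate.

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"  # timestamp format used in the job report

# Previous (healthy) job, timestamps copied from the post.
start = datetime.strptime("9/19/2011 6:01:23 PM", FMT)
end = datetime.strptime("9/19/2011 8:49:51 PM", FMT)
previous_duration = end - start

# Time needed to move a 2 GB rollback file at the slow job's 369 KB/s.
vrb_kb = 2 * 1024 * 1024
hours_for_vrb = vrb_kb / 369 / 3600

print(f"Previous job ran for {previous_duration}")
print(f"2 GB at 369 KB/s: {hours_for_vrb:.1f} hours")
```

Even at 369 KB/s, a 2 GB delta would need only about 1.6 hours, so a job that runs for days is consistent with the suspicion that far more than the VRB's worth of data is being read or transferred.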
Vitaliy S.
VP, Product Management
Posts: 27377
Liked: 2800 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Replication High Network Utilization

Post by Vitaliy S. »

jstoner wrote:Wondering when v6 will be available, as well as where the "10x" replication improvement is coming from...did you implement HyperIP's technology into the product?
No, we haven't implemented HyperIP's technology.

Here is an existing topic with a similar discussion; take a look: V6 Replication Accelerated by 10X?