-
- Novice
- Posts: 3
- Liked: never
- Joined: Jan 13, 2011 4:49 pm
- Full Name: Nick Ashbrook
- Contact:
Replication High Network Utilization
I'll just start off by saying that I'm generally happy with the product.
Here's the setup:
I have 2 sites with an ESXi 4.1 host at each site, each with a dedicated, directly attached RAID 10 of eight 15K SAS disks (so I/O is not a bottleneck, at least according to IOMeter).
That RAID 10 is set up as a datastore. I have a Veeam backup server on each of these hosts and present approximately half of the disk space (500 GB) as a virtual disk to the server, which gives me superfast backups via virtual appliance mode (I know, no HA or DRS, small price to pay). It's working well.
The other half of the datastore is used for replication from the opposite site. I have CBT enabled on the servers I am backing up and replicating (Veeam backups verified this, and I can see it in the .vmx). These machines run on a CLARiiON CX500 presented as an iSCSI target via Celerra. Throughput is fine and not a bottleneck, as I can push it to 900 Mb/s without issue.
I/O load is low, around 100-150 IOPS at any given time throughout the day (each datastore can do about 3K random IOPS).
So, I kept reading about people getting 200+ MB/s processing rates on their replications, while mine is only about 70 MB/s. I decided to log on to my Veeam server and watch a replication in real time. What I saw was that the processing speed matched the network utilization (which wasn't close to maxing out our 1 Gb Layer 2 connection). As far as I can see, it's basically running a full backup (I verified throughput in vSphere and Wireshark). I watched it copy the entire VM instead of only the changes via CBT. What am I missing?
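The poster confirms CBT by looking in the VM's configuration file. As a minimal sketch, assuming the usual VMware keys (`ctkEnabled` at VM level and `scsiX:Y.ctkEnabled` per disk), a .vmx can be checked like this; the function name and sample file are hypothetical, and a "TRUE" flag only shows CBT is enabled, not that the replication job actually used it.

```python
import re

def cbt_status(vmx_text: str) -> dict:
    """Return the VM-level and per-disk ctkEnabled flags found in .vmx text."""
    status = {"vm": False, "disks": {}}
    for line in vmx_text.splitlines():
        # .vmx lines look like: key = "value"
        m = re.match(r'\s*([\w:.]+)\s*=\s*"([^"]*)"', line)
        if not m:
            continue
        key, enabled = m.group(1), m.group(2).strip().lower() == "true"
        if key.lower() == "ctkenabled":
            status["vm"] = enabled
        elif key.lower().endswith(".ctkenabled"):
            # strip the ".ctkEnabled" suffix to get the disk id, e.g. "scsi0:0"
            status["disks"][key[: -len(".ctkEnabled")]] = enabled
    return status

# Hypothetical .vmx excerpt
sample = '''
ctkEnabled = "TRUE"
scsi0:0.fileName = "test-vm.vmdk"
scsi0:0.ctkEnabled = "TRUE"
scsi0:1.ctkEnabled = "FALSE"
'''
print(cbt_status(sample))
# {'vm': True, 'disks': {'scsi0:0': True, 'scsi0:1': False}}
```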
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication High Network Utilization
Using full ESX as the replication target (as opposed to ESXi) would reduce the traffic by more than 3 times, because we are able to use a temporary local agent running in the target ESX service console. The agent receives the incremental changes and rebuilds the replica VMDK locally. This is why full ESX is the recommended target for replication over WAN.
On the other hand, with ESXi as the target we can no longer leverage the service console agent, so the replica VMDK rebuild happens over the WAN, which increases the traffic significantly due to the handling of rollback (previous restore point) data.
-
- Novice
- Posts: 3
- Liked: never
- Joined: Jan 13, 2011 4:49 pm
- Full Name: Nick Ashbrook
- Contact:
Re: Replication High Network Utilization
Thanks for the quick reply Gostev.
Are there any plans to deal with future releases of ESX, in which the Service Console will no longer exist?
I ran the replication 5 minutes after a previous replication. The network utilization I saw was a consistent 70 Mb/s over an 8.5-minute period for a 32 GB VMDK. I was a bit off in my earlier calculation; I believe it is about 6 GB of total data transferred. There was not that much change (it's a new machine that I built for testing and it has no apps). Is there any reason why that amount of data would need to be transferred?
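A quick sanity check of these numbers, assuming "70/Mbs" means a constant 70 megabits per second and decimal units (1 GB = 1000 MB): that works out to roughly 4.5 GB, the same order as the ~6 GB estimate. Read as 70 MB/s instead, the same window would move about 36 GB, more than the entire 32 GB disk, so megabits is the more plausible reading. The helper name is illustrative only.

```python
def transferred_gb(rate_mbit_s: float, minutes: float) -> float:
    """Approximate data moved (decimal GB) at a constant rate given in megabits/s."""
    seconds = minutes * 60
    megabits = rate_mbit_s * seconds
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes

print(round(transferred_gb(70, 8.5), 1))  # 4.5
```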
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication High Network Utilization
Yes, we are planning to enhance replication architecture in the next release to better support ESXi replication target.
I am not sure where this high network utilization you are observing is coming from if there were no changes to source VM disks.
-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
- Contact:
Re: Replication High Network Utilization
Gostev wrote: Yes, we are planning to enhance replication architecture in the next release to better support ESXi replication target.
Are you talking about Veeam 5.0.2, Veeam 5.1 or Veeam 6.0?
I have a big customer who is really complaining about slow replication to an ESXi disaster recovery site.
Many thanks,
Marco
-
- Novice
- Posts: 3
- Liked: never
- Joined: Jan 13, 2011 4:49 pm
- Full Name: Nick Ashbrook
- Contact:
Re: Replication High Network Utilization
I'm running 5.0.1 of Backup & Replication. I wonder if the CBT block sizes it is using are too large, because it looks like a lot of data is being copied over. This is definitely not consistent with what I would normally see in RecoverPoint.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Replication High Network Utilization
Yes, the block sizes default to 1 MB, and with WAN target mode they are 256 KB, which is still too large. WAN optimization is a big win, but it really just covers up what is a poor replication technology. That sounds overly harsh; what I mean is that Veeam's replication technology is built on the same reverse incremental backup engine as their backups. This is a solid, reliable engine, but it was originally developed to run on the service console, which had limited memory and resources, so large block sizes helped compensate for this. Also, I suspect that Veeam targets local replication much more than WAN replication; in that case bandwidth is generally sufficient, while the service console's usable resources were minimal. There are also significant limits on what can be done when writing via ESXi, since there is no agent on the target side. Fixing this requires an entirely new approach.
Not only that, but as virtualization has gone mainstream, VMs have grown bigger and many more active systems are now virtualized, so these larger blocks are an even bigger problem for Veeam's replication implementation, especially in WAN situations. Veeam is well aware of this, and I'm hoping to see big improvements going forward. I have no inside information regarding new versions, but I would say Veeam 6.x is likely the target for such improvements.
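To see why block size matters for small, scattered changes, here is a simplified model (not Veeam's code) that assumes every block touched by a write is sent whole and each write fits inside one block. The workload of 1,000 small writes spaced 4 MB apart is hypothetical, chosen as a worst case where every write lands in a different block.

```python
def bytes_sent(write_offsets, block_size):
    """Bytes a block-level copy sends if every block touched by a write is sent whole.

    Assumes each write fits inside a single block (only its offset is tracked).
    """
    touched = {offset // block_size for offset in write_offsets}
    return len(touched) * block_size

KB, MB = 1024, 1024 * 1024
# Hypothetical workload: 1,000 small writes spread 4 MB apart across a disk.
writes = [i * 4 * MB for i in range(1000)]

for size in (1 * MB, 256 * KB):
    print(size // KB, "KB blocks ->", bytes_sent(writes, size) // MB, "MB sent")
# 1024 KB blocks -> 1000 MB sent
# 256 KB blocks -> 250 MB sent
```

Even though the actual changed data here could be only a few megabytes, the amount sent scales with the block size, which is the effect described above.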
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication High Network Utilization
m.novelli wrote: Are you talking about Veeam 5.0.2, Veeam 5.1 or Veeam 6.0?
This release only has a codename at this time; there is no version number or timeline I could share publicly, except that the release will happen sometime in 2011.
m.novelli wrote: I have a big customer really complaining about slow replica to ESXi Disaster Recovery site
Tom is 100% correct above. I would recommend using full ESX as the replication target for now when replicating over slow WAN links. ESXi is fine as a target for local replication (over 1 Gb/100 Mb), but for remote replication you want traffic compression and a target agent, which are only possible with full ESX in the target site.
-
- Veeam ProPartner
- Posts: 15
- Liked: never
- Joined: Aug 31, 2010 1:17 pm
- Contact:
Re: Replication High Network Utilization
Hi,
Would it not be possible to run the temporary local agent on the vMA (vSphere Management Assistant)? That should speed up replication.
Other hardware vendors that need an agent already do something like this. For instance, the APC PowerChute Network Shutdown agent is installed on the vMA and makes connections to the ESXi host possible. Would it not be possible for the development team to build something similar?
-
- Veteran
- Posts: 261
- Liked: 29 times
- Joined: May 03, 2011 12:51 pm
- Full Name: James Pearce
- Contact:
Re: Replication High Network Utilization
Gostev wrote: for remote replication you want traffic compression and target agent, which are only possible with full ESX in the target site.
Hi Gostev, could you expand on this? I may have misunderstood, as I thought there was no compression for replication (see here: "deduplication and compression are not used for replicated VMs"). Many thanks!
-
- VP, Product Management
- Posts: 27377
- Liked: 2800 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
- Contact:
Re: Replication High Network Utilization
Hi James,
That's true, but Anton was referring to the second part of the sentence you've quoted, meaning that rollbacks (restore points) are compressed and deduped.
Thanks.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Replication High Network Utilization
Also, if you are using full ESX, there is some simple network-level compression between the Veeam server and the ESX service console agent. Not to mention the fact that you save bandwidth, since the read/write/write cycle required for writing the rollback happens locally on the ESX server rather than through the ESXi management agent.
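A simplified traffic model of the read/write/write point, assuming (for illustration only) that with an ESXi target each changed block costs one read of the old block, one rollback write and one new-block write across the WAN, while with a full ESX agent only the changed data crosses, optionally compressed. It ignores metadata and protocol overhead, and the compression ratio is a hypothetical parameter, not a measured Veeam figure; with any compression the ESXi/ESX gap exceeds 3x, in line with the "more than 3 times" mentioned earlier in the thread.

```python
GB = 1024 ** 3

def wan_bytes(changed_bytes: float, target: str, compression_ratio: float = 1.0) -> float:
    """Rough WAN traffic for one replication pass (simplified model, not Veeam's code).

    esx:  the service console agent does the rollback read/write/write cycle
          locally, so only the (optionally compressed) changes cross the WAN.
    esxi: the old-block read, rollback write and new-block write all cross
          the WAN, so traffic is about three times the changed data.
    compression_ratio: hypothetical; only applied in the esx case.
    """
    if target == "esx":
        return changed_bytes / compression_ratio
    if target == "esxi":
        return 3 * changed_bytes
    raise ValueError(f"unknown target type: {target!r}")

for target in ("esx", "esxi"):
    print(target, wan_bytes(1 * GB, target) / GB, "GB over the WAN per GB changed")
# esx 1.0 GB over the WAN per GB changed
# esxi 3.0 GB over the WAN per GB changed
```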
-
- Veteran
- Posts: 261
- Liked: 29 times
- Joined: May 03, 2011 12:51 pm
- Full Name: James Pearce
- Contact:
Re: Replication High Network Utilization
Thanks for the replies. So where Veeam B&R is running at the remote site, is there a difference in WAN traffic between replicating TO ESX vs. ESXi?
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Replication High Network Utilization
There should be no difference if the Veeam B&R server is on the target side pulling. The best performance in terms of bandwidth is always push replication to full ESX, but pull replication to ESX or ESXi is the next best option; however, with this method there is no network compression unless you use a third-party WAN accelerator.
-
- Veteran
- Posts: 261
- Liked: 29 times
- Joined: May 03, 2011 12:51 pm
- Full Name: James Pearce
- Contact:
Re: Replication High Network Utilization
Thanks, so there would be compression with push? It's odd that Support advised push wasn't a recommended configuration when I was at the design phase, indeed it seemed to be pretty unstable in that configuration hence contacting support in the first place.
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication High Network Utilization
Only with ESX target.
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
- Contact:
Re: Replication High Network Utilization
It's critical to distinguish between ESX and ESXi on the target. If full ESX is the target, then yes, push replication will have some minimal network compression. This is by far my favorite mode if full ESX is available as the replication target. Make sure to provide full SSH credentials for the ESX host to get the benefit of this as this will allow Veeam to push a small, run-time agent to the ESX host to efficiently communicate with the Veeam server.
However, if using ESXi on the target, then push via WAN is not recommended at all.
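The 5.x guidance in the posts above can be restated as a small decision helper. This is a hypothetical function that only encodes what this thread says (push vs. pull, ESX vs. ESXi target, LAN vs. WAN); it is not a Veeam tool or API.

```python
def v5_replication_advice(target: str, server_site: str, over_wan: bool) -> str:
    """Hypothetical helper: restates this thread's guidance for Veeam B&R 5.x.

    target:      "esx" (full ESX with service console) or "esxi"
    server_site: "source" (push replication) or "target" (pull replication)
    """
    if target not in ("esx", "esxi"):
        raise ValueError(f"unknown target type: {target!r}")
    if server_site not in ("source", "target"):
        raise ValueError(f"unknown server site: {server_site!r}")
    if not over_wan:
        return "OK: local replication (1 Gb/100 Mb) works to ESX or ESXi"
    if server_site == "source":
        if target == "esx":
            return "Best: push to full ESX with SSH credentials (target agent + compression)"
        return "Not recommended: push over WAN to an ESXi target"
    return "Acceptable: pull to ESX or ESXi; no compression without a third-party WAN accelerator"

print(v5_replication_advice("esxi", "source", over_wan=True))
# Not recommended: push over WAN to an ESXi target
```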
-
- Veteran
- Posts: 261
- Liked: 29 times
- Joined: May 03, 2011 12:51 pm
- Full Name: James Pearce
- Contact:
Re: Replication High Network Utilization
Thanks. Unless I've missed something (highly possible!), this is an area where the product documentation could use a lot more detail, for example some scenarios (in my opinion).
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Replication High Network Utilization
Since we were too lazy to write the documentation we decided to rather address this by simply eliminating any differences. All these ESX vs. ESXi and push vs. pull specifics are going away in v6, because it has universal ESXi-centric architecture, which works equally well for any host type, and does not care about backup server placement.
-
- Veteran
- Posts: 261
- Liked: 29 times
- Joined: May 03, 2011 12:51 pm
- Full Name: James Pearce
- Contact:
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Sep 24, 2011 8:43 am
- Full Name: Jeramie Stoner
- Contact:
Re: Replication High Network Utilization
Wondering when v6 will be available, as well as where the "10x" replication improvement is coming from...did you implement HyperIP's technology into the product?
I've been battling poor replication performance for months now. Everything is configured according to recommendations and we've already implemented HyperIP. I'll have almost a week's worth of successful replica jobs, and then one day, for no obvious reason, a single VM takes days to complete... even though the VRB file is small. The currently running job has a VRB that is only 2 GB and hasn't changed in size since Wednesday. Currently running job:
Processing rate: 369 KB/s
Backup mode: NBD with changed block tracking
Start time: 9/20/2011 4:54:34 PM
Previous job:
Processing rate: 52 MB/s
Backup mode: NBD with changed block tracking
Start time: 9/19/2011 6:01:23 PM
End time: 9/19/2011 8:49:51 PM
It's like it's reading the entire vmdk from the remote side or something. Please tell me v6 will fix this and that it is coming out soon.
It wouldn't be nearly as critical if I could run a local backup while this replica job trickles along. Someone else has said it before, but ideally replica jobs would simply transfer backup job data (the local backup job runs and completes, then the remote replica job starts and copies the deltas of the backup job, without affecting production servers or the ability to run another local backup) instead of having to effectively run 2 backups.
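The two job reports above can be compared directly. This sketch parses the previous job's timestamps in the format shown and computes how much slower the current run is, assuming both processing rates use binary units (1 MB = 1024 KB).

```python
from datetime import datetime

# Timestamp format as shown in the job reports above.
FMT = "%m/%d/%Y %I:%M:%S %p"

start = datetime.strptime("9/19/2011 6:01:23 PM", FMT)
end = datetime.strptime("9/19/2011 8:49:51 PM", FMT)
elapsed = (end - start).total_seconds()
print(f"previous job ran {elapsed:.0f} s ({end - start})")
# previous job ran 10108 s (2:48:28)

# Assumes both reported rates use binary units: 1 MB = 1024 KB.
previous_rate_kb_s = 52 * 1024
current_rate_kb_s = 369
slowdown = previous_rate_kb_s / current_rate_kb_s
print(f"current job is about {slowdown:.0f}x slower")
# current job is about 144x slower
```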
-
- VP, Product Management
- Posts: 27377
- Liked: 2800 times
- Joined: Mar 30, 2009 9:13 am
- Full Name: Vitaliy Safarov
- Contact:
Re: Replication High Network Utilization
jstoner wrote: Wondering when v6 will be available, as well as where the "10x" replication improvement is coming from...did you implement HyperIP's technology into the product?
No, we haven't implemented HyperIP's technology.
Here is existing topic with similar discussion, take a look: V6 Replication Accelerated by 10X?