Host-based backup of VMware vSphere VMs.
lobo519
Veteran
Posts: 315
Liked: 38 times
Joined: Sep 29, 2010 3:37 pm
Contact:

Replication Destination Datastore performance

Post by lobo519 »

I am trying to control the IOPS generated on our destination datastore by our replication jobs. Right now they are generating about 5000 IOPS and it's killing the datastore. I would normally just use vCenter to limit the IOPS under resource allocation, but because the drives are constantly being added and removed (hotadd), this doesn't work.
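
For reference, this is roughly what I mean by the resource allocation approach, done programmatically instead of through the vCenter UI. It's just a rough sketch using pyVmomi (my own choice of tooling, not anything Veeam provides); the vCenter address, credentials, VM name and the 1000 IOPS figure are all placeholders for my environment. The catch remains the same: hotadded disks are attached and detached on every run, so a limit set on them doesn't stick.

Code: Select all

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholders: vCenter address, credentials, VM name and the IOPS cap
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the VM whose virtual disks should be capped
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "veeam-target-proxy")

# Build a reconfigure spec that caps every virtual disk at 1000 IOPS
changes = []
for dev in vm.config.hardware.device:
    if isinstance(dev, vim.vm.device.VirtualDisk):
        dev.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(limit=1000)
        changes.append(vim.vm.device.VirtualDeviceSpec(
            operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
            device=dev))

# The limit only applies to disks present right now; hotadded disks
# are re-attached on the next job run without it
vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=changes))
Disconnect(si)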

I tried reducing the CPU shares, which had no effect (not much CPU usage anyway).

Any idea as to how I can control this?

We are currently on an aging MD3000i, and the replication datastore is a separate RAID 10, so it doesn't affect other VMs. We are looking to replace the MD with an EqualLogic, and even though the I/O size is very small, I think the replication jobs will still impact the other VMs on the array.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Replication Destination Datastore performance

Post by tsightler » 2 people like this post

Force the target proxies to network mode. This will likely have no major performance impact but will probably reduce the I/O load significantly. Let us know if that works.
lobo519
Veteran
Posts: 315
Liked: 38 times
Joined: Sep 29, 2010 3:37 pm
Contact:

Re: Replication Destination Datastore performance

Post by lobo519 »

Wow - Seems to work great! Thanks!

Performance seems about the same and I/O is down to about 200-400.

Why the difference?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Replication Destination Datastore performance

Post by tsightler »

That is still being researched by R&D, but for some reason hotadd mode shows this behavior on at least some configurations. There have been other threads on the forum about the same thing. For now the easiest workaround is to use network mode. Actually, in some cases we see a performance improvement, due to the lower overhead of the target I/O and the elimination of the setup time that hotadd typically requires on the proxy.
DaxUK
Influencer
Posts: 13
Liked: 1 time
Joined: Aug 25, 2011 7:45 am
Full Name: David Wells
Contact:

[MERGED] IOPs requirement for replication target SAN

Post by DaxUK »

Hi,

I have had a search on the forums and couldn't find anything on this; apologies if this has been covered before.

We are seeing abnormally high IOPS requirements on our SAN at our DR site when replicating a number of VMs from our live environment. We are using virtual appliance mode, with proxies in each site.

We are seeing spikes of up to 7k IO/sec on the DR SAN while replication is occurring. Is there any way to reduce this?

Also, we wanted to confirm the actual process of disk activity in a replication job. Obviously, in a backup job writing to a Veeam repository, dedupe occurs and changes are written to the repository.

However, in a replication job the data is written to the SAN at our DR site, which is obviously not a Veeam repository, so we are wondering how the data is written to the replica target. Is the disk completely rewritten each time a replication job runs? That might explain the huge I/O requirements.

Many Thanks for any help.

Dave
chrisdearden
Veteran
Posts: 1531
Liked: 226 times
Joined: Jul 21, 2010 9:47 am
Full Name: Chris Dearden
Contact:

Re: IOPs requirement for replication target SAN

Post by chrisdearden »

Can you try switching your target proxies to network mode - do you see the same issue?
DaxUK
Influencer
Posts: 13
Liked: 1 time
Joined: Aug 25, 2011 7:45 am
Full Name: David Wells
Contact:

Re: Replication Destination Datastore performance

Post by DaxUK »

Hi,

We have now tested this and it has greatly reduced the IOPS hitting the destination SAN, from 7000 down to 200-400.

Is there any update on when this is likely to be fixed? This is not optimal for us, as our backup proxy at the target site has to traverse a firewall to talk to vCenter/ESXi, and we would like to avoid that path for backup data.

Regards
Dave
chrisdearden
Veteran
Posts: 1531
Liked: 226 times
Joined: Jul 21, 2010 9:47 am
Full Name: Chris Dearden
Contact:

Re: Replication Destination Datastore performance

Post by chrisdearden »

Hi Dave,
Is the firewall traversal due to security policy? Assuming your target proxy is a VM (which it would need to be to use hotadd), connecting it to a network with vmkernel access is technically not a problem, but I appreciate there are often many other reasons why this would not be implemented.
DaxUK
Influencer
Posts: 13
Liked: 1 time
Joined: Aug 25, 2011 7:45 am
Full Name: David Wells
Contact:

Re: Replication Destination Datastore performance

Post by DaxUK »

Hi,

Yup, it is a security policy reason why we can't put the proxy in the same front-end network as the vCenter server.

That is why we are quite keen to get hotadd working, to eliminate the firewall from the equation.

Regards
Dave
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Replication Destination Datastore performance

Post by foggy »

DaxUK wrote:Is there any update on when this is likely to be fixed? This is not optimal for us, as our backup proxy at the target site has to traverse a firewall to talk to vCenter/ESXi, and we would like to avoid that path for backup data.
Still under investigation. The problem is that this issue is not 100% reproducible and most likely comes from the VMware side under certain circumstances. Our R&D has worked with VMware on it but has not come to a reasonable resolution so far.

It would be much appreciated if you could open a support case and provide debug logs and environment-specific information to our R&D.

There is also another thread regarding the same problem.
lars@norstat.no
Expert
Posts: 110
Liked: 14 times
Joined: Nov 01, 2011 1:44 pm
Full Name: Lars Skjønberg
Contact:

Re: Replication Destination Datastore performance

Post by lars@norstat.no »

I have the same problem with high I/O on the target SAN. I have had slow performance since upgrading to v7, so yesterday I cleaned out the database, deleted all replicas and started over. The replication of about 16TB took a few hours; the speed was screaming fast and I maxed out my target SAN at about 4Gb/s. From IBM Storage Manager I can see that the read percentage is about 1.5% and I/O is about 800-900 IOPS.

Then when I try the first incremental replication on a small VM with only a 40 GB disk, I get a read percentage of about 34% and the I/O rises to about 11,000 IOPS; if I try 2-3 VMs, I max out on I/O at about 36,000 IOPS on my DS4700. The read percentage stays at about 34%. The transfer speed is also pretty slow.
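
For what it's worth, here is roughly how I watch the same read/write IOPS from the vCenter side instead of from IBM Storage Manager. Just a sketch using pyVmomi real-time performance counters (my own monitoring approach, not anything from Veeam); the vCenter address, credentials and host name are placeholders, and the datastore instances come back as UUIDs rather than display names.

Code: Select all

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="user", pwd="***",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
perf = content.perfManager

# Map "group.name.rollup" strings to performance counter IDs
counters = {"%s.%s.%s" % (c.groupInfo.key, c.nameInfo.key, c.rollupType): c.key
            for c in perf.perfCounter}
wanted = ["datastore.numberReadAveraged.average",
          "datastore.numberWriteAveraged.average"]

# Pick the ESXi host backing the replica datastore (placeholder name)
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi-dr-01.example.local")

metric_ids = [vim.PerformanceManager.MetricId(counterId=counters[w], instance="*")
              for w in wanted]
spec = vim.PerformanceManager.QuerySpec(entity=host, metricId=metric_ids,
                                        intervalId=20, maxSample=15)

# Each sample is an averaged IOPS figure per 20-second real-time interval;
# the instance field is the datastore UUID, not its friendly name
for result in perf.QueryPerf(querySpec=[spec]):
    for series in result.value:
        print(series.id.instance, series.value)

Disconnect(si)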

My source SAN is humming along at about 4000 IOPS, but that is the production SAN, so there is probably more going on there as well... The source SAN is a Storwize V7000.

I'm running vSphere 5.1 and Veeam v7 with the latest patch.

This is 100% reproducible.
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Replication Destination Datastore performance

Post by foggy »

Have you tried switching to network mode? Am I right that you did not observe this behavior prior to the v7 upgrade? What are the bottleneck stats for this job?
lars@norstat.no
Expert
Posts: 110
Liked: 14 times
Joined: Nov 01, 2011 1:44 pm
Full Name: Lars Skjønberg
Contact:

Re: Replication Destination Datastore performance

Post by lars@norstat.no »

Switching to network mode is not really a solution. I'm using hotadd because I have a 4Gb/s fibre link to the disaster site and only a 2Gb/s network link, and even if I could get a faster network link, I would not run all this traffic on my production LAN. I'm sure the I/O would drop a lot when using the network, but I'm not using that option.

I had the same problem before v7 as well; the reason I started over after the upgrade was that it was even slower than before, down from a 50 MB/s processing rate to about 600 KB/s.

When the system was working right (I think the last time was version 6.1), I saw a processing rate of about 3GB/s on my larger servers on incremental jobs using hotadd.

The bottleneck stats, not surprisingly, show:
Source 4% > Proxy 25% > Network 7% > Target 95%
lars@norstat.no
Expert
Posts: 110
Liked: 14 times
Joined: Nov 01, 2011 1:44 pm
Full Name: Lars Skjønberg
Contact:

Re: Replication Destination Datastore performance

Post by lars@norstat.no »

As you can see from my newest post, I went back to using a physical Veeam server and direct SAN access, so my previous setup has no relevance for me anymore, but hotadd needs to be fixed either by you or VMware because it's obviously broken...
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication Destination Datastore performance

Post by Gostev »

We write data to replica disks through the VMware API, so that's where the issue sits... I wonder if we can maybe bypass it, unless the issue is in the hypervisor itself.
superdekster
Influencer
Posts: 12
Liked: 6 times
Joined: Oct 23, 2013 9:32 am
Full Name: Aleksandr Alpatov
Contact:

Re: Replication Destination Datastore performance

Post by superdekster »

I have the same issue as lars@norstat.no - Case # 00446903. Tech support told us to wait for patch 2 for Veeam 7, as it will use a different VDDK version.
We get 4000 MB/s performance on full replicas and ~30 MB/s on incremental replicas, with ~11,000 IOPS.
I hope R&D will take care of this big problem.
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Replication Destination Datastore performance

Post by Vitaliy S. »

Aleksandr, do you use hotadd or network mode for the target proxy server?
superdekster
Influencer
Posts: 12
Liked: 6 times
Joined: Oct 23, 2013 9:32 am
Full Name: Aleksandr Alpatov
Contact:

Re: Replication Destination Datastore performance

Post by superdekster »

Vitaliy S. wrote:Aleksandr, do you use hotadd or network mode for the target proxy server?
Vitaliy, I used hotadd for my target proxy because we have an 8 Gbps FC SAN infrastructure and need good performance for replicas. But when we identified the hotadd problem with Veeam tech support, I switched the target proxy to network mode. I'm now waiting for patch 2 to solve the problem.
Vitaliy S.
VP, Product Management
Posts: 27055
Liked: 2710 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Replication Destination Datastore performance

Post by Vitaliy S. »

Thanks for the update. Did switching to network mode make your performance any better? Just curious.
superdekster
Influencer
Posts: 12
Liked: 6 times
Joined: Oct 23, 2013 9:32 am
Full Name: Aleksandr Alpatov
Contact:

Re: Replication Destination Datastore performance

Post by superdekster »

Vitaliy S. wrote:Thanks for the update. Did switching to network mode make your performance any better? Just curious.
A little bit, yes. Up to 40-50 MB/s.
lars@norstat.no
Expert
Posts: 110
Liked: 14 times
Joined: Nov 01, 2011 1:44 pm
Full Name: Lars Skjønberg
Contact:

Re: Replication Destination Datastore performance

Post by lars@norstat.no »

I think I found a possible reason why hotadd is not working as it should, at least in my case, although I haven't been able to test it yet.

The vSphere 5 documentation states:

"HotAdd cannot be used if the VMFS block size of the datastore containing the virtual machine folder for the target virtual machine does not match the VMFS block size of the datastore containing the proxy virtual machine. For example, if you back up virtual disk on a datastore with 1MB blocks, the proxy must also be on a datastore with 1MB blocks."

Now, my proxies are stored on the production SAN, whose datastores were created under vSphere 5.1, while the replication target was created before that and then upgraded to the new VMFS version 5.xx.

But there is a difference: the source SAN has version 5.58 and the target SAN has version 5.54. Then I found this:

http://blogs.vmware.com/vsphere/2011/07 ... mfs-5.html

"VMFS-5 upgraded from VMFS-3 continues to use the previous file block size which may be larger than the unified 1MB file block size."
"VMFS-5 upgraded from VMFS-3 continues to use 64KB sub-blocks and not new 8K sub-blocks."

This could explain the unusually high IOPS and poor performance. Something to look into anyway, although I have now switched back to SAN/NBD mode and deleted all my proxies.
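
If anyone wants to check this without clicking through every datastore, here is the quick pyVmomi sketch I would use to compare the VMFS version and block size of the proxy's datastore against the replica target. The datastore names and vCenter details are placeholders for my environment, and this only works for VMFS datastores (NFS datastores have no block size to compare).

Code: Select all

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="user", pwd="***",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.Datastore], True)

def vmfs_details(name):
    # For a VMFS datastore, ds.info exposes the filesystem version
    # and the block size in MB
    ds = next(d for d in view.view if d.name == name)
    return ds.info.vmfs.version, ds.info.vmfs.blockSizeMb

# Placeholder names: the datastore holding the proxy VM and the replica target
for ds_name in ("prod-proxy-datastore", "dr-replica-datastore"):
    version, block_mb = vmfs_details(ds_name)
    print("%s: VMFS %s, block size %s MB" % (ds_name, version, block_mb))

Disconnect(si)

A mismatch in the reported block sizes would trip exactly the HotAdd restriction quoted above.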
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Replication Destination Datastore performance

Post by Gostev »

Hi Lars, thanks for this, I will run this by the devs.
emachabert
Veeam Vanguard
Posts: 388
Liked: 168 times
Joined: Nov 17, 2010 11:42 am
Full Name: Eric Machabert
Location: France
Contact:

Re: Replication Destination Datastore performance

Post by emachabert » 2 people like this post

Hi,

Sharing experience is priceless :-)

I am also working on a replication setup these days (HP 3PAR StoreServ to HP P2000) and I am using two target proxies, one using hotadd and the other one using network mode. The source proxy is using direct SAN access.

Network mode is used on replication jobs running every hour, as you don't lose time adding and removing disks.
HotAdd is used on jobs replicating once a day or 3 times a day (8 am, 12 am, 6 pm), as they have to replicate more data since replication occurs less often (if I remember correctly, the documentation says hotadd is more efficient for that).

I was sceptical about the throughput I was getting with the hotadd proxy. It was 3 times slower than network mode.
Reading this post made me look at the IOPS on the P2000 datastores receiving replicas, and guess what... it was topping out at 12,000 IOPS during replication, killing the array.

So I added a network card to the hotadd proxy, connected to the same network and the same vSwitch as the management port of the ESXi host it was running on and replicating to. Doing this prevents NBD traffic from going out over the physical network.

And... guess what... throughput became 3 times higher.
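
If it helps anyone reproduce this, below is a rough pyVmomi check (my own sketch, nothing official) to confirm that the proxy VM has a vNIC on the same standard vSwitch as one of the host's vmkernel ports, which is what keeps the NBD traffic inside the host. The proxy name and vCenter details are placeholders, and it assumes standard vSwitches rather than distributed switches.

Code: Select all

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local", user="user", pwd="***",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
proxy = next(v for v in view.view if v.name == "veeam-proxy-dr")
host = proxy.runtime.host

# Which vSwitch sits behind each standard portgroup on this host
pg_to_vswitch = {pg.spec.name: pg.spec.vswitchName
                 for pg in host.config.network.portgroup}

# Portgroups carrying a vmkernel port (management traffic uses one of these)
vmk_pgs = {vnic.portgroup for vnic in host.config.network.vnic}
vmk_vswitches = {pg_to_vswitch.get(pg) for pg in vmk_pgs}

# Portgroups the proxy's vNICs are connected to (standard portgroup backing only)
proxy_pgs = [dev.backing.deviceName for dev in proxy.config.hardware.device
             if isinstance(dev, vim.vm.device.VirtualEthernetCard)
             and isinstance(dev.backing,
                            vim.vm.device.VirtualEthernetCard.NetworkBackingInfo)]

for pg in proxy_pgs:
    shares = pg_to_vswitch.get(pg) in vmk_vswitches
    print("proxy NIC on '%s' (vSwitch %s): shares a vmkernel vSwitch = %s"
          % (pg, pg_to_vswitch.get(pg), shares))

Disconnect(si)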

Thanks !
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023
superdekster
Influencer
Posts: 12
Liked: 6 times
Joined: Oct 23, 2013 9:32 am
Full Name: Aleksandr Alpatov
Contact:

Re: Replication Destination Datastore performance

Post by superdekster »

lars@norstat.no wrote:I think I found a possible reason why hotadd is not working as it should, at least in my case, although I haven't been able to test it yet.

The vSphere 5 documentation states:

"HotAdd cannot be used if the VMFS block size of the datastore containing the virtual machine folder for the target virtual machine does not match the VMFS block size of the datastore containing the proxy virtual machine. For example, if you back up virtual disk on a datastore with 1MB blocks, the proxy must also be on a datastore with 1MB blocks."

Now, my proxies are stored on the production SAN, whose datastores were created under vSphere 5.1, while the replication target was created before that and then upgraded to the new VMFS version 5.xx.

But there is a difference: the source SAN has version 5.58 and the target SAN has version 5.54. Then I found this:

http://blogs.vmware.com/vsphere/2011/07 ... mfs-5.html

"VMFS-5 upgraded from VMFS-3 continues to use the previous file block size which may be larger than the unified 1MB file block size."
"VMFS-5 upgraded from VMFS-3 continues to use 64KB sub-blocks and not new 8K sub-blocks."

This could explain the unusually high IOPS and poor performance. Something to look into anyway, although I have now switched back to SAN/NBD mode and deleted all my proxies.
Hi Lars, I checked your interesting idea about different VMFS versions on the source and destination datastores. In my case it didn't help; performance is still poor, sadly.
andrewpetre
Influencer
Posts: 15
Liked: 4 times
Joined: Nov 11, 2013 10:53 pm
Full Name: a p
Contact:

Re: Replication Destination Datastore performance

Post by andrewpetre »

I'm so glad this thread was here. I've been tearing my hair out trying to figure out why my brand new SAS array is so slow. We were getting 5-7 MB/sec processing rates in replication jobs, with IOPS maxed out at around 10k/sec.

Changed the remote target proxy to network mode and now we're measuring 60+ MB/sec processing over the 1 Gb connection.

Current build is 7.0.771, by the way, so this is not fixed in that patch. I don't think that was in doubt, but just for reference.