adapterer
Expert
Posts: 227
Liked: 46 times
Joined: Oct 12, 2015 11:24 pm

Slow Replica Guest Performance With Open Snapshot

Post by adapterer »

Hi,

We are having issues with vSphere replica disk performance and I wanted to see what people's experiences are.

This is most definitely not a Veeam issue. It is strictly related to vSphere and is not seen in Hyper-V, but it does affect the 'experience' of using Veeam failover with vSphere.

Basically, a vSphere VM does not seem to be able to exceed a storage command queue depth of 1 while it has an open snapshot. We have tested this on iSCSI, FCoE and NFS, and all exhibit the same issue. VMware support essentially confirmed the behaviour as normal and consistent with their design expectations of redo log snapshots.

In our case, a client complained of slow Exchange performance, but to prove the point we are using CrystalDiskMark's 4Kq32T1 test ("Random 4KiB Read/Write with multi Queues & Threads").

In this first image we have a normal VM with no open snapshot:

Image

In this case we are getting 35k IOPS. This SAN has RAM and SSD caching, so the results are full of lies, but we don't have any performance issues with no snaps. This storage is NFS, so if we view the ESXTOP storage adapter stats we see the ACTV column reach high numbers, which shows that a good queue depth is being achieved.

Now with a normal vSphere snapshot:

Image

Here, performance tanks to 4500 IOPS. If we view ESXTOP storage adapter stats here, the NFS ACTV column does not go higher than 1.

Now it's easy at this point to say 'your SAN sucks lol', but without vSphere snaps the disk performance is excellent. The long and short of it is that even though this array has 60 disks, as well as RAM and SSD caching, in the worst-case scenario for reads and writes (without cache lies) a queue depth of 1 could reduce us to the performance of a single disk (160-180 IOPS). The problem is that we also need vSphere snaps to enable the undo failover ability.
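To put rough numbers on why queue depth matters so much here, this is a small sketch using Little's Law (throughput = outstanding I/Os / per-I/O latency). The function names are made up for illustration, and the 6 ms uncached latency is an assumption picked to land in the 160-180 IOPS single-disk range mentioned above, not something we measured.

```python
def iops_ceiling(queue_depth: int, latency_ms: float) -> float:
    """Upper bound on IOPS from Little's Law: outstanding I/Os / latency."""
    if queue_depth < 1 or latency_ms <= 0:
        raise ValueError("queue_depth must be >= 1 and latency_ms must be > 0")
    return queue_depth / (latency_ms / 1000.0)


def implied_latency_ms(queue_depth: int, iops: float) -> float:
    """Average per-I/O latency implied by an observed IOPS figure."""
    if queue_depth < 1 or iops <= 0:
        raise ValueError("queue_depth must be >= 1 and iops must be > 0")
    return queue_depth / iops * 1000.0


if __name__ == "__main__":
    # Observed with the snapshot open: about 4500 IOPS while ACTV stays at 1.
    print(f"Implied latency at QD1: {implied_latency_ms(1, 4500):.3f} ms")

    # Assumed worst case: uncached I/O at ~6 ms each (hypothetical figure).
    print(f"Uncached ceiling at QD1:  {iops_ceiling(1, 6.0):.0f} IOPS")
    print(f"Uncached ceiling at QD32: {iops_ceiling(32, 6.0):.0f} IOPS")
```

The point is that at a queue depth of 1 the only lever is per-I/O latency, so the cache is hiding the problem in the 4500 IOPS result. Once I/O misses the cache, the extra spindles cannot help because only one command is ever in flight.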

Has anyone been down this path? Potential workarounds at this point are:

1. Use an all-flash array for 'problem' VM guests
2. Delete snapshots for VMs that are having issues. Again, this destroys undo failover, so I don't like this one.

Re: Slow Replica Guest Performance With Open Snapshot

Post by adapterer »

Case open 02407576