Mutant_Tractor
Novice
Posts: 9
Liked: never
Joined: Jul 29, 2013 3:43 pm
Full Name: Myles Gray
Location: Belfast, NI
Contact:

Slow performance 10GbE - Virt Appliance VM - MegaRAID SSD

Post by Mutant_Tractor »

Hi All,

I'm having trouble getting my head around the "Processing Rate" figure that is displayed on replication jobs.

Take, for example, our Exchange server:

640GB size
267.9GB Transferred (2.3x).
Duration: 1:37:25
Processing Rate: 112MB/s
Bottleneck: Target

Load: Source (23%) -> Proxy (55%) -> Network (5%) -> Target (96%)

The source and target are the SAME SAN.

Spec:
Dell R720XD
LSI 9720-8i MegaRAID with 2x 256GB SSD WriteBack cache in RAID1 (LSI Cachecade v2)
Benchmarked at 800MB/s write and 80K IOPS with VMware's I/O Analyzer vApp.

The data being read comes from the back-end disks (12x 3TB 6Gb/s Seagate Constellations) through the SSD cache (2x Intel 520 256GB 6Gb/s SSDs).
The data being written goes directly to the SSDs and is then committed to the back-end disks; we have 256GB of available SSD cache that is constantly being purged once data is committed to disk.

We have 3x proxies - including the Veeam server itself - each on a separate ESX 5.1 host, with 8 vCPUs and 8-16GB RAM per proxy, all configured in virtual appliance mode and with the appropriate proxies selected for each job.

How is it that our data processing rate is only 112MB/s (on a first-time replication)? With this setup I was expecting 500-600MB/s throughput!

My troubleshooting:
I ran 2x parallel jobs to the SAN and iftop was showing 4Gb/s of network throughput - so that's not the problem - it will scale all the way out to 8Gb/s or so (fiber ICs).
Disk performance (from my benchmarks) isn't a problem.
The proxies are only showing 40-60% CPU usage across all 8 cores and minimal RAM usage.
Jumbo frames are enabled on all ESX hosts, the Dell 8024F switches and the SAN (Intel X520-SR2 10GbE fiber NIC).

Any input very welcome!
Myles
tsightler
VP, Product Management
Posts: 6012
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Slow performance 10GbE - Virt Appliance VM - MegaRAID SSD

Post by tsightler » 2 people like this post

You might want to force network mode for the target proxy and see how this impacts the behavior. Hotadd can create a tremendous amount of I/O load on the target and is regularly the bottleneck for high speed replication. For whatever reason, NBD mode does not cause this same issue in these environments and can potentially lead to higher performance.

When you benchmarked with the I/O Analyzer app, what settings did you use? Did you create a large test disk similar in size to what you are actually using with Veeam? Did you perform read/write I/O concurrently to the storage?

You need to monitor the datastore performance to get an idea of what is going on. Since the bottleneck is showing the target, I'd suspect the write latency is higher than you expect. Also, the VMware I/O Analyzer typically uses an eager zeroed thick disk when testing, which means there is no need to zero blocks prior to allocating them since all blocks are already zeroed. With Veeam, however, this zeroing is performed during the replication as each block is allocated, so 112MB/s requires roughly 2x that many writes on the underlying storage. Even at your 800MB/s benchmark you're already down to 400MB/s for a first-time replication.
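
To put rough numbers on that, here's a quick sketch (Python, purely illustrative; the 2x factor is just the zero-then-write assumption described above):

    # Back-of-envelope numbers for the zeroing overhead on a first-time replica.
    benchmark_write_mb_s = 800      # write figure from the I/O Analyzer benchmark
    zeroing_write_factor = 2        # assumed: zero-fill pass + the actual data write

    effective_first_replica_mb_s = benchmark_write_mb_s / zeroing_write_factor
    print(f"Best case for a first-time replica: ~{effective_first_replica_mb_s:.0f} MB/s")
    # -> ~400 MB/s, before read contention, cache flushing or latency are considered.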
Mutant_Tractor
Novice
Posts: 9
Liked: never
Joined: Jul 29, 2013 3:43 pm
Full Name: Myles Gray
Location: Belfast, NI
Contact:

Re: Slow performance 10GbE - Virt Appliance VM - MegaRAID SSD

Post by Mutant_Tractor »

I will try NBD mode today (though there are no 10Gb interfaces on that subnet).
Just looking at our virtual appliance mode proxies - they all have E1000 NICs. Do these need to be changed to VMXNET3 to allow for 10Gb, or is this a moot point since in virtual appliance mode the data should be going through the ESX iSCSI stack?

I ran a mixture of tests on I/O Analyzer (4K-1M block sizes, 0-100% read, 0-100% write, 0-100% random) against a secondary 150GB test disk; the tests were run on the box alone as a benchmark, so there was no concurrent storage usage.

Correct, the disk was indeed eager zeroed. Is there a way to replicate this behaviour in Veeam?
tsightler
VP, Product Management
Posts: 6012
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Slow performance 10GbE - Virt Appliance VM - MegaRAID SSD

Post by tsightler »

OK, NBD mode probably won't help if you only have 1GbE to the management interfaces. Since your post had 10GbE in the subject, I assumed you had that available throughout the solution; now I'm understanding that it's only between proxies. So in your case above you had two proxies even though this is local replication?

Changing the network settings isn't likely to help since the bottleneck shows as being 99% target. This means that the Veeam process spent 99% of its time waiting on writes to the target storage. Network only shows 5%, which is about right for the traffic flow we're seeing, assuming a 10GbE network. You might try completely disabling compression, since it's mainly just adding CPU overhead on the proxy (55% CPU usage), but I don't think that's likely to help much.
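
As a quick sanity check of that network figure (Python sketch using the job stats from your first post; assumes the 267.9GB transferred was spread evenly over the run):

    transferred_gb = 267.9
    duration_s = 1 * 3600 + 37 * 60 + 25      # 1:37:25 -> 5845 seconds
    link_mb_s = 10_000 / 8                    # 10GbE ~ 1250 MB/s

    avg_wire_mb_s = transferred_gb * 1024 / duration_s
    print(f"~{avg_wire_mb_s:.0f} MB/s on the wire, "
          f"~{avg_wire_mb_s / link_mb_s:.0%} of a 10GbE link")
    # -> ~47 MB/s, ~4% -- in line with the reported 5% network load.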

Your benchmark with a 150GB disk doesn't seem very similar to your actual workload above. Sure, you can get 800MB/s writes if you're writing to an already-zeroed VMDK, from within a VM, and not doing anything else. In the Veeam scenario we are reading from and writing to the same storage, so eventually you will hit the limit of the spindles, since your dataset is larger than the cache. Even if it weren't, it's likely that cache purging will begin to degrade performance long before the cache itself is full, and you have to take the impact of zeroing into account. Typically for a writeback SSD cache, once the writeback area is around 40% full it will begin holding data while it is flushed to disk. If it's a smart storage subsystem it might even perform sequential bypass, since an initial replica is largely sequential reads and writes anyway; caching such data can actually be slower than just committing it to disk, as latency will increase as read/write concurrency goes up on the SSDs.
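
Just to illustrate the cache purging point (Python sketch; the 400MB/s incoming rate and the 150MB/s spindle flush rate are assumptions for illustration, not measurements - only the 256GB cache size and the 40% threshold come from the discussion above):

    cache_gb = 256                 # usable CacheCade capacity (2x 256GB SSD in RAID1)
    flush_threshold = 0.40         # assumed point at which writeback starts throttling
    incoming_mb_s = 400            # hoped-for first-replica write rate (see earlier sketch)
    spindle_flush_mb_s = 150       # assumed sustained commit rate of the 12 spindles

    net_fill_mb_s = incoming_mb_s - spindle_flush_mb_s
    minutes_to_threshold = cache_gb * 1024 * flush_threshold / net_fill_mb_s / 60
    print(f"Cache reaches the flush threshold after ~{minutes_to_threshold:.0f} minutes")
    # -> ~7 minutes; after that the job is pacing itself to the spindles, not the SSDs,
    #    which is why a short synthetic benchmark looks much faster than a 640GB replica.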

To answer the question about replicating the behaviour of eager zeroing: you can do so by manually creating a replica target that uses eager zeroed disks and using it as a replica seed. I'm not really sure this would be useful, as it simply changes when and where the "waiting" happens, and it actually adds an extra step, since Veeam has to read the eager zeroed disk to verify its state before the first replica.
Mutant_Tractor
Novice
Posts: 9
Liked: never
Joined: Jul 29, 2013 3:43 pm
Full Name: Myles Gray
Location: Belfast, NI
Contact:

Re: Slow performance 10GbE - Virt Appliance VM - MegaRAID SSD

Post by Mutant_Tractor »

Correct, there are two proxies - one per ESX host - and both connect to the same datastores (in an effort to distribute load across the proxies).
The 10GbE is the dedicated iSCSI network - only iSCSI traffic is transported over the 10G links.

Typically then, what sort of performance is normal for a VM's first backup?

Our replication job above is now showing a data processing rate of 2GB/s for the hourly replications - that is the rate CBT data is read from disk, correct, and not the data transfer speed?
Is there a way to tell the speed at which data is being transferred through Veeam to the SAN? On the SAN side, using iftop, I can usually see 1.6Gb/s per job, and it scales linearly. That tells me something is up: the SAN is accepting 3 different jobs at the same time (from different sources, of course) at 1.5-1.6Gb/s each - totalling around 4.5Gb/s - and is able to write this to disk, as if each job sending the data can't exceed 1.5Gb/s due to some software limit. Surely Veeam would use all available bandwidth (8-10Gb/s) if it could?
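
For reference, converting those figures (rough Python sketch; assumes iftop is reporting payload throughput and ignores protocol overhead):

    per_job_gbit_s = 1.6
    jobs = 3

    per_job_mb_s = per_job_gbit_s * 1000 / 8
    print(f"~{per_job_mb_s:.0f} MB/s per job, ~{per_job_mb_s * jobs:.0f} MB/s aggregate")
    # -> ~200 MB/s per job, ~600 MB/s aggregate arriving at the SAN.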
tsightler
VP, Product Management
Posts: 6012
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Slow performance 10GbE - Virt Appliance VM - MegaRAID SSD

Post by tsightler »

Processing rate is not a measure of transfer speed. It's a very simple formula, <Size of VM Data>/<Time Job Runs>, so in your example above 640GB/97min = 112MB/s. When you see this same number for incremental jobs it's the same math, 640GB/<Time>, but the time will be much shorter since only changed blocks are transferred, so the "Processing Rate" will be much higher.
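
In numbers (Python sketch using the figures from this thread; the 5-minute incremental run time is just an assumed value to show the effect):

    vm_size_gb = 640
    full_duration_s = 1 * 3600 + 37 * 60 + 25   # first replication: 1:37:25
    incr_duration_s = 5 * 60                    # assumed 5-minute incremental run

    full_rate = vm_size_gb * 1024 / full_duration_s
    incr_rate = vm_size_gb * 1024 / incr_duration_s
    print(f"First run:   ~{full_rate:.0f} MB/s")    # ~112 MB/s, matching the job summary
    print(f"Incremental: ~{incr_rate:.0f} MB/s")    # ~2,185 MB/s from the same formula
    # The ~2GB/s you see on the hourly runs is this same division with a much shorter
    # run time, not 2GB/s of actual data movement.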

You might try disabling compression at the job level (I'm assuming it's enabled, given the significant reduction in data size and the relatively high proxy CPU usage), as that's one of the points that introduces latency and can limit maximum throughput. Since network bandwidth between proxies is obviously not your problem, you might achieve better performance this way.