-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Veeam Replication IO Sizes
Can I just put a general query out to everyone to see if they are seeing the same thing as me?
When doing replication jobs, do you see the IO sizes on the source/target datastores happening as 128KB IOs, regardless of the actual block size setting of the job (e.g. WAN, LAN, Local)?
I'm just looking at esxtop and I can see data being written in what looks to be 128KB IOs, but my replication jobs are set to LAN block size. To get that figure I'm dividing the MBWRTN/s by WRITES/s.
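The arithmetic behind that figure is worth pinning down; a minimal sketch (the sample counters below are illustrative, not actual esxtop output):

```python
# Average write size implied by esxtop throughput counters:
# write throughput (MB/s) divided by write operations per second.
def avg_io_size_kb(mb_written_per_sec, writes_per_sec):
    """Return the average write size in KB implied by esxtop counters."""
    return (mb_written_per_sec * 1024) / writes_per_sec

# e.g. 45 MB/s at 360 writes/s works out to the 128 KB IOs described above
print(avg_io_size_kb(45.0, 360))  # 128.0
```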
-
- VeeaMVP
- Posts: 6139
- Liked: 1932 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Veeam Replication IO Sizes
Hi Nick,
I've never done such tests to be honest, but I guess it's because once Veeam sends data to the ESXi stack, it then writes data following the block size of the underlying VMFS file system, and even if a block created by Veeam is larger, it gets divided into smaller blocks before being written. No different from other file systems when writing blocks that are larger than the cluster size... But again, no direct experience, as this is just how VMFS works anyway, so no real interest in it for me...
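The splitting behaviour hypothesized above can be sketched in a few lines (a purely illustrative model, not actual VMFS behaviour):

```python
# Illustrative sketch: a large block handed to the storage stack being
# divided into smaller fixed-size writes, as hypothesized above.
def split_io(total_kb, max_io_kb):
    """Return the list of IO sizes a single large write would be split into."""
    full, rem = divmod(total_kb, max_io_kb)
    return [max_io_kb] * full + ([rem] if rem else [])

# A 1 MB Veeam block split into 128 KB IOs yields eight writes
print(split_io(1024, 128))  # [128, 128, 128, 128, 128, 128, 128, 128]
```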
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Re: Veeam Replication IO Sizes
Hi Luca,
I'm not 100% convinced that's the case: if I run IOMeter in a Windows VM generating 1MB blocks, I see this get passed down all the way to esxtop. Windows 2008+ splits anything larger into 1MB blocks, but ESX itself should allow anything up to 32MB, I think. I've been doing some more digging, and I think Veeam is submitting the IOs at the correct size, from what I can see in perfmon on the proxy, but this isn't making its way down to the ESX storage. I'm wondering if this could potentially be linked to some sort of sector misalignment with the vmdk mounting driver... I will continue to have a hunt around and report back on what I find.
This is quite an important performance factor, as on a near-idle HP MSA array running the above IOMeter test I get:
128KB blocks = 45MB/s
1MB blocks = 85MB/s
Writing to our Ceph storage cluster reveals an even larger difference between block sizes.
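Those throughput figures imply very different IOPS rates at the two block sizes, which is why the IO size matters so much; a quick calculation using the MSA numbers quoted above:

```python
# IOPS rate implied by a given throughput at a given IO size.
# Figures are the MSA test results quoted above.
def iops_for_throughput(mb_per_sec, io_size_kb):
    """Return the IOPS needed to sustain a throughput at a fixed IO size."""
    return mb_per_sec * 1024 / io_size_kb

print(iops_for_throughput(45, 128))   # 360.0 IOPS for 45 MB/s at 128 KB
print(iops_for_throughput(85, 1024))  # 85.0 IOPS for 85 MB/s at 1 MB
```

In other words, the smaller IOs force the array to service over four times as many operations for roughly half the throughput.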
-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Re: Veeam Replication IO Sizes
Just a quick update: I ran IOMeter directly on one of our 2008 proxies and was seeing IO splitting as described previously. I have just in-place upgraded this VM to 2012 R2 and no longer see this behaviour, so something changed between 2008 and 2012 R2. I will kick off some replication jobs and see if the correct IO sizes are being passed down.
-
- VeeaMVP
- Posts: 6139
- Liked: 1932 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Veeam Replication IO Sizes
Uhm, now it's becoming interesting indeed. Thanks for the updates Nick, eager to see the next results.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Re: Veeam Replication IO Sizes
Replication seems to be going a little faster with the 2012 proxy, but I'm still seeing much smaller IO sizes on the ESX side. I'm now wondering if this is due to the way Veeam handles retention points for replicas by using snapshots. Maybe the average IO size I'm seeing in esxtop is down to the way VMware handles IO when a VM disk has a snapshot. I will try some IOMeter tests on normal VMs with and without snapshots and see if this makes a difference.
-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Re: Veeam Replication IO Sizes
After taking a snapshot on a VM running IOMeter, I see similar results to what I see during replication jobs. So I'm pretty much convinced that snapshots are the cause of the smaller average IO seen in esxtop.
-
- VeeaMVP
- Posts: 6139
- Liked: 1932 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Veeam Replication IO Sizes
Interesting... thanks for sharing. Even if using snapshots is part of the supported way of doing things in VMware, I'm not sure how this can be solved. Maybe VVOLs will be better in the future; they have a different snapshot technology, but I have no details on the advantages they could bring in terms of IO size.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Re: Veeam Replication IO Sizes
Luca, slightly unrelated, but you might find this blog article I wrote interesting as well:
http://www.sys-pro.co.uk/blog/2015/veea ... -patterns/
It plots disk seeks on a graph to show the difference between reverse and forward backup types.
-
- VeeaMVP
- Posts: 6139
- Liked: 1932 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Veeam Replication IO Sizes
Lovely!! Thanks for sharing Nick.
And indeed, in my paper about repository performance, I used direct IO in fio to avoid flushing and get "pure" results, even if there are always multiple caches in between working at different layers. Have you thought about using a "better" fs like XFS or Btrfs? I always use XFS in my deployments, for example... I don't feel confident enough yet to move to Btrfs.
Luca
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Service Provider
- Posts: 131
- Liked: 22 times
- Joined: Nov 21, 2014 10:50 pm
- Full Name: Nick Fisk
- Contact:
Re: Veeam Replication IO Sizes
No problem, hope you found it an interesting read.
I've just read your paper: very comprehensive, a very good read. Can I just check whether the array you tested on had battery-backed write-back cache? The IO numbers, particularly for the forward incrementals, look a little low. If you can set your chunk size so that a Veeam block lands in the same region as a full stripe, I've found you can easily get into the GB/s range. Thanks to the fact that Veeam doesn't seem to request flushes, having large amounts of RAM in Linux massively helps the scheduler get all the data into a sequential pattern; I saw a massive improvement going from 12GB to 128GB of RAM. I'm not sure if Windows buffers as aggressively as this. Merges also go really fast, as hopefully most of the data is still in the page cache.
I did initially set the storage server up with XFS, but every couple of months I was getting soft kernel panics. I never managed to get to the bottom of it before I switched to EXT4, which has been performing without problems since. I probably do need to revisit different filesystems at some point, but it's working so well that I haven't managed to justify the time to look into it. I've been doing a lot of work with Ceph recently, trying to get erasure coding/cache tiering into a usable state, as particularly for Cloud Connect we see this as the best path forward for future expansion. We're currently using it for storing replicas, but due to the high latency Ceph brings, and without a front-end cache, it has a few performance limitations.
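The full-stripe alignment mentioned above comes down to simple arithmetic; a sketch (disk counts and chunk sizes here are illustrative, not from a specific array):

```python
# A RAID full stripe is chunk_size x number of data disks. If the Veeam
# block size is a multiple of the full stripe, writes can avoid the
# read-modify-write penalty. Figures below are illustrative.
def full_stripe_kb(chunk_kb, data_disks):
    """Return the full-stripe size in KB for a RAID set."""
    return chunk_kb * data_disks

def is_full_stripe_aligned(veeam_block_kb, chunk_kb, data_disks):
    """True if the Veeam block covers whole stripes exactly."""
    return veeam_block_kb % full_stripe_kb(chunk_kb, data_disks) == 0

# e.g. 8 data disks with a 128 KB chunk give a 1024 KB full stripe,
# which a 1 MB (Local target) Veeam block covers exactly
print(full_stripe_kb(128, 8))                # 1024
print(is_full_stripe_aligned(1024, 128, 8))  # True
```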
-
- VeeaMVP
- Posts: 6139
- Liked: 1932 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Veeam Replication IO Sizes
I used a NetApp FAS2020, with a vmdk over it acting as a data disk for my repository, so indeed it has proper cache.
I've used XFS a lot and never had kernel panics, to be honest, but I agree that for a repository you need something that makes you confident, so if ext4 is the solution for you, that's fine.
Finally, for the RAM, actually the biggest difference is whether or not you use v8 Update 2 with the new caching system, more than anything else.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1