
Replicate VMDK larger than 2TB with hotadd

Post by Butha »

Hi All,

I have a case open at the moment (#01960859), but thought I'd throw this out on the forum to see if anybody else experiences this.

We replaced a SQL VM with a new one, and the data volume has now grown beyond 2TB - the size of one virtual disk is now 3TB, eager-zero provisioned, on NFS. ESXi 5.5 with updates. We use NetApp NFS with storage integration.

The issue occurs when using hotadd on the target proxy - the writer part of the job seems to "stall". The speed starts off fine (between 60-150MB/s), but when the total processed value hits approximately 2.7TB (the whole VM is 4TB across 5 volumes), the large 3TB disk drops to 2MB/s, or eventually KB/s, and after 18 hours we cancel the job. Sometimes the speed on the 3TB disk starts at 3MB/s - then you already know there is trouble and might as well cancel the job.

The first (or any additional) active full runs perfectly - the problems start with the next incremental.

Many different things have been tried (including upgrading to 9.5, and installing a brand new 9.5 server as well). The only workaround that ensures a successful incremental replication is to change the transport mode to network mode - but that affects performance for other jobs as well.

The usual array of SAN performance benchmarks has been run, but in my opinion they don't prove anything: the SAN is idling, many other jobs run at full speed at the same time, and we never had any issues with the same data set on a 2TB volume.

We are currently working with our DBAs to compress the large tables within the DB to get the size below 2TB (a feature previously only available in SQL Enterprise, but now added to SQL 2016 Standard with SP1). This is a long process, though, and would only buy a little time before the volume has to grow beyond 2TB again (requiring many hours of relocating data). I'm sure many other customers must be replicating >2TB volumes?

My question to the forum is whether anybody else is successfully replicating VMs with >2TB disks using hotadd? (Perhaps also using NFS and storage snapshots.)

Edit:

Just to add - backups of the VM run perfectly with hotadd - it's only replication that has the issue.

B

Re: Replicate VMDK larger than 2TB with hotadd

Post by foggy »

Butha wrote: The issue occurs when using hotadd on the target proxy - the writer part of the job seems to "stall". The speed starts off fine (between 60-150MB/s), but when the total processed value hits approximately 2.7TB, the large 3TB disk drops to 2MB/s, or eventually KB/s, and after 18 hours we cancel the job.
Just to confirm, the primary bottleneck for this job is Target, right?

Re: Replicate VMDK larger than 2TB with hotadd

Post by Butha »

Morning foggy,

I wouldn't say the bottleneck indication is relevant - we have done numerous further tests and changes with an L3 engineer, and a call has also been logged with VMware, as there is no explanation for the current behavior. I'll post an update if anything is resolved.

Re: Replicate VMDK larger than 2TB with hotadd

Post by foggy »

My intent was to confirm that the issue is on the target side; since switching to NBD helps, this is indeed the case. Have you tried using any other proxy VM?

Re: Replicate VMDK larger than 2TB with hotadd

Post by Butha »

Many things have been tried:

- Version upgrade from v9 to v9.5
- Brand new proxies installed (fresh OS, etc.)
- Different VMware storage controllers

At the moment it's with a Veeam 3rd-level engineer, it has been reproduced in the Veeam labs on different storage types/vendors, and an SDK case has been opened with VMware. It "seems" that the behavior is normal, but it would be great to find out how many other clients replicate VMs with >2TB disks (SEsparse format) using hotadd. NBD is just a workaround at the moment, but it requires a dedicated proxy for replicating a single VM - which seems a waste. Also, changing between modes causes a digest recalculation, which will take too long to complete, so we are at risk of not having offsite restore points for one of our most important production VMs.

So still busy with the issue.

B

Re: Replicate VMDK larger than 2TB with hotadd

Post by foggy »

Butha wrote: Also, changing between modes causes a digest recalculation, which will take too long to complete, so we are at risk of not having offsite restore points for one of our most important production VMs.
I'm confused by that statement, since just switching the transport mode doesn't cause digest recalculation.

Re: Replicate VMDK larger than 2TB with hotadd

Post by obroni »

Chipping in, I also experience this, or at least something similar. If I replicate a 2TB+ VM with hotadd it absolutely crawls.
From memory, when I last looked into this, SEsparse forces hotadd to write IO in 4KB chunks, versus 64KB chunks with the old disk format (<2TB). Using NBD with SEsparse returns the behaviour to 64KB IOs, but as you mention, you have to run a separate proxy.

If you view disk IO inside the hotadd proxy VM, you will see IOs being written at the size of the Veeam job block setting, so it's something in the ESXi IO layer (not VMFS, as you get the same over NFS) that causes the IOs to be split. (A quick way to check what the guest itself is issuing is sketched below.)

Ideally I would like the IO from the proxy to be passed all the way down to the storage without getting split, even into 64KB IOs, but I think this is a design choice somewhere in the ESXi snapshot code. If there is anything Veeam can do about this, please please please look into it, as we are getting increasing numbers of these large VMs.

Also, I forgot to mention: when the initial replica copy runs and uses direct NFS, the IO size matches the Veeam job block size and gives awesome performance :-) , so it's something around the snapshot behaviour that breaks performance.

Re: Replicate VMDK larger than 2TB with hotadd

Post by Butha »

I was very glad to see your comment - for months it has sounded like I was the "only client in the world" experiencing this.

Did Veeam/VMware confirm the IO block size behaviour on the ESXi side? It's strange that they would have spent months on troubleshooting and not know about it. At the moment there is an engineering case open with VMware, but it has been absolutely quiet - we are using a dedicated proxy with NBD to get the replication through.

Have you tried tuning the VAAI NFS plugin (NetApp) on the ESXi hosts to use a larger block size?

What I still don't understand, and believe to be a "bug", is this: the performance hit on random IO is one thing - all the tests with smaller VMDKs forced to SEsparse confirm it runs a bit slower (around 40MB/s, less than with <2TB disks) - but the process seems to get stuck, as you say. It "crawls" but never completes, almost like some stall/leak, and when this happens there is no resource constraint at any level - other jobs running from and to the same storage run at full speed, using the same proxy - and that specific "stalled" run never speeds up again. Hence my feeling that it's a bug of sorts between Veeam and the ESXi layer.

I'd be keen to see whether 9.5 U1 helps, or perhaps even ESXi 6.5 (we ran all of this on 5.5).

B

Re: Replicate VMDK larger than 2TB with hotadd

Post by obroni »

I did have a case open with Veeam - I can't find the number now, but from what I remember it was quite heavily led from my side. In the end, the response I got was that this was out of Veeam's control, because it was just how ESXi does snapshots with 2TB+ disks.

I'm using a Linux server for the NFS, so no VAAI for me :-(

If I let the jobs run long enough they would eventually complete, but that would have taken 2-3 days; the speed during the job was around 0.5MB/s for me. The issue is sync write latency: for a single sequential stream of 4KB sync writes, most storage arrays will probably top out somewhere between a few hundred and a few thousand IOPS, which at a 4KB IO size is only a few MB/s (rough numbers in the sketch below).

Unless Veeam can specify some sort of block size when they create the snapshot of the replica, I don't think there is anything else they can do. A hotadd proxy is just a VM and doesn't have any special Veeam components in the way it talks to the ESXi storage layer.

I worry that, with SEsparse becoming the default in ESXi 6.5, this behaviour might affect all VMs, not just those over 2TB in size.

Re: Replicate VMDK larger than 2TB with hotadd

Post by Butha »

Thanks for your input. Yes, my case was a similar situation.

I have not looked at the 6.5 release notes closely yet, but SEsparse as the default would be a big issue. We are close to changing our replication/DR config over to native SnapMirror on the NetApps and doing the "once-off mount" into our DR vCenter manually. The reason is that Veeam was supposed to add replication from storage snapshots in v8, I think, but it has yet to materialize (only backup and backup copy from primary storage snapshots are available) - not because it isn't technically possible, but for other reasons, as far as I understand.

I'll keep this thread updated if anything happens. The challenge, of course, is that any test of a fix requires large amounts of data to be replicated, which takes a while.

B

Re: Replicate VMDK larger than 2TB with hotadd

Post by obroni »

Just had another bit of a flashback while thinking about it. I think it's something to do with the grain size: vmfsSparse uses a 64KB grain (128 sectors) when a snapshot is taken, and SEsparse appears to use 4KB (8 sectors). However, I think I read somewhere that you can specify the grain size for SEsparse when you create a disk or snapshot via the API. If there is a solution, I think it's related to that setting (if it exists) and getting Veeam to harness it (quick sector math below).

If you're not getting anywhere with support, it might be worth throwing a few of these terms into the case and seeing whether Veeam can sort something out with VMware.