VEEAM NetApp Source Bottleneck issues and moving forward

Brandon0830 · Mar 11, 2016 9:48 pm

Hey guys,

I’m a new customer to VEEAM but I’ve done a ton of reading on both VEEAM documentation and here on the forums.

Last November/December, we ran a successful POC for VEEAM in our dev/test/stage environment and ended up purchasing the full VEEAM Availability Suite at the beginning of the year. Since then I’ve been trying to get everything set up for production, but I’ve run into significant challenges and headaches. Many of these problems I didn’t encounter until I added more jobs/VMs into the equation.

My backup jobs have just been going way too slow and I can’t get them to finish within any reasonable backup window. My jobs almost exclusively have the bottleneck listed as Source at 99%, and the processing rate is typically 20 MB/s – 100 MB/s. 100+ is pretty rare, but I’ve seen it when only one job is running, for example.
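To put those rates in perspective, here is a rough back-of-the-envelope sketch of how long one full pass takes at a given processing rate. The 1 TB and 2 TB sizes are illustrative assumptions (roughly the size of the larger SQL servers mentioned later in this post), and the calculation ignores compression, changed block tracking and parallel jobs.

```python
def full_backup_hours(size_tb, rate_mb_s):
    """Hours needed to read size_tb terabytes at rate_mb_s megabytes per second (decimal units)."""
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_s / 3600


if __name__ == "__main__":
    # Sizes are assumed example values, rates are the two ends of the reported range.
    for size_tb in (1, 2):
        for rate in (20, 100):
            hours = full_backup_hours(size_tb, rate)
            print(f"{size_tb} TB at {rate} MB/s: {hours:.1f} h")
```

At 20 MB/s a single 1 TB full takes roughly 14 hours, and even at 100 MB/s it takes close to 3 hours, which is why the backup window is so hard to hit.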

My environment is:

-VMware as the Hypervisor

Production:
-NetApp FAS3250s with VMs on either 10K or 15K RPM SAS disks, depending on the aggregate
-NFS 3.0 Datastores
-10GbE Everywhere

Dev/Test:
-NetApp FAS8020s with VMs on SATA aggregates with Flash Pool
-NFS 3.0 Datastores
-10GbE Everywhere

VEEAM Proxy: Physical Cisco UCS B200 M3 in same Chassis as production server

I originally was trying to set my jobs up to do incremental backups with synthetic fulls, transforming previous backup chains into rollbacks. I quickly learned that my repository, which is a NetApp FAS2040 (using CIFS), wasn’t going to be able to handle that load, especially not with my regular jobs bottlenecking at the source and already running slow. I switched to Active Fulls and it was definitely better, but my jobs are still really slow, still with 99% source bottlenecks. I also tried the periodic health check options and that made things really slow as well (my jobs would pretty much never finish on time if I kept that enabled).

I read VEEAM forums with similar issues and really no end resolution to them:

veeam-backup-replication-f2/netapp-sour ... 27025.html
vmware-vsphere-f24/netapp-backup-perfor ... 26635.html

I even tried switching my VMware datastores to FCoE/VMFS over 10GbE so they could utilize multipathing and ALUA, to see if that helped. Things didn’t change at all, even after I configured the VEEAM proxy for FCoE Direct Storage Access as well (marginal difference, if any).

I also tried creating virtual VEEAM proxies in hot-add mode, one per ESXi host. That didn't help: it used a ton of resources and showed the same 99% source bottleneck and slowness. I also messed around a ton with dedup, compression settings, etc., with no big difference either.

In the end, I suppose I’m just hitting limitations of throughput on my disks...although my production servers don’t have any apparent issues. I set up NetApp Harvest and the throughput it showed was fairly consistent with the processing rates I was getting from VEEAM considering the multiple running jobs, etc.

So where I’m at now is figuring out how to still try to make this product work for us. Luckily since VEEAM integrates with NetApp snapshots, I’m going to use Snapshots to back up my Dev/Test environment and I’ll just use VEEAM for the restore/management of those snaps. Then I’ll use VEEAM backup jobs strictly for production.

I really just need two types of jobs:

31 Restore Points – I need a month
7 Restore Points – I need a week

Where I’m struggling is deciding how often to take my Active Full backups. I’ve read a ton about people just using Forever Incremental, and given my performance issues that would certainly be the best scenario for me, as long as it won’t cause me and the DBAs issues with either corrupted backups or really long restore times.

I could use some suggestions on whether to set up my jobs like this:

7 Restore Points – Forever Incremental
31 Restore Points – Forever Incremental

OR

7 Restore Points – Incremental with weekly Active Full’s on Saturday
31 Restore Points – Incremental with Active Fulls on the first Saturday of the month (maybe something else?)

Are there major downsides to having a chain of incremental backups this long, 31 days? I've read about long chains causing issues, but I've never seen what number counts as long. Some of the VMs being backed up with 31 days of retention will be 1-2TB SQL servers that will use the 15-minute transaction log backup for point-in-time restores. So I could use some suggestions on this. Also, am I putting myself at risk by not running the periodic health checks? They just take forever to run and I'm not sure my jobs will ever finish if I turn them on.
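The two options above can be compared with a small simulation. This is a simplified model, not Veeam's actual retention logic: it assumes one restore point per day and that, with periodic active fulls, an older chain is only removed once the newer points alone satisfy the retention setting. The function name and the 28-day interval standing in for "first Saturday of the month" are assumptions for illustration.

```python
def simulate_periodic_full(retention, full_every, days=365):
    """Return (max restore points on disk, max full backups on disk).

    Simplified model: one restore point per day, an active full every
    `full_every` days, and the oldest chain is dropped only when the
    remaining points still number at least `retention`.
    """
    chains = []  # each entry is the number of points in one chain (1 full + increments)
    max_points = 0
    max_fulls = 0
    for day in range(days):
        if day % full_every == 0:
            chains.append(1)   # active full starts a new chain
        else:
            chains[-1] += 1    # increment extends the current chain
        while len(chains) > 1 and sum(chains) - chains[0] >= retention:
            chains.pop(0)      # old chain is no longer needed for retention
        max_points = max(max_points, sum(chains))
        max_fulls = max(max_fulls, len(chains))
    return max_points, max_fulls


if __name__ == "__main__":
    for retention, full_every in ((7, 7), (31, 28)):
        points, fulls = simulate_periodic_full(retention, full_every)
        print(f"retention {retention}, full every {full_every} days: "
              f"up to {points} points and {fulls} fulls on disk, "
              f"restore reads at most {full_every} files")
        # Forever incremental in the same model: 1 full + (retention - 1) increments.
        print(f"retention {retention}, forever incremental: "
              f"{retention} points and 1 full on disk, "
              f"restore reads at most {retention} files")
```

Under these assumptions the trade-off is repository space against chain length: periodic fulls keep the restore chain short but can hold two or three fulls on disk at once, while forever incremental keeps a single full with a chain as long as the retention setting.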

Lastly, I need a backup copy job to get certain jobs/VM’s to my DR site where I can keep 7 Restore Points and 3 Months of Weekly backups. For this, I was considering a Backup Copy Job with 7 Restore points and using the “Keeping the following restore points for archival purposes” option set to 14 Weekly backups running every 24 hours. Thoughts?

Thanks for reading. If anyone has any insight on my NetApp source problems as well, please feel free to chime in with suggestions. I've been through a ton of headaches in the last month or so.

tsightler · Mar 11, 2016 10:42 pm

Lots of information there but I'm still going to start with a few questions. Is all of this with v9? Are you using per-VM chains and Direct NFS/Backup from Storage Snapshot features?

Brandon0830 · Mar 11, 2016 11:01 pm

Yes, I'm running VEEAM 9.0 (9.0.0.902). I believe I tried the per-VM chains setting on the repository at one point in testing but like I said the source is usually the overall bottleneck. I actually couldn't get the DirectNFS working correctly to the Windows proxy but with the limitations like not being able to process VM's with VMware tools quiescence enabled I don't think that was going to be a viable solution anyways.

tsightler · Mar 12, 2016 1:03 am

Brandon0830 wrote: I believe I tried the per-VM chains setting on the repository at one point in testing but like I said the source is usually the overall bottleneck.

Per-VM can still lead to faster backups, even if source is the bottleneck, mostly because it will almost always have a positive impact on merge performance, which is another area where you were having some concern.

Brandon0830 wrote: I actually couldn't get the DirectNFS working correctly to the Windows proxy but with the limitations like not being able to process VM's with VMware tools quiescence enabled I don't think that was going to be a viable solution anyways.

Are you forced to use VMware Tools quiescence? Normally it would be far better to use Veeam's application-aware processing instead, which doesn't have any such limitations.

foggy · Mar 21, 2016 11:30 am

Brandon0830 wrote: Are there major downsides to having a chain of incremental backups this long, 31 days? I've read about long chains causing issues, but I've never seen what number counts as long.

Here's the thread that will give you some insight on this.

Brandon0830 wrote: Lastly, I need a backup copy job to get certain jobs/VM’s to my DR site where I can keep 7 Restore Points and 3 Months of Weekly backups. For this, I was considering a Backup Copy Job with 7 Restore points and using the “Keeping the following restore points for archival purposes” option set to 14 Weekly backups running every 24 hours. Thoughts?

Looks like this configuration meets your requirements.
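A quick sanity check of the 14 weekly points, as plain calendar arithmetic. It assumes the weekly archive points are spaced exactly 7 days apart, which is a simplification; the function name is hypothetical and not part of any Veeam tooling.

```python
def oldest_weekly_age_days(weekly_points):
    """Age in days of the oldest weekly point right after a new weekly point is created,
    assuming the points are exactly 7 days apart."""
    return (weekly_points - 1) * 7


if __name__ == "__main__":
    for n in (12, 13, 14):
        print(f"{n} weekly points reach back {oldest_weekly_age_days(n)} days")
```

Under that assumption, 14 weekly points reach back 91 days (13 weeks), which is about three months, while 12 would reach back only 77 days.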

Didi7 · Mar 22, 2016 10:51 am

Brandon0830 wrote: If anyone has any insight on my NetApp source problems as well, please feel free to chime in with suggestions. I've been through a ton of headaches in the last month or so.

I can tell you that I am using a NetApp FAS2040 as the storage subsystem for one of our VMware clusters (direct FC connect) and I had lots of headaches as well, because compared to another VMware cluster using an HP MSA2040, the performance is really poor.

The VMware cluster with the NetApp FAS2040 is backed up with VBR 9.0.0.902; the VMware cluster with the HP MSA2040 is backed up with VBR 8.0.0.2048! Both VBR servers have direct-attached storage for their backup repositories, which is fast enough to reach transfer speeds far beyond what's possible with both storage types.

Although the bottleneck is still Source at 99% (NetApp FAS2040), I can achieve around 75 MB/s data transfer speed using Hot-Add transport mode, backing up VMs that sit on a RAID-DP aggregate with 15K RPM SAS disks and without deduplication. Compared to what VBR gets in Hot-Add transport mode from the HP MSA2040, this is really poor performance, as I can reach much, much faster transfer rates with the HP MSA2040.

After reading so many threads here concerning NetApp storage systems, there are only two possibilities, imo. Either NetApp storage systems are that lame, or there is something that limits the speed VBR can reach on that particular storage type or manufacturer. Whatever it might be!

At first I thought it was the old entry-level NetApp FAS2040 with 7-Mode technology that was responsible for the poor performance. Then I read your post here and I thought, are the modern or more powerful NetApp storage systems that lame as well?

Horrible!

Didi7 · Mar 22, 2016 11:28 am

Btw, since Veeam implemented storage snapshots with NetApp storage systems, I really wonder whether Veeam itself never had any performance issues with NetApp storage when they experimented with it. There are so many threads here complaining about transfer speed with NetApp storage that it really sounds strange.

The funny thing about NetApp storage is that a lot of people claim the old storage system (HP, EMC or whatever) that was replaced by NetApp was much faster than the new NetApp one, and they feel like they have been thrown into the past.

MichaelCade · Jun 04, 2016 11:20 am

Sorry for the delay in replying here. I would be very interested in assisting here from a Veeam and NetApp perspective. I have a large number of UK enterprise Veeam customers using Veeam Enterprise Plus together with NetApp storage systems.

The speeds mentioned above are a little concerning for a 10GbE network to start with; even our NBD approach should be faster over 10GbE.

Regarding the comparison between an MSA and a NetApp FAS, there is no hidden configuration that NetApp changes to hinder performance. Again, I would be more than happy to assist here and make sure the configuration is correct for optimal performance.

Finally, another useful piece of information: have you opened a Veeam support case? Our support team is, I believe, the best in the industry, with knowledge of many storage vendors as well as deep knowledge of virtualization.
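As a rough illustration of why those speeds look low, the sketch below expresses a processing rate as a share of the raw 10 Gbit/s line rate. It uses decimal units and ignores protocol overhead, so the percentages are approximate; the function name is made up for this example.

```python
def link_utilization_pct(rate_mb_s, link_gbit_s=10):
    """Share of the raw line rate used by a transfer, in percent."""
    line_rate_mb_s = link_gbit_s * 1000 / 8  # 10 Gbit/s is 1250 MB/s before overhead
    return rate_mb_s / line_rate_mb_s * 100


if __name__ == "__main__":
    # Rates reported earlier in this thread.
    for rate in (20, 75, 100):
        print(f"{rate} MB/s is {link_utilization_pct(rate):.1f}% of a 10 Gbit/s link")
```

Even the best-case 100 MB/s uses only about 8% of the link, so the network itself is unlikely to be the limit unless traffic is taking a slower path than expected.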

I look forward to engaging with you guys on these issues.


Brandon0830 · Jun 06, 2016 1:52 am

Thanks Michael,

I've been able to get it "tolerable" with my environment but nothing is speedy, that's for sure. I was able to get DirectNFS working and I ended up switching to Reversed Incremental to my FAS 2040 VEEAM Repository. It just moves the 99% source bottleneck to 99% at the Target and obliterates the 2040 during my backup window but I just have to deal with it. The good news is I'm dropping NetApp sometime next year so hopefully I won't have to suffer with it too much longer.

Thanks,

Brandon

MichaelCade · Jun 06, 2016 6:28 am

Thanks for coming back Brandon, did you have a case open with both NetApp and Veeam?

Would be interested in that case number from both sides so I could get a good idea as to what has been checked. As I said I have some fairly large customers using a similar configuration to yours and they really have no problem at all.


lightsout · Jun 06, 2016 1:02 pm

I've had cases open with both, but without much luck. In the end I had both closed due to lack of time on my part and not much progress either. The performance isn't what I'd want or expect, but I can live with it.

I was actually considering that Veeam is not the problem but is just exposing it. So I wanted to run some I/O benchmarks to see how the system was performing overall. Once again, I haven't gotten around to that either!

MichaelCade · Jun 06, 2016 3:23 pm

Could you let me know the case numbers please? I have access to obviously Veeam tech support but also some technical contacts in NetApp support.

lightsout · Jun 06, 2016 4:03 pm

OK, seems I was wrong: I posted on this forum but never submitted a case with Veeam directly. I PM'd you the NetApp case number.

I will say I've since switched to Enterprise Plus licenses, and using SAN snapshots I get the same performance.

Brandon0830 · Jun 06, 2016 7:01 pm

Lightsout,

For me, using SAN snapshots is exactly what I wanted to avoid with VEEAM. I have tons of storage on another array away from my production workload SAN that I want to use for my backup storage repository. I really didn't want to depend on Primary/Vault snapshots taken that often for this purpose. The other problem is that then ties you into NetApp which I want to move away from anyways.

Thanks,

Brandon

lightsout · Jun 06, 2016 7:44 pm

I'm using the SAN snapshot feature during backup, so the backup relies on snapshots at the SAN level rather than on VMware snapshots, as that is more efficient. I was just seeing if that made a difference. It didn't!

Brandon0830 · Jun 08, 2016 2:52 am

Yeah, that just doesn't make any sense. You're almost better off just completely utilizing Snapshots/Vaulting/Mirroring and just using VEEAM as a restore tool (which it does really well with the integration). That's what I'm doing for my Dev/Test environment.

plandata_at · Jun 08, 2016 2:31 pm

Hey!

We have the same issue here. Using NetApp as NFS storage for VMware, performance is always good, but when NetApp is used with Veeam as a backup source, speed drops dramatically, to between 75 MB/s and a maximum of 300 MB/s.
We are using direct storage access from the Veeam backup proxy with NFS over 10G.
I would be glad for any tips on how to solve this issue.

orb · Jun 08, 2016 3:52 pm

Brandon,

Your numbers don't add up, even though I don't really trust Veeam's processing numbers.

Can you summarise how your different NetApps are organised (RAID group size, disks per aggregate)?
Did you run disk statistics on each NetApp while a backup is running? They should give you CPU usage and, more importantly, disk usage.

Do you have the NetApp vCenter integration? Did you let the tool validate the iSCSI/NFS settings?
Stay away from a CIFS repository; use iSCSI so you can use as many paths as you can.

Use multiple repositories and scale them out to ensure all paths are busy.
I bet your FAS2040 has GBit links (2 or 4). I managed to get 160 MB/s with 24 SATA disks in one big aggregate and multiple sessions. It is about 450 GB/h if I am right (in active full mode, no synthetic).
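For anyone comparing per-second and per-hour figures, a minimal conversion sketch follows. It reads the 160 figure as megabytes per second (an assumption about the units) and uses decimal units throughout; the helper names are invented for this example.

```python
def mb_s_to_gb_h(rate_mb_s):
    """Convert megabytes per second to gigabytes per hour (decimal units)."""
    return rate_mb_s * 3600 / 1000


def gb_h_to_mb_s(rate_gb_h):
    """Convert gigabytes per hour to megabytes per second (decimal units)."""
    return rate_gb_h * 1000 / 3600


if __name__ == "__main__":
    print(f"160 MB/s sustained is {mb_s_to_gb_h(160):.0f} GB/h")
    print(f"450 GB/h averages {gb_h_to_mb_s(450):.0f} MB/s")
```

A sustained 160 MB/s would come to about 576 GB/h, while 450 GB/h corresponds to an average of about 125 MB/s, so the two figures are consistent if 160 MB/s was a peak rather than the average over the whole job.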

This PDF should be your bible, like any other paper written by Luca.
https://www.veeam.com/wp-veeam-backup-r ... mance.html

Oli

plandata_at · Jun 14, 2016 1:48 pm

Hi! I have done some more investigation on the source bottleneck and found out the following:
a) Our backup storage is using 8TB 10k SATA disks, so IOPS are of course limited by the disk size. I collected statistics with sysstat on the NetApp, and the disks are about 70-80% utilized when a backup is running. So this is of course a "natural" limitation.

BUT:
b) I talked to a friend who works a lot with NetApp, and he told me that NetApp restricts traffic for every thread accessing the storage, so that one thread is not able to completely block storage access.

So I tried running three jobs at the same time, instead of one, with a NetApp NFS volume as the backup source --> Veeam's processing rate increased about 2.5 - 3x!
So maybe try running several jobs at the same time, or maybe (FEATURE REQUEST) Veeam could talk with NetApp and improve this by creating parallel threads for different VMs in one job?
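The single-stream versus multi-stream effect can be checked outside of Veeam with a small read benchmark. The sketch below is only a generic illustration, not a Veeam or NetApp tool: it reads a set of files with a configurable number of parallel threads and reports the aggregate rate. The helper names, the 1 MiB read size and the temporary test files are all assumptions; for a meaningful result, point it at large existing files on the NFS mount, since freshly written test files are usually served from the OS cache.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # read size per call, 1 MiB (arbitrary choice)


def read_file(path):
    """Read one file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total


def measure(paths, streams):
    """Read all paths using `streams` parallel threads; return (bytes, MB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total, total / elapsed / 1e6


def make_test_files(directory, count, size_mb):
    """Create `count` files of `size_mb` MiB of random data for a dry run."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"stream_{i}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size_mb * 1024 * 1024))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Dry run on local temporary files; replace `paths` with files on the datastore.
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_test_files(tmp, count=3, size_mb=8)
        for streams in (1, 3):
            total, rate = measure(paths, streams)
            print(f"{streams} stream(s): {total} bytes at {rate:.0f} MB/s")
```

If the aggregate rate scales with the number of streams on the real datastore, that would support the idea that a per-stream limit, rather than raw disk throughput, is what a single job is hitting.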

plandata_at · Jun 14, 2016 1:59 pm

Brandon0830 wrote: My jobs almost exclusively have the bottleneck listed as Source at 99%, and the processing rate is typically 20 MB/s – 100 MB/s. 100+ is pretty rare, but I’ve seen it when only one job is running, for example.
Production:
-NetApp FAS3250s with VMs on either 10K or 15K RPM SAS disks, depending on the aggregate
-NFS 3.0 Datastores
-10GbE Everywhere

Dev/Test:
-NetApp FAS8020s with VMs on SATA aggregates with Flash Pool
-NFS 3.0 Datastores
-10GbE Everywhere

Hi Brandon!

I had overlooked your numbers until now. We have been working with NetApps for almost 8 years. I get bigger numbers than you on small NetApps with only 12 x 10K disks in a RAID-DP aggregate without any Flash Pool, also using Direct NFS.
If you really have 10G everywhere, there must be some misconfiguration somewhere; your numbers are just awful! Maybe review your configuration and check whether the 10G network interface is really used by the backup (on the NetApp look with sysstat, and on your Veeam proxy look with perfmon...). Also look at your RAID group sizes: how many RAID groups are used?
And if you have dual controllers, check whether you access the NFS volume through the controller where the aggregate is active, or through the other one, so that you are using the internal connection. Take a deeper look at sysstat and the NetApp processors, and check whether you have compression activated, full dedup runs going during backup, any QoS policies accidentally defined, and so on...
