Veeam 8 Performance Issue


Re: Veeam 8 Performance Issue

by Jack1874 » Mon Oct 19, 2015 7:39 pm

Hi chjones, how are your jobs configured?

I have ...
App_Group01 ---> using Proxy1 ---> using NFSShare01 (Limited to 4)
App_Group02 ---> using Proxy1 ---> using NFSShare02 (Limited to 4)
App_Group03 ---> using Proxy1 ---> using NFSShare03 (Limited to 4)

Doing it this way I'm now seeing between 100 Mb/s and 200 Mb/s

Do you have one big share... Limited to 12?
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by tsightler » Mon Oct 19, 2015 8:01 pm

Jack1874 wrote: We have seen significant increases in performance when changing from CIFS to NFS. The StoreOnce is running Software Revision 3.13.0-1529.2.

It doesn't completely surprise me, as I've seen plenty of issues with CIFS performance and StoreOnce; however, in previous cases we've eventually been able to resolve them and get good performance even with CIFS. Because using the Windows NFS client is not a recommended practice, it's not something we test, so we don't have a baseline for what reasonable performance looks like.

However, I'm focused on the network bottleneck because that indicates Veeam is not efficiently transferring data from the source to the target data mover. In your case both run on the same machine, so this transfer should be very fast using shared memory, and I can see in your job logs that it is doing just that. I'm at a loss to explain the high value for this bottleneck at this point in time. I'm almost wondering if it's some type of NUMA issue where the source and target data movers are running on different nodes, but that's kind of an "out there" hypothesis at the moment. Do you have any throttling configured in the "Network Traffic" settings, and is the multiple upload streams setting still configured for "5"?

I'd really like to see bottleneck stats from a single job running at a time.
tsightler
Veeam Software
 
Posts: 4768
Liked: 1737 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Veeam 8 Performance Issue

by Jack1874 » Tue Oct 20, 2015 2:52 pm

Hi tsightler, can I upload some logs for you to look at?
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by Jack1874 » Tue Oct 20, 2015 3:32 pm

I just got this from Veeam Support. Are they suggesting that we run one job per proxy... backing up one guest (VMDK) at a time ...

The best practices have changed with Version 8 in regards to HP devices. Therefore, if possible, here is what I would like to try:

1. Remove the HP device, and re-add it, but when we re-add it, we want to add it as an HP device, and use CIFS paths, and not NFS shares.

2. In the repository settings, when re-adding the HP device, set Limit Concurrent Tasks to 1.

3. If the job proxy setting is set to "Automatic Selection," we need to change that, and specify one specific proxy.

4. In the proxy settings for the specified proxy, (Backup Infrastructure > Backup Proxies > Right click > Properties) we need to set max concurrent tasks to 1.

5. Then, go to the Blue File Menu, and go to Options, and disable parallel processing.
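
For what it's worth, steps 2 and 4 can also be scripted rather than clicked through the UI. This is only a sketch from memory against the v8 PowerShell snap-in (the repository and proxy names are made up, and the parameter names should be checked against the Veeam PowerShell reference before use):

    # Load the Veeam snap-in on the backup server
    Add-PSSnapin VeeamPSSnapin

    # Step 2: limit the repository to a single concurrent task
    $repo = Get-VBRBackupRepository -Name "StoreOnce-CIFS"
    Set-VBRBackupRepository -Repository $repo -LimitConcurrentJobs -MaxConcurrentJobs 1

    # Step 4: limit the chosen proxy to a single concurrent task
    $proxy = Get-VBRViProxy -Name "Proxy1"
    Set-VBRViProxy -Proxy $proxy -MaxTasks 1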
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by chjones » Tue Oct 20, 2015 9:14 pm

Hi Jack,

The setup of my jobs is:

App Group 1 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
App Group 2 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
App Group 3 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
And so on ...

My two proxies are HP BL460c Gen8 servers with 2 x 8-core Intel Xeon, 128GB RAM, HP FlexFabric LOM (2 x 10GbE uplinks in a Microsoft switch-independent team) and an HP QMH2572 (dual-port 8Gb Fibre HBA for connection to the 3PAR), running Windows Server 2012 R2. The proxies can handle 16 concurrent tasks each, and we leave it set at that (everything bottlenecks to the lowest concurrency of your proxies and your repository, so the 16 never gets reached anyway).

We added the StoreOnce directly to Veeam as a CIFS share. I did ask on these forums whether I should remove the repository and re-add it as an HP StoreOnce in version 8, and Veeam replied (I believe it was Anton Gostev) that in version 8 adding a StoreOnce as an HP device doesn't really change anything, it just sets the default settings (in v9 this will change so you can add a Catalyst Store as a repository), so we have left the repository as a CIFS share. There was no benefit to re-adding it, as it doesn't change how Veeam interacts with it.

I have the concurrency on the CIFS repository set to 12, so that is the concurrency limit the proxies will not go over.

I just checked my StoreOnces and I too am now using StoreOnce OS 3.13.0 (we upgraded a week or so ago).

All of my jobs start at the same time, and I often see VMs in jobs saying they are waiting for resource availability. This is due to the max concurrency of the StoreOnce being set to 12 while the two proxies combined are able to process 32 disks. The 3PAR storage snapshot has already been taken, so it's not as if this is placing an unnecessary load on the ESXi clusters, and I don't worry about seeing it. The VMs will process in due time. Once that 3PAR snapshot is done I know I have a point-in-time snap of the VMs, so even if they weren't processed for 12 hours or a day I'm still backing up the VM from the point in time the 3PAR snapshot was created.

I can, sort of, understand limiting the concurrency on the StoreOnce to 1. This is an HP recommendation for a specific reason: to get the best possible dedupe, HP recommend limiting the inbound data stream to a StoreOnce to single objects. I'm not quite sure how this technically makes the dedupe any better, other than that the StoreOnce receives one full block of data, stores it, and then moves on to the next one. With multiple streams it could receive parts of one stream and parts of another, and whilst they may be identical it hasn't received the full block of data yet, so it may not match that data. However, in my testing I have not seen an overall increase in performance to the StoreOnce by setting the concurrency to 1. All I saw was a similar overall time to back everything up, just with the VMs sitting there saying "waiting for infrastructure availability" for longer.

If you have the StoreOnce concurrency set to 1 then there is no benefit to selecting only one proxy: even if you had ten proxies, only one of them would be sending data at a time. I like having multiple proxies for the redundancy; I don't need to edit any jobs if one proxy fails. I also split my proxies across blade enclosures so I can still back up if I lose an entire blade chassis.

In terms of networking, we have the following:

HP StoreOnce - 2 x 10GbE in bonding mode 4 (LACP Ether-Channel/Port-Channel) - this means the StoreOnce can process up to 2 simultaneous data streams, both at 10Gb/sec. There is no guarantee this will always happen, as you are reliant on the load-balancing policy to distribute across both NICs using the source MAC hash, but it's the best we can achieve with the limited network control on a StoreOnce.

HP Blade Enclosure - 2 x HP FlexFabric 10Gb/24-Port Modules, each module with 2 x 10GbE uplinks, so 4 x 10GbE uplinks per enclosure. These are also set up in port-channels on the Cisco switch (the two 10GbE ports in FlexFabric Module 1 are in Port-Channel A, and the two 10GbE ports in FlexFabric Module 2 are in Port-Channel B). We also have two shared uplink sets, one for FlexFabric Module 1 and another for Module 2. Every VLAN is added to both shared uplink sets. This is the only way HP support an Active/Active configuration for Virtual Connect modules. If you don't do this you end up in an Active/Standby configuration, and half of your uplinks out of the enclosures are essentially useless unless you lose a Virtual Connect module. All traffic into the standby module is passed from that module over to the active module and then out to the network, which is not what I personally want.

On the blade servers that act as the proxies, their Virtual Connect Profiles are setup as follows:

NIC 1 ---> Force same vlan mapping as Shared Uplink Set 1 ---> Network from Shared Uplink Set 1
NIC 2 ---> Force same vlan mapping as Shared Uplink Set 2 ---> Network from Shared Uplink Set 2

Within Windows 2012 R2 we then create a Network Team using both 10GbE network adapters and set the load balancing mode to switch independent (this is because the blade servers cannot establish a port-channel with the virtual connects or the upstream cisco switch, the virtual connects create that port-channel).
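
If it helps, this is roughly how we create that team on the proxies with the built-in Windows Server 2012 R2 LBFO cmdlets (the adapter and team names are placeholders for whatever your LOM ports are called, and the load-balancing algorithm shown is just the 2012 R2 default):

    # Switch-independent team across the two 10GbE LOM ports
    New-NetLbfoTeam -Name "ProxyTeam" -TeamMembers "NIC1","NIC2" `
                    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic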

The only downside we have to this configuration is that if one of the Virtual Connects is offline or fails, the blade servers report a loss of one of their NICs, and that NIC cannot send its traffic via the other Virtual Connect module. But I see this as low risk, or even no risk at all, because if a Virtual Connect goes down in an Active/Standby configuration you still only get the throughput of one module anyway, and even when everything is happy and online you still only get that. Maximum performance whilst still having redundancy is what I go for.

Hope that helps ... sorry for the wall of text. It's kinda hard to explain this setup quickly :D

The last point I wanted to make is that all of my backup jobs use the forward incremental processing mode and we perform an Active Full every Friday. We have approximately 25TB of VMs that we back up with the above setup, and we achieve this quite easily inside 4-5 hours each night for incremental job runs, and usually within 14 hours for a full backup every Friday night. The only exception is our 6.4TB Exchange 2013 mailbox server VM, which takes about 20-24 hours to complete a full backup, although some of that time is usually spent waiting for resource availability due to the concurrency limits. We have no concerns with this.
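
As a back-of-the-envelope check, 25 TB read in roughly 14 hours works out to about 25,000,000 MB / 50,400 s, or around 500 MB/s of aggregate processing rate into the StoreOnce during a full, spread across however many jobs are running at the time.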

We also have the exact same setup at another datacentre (we try to ensure our datacentres are replicas of each other) and we have the exact same experience at this other site as well so it's not something we've fluked, the setup so far appears solid.
chjones
Enthusiast
 
Posts: 83
Liked: 25 times
Joined: Tue Oct 30, 2012 7:53 pm
Full Name: Chris Jones

Re: Veeam 8 Performance Issue

by tsightler » Wed Oct 21, 2015 12:42 am

Jack1874 wrote:Hi tsightler, can I upload some logs for you to look at?

I have logs from your support case and I can see the behavior you are describing quite clearly, but I'm at a loss as to why. If it's OK I'd like to reach out to you in a PM. I think support is simply offering some suggestions that may help narrow down the possibilities.
tsightler
Veeam Software
 
Posts: 4768
Liked: 1737 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Veeam 8 Performance Issue

by Jack1874 » Thu Oct 22, 2015 5:50 pm

Sure.. I sent you a PM
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by Jack1874 » Fri Oct 23, 2015 2:04 pm

Thanks chjones.. that's great info you provided.

I ran some jobs overnight ...

App_Group01 ---> using Proxy1 ---> using NFSShare01 (Limited to 1)
App_Group02 ---> using Proxy1 ---> using NFSShare02 (Limited to 1)
App_Group03 ---> using Proxy1 ---> using NFSShare03 (Limited to 1)
App_Group04 ---> using Proxy1 ---> using NFSShare04 (Limited to 1)
Max Concurrent Tasks = 16

DB_Group01 ---> using Proxy2 ---> using NFSShare05 (Limited to 1)
DB_Group02 ---> using Proxy2 ---> using NFSShare06 (Limited to 1)
DB_Group03 ---> using Proxy2 ---> using NFSShare07 (Limited to 1)
DB_Group04 ---> using Proxy2 ---> using NFSShare08 (Limited to 1)
Max Concurrent Tasks = 112

Our speeds vary from 14 MB/s on some disks .. right through to 250 MB/s

And the bottleneck is consistently saying source... at between 90 - 99 %

Do you have any kind of multi-pathing enabled for the 3PAR?
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by emachabert » Fri Oct 23, 2015 5:17 pm

With 3PAR you should be using round-robin multipathing. Under Windows, use MS MPIO (look at the HP doc for the claim rule).
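
From memory, the Windows Server 2012 R2 side of that looks something like the following (double-check the claim string against the HP 3PAR Windows implementation guide before relying on it):

    # Enable the MPIO feature (requires a reboot)
    Enable-WindowsOptionalFeature -Online -FeatureName MultiPathIO

    # Add the 3PAR claim rule and make Round Robin the default policy
    New-MSDSMSupportedHW -VendorId "3PARdata" -ProductId "VV"
    Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR

    # Legacy equivalent of the claim rule (also schedules a reboot):
    # mpclaim.exe -r -i -d "3PARdataVV"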

Chris gave a very good, complete and clear description of the best practices when dealing with HP FlexFabric and HP 3PAR.

Jack, what StoreOnce model are you using?
Veeamizing your IT since 2009/ Vanguard 2015,2016,2017
emachabert
Veeam Vanguard
 
Posts: 354
Liked: 163 times
Joined: Wed Nov 17, 2010 11:42 am
Location: France
Full Name: Eric Machabert

Re: Veeam 8 Performance Issue

by chjones » Sun Oct 25, 2015 11:08 pm

Eric is correct, Round Robin is the way to go. On the Windows blades for the proxies we have Microsoft MPIO installed plus the claim rule Eric mentioned, and that's about all you need to do. 3PAR volumes are then automatically set to Round Robin. If you create a new vSphere datastore, you present it to the proxies as well, it will automatically be set to Round Robin on the proxy, and you can just forget about it.

As for the speeds, I wouldn't worry about them too much. When I am doing daily incrementals the speeds can appear very low, but that is because you are copying bits and pieces rather than running one large file copy from end to end. Many small copies will always be slower than one large continuous copy.

When I run the same setup with Active Full job runs the speeds can be anywhere between 100-200MB/sec for each job, still with multiple jobs running at the same time. The job speeds for incrementals are always slower than a full.

Do you have 8Gb fibre all the way from the 3PAR controller nodes to your proxies, and do you have multiple paths? Can each HBA in your proxies see a direct path to all 4 controller nodes? Our proxies have 2 fibre HBA ports. Each port connects (via the HP Virtual Connects) to one of two HP SN3000B 16Gb fibre switches (even though we only use 8Gb, the 16Gb switch was more cost effective). So the HBA ports in our proxies connect to each switch. On the 3PAR we have one port from each controller to each switch. So it looks like this:

Proxy HBA 1 --> Fibre Switch 1
Proxy HBA 2 --> Fibre Switch 2

3PAR Controller Node 0, Port 0 --> Fibre Switch 1
3PAR Controller Node 0, Port 1 --> Fibre Switch 2
3PAR Controller Node 1, Port 0 --> Fibre Switch 1
3PAR Controller Node 1, Port 1 --> Fibre Switch 2
3PAR Controller Node 2, Port 0 --> Fibre Switch 1
3PAR Controller Node 2, Port 1 --> Fibre Switch 2
3PAR Controller Node 3, Port 0 --> Fibre Switch 1
3PAR Controller Node 3, Port 1 --> Fibre Switch 2

Using this setup each proxy has 8 paths to the 3PAR and can communicate directly with every node. Can your proxies see each controller node?
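
A quick way to confirm this from the proxy itself is to ask MPIO how many paths sit behind each claimed disk (the disk number in the second command is just whichever index the first command reports for your 3PAR LUN):

    # List MPIO-claimed disks and the number of paths behind each one
    mpclaim.exe -s -d

    # Show the individual paths (and their state) for a specific disk, e.g. disk 4
    mpclaim.exe -s -d 4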

Remember that with a 3PAR you typically have the controller shelves in the middle with disk shelves (cages) above and below them. The bottom two controllers (usually 0 and 1) control the cages below them, and 2 and 3 control the cages above them. If your proxy can't talk directly to all controller nodes, then the node it reaches has to go via the backplane to the controller node that owns those disks and ask it to fetch the data for you. Just want to confirm this isn't your issue?

Also, what 3PAR OS are you running and what type of disks?
chjones
Enthusiast
 
Posts: 83
Liked: 25 times
Joined: Tue Oct 30, 2012 7:53 pm
Full Name: Chris Jones

Re: Veeam 8 Performance Issue

by Jack1874 » Mon Oct 26, 2015 2:21 pm

Hi Guys, thanks for the info.

I checked the proxies and MPIO was not set up. I have now enabled it and configured it with the claim rule "3PARdataVV". I checked the LUNs in Device Manager as they get mounted and they are using Round Robin.

Here is the screen clip from one of the mounted snapshots. Am I right in saying it is only showing two paths?

Image
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by Jack1874 » Mon Oct 26, 2015 2:28 pm

Hi Eric, we are running HP StoreOnce 4900.
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by Jack1874 » Mon Oct 26, 2015 2:30 pm

Hi Chris, I will gather as much info about the connectivity to the 3PAR and get back to you.

Just to confirm... do you have 1 x dual-port HBA... or 2 x dual-port HBAs in your blades?
Jack1874
Enthusiast
 
Posts: 88
Liked: 4 times
Joined: Sat Oct 17, 2015 3:32 pm
Location: Canada
Full Name: Stuart Little

Re: Veeam 8 Performance Issue

by emachabert » Mon Oct 26, 2015 6:49 pm 1 person likes this post

You should see 4 active paths (on a two-node system); look at your zoning configuration and/or the cabling into the fabrics.
Did you apply the 3PAR best practices regarding zoning and cabling? Like the one regarding port persistence, for example?
Veeamizing your IT since 2009/ Vanguard 2015,2016,2017
emachabert
Veeam Vanguard
 
Posts: 354
Liked: 163 times
Joined: Wed Nov 17, 2010 11:42 am
Location: France
Full Name: Eric Machabert

Re: Veeam 8 Performance Issue

by chjones » Mon Oct 26, 2015 6:57 pm

We have a single dual-port HBA in our blade servers: an HP QMH2572 mezzanine card in Mezzanine Slot 2 of each Gen8 blade, which maps to Virtual Connect Bays 5 and 6.

Looking at your screenshot, yes, your server has two paths to the 3PAR and both are being used in Active/Active Round-Robin. Just a guess, but from your screenshot, are you using Direct-Attach to your 3PAR? It would seem you are not using Fibre Switches? Either way, I would expect to see more than 2 paths as each HBA port should have a direct path to each controller node. With a 2-Node 3PAR I'd expect at least four paths and with a 4-Node 3PAR I'd want to see at least 8 paths. Each HBA Port not having a direct path to every controller node in the 3PAR will cause performance issues.

I only had HP techs onsite a few weeks ago, as we are looking at SSD options for our two 3PAR 7400s, and I discussed this very issue with them just to get it right in my own head. They confirmed that each HBA port not having its own path to every controller will cause performance degradation. If you can provide some more info on what's in your blades, what Virtual Connects you have, what model 3PAR and OS, and how many controller nodes and how they are connected, that may help.

Btw, I am off on a 3 week holiday in a few days (my first ever trip to the USA and going to see a Lakers game ... woohoo!!!) so apologies that I likely won't respond for a while.
chjones
Enthusiast
 
Posts: 83
Liked: 25 times
Joined: Tue Oct 30, 2012 7:53 pm
Full Name: Chris Jones
