Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Veeam 8 Performance Issue

Post by Jack1874 »

Case # 01025881
Guys, we are looking for some advice here…

We believe we are having major performance issues when trying to land our data on the HP StoreOnce.

Here is what our environment looks like …
Veeam 8 (Update 2)
Hypervisor: ESXi 5.5
Storage Array: HP 3PAR

Proxy Server 1
Type: Physical
HP DL380p Gen8
RAM = 64 GB
Network = 10 GbE
O/S = Windows 2012 R2
Transport Mode = Direct SAN access
Max Concurrent Tasks = 12

Proxy Server 2
HP BL60c Gen9
RAM = 64 GB
Network = 10 GbE
O/S = Windows 2012 R2
Transport Mode = Direct SAN access
Max Concurrent Tasks = 16

Repository
HP StoreOnce
Running NFS Shares

Jobs
Multiple jobs with a small(er) number of guests, dedicated proxy / dedicated NFS share
Options
Parallel processing = enabled

Background
We have some very large VMs (1TB plus)
We are seeing transfer rates between 50 MB/s and 150 MB/s
The majority of the time it shows the network as the bottleneck.
We did try Jumbo Frames but found the connectivity to be too unstable.
Can anyone share their thoughts on our config, speeds, feeds, etc.? Does anyone have a similar configuration which they are running with success?
Feel free to ask as many questions as you like.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Veeam 8 Performance Issue

Post by Shestakov »

Hello and welcome to the forums!

Your configuration looks good in general. 150 MB/s is not bad performance, by the way.

What are your full bottleneck statistics?
What transport mode is used? It can be recognized by the [nbd], [hotadd] or [Direct SAN] tag in the jobs' actions.

Thanks!
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam 8 Performance Issue

Post by tsightler »

What happens if you disable one of the proxies, just as a test? Do the rates change? Does the bottleneck change?

Also, what is the actual data path? You say the "Network" is 10GbE, but are ingress and egress on the same interface? And for Direct SAN, is that iSCSI or FC over an HBA?

When you say that the performance is 150 MB/s, is that the aggregate across all jobs together, or are you running multiple jobs and seeing that performance on each?

Also, have you manually selected a gateway or did you leave it on "Automatic selection"?
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi Shestakov, thanks for the welcome ...

The transport mode is [Direct SAN]

I'll post the bottleneck statistics very soon... I have some jobs running just now. brb ...
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi Shestakov, the 150 MB/s is the speed of some of the individual copies inside the jobs.

On one of the proxies we just saw an average of 53 MB/s for the entire job. The job took 7 hours 20 minutes, and it tells us that the network is the bottleneck.

We processed 2 TB of data, read 1.3 TB, and transferred 1.1 TB.

There were other jobs running, but they were on the "other" proxy, targeting different shares on the StoreOnce (using the same 10 GbE uplink).


Thoughts?
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi tsightler, we are using Direct SAN over FC.

My assumption is that we are using the 10GbE for both inbound and outbound.

I meant that the 150 MB/s we are seeing is on some individual tasks inside the job. The average is circa 53 MB/s. There are multiple jobs going to the HP StoreOnce repository, but to different NFS shares.

We manually select the gateway.
Shestakov
Veteran
Posts: 7328
Liked: 781 times
Joined: May 21, 2014 11:03 am
Full Name: Nikita Shestakov
Location: Prague
Contact:

Re: Veeam 8 Performance Issue

Post by Shestakov »

Thanks for the reply Jack,

I'm also curious: what is the percentage of "Network" in the bottleneck statistics versus "Source", "Proxy" and "Target"?
It helps us understand how significant the network bottleneck is.

Note that every infrastructure has a bottleneck; it doesn't mean that something is wrong.

By the way, 150 MB/s is 1.2 Gb/s, which actually looks fast. Thanks!
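
As a quick sanity check of that conversion (an illustrative Python snippet, nothing Veeam-specific):

Code:
# 1 byte = 8 bits; MB and Gb here are decimal units (10^6 and 10^9).
def mb_per_s_to_gb_per_s(mb_per_s):
    return mb_per_s * 8 / 1000

print(mb_per_s_to_gb_per_s(150))  # 1.2 Gb/s per task
print(mb_per_s_to_gb_per_s(53))   # ~0.42 Gb/s job average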
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam 8 Performance Issue

Post by tsightler »

Jack1874 wrote:I meant that we are seeing 150 MB/s that is on some individual tasks inside the job. The average is circa 53 MB/s. There are multiple jobs going to the HP StoreOnce repository but to different NFS shares.
I hate to keep asking questions, but the fact that you are seeing a network bottleneck is telling me something; I just don't know what yet, so I'm trying to understand the data flow. Network bottleneck refers to the traffic between the Veeam source and target data movers. Source data movers are the proxies, which are reading data via FC and then sending that data to the target data mover, which is the process writing data to the StoreOnce.

But I'm confused by your answer above. You state that you are writing to NFS shares, so that would imply that there is a Linux system involved somewhere, but I don't see any mention of this anywhere. How are you writing via NFS to the StoreOnce?
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

We are using W2K12 (R2) as the proxies and have mounted the (StoreOnce) NFS shares to them. There is no Linux in our Veeam architecture.

We decided to use NFS, as the throughput we were getting with CIFS was even worse than what we see with NFS shares.
chjones
Expert
Posts: 117
Liked: 31 times
Joined: Oct 30, 2012 7:53 pm
Full Name: Chris Jones
Contact:

Re: Veeam 8 Performance Issue

Post by chjones »

We have a very similar setup to you (HP Blade Servers, 3PAR, StoreOnce, etc.). My observations are:

What is the maximum number of concurrent tasks you have configured for the Veeam repository? StoreOnce is great at ingesting data in a single stream, but not multiple streams at the same time on a NAS/CIFS share. The best solution we found was to limit the number of concurrent streams to 12 in the backup console (we found this value specified in an old StoreOnce User Guide as the maximum supported number of NAS/CIFS streams).

We have two proxies, each with two eight-core CPUs (16 cores each), so they can process 32 VMDKs with parallel processing enabled. We have the StoreOnce added as a CIFS share. If we allow the proxies to flood the StoreOnce with 32 streams we see the performance nose-dive. Limiting to 12 seemed to give us the best of both worlds.

Also important is the StoreOnce OS version, as older versions had performance issues specifically with Veeam. Anything higher than 3.11.4, I believe, is best suited for Veeam backups. We are running 3.13.0 at present and have no issues (I believe 3.13.1 will be required for Veeam v9 and its StoreOnce Catalyst integration).

If I am running a backup job of a VM with a single VMDK I can see upwards of 500 MB/sec writing to the StoreOnce, as it is a single stream of inbound data. As soon as other VMDKs are backed up at the same time the performance starts to drop; this is just the trade-off of using a dedupe appliance as the repository.

I have up to ten backup jobs all running at the same time, but with concurrent operations set to 12 there are usually about 4-7 VMs being processed at once, as each VM usually has 2 or more VMDKs, and my jobs all average around 100-150 MB/sec. Remember, that is up to 12 VMDKs being processed at the same time across a number of different jobs, with each one processing at 100-150 MB/sec.

To get the highest throughput possible you can disable parallel processing in your jobs or at the Veeam server level; however, I wouldn't recommend doing this. I would rather process all my VMs at the same time than see my stat lines show higher throughput. As with any file copy, per-file performance drops when you copy 10 files at the same time instead of 1. It's about finding the sweet spot, which for us was 12 concurrent tasks (that is with an HP StoreOnce 4430 with 2 x 10GbE and 36 x 2TB 7200RPM SATA disks).
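
To illustrate that trade-off, here is a toy model; the ceiling and single-stream figures below are assumptions for illustration, not measured StoreOnce numbers:

Code:
# Toy model: aggregate ingest saturates, so per-stream speed falls as
# concurrency rises. Both constants are assumed, not measured.
AGGREGATE_CAP = 1800.0      # MB/s, assumed appliance ingest ceiling
SINGLE_STREAM_MAX = 500.0   # MB/s, assumed best case for one stream

def per_stream(streams):
    return min(SINGLE_STREAM_MAX, AGGREGATE_CAP / streams)

for n in (1, 4, 12, 32):
    print(n, "streams:", round(per_stream(n)), "MB/s each,",
          round(n * per_stream(n)), "MB/s aggregate")

With these assumed numbers, one stream runs at 500 MB/s while 32 streams crawl at roughly 56 MB/s each for the same aggregate, which is the nose-dive behaviour described above.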

As long as you are using the 3PAR snapshot integration, you have already offloaded the processing from your ESXi hosts, so if the 3PAR snapshot remains for, say, 20% longer whilst you process the data, it's not that big of a deal: there is no load on the ESXi hosts during the backup. That's how I look at it. My active full backups on a weekend take about 24 hours to complete (some jobs finish in an hour or two, whilst our Exchange 2013 VM, which is 6.5TB, takes almost a full day), but because the processing is away from the ESXi host I don't really mind.

When Veeam v9 comes with its native StoreOnce Catalyst integration I am hoping to see about a 3x improvement in backup performance. At the moment Veeam has to send every block to the StoreOnce, and then the appliance decides whether it has to write it or can discard it; in v9 that dedupe is offloaded to your proxy, so only blocks that need to be written are sent over to the StoreOnce. This dramatically increases write performance, as it allows the StoreOnce to act like any old NAS array and just concentrate on writing data; it doesn't have to do dedupe calculations.
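
Conceptually, source-side dedupe works something like this (a minimal sketch of the general idea, not Veeam's or HP's actual Catalyst protocol):

Code:
import hashlib

# Hypothetical sketch: ship a block only if the target has never
# stored its hash, so duplicates never cross the wire.
def blocks_to_send(blocks, known_hashes):
    to_send = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in known_hashes:
            known_hashes.add(digest)
            to_send.append(block)
    return to_send

data = [b"A" * 4096, b"B" * 4096, b"A" * 4096]  # third block repeats the first
print(len(blocks_to_send(data, set())))          # 2 blocks actually sent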
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi chjones,

I highlighted the number of concurrent tasks on each proxy. Does that look OK to you? Should we lower that number?

Proxy Server 1
Type: Physical
HP DL380p Gen8
RAM = 64 GB
Network = 10 GbE
O/S = Windows 2012 R2
Transport Mode = Direct SAN access
Max Concurrent Tasks = 12

Proxy Server 2
HP BL60c Gen9
RAM = 64 GB
Network = 10 GbE
O/S = Windows 2012 R2
Transport Mode = Direct SAN access
Max Concurrent Tasks = 16

StoreOnce Firmware Version = Software Revision 3.13.0-1529.2
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi, here are our statistics ...

Bottleneck: Network
Source = 11%
Proxy = 3%
Network = 83%
Target = 55%

That is, we are running multiple jobs, and these are the stats for one of them.
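
For what it's worth, the bottleneck label is simply the busiest stage of the pipeline; schematically (a simplified reading of the stats, for illustration only):

Code:
# The stage with the highest busy percentage is reported as the bottleneck.
stats = {"Source": 11, "Proxy": 3, "Network": 83, "Target": 55}
print(max(stats, key=stats.get))  # Network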
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

One other thing to add ...

We are configuring our jobs on a one-to-one relationship ...

Proxy Max Concurrent Tasks = 12
Repository Limit Max Concurrent Tasks = 4
Proxy1 --- >> NAS Share1
Proxy1 --- >> NAS Share2
Proxy1 --- >> NAS Share3

Proxy Max Concurrent Tasks = 16
Repository Limit Max Concurrent Tasks = 4
Proxy2 --- >> NAS Share4
Proxy2 --- >> NAS Share5
Proxy2 --- >> NAS Share6
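
If I understand the task-limit mechanics correctly, the effective parallelism per proxy works out like this (a sketch assuming the lower of the proxy and repository limits wins):

Code:
# Assumed semantics: each proxy/share pairing runs no more tasks than the
# smaller of the two limits, and the proxy limit caps the overall sum.
def effective_tasks(proxy_limit, share_limits):
    return min(proxy_limit, sum(min(proxy_limit, s) for s in share_limits))

print(effective_tasks(12, [4, 4, 4]))  # Proxy1: at most 12 tasks in flight
print(effective_tasks(16, [4, 4, 4]))  # Proxy2: 12, capped by the shares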

Also, how about your NIC config on the StoreOnce?
On the StoreOnce we currently have two 10GbE uplinks in fail-over mode. I'm thinking that having them as two separate uplinks would be much better, as we are looking for throughput and not redundancy. We would export one set of NFS shares on one uplink/IP and the other set on another uplink/IP.

Thoughts?
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam 8 Performance Issue

Post by tsightler »

None of the Veeam testing with HP StoreOnce has been done with the Windows NFS client. I would expect that you should be able to get similar performance with CIFS by selecting the gateway to be the same box the share is on, as long as your StoreOnce has recent firmware.

However, we definitely need to figure out the network bottleneck; that part has nothing to do with the StoreOnce. Are you saying that you specifically select a proxy for each job, i.e. Job1 uses only Proxy1, which uses NAS Share1? I wouldn't expect this to produce a network bottleneck. Are you definitely running v8 with the latest patch (at least Update 2)? I'll try to pull your job logs and take a look.
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi tsightler, we have seen significant increases in performance when changing from CIFS to NFS. The StoreOnce is on Software Revision 3.13.0-1529.2.

Yes, we are using 8.0.0.2030.
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi chjones, how are your jobs configured?

I have ...
App_Group01 ---> using Proxy1 ---> using NFSShare01 (Limited to 4)
App_Group02 ---> using Proxy1 ---> using NFSShare02 (Limited to 4)
App_Group03 ---> using Proxy1 ---> using NFSShare03 (Limited to 4)

Doing it this way I'm now seeing between 100 MB/s and 200 MB/s.

Do you have one big share... Limited to 12?
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam 8 Performance Issue

Post by tsightler »

Jack1874 wrote:We have seen significant increases in performance when changing from CIFS to NFS. The StoreOnce is on Software Revision 3.13.0-1529.2
It doesn't completely surprise me, as I've seen lots of issues with CIFS performance and StoreOnce; however, in previous cases we've eventually been able to resolve them and get good performance even with CIFS. Because using the Windows NFS client is not a recommended practice, it's not what we test, so we don't have any baseline for what reasonable performance looks like.

However, I'm focused on the network bottleneck because it indicates that Veeam is not efficiently transferring data from the source to the target data mover. In your case both run on the same machine, so this transfer should be very fast via shared memory, and I can see in your job logs that it is doing just that. I'm at a loss to explain the high value for this bottleneck at this point in time. I'm almost wondering if it's some type of NUMA issue where the source and target data movers are running on different nodes, but that's kind of an "out there" hypothesis at the moment. Do you have any throttling configured in the "Network Traffic" settings, and is the multiple upload streams setting still configured for "5"?

I'd really like to see bottleneck stats from a single job running at a time.
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi tsightler, can I upload some logs for you to look at?
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

I just got this from Veeam Support. Are they suggesting that we run one job per proxy... backing up one guest (VMDK) at a time?

The best practices have changed with Version 8 in regards to HP devices. Therefore, if possible, here is what I would like to try:

1. Remove the HP device, and re-add it, but when we re-add it, we want to add it as an HP device, and use CIFS paths, and not NFS shares.

2. In the repository settings, when re-adding the HP device, set Limit Concurrent Tasks to 1.

3. If the job proxy setting is set to "Automatic Selection," we need to change that, and specify one specific proxy.

4. In the proxy settings for the specified proxy, (Backup Infrastructure > Backup Proxies > Right click > Properties) we need to set max concurrent tasks to 1.

5. Then, go to the Blue File Menu, and go to Options, and disable parallel processing.
chjones
Expert
Posts: 117
Liked: 31 times
Joined: Oct 30, 2012 7:53 pm
Full Name: Chris Jones
Contact:

Re: Veeam 8 Performance Issue

Post by chjones »

Hi Jack,

The setup of my jobs is:

App Group 1 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
App Group 2 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
App Group 3 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
And so on ...

My two proxies are HP BL460c Gen8 servers with 2 x 8-core Intel Xeons, 128GB RAM, an HP FlexFabric LOM (2 x 10GbE uplinks in a Microsoft switch-independent team) and an HP QMH2572 (dual-port 8Gb Fibre HBA for connection to the 3PAR), running Windows Server 2012 R2. The proxies can handle 16 concurrent tasks each, and we leave it set at that (everything bottlenecks to the lowest concurrency of your proxies and your repository, so the 16 never gets reached anyway).

The StoreOnce we have added directly to Veeam as a CIFS share. I did ask the question on these forums of whether I should remove the repository and re-add it as an HP StoreOnce in version 8, and Veeam replied (I believe it was Anton Gostev) that in version 8 adding a StoreOnce as an HP device doesn't really change anything, it just sets the default settings (in v9 this will change so you can add a Catalyst Store as a repository), so we have left the repository as a CIFS share. There was no benefit to re-adding it, as it doesn't change how Veeam interacts with it.

I have the concurrency on the CIFS repository set at 12, so this is the concurrency limit that the proxies will not go over.

I just checked my StoreOnces, and I too am now using StoreOnce OS 3.13.0 (we upgraded a week or so ago).

All of my jobs start at the same time, and I often see VMs in jobs saying they are waiting for resource availability. This is due to the max concurrency of the StoreOnce being set to 12 while the two proxies combined are able to process 32 disks. The 3PAR storage snapshot has already been taken, so it's not like this is placing an unnecessary load on the ESXi clusters, and I don't worry about seeing this. The VMs will process in due time. Once that 3PAR snapshot is done I know I have a point-in-time snap of the VMs, so even if they weren't processed for 12 hours or a day I'm still backing up the VM from the point in time the 3PAR snapshot was created.

I can, sort of, understand limiting the concurrency on the StoreOnce to 1. This is an HP recommendation for a specific reason. To get the best possible dedupe, HP recommend limiting the inbound data stream to a StoreOnce to single objects. I'm not quite sure how this technically makes the dedupe any better, other than that the StoreOnce receives one full block of data, stores it, and then moves on to the next one. With multiple streams it could receive parts of one stream and parts of another, and whilst they may be identical, it hasn't received the full block of data yet so may not match that data. However, in my testing I have not seen an overall increase in performance to the StoreOnce by setting the concurrency to 1. All I saw was a similar overall time to back everything up, just with the VMs sitting there saying "waiting for infrastructure availability" for longer.

If you have the StoreOnce concurrency set to 1 then there is no benefit to selecting only one proxy, as even if you had ten proxies you'd still only get one of them sending data at once. I like having multiple proxies for the redundancy; I don't need to edit any jobs if I have a failure of one proxy. I also split my proxies across blade enclosures so I can still back up if I lose an entire blade chassis.

In terms of networking, we have the following:

HP StoreOnce - 2 x 10GbE in Bonding Mode 4 (EtherChannel/port-channel) - This means the StoreOnce can process up to 2 simultaneous data streams, both at 10Gb/sec. There is no guarantee this will always happen, as you are reliant on the load balancing policy to distribute across both NICs using the source MAC hash, but it's the best we can achieve with the limited network control on a StoreOnce.
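
The source MAC hashing works roughly like this (a simplified sketch; real LACP hash policies can use more fields than the source MAC):

Code:
# Every frame from a given source MAC lands on the same physical link,
# so a single flow can never exceed one link's bandwidth even though
# the bond has two. Many implementations hash the low-order bits.
def pick_link(src_mac, num_links=2):
    last_byte = int(src_mac.split(":")[-1], 16)
    return last_byte % num_links

print(pick_link("00:50:56:ab:cd:01"))  # link 1
print(pick_link("00:50:56:ab:cd:02"))  # link 0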

HP Blade Enclosure - 2 x HP FlexFabric 10Gb/24-Port Modules; each module has 2 x 10GbE uplinks, so 4 x 10GbE uplinks per enclosure. These are also set up in port-channels on the Cisco switch (the two 10GbE ports in FlexFabric Module 1 are in Port-Channel A, and the two 10GbE ports in FlexFabric Module 2 are in Port-Channel B). We also have two shared uplink sets, one for FlexFabric Module 1 and another for Module 2. Every VLAN is added to both shared uplink sets. This is the only way HP support using an Active/Active configuration for Virtual Connect modules. If you don't do this you end up in an Active/Standby configuration, and half of your uplinks out of the enclosures are essentially useless unless you lose a Virtual Connect module. All traffic into the standby module is passed from that module over to the active module and then out to the network, which is not what I personally want.

On the blade servers that act as the proxies, the Virtual Connect profiles are set up as follows:

NIC 1 ---> Force same vlan mapping as Shared Uplink Set 1 ---> Network from Shared Uplink Set 1
NIC 2 ---> Force same vlan mapping as Shared Uplink Set 2 ---> Network from Shared Uplink Set 2

Within Windows 2012 R2 we then create a network team using both 10GbE network adapters and set the load balancing mode to switch independent (this is because the blade servers cannot establish a port-channel with the Virtual Connects or the upstream Cisco switch; the Virtual Connects create that port-channel).

The only downside we have with this configuration is that if one of the Virtual Connects is offline or fails, the blade servers report a loss of one of their NICs, and that NIC can't see its traffic to the other Virtual Connect module. But I see this as low-risk, or even not a risk at all, because if a Virtual Connect goes down in an Active/Standby configuration you are still only getting the throughput of one module anyway, and even when everything is happy and online you still only get that. Maximum performance whilst still having redundancy is what I go for.

Hope that helps ... sorry for the wall of text. It's kinda hard to explain this setup quickly :D

The last point I wanted to make was that all of my backup jobs use the forward incremental processing mode and we perform an active full every Friday. We have approximately 25TB of VMs to back up with the above setup, and we achieve this quite easily inside 4-5 hours each night for incremental job runs, and usually within 14 hours for a full backup every Friday night. The only exception is our 6.4TB Exchange 2013 Mailbox Server VM, which takes about 20-24 hours to complete a full backup; however, some of this time is usually spent waiting for resource availability due to the concurrency limits. We have no concerns with this.

We also have the exact same setup at another datacentre (we try to ensure our datacentres are replicas of each other) and we have the exact same experience at that site as well, so it's not something we've fluked; the setup so far appears solid.
tsightler
VP, Product Management
Posts: 6009
Liked: 2842 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Veeam 8 Performance Issue

Post by tsightler »

Jack1874 wrote:Hi tsightler, can I upload some logs for you to look at?
I have logs from your support case and I can see the behavior you are describing quite clearly, but I'm at a loss as to why. If it's OK I'd like to reach out to you in a PM. I think support is simply offering some suggestions that may help narrow down the possibilities.
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Sure.. I sent you a PM
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Thanks chjones... that's great info you provided.

I ran some jobs overnight ...

App_Group01 ---> using Proxy1 ---> using NFSShare01 (Limited to 1)
App_Group02 ---> using Proxy1 ---> using NFSShare02 (Limited to 1)
App_Group03 ---> using Proxy1 ---> using NFSShare03 (Limited to 1)
App_Group04 ---> using Proxy1 ---> using NFSShare04 (Limited to 1)
Max Concurrent Tasks = 16

DB_Group01 ---> using Proxy2 ---> using NFSShare05 (Limited to 1)
DB_Group02 ---> using Proxy2 ---> using NFSShare06 (Limited to 1)
DB_Group03 ---> using Proxy2 ---> using NFSShare07 (Limited to 1)
DB_Group04 ---> using Proxy2 ---> using NFSShare08 (Limited to 1)
Max Concurrent Tasks = 112

Our speeds vary from 14 MB/s on some disks... right through to 250 MB/s.

And the bottleneck is consistently saying Source... at between 90-99%.

Do you have any kind of multi-pathing enabled for the 3PAR?
emachabert
Veeam Vanguard
Posts: 388
Liked: 168 times
Joined: Nov 17, 2010 11:42 am
Full Name: Eric Machabert
Location: France
Contact:

Re: Veeam 8 Performance Issue

Post by emachabert »

With 3PAR you should be using round-robin multipathing. Under Windows, use MS MPIO (see the HP doc for the claim rule).

Chris gave a very good, complete and clear description of the best practices when dealing with HP FlexFabric and HP 3PAR.

Jack, what StoreOnce model are you using?
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023
chjones
Expert
Posts: 117
Liked: 31 times
Joined: Oct 30, 2012 7:53 pm
Full Name: Chris Jones
Contact:

Re: Veeam 8 Performance Issue

Post by chjones »

Eric is correct: Round Robin is the way to go. On the Windows blades for the proxies we have Microsoft MPIO installed, plus the claim rule Eric mentioned, and that's about all you need to do. 3PAR volumes are then automatically set to Round Robin. If you create a new vSphere datastore, present it to the proxies as well; it will automatically be set to Round Robin on the proxy and you can just forget about it.
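
Round Robin just rotates new I/Os across all active paths; schematically (an illustration of the policy, not the Windows MPIO internals):

Code:
from itertools import cycle

# Four active paths to the array; each new I/O goes to the next path
# in turn, spreading load evenly while all paths are healthy.
paths = ["HBA1->Node0", "HBA1->Node1", "HBA2->Node0", "HBA2->Node1"]
next_path = cycle(paths)

for io_number in range(6):
    print("I/O", io_number, "->", next(next_path))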

With the speeds, I wouldn't worry too much. When I am doing daily incrementals the speeds can appear very low, but this is because you are copying bits and pieces; you aren't running just one large file copy from end to end. Many small copies will always be slower than one large continuous copy.

When I run the same setup with Active Full job runs the speeds can be anywhere between 100-200MB/sec for each job, still with multiple jobs running at the same time. The job speeds for incrementals are always slower than a full.

Do you have 8Gb fibre all the way from the 3PAR controller nodes to your proxies, and do you have multiple paths? Can each HBA in your proxies see a direct path to all 4 controller nodes? Our proxies have 2 Fibre HBA ports. Each port connects (via the HP Virtual Connects) to one of two HP SN3000B 16Gb Fibre switches (even though we only use 8Gb, the 16Gb switch was more cost effective). So the HBAs in our proxies connect to each switch. On the 3PAR we have one port from each controller to each switch. So it looks like this:

Proxy HBA 1 --> Fibre Switch 1
Proxy HBA 2 --> Fibre Switch 2

3PAR Controller Node 0, Port 0 --> Fibre Switch 1
3PAR Controller Node 0, Port 1 --> Fibre Switch 2
3PAR Controller Node 1, Port 0 --> Fibre Switch 1
3PAR Controller Node 1, Port 1 --> Fibre Switch 2
3PAR Controller Node 2, Port 0 --> Fibre Switch 1
3PAR Controller Node 2, Port 1 --> Fibre Switch 2
3PAR Controller Node 3, Port 0 --> Fibre Switch 1
3PAR Controller Node 3, Port 1 --> Fibre Switch 2

Using this setup each proxy has 8 paths to the 3PAR and can communicate directly with every node. Can your proxies see each controller node?

Remember that with a 3PAR you typically have the controller shelves in the middle with disk shelves (cages) above and below them. The bottom two controllers (usually 0 and 1) control the cages below them, and 2 and 3 control the cages above them. If your proxy can't talk directly to all controller nodes then one of the other nodes has to talk via the backplane to a controller node that controls the other disks, and ask it to fetch the data for you. Just want to confirm this isn't your issue?
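
As a sanity check on the expected path count (simple arithmetic under the zoning described above):

Code:
# Each proxy HBA port is zoned to one port per controller node on its
# fabric, so the path count is just the product of the two.
hba_ports = 2
controller_nodes = 4
ports_per_node_per_fabric = 1

print(hba_ports * controller_nodes * ports_per_node_per_fabric)  # 8 paths
# On a 2-node 3PAR the same arithmetic gives 4 paths.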

Also, what 3PAR OS are you running and what type of disks?
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi Guys, thanks for the info.

I checked the proxies and MPIO was not set. I have now enabled it and configured it with the claim rule "3PARdataVV". I checked the LUNs in Device Manager as they get mounted, and they are using Round Robin.

Here is the screen clip from one of the mounted snapshots. Am I right in saying it is only showing two paths?

[Screenshot: MPIO properties for a mounted 3PAR snapshot LUN, showing two paths with the Round Robin policy]
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi Eric, we are running HP StoreOnce 4900.
Jack1874
Enthusiast
Posts: 95
Liked: 5 times
Joined: Oct 17, 2015 3:32 pm
Full Name: Stuart Little
Location: Canada
Contact:

Re: Veeam 8 Performance Issue

Post by Jack1874 »

Hi Chris, I will gather as much info as I can about the connectivity to the 3PAR and get back to you.

Just to confirm... do you have 1 x dual-port HBA... or 2 x dual-port HBAs in your blades?
emachabert
Veeam Vanguard
Posts: 388
Liked: 168 times
Joined: Nov 17, 2010 11:42 am
Full Name: Eric Machabert
Location: France
Contact:

Re: Veeam 8 Performance Issue

Post by emachabert » 1 person likes this post

You should see 4 active paths (on a two-node system); you should look at your zoning configuration and/or the cabling into the fabrics.
Did you apply the 3PAR best practices regarding zoning and cabling? Like the one regarding port persistence, for example?
Veeamizing your IT since 2009/ Veeam Vanguard 2015 - 2023
chjones
Expert
Posts: 117
Liked: 31 times
Joined: Oct 30, 2012 7:53 pm
Full Name: Chris Jones
Contact:

Re: Veeam 8 Performance Issue

Post by chjones »

We have a single dual-port HBA in our blade servers: an HP QMH2572 mezzanine card in each Gen8 blade. It sits in Mezzanine Slot 2, which maps to Virtual Connect Bays 5 and 6.

Looking at your screenshot, yes, your server has two paths to the 3PAR and both are being used in Active/Active Round-Robin. Just a guess, but from your screenshot, are you using Direct-Attach to your 3PAR? It would seem you are not using Fibre Switches? Either way, I would expect to see more than 2 paths as each HBA port should have a direct path to each controller node. With a 2-Node 3PAR I'd expect at least four paths and with a 4-Node 3PAR I'd want to see at least 8 paths. Each HBA Port not having a direct path to every controller node in the 3PAR will cause performance issues.

I only had HP techs onsite a few weeks ago, as we are looking at SSD options for our two 3PAR 7400s, and I discussed this very issue with them, just to get it right in my own head; they indeed confirmed that each HBA port not having its own path to every controller will cause performance degradation. If you can provide some more info on what's in your blades, what Virtual Connects you have, what model 3PAR and OS, and how many controller nodes and how they are connected, that may help.

Btw, I am off on a 3-week holiday in a few days (my first ever trip to the USA, and going to see a Lakers game ... woohoo!!!) so apologies that I likely won't respond for a while.