NetWise
Influencer
Full Name: Avram Woroch

10GbE ISCSI Performance

Post by NetWise »

Good afternoon. We're deploying Veeam 6 onto production hardware now, replacing our production backups. In our dev environment things worked well, and still do. What we're seeing now matches what we saw there; we just didn't think much of it at the time.

Our general environment:

* ESXi v5.0 on a Dell R510, dual quad core, 48GB RAM, 12x 3TB on an H700, 2x 10GbE NICs and 2x 1GbE on board.
* The Veeam VM runs here. It has a vNIC on the data port group and two vNICs on two port groups bound to the 10GbE NICs, which are the same NICs used for iSCSI. Each port group has only one pNIC active. This matches how our ESX hosts are configured, and since this host is backup only, it really only has a connection to the SAN; it doesn't run VMs from it, other than seeing it for backups, etc.
* ESXi v5.0 or ESX v4.1U2 on Dell R710s, dual 6 core, 72-144GB RAM, internal disks not used, 2x 10GbE NICs along with various 1GbE NICs.
* Each host sees the EqualLogic PS6010XV SAN via two 10GbE NICs, also bound to two pNICs across a number of iSCSI initiators. Performance measured inside the VMs is excellent, so that part has been solid for a couple of years.
* 2x Dell PowerConnect PC8024F 10GbE switches with a 4x 10GbE LAG between them.

As you can see, our back end is pretty solid and high speed. But when we process a VM for the first time, run regular fulls, or run the first pass of a reverse incremental, we only hit speeds of about 100MB/sec. We can see from the Veeam backup server that we are only hitting between 5-15% total utilization on the two iSCSI 10GbE NICs. This matches what we saw when we were trialing Veeam on a physical box with similar specs and the same 10GbE NICs. We always assumed that was normal for a first backup, but then we started digging. It seems other people are able to get faster speeds, and during our backups we're seeing the following:

Source 99% > Proxy 92% > Network 5% > Target 4%

The FAQ indicates this means the source is the busiest stage, i.e. the disk reader component fetching blocks from storage. It shouldn't be the network: the SAN's iSCSI targets are connected directly inside the VM, and we know it's using both NICs and MPIO. So we're not sure why Veeam is "waiting on source".
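
For reference, a minimal sketch of how that bottleneck line reads (our own interpretation of the stats, not Veeam's code): the busiest stage is the limiting one, and the stages after it show low numbers simply because they are starved for data.

Code:

# Illustrative only: interpret a Veeam-style bottleneck line by picking the
# busiest stage. Percentages are per-stage busy time, not throughput shares.
stats = {"Source": 99, "Proxy": 92, "Network": 5, "Target": 4}

bottleneck = max(stats, key=stats.get)
print(f"Bottleneck: {bottleneck} ({stats[bottleneck]}% busy)")

# Downstream stages mostly wait for data, which is why Network and Target
# report single-digit percentages here.
for stage, busy in stats.items():
    print(f"{stage:8s} busy {busy:3d}%  idle {100 - busy:3d}%")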

It's one of those things that only matters during the first-time backup (in a reverse incremental). The incremental runs on the following nights process fast enough, even on backups that use NBD (a dev server without SAN access), that it doesn't matter much. But it would be good to find out how to optimize it better.

Any suggestions on what we should look at?
dellock6
Veeam Software
Full Name: Luca Dell'Oca

Re: 10GbE ISCSI Performance

Post by dellock6 »

If Veeam is in a VM, have you used the VMXNET3 network card? That is the adapter to pick to get 10 Gbit inside a virtual machine; the other virtual NICs are only 1 Gbit, and that would line up with the 100 MB/s you are seeing: for data passing through a 1 Gbit channel the theoretical maximum is 125 MB/s, then subtract TCP and IP headers and some TCP retransmits....
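
To put rough numbers on that (ballpark figures assuming a few percent of protocol overhead, not measured values):

Code:

# Back-of-envelope usable throughput for a link, assuming ~6% TCP/IP framing
# and retransmit overhead (an assumed figure, not a measurement).
def usable_mb_per_s(link_gbps, overhead_fraction=0.06):
    raw_mb_per_s = link_gbps * 1000 / 8   # 1 Gbps is 125 MB/s of raw line rate
    return raw_mb_per_s * (1 - overhead_fraction)

print(f"1 GbE : ~{usable_mb_per_s(1):.0f} MB/s usable")    # roughly 115-120 MB/s
print(f"10 GbE: ~{usable_mb_per_s(10):.0f} MB/s usable")   # roughly 1,150-1,200 MB/s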

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
tsightler
VP, Product Management
Full Name: Tom Sightler

Re: 10GbE ISCSI Performance

Post by tsightler »

NetWise wrote: Source 99% > Proxy 92% > Network 5% > Target 4%
There are two things to note here. Obviously the fact that Source is 99% says our reader process was 99% busy; that's fine. But while that is listed as the "bottleneck", I'd actually look more at the next step in the chain, the "proxy" percentage. Seeing 92% proxy usage indicates that you are processing just about as much data as that proxy will ever be able to handle. Are your CPUs really showing as that busy during backup? How many CPUs do you have?

In my experience, a 4-CPU VM acting as a proxy will typically top out between 100-150MB/sec, depending on the underlying CPU horsepower, other load on the host, the type of data being compressed/deduped, and so on. I've definitely seen physical systems with many cores go faster, but that's pretty much in the ballpark for a single virtual proxy running a full backup.
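
As a back-of-envelope check (the per-core rate below is an assumption derived from that 100-150MB/sec figure, not a Veeam specification), you can estimate what a fully CPU-bound proxy of a given size should be able to push:

Code:

# Rough model: scale the 100-150 MB/s observed for a 4-vCPU proxy to other
# sizes. The per-core range is an assumption, not a Veeam specification.
def estimated_proxy_mb_per_s(vcpus, per_core_range=(25, 37)):
    low, high = per_core_range
    return vcpus * low, vcpus * high

lo, hi = estimated_proxy_mb_per_s(8)
print(f"8 vCPU proxy: roughly {lo}-{hi} MB/s if fully CPU bound")  # ~200-300 MB/s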
NetWise
Influencer

Re: 10GbE ISCSI Performance

Post by NetWise »

We are using VMXNET3 for the reasons you indicate - to get 10GbE support. The NIC is also set to jumbo frames, as are the port group, vSwitch, physical NICs, switches and SAN, end to end.

The CPUs are definitely showing as busy, and that seems reasonable. The VM has 8 vCPUs assigned and the host has dual quad cores, so it is using everything available to it. If you have seen 4 vCPUs do 100-150MB/sec, then 8 doing 200-300MB/sec sounds about right, even if 10GbE "could" do a theoretical maximum of over 1,000 MB/sec.

Would this be helped in any way by having a 4 vCPU VM on the cluster act as a proxy, packaging up and streaming data to the Veeam backup VM? And would this allow the Veeam backup server to process parallel jobs better, by having a proxy on each host that passes data back to the central Veeam backup server for writing to disk?
dellock6
Veeam Software

Re: 10GbE ISCSI Performance

Post by dellock6 »

Doing parallel jobs with several proxies is the way to go to increase performance.
But if you are going to use the Veeam server as the only repository for all your proxies, it may become the new bottleneck in the chain (the target).
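
A minimal sketch of that effect (all the rates below are example assumptions, not measurements): the end-to-end job rate is capped by the slowest stage, so extra proxies only help until their combined rate exceeds what the single repository can write.

Code:

# Illustrative only: end-to-end backup rate is limited by the slowest stage.
# All rates are assumed example figures.
def end_to_end_mb_per_s(proxy_rates_mb_s, repository_write_mb_s):
    return min(sum(proxy_rates_mb_s), repository_write_mb_s)

proxies = [150, 150, 150]   # three proxies at ~150 MB/s each
print(end_to_end_mb_per_s(proxies, repository_write_mb_s=300))  # 300 - the repository is the cap
print(end_to_end_mb_per_s(proxies, repository_write_mb_s=600))  # 450 - the proxies are the cap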
joergr
Veteran
Full Name: Joerg Riether

Re: 10GbE ISCSI Performance

Post by joergr »

NetWise: please clarify which NICs you use in your Dell systems - is it the Broadcom 57711 or the Intel X520-DA?

Now, another important question: could you afford a physical system for the Veeam machine (e.g. a Dell R720 with two six-core CPUs and an Intel X520)? That would be the OPTIMAL way to get the most power for the high-bandwidth point-to-point transfers you need during a backup window. And don't expect to get 100% of the 10GbE bandwidth - the Veeam dedup and compression engine needs a lot of CPU horsepower. If, after a little redesign with a physical system, you find yourself using about 60% of the whole 10GbE link, you have done a VERY good job.

Oh, and PLEASE upgrade the 8024F firmware to 4.2.1.3 and your EQL firmware to 5.2.1. That is not marketing jabber - I am aware of some serious load balancing and flow control issues which I can confirm are corrected with this combination. After you do that, DCB will be turned on in the EQLs. Turn it off! It is not 100% ready yet in this 8024F firmware release.

Best regards,
Joerg
NetWise
Influencer

Re: 10GbE ISCSI Performance

Post by NetWise »

All of the 10GbE-facing NICs are Intel X520-DA2s.

We could probably afford a physical system, but we JUST bought these, and it would be incredibly difficult for me to go back and ask for more. That said, performance is acceptable. Fulls are completing within acceptable windows, incrementals are working just fine, etc. So speed is less a requirement than a matter of diagnosing "the NICs only show 5-15% utilization, which seems odd" and "the Source 99% suggests problems reading and waiting, which seems odd". As such, it is all working well; we're just at the tweaking and optimizing stage. Since the CPU is definitely pinned on all 8 cores, that is a reasonable place for the bottleneck to be - it just wasn't immediately obvious from the stats that that was the case.

Our PC8024Fs are on 4.1.1.9, so you are right about the firmware there. And the EQL is on v5.1.2 - when did v5.2.1 come out? I guess within the last 60 days; we haven't checked since just before Christmas. Thanks for the heads-up - we're downloading both now, will test them in our pre-production environment, then schedule our rollouts. Thanks also for the tip on DCB.
joergr
Veteran

Re: 10GbE ISCSI Performance

Post by joergr »

Yep, you are welcome. And BTW: the Intel X520-DA2s are great and stable NICs - I have two of them in each of our ESXi machines. Personally, I find they rock in performance and stability.
NetWise
Influencer

Re: 10GbE ISCSI Performance

Post by NetWise »

So before I wrap this up - the topic of the proxy. Whether it's a standalone host with no SAN or a clustered host with SAN access, if we put a 4 vCPU proxy on each host, that should help the backups. Let's start with the simple example of the standalone host, since the cluster adds potential issues with DRS, VM placement, and knowing which host the proxy and the VMs are on, etc.

Say we use a proxy on a host that has 10GbE NICs (so it is ABLE to connect to the SAN, maybe in the future, but does not access shared volumes now) plus local disks, where the proxy has 4 vCPUs and the host has 8 pCPUs/cores. What sort of expectations are reasonable? My assumption is that the local proxy does the reading from local disk and the packaging (dedupe/compress/streaming) to the Veeam backup server. The Veeam backup server reads its own local disks/backup repository either to provide details on which blocks have changed (although I understand CBT comes from tables on the host side, not from a comparison against previous backups) or to do the read/write/write that a reverse incremental requires (read from the VBK, write the changed data into the VBK, write the previous data out to the VRB, generating a 3x IO hit). So I should expect the Veeam backup server to be busy doing some of that even when a proxy is in use.
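
To put a number on that 3x hit, here is a small illustration (the change rate and repository throughput below are assumed example figures, not anything measured in our environment):

Code:

# Illustrative reverse-incremental IO math: each changed block costs roughly
# one read from the VBK plus two writes (new block into the VBK, old block
# into the VRB rollback file). All figures are assumed examples.
changed_gb = 50            # changed data in tonight's run
repo_mb_per_s = 120        # sustained mixed read/write rate of the repository

io_gb = changed_gb * 3     # read + write + write
minutes = io_gb * 1024 / repo_mb_per_s / 60
print(f"~{io_gb} GB of repository IO, about {minutes:.0f} minutes at {repo_mb_per_s} MB/s")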

Am I far off? I wouldn't expect the Veeam host to be relieved of a full 4 vCPUs' worth of work, but the equivalent of 1-2 vCPUs, perhaps?

I guess one thing is that it would be hard to tell, as we have opted to run jobs in parallel rather than in series. I would rather have three 2-hour jobs end up taking 2.5-3.0 hours because of slowdowns than have them take 6 hours running in sequence. So if we offloaded to proxies, we likely wouldn't see a reduction in load on the Veeam server, only a reduction in the total backup window.
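
A toy comparison of that trade-off (the job lengths and the contention slowdown are assumptions for illustration only):

Code:

# Toy comparison of running jobs in sequence vs in parallel on shared
# infrastructure. The 1.4x contention slowdown is an assumed figure.
job_hours = [2.0, 2.0, 2.0]
contention_slowdown = 1.4   # each job runs ~40% slower when sharing resources

sequential_window = sum(job_hours)                       # 6.0 hours
parallel_window = max(job_hours) * contention_slowdown   # 2.8 hours
print(f"sequential: {sequential_window:.1f} h, parallel: {parallel_window:.1f} h")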