Job rate difference between direct san and storage snapshot

Post by **gurneetech** » May 15, 2018 5:03 pm

We’re bringing up a new Veeam environment on new hardware and aligning our backup strategy with better/recommended practices. I’ve got a new backup server with direct-attached fast SAS disk for backup landing, and a large JBOD for longer-term storage combined with a tape library. Since the hardware is new and we are not under a particular go-live schedule, I have a good opportunity to test and tune. I am seeing a bottleneck that I don’t understand, so I’m looking for some coaching.

At this location we’ve got 3 HP DL380 G9’s running ESXi 6.0 Update 3d Express Patch 12 (initial Spectre patch) and have a 6-node HP StoreVirtual/Lefthand P4300 SAN. Networking for all of the above is 10Gbps via dual Cisco 4500-X and SFP+ twinax cabling.

This new backup server is physical, an HP DL380 G10, with 10x2.4TB SAS 10K disk, and a Supermicro JBOD cabinet. It is also dual-connected to the C4500-X and runs Windows Server 2016 with 64GB RAM. For this discussion, the JBOD is not in play. Jumbo frames are not configured, as both the backup server and the VMware hosts have trunked network adapters and jumbo frames would apply to all L2 frames.

Our Veeam VBR license is Enterprise, not Enterprise Plus, so we cannot leverage storage snapshots. However, the storage snapshots are the genesis of my question.

We are using software iSCSI initiators both on the VMware hosts to the HP/Lefthand storage and on the physical backup server. When we deployed this generation of the environment, there was a bug in the firmware/driver for the Emulex/HP 556 10Gb hardware that caused decreased performance and eventually disconnects when using the hardware iSCSI HBA mode. (I suppose I could test on the new physical backup server and its Intel/HP 10Gbps card in HBA mode, but I have not.) The Microsoft iSCSI initiator is set up here, not in multi-path mode, and the HP/Lefthand LUNs are presented read-only and are visible in Disk Management.

Bringing up the new backup server, I spent a bunch of time testing with iPerf and Diskspd and eventually became satisfied that I had good drivers and configuration for both networking and storage. I then installed the latest Veeam 9.5 build and dropped a trial license on for a more real-use testing phase.

I configured the local Veeam proxy and set up a job. When I did this, I missed that the trial license allows the storage snapshot option to be enabled by default. I hadn’t even looked at the tab, as I ‘knew’ we weren’t going to use it.

The test job ran great! Windows monitoring showed that I got ~7-9Gbps throughput and the job stats showed:

Overall processing rate of 907 MB/s
With:
Load: Source 60% > Proxy 71% > Network 82% > Target 29%
Using backup proxy VMware Backup Proxy for retrieving Hard disk 1 data from storage snapshot

I pretty quickly figured out it had done the storage snapshot, which worked well but wasn’t what I needed. I unchecked the job’s storage integration box and ran the exact same job again as an active full.
This is the part I don’t understand:

Overall processing rate: 431MB/s
Load: Source 99% > Proxy 16% > Network 6% > Target 1%
Using backup proxy VMware Backup Proxy for disk Hard disk 1 [san]
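To compare the two runs side by side, here is a minimal Python sketch that parses the two "Load:" lines quoted above and picks the busiest stage. It assumes, as I understand Veeam's statistics, that the stage with the highest busy percentage is the one reported as the bottleneck. The function names `parse_load` and `busiest` are made up for this illustration and are not part of any Veeam tooling.

```python
import re

def parse_load(line):
    """Return {stage: busy_percent} from a job statistics 'Load:' line."""
    # Matches pairs such as "Source 60%"; the "Load:" prefix is skipped
    # because it is followed by a colon rather than whitespace and digits.
    return {name: int(pct) for name, pct in re.findall(r"(\w+)\s+(\d+)%", line)}

def busiest(line):
    """Name of the stage with the highest busy percentage (assumed bottleneck)."""
    loads = parse_load(line)
    return max(loads, key=loads.get)

# The two lines are copied from the job statistics in this post.
snapshot_run = "Load: Source 60% > Proxy 71% > Network 82% > Target 29%"
san_run = "Load: Source 99% > Proxy 16% > Network 6% > Target 1%"

print("storage snapshot run:", busiest(snapshot_run))
print("[san] run:", busiest(san_run))
print("rate ratio (907 / 431):", round(907 / 431, 2))
```

In the snapshot run the work is spread across the stages with Network busiest, while in the [san] run everything downstream of Source is mostly idle, which is what points at the read path itself.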

Seeing that the processing rate is roughly half and the reported bottleneck is Source, I set about trying to find the cause.

We already know from the storage snapshot job that the backup server can read directly from the storage quickly. I cannot measure that path with iperf, but the first Veeam job run shows the good rate.

iperf2 from the backup server storage network NIC to the VMware host storage vmk port IP shows ~7.5Gbps. I ran this with 4, 8 and 20 streams to get the 7.5 number. All runs were pretty close together.
iperf3 on a VMware guest to the backup server shows ~8Gbps.
iperf3 from the old backup server to the new one shows high-8Gbps. This is mostly irrelevant; I just did it to reinforce that the environment is capable.
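Since iperf reports gigabits per second and the Veeam job reports megabytes per second, a quick conversion puts the numbers on the same scale. This is a rough sketch that assumes decimal units (1 Gbps = 1000 Mbps, 8 bits per byte) and ignores iSCSI/TCP overhead; the job statistics may actually be in binary units, so treat the results as approximate. The helper names are made up for this illustration.

```python
def gbps_to_mb_per_s(gbps):
    """Convert gigabits per second to megabytes per second (decimal units assumed)."""
    return gbps * 1000 / 8

def mb_per_s_to_gbps(mb_per_s):
    """Convert megabytes per second to gigabits per second (decimal units assumed)."""
    return mb_per_s * 8 / 1000

# iperf2 result from the backup server to the host storage vmk port
print("7.5 Gbps is about", gbps_to_mb_per_s(7.5), "MB/s")

# Veeam job rates from the two test runs
for label, rate in [("storage snapshot", 907), ("[san]", 431)]:
    print(label, "job:", rate, "MB/s is about", round(mb_per_s_to_gbps(rate), 2), "Gbps")
```

Under those assumptions the storage snapshot job runs close to what iperf says the link can carry, while the [san] job uses less than half of it.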

I also ran diskspd from inside a guest to test storage disk access speed. It came out at 410MB/s. I am not sure this test is directly relevant to the Veeam conversation, but it is interesting because the number is close to the [san] Veeam job rate above.

So if the network rates from the Veeam server to the storage are good, and from the Veeam server to the VMware host storage NIC are good, what concept am I missing that explains Source load at 99% with a decreased job rate vs. the storage snapshot method?
Does it make sense that the in-guest storage access speed is essentially the same as the Veeam [san] job rate?

May 20, 2018 7:39 pm

Thanks for the request.

Direct SAN processes data through the VMware VDDK in a synchronous way, meaning that it asks for a single data block, waits until it is received, and then asks for the next one.

With Backup from Storage Snapshots, we bypass the VDDK and read asynchronously from the storage. We ask for a larger number of data blocks at the same time, which allows the storage controller to optimize the reads from disks and caches. Depending on the storage type and situation, you can see 2-5 times faster processing. Close to 2x is common, I would say.

We use asynchronous reads with Backup from Storage Snapshots, Direct NFS and Virtual Appliance mode (Hot-Add).
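The effect of synchronous versus asynchronous reads can be sketched with a simple back-of-the-envelope model: throughput is roughly the number of outstanding requests times the block size, divided by the round-trip time per request, capped by what the link can carry. The Python below is only an illustration of that idea; the block size, latency, queue depths and link cap are invented placeholder values, not measurements from this environment or figures from Veeam or VMware.

```python
def read_throughput_mb_s(block_mb, latency_ms, outstanding, link_mb_s):
    """Estimate read throughput in MB/s.

    block_mb    : size of each read request in MB (assumed value)
    latency_ms  : round-trip time for one request in ms (assumed value)
    outstanding : number of requests in flight at once
    link_mb_s   : upper bound imposed by the network link (assumed value)
    """
    uncapped = outstanding * block_mb * 1000 / latency_ms
    return min(uncapped, link_mb_s)

# Placeholder numbers chosen only to show the shape of the effect.
BLOCK_MB = 0.5
LATENCY_MS = 1.0
LINK_MB_S = 1000.0

# Synchronous: one block requested at a time.
sync_rate = read_throughput_mb_s(BLOCK_MB, LATENCY_MS, 1, LINK_MB_S)
# Asynchronous: several blocks in flight at once.
async_rate = read_throughput_mb_s(BLOCK_MB, LATENCY_MS, 4, LINK_MB_S)

print("synchronous, 1 in flight:", sync_rate, "MB/s")
print("asynchronous, 4 in flight:", async_rate, "MB/s")
```

With one request in flight, the rate is bound by per-request latency no matter how fast the network is, which would be consistent with Source showing 99% busy while Network sits nearly idle. With several requests in flight, the limit moves to the link or to the storage itself.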
