
VEEAM Active Full writes to Repo much slower than DiskSPD

Post by KarmaKuma »

When doing an active full backup to my NFS/SMB Repo (Dell Isilon/PowerScale Storage Cluster), a single backup/write task (one VM disk) only reaches about half the write throughput that DiskSPD achieves when running the Full/Incremental test pattern according to https://www.veeam.com/kb2014, i.e. "diskspd.exe -c25G -b2048K -w100 -Sh -d600 \\my\smb\repo\diskspd-test.dat", to simulate a Local Target (large blocks) type full backup.
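As a side note, a variation like the following could narrow the comparison down further (just a sketch on top of the KB2014 command; -t1 -o1 pins DiskSPD to one thread with a single outstanding I/O, which should be closer to one synchronous backup stream, and the UNC path is of course only a placeholder):

diskspd.exe -c25G -b2048K -w100 -Sh -t1 -o1 -d600 \\my\smb\repo\diskspd-single-task.dat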

I get somewhere between 300 and 350MB/s from a VEEAM Hot-Add Proxy, and around 700MB/s from DiskSPD run on the same Proxy VM.

The Repo is capable of ingesting between 1.8 and 2.6GB/s of VEEAM backup data, depending on other load and the VEEAM chunk size (Local Target / Local Target (large blocks)), when fed with multiple backup tasks concurrently across all Cluster Nodes. One Cluster Node maxes out when hitting its 10GbE NIC limit (dual-NIC tests have shown the limit to be pretty much 1.3GB/s per Node), so it is not bottoming out at all under a single backup task/stream.

When doing our SAP DB dumps, we regularly see one dump stream pretty much saturating a Cluster Node's 10GbE link... Same with a single File Copy/Paste in Explorer (I know that's not a benchmark, but it's a useful data point in this case; we are forcing SMB3 CA with Write-Through and NFS with Sync on all backup Shares/Exports, so no lazy writes, early commits and the like...).

Also, the VEEAM Proxies are not bottoming out here either, as I can have them push between 900MB/s and 1.3GB/s per Proxy to the Repo with multiple concurrent tasks and easily max out the 10GbE NIC of a single Repo Cluster Node.

I also checked SMB Multichannel between Proxy and Repo. It is working for both VEEAM and DiskSPD: both distribute the load quite evenly across the two NICs on the Proxy (and PowerShell Get-SmbMultichannelConnection shows multiple concurrent SMB connections in both cases). CPU load, including distribution across cores, memory usage, etc. on the Proxy all look fine and well behaved. Same for the Repo, by the way.
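For anyone who wants to repeat that check, this is roughly the kind of thing I looked at (a minimal sketch, nothing Veeam-specific):

# One row per NIC/connection pair actively used by SMB Multichannel
Get-SmbMultichannelConnection

# Current SMB connections with their dialect (expecting SMB 3.x) and open handles
Get-SmbConnection | Format-Table ServerName, ShareName, Dialect, NumOpens

# Confirm multichannel is not disabled on the client side
Get-SmbClientConfiguration | Select-Object EnableMultiChannel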

What could be the reason that one single VEEAM task tops out somewhere between 300 and 350MB/s of real writes to a Repo Cluster Node, when multiple concurrent tasks from one Proxy to the same Cluster Node can max out that node's 10GbE NIC at roughly 950-1,100MB/s? By the way, the job's bottleneck statistics report Target (90%+) in all cases, be it a single task at 300MB/s or multiple tasks at 1GB/s. Source and Network are both capable of 1.5GB/s and more, as seen during "high dedupe rate" tasks where the dedupe rate keeps the load on the Target at said 300-350MB/s per task...
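Just to spell out the arithmetic as I read it (rough numbers for illustration only, e.g. with three concurrent tasks):

one task:               ~330MB/s
three concurrent tasks: ~3 x 330MB/s ≈ 1,000MB/s  (≈ the observed 950-1,100MB/s, i.e. roughly 10GbE wire speed of ~1.25GB/s)

So aggregate throughput scales almost linearly with the task count until the NIC saturates, which is why this looks to me like a per-stream ceiling rather than a storage or network limit.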

I know, this is complaining on a high level. But, you know =)

Re: VEEAM Active Full writes to Repo much slower than DiskSPD

Post by Gostev » 1 person likes this post

Diskspd is only meant to show what the target storage is capable of under a synthetic streaming workload of perfectly aligned blocks of constant size. No one ever said Veeam will perform equally well writing to its transactional backup storage format, which contains not just the actual data blocks (which are also of variable size) but also redundant metadata banks with data block digests (used for deduplication). For example, diskspd does not need to go back and forth to update those metadata banks periodically (every 100MB written, if I'm not mistaken).
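A rough illustration, taking that ~100MB figure at face value: at 300-350MB/s a single task would hit such a metadata update 3-3.5 times per second, and on a write-through SMB/NFS share each of those round trips costs the stream some extra milliseconds of seeking back and committing, while diskspd just keeps streaming forward.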

I'd expect diskspd to always be faster even just because diskspd does not issue periodic flush commands. How much faster depends on how long it takes the target storage to react to the flush command. Our support engineers can enable performance debug logs and tell you exactly how much time is spent on each I/O operation. Sometimes we uncover some really weird storage-specific stuff thanks to these logs.
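If you want a rough feel for that flush cost before support gets involved, a quick probe along these lines is enough (PowerShell sketch only; the UNC path is a placeholder and this is not what Veeam does internally, it simply times an explicit flush-to-disk after every ~100MB written to the share):

# Write 100MB at a time, then time how long the share takes to acknowledge a flush-to-disk
$path  = "\\my\smb\repo\flush-test.dat"      # placeholder path on the repository share
$block = New-Object byte[] (2MB)
$fs    = [System.IO.File]::OpenWrite($path)
$sw    = [System.Diagnostics.Stopwatch]::new()
for ($i = 0; $i -lt 10; $i++) {
    for ($j = 0; $j -lt 50; $j++) { $fs.Write($block, 0, $block.Length) }  # 50 x 2MB = 100MB
    $sw.Restart(); $fs.Flush($true); $sw.Stop()                            # flush to disk, not just to the OS cache
    "Flush {0}: {1} ms" -f $i, $sw.ElapsedMilliseconds
}
$fs.Dispose(); Remove-Item $path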

Another cause of the difference could be well-compressible data: while diskspd always writes 2MB blocks, Veeam could be writing significantly smaller ones post-compression.
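That factor is easy to sanity-check with the same KB2014 command, just with a smaller block size to approximate post-compression writes (illustrative only; -b1024K assumes roughly 2:1 compression of the 2MB blocks):

diskspd.exe -c25G -b1024K -w100 -Sh -d600 \\my\smb\repo\diskspd-test.dat

If the smaller blocks alone cost a few hundred MB/s per stream on this storage, that already explains a good part of the gap.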

In any case, this is clearly an environment-specific issue, and as such it should be handled by our technical support. Please don't just ignore the forum rules displayed when you click New Topic; they explain all this in great detail. Thank you for your understanding!

Re: VEEAM Active Full writes to Repo much slower than DiskSPD

Post by KarmaKuma »

It was not my intention to break the forum rules, sorry about that. With such a huge throughput difference, I believed I might have missed a certain config spot (one that might be quite obvious to someone else) or a limitation with respect to SMB/NFS Repos (something like a default maximum write rate per task/stream to reduce the risk of data errors from component overload for users writing via lower-shelf network devices to lower-shelf NAS boxes) that would lead to such behavior. Especially since, every now and then in this forum, pros and experts have OPs compare VEEAM throughput against DiskSPD results to see what the Repo is actually capable of, before jumping to false conclusions and unnecessary VEEAM "tuning".

Again, sorry for apparently breaking the forum rules.

Re: VEEAM Active Full writes to Repo much slower than DiskSPD

Post by Gostev »

Indeed, diskspd has been a perfect support tool for demonstrating to customers with zero storage knowledge that their low-end NAS has extremely poor performance despite the massive capacity offered by its few modern hard drives, saving them time on unnecessary VEEAM "tuning". But it's not so good for the opposite scenario, simply because diskspd does not do the things which start to matter with high-performance storage.

This is why I would highly recommend starting by having our technical support enable and analyze "performance debug" logs. Because if your storage can accept "perfect" data twice as fast, and assuming Veeam compresses the protected data to about the same block sizes, then the rest of the time can only be spent by Veeam waiting for some other I/O operations to complete (those which diskspd does not perform in principle). The performance debug log analysis helps to pinpoint those I/O operations by laying out how much time was spent on each operation type in total, which makes it very easy to see the main "offenders".
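To put your own numbers into that frame (rough illustration, assuming comparable ~2MB writes in both cases):

diskspd:      2MB / ~700MB/s ≈ 2.9 ms per block
Veeam task:   2MB / ~330MB/s ≈ 6.1 ms per block
difference:                  ≈ 3.2 ms per block

Those extra ~3 ms per block are spent on something other than pushing data (metadata updates, flushes, waiting on commits), and that is exactly the kind of per-operation breakdown the performance debug logs provide.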

Re: VEEAM Active Full writes to Repo much slower than DiskSPD

Post by KarmaKuma »

Thanks for the input Gostev!

I'll open a support case then - I need to have my test license extended anyway, as we are still in the evaluation phase =)

And will update this thread with news when ready...

Re: VEEAM Active Full writes to Repo much slower than DiskSPD

Post by KarmaKuma » 1 person likes this post

Support Case opened

Case number: #05296567