Maximize Performance for large Hyper-V Backups

Post by **dasfliege** » Jan 15, 2019 9:59 am

We have a pretty big Hyper-V environment (700 VMs), which we back up with Veeam. We're already pretty limited in the number of backups we can take during the day, as the backup tasks are running almost continuously. As we plan to more than double the number of VMs in the Hyper-V farm, we need some way to improve the performance.
We used on-host backup for 3 years, as off-host wasn't working well with our Nimble storage. Since the very latest release of the Nimble hardware VSS provider, off-host backups are working well. However, they seem to be slower than on-host. We had ~200Mb/s with on-host and ~120Mb/s with off-host backup.
I'll try to describe our environment below as precisely as possible and would be thankful for any input from people who have similar-sized environments and have struggled with performance "issues". I know that it isn't really an "issue", as we have pretty good throughput. But as we also have pretty big vSphere environments where we use Veeam, I know that even more throughput is possible if the underlying infrastructure supports it. So I'm basically looking for best practices for Hyper-V that could suit our situation and help speed up our backups.

We run two datacenters at two different sites, but they are equipped identically, so I describe only one site below:

Storage: HPE Nimble CS3000, 80TB HDD, 3TB SSD (Cache), 2x 10Gbit iSCSI
Hyper-V: 6x HPE ProLiant DL360 Gen9/10, 512GB RAM, 2x 10Gbit iSCSI
Backup Proxy AND Repository: HPE ProLiant DL380 Gen8, 16 Cores, 192GB RAM, 2x 10Gbit iSCSI, Dual 6Gbit-SAS, 16 concurrent tasks configured on proxy and repo
Backup Disks: JBOD 180TB (60x MDL SAS in Windows Storage Spaces), ReFS 64k, Attached via dual 6Gbit-SAS to the above Repo-Server
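
To put the reported rates next to the hardware listed above, here is a rough back-of-the-envelope sketch. It assumes the quoted "Mb/s" figures are megabytes per second (which is how they are read later in the thread), that both 10Gbit iSCSI links can be used at once via multipathing, and it ignores protocol overhead, so the ceiling is optimistic. The function and variable names are made up for this illustration.

```python
def gbit_to_mbyte_per_s(gbit):
    """Convert a raw link speed in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit * 1000 / 8

# Assumption: both 10Gbit iSCSI links carry traffic in parallel (multipath).
iscsi_ceiling = 2 * gbit_to_mbyte_per_s(10)

# Assumption: the rates quoted in the post are megabytes per second.
observed = {"on-host": 200, "off-host": 120}

for mode, rate in observed.items():
    share = rate / iscsi_ceiling
    print(f"{mode}: {rate} MB/s is about {share:.0%} of the {iscsi_ceiling:.0f} MB/s raw iSCSI ceiling")
```

Under these assumptions both modes use well under 10 percent of the raw iSCSI line rate, which suggests the network links themselves are unlikely to be the limit.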

Backup Jobs: ~30 jobs with 10-30 VMs each. VMs in the same job also reside on the same Hyper-V host, so we can leverage the "process multiple VM per snapshot" function. Each job runs 3 times a day.

GFS Jobs: Every backup job has a separate GFS job. The GFS jobs run once a day and use ReFS block clone.

I'm happy about any input that could help improve our backup performance. But I also have a few specific questions that I would be glad to have answered:
- Is it expected behavior that (on Hyper-V) backups from storage snapshots (off-host) are slower than on-host backups in our configuration? That said, should we stay with on-host, as long as we do not encounter any performance drops on our production hosts during backup times?
- If we use the same physical server (with 16 physical cores) for repo and proxy, should we limit concurrent tasks for repo and proxy to 8 each, or can we go with 16 each?
- The bottleneck is shown as "source" most of the time, regardless of whether we use on- or off-host backups. I doubt that this is true, as our Nimble storage isn't busy at all. Is there any way to track down the bottleneck any further?
- Are there any optimizations I could consider in this specific scenario?

Thanks in advance for any input

Post by **HannesK** » Jan 15, 2019 3:56 pm

Hello,
just to make sure I understood you correctly:
1) do you really have more than 100 VMs per host?
2) if yes, is it possible that the CPU load on the Hyper-V hosts is already very high?

> Is it expected behavior that (on Hyper-V) backups from storage snapshots (off-host) are slower than on-host backups in our configuration?

Not in general. In general, the combination of hardware VSS providers and storage firmware is just not stable. That's why I never recommend an off-host proxy. I have heard of people for whom it works, but usually it becomes a mess sooner or later. Speed should be good, depending on the number of tasks you configured.

> 80TB HDD, 3TB SSD (Cache)

The amount of cache looks small to me compared with the amount of data. How is your storage load? You mention that the bottleneck analysis says "source". Is it "always" 99%?

> If we use the same physical server (with 16 physical cores) for repo and proxy, should we limit concurrent tasks for repo and proxy to 8 each, or can we go with 16 each?

As long as you don't use the off-host proxy, the proxy runs directly on the Hyper-V hosts. That said, I would go for 16 or more repository tasks.

> Are there any optimizations I could consider in this specific scenario?

You could open a case and ask support for help (please post the case ID here). But if the storage / Hyper-V hosts are overloaded, they will not be able to help.

I have seen a Hyper-V environment with ~700 VMs on 30 hosts running at good speed, so it is not a general issue.

Best regards,
Hannes

Post by **dasfliege** » Jan 16, 2019 8:17 am

Hi Hannes

> 1) do you really have more than 100 VMs per host?
> 2) if yes, is it possible that the CPU load on the Hyper-V hosts is already very high?

No. The 700 VMs are split over two sites. Each site has 6 hosts, so we have an average of 50-60 VMs per host. CPU load is pretty low. We run only a small number of resource-hungry VMs. Most of the machines are AD, Print, Exchange, SQL, etc...
We have separate hosts for VDI, which produce more load but aren't part of the Veeam-protected environment.

> not in general. In general the hardware-vss providers or storage firmware in combination are just not stable. That's why I never recommend off-host proxy. I have heard of people where it works, but usually it becomes a mess sooner or later. Speed should be good depending on the number of tasks you configured.

That's what I heard at VeeamON, and I was pretty surprised, because 1-2 years ago the best practice was always to use off-host whenever possible. But I understand that it's a mess if you have to rely on storage vendors and Microsoft to keep their stuff stable.

> the amount of cache compared with the amount of data looks small to me. How is your storage load? You mention that bottleneck analysis says "source". Is it "always" 99% ?

Storage load is decent. Nimble only keeps hot blocks in the cache and we have a cache hit rate between 90 and 98 percent. So that works fine. I guess backup traffic doesn't really hit hot blocks, as it's mainly a sequential stream and not random I/O. Bottleneck analysis says the following:
Busy: Source 60% > Proxy 9% > Network 28% > Target 47%
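
For anyone collecting these statistics across many job runs, here is a small sketch that turns a line in the format shown above into per-stage numbers. It is not a Veeam API; `parse_busy` is a hypothetical helper, and it assumes the line has been copied out of the job statistics by hand or from a report.

```python
import re

def parse_busy(line):
    """Parse a 'Busy: Source 60% > Proxy 9% > ...' line into {stage: percent}."""
    # Each stage appears as '<name> <number>%'; the leading 'Busy:' has no number and is skipped.
    return {name: int(pct) for name, pct in re.findall(r"(\w+)\s+(\d+)%", line)}

stats = parse_busy("Busy: Source 60% > Proxy 9% > Network 28% > Target 47%")
busiest = max(stats, key=stats.get)

print(stats)
print(f"busiest stage: {busiest} ({stats[busiest]}%)")
```

Here the busiest stage is the source at 60%, with the target at 47%, so none of the four stages is reported as close to fully busy.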

As I said above: I don't think that it is an issue, as everything is working fine at the moment. Therefore I don't need to open a support case. But as we're expanding to something around 2000 VMs and some more hosts, we will run into problems at some point. I'm looking for any hints that help us avoid that. I guess, based on everything I've heard so far, we will switch back to on-host backups instead of off-host. But maybe someone has some more things we could check and improve, to be ready to expand our farm...

Post by **HannesK** » Jan 16, 2019 9:08 am

Hello,
Hmm, the bottleneck analysis looks good, so more speed should be possible. Maybe more parallel tasks (repo and proxy, directly on the Hyper-V host) could help. For 12 hosts processed in parallel I would expect more than 200 MByte/s. That said, if many VMs have a low change rate, the reported speed looks quite slow because the overhead of creating a snapshot etc. takes a long time compared with the time spent actually transferring data.
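
The effect of fixed per-task overhead on the reported rate can be sketched with a simple model: average rate = changed data / (overhead + transfer time). All numbers below are hypothetical and only illustrate the shape of the effect; they are not measurements from this environment.

```python
def effective_rate(changed_mb, transfer_mb_s, overhead_s):
    """Average MB/s over a whole VM task, including fixed per-task overhead."""
    transfer_s = changed_mb / transfer_mb_s
    return changed_mb / (overhead_s + transfer_s)

# Hypothetical: 500 MB/s while data is flowing, 60 s of snapshot overhead per task.
for changed_mb in (500, 5_000, 50_000):
    rate = effective_rate(changed_mb, transfer_mb_s=500, overhead_s=60)
    print(f"{changed_mb:>6} MB changed -> {rate:6.1f} MB/s effective")
```

With these made-up inputs, a VM with only 500 MB of changes averages under 10 MB/s even though the pipe delivers 500 MB/s while data is moving, which is why running more tasks in parallel can raise the aggregate rate.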

Another reason for slow speed could be that changed block tracking does not work properly. I have seen that during the transition from Hyper-V 2012R2 to 2016. This should be shown in the logs (but it usually results in higher values for the "source" bottleneck).

Best regards,
Hannes

Post by **dasfliege** » Jan 16, 2019 1:13 pm

> Maybe more parallel tasks (repo and proxy, directly on the Hyper-V host) could help.

I will give that a try and see how it behaves...

Yes, I also think the snapshot creation and mount take way too much time compared to the actual backup task. Especially if more than one snapshot needs to be taken, because the VMs are on different hosts or there are more than 8 VMs per job. CBT seems to work properly, even though we migrated our hosts from 2012R2 to 2016 recently.

What is a real-world best practice for the task limit in on-host mode? We used "4" parallel tasks per Hyper-V host before we switched to off-host. Can we go higher there? Our hosts have 24 physical cores (10-20% busy) and 512GB RAM (50-70% used).

Post by **HannesK** » Jan 16, 2019 4:12 pm

I'm not aware of an official best practice, as the load is very different from customer to customer. I would raise the values slowly and wait for a week before raising them again. On the repository you could raise them faster, as it does not really influence production.
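
As a way to plan such a gradual increase, here is a minimal sketch that lays out a schedule with one change per week. The starting limit of 4 comes from the question above; the target of 10, the step of 2 and the start date are purely hypothetical, and `ramp_plan` is a made-up helper. The values themselves would still be set by hand in the host's task limit setting, and a step should be skipped or reverted if host load climbs.

```python
from datetime import date, timedelta

def ramp_plan(start, current, target, step=2, soak_days=7):
    """Return (date, task_limit) pairs, raising the limit by `step` after each soak period."""
    plan = []
    when, limit = start, current
    while limit < target:
        limit = min(limit + step, target)  # never overshoot the target
        plan.append((when, limit))
        when += timedelta(days=soak_days)
    return plan

# Hypothetical example: start at 4 tasks per host, aim for 10.
for when, limit in ramp_plan(date(2019, 1, 21), current=4, target=10):
    print(when.isoformat(), "-> set per-host task limit to", limit)
```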

Post by **Mike Resseler** » Jan 17, 2019 6:27 am

One additional item.
I kind of agree and disagree with Hannes about the unstable hardware VSS. I have seen many larger deployments that use it very effectively. The trick, however, is making sure that the exact firmware of the different components (storage / network cards / ...) matches the VSS provider and that even the software updates are all alike. Which, as you know, isn't an easy task to achieve.

I heard from the admins of those larger deployments that it took them quite some time and discussion with the hardware vendor to get a good and stable baseline matrix for firmware, software and so on. Once they had...

One thing you didn't mention (or I missed it). Are you using Hyper-V 2016? Or lower?
