We have been using on-host backup for 3 years, as off-host wasn't working well with our Nimble storage. Since the very latest release of the Nimble hardware VSS provider, off-host backups are working well. However, they seem to be slower than on-host: we got ~200Mb/s with on-host and ~120Mb/s with off-host backup.
I'll try to describe our environment below as precisely as possible, and I would be thankful for any input from people who have similar-sized environments and were struggling with performance "issues". I know that it is not really an "issue", as we have pretty good throughput. But as we also have pretty big vSphere environments where we use Veeam, I know that even more throughput is possible if the underlying infrastructure supports it. So I'm basically looking for best practices for Hyper-V that could suit our situation and help speed up our backups.
We run two datacenters on two different sites, but they are equipped identically, so I describe only one site below:
Storage: HPE Nimble CS3000, 80TB HDD, 3TB SSD (Cache), 2x 10Gbit iSCSI
Hyper-V: 6x HPE ProLiant DL360 Gen9/10, 512GB RAM, 2x 10Gbit iSCSI
Backup Proxy AND Repository: HPE ProLiant DL380 Gen8, 16 Cores, 192GB RAM, 2x 10Gbit iSCSI, Dual 6Gbit-SAS, 16 concurrent tasks configured on proxy and repo
Backup Disks: JBOD 180TB (60x MDL SAS in Windows Storage Spaces), ReFS 64k, Attached via dual 6Gbit-SAS to the above Repo-Server
Backup Jobs: ~30 jobs with 10-30 VMs each. VMs that are in the same job also reside on the same Hyper-V host, so we can leverage the "process multiple VMs per snapshot" function. Each job runs 3 times a day.
GFS Jobs: Every backup job has a separate GFS job. The GFS jobs run once a day and utilize ReFS block clone.
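For context, here is a rough back-of-the-envelope sketch of how far the observed throughput is from the nominal link rates listed above. It assumes the throughput figures are megabytes per second, treats "Dual 6Gbit-SAS" as 2x 6 Gbit/s raw, and ignores all protocol and encoding overhead. The function and variable names are made up for illustration only.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a raw line rate in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbit_per_s * 1000 / 8


def utilization(observed_mb_s: float, link_gbit_per_s: float) -> float:
    """Fraction of the nominal link rate that the observed throughput uses."""
    return observed_mb_s / gbit_to_mbyte_per_s(link_gbit_per_s)


# Nominal aggregate link rates in Gbit/s, taken from the hardware list above.
links = {
    "iSCSI (2x 10Gbit)": 2 * 10,
    "SAS (2x 6Gbit, assumed raw)": 2 * 6,
}

# Observed job throughput, ASSUMED to be in MB/s.
observed = {"on-host": 200, "off-host": 120}

for mode, mb_s in observed.items():
    for link, gbit in links.items():
        share = utilization(mb_s, gbit)
        print(f"{mode:8s} {mb_s:4d} MB/s = {share:6.1%} of {link}")
```

Under these assumptions, both modes stay well below 10% of the nominal iSCSI rate, which is why I suspect the limit is somewhere other than the raw links.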
I'm happy about any input that could help improve our backup performance. But I also have a few specific questions that I would be glad to get answers to:
- Is it expected behavior that (on Hyper-V) backups from storage snapshots (off-host) are slower than on-host backups in our configuration? If so, should we stay with on-host, as long as we do not see any performance drops on our production hosts during backup times?
- If we use the same physical server (with 16 physical cores) for repo and proxy, should we limit concurrent tasks for repo and proxy to 8 each, or can we go with 16 each?
- The bottleneck is shown as "source" most of the time, regardless of whether we use on-host or off-host backups. I doubt that this is accurate, as our Nimble storage isn't busy at all. Is there any way to track down the bottleneck further?
- Are there any optimizations I could consider in this specific scenario?
Thanks in advance for any input
