We have a cluster with 12 nodes, 12 CSVs, and around 400 VMs. We also have other clusters with various configurations, some of which use SoFS SMB storage and off-host proxy configured on the jobs.
Off-host proxy lists will distribute tasks to stream each individual VHDx to any of the proxies based on usage. For example, a VM with 3 VHDs gets a snapshot taken by the host running it, and then three off-host proxies might each pick one of the VHDs to actually process and send to the repositories. This behavior is very efficient, and jobs rarely sit waiting between snapshot and data transfer.
Clusters with CSV storage, on the other hand, follow the same process up to the snapshot, but when it comes time to transfer the backup data, all VHDs are streamed by the CSV owner. This creates an extreme bottleneck in the whole process. Yes, we can increase the max thread count on our Hyper-V host proxies, but that only pushes the issue back a bit. Even with 8-10 tasks per host, we often see a job waiting between snapshot and data transfer. If we need to back up three VMs with 4 VHDs each, all on the same CSV, the job has to wait unless we set 12 max threads on the host (a bit higher than recommendations), even though every node in the cluster has access to the CSV and could easily split the load. Leaving available resources idle is bad enough, but the bigger issue is that to get maximal job optimization, the CSVs need to be distributed evenly across the hosts.
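To make the arithmetic concrete, here is a rough back-of-the-envelope model (the function and numbers are mine, purely illustrative, not anything from Veeam's internals) of how many sequential "waves" the 3-VM/12-VHD example above needs when streaming is pinned to the CSV owner versus spread across every node that can read the CSV:

```python
import math

def waves(total_streams, concurrent_streams):
    """Sequential passes needed to move all VHD streams."""
    return math.ceil(total_streams / concurrent_streams)

vms, vhds_per_vm = 3, 4          # the example from the post
streams = vms * vhds_per_vm      # 12 VHD streams, all on one CSV
max_tasks_per_host = 8           # a typical per-proxy task limit
nodes = 12                       # every node has access to the CSV

# Current behavior: only the CSV owner streams the data.
owner_only = waves(streams, max_tasks_per_host)

# Requested behavior: any node with CSV access can take a stream.
distributed = waves(streams, max_tasks_per_host * nodes)

print(owner_only, distributed)  # 2 1
```

With owner-only streaming the job needs two waves (and the second wave may be badly underfilled); with distributed streaming the same work fits in one, and the gap only widens as VM counts grow.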
This isn't as big an issue when the CSV count matches the host count (at least across our nightly backup window; it still hurts when you want to quickly back up a handful of VMs), since failover clustering will balance the CSV owner role across the cluster. But on some of our smaller clusters we have two or three times as many hosts as CSVs, so whichever hosts own the CSV role are always the bottleneck. And on one of our clusters we have twice as many CSVs as hosts due to its architecture, but half of the CSVs don't contain VMs that need to be backed up. That cluster is interesting: after a rolling reboot for updates, the few CSVs we do back up sometimes end up doubled up on the same hosts, making the situation even worse.
Not sure if it's relevant, since I don't think the behavior is different with RCT on WS2016, but all our main clusters are running WS2012R2.
TL;DR - Please allow CSV-based cluster backups to distribute VHD processing across the cluster instead of confining it to the CSV coordinator. I'm not aware of a technical limitation preventing this; if there is one, please let me know.