The setup of my jobs is:
App Group 1 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
App Group 2 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
App Group 3 ---> Auto Proxy ---> Using CIFS Share Direct to StoreOnce NAS Share (Limited to 12)
And so on ...
My two proxies are HP BL460c Gen8 servers with 2 x 8-core Intel Xeons, 128GB RAM, an HP FlexFabric LOM (2 x 10GbE uplinks in a Microsoft Switch Independent Team) and an HP QMH2572 (dual-port 8Gb Fibre Channel HBA for connection to the 3PAR), running Windows Server 2012 R2. The proxies can handle 16 concurrent tasks each, and we leave it set at that (everything bottlenecks at the lowest concurrency across your proxies and your repository, so the 16 never gets reached anyway).
The StoreOnce we have added directly to Veeam as a CIFS share. I did ask on these forums whether I should remove the repository and re-add it as an HP StoreOnce in version 8, and Veeam replied (I believe it was Anton Gostev) that in v8 adding a StoreOnce as an HP device doesn't really change anything, it just sets the default settings (in v9 this will change so you can add a Catalyst store as a repository), so we have left the repository as a CIFS share. There was no benefit to re-adding it as it doesn't change how Veeam interacts with it.
I have the concurrency on the CIFS repository set to 12, so this is the concurrency limit the proxies will not exceed.
I just checked my StoreOnces, and I too am now using StoreOnce OS 3.13.0 (we upgraded a week or so ago).
All of my jobs start at the same time and I often see VMs in jobs saying they are waiting for resource availability. This is due to the max concurrency of the StoreOnce repository being set to 12 while the two proxies combined are able to process 32 disks. The 3PAR storage snapshot has already been taken by that point, so it's not as though this is placing an unnecessary load on the ESXi clusters, and I don't worry about seeing this; the VMs will process in due time. Once that 3PAR snapshot is done I know I have a point-in-time snap of the VMs, so even if they weren't processed for 12 hours or a day I'm still backing up the VMs from the point in time the 3PAR snapshot was created.
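To put numbers on that, here's a minimal sketch of the arithmetic (illustrative only; this is not how Veeam's scheduler is implemented, just the limit it effectively enforces, and the 50-disk figure is a made-up example):

```python
# Effective concurrency is capped by the lowest limit in the chain
# (illustrative numbers from our setup, not Veeam internals).

proxy_task_limits = [16, 16]   # two proxies, 16 concurrent tasks each
repository_limit = 12          # concurrency limit on the CIFS repository

total_proxy_capacity = sum(proxy_task_limits)                         # 32 disks in theory
effective_concurrency = min(total_proxy_capacity, repository_limit)   # 12 in practice

disks_at_job_start = 50        # hypothetical number of disks across all jobs
queued = disks_at_job_start - effective_concurrency

print(f"Running: {effective_concurrency}, waiting for resource availability: {queued}")
```

Everything above 12 just queues, which is exactly that "waiting for resource availability" message.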
I can, sort of, understand limiting the concurrency on the StoreOnce to 1. This is an HP recommendation for a specific reason: to get the best possible dedupe, HP recommend limiting the inbound data to a StoreOnce to single streams. I'm not quite sure how this technically makes the dedupe any better, other than that the StoreOnce receives one full block of data, stores it, and then moves on to the next one. With multiple streams it could receive parts of one stream and parts of another, and whilst they may be identical it hasn't received the full block of data yet, so it may not match that data. However, in my testing I have not seen an overall increase in performance on the StoreOnce by setting the concurrency to 1. All I saw was a similar overall time to back everything up, just with the VMs sitting there saying "waiting for infrastructure availability" for longer.
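If you want to see that theory in action, here's a toy model of it. This is my own simplification with fixed-size chunks, not what a StoreOnce actually does internally; it just visualises how interleaving two identical streams can shift chunk boundaries and kill matches:

```python
# Toy dedupe model: fixed-size chunks, hashed and counted. NOT StoreOnce
# internals -- just an illustration of the "interleaving hurts" idea.
import hashlib
import random

CHUNK = 4096

def unique_chunks(data: bytes) -> int:
    return len({hashlib.sha1(data[i:i + CHUNK]).hexdigest()
                for i in range(0, len(data), CHUNK)})

random.seed(0)
stream = bytes(random.getrandbits(8) for _ in range(128 * 1024))  # one backup stream

# Two identical streams sent one after the other: the second copy dedupes fully.
sequential = stream + stream

# The same two streams interleaved in 1000-byte bursts: boundaries no longer align.
bursts = [stream[i:i + 1000] for i in range(0, len(stream), 1000)]
interleaved = b"".join(b + b for b in bursts)

print("sequential :", unique_chunks(sequential), "unique chunks")   # 32 of 64 -> 2:1 dedupe
print("interleaved:", unique_chunks(interleaved), "unique chunks")  # ~64 of 64 -> no dedupe
```

I assume the real appliance is much smarter than this (variable-length chunking per stream), which would go some way to explaining why I saw no real difference in my own testing.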
If you have the StoreOnce concurrency set to 1 then there is no benefit to selecting only one proxy, as even if you had ten proxies you'd still only have one of them sending data at once. I like having multiple proxies for the redundancy; I don't need to edit any jobs if one proxy fails. I also split my proxies across blade enclosures so I can still back up if I lose an entire blade chassis.
In terms of networking, we have the following:
HP StoreOnce - 2 x 10GbE in Bonding Mode 4 (802.3ad LACP, i.e. an EtherChannel/Port-Channel) - This means the StoreOnce can handle up to 2 simultaneous data streams, both at 10Gb/sec. There is no guarantee this will always happen, as you are reliant on the load balancing policy distributing traffic across both NICs using the source MAC hash, but it's the best we can achieve with the limited network control on a StoreOnce.
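For anyone unfamiliar with how that hashing decides which link a flow uses, a rough sketch (purely illustrative; the real hash lives in the bonding driver and switch ASIC, and real layer-2 policies typically XOR the source and destination MACs rather than using the source alone):

```python
# Simplified source-MAC-hash link selection (bonding mode 4 style). The MACs
# below are made-up examples, not our actual hardware addresses.

def pick_link(src_mac: str, num_links: int = 2) -> int:
    last_octet = int(src_mac.split(":")[-1], 16)
    return last_octet % num_links

# Two senders whose MACs differ in the last bit land on different 10GbE links:
print(pick_link("00:17:a4:77:00:2e"))  # -> 0
print(pick_link("00:17:a4:77:00:2f"))  # -> 1
# Senders that hash to the same value share a single link -- hence "no guarantee".
```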
HP Blade Enclosure - 2 x HP FlexFabric 10Gb/24-Port Modules; each module has 2 x 10GbE uplinks, so 4 x 10GbE uplinks per enclosure. These are also set up in Port-Channels on the Cisco switch (the two 10GbE ports in FlexFabric Module 1 are in Port-Channel A, and the two 10GbE ports in FlexFabric Module 2 are in Port-Channel B). We also have two Shared Uplink Sets, one for FlexFabric Module 1 and another for Module 2, and every VLAN is added to both Shared Uplink Sets. This is the only way HP support an Active/Active configuration for Virtual Connect modules. If you don't do this you end up in an Active/Standby configuration and half of your uplinks out of the enclosures are essentially useless unless you lose a Virtual Connect module; all traffic into the standby module is passed over to the active module and then out to the network, which is not what I personally want.
On the blade servers that act as the proxies, the Virtual Connect profiles are set up as follows:
NIC 1 ---> Force same vlan mapping as Shared Uplink Set 1 ---> Network from Shared Uplink Set 1
NIC 2 ---> Force same vlan mapping as Shared Uplink Set 2 ---> Network from Shared Uplink Set 2
Within Windows 2012 R2 we then create a network team using both 10GbE adapters and set the teaming mode to Switch Independent (this is because the blade servers cannot establish a port-channel with the Virtual Connects or the upstream Cisco switch; the Virtual Connects create that port-channel).
The only downside to this configuration is that if one of the Virtual Connects is offline or fails, the blade servers report a loss of one of their NICs, and that NIC can't pass its traffic over to the other Virtual Connect module. But I see this as low-risk, or even no risk at all, because if a Virtual Connect goes down in an Active/Standby configuration you still only get the throughput of one module, and even when everything is happy and online that's all you get anyway. Maximum performance whilst still having redundancy is what I go for.
Hope that helps ... sorry for the wall of text. It's kinda hard to explain this setup quickly.
The last point I wanted to make is that all of my backup jobs use the Forward Incremental processing mode and we perform an Active Full every Friday. We have approximately 25TB of VMs that we back up with the above setup, and we achieve this quite easily inside 4-5 hours each night for incremental runs, and usually within 14 hours for the full backup every Friday night. The only exception is our 6.4TB Exchange 2013 Mailbox Server VM, which takes about 20-24 hours to complete a full backup, although some of that time is usually spent waiting for resource availability due to the concurrency limits. We have no concerns with this.
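For context, the back-of-the-envelope throughput on that full window (I'm treating 25TB as binary terabytes and assuming the load spreads evenly across the 12 tasks, both my own assumptions):

```python
# Rough sustained throughput for the weekly full backup window.
TB = 1024**4                      # treating TB as TiB -- my assumption

full_bytes = 25 * TB
window_seconds = 14 * 3600

rate = full_bytes / window_seconds
print(f"~{rate / 1024**2:.0f} MB/s sustained overall")          # ~520 MB/s
print(f"~{rate / 1024**2 / 12:.0f} MB/s per concurrent task")   # ~43 MB/s at 12 tasks
```

That works out well under even a single 10GbE link, let alone the bonded pair, so on paper the network isn't the constraint here.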
We also have the exact same setup at another datacentre (we try to ensure our datacentres are replicas of each other) and we have the exact same experience at that site as well, so it's not something we've fluked; the setup so far appears solid.