
I've been using Veeam since its infancy (roughly 8 years), but over the past week I've come to realize how little I still know about Veeam and how best to configure it. Before diving into my question, here's a brief rundown of our environment: vSphere 5.5 U2, three main clusters of VMs that I back up and replicate using Veeam 8 U2, Nimble for production (VM/source) storage, and a VNX5300 for target (repository) storage.

How I've configured Veeam in years past:

- Proxy VMs (one per cluster), each with 4GB RAM and 4 cores, running 4 concurrent tasks.
- One Veeam VM for backup and one Veeam VM for replication at the remote site (the replication jobs are configured much like the backup jobs: same VMs, same data/size).
- VNX storage volumes added to vSphere as datastores; on each I create a VMDK consuming the whole volume, present it to the Veeam VM as an OS volume, and use those volumes as repositories.
- The proxies use NBD transport specifically because of occasional 'permissions' errors on a VM or two within a job, or snapshots not releasing from the proxies, when they were configured for hotadd.
- My replica Veeam VM has a similar configuration (we have a separate VNX for replica data).

Performance runs from 3MB/s to 30-40MB/s. Based on the numbers I've heard other folks get, that's horrible throughput. Still, jobs generally finish in under 2 hours, though sometimes longer. I use reverse incremental to minimize the storage footprint of the backup data. Jobs contain anywhere from 8 to 30 VMs and anywhere from 0.5TB to just under 5TB of total processed data. I have no doubt my throughput, speed, and job completion times can be improved, probably significantly. So, here I am.
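For context, this is the back-of-the-envelope math I use to judge those numbers (my own arithmetic, nothing Veeam-specific; it just divides processed data by a sustained rate, so the rates below are illustrative, not measured):

```python
# Rough backup-window arithmetic (illustrative only; assumes the reported
# processing rate is simply total processed data divided by job duration).

def hours_to_move(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes at rate_mb_s megabytes/second."""
    data_mb = data_tb * 1024 * 1024          # TB -> MB
    return data_mb / rate_mb_s / 3600        # seconds -> hours

for rate in (30, 200, 400):                  # MB/s figures mentioned above
    print(f"5 TB at {rate} MB/s   -> {hours_to_move(5, rate):5.1f} h")
    print(f"0.5 TB at {rate} MB/s -> {hours_to_move(0.5, rate):5.1f} h")
```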

Our goal is to move away from the VNX as a target and use Nimble for everything, and to get more than 7 restore points (i.e., more than 7 days' worth of data). That said, I don't mind starting from scratch, so I'll ask this: given a vSphere environment with 3 clusters of VMs and Nimble storage, how would you EXPLICITLY (details please) set up Veeam for best performance? Assume a blank slate: how to configure the proxies, how many, how many concurrent tasks, how to set up the repositories and where (on a proxy or on the Veeam VM) and their concurrent task limits, which Veeam features to turn on or off, and so on. I don't think there's any real performance difference between Direct SAN and hotadd, but I'm open to either.
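For the longer retention, this is the simple sizing estimate I've been working from (my own assumption: one full plus one rollback/increment per additional restore point, which is roughly how reverse incremental sits on disk; the change rate is a guess, not measured, and compression/dedupe are ignored):

```python
# Rough repository sizing for N restore points: one full backup plus
# (N - 1) incremental/rollback files. 10% daily change is a placeholder guess.

def repo_tb(full_tb: float, restore_points: int, daily_change: float = 0.10) -> float:
    """Estimated on-disk repository size in TB, ignoring compression/dedupe."""
    return full_tb + (restore_points - 1) * full_tb * daily_change

for points in (7, 14, 30):
    print(f"{points:2d} restore points of a 5 TB job -> ~{repo_tb(5, points):.1f} TB")
```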
I did a test configuration like this: a proxy VM with 4GB RAM and 4 cores, with 2 additional vNICs attached to the iSCSI port groups on a vSphere vDS (and 9000 MTU enabled in the guest OS NIC settings). I'm using Nimble Connection Manager (NCM), which, for those who don't know, is proprietary software you install in Windows (i.e., on the proxy and/or the Veeam server) that manages the guest OS software iSCSI initiator connections. I connected all my datastore volumes to the proxy via NCM but did not initialize them as Windows volumes (doing that would wipe out all my VMs!). I then carved out a new volume on the Nimble, presented it to the proxy, DID bring that one online as a Windows volume, and configured it in Veeam as a repository hosted on that proxy VM. The backup job was set to use that single proxy (transport mode set to automatic selection) and that repository as the target. Since the proxy has 4 cores, I set it to 4 concurrent tasks, and set the repository to 4 concurrent tasks as well.

On the initial seed run I hit around 400MB/s; my original job configuration only maxed out at just under 200MB/s on its initial run, if I recall correctly. But subsequent runs of the test job only hit anywhere from 20MB/s to 60MB/s. Why such a drop-off on the incremental runs? I asked a Veeam engineer last week and he said the backup speed simply hadn't had a chance to ramp up to multiple hundreds of MB/s before the VMs finished backing up. Hmm... I don't think I believe that, but I'm open to suggestions. And I believe this test job, as configured, is a true Direct SAN setup.

I got a suggestion from someone on Twitter who gets around 200MB/s on subsequent (incremental) runs using hotadd. His configuration is a bit like my test job: one proxy per host in the cluster; proxies connected to the datastores via NCM; each proxy with 4 cores and therefore 4 concurrent tasks; the repository configured directly on the Veeam VM (with the proxy role on the Veeam VM disabled), also connected via NCM; and the backup jobs set to automatic proxy selection, targeting the repository on the Veeam VM. The only latency I would expect with that setup is the network hop moving data from the proxies to the Veeam VM and its repository, but according to him that isn't a bottleneck at all.

So, given all this (sorry for the long post), what do you all suggest for a really fast backup (and replica) job configuration? One other note: our LAN and WAN are 10Gb, so the network "shouldn't" be a bottleneck, although I suspect there's a 1Gb link somewhere in the path slowing things down; I just haven't found where yet (the Windows guest OS on the proxy or the Veeam VM? I'm not sure what speed the vmxnet3 adapter runs at). The problem I see with my test job is that it can only really use the one proxy, at least for best performance. Sure, I can configure the job to use all my proxies (automatic selection), but only that one proxy is connected to the repository volume (I'd expect some kind of conflict/corruption if all the proxies were connected to the same volume at once?). So a 1-to-1 proxy-to-repository configuration doesn't seem very efficient to me.
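On the incremental drop-off, here's a rough model I've been toying with (my own assumption, not something Veeam confirmed): if the reported rate is changed data divided by total job time, and each VM carries a fixed chunk of overhead for snapshot create/remove and change tracking, then the reported MB/s on an incremental can look terrible even when the actual copy speed is fast. All the numbers in this sketch are guesses for illustration:

```python
# Toy model: reported rate = data moved / (transfer time + fixed per-VM overhead).
# Overhead, change rate, and copy speed are illustrative guesses, not measurements.

def reported_rate_mb_s(vms: int, data_per_vm_gb: float, change_rate: float,
                       copy_rate_mb_s: float, overhead_per_vm_s: float) -> float:
    changed_mb = vms * data_per_vm_gb * 1024 * change_rate   # data actually moved
    transfer_s = changed_mb / copy_rate_mb_s                  # time spent copying
    total_s = transfer_s + vms * overhead_per_vm_s            # plus snapshot/metadata time
    return changed_mb / total_s

# 20 VMs, 100 GB each, 2% daily change, 400 MB/s raw copy speed,
# and ~120 s of snapshot/metadata overhead per VM:
print(round(reported_rate_mb_s(20, 100, 0.02, 400, 120), 1), "MB/s reported")
```

With those made-up inputs the job "reports" roughly 16 MB/s even though the actual copy runs at 400 MB/s, which looks a lot like what I'm seeing; happy to be corrected if that's not how the rate is calculated.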
So, I think that's it - I think that's all my questions.

Thanks much all for any input/insight you can provide.
@coolsport00