We aren't nearly big enough to justify a NetApp. Our main storage is an HP MSA2040 with one 16x450GB 10krpm RAID6 volume and one 6x900GB 10krpm RAID6 volume. The first holds regular file data such as home drives, profiles and regular file shares. The second holds the VMs (Hyper-V in our case). Our backup device is an HP P2000 with 16x 1TB 7200rpm disks in RAID5. The backup volume has 2012R2 dedupe enabled. I am using an off-host proxy.
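For a rough idea of what those volumes come to, here is a minimal Python sketch of the usual parity arithmetic (RAID6 gives up two disks' worth of capacity, RAID5 one). The function name and the volume labels are made up for illustration, and it assumes no hot spares and ignores formatting overhead, so treat the numbers as upper bounds rather than what the arrays actually report.

```python
def usable_capacity_gb(disks: int, disk_gb: float, parity_disks: int) -> float:
    """Nominal usable capacity of a single parity RAID set, in GB.

    parity_disks is 1 for RAID5 and 2 for RAID6. Hot spares and
    filesystem overhead are NOT accounted for (assumption).
    """
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * disk_gb


# Hypothetical labels for the three volumes described in the post.
volumes = {
    "file data (16x450GB RAID6)": usable_capacity_gb(16, 450, 2),
    "VMs (6x900GB RAID6)": usable_capacity_gb(6, 900, 2),
    "backup (16x1TB RAID5)": usable_capacity_gb(16, 1000, 1),
}

for name, gb in volumes.items():
    print(f"{name}: about {gb:.0f} GB nominal")
```

So the VM volume is by far the smallest of the three, and the backup target is only around 15TB nominal before dedupe.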
While ours is a completely different setup (Hyper-V with HP entry-level SANs and just 4Gbps fiber, although dual path with round robin, so the actual rate is about 600MB/sec), my full jobs sustain about 200-300MB/sec. That's with only one job running at a time in order not to congest the storage, but with three or four volumes at a time. The MSA2040 has 4GB of cache (hardly useful when reading), the P2000 has 2GB of cache but no additional flash cache or anything. Our storage can't do dedupe by itself, but as said we DO have 2012R2 dedupe enabled. 2012R2 dedupe is post-process though, not inline. That means 2012R2 dedupe has absolutely no impact while the data is being written.
I can't believe our puny HP MSA2040 performs so much better than your NetApp with additional flash boards. In short, I think your issue is not, or at least should not be, the NetApp. As stated before here, check your network topology: are there 1Gb links between your hypervisors and proxy and/or between proxy and storage (if not on the same box)? Do you see high CPU load on specific processes on your Veeam box? If so, do you have compression or dedupe enabled in Veeam as well? If the latter is true, note that compression is not beneficial to dedupe, and deduping on three levels (NetApp, 2012R2 and Veeam) somewhat defeats the purpose.
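To show why a single 1Gb hop matters so much, here is a small Python sketch. It just takes the slowest link on the path and converts its nominal speed to MB/sec, ignoring encoding and protocol overhead, so real throughput will be lower. The function name and the example link speeds are assumptions for illustration, not anyone's actual topology.

```python
def path_ceiling_mb_s(link_speeds_gbps: list[float]) -> float:
    """Raw upper bound for a data path, in MB/sec.

    The slowest link sets the ceiling. Uses 1 Gbit/s = 1000 Mbit/s and
    8 bits per byte; protocol and encoding overhead are ignored.
    """
    if not link_speeds_gbps:
        raise ValueError("path needs at least one link")
    return min(link_speeds_gbps) * 1000 / 8


# Hypothetical path: 10Gb hypervisor uplink, one 1Gb hop, 10Gb to storage.
ceiling = path_ceiling_mb_s([10, 1, 10])
print(f"ceiling with a 1Gb hop: {ceiling:.0f} MB/sec")

# Compare against the low end of the 200-300 MB/sec quoted above.
print("can sustain 200 MB/sec:", ceiling >= 200)
```

With one 1Gb link anywhere between hypervisor, proxy and storage, the job cannot get past roughly 125MB/sec no matter how fast the arrays on either end are.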