We have a very similar setup to yours (HP blade servers, 3PAR, StoreOnce, etc.). My observations:
What is the maximum number of concurrent tasks you have configured for the Veeam repository? StoreOnce is great at ingesting data in a single stream, but not multiple streams at the same time on a NAS/CIFS share. The best solution we found was to limit the number of concurrent streams to 12 in the backup console (we found this value in an old StoreOnce User Guide as the maximum number of supported NAS/CIFS streams).
We have two proxies, each with two eight-core CPUs (16 cores each), so together they can process 32 VMDKs with parallel processing enabled. We have the StoreOnce added as a CIFS share. If we allow the proxies to flood the StoreOnce with 32 streams, we see the performance nose-dive. Limiting to 12 seemed to give us the best of both worlds.
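To make the effect of that limit concrete, here is a rough Python sketch of 32 proxy tasks queuing behind a 12-stream cap. It is only a toy model and not how Veeam schedules tasks internally: the semaphore, the function name run_tasks and the sleep that stands in for writing a VMDK are all assumptions made for illustration.

```python
import threading
import time

PROXY_SLOTS = 2 * 2 * 8      # two proxies, two eight-core CPUs each -> 32 tasks
REPOSITORY_LIMIT = 12        # concurrent task limit set on the repository


def run_tasks(total_tasks: int, limit: int, work_seconds: float = 0.01) -> int:
    """Run total_tasks workers, at most `limit` at once; return the peak seen."""
    gate = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    active = 0
    peak = 0

    def worker() -> None:
        nonlocal active, peak
        with gate:                      # blocks while `limit` streams are open
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(work_seconds)    # hypothetical stand-in for one VMDK write
            with lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(total_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


peak_streams = run_tasks(PROXY_SLOTS, REPOSITORY_LIMIT)
print(f"proxy slots: {PROXY_SLOTS}, peak streams at the repository: {peak_streams}")
```

All 32 tasks still complete; the extra ones simply wait for a free slot, so the repository never sees more than 12 streams at once.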
Also important is the StoreOnce OS version, as older versions had performance issues specifically with Veeam. I believe anything higher than 3.11.4 is best suited for Veeam backups. We are running 3.13.0 at present and have no issues (I believe 3.13.1 will be required for Veeam v9 and its StoreOnce Catalyst integration).
If I am running a backup job of a VM with a single VMDK, I can see upwards of 500MB/sec writing to the StoreOnce, as it is a single stream of inbound data. As soon as other VMDKs are backed up at the same time, the performance starts to drop. This is just the trade-off of using a dedupe appliance as the repository.
I have up to ten backup jobs all running at the same time, but with concurrent operations set to 12, so usually about 4-7 VMs are being processed at once, as each VM usually has two or more VMDKs. My jobs all average around 100-150MB/sec. Remember, that is up to 12 VMDKs being processed at the same time across a number of different jobs, and each one processes at 100-150MB/sec.
To get the highest throughput possible you can disable parallel processing in your jobs or at the Veeam server level, but I wouldn't recommend doing this. I would rather process all my VMs at the same time than see my stat lines show higher throughput. As with any file copy, copying 10 files at the same time gives lower per-file performance than copying one file on its own. It's a matter of finding the sweet spot, which for us was 12 concurrent tasks (that is with an HP StoreOnce 4430 with 2 x 10GbE and 36 x 2TB 7200RPM SATA disks).
As long as you are using the 3PAR snapshot integration, you have already offloaded the processing from your ESXi hosts, so if the 3PAR snapshots remain for, say, 20% longer whilst you process the data, it's not that big of a deal, as there is no load on the ESXi hosts during the backup. That's how I look at it. My Active Full backups on a weekend take about 24 hours to complete (some jobs finish in an hour or two, whilst our Exchange 2013 VM, which is 6.5TB, takes almost a full day), but because the processing is away from the ESXi host I don't really mind.
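For a sense of scale, the figures above can be turned into rough numbers. This is back-of-envelope arithmetic only: it assumes decimal units (1 TB = 1,000,000 MB), treats "almost a full day" as exactly 24 hours, and applies the illustrative 20% to the 24-hour window. The function names are made up for the example.

```python
def average_mb_per_sec(size_tb: float, hours: float) -> float:
    """Average rate in MB/sec, using decimal units (1 TB = 1,000,000 MB)."""
    return size_tb * 1_000_000 / (hours * 3600)


def stretched_hours(hours: float, extra_fraction: float) -> float:
    """Duration after running extra_fraction longer (0.20 means 20% longer)."""
    return hours * (1 + extra_fraction)


# Assumption: the 6.5TB Exchange VM takes exactly 24 hours.
exchange_rate = average_mb_per_sec(6.5, 24)
print(f"Exchange VM average: {exchange_rate:.1f} MB/sec")

# Assumption: the "say, 20% longer" figure applied to a 24-hour window.
longer = stretched_hours(24, 0.20)
print(f"24 hours stretched by 20%: {longer:.1f} hours")
```

Under those assumptions the Exchange VM works out to roughly 75 MB/sec on average, and a 24-hour window that runs 20% longer would hold the snapshots for just under 29 hours.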
When Veeam v9 arrives with its native StoreOnce Catalyst integration, I am hoping to see about a 3x improvement in backup performance. At the moment Veeam has to send every block to the StoreOnce, and the appliance then decides whether it has to write the block or can discard it. In v9 that dedupe is offloaded to your proxy, so only the blocks that need to be written are sent over to the StoreOnce. This should dramatically increase write performance, as it allows the StoreOnce to act like any old NAS array and just concentrate on writing data without doing the dedupe calculations.
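The difference between the two approaches can be sketched in a few lines of Python. This is a toy model, not the Catalyst protocol: the fixed 4096-byte block size, the use of SHA-256 and the DedupeStore class are all assumptions for illustration, and the small amount of traffic needed for hash lookups is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size, not a StoreOnce value


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeStore:
    """Toy appliance: keeps one copy of each unique block, keyed by hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.blocks

    def write(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block


def backup_target_side(data: bytes, store: DedupeStore) -> int:
    """Send every block; the appliance hashes and decides. Returns bytes sent."""
    sent = 0
    for block in split_blocks(data):
        sent += len(block)                          # block always crosses the wire
        digest = hashlib.sha256(block).hexdigest()  # hashing done on the appliance
        if not store.has(digest):
            store.write(digest, block)
    return sent


def backup_source_side(data: bytes, store: DedupeStore) -> int:
    """Hash on the proxy and send only unknown blocks. Returns bytes sent."""
    sent = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()  # hashing done on the proxy
        if not store.has(digest):                   # only a hash lookup is needed
            store.write(digest, block)
            sent += len(block)
    return sent


# Eight blocks, but only two distinct ones.
data = (b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE) * 4

sent_target = backup_target_side(data, DedupeStore())
sent_source = backup_source_side(data, DedupeStore())
print(f"target-side dedupe sent {sent_target} bytes, source-side sent {sent_source} bytes")
```

Both runs end up storing the same two unique blocks; the only difference is how much data had to travel to the appliance and where the hashing work was done.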