masonit wrote:Well I have already answered all your questions in the thread but I try again.
Oops, sorry about that. I guess that's what I get for answering a post first thing in the morning.

Please accept my sincere apology.
masonit wrote:As I said we don't limit anything on the repository but on proxy we have set 20 concurrent tasks. Last night I tried with 40 concurrent tasks but that didn't help. I could try with even more but 40 concurrent tasks should be enough to get enough streams. If we allow to many concurrent tasks we get another problem were there are alot of active snapshots in vmware. Because of the heavy load on the storage, backup take forever and eventually when the backup is done. Then removing the snapshots will use alot of I / O in vmware and thats never good.
So this is why I'm thinking it's taking longer. I'm assuming you have quite a number of jobs, i.e., enough that you were able to run 20 jobs concurrently. That would have created 20 I/O streams with only 20 VM snapshots open, and would likely have kept your repository busy pretty much 100% of the time (essentially confirmed by the bottleneck stats of 99%).
Now you have 20 tasks, which is completely different. If a job has 25 VMs, and some of the VMs have multiple disks, 20 tasks may very well keep only one job running. Even though that job is actively backing up 20 VM disks, they are all going to a single backup file, and thus only one I/O stream hits the repository.
To get maximum performance from your setup I would anticipate that you would need 4-6 I/O streams to fully saturate the backend storage. Where does this number come from? You have 17 disks in your array with a 512K stripe size. The typical I/O size for Veeam is 512K as well (assuming Local target), which means a typical Veeam I/O will be serviced by only a single drive in the array. Since you are using reverse incremental, each changed block creates three I/Os (one read, two writes), so that will also increase the number of disks used, since parity will need to be read and re-written. Overall, though, there's no way that a single job can saturate the spindles.
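If it helps, here's a quick back-of-the-envelope sketch of that reasoning (Python, purely illustrative; the per-spindle throughput figure is an assumption I made up for the example, not a measurement from your array):

[code]
# Rough estimate of how many concurrent I/O streams it takes to keep a
# 17-spindle array busy when each 512K Veeam I/O lands on a single stripe
# element (i.e. a single disk). DISK_MBPS is an assumed figure, for
# illustration only.

DISKS      = 17          # spindles in the array (from this thread)
STRIPE_KB  = 512         # stripe element size
IO_KB      = 512         # typical Veeam I/O size for "Local target"
DISK_MBPS  = 100         # assumed per-spindle throughput under this workload

assert IO_KB <= STRIPE_KB   # each I/O fits inside one stripe element -> one disk per I/O

array_ceiling = DISKS * DISK_MBPS        # best case for the whole array
one_stream    = DISK_MBPS                # one job = one stream = ~one spindle

print(f"Array ceiling        : ~{array_ceiling} MB/s")
print(f"Single job (1 stream): ~{one_stream} MB/s")
print(f"Streams to saturate  : ~{array_ceiling / one_stream:.0f} (pure sequential)")

# Reverse incremental turns each changed block into 1 read + 2 writes, plus
# the parity read/re-write, so in practice one stream keeps several disks
# busy at once -- which is why 4-6 streams, not 17, is the rule of thumb.
[/code]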
If I were you I would reorganize my repositories to get optimal performance from the storage array, as well as optimal parallel processing. For example's sake, here's what I'm talking about:
Non-Parallel Config
20 Backup Jobs
1 Repository - No Limits
1 Proxy - 20 tasks
With this config you could potentially have all 20 jobs running at the same time, each processing a single VM disk. This provides good performance and likely saturates the repository completely, but it also likely means VMs take longer than they should: there is contention for the repository, since it can't really handle I/O for all 20 jobs at once, so it becomes the bottleneck.
Parallel Config - non-Ideal
20 Backup Jobs
1 Repository - No Limits
1 Proxy - 20 tasks
This is basically what happens if you just tick the "Enable parallel processing" option. Now, even if you start all 20 backup jobs at the same time, only a small subset (perhaps even only one) will actually run, because all 20 task slots may be assigned to a single job. This means there may only be 1 or 2 I/O streams going to the repository, thus not fully utilizing the repository's available capacity for random I/O.
Parallel Config - Ideal
20 Backup Jobs - 4 jobs per repository
5 Repositories (just subdirectories on the same disk subsystem) - 4 tasks per repository
1 Proxy - 20 tasks
This setup requires a little more planning, but it allows all 20 jobs to start at the same time and makes sure that 5 I/O streams are going to the repository at all times (well, assuming a given repository has jobs pending). This should generate approximately enough I/O to keep the repository saturated without completely overrunning it, thus minimizing the time any given VM's snapshot is held open, and it allows for the additional efficiencies of parallel processing.
Note that the number of repositories is more of an example based on the information you've provided, but it fits other profiles I've worked with in the field. I have an IOmeter profile that you can use to test and see how many I/O threads are optimal for your repository. One of the most common mistakes I see is that people think of "tasks" as I/O threads, but that's not the case. "Jobs" define I/O threads, so a single job with 20 tasks is still one I/O thread. Indeed, if the repository needs multiple I/O threads to hit saturation, having 20 jobs each with one "task" (which is 20 I/O threads) can easily be faster than 1 job with 20 "tasks" (which is only 1 I/O thread).
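To put rough numbers on the three layouts above, here's a toy sketch (Python, purely illustrative; the real scheduler is far more nuanced than this, and the function and its simplifications are mine):

[code]
def concurrent_io_streams(jobs, proxy_tasks, repos=1, repo_task_limit=None, parallel=True):
    """Rough count of backup files being written at once (I/O streams).

    Each job writes one backup file, so each *running job* is one I/O stream,
    no matter how many task slots its VM disks occupy.
    """
    if not parallel:
        # Without parallel processing a job works one disk at a time, so it
        # holds one task slot and every occupied slot is a separate job/stream.
        return min(jobs, proxy_tasks)
    if repo_task_limit is None:
        # Worst case with one unlimited repository: a single job's disks can
        # soak up every proxy slot, leaving one backup file / I/O stream.
        return 1
    # A per-repository task limit caps how many slots a job on that repository
    # can hold, so the remaining proxy slots spill over to jobs pointed at the
    # other repositories (assuming each repository has jobs pending).
    return min(jobs, repos, proxy_tasks // repo_task_limit)

print("Non-parallel, 1 repo      :", concurrent_io_streams(20, 20, parallel=False))                    # ~20 streams
print("Parallel, 1 repo, no limit:", concurrent_io_streams(20, 20))                                    # ~1 stream (worst case)
print("Parallel, 5 repos x 4     :", concurrent_io_streams(20, 20, repos=5, repo_task_limit=4))        # ~5 streams
[/code]

The only point it's meant to drive home is that the repository sees one stream per running job, not one per task slot.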