Write streams

Post by **maynard01** » Apr 25, 2014 3:59 pm

We've recently migrated our backup repository over to an EMC Data Domain (DD2500). For the most part things are going well; however, we keep seeing the target reported as the bottleneck on our backup jobs. The Data Domain statistics do not suggest that it is being overwhelmed when we compare them to other operations the Data Domain is handling (e.g. SQL dumps). From other posts I have read, I suspect the single write stream might be our bottleneck. Our performance is not terrible, but we'd always like it to be better if possible.

My main question about this write stream is how best to enable more simultaneous write streams from Veeam to the Data Domain. I'd like to know whether there are settings that can be changed, whether we need to split our jobs further, or whether adding more proxies is the ticket.

Any feedback, from how the write stream technology works to how other folks have used deduplication appliances, is appreciated.

Post by **tsightler** » Apr 25, 2014 4:50 pm

The easiest thing to do is to create multiple repositories on the Data Domain (they can just be directories on the same share) and point different jobs at the different repositories. Then limit the number of tasks per repository to less than the number of proxy tasks you have available.

For a simple example, if you have enough proxy resource to run 16 concurrent tasks, create 4 repositories each limited to 4 tasks, and then create 4 jobs each one pointed at a different repository and start them at the same time. This way each job will still get some concurrent processing, but you can also have 4 concurrent jobs running together so you have 4 I/O streams to the DD.
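The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the split, not anything Veeam provides; the function name `plan_repositories` and its parameters are made up for this sketch.

```python
def plan_repositories(proxy_task_slots: int, io_streams: int) -> list[int]:
    """Split the available proxy task slots across repositories.

    Hypothetical helper: returns the task limit to set on each repository,
    one repository per desired I/O stream. Any remainder is spread over
    the first few repositories.
    """
    if io_streams <= 0 or proxy_task_slots < io_streams:
        raise ValueError("need at least one task slot per repository")
    base, extra = divmod(proxy_task_slots, io_streams)
    return [base + (1 if i < extra else 0) for i in range(io_streams)]


# 16 proxy task slots, 4 desired I/O streams to the DD
limits = plan_repositories(16, 4)
print(limits)
```

With 16 slots and 4 streams, each repository gets a limit of 4 tasks, matching the example above.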

Post by **maynard01** » Apr 25, 2014 5:19 pm

Is the write stream limited at the job level or the repository level, then? I'm a little confused. If it is at the job level, then creating more jobs would help, but if it's at the repository level, then creating new repositories and targeting them with some of our jobs would help. Thanks.

Post by **tsightler** » Apr 25, 2014 9:51 pm

It's not really limited at the repository level. Each job with an active task has an I/O stream. However, there's no way to limit the number of concurrent tasks within a job, so just creating more jobs doesn't guarantee that more than one job will be running at a time.

To expand on the previous example, let's say you have 100 VMs (I have no idea how many you actually have), and you create 4 jobs with 25 VMs each, all pointing at the same repository. You still have 16 available task slots, so you start all four of your jobs at the same time. That doesn't mean that all 4 jobs will actually run concurrently. Instead, the first job with its 25 VMs will take all 16 task slots, but that's only a single I/O stream, and the other 3 jobs will just queue up waiting for proxy resources. Eventually the first job won't be able to use all 16 task slots, so the second job will start picking them up and a second I/O stream will start. Finally the first job finishes, releases its resources, and its I/O stream stops, but now the second job is using all 16 task slots, so you're back to 1 I/O stream. This will not provide the maximum performance, because running 16 tasks in one job when the target is already showing as the bottleneck is unlikely to improve performance significantly.

On the other hand, if you create 4 repositories and point each of your 4 jobs at a different one, you guarantee that all 4 jobs will start and create 4 I/O streams on the target immediately, while still running 16 concurrent tasks. Of course, this method isn't perfect either: it requires more manual job planning, since you'd want the 4 jobs to be roughly equal in size, so you have to decide whether it's worth it. For larger setups with many jobs, though, this is the way to maximize the performance of your repository. Normally, I try to find the optimal number of I/O streams and then work backwards from there to create repositories and jobs that fully utilize that capacity.
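To make the difference between the two layouts concrete, here is a toy simulation in Python. It is a sketch under loud assumptions, not a model of Veeam's actual scheduler: every VM is assumed to take exactly one time tick, slots are handed to jobs strictly in job order, and the shared repository is assumed to have no task limit below the 16 proxy slots. The function and repository names are made up for illustration.

```python
def simulate(jobs, repo_of, repo_limit, total_slots):
    """Return how many jobs have at least one active task in each tick.

    jobs        -- VM count per job
    repo_of     -- repository name for each job (same order as jobs)
    repo_limit  -- max concurrent tasks per repository name
    total_slots -- proxy task slots available per tick
    Assumption: one VM occupies one slot for exactly one tick.
    """
    remaining = list(jobs)
    streams_per_tick = []
    while any(remaining):
        free = total_slots
        repo_free = dict(repo_limit)
        active = 0
        for j, left in enumerate(remaining):
            # A job takes as many slots as the proxies and its repository allow.
            take = min(left, free, repo_free[repo_of[j]])
            if take > 0:
                remaining[j] -= take
                free -= take
                repo_free[repo_of[j]] -= take
                active += 1  # this job contributes one I/O stream
        if active == 0:
            raise ValueError("no job can make progress; check the limits")
        streams_per_tick.append(active)
    return streams_per_tick


jobs = [25, 25, 25, 25]  # 100 VMs in 4 jobs

# Layout 1: all jobs share one repository (assumed limit: 16 tasks)
shared = simulate(jobs, ["repo1"] * 4, {"repo1": 16}, total_slots=16)

# Layout 2: one repository per job, each limited to 4 tasks
names = ["repo1", "repo2", "repo3", "repo4"]
split = simulate(jobs, names, {n: 4 for n in names}, total_slots=16)

print("shared:", shared)
print("split: ", split)
```

In this toy model the shared layout hovers between 1 and 2 active I/O streams, while the split layout keeps 4 streams active for the whole run with the same 16 task slots in use.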

There are other potential ways to achieve similar results, depending on how your jobs are configured. For example, if you create jobs by datastore, you can leverage the fact that Veeam limits the number of active snapshots per datastore to 4 (this can be tweaked via the registry). But the idea is the same: you want multiple jobs running at once to create multiple I/O streams.

Post by **maynard01** » Apr 26, 2014 12:13 am

Great, thank you for all of the good information.

From your explanation, I think we will need to create a new repository or three to get the performance we would like to achieve. We currently have 10 jobs with roughly 20 machines per job; these jobs are based on folders in ESX, not on datastores. We'll add the repositories and see how far that gets us, but I will keep the concurrent snapshot setting in mind as we proceed.

My hope is that any remaining slowness ends up on the source side; then we will know that Veeam and the Data Domain are operating as efficiently as we can get them.
