Strategies for full backups that run too long

Post by **pshute** » Jan 18, 2017 4:39 am

We've been adding more and more VMs to our main backup job, and now it's no longer finishing overnight. I don't want it running during work hours, so what's the best way around it?

My first thought was to split it into two jobs, and run each one on alternate days. That should do what we want, but then it becomes more difficult to tell which VMs are backed up and which ones aren't. Is there any other way around it?

Post by **Andanet** » Jan 18, 2017 10:47 am

I think the best way is to split it into more jobs that run at the same time.
Concurrent jobs make better use of the write streams to your repository.
We back up over 1000 VMs in a backup window from 6:30 PM to 5:30 AM.

Post by **Eamonn Deering** » Jan 18, 2017 10:49 am

Some info on your setup would help.
Using SAN backup mode?
Merging taking a long time?
Dedup box?
Where is the bottleneck?

Post by **jmmarton** » Jan 18, 2017 5:25 pm

What version of VBR is it? If you're on 9.0, the Advanced Data Fetcher in 9.5 could help. Also, the answers to Eamonn's questions are additional details we'd need in order to make the best recommendations for your problem.

Joe

Post by **pshute** » Jan 18, 2017 8:52 pm

jmmarton wrote:What version of VBR is it? If you're on 9.0, the Advanced Data Fetcher in 9.5 could help. Also, the answers to Eamonn's questions are additional details we'd need in order to make the best recommendations for your problem.

Joe

Yes, we're still on v9. I'll upgrade it and see if it improves.

Post by **pshute** » Jan 18, 2017 8:57 pm

Eamonn Deering wrote:Some info on your setup would help.
Using SAN backup mode?
Merging taking a long time?
Dedup box?
Where is the bottleneck?

The job history says the bottleneck is "source", and shows that nearly 9 of the 17 hours are spent reading one disk of one particular machine, our Exchange server. How long should I expect it to take to read 1TB?

Not sure about the answers to the other questions, or how to find out.
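As a rough sanity check on the figures in this post, reading 1TB in about 9 hours works out to an average of only around 30MB/s. The sketch below shows the arithmetic, assuming decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes). The function names are illustrative only, and the 125MB/s figure is simply the theoretical ceiling of a single 1Gb/s link, used for comparison; it is not a measurement from this setup.

```python
ONE_TB = 1e12  # assumption: decimal terabyte (10^12 bytes)


def implied_rate_mb_s(size_bytes: float, hours: float) -> float:
    """Average read rate in MB/s implied by reading size_bytes in the given hours."""
    return size_bytes / (hours * 3600) / 1e6


def hours_to_read(size_bytes: float, rate_mb_s: float) -> float:
    """Hours needed to read size_bytes at a sustained rate given in MB/s."""
    return size_bytes / (rate_mb_s * 1e6) / 3600


if __name__ == "__main__":
    # 1 TB in 9 hours, as reported for the Exchange disk
    print(f"{implied_rate_mb_s(ONE_TB, 9):.1f} MB/s")
    # For comparison: 1 TB at the theoretical limit of one 1 Gb/s link
    print(f"{hours_to_read(ONE_TB, 125):.1f} h")
```

Real-world throughput will sit below any theoretical link limit, so the second number is a lower bound on the read time rather than a target.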

Post by **pshute** » Jan 18, 2017 9:13 pm

Andanet wrote:I think the best way is to split it into more jobs that run at the same time.
Concurrent jobs make better use of the write streams to your repository.
We back up over 1000 VMs in a backup window from 6:30 PM to 5:30 AM.

It looks like just splitting one particular machine to its own job would fix the problem, but I'll have to test whether running them both concurrently will reduce the total time. I thought each job processed more than one VM at a time anyway. Is that not correct?

Post by **foggy** » Jan 19, 2017 11:32 am

Look for the transport mode tag in the job session log ([nbd], [san], [hotadd], etc. - you should see it right after the proxy server name after selecting the particular VM in the list).

Post by **pshute** » Jan 19, 2017 11:18 pm

Like this? "18/01/2017 9:52:13 AM :: Using backup proxy VMware Backup Proxy for disk Hard disk 7 [nbd]"
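If you want to check the transport mode across many session log lines rather than by eye, a small parser can pull the tag out. This is a minimal sketch that assumes the tag always appears as a trailing `[mode]` on the "Using backup proxy" line, exactly as in the line quoted above; the function name and the regular expression are illustrative and not part of any Veeam tooling.

```python
import re
from typing import Optional

# Assumption: the mode is the last bracketed word on a "Using backup proxy ... for disk ..." line.
_MODE_RE = re.compile(r"Using backup proxy .* for disk .*\[(\w+)\]\s*$")


def transport_mode(log_line: str) -> Optional[str]:
    """Return the transport mode tag (e.g. 'nbd', 'san', 'hotadd'), or None if absent."""
    match = _MODE_RE.search(log_line)
    return match.group(1).lower() if match else None


line = ("18/01/2017 9:52:13 AM :: Using backup proxy VMware Backup Proxy "
        "for disk Hard disk 7 [nbd]")

if __name__ == "__main__":
    print(transport_mode(line))
```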

This is configured in the proxy properties, correct? It's set to automatic there. We've never changed any settings there.

I see the Max Concurrent Tasks is set to 2. Would increasing that help anything?

Post by **pshute** » Jan 19, 2017 11:25 pm

jmmarton wrote:What version of VBR is it? If you're on 9.0, the Advanced Data Fetcher in 9.5 could help. Also, the answers to Eamonn's questions are additional details we'd need in order to make the best recommendations for your problem.

Joe

Now on v9.5, and last night's incremental took about as long as normal.

Post by **DaveWatkins** » Jan 20, 2017 3:15 am

pshute wrote:Like this? "18/01/2017 9:52:13 AM :: Using backup proxy VMware Backup Proxy for disk Hard disk 7 [nbd]"

This is configured in the proxy properties, correct? It's set to automatic there. We've never changed any settings there.

I see the Max Concurrent Tasks is set to 2. Would increasing that help anything?

If you're on 10Gb, NBD mode might be OK; otherwise setting up Hot-Add would be an easy way to go. Without knowing how many VMs you back up or how fast your repository is, it's hard to make any recommendations. Simply increasing the concurrent tasks would likely speed you up massively, assuming your repository can handle it. If you're not on 10Gb, NBD mode is the slowest way to back up machines. Add a new Windows VM and install the proxy role on it, and then you'll be able to use Hot-Add.

If your B&R server is physical and can be connected directly to your SAN via iSCSI or Fibre Channel, you could configure it for Direct SAN mode, which may be faster still. I doubt you'll need to go that far to fix your current issue, though: increasing the concurrent tasks on both the proxy and the repository would likely be enough, assuming you have the resources to support it.

Answering some of the earlier questions about your architecture and what Veeam reports as the bottleneck will go a long way to getting more good advice. At the moment we're guessing.

Post by **pshute** » Jan 20, 2017 3:53 am

Our B&R server is physical, and is connected to the ESX host via 4 teamed 1Gb/s connections. I've never really even looked how it's connected before. Proxy is on the same machine.

Veeam was reporting this when it was set to 2 concurrent tasks - Source 99% > Proxy 12% > Network 1% > Target 0%. That was with a single VM with 3 disks. Overall speed was 47MB/s.

I increased it to 4 concurrent tasks, and got this - Source 99% > Proxy 18% > Network 1% > Target 0%. Overall speed now 76MB/s.

I increased it to 8 concurrent tasks and added another VM with another 5 disks, and got - Source 99% > Proxy 21% > Network 1% > Target 0%. Overall speed 84MB/s.

I'm not convinced it's actually doing 8 disks at once, as some finish before others start. Certainly at least 4 at once. I changed it in the proxy properties - is there some other setting limiting it?

My tests are probably not representative of what I'll get for nightly backups, as the system is in use while I'm testing. My nightly backups have been getting an overall speed of about 63MB/s.
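To put the three test runs side by side, the sketch below computes how much faster each run was than the 2-task run. The speeds are the ones reported in this post; the function name is illustrative. Note that the 8-task run also added a second VM, so it is not a like-for-like comparison with the other two.

```python
# Overall speeds reported in the post (MB/s), keyed by the Max Concurrent Tasks setting.
measured = {2: 47.0, 4: 76.0, 8: 84.0}


def speedups(rates: dict[int, float], baseline_tasks: int) -> dict[int, float]:
    """Ratio of each measured rate to the rate at the baseline task count."""
    base = rates[baseline_tasks]
    return {tasks: rate / base for tasks, rate in rates.items()}


result = speedups(measured, baseline_tasks=2)

if __name__ == "__main__":
    for tasks, factor in sorted(result.items()):
        print(f"{tasks} tasks: {measured[tasks]:.0f} MB/s, {factor:.2f}x the 2-task rate")
```

Going from 2 to 4 tasks gave most of the gain; doubling again to 8 added comparatively little, which suggests the source was close to its limit in these tests.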

Can you please explain the bottleneck stats? What does source 99% mean? I know it means the source is the problem, but I don't know where the 99% comes from.

If I was to put the proxy on a VM on the ESX host, I assume it could read the disks much faster, but I'm not sure if I'd be able to give it the resources to handle it. I guess I'll just have to try it and see how it works out. I'll see how tonight's backup goes before I decide whether it's worth trying.

Post by **pshute** » Jan 20, 2017 4:18 am

The throughput graphs on the job history were as high as 119MB/s. I'm not sure what they were before because it only shows a graph on the most recent job. Is that statistic saved anywhere?

Post by **pshute** » Jan 20, 2017 4:20 am

A question has been asked here about putting a proxy on a VM. Our ESX machine has three hosts. Would there only be a read speed increase for VMs on the same host as the proxy?

Post by **foggy** » Jan 20, 2017 10:50 am

pshute wrote:Can you please explain the bottleneck stats? What does source 99% mean? I know it means the source is the problem, but I don't know where the 99% comes from.

If I was to put the proxy on a VM on the ESX host, I assume it could read the disks much faster, but I'm not sure if I'd be able to give it the resources to handle it.

Bottleneck source means that data cannot be retrieved from the storage any faster. Currently the source data reader is the slowest component in the data processing chain, while the other components could process more data but are just sitting and waiting for it. Feeding them data faster will increase overall backup performance (and the bottleneck will probably shift to another component).
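One way to picture this is as a chain in which every stage can only run as fast as the slowest one. The sketch below is a toy model, not Veeam's actual calculation (which is not described in this thread): it assumes each stage's busy percentage is the chain's throughput divided by that stage's capacity. All capacities are hypothetical numbers chosen for illustration.

```python
def bottleneck(capacity_mb_s: dict[str, float]) -> str:
    """Name of the stage with the lowest capacity, which limits the whole chain."""
    return min(capacity_mb_s, key=capacity_mb_s.get)


def busy_percentages(capacity_mb_s: dict[str, float]) -> dict[str, int]:
    """Toy model: each stage is busy for throughput / capacity of the time."""
    throughput = min(capacity_mb_s.values())
    return {stage: round(100 * throughput / cap) for stage, cap in capacity_mb_s.items()}


# Hypothetical capacities in MB/s, not measurements from this thread.
stages = {"Source": 50.0, "Proxy": 250.0, "Network": 500.0, "Target": 1000.0}

# Same chain after a hypothetical change that lets the source deliver 300 MB/s.
faster_source = dict(stages, Source=300.0)

if __name__ == "__main__":
    print(bottleneck(stages), busy_percentages(stages))
    print(bottleneck(faster_source), busy_percentages(faster_source))
```

In the second case the source is no longer the slowest stage, so the bottleneck moves to the proxy, which is the shift described above.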

pshute wrote:The throughput graphs on the job history were as high as 119MB/s. I'm not sure what they were before because it only shows a graph on the most recent job. Is that statistic saved anywhere?

You can look up previous jobs stats in the sessions History.

pshute wrote:A question has been asked here about putting a proxy on a VM. Our ESX machine has three hosts. Would there only be a read speed increase for VMs on the same host as the proxy?

If the proxy VM has access to the shared storage, it will be able to use hotadd for all VMs stored there, regardless of the host they reside on.

Post by **pshute** » Jan 23, 2017 10:22 pm

foggy wrote:Bottleneck source means that data cannot be retrieved from the storage any faster. Currently the source data reader is the slowest component in the data processing chain, while the other components could process more data but are just sitting and waiting for it. Feeding them data faster will increase overall backup performance (and the bottleneck will probably shift to another component).

OK, that makes sense - it's reading from source 99% of the time, and is always making other components wait. But how then did allowing more concurrent tasks manage to get more data from the host? How can the host send data more quickly just by doing multiple streams?

foggy wrote:You can look up previous jobs stats in the sessions History.

I can see duration, processing rate, processed, read and transferred for all sessions, but they don't tell me everything that's on the graphs. I'm interested to see what maximum transfer rate we achieved, and whether there were periods of low speeds.

The processing rate seems a bit deceptive. It's the amount of data processed divided by the job duration. But swap file and deleted blocks don't actually get read, making the rate higher than the actual read rate. But all the statistics to do with actual read rates are unavailable for all but the most recent job.

Jan 24, 2017 12:56 pm

pshute wrote:But how then did allowing more concurrent tasks manage to get more data from the host? How can the host send data more quickly just by doing multiple streams?

Storage systems often perform better with multiple parallel streams instead of a single one.
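A simple way to reason about this is that a single stream is often capped well below what the storage can deliver in total, so adding streams helps until the aggregate limit is reached. The sketch below is a deliberately simplified model with hypothetical numbers; real storage does not scale perfectly linearly, and neither parameter is taken from this thread.

```python
def aggregate_rate(streams: int, per_stream_mb_s: float, array_limit_mb_s: float) -> float:
    """Toy model: streams add up linearly until the storage's overall limit is hit."""
    return min(streams * per_stream_mb_s, array_limit_mb_s)


if __name__ == "__main__":
    # Hypothetical: each stream tops out at 25 MB/s, the storage at 90 MB/s in total.
    for n in (1, 2, 4, 8):
        print(n, aggregate_rate(n, per_stream_mb_s=25.0, array_limit_mb_s=90.0))
```

Under these assumptions, going from 4 to 8 streams gains nothing, which matches the general pattern of diminishing returns as concurrency rises.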

pshute wrote:The processing rate seems a bit deceptive. It's the amount of data processed divided by the job duration. But swap file and deleted blocks don't actually get read, making the rate higher than the actual read rate. But all the statistics to do with actual read rates are unavailable for all but the most recent job.

Processing rate is calculated off the actually read data. But overall, I can see your point here.

Post by **pshute** » Jan 27, 2017 4:22 am

Now that we've increased the number of concurrent tasks, rearranged the order of the VMs so that the biggest ones start backing up first, and enabled deleted block skipping, our full backups are done in 8h20, down from 17 hours. We now have a few hours left in the night that we can expand into. No need to try installing a proxy on the host for now. Thanks for your help with this.
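Starting the biggest VMs first is essentially the "longest processing time first" scheduling heuristic: if a very large disk starts last, every other task slot sits idle while it finishes. The sketch below simulates a fixed number of concurrent task slots and compares the two orderings. The per-disk hours are hypothetical (with 9.0 standing in for one very large disk), and the model ignores contention between concurrent reads.

```python
import heapq


def makespan(disk_hours: list[float], slots: int) -> float:
    """Finish time when disks start in list order on `slots` concurrent task slots."""
    free_at = [0.0] * slots  # min-heap of the times at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for hours in disk_hours:
        start = heapq.heappop(free_at)  # next disk takes the earliest free slot
        end = start + hours
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical per-disk read times in hours; the large disk happens to be last.
disks = [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 9.0]

big_last = makespan(disks, slots=2)
big_first = makespan(sorted(disks, reverse=True), slots=2)

if __name__ == "__main__":
    print(f"large disk last:  {big_last} h")
    print(f"large disk first: {big_first} h")
```

With these made-up numbers the job takes 13 hours when the large disk starts last and 9 hours when it starts first, even though the total amount of work is identical.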

Post by **pshute** » Jan 31, 2017 9:33 pm

A possibly unrelated question - would enabling Per-VM Backups improve the speed where the bottleneck is Source? I think not, but I'm interested in enabling it anyway, so I'd like to know if there's an additional advantage.

The main reason I'm tempted to enable it is that our repository is short on space to do restores from tape. If I change to Per-VM, will I be able to restore just a single machine's backup without needing enough room for the whole backup? Are there any disadvantages to splitting the backups up, apart from having way more files in the repository?
