1 Backup Job vs. 2 Backup Jobs

jbarrow.viracoribt · Sep 21, 2015 8:36 pm

So we need to back up 160 (or so) VMs nightly.

This produces about a 10TB full backup file.

Once we get to 4 days retained, on the 5th day we start to do merges, and we merge every single night (to trim off an old retention point).

It was suggested by our local Veeam reps that we split this job up so that the file isn't so big, and I'm trying to figure out why I should be doing that.

1. It makes my jobs more complex. I have to figure out which VMs fit in which jobs, then manage multiple jobs vs. 1 large job.
2. If I'm producing 2 x 5TB full backup files instead of 1 x 10TB backup file, won't the math pan out in the end and be about the same if both still need to merge?
3. Having separate jobs causes me to store more data, as each job is going to do its own deduplication and compression.

What am I missing here? Do things just run slower as the backup files get bigger and run faster if it's the same data but in smaller files?

Case: 01002324

tsightler · Sep 22, 2015 4:14 am

In general, putting lots of VMs into a single job will always be slower than having multiple smaller jobs. Exactly how much is difficult to predict because it depends on a multitude of factors. There is some advantage to simply having smaller files, as there is less metadata per job, so less data to rebuild and less data to flush throughout the various stages of the merge process, but the primary benefit is from simply having multiple merges occurring in parallel. In v8 the merge is a 100% serial process: each file from each point is merged in sequence, so if you have 160 VMs in a job, you will find that there will be ~1 minute of overhead per VM, which is almost 3 hours just working on metadata.
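To put rough numbers on that overhead, here is a minimal Python sketch. It assumes the ~1 minute per VM figure quoted above, an even split of VMs across jobs, and storage that can run the merges concurrently without contention. The function names are made up for illustration and are not part of any Veeam tooling.

```python
def serial_merge_overhead_minutes(vm_count, minutes_per_vm=1.0):
    """Metadata overhead when every VM in one job is merged in sequence."""
    return vm_count * minutes_per_vm


def parallel_merge_overhead_minutes(vm_count, job_count, minutes_per_vm=1.0):
    """Overhead when VMs are split evenly across jobs whose merges overlap.

    Wall-clock time is set by the largest job (idealized: no storage contention).
    """
    largest_job = -(-vm_count // job_count)  # ceiling division
    return largest_job * minutes_per_vm


single = serial_merge_overhead_minutes(160)
split = parallel_merge_overhead_minutes(160, 4)
print(f"1 job: {single / 60:.1f} h, 4 parallel jobs: {split / 60:.1f} h")
```

With 160 VMs this gives roughly 2.7 hours for the single job versus about 0.7 hours for four jobs merging at once, which is the best case; real repositories will land somewhere in between.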

On top of that, when doing that serial merge, there's only a single I/O stream. Assuming you're backing up to decent storage, this will not fully utilize the I/O available. To find out how many I/O streams you need to maximize I/O usage, you can leverage the excellent whitepaper from my colleague, Luca, available here.

Larger jobs are also more prone to fragmentation, and they more commonly experience issues that might require running an active full to correct. With a single 10TB backup job, if something happens to a chain, you'll have to run another 10TB full and keep both for some period of time. If you have 4-5 smaller files, not only will the backups likely run faster and be less subject to such issues, but if you did need to run a full backup you can get by with a much smaller percentage of used space.

As a general "best practice" it's recommended that jobs should not exceed 50 VMs, and the normal recommendation is 20-30 VMs per job. Yes, it's a little more work up front, but the benefit is faster and more reliable backups that take full advantage of the hardware and capabilities. I've been helping customers design their Veeam solutions for 4+ years (and for myself prior to that), and I can say that customers that follow these recommendations have far fewer support issues than customers that build huge jobs with hundreds of VMs.
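As an illustration of that sizing guideline, the sketch below splits a VM list into the fewest jobs that stay at or under a chosen cap (30 here, the upper end of the 20-30 range). The VM names and the `plan_jobs` helper are hypothetical; in practice you would group by folder or environment rather than round-robin.

```python
import math


def plan_jobs(vm_names, max_per_job=30):
    """Split VMs into the fewest jobs that respect max_per_job."""
    job_count = math.ceil(len(vm_names) / max_per_job)
    # Round-robin assignment keeps job sizes within one VM of each other.
    return [vm_names[i::job_count] for i in range(job_count)]


vms = [f"vm{i:03d}" for i in range(160)]  # placeholder names
jobs = plan_jobs(vms)
print(len(jobs), [len(job) for job in jobs])
```

For 160 VMs and a cap of 30 this yields six jobs of 26-27 VMs each.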

Regarding the space loss from lack of dedupe, my guess is it won't be nearly as much as you think. With 10TB of data, unless you are backing up tons of systems that are largely block-for-block identical, my guess is perhaps 2-3% extra space, certainly no more than 5% total. Admittedly that's not nothing, but if you're already running that tight then you're likely set up for trouble anyway. Assuming your repository is sized appropriately, this little bit of extra space will be made up for by the easier management you'll experience when something happens to a backup job and you need to run a new active full.
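The trade-off can be sketched with simple arithmetic. The snippet below uses the 10TB total and the 2-5% guesses from this post, and assumes equally sized jobs when estimating the space needed to keep one extra active full; both helper functions are illustrative only.

```python
def dedupe_overhead_tb(total_tb, overhead_fraction):
    """Extra space from losing cross-job deduplication."""
    return total_tb * overhead_fraction


def active_full_headroom_tb(total_tb, job_count):
    """Space to hold one extra full for a single job (equal-sized jobs assumed)."""
    return total_tb / job_count


total_tb = 10  # full backup size from the original post
for fraction in (0.02, 0.03, 0.05):
    extra = dedupe_overhead_tb(total_tb, fraction)
    print(f"dedupe overhead at {fraction:.0%}: {extra:.1f} TB")
for job_count in (1, 4):
    headroom = active_full_headroom_tb(total_tb, job_count)
    print(f"extra active full with {job_count} job(s): {headroom:.1f} TB")
```

Under these assumptions, even the 5% worst case costs about 0.5TB, while a single job needs 10TB of headroom for a new active full versus about 2.5TB for one of four jobs.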

jbarrow.viracoribt · Sep 22, 2015 1:08 pm

tsightler wrote: but the primary benefit is from simply having multiple merges occurring in parallel. In v8 the merge is a 100% serial process: each file from each point is merged in sequence, so if you have 160 VMs in a job, you will find that there will be ~1 minute of overhead per VM, which is almost 3 hours just working on metadata.

This would only be the case if I had multiple jobs, say 4, all running at the same time and all working on the merge step, right? If, for instance, I split this large job out into 4 jobs (DEV, TES, TRA, PRO) and then had one run after another, I would not see these benefits. But if all four jobs fired off at once and hit the merge step around the same time, then we would be merging multiple files at a time, correct?

tsightler · Sep 22, 2015 1:22 pm

It can also help if you have a job that is working on a merge while the next job is running its backup. So indeed, if you simply chain everything then it probably won't help much, but even just having two jobs running can be a big benefit, assuming the backend storage can deal with it. I work with customers all the time that have a huge job with hundreds of VMs taking 24 hours, and after we split the same job into 4 jobs and let them run in parallel, they complete in <8 hours. This does assume your repository hardware (and your host environment) can deal with that, which certainly isn't always the case. That's part of the reason it is so difficult to make general statements on the forum regarding best practice. A customer backing up to an SMB NAS with 5x SATA drives in RAID isn't going to be able to do the same things as a customer using a C3160 repo with 56x drives and a 4GB SSD write-back cache.

jbarrow.viracoribt · Sep 22, 2015 1:41 pm

We only have 1 physical proxy to use in our environment, so without a feature that lets me set parallel processing on a per-job basis, running multiple jobs at the same time gets really goofy. For instance, one of the jobs will gobble up all the proxy slots and the next 3 jobs will sit around waiting until the first job frees up slots. The second issue with running multiple jobs at a time is that I seem to only be able to have one storage snapshot in place at a time. If job 1 creates a storage snapshot, job 2 (if running at the same time) gets an error that it can't create one.

Sidebar: I'm hearing good things about the C3160 box as a repository. Does it number crunch pretty well even with 7.2k RPM drives (which it seems to come with)?

Sep 22, 2015 2:14 pm

jbarrow.viracoribt wrote: For instance, one of the jobs will gobble up all the proxy slots and the next 3 jobs will sit around waiting until the first job frees up slots.

Curious, why do you consider this a problem? It's the way it's designed to work. Is there some reason you consider that not good? I mean, if all of the VMs are in one job those VMs are still waiting around for the proxy slots.

jbarrow.viracoribt wrote: The second issue to running multiple jobs at a time is that I seem to only be able to have one storage snapshot in place at a time. If job 1 creates a storage snapshot, job 2 (if running at the same time) gets an error it can't create one.

I wouldn't expect that unless there is some limit on the storage device for the number of snapshots that it can support, for example, not enough snapshot space available. What is the source storage?

jbarrow.viracoribt wrote: Sidebar: I'm hearing good things about the C3160 box as a repository. Does it number crunch pretty well even with 7.2k RPM drives (which it seems to come with)?

I personally love this box, but I don't think there's anything really special about it in the grand scheme of things. It's just a well-engineered box for what it does, with lots of memory, CPU, and a great RAID controller, which is really the key from a repository performance perspective.

There's nothing really wrong with 7.2K RPM drives; the vast majority of the enterprise customers I work with use them. Sure, they have less I/O per spindle than higher speed disks, but when you pack in 56 spindles that's still a lot of I/O, so pair that with a great RAID controller and you can get excellent performance. Having a nice, wide stripe formatted RAID provides great throughput, and the 4GB write-back cache really speeds up I/O intensive tasks like merges and synthetic operations. Probably the biggest weakness in the box is the limited ingest bandwidth (only 2x 10GbE ports, or multiple 1GbE), which means that for full backups you can't actually get data into the box as fast as the disks can write it, but this isn't a huge issue for most applications. Throw in some SSDs for the OS boot and a great vPower NFS cache, and you've got a high speed repo for backup, restore, and SureBackup functions.

Of course, I've seen great repositories built from other vendors' hardware, but I have a reasonable number of customers running C3160s in the field at this point, with petabytes of Veeam backups in total, and so far the feedback has been nothing but positive.

jbarrow.viracoribt · Sep 22, 2015 7:57 pm

I guess running multiple backup jobs at once isn't a bad deal, given the time constraints involved. Where we found parallel processing to be more of an issue was with multiple replication jobs, which we want to all finish within the same time window rather than having 1 or 2 jobs wait on limited proxy slots while 1 or 2 other jobs occupy all the proxies.

We are using NetApp FAS8020 storage, and when I fired off multiple backup jobs at once, each of which contains a VM or two that exists on our NetApp SAN, I got this error:

9/22/2015 2:39:17 PM :: Creating storage snapshot
9/22/2015 2:39:18 PM :: Failed to create snapshot for LUN na_ls_lun_101 Details: Clone operation failed to start: Device busy..

tsightler · Sep 22, 2015 8:22 pm

jbarrow.viracoribt wrote: 9/22/2015 2:39:17 PM :: Creating storage snapshot
9/22/2015 2:39:18 PM :: Failed to create snapshot for LUN na_ls_lun_101 Details: Clone operation failed to start: Device busy..

Do you perhaps not have a FlexClone license on that box? I'm thinking that could be the reason for this limitation. If that's the case, I can see how that could be a challenge for multiple jobs, especially if you have only a single datastore.

cparker4486 · Sep 28, 2015 1:54 am

jbarrow.viracoribt wrote: 1. It makes my jobs more complex, I have to figure out what VM's fit in what jobs, then manage multiple jobs vs. 1 large job.

Hope I didn't miss the details in another post, but if you're using vSphere you should be using folders to organize your VMs. I have folders based on business priority: T1, T2, T3. I have three backup jobs, each instructed to back up whatever is in its respective folder. There's very little thinking to do in this regard (except for disk exceptions and the like).

I also have two backup copy jobs. One job copies the T1 backup and the other copies T2+T3 together.

Having said that, my system is a little overly complex for my size. I have ~25 VMs, and a test backup of all VMs produced a ~1.5TB VBK. I'm considering backing up all folders with one job (as well as one copy job).

In any case, folders in vSphere are the way to go. You do not want to manually manage which VM goes in which job. That sounds very painful.
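The folder approach amounts to deriving job membership from the inventory instead of maintaining it by hand. A minimal Python sketch of the idea follows; the inventory, VM names, and `Backup-` job naming are hypothetical, and this does not call any vSphere or Veeam API.

```python
from collections import defaultdict

# Hypothetical inventory as (VM name, vSphere folder) pairs.
inventory = [
    ("sql01", "T1"),
    ("web01", "T2"),
    ("web02", "T2"),
    ("test01", "T3"),
]


def jobs_by_folder(vm_folder_pairs):
    """Map each folder to one backup job containing the VMs in that folder."""
    jobs = defaultdict(list)
    for vm, folder in vm_folder_pairs:
        jobs[f"Backup-{folder}"].append(vm)
    return dict(jobs)


for job, members in jobs_by_folder(inventory).items():
    print(job, members)
```

Moving a VM to a different folder changes which job picks it up, with no edits to the job definitions themselves.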

patrickbeau · Sep 28, 2015 6:36 am

I had the same problem and did something like what cparker described. It has now been a year and I've had no more trouble with my jobs.
The space lost by splitting up Veeam's deduplication is not a problem for me, since I back up to a Windows 2012 R2 server with deduplication enabled, which saves a lot of space.

PS:
Source: 5 ESXi hosts, 10TB of data.
Destination: R720XD w/ 12x4TB + MD3200 w/ 12x4TB, all in RAID 6 in a Storage Spaces pool.
History: 60 days for everything (7TB) except for file servers, which are retained as long as possible.

jbarrow.viracoribt · Nov 20, 2015 1:32 pm

tsightler wrote: Do you perhaps not have a FlexClone license on that box? I'm thinking that could be the reason for this limitation. If that's the case, I can see how that could be a challenge for multiple jobs, especially if you have only a single datastore.

We are licensed for FlexClone. Opening up a case on this now.
