SOBR capacity balancing with ReFS repositories

Post by **sarnold** » Oct 27, 2020 6:34 am

I've recently taken our 2x 56TB SANs and reformatted them to move from NTFS to ReFS. Before I did that, I created a SOBR of both of them, and then I migrated all the old backups created on NTFS away, and then formatted one SAN at a time, and once the SAN was formatted as ReFS, moved the data over to that one. Long story short, I now have the 2 ReFS formatted SANs in a Scale-Out Backup Repository. Great so far.

I have 3 jobs. One of the jobs is fairly tiny with only one physical machine, but the other 2 jobs are fairly large (one contains all of our remote office site servers, the other contains all our data center servers). Those 2 jobs are 5-8TB in size each. I create synthetic full backups every week for each of the 3 jobs, and since I'm now on ReFS, I've been working on creating a new Active Full backup of each of the 3 jobs, as I understand an Active Full needs to be taken once the repo is on ReFS to take advantage of the true storage cost savings the next time a synthetic full is taken. Again, looking good.

So! Now that you know a bit about the setup, my question is this. Seeing as the SANs are the same size, I ended up with the backup directories of the single machine job and the smaller of the 2 large jobs on one SAN, and the largest job on the second SAN. When I resumed the jobs after everything was transferred, the incrementals went to their respective job folders on the correct SAN, as they should. When I then started creating Active Full backups, the active fulls of the first 2 jobs stayed on the first SAN, as expected, because that's where their job folders are. BUT I just noticed that because the first SAN had a bit more free space than the other SAN, the Active Full for the largest backup job (the final Active Full I need to do) is ALSO going to the first SAN now, away from the other files for that job, which reside on the second SAN. I only discovered this because an incremental backup for the other large backup job, which had completed its Active Full the night before, was hanging at "Waiting for backup infrastructure resource availability" and I thought "that's odd...I put the files on two different extents"...then I went to check the drive contents and realized the Active Full for the large job had gone to the extent with slightly more storage space available.

Now, this makes sense, because I have the SOBR configured for Data Locality (figuring this would be the best use of ReFS deduplication), and all Data Locality requires is that the files of the same chain be on the same extent, not necessarily all files for the job. My concern now though is...once the backups age out and are deleted, is the second extent just going to be empty and sitting there doing nothing? Previously when I only had the one NTFS SAN doing all 3 jobs, I never ran an Active Full, I just let the weekly synthetic fulls happen. But now I'm concerned that I've overlooked something, because the only way I see this going is that the second extent ends up empty, and now I'm wondering if having these in an SOBR is still even a good idea at all (SOBR with ReFS was originally recommended to me by Veeam, and it sounded great until I realized what was happening here). If the large job running an Active Full now was still on the second extent with the rest of its job files, the incremental for the other job would be running right now. But instead it's just waiting for resources, because now all 3 jobs have made it to a single extent. Am I missing something?
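To make the placement behaviour described above concrete, here is a minimal Python sketch of it. It is a simplified model, not Veeam's actual algorithm: the `Extent` class, the function names, the chain IDs and the free-space figures are all hypothetical, and the only rules modelled are the two mentioned in the post (a new full can start a chain on whichever extent has the most free space, and incrementals stay with their chain).

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    name: str
    free_tb: float
    chains: set = field(default_factory=set)  # IDs of chains stored here


def place_full(extents, chain_id, size_tb):
    """A new full starts a new chain, so it may land on any extent.

    Assumption: the extent with the most free space wins.
    """
    candidates = [e for e in extents if e.free_tb >= size_tb]
    if not candidates:
        raise RuntimeError("no extent has enough free space")
    target = max(candidates, key=lambda e: e.free_tb)
    target.free_tb -= size_tb
    target.chains.add(chain_id)
    return target.name


def place_increment(extents, chain_id, size_tb):
    """An incremental must follow the full of its own chain."""
    for e in extents:
        if chain_id in e.chains:
            e.free_tb -= size_tb
            return e.name
    raise KeyError(f"no full found for chain {chain_id!r}")


if __name__ == "__main__":
    # Hypothetical free space: SAN1 has about 12 TB more than SAN2.
    extents = [Extent("SAN1", 30.0), Extent("SAN2", 18.0)]
    print(place_full(extents, "datacenter-new-chain", 8.0))       # SAN1
    print(place_increment(extents, "datacenter-new-chain", 0.5))  # SAN1
```

In this model the older chain of the same job can sit on SAN2 while the new Active Full and everything that follows it land on SAN1, which matches what was observed on disk.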

EDIT: One other note, we're not using per-VM backup files because the data deduplication from Veeam we're getting right now by having all of our site servers in the same backup file is pretty good and we're hoping to not lose that, but maybe that's the only way around this to make better use of having two extents. Open to suggestions.

Thanks for any advice!

Post by **HannesK** » Oct 27, 2020 7:19 am

Hello,

> we're not using per-VM backup files

You answered the question on your own here.

SOBR balancing only works properly with per-VM chains.

I have a clear opinion on per-VM chains: always use them if you have more than a handful of very small VMs.

You give up a small amount of deduplication in exchange for this list (in random order) of advantages of per-VM chains:

- Easier tape restore
- More performance through parallel processing
- Easier job management (put more VMs in one job)
- Resource usage with SOBR
- Optional Windows Server 2016 Dedupe if files < 1TB
- Easy deletion of VMs from backups
- Per VM accounting

Best regards,
Hannes
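As a rough illustration of why smaller chains balance better, the sketch below compares one job-level chain with several per-VM chains under a greedy "most free space first" rule. The rule, the function names and all sizes are assumptions made for the example; Veeam's real placement logic takes more factors into account.

```python
def distribute(chain_sizes_tb, free_tb):
    """Place each new chain on the extent that currently has the most free space.

    Returns the free space left on each extent (same order as free_tb).
    """
    free = list(free_tb)
    for size in chain_sizes_tb:
        target = max(range(len(free)), key=lambda k: free[k])
        free[target] -= size
    return free


def imbalance(free_tb):
    """Difference in free space between the emptiest and the fullest extent."""
    return max(free_tb) - min(free_tb)


if __name__ == "__main__":
    start = [24.0, 20.0]                         # hypothetical free TB per extent
    per_job = distribute([8.0], start)           # one 8 TB job-level chain
    per_vm = distribute([1.0] * 8, start)        # eight 1 TB per-VM chains
    print(per_job, imbalance(per_job))           # [16.0, 20.0] 4.0
    print(per_vm, imbalance(per_vm))             # [18.0, 18.0] 0.0
```

Note that in this model finer granularity only helps once the free-space gap between extents is smaller than the data being placed. If one extent starts with far more free space than the total size of the new chains, every chain still lands on that extent, per-VM or not.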

Post by **sarnold** » Oct 27, 2020 7:28 am

Thanks for your reply Hannes. This makes sense. 2 followup questions then:

[1] Since only files belonging to a given backup chain need to be on the same extent, is it possible that the second extent would still be a potential destination for Synthetic Fulls? This would mean that the synthetic full created at the end of the week, and the following incremental backups during the next week would all be on the same extent, but they could be on a different extent the week after.

[2] Let's say we do move to per-vm backups; what do I need to do since our extents are ReFS? Can I simply enable the "per-vm backup job" on the SOBR Advanced Settings, hit Finish, and it'll take effect on the next run? Do I have to do another Active Full like I did when moving to ReFS? What's the best course of action for migrating to per-vm in our scenario?

Post by **HannesK** » Oct 27, 2020 9:29 am

@1: Synthetic fulls must be on the same extent as the rest of their chain; otherwise you lose all ReFS benefits.

@2: Yes, you need an active full. The software does not split existing files into smaller files; that would cause massive I/O.

Post by **sarnold** » Oct 27, 2020 8:11 pm

Ok, this makes sense. Now, in the current state, I'm using "Data locality" as the policy on this SOBR. Even if I were to switch to per-vm backups and run Active Fulls for the jobs I have to get them started on per-vm, the extent with the most amount of space is still the first SAN, and because I'm now using ReFS across the board, it makes the most sense in my mind for the data for the same job to be on the same SAN (I may be missing something, please correct me if I am).

Assuming I change to per-vm backups on the SOBR and I re-run Active Fulls on each job, am I going to have the same problem where everything is only on one extent, since that's currently the one with more storage space available?

Should I be changing the SOBRs I have to use the Performance policy instead of Data locality so I can fine tune a bit what goes where, or is this pointless in my situation?

Post by **sarnold** » Oct 28, 2020 2:20 am

After a day of reading about others' experiences online, my concern with using a SOBR and ReFS in the current state, where one 56TB extent has roughly 10-15TB more free space than the other 56TB extent, is that I'll never get them balanced without deleting all backups, enabling per-VM backup chains on the SOBR, and running the jobs fresh so that Veeam places the VM backups from all 3 jobs evenly between the two extents. The way I see it right now, once the retention period ages out the backups, all the backups on the other extent are going to be deleted, and nothing will ever end up there unless (A) the first extent fills up, or (B) the first extent is offline AND the box is checked to run a full backup to the second extent when the first is offline (that box is currently not checked, because these extents are ReFS and I want to take advantage of ReFS deduplication).

However, that obviously means that I need to either move all the history off of the SANs, which takes 2-4 days, or risk it and delete all the history files, optionally go a step further and reformat the SANs and recreate the SOBR, and start the backups ASAP, which should then evenly distribute the files between the two SANs.

That would probably work, but losing those backups would be highly undesirable. Is that really the best way to get the full advantage of ReFS-based SOBR extents?

Post by **Steve-nIP** » Oct 28, 2020 6:43 am

If you want to take advantage of all the features of ReFS and Veeam, you need to use data locality. If you don't, you won't get fast cloning or spaceless full functionality.
Of course, per-VM chains should always be enabled in a scale-out repository so Veeam can more finely balance space use between extents.

Post by **sarnold** » Oct 28, 2020 7:03 pm

> If you want to take advantage of all the features of ReFS and Veeam, you need to use data locality. If you don't, you won't get fast cloning or spaceless full functionality.

Right, this makes perfect sense.

> Of course, Per-VM chains should always be enabled in a scale-out repository so Veeam can more finely balance space use between extents.

This also makes sense. So, that raises the question: do you think it's still worth me having an SOBR at all in this case? Or would I be better served by 2 separate ReFS repositories, where I can control what jobs go where? If I go with 2 separate repositories, that also raises the question of whether I should just leave each job as 1 file (no per-VM), to take advantage not only of ReFS spaceless fulls during synthetic full operations, but also of the deduplication within files that I have now.

I see both options as working. I just want to know what would be the best route forward.

Nov 02, 2020 10:01 am

Hi,

SOBR also offers other advantages, such as the ability to manage your backup storage more easily throughout its life cycle. You can add and remove storage more easily, you can offload to a capacity tier, etc., without needing to manage the backup location in your backup jobs. So introducing more storage and replacing older storage becomes easier. Hence, I would keep the SOBR and, if you really don't like what the extents do for you, have only one extent. That way you keep the ease of storage management while still having only one location. Just add the new SAN as a second extent when that time comes and seal the original one, so the data can gracefully expire, after which you can remove it.

Cheers,
Didier

Post by **sarnold** » Nov 03, 2020 1:40 am

Thanks for your replies. I ended up going with keeping the SOBR (extents formatted as ReFS), and switching to per-vm backups.
