-
- Service Provider
- Posts: 48
- Liked: 7 times
- Joined: Feb 20, 2023 9:28 am
- Full Name: Marco Glavas
- Contact:
What is the point of Synthetic Fulls?
Hi everyone,
Many of you may find this a weird or even stupid question, and I might be missing a vital piece of info here, so correct me if I'm wrong:
From my understanding, Synthetic Fulls date from the era of GFS backups. The idea was that instead of fully transferring an Active Full every Saturday and making the infrastructure sweat on those days, you'd just calculate a new full from the last full plus all the incremental data in between. That takes a lot of CPU on the backup side, but at least it leaves the network and the source in peace.
However, we are not really saving GFS backups anymore, are we? From what I understand, Veeam deduplicates the chain: any block that is repeated anywhere in the chain gets saved only once, with every identical block linking back to that one block by reference. So a Synthetic Full isn't actually a full anymore. It's also just a bunch of references to previously saved blocks (probably most of it, actually) plus the few blocks Veeam hadn't seen before, which are saved in their entirety.
So my question is: what is the point of a Synthetic Full in a deduped environment? Is it helpful for copy-to-tape jobs, which we do have? Is it helpful for restores, because fewer increments have to be taken into account?
Why is this important to me? Well, we're struggling with SOBR extents filling up. The way backup chains are handled under the data locality placement policy leads to cascading filling of the SOBR once the extents come close to being full. If the extents are very large and therefore carry many chains, one extent can suddenly run full, and all of its chains have to start a new backup on another extent. If the other extents aren't half empty either, that can in turn trigger them to fill up too, and so on.
Now this seems to be a problem with synthetic full backups: once those run, the cascade likes to begin. So how is it? Can a Synthetic Full be redirected to a new extent and then take the full space it needs, because a new chain was started and deduplication no longer applies? Is my understanding of this process correct, or am I completely off?
And if my understanding is correct, and Synthetic Fulls can indeed lead to new chains being started undeduped, what is the impact of turning off Synthetic Fulls and going forever incremental?
Edit: I've just been reading on Reddit that Veeam does not deduplicate between full chains, synthetic or otherwise; that is only achieved by ReFS, which we use. That still leaves me with the question of whether Synthetic Fulls make sense, because what do you gain? If the base chunks are corrupt in any way, all full chains will be too, as long as they reside on the same ReFS partition. So data integrity is not helped at all, right? That really only leaves restores and copy jobs that would otherwise suffer, right?
-
- Veeam Legend
- Posts: 403
- Liked: 231 times
- Joined: Apr 11, 2023 1:18 pm
- Full Name: Tyler Jurgens
- Contact:
Re: What is the point of Synthetic Fulls?
SOBR extents filling up can definitely be a pain to deal with. My suggestion is to be a bit proactive with them if you can.
When you find one extent getting full, see if you can trigger an Active Full backup on one backup job targeting that SOBR. SOBR placement will calculate a new extent for the new backup chain created by the Active Full. Synthetic Full backups will always try to stay on the same SOBR extent as the rest of the backup chain, for the block cloning benefits you gain from ReFS or XFS; Active Full backups start a new backup chain and will trigger a new extent for the job to land on. As long as you use per-VM backup chains, each VM in that backup job can target a different SOBR extent.
The goal here is to avoid an extent getting full and an incremental backup landing on a different extent than the full backup.
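If you'd rather script that nudge than run it from the console, a minimal sketch along these lines should work (the job name is a placeholder, and it assumes the Veeam PowerShell module is available on the backup server):

# Force an Active Full on one job targeting the SOBR; the new chain
# lets SOBR placement pick a fresh extent. "SQL-Backups" is a placeholder.
Import-Module Veeam.Backup.PowerShell
$job = Get-VBRJob -Name "SQL-Backups"
Start-VBRJob -Job $job -FullBackup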
Tyler Jurgens
Veeam Legend x3 | vExpert ** | VMCE | VCP 2020 | Tanzu Vanguard | VUG Canada Leader | VMUG Calgary Leader
Blog: https://explosive.cloud
Twitter: @Tyler_Jurgens BlueSky: @explosive.cloud
-
- Service Provider
- Posts: 48
- Liked: 7 times
- Joined: Feb 20, 2023 9:28 am
- Full Name: Marco Glavas
- Contact:
Re: What is the point of Synthetic Fulls?
What you are suggesting can itself trigger a cascade effect.
Also, how do you suggest I get an idea of which chains reside on a given extent and how large they are? The best option I have found so far is actually scanning the files with PowerShell (roughly the scan sketched below).
Also, as far as I'm aware, the only actual mechanism for moving data is by job, and then it's from SOBR to SOBR.
Why is there still no way to move single VM chains without taking the whole Job offline?
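For reference, the kind of scan I mean looks roughly like this (a quick sketch; the extent path is a placeholder for your environment):

# Group backup files on one extent by their chain folder and total the sizes
$extentPath = "E:\Backups"
Get-ChildItem -Path $extentPath -Recurse -Include *.vbk, *.vib, *.vrb |
    Group-Object { $_.Directory.FullName } |
    ForEach-Object {
        [pscustomobject]@{
            Chain  = $_.Name
            Files  = $_.Count
            SizeGB = [math]::Round(($_.Group | Measure-Object Length -Sum).Sum / 1GB, 1)
        }
    } | Sort-Object SizeGB -Descending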
-
- Veeam Software
- Posts: 2113
- Liked: 509 times
- Joined: Jun 28, 2016 12:12 pm
- Contact:
Re: What is the point of Synthetic Fulls?
Hi Marco,
I will comment on your situation in a bit, but first a few clarifying points, as I see a bit of confusion here and I just want to ensure we're talking on the same level about these elements.
First, a few definitions and explanations:
Synthetic Full => A fully functional, normal full backup produced by combining data from the previously existing backups of a machine into a new full. It is performed entirely on the repository, saving network resources and reducing the impact on production (an incremental backup is still performed on Synthetic Full days). No deduplication or space savings happen with _just_ the Synthetic Full operation.
Fast Clone => A special function that some file systems support (such as XFS or ReFS, plus a few additional storage vendors) which allows for fast copies on the file system by creating a reference to an existing block instead of actually writing the data again. This is where you see the performance improvement in synthetic operations (synthetic full creation, merging increments into full backups) as well as the space savings. Fast Clone requires that the data reside on the same volume, as noted in the User Guide link, and it only works between the individual backup chains for a given machine; it is not global across all files on the repository. Remember, Fast Clone is a _copy operation_, not a background process like global deduplication.
The main advantages of Fast Clone here are that you get fast backups, fast creation of full backups, and space savings in the process; while Veeam includes inline deduplication and compression, that is limited to the individual backup session, and Fast Clone allows further space savings by reusing blocks from previous backups in the chain.
Clear so far?
So why would you want it? As opposed to Active Full backups, you've already found the main reason, saving resource consumption and placing the workload on the repository. With Fast Clone, this makes the workload even more efficient on the repository AND saves you additional space, while having actual full backups available.
Why are full backups important? Well, it's largely up to you -- many Veeam users run primary jobs with just Forever Forward Incremental, so no periodic fulls. I suspect most would have a Backup Copy job or Backup to Tape job to create archival restore points from, but that's not a requirement, just very common. It's also very common to use GFS to set archival restore points aside and have that retention handled automatically -- since only full backups can be GFS points, this is another reason to consider them.
The placement policy for Scale-out Backup Repositories is outlined here in the User Guide, and given that Locality is selected (it is required for Fast Clone to work), that's why placement favors the existing extent.
A few questions I would have:
1. I would check the properties of some of the full backups on the ReFS volume itself and ensure that the file size really did reduce; there should be two values when the file has Fast Clone applied: Size and Size on Disk. Similarly, I would check the most recent job run where a Synthetic Full was supposed to happen and see if it really used Fast Clone -- the job statistics window will note on the Synthetic Full line whether Fast Clone was used. (A bulk way to check the file sizes is sketched just after this list.)
2. Keep in mind that depending on the increment size, the size of the full backup, and the unique change rate of the data, the space savings may not be enough to keep up with the data ingest; Veeam ONE has a capacity planning report that should help with monitoring the ingest rate for your repositories.
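If you want to check the Size vs. Size on Disk comparison in bulk rather than file by file, a rough sketch like the following should do it (the path is a placeholder; it uses the Win32 call that the "Size on disk" value in the file properties dialog is based on):

# Compare logical size vs. allocated (on-disk) size for full backup files;
# a much smaller on-disk value suggests block cloning took effect
Add-Type -MemberDefinition @'
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
public static extern uint GetCompressedFileSizeW(string lpFileName, out uint lpFileSizeHigh);
'@ -Name Native -Namespace Win32

Get-ChildItem "E:\Backups" -Recurse -Filter *.vbk | ForEach-Object {
    $high = [uint32]0
    $low  = [Win32.Native]::GetCompressedFileSizeW($_.FullName, [ref]$high)
    [pscustomobject]@{
        File     = $_.Name
        SizeGB   = [math]::Round($_.Length / 1GB, 1)
        OnDiskGB = [math]::Round((([uint64]$high -shl 32) -bor $low) / 1GB, 1)
    }
}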
As for an easy way to check which files are where: I wrote this PowerShell function some time ago for a different reason, but you can pass a backup object returned by Get-VBRBackup to it and it will print a list of where the backup files sit across SOBR extents (and some other information): post509821.html#p509821
I wrote a lot, so I will give you time to review and digest it, but hopefully it helps with your concerns here.
David Domask | Product Management: Principal Analyst
-
- Service Provider
- Posts: 48
- Liked: 7 times
- Joined: Feb 20, 2023 9:28 am
- Full Name: Marco Glavas
- Contact:
Re: What is the point of Synthetic Fulls?
I think I'll have to contact Veeam support. I don't think Fast Clone is working in our environment.
-
- Veeam Software
- Posts: 2113
- Liked: 509 times
- Joined: Jun 28, 2016 12:12 pm
- Contact:
Re: What is the point of Synthetic Fulls?
If there is suspicion it's not working, it's indeed worth checking with Support. I would advise, @EWMarco, that you pick an example job that looks to be affected and provide its logs, as well as perhaps some screenshots of the file properties for full backups you suspect did not participate in Fast Cloning. It will expedite the research greatly. Please share the case number once created, thanks!
David Domask | Product Management: Principal Analyst