Standalone backup agent for Microsoft Windows servers and workstations (formerly Veeam Endpoint Backup FREE)
BackupBytesTim
Service Provider
Posts: 507
Liked: 124 times
Joined: Apr 29, 2022 2:41 pm
Full Name: Tim
Contact:

Forward Incremental vs Reverse Incremental

Post by BackupBytesTim »

Over the past couple of days I've seen some comparisons between forward incremental and reverse incremental backups. The forward incremental scheme has always been recommended to me, and by me. But thinking about how each one behaves, reverse incremental backups seem to be more resilient to data loss, so I'm starting to wonder whether we should be using reverse incremental as our standard backup scheme.

Here's one such discussion: veeam-agent-for-windows-f33/size-of-backups-t88962.html

My experience with other backup software has always been an all-in-one format, without the sort of "chains" that Veeam has, which seems more efficient for a number of reasons. So while I've been using Veeam for over a year now, the whole concept of "backup chains" still seems strange to me. It seems incredibly complicated from a development standpoint, and I know it complicates everything from a management perspective, so without ever having gotten a good explanation of why it's useful I've never figured out why it's a thing.

Anyone have a good comparison of forward incremental vs reverse incremental backups?
Gostev
Chief Product Officer
Posts: 32761
Liked: 7970 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Forward Incremental vs Reverse Incremental

Post by Gostev »

Reverse incremental is where Veeam started over 15 years ago. Back then we focused on VSMB customers, and having the latest backup available as a self-contained full backup file provided certain benefits to those types of customers (it allowed for simple scripted export of backup files to removable media, tape etc.)

These days, reverse incremental makes no sense, because Veeam can natively do virtually everything that used to benefit from the reverse incremental backup format. Meanwhile, this backup mode never really had technical benefits, only drawbacks (in terms of performance and I/O load). We plan to discontinue reverse incremental backup mode in the future, so I recommend you avoid using it at least for your new jobs. Thus, the comparison is very simple: don't use it :D

Please note that literally every backup software has a concept of "backup chains" because it relies on the incremental backup paradigm. Most just "hide" this fact from you because they do not let you select different types of backup chains. Veeam had to expose the choice because we started from reverse incremental, then added the more classic forward incremental for larger customers, but could not immediately remove reverse incremental because many customers still had use cases dependent on it.

In fact, it was not until later Veeam versions, and especially the proliferation of modern backup storage (ReFS/XFS/object), that reverse incremental usage really started to drop dramatically in our customer base, because it really does not make ANY sense for such deployments.
BackupBytesTim

Re: Forward Incremental vs Reverse Incremental

Post by BackupBytesTim »

I'll admit to not having done a lot of digging into the internals of the all-in-one file format used by every other backup software I've worked with. From my perspective, though, that format seems a lot more efficient and robust: it doesn't rely on external databases and metadata files that may need additional maintenance and infrastructure to support, and there's no "merging" process.

If I set up a new computer and do a full backup, I now have 1 file containing the entire computer's contents.
With Veeam and a forward incremental chain, if I do another backup (an incremental), I now have 2 files: one that contains everything from the first run, and another that contains the data updated since then. And so on until I hit either the next scheduled active full backup or, in my case, the restore point limit, which triggers the merge process. At that point Veeam does a new incremental backup, saving the new data, then takes the oldest incremental restore point and squishes its data into the original full backup file. That seems to create extra work.
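
The merge step described above can be sketched in Python as a toy model. Everything here is an assumption made for illustration: each backup file is a plain dict of block name to version, the names `merge_oldest_increment` and `run_backup` are hypothetical, and nothing reflects Veeam's actual file format or implementation.

```python
# Toy model: each backup file is a dict mapping block name -> version stored.

def merge_oldest_increment(full, increments):
    """Fold the oldest increment into the full backup and drop it from the chain."""
    if not increments:
        return full, increments
    oldest, rest = increments[0], increments[1:]
    merged = {**full, **oldest}  # blocks in the increment replace the full's copies
    return merged, rest

def run_backup(full, increments, changed_blocks, retention):
    """Add a new increment, then merge until the chain fits the restore point limit."""
    increments = increments + [changed_blocks]
    # The full counts as one restore point, so at most retention - 1 increments stay.
    while len(increments) + 1 > retention:
        full, increments = merge_oldest_increment(full, increments)
    return full, increments

full = {"A": 1, "B": 1, "C": 1}
chain = []
for changes in ({"A": 2}, {"B": 2}, {"C": 2}):
    full, chain = run_backup(full, chain, changes, retention=3)

print(full)   # the oldest increment ({"A": 2}) has been folded into the full
print(chain)  # the two newest increments remain as separate files
```

In this model the full is only touched once the limit is reached; from then on every run rewrites part of it.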

For a direct comparison with a common product, take Acronis, which I'd worked with extensively for years before using Veeam. When I do the second backup run, new data is added to the existing file. The same occurs each time the job runs, until the desired restore point limit is met. After that, "expired" data gets removed as the new data gets added. But the existing data is already in the same file, so there's no "merging" to be done.

So, from my perspective Veeam's "backup chain" is like:

VBK file: [A] [B] [C] [D] [E]
VIB file 1: [A]
VIB file 2: [E]
VIB file 3: [C]
VIB file 4: [A] [B]
VIB file 5: [B] [C]
This creates something that fits the description of a "chain": in order to recover the latest version, you need to take the latest copy of each piece of data from whichever file contains it. So if I want the latest form of the computer, I need data from the VBK file (for D), VIB file 2 (for E), VIB file 4 (for A), and VIB file 5 (for B and C).
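
Working through that example mechanically shows which file holds the newest copy of each block. This is a minimal Python sketch using only the layout listed above; the function name `latest_sources` is hypothetical, and a real product would presumably consult its metadata rather than scan the files like this.

```python
# The example chain, oldest file first; each entry lists the blocks that file holds.
chain = [
    ("VBK", ["A", "B", "C", "D", "E"]),
    ("VIB 1", ["A"]),
    ("VIB 2", ["E"]),
    ("VIB 3", ["C"]),
    ("VIB 4", ["A", "B"]),
    ("VIB 5", ["B", "C"]),
]

def latest_sources(chain):
    """Map each block to the newest file in the chain that contains it."""
    latest = {}
    for file_name, blocks in chain:  # later files overwrite earlier entries
        for block in blocks:
            latest[block] = file_name
    return latest

sources = latest_sources(chain)
needed_files = sorted(set(sources.values()))

print(sources)       # A comes from VIB 4, B and C from VIB 5, D from VBK, E from VIB 2
print(needed_files)  # VIB 1 and VIB 3 are not needed for the latest restore point
```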

Again, I never looked into it too extensively because the following method just makes a lot more sense in my head so this is how I assumed it worked.

Other software (all in one file):

Version 1: [A] [B] [C] [D] [E] (new file created)
Version 2: [A] (file modified)
Version 3: [E] (file modified)
Version 4: [C] (file modified)
Version 5: [A] [B] (file modified)
Version 6: [B] [C] (file modified)
So being that it's all in one file, it's more like just grabbing the latest version of each file on the computer. Another good explanation might be, for other software, if you have 5 documents on a table, you modify and reprint 3 of them. But you don't replace the original documents, you just place the new copies on top of the old ones. So when you want to get the latest version of the entire set you just grab the top (latest) document from each stack.

But for Veeam, the same example, you'd reprint the documents and have another table for the new versions of your documents, and only include the 3 new copies. So if you want to get the latest version of the entire set you have to compare the contents of each table, find which one has the latest version of each document, and then get that latest version, consolidate them all together on one table to have the complete up to date set.

Perhaps the way Veeam records data in the database and metadata files makes the process more efficient than it seems, such as by recording which VIB file has the latest version of each bit of data (file or block), so the recovery process doesn't have to scan through everything to find the latest data at recovery time.

If that's the case, then I suppose my original statement about other software not having "chains" really just comes down to how we define a "chain". In my examples I would define a "chain" as the collection of files required to store the data. So, by that definition, other software does not have "chains". Acronis does have an option for a multi-file format, though; it never made sense to me to use it, and it always took up a lot more space (presumably because storing everything in one file compresses better).

That said, what I'm mainly curious about is how reverse incremental backup chains seem more resilient to data loss. Data loss is not a common thing, especially with any sort of redundant/self-repairing file systems like ReFS or XFS. But for any more standard storage format it would seem a reverse incremental chain would be the better option, since you'd always have all the latest data in one file, with no dependency on all the other incremental files in order to perform the recovery. Whereas with a forward incremental chain you're dependent on all incremental files being available, and intact, in order to recover to the latest restore point.

I am also curious: regardless of why the move to primarily using forward incremental chains was made, why the decision to drop reverse incremental chains?

I'll also add that the vast majority of our customers are small business users, who are less concerned about things like ransomware and more concerned about losing or accidentally breaking their laptops, or accidentally deleting a file. So recovery of the latest versions is almost always the priority when planning. Past versions being available is seen as a benefit, but is rarely desired from a recovery standpoint. Just to clarify where I'm coming from with some of the questions.
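
For reference, the reverse incremental behaviour being discussed can be modelled in a few lines of Python. This is a sketch under stated assumptions: backup files are plain dicts of block name to version, the names `reverse_incremental_run` and `restore_previous` are hypothetical, and the model ignores blocks that are newly created or deleted between runs.

```python
def reverse_incremental_run(full, changed):
    """Write changed blocks into the full in place; return the displaced old blocks."""
    rollback = {block: full[block] for block in changed if block in full}
    full.update(changed)  # the full backup itself is rewritten on every run
    return rollback

def restore_previous(full, rollbacks, steps_back):
    """Rebuild an older restore point from the current full plus rollback data."""
    state = dict(full)
    # Undo the newest runs first, walking back toward the requested point.
    for rollback in reversed(rollbacks[len(rollbacks) - steps_back:]):
        state.update(rollback)
    return state

full = {"A": 1, "B": 1, "C": 1}
rollbacks = []
rollbacks.append(reverse_incremental_run(full, {"A": 2}))
rollbacks.append(reverse_incremental_run(full, {"A": 3, "B": 2}))

print(full)                                  # latest state, held in a single file
print(restore_previous(full, rollbacks, 1))  # previous restore point
```

The latest restore point needs only the full, but every older restore point is rebuilt starting from that same full, and the full is modified on every run.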

Semi-related side question, my understanding has always been that even after the "merge" process in a forever forward incremental backup chain, the old data that is "expired" and no longer needed still exists in the VBK file. Is that accurate?
Gostev

Re: Forward Incremental vs Reverse Incremental

Post by Gostev »

Reverse incremental backup chains are LESS resilient to data loss, because modifications are applied to the existing backup file. In-place modification can never be good for reliability. And what has a chance to get corrupted in reverse incremental backup mode is what EVERY other restore point depends on: the base full backup.

As I've said, as of today this backup mode provides no technical benefits, including for recoveries from the latest restore point. And on the other hand, it has multiple drawbacks, many of which are simply not present with other backup modes. Thus the decision to drop.

Semi-related side question: no, this is not accurate. The blocks behind "expired" data are reused with new data.
BackupBytesTim

Re: Forward Incremental vs Reverse Incremental

Post by BackupBytesTim »

Gostev wrote: Reverse incremental backup chains are LESS resilient to data loss, because modifications are applied to the existing backup file. In-place modification can never be good for reliability. And what has a chance to get corrupted in reverse incremental backup mode is what EVERY other restore point depends on: the base full backup.

But in the case of a forever forward incremental backup chain, isn't the VBK file being modified on every backup run as well? So the same risks of data loss/corruption from the file modification process apply, no? I understand the hypothetical benefit of writing the file once and never modifying it, but it does get modified each time, so that benefit would seem not to apply to a forever forward chain, only if we do periodic full backups (synthetic or active).

Gostev wrote: Semi-related side question: no, this is not accurate. The blocks behind "expired" data are reused with new data.

Okay, so that sounds like when old data is "expired" it's not technically removed, but rather the blocks are written over with new data when more space is needed. It's similar to when a file is "deleted" from a file system: typically it's just marked as available space, but not written over until new data needs to use the space it occupied. Does that sound like an accurate explanation?
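
That interpretation can be modelled with a simple free list. The Python sketch below only illustrates the "mark as free, overwrite later" idea from this post; the `BackupFile` class and its methods are hypothetical and are not a description of the actual VBK layout.

```python
class BackupFile:
    """Toy container whose expired slots are reused instead of being removed."""

    def __init__(self):
        self.slots = []  # physical storage slots inside the file
        self.free = []   # indexes of slots whose data has expired

    def write(self, data):
        """Store data in a reusable slot if one exists, otherwise grow the file."""
        if self.free:
            index = self.free.pop()
            self.slots[index] = data  # old contents are overwritten only now
        else:
            self.slots.append(data)
            index = len(self.slots) - 1
        return index

    def expire(self, index):
        """Mark a slot as reusable; its old data stays in place until overwritten."""
        self.free.append(index)

vbk = BackupFile()
a = vbk.write("A v1")
b = vbk.write("B v1")
vbk.expire(a)
size_after_expire = len(vbk.slots)  # expiring does not shrink the file
c = vbk.write("A v2")               # lands in the slot that "A v1" occupied
```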

Assuming that understanding is accurate, what does the "Remove deleted items after: X days" option in the "Full Backup File Maintenance" section of the job settings do? Does that remove the unused blocks from the file?
Gostev

Re: Forward Incremental vs Reverse Incremental

Post by Gostev »

A forever forward incremental backup chain is indeed subject to the same reliability considerations, which is why it is not the default setting, and I expect it will also be gone eventually. Nevertheless, it is still a wee bit better than reverse incremental, as it has a few benefits over the latter, which is why the reverse incremental backup mode will be the first to go.

Here's the explanation of what this setting does: Retention Policy for Deleted Items
BackupBytesTim

Re: Forward Incremental vs Reverse Incremental

Post by BackupBytesTim »

Okay, that makes more sense then as far as the stability/reliability long-term of the incremental backup methods. I was under the impression that forever forward backups were more strongly recommended than that would seem to suggest.

However, on the topic of the Deleted Items Retention setting, that seems to apply to virtual machines backed up with the Backup & Replication server, but what does the setting do for computers backed up individually with the Agent software?
Gostev

Re: Forward Incremental vs Reverse Incremental

Post by Gostev »

According to the User Guide, it does exactly the same there too: once the set number of days has passed, it deletes backups of machines that were removed from the production environment.
BackupBytesTim

Re: Forward Incremental vs Reverse Incremental

Post by BackupBytesTim »

Okay, that makes sense then. I didn't think about it doing that, probably because we don't use any backup policies assigned at scale across multiple computers and managed by a VBR server. We only use jobs customized and assigned to individual computers through the VSPC, so the only time a job runs is when the computer it's assigned to is online and does a backup.

In that scenario, if the option is set, will the repository server delete files for a computer that has been removed from management by the VSPC server after a time (such that the job isn't actually running, just maybe the server still checks if the data is expired now and should be removed)?

Similarly, what if the computer is just offline but still managed, and hasn't been backed up in so long that the latest backup is now past the expiration period? (I assume this doesn't count as "expired" at that point, since the computer hasn't been deleted from the VSPC server.)

Another side question since it's in that same page you linked, for a VSPC managed Agent, backing up to a Cloud Connect Repository, does the health check process run on the server side, with the VCC VBR server or the repository server, or does it run on the agent, communicating with the server in order to access the data for the health check process?
I assume it's all on the agent, but want to make sure, since there seem to be some places where using a VCC VBR server processes data differently than using a regular VBR installation.
Gostev

Re: Forward Incremental vs Reverse Incremental

Post by Gostev »

I'm not sure, but frankly this is also completely off-topic for this thread in any case. Please post these questions in the dedicated VSPC/VCC/VAW subforums, where they will be seen and answered by the responsible PMs.

Meanwhile, we can keep this topic for further questions regarding the backup mode comparison.