Backup Mode and ReFS (introduction and when to use what)

jochot · Post by **jochot** » Mar 07, 2018 3:17 pm

Hey,
I recently moved to a new backup storage and formatted it with ReFS, as I see big advantages for backup repositories.
I don't really want to go into detail about what ReFS does and why it saves space, but I'd like to talk about backup modes, since they take advantage of ReFS differently and Veeam gives us a choice of a few:
- Reverse incremental
- Forever forward incremental
- Forward incremental with Synthetic fulls
OK, for those who haven't read up on ReFS: basically it works with metadata and doesn't necessarily need to move/copy the blocks on the disk.
This means the main advantage of ReFS comes from fewer I/O operations on the backup repository and therefore higher speed.
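
As a rough illustration of the metadata idea above, here is a toy Python model of a volume that can either copy a file block by block or "clone" it by referencing the existing blocks. This is only a sketch: `ToyVolume` and its counters are made up for illustration and are not ReFS or Veeam code.

```python
class ToyVolume:
    """Toy volume: a file is just a list of block ids (hypothetical model)."""

    def __init__(self):
        self.blocks = {}       # block id -> data
        self.refcount = {}     # block id -> number of files referencing it
        self.files = {}        # file name -> list of block ids
        self.data_writes = 0   # blocks physically written
        self.data_reads = 0    # blocks physically read
        self.metadata_ops = 0  # clone operations (metadata only)
        self._next_id = 0

    def write_file(self, name, chunks):
        ids = []
        for chunk in chunks:
            block_id = self._next_id
            self._next_id += 1
            self.blocks[block_id] = chunk
            self.refcount[block_id] = 1
            self.data_writes += 1
            ids.append(block_id)
        self.files[name] = ids

    def copy_file(self, src, dst):
        """Classic copy: read every block, then write it again."""
        chunks = []
        for block_id in self.files[src]:
            chunks.append(self.blocks[block_id])
            self.data_reads += 1
        self.write_file(dst, chunks)

    def clone_file(self, src, dst):
        """Clone-style copy: the new file only references existing blocks."""
        ids = list(self.files[src])
        for block_id in ids:
            self.refcount[block_id] += 1
        self.files[dst] = ids
        self.metadata_ops += 1

    def read_file(self, name):
        return [self.blocks[block_id] for block_id in self.files[name]]


vol = ToyVolume()
vol.write_file("full.vbk", [b"a", b"b", b"c"])
vol.copy_file("full.vbk", "copy.vbk")    # 3 reads + 3 writes
vol.clone_file("full.vbk", "clone.vbk")  # 1 metadata operation, no data I/O
print(vol.data_writes, vol.data_reads, vol.metadata_ops)  # 6 3 1
```

Both the copy and the clone return the same content when read, but only the copy costs data I/O and extra blocks on disk.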

From a German whitepaper (https://www.veeam.com/de/wp-microsoft-r ... arget.html), which I haven't found in English yet, I get the following figures:
- Forever forward incremental
NTFS: 1x read, 2x write
ReFS: 1x write, 1x metadata operation
The point here is that the blocks are not written into the full backup file itself and therefore are not moved/copied on the disk; the move/copy happens only in the metadata (fast clone).
- Forward incremental with Synthetic fulls
NTFS: 1x read, 2x write
ReFS: 1x write, 1x metadata operation
The I/O is the same as for forever forward, but the synthetic fulls are faster and take less space.
The reason is that for a new synthetic full, no new data is written to the drive itself; in the metadata, the new synthetic full file is simply linked to the existing blocks.
That means no block is copied/written into the new synthetic full file. The metadata links to the "old" block instead, which improves speed and space consumption.
- Reverse incremental
This is where the whitepaper doesn't really give detailed information anymore. I found this article explaining the backup mode (https://www.veeam.com/kb1933), but it is also unclear about the I/O: "The job will generate 3 times as much I/O on the target storage." (3 times as much as what?)
My reading of it is:
1x "injecting" changed blocks = write
1x reading out replaced blocks = read
1x writing replaced blocks = write
So for NTFS: 2x write, 1x read
That would be the same amount of read/write as the others.
For ReFS I guess it would be 1x write, 1x metadata operation, just like the others, although the metadata operation could be more complex in this one?
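
To keep the figures above side by side, they can be put into a small Python table. This is just a tally of the numbers quoted in this post: the reverse incremental ReFS row is the guess made above, not something taken from the whitepaper, and `IoCost` and `data_ops_saved` are names made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    """Per-changed-block cost on the repository, as counted in this post."""
    reads: int
    writes: int
    metadata_ops: int = 0

    @property
    def data_ops(self):
        return self.reads + self.writes


COSTS = {
    ("forever forward", "NTFS"): IoCost(reads=1, writes=2),
    ("forever forward", "ReFS"): IoCost(reads=0, writes=1, metadata_ops=1),
    ("forward + synthetic fulls", "NTFS"): IoCost(reads=1, writes=2),
    ("forward + synthetic fulls", "ReFS"): IoCost(reads=0, writes=1, metadata_ops=1),
    ("reverse incremental", "NTFS"): IoCost(reads=1, writes=2),
    # Assumption: this row is the poster's guess, not a documented figure.
    ("reverse incremental", "ReFS"): IoCost(reads=0, writes=1, metadata_ops=1),
}


def data_ops_saved(mode):
    """Data I/O operations saved per block when going from NTFS to ReFS."""
    return COSTS[(mode, "NTFS")].data_ops - COSTS[(mode, "ReFS")].data_ops


for (mode, fs), cost in COSTS.items():
    print(f"{mode:26} {fs}: {cost.reads}x read, {cost.writes}x write, "
          f"{cost.metadata_ops}x metadata")
```

With these numbers every mode drops from 3 data operations to 1 on ReFS, which is why the measured slowness of reverse incremental is surprising.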

So now that we have a rough idea of what the advantages are, I'd like to compare the modes. [Basically a description of how these work; you can skip this if you already know.]

Forever forward incremental vs. forward incremental with synthetic fulls
The main difference is that in forever forward, the oldest (not the latest) restore point is always the full backup, and to restore the newest state you need to combine the full backup file with all of the incrementals.
In forward with synthetic fulls, you create a synthetic full every now and then and therefore only need the latest synthetic full plus the incrementals made since that synthetic full.
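
The difference in restore chain length can be sketched in a few lines of Python. The chain is modelled as a list of `"F"` (full) and `"I"` (incremental) entries in chronological order; the function name and the 14-point example chains are made up for illustration.

```python
def files_needed(chain, point):
    """Indices of the backup files that must be read to restore `point`."""
    # Walk back from the restore point to the nearest full backup.
    for start in range(point, -1, -1):
        if chain[start] == "F":
            return list(range(start, point + 1))
    raise ValueError("no full backup at or before this restore point")


# 14 daily restore points, oldest first.
forever_forward = ["F"] + ["I"] * 13
with_synthetic = ["F"] + ["I"] * 6 + ["F"] + ["I"] * 6  # full every 7th day

latest = 13
print(len(files_needed(forever_forward, latest)))  # 14
print(len(files_needed(with_synthetic, latest)))   # 7
```

In this toy example the newest restore point depends on 14 files in forever forward, but only on 7 when there is a synthetic full in the middle of the chain.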
The retention policy is the interesting part here:
In forever forward, the time will come when the full backup file (being the oldest restore point) would have to be deleted. Instead of deleting it, Veeam merges the oldest incremental into the full backup and drops the replaced blocks from the full backup file.
In forward with synthetic fulls, when you reach the maximum number of restore points (let's say 20), you can end up with more than 20 restore points on disk, because Veeam can only delete a full backup file together with its associated incrementals (as long as you don't have rollbacks enabled).
So if you have a synthetic full that is older than 20 restore points, but the next full is only the 17th restore point, the older synthetic full and all of its incrementals are still required to restore points 18-20.
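
The "more than 20 restore points" effect can be reproduced with a simplified retention rule in Python: an old chain (a full plus its incrementals) is only dropped as a whole, and only if the remaining files still cover the retention count. This is a sketch of the behaviour described above, not Veeam's actual retention code, and `apply_retention` is a made-up name.

```python
def apply_retention(chain, retention):
    """Drop whole leading chains while the rest still has `retention` points."""
    chain = list(chain)
    while True:
        try:
            next_full = chain.index("F", 1)  # start of the next chain
        except ValueError:
            return chain  # only one chain left, nothing can be dropped
        if len(chain) - next_full >= retention:
            chain = chain[next_full:]  # old full + its incrementals go away
        else:
            return chain


# Old chain of 7 points, then a full that is the 17th newest restore point.
chain = ["F"] + ["I"] * 6 + ["F"] + ["I"] * 16
print(len(apply_retention(chain, 20)))  # 24: points 18-20 still need the old full

# Three more incrementals later the newer chain alone covers 20 points.
chain += ["I"] * 3
print(len(apply_retention(chain, 20)))  # 20
```

So with a retention of 20 the repository temporarily holds up to 24 restore points in this example before the old chain can be removed in one go.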

Now if you look at reverse incremental, the full backup is always the latest restore point: all incremental changes are injected into the full backup, and a rollback file is created with the replaced data.
The retention policy is clear: it deletes the oldest rollback file, as the full backup is the newest and all other restore points are rollbacks.
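
A minimal Python sketch of the reverse incremental idea, assuming a backup is just a dict of block number to data (the function names are hypothetical and the real file formats are far more involved):

```python
def reverse_incremental_run(full, changes):
    """Inject changed blocks into `full` in place; return the rollback data."""
    rollback = {}
    for block, new_data in changes.items():
        rollback[block] = full.get(block)  # replaced data, None if block is new
        full[block] = new_data
    return rollback


def restore(full, rollbacks, steps_back):
    """Rebuild an older state by applying the newest `steps_back` rollbacks."""
    state = dict(full)
    newest_first = reversed(rollbacks[len(rollbacks) - steps_back:])
    for rollback in newest_first:
        for block, old_data in rollback.items():
            if old_data is None:
                state.pop(block, None)  # block did not exist back then
            else:
                state[block] = old_data
    return state


full = {0: "a0", 1: "b0", 2: "c0"}
rollbacks = []  # oldest first
rollbacks.append(reverse_incremental_run(full, {1: "b1"}))
rollbacks.append(reverse_incremental_run(full, {1: "b2", 3: "d2"}))

print(restore(full, rollbacks, 0))  # latest state, straight from the full
print(restore(full, rollbacks, 2))  # original state, via two rollbacks

# Retention: the oldest restore point is simply the oldest rollback.
rollbacks.pop(0)
```

Restoring the latest state needs only the full, while every older state costs one rollback more, and retention never has to touch the full backup.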

[The actual Question / discussion material]

Now when to use what with ReFS?
Usually, reverse incremental would make the most sense to me, since the newest data is always in the full backup file, which is what you want when it comes to a restore.
What I don't get is why it is still that slow with ReFS. It should also be just one write plus some metadata operations, yet it is way slower (50%) than forward.
I guess this is only for small machines on fast storage?

When to use synthetic fulls or not is not really clear to me.
With ReFS I have the benefit of synthetic fulls not taking more space.
I don't see the advantage of not having synthetic fulls, except for very short retention policies (up to 10 restore points or so).
If you go beyond 2 weeks, I'd say synthetic fulls give me a faster restore and make it easier to move backup data?

csydas · Post by **csydas** » Mar 07, 2018 7:10 pm

Hi jochot,

I've always been pretty particular that the server defines the backup mode. All my "right now" servers are done in reverse incremental; if the server dies and I need it back in 10 minutes, it's a reverse incremental job for me. From what I remember, the reason you're seeing slowness is that there are two merges with ReFS, not one: the incoming increment gets merged in and the former increment gets moved out, though I think this should still be pretty fast with ReFS. Were your backups pre-existing on an NTFS repo and moved over, or made fresh? If they were moved, you need to run an active full first to engage fast clone.

Synthetic fulls, for me, are for when you need (or want) a full backup but can't keep the machine on a snapshot for the time a full backup takes. We do storage snapshots with VMware, so this isn't much of an issue anymore, but prior to that we had a lot of sensitive servers that I just couldn't let sit on a snapshot for 10 hours while our backup proxy feverishly grabbed data off the SAN for all 20 TB. That's the real benefit: the job can sit and merge on the storage as long as it darn well pleases while the production VM is happily snapshot-free.

I don't trust big forever forward incremental chains; the idea of something screwing up a VIB 3 months ago and trashing 5 months' worth of backups just makes me upset thinking about it, and not even Integrity Streams convince me this is a good idea, so when I want a primary chain, I've always got fulls going. Even for smaller chains of 30-some days.
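
The worry about long chains can be put into rough numbers with a toy model in Python: a restore point is lost if the damaged file lies between it and the full it depends on. This deliberately ignores merges, health checks and Integrity Streams, and `broken_points` and the example chains are made up for illustration.

```python
def broken_points(chain, bad):
    """Restore points that can no longer be restored if file `bad` is corrupt."""
    broken = []
    for i in range(bad, len(chain)):
        if i > bad and chain[i] == "F":
            break  # a later full starts a new, independent chain
        broken.append(i)
    return broken


days = 150  # roughly 5 months of daily restore points, oldest first
one_long_chain = ["F"] + ["I"] * (days - 1)
weekly_fulls = (["F"] + ["I"] * 6) * 22
weekly_fulls = weekly_fulls[:days]

bad = 60  # the damaged increment, roughly 3 months back
print(len(broken_points(one_long_chain, bad)))  # 90
print(broken_points(weekly_fulls, bad))         # [60, 61, 62]
```

In the single long chain everything from the damaged increment onwards is gone, while with weekly fulls only the points up to the next full are affected.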

"If you go to more than 2 weeks, I'd say the synthetic fulls provide me with a faster restore and easier move of backup data?"

This is my understanding - you have to read from every increment involved with the day you choose, and you can end up with really long chains if you have a lot of I/O to deal with, so you should see faster work with the periodic fulls in the chain, and synth is naturally faster.
