Questions on deduplication

Post by **FrenchBlue** » Jul 31, 2024 11:39 am

Hello,

I've read the documentation at https://helpcenter.veeam.com/docs/backu ... ml?ver=120, but it's not a simple topic.

- I understand it's only online dedup, not offline, right?
- The doc says that dedup is performed by the Veeam Data Mover, so it effectively happens on the repository, not the backup proxy? (That makes sense, I just want to be sure.)
- At which level does dedup apply: a given backup job, or globally? For example, if I have 2 backup jobs, each one backing up a single VM deployed from the same template, will dedup work there?
- Does dedup apply in the same way to the performance and capacity tiers?

Thanks.

Post by **david.domask** » Jul 31, 2024 11:45 am

Hi FrenchBlue,

Did the wrong link maybe get copied and pasted? That looks like a topic about importing backups from Object Storage repositories in a disaster recovery event, and I don't see deduplication or compression discussed there.

Check our User Guide link here: https://helpcenter.veeam.com/docs/backu ... ation.html

I think it answers your questions. As noted in the document, space reduction happens on both the proxy and the repository to maximize efficiency and potential savings.

Post by **FrenchBlue** » Jul 31, 2024 11:47 am

Hello, yes, sorry, that was a bad paste on my part. I've corrected it with the proper link, but my remaining questions still stand.

Jul 31, 2024 12:00 pm

Source side and target side refer to the backup proxy and the repository respectively. I think that may be what's causing the confusion, so read the excerpt below with that mapping in mind:

Veeam Backup & Replication uses Veeam Data Movers to deduplicate VM data:

Veeam Data Mover in the source side deduplicates VM data at the level of VM disks. Before the source-side Veeam Data Mover starts processing a VM disk, it obtains digests for the previous restore point in the backup chain from Veeam Data Mover in the target side. The source-side Veeam Data Mover consolidates this information with CBT information from the hypervisor and filters VM disk data based on it. If some data block exists in the previous restore point for this VM, the source-side Veeam Data Mover does not transport this data block to the target. In addition, in the case of thin disks, the source-side Veeam Data Mover skips unallocated space.
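To make the source-side step more concrete, here is a minimal Python sketch of the filtering described above. It is an illustration only, not Veeam's implementation: the function name `blocks_to_send`, the tiny block size, SHA-256 as the digest, and modeling CBT and disk allocation as sets of block indexes are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, just to keep the example readable


def digest(block: bytes) -> str:
    """Hypothetical digest function; SHA-256 is an assumption."""
    return hashlib.sha256(block).hexdigest()


def blocks_to_send(disk, previous_digests, changed, allocated):
    """Return {block_index: data} for blocks the source side would transport.

    previous_digests: digests of blocks in the previous restore point
    changed:          block indexes flagged as changed (stand-in for CBT)
    allocated:        block indexes that are allocated (thin-disk stand-in)
    """
    to_send = {}
    for offset in range(0, len(disk), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block = disk[offset:offset + BLOCK_SIZE]
        if index not in allocated:
            continue  # unallocated space on a thin disk is skipped
        if index not in changed:
            continue  # change tracking says this block was not modified
        if digest(block) in previous_digests:
            continue  # block already exists in the previous restore point
        to_send[index] = block
    return to_send


# Previous restore point held AAAA, BBBB, CCCC; block 1 was rewritten.
previous = {digest(b"AAAA"), digest(b"BBBB"), digest(b"CCCC")}
disk = b"AAAAXXXXCCCC\x00\x00\x00\x00"
result = blocks_to_send(disk, previous, changed={1, 2, 3}, allocated={0, 1, 2})
print(result)  # {1: b'XXXX'}
```

Block 2 is flagged as changed but its digest matches the previous restore point, and block 3 is unallocated, so only block 1 would cross the wire in this toy example.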

Veeam Data Mover in the target side deduplicates VM data at the level of the backup file. It processes data for all VM disks of all VMs in the job. The target-side Veeam Data Mover uses digests to detect identical data blocks in transported data and stores only unique data blocks in the resulting backup file.
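As a rough illustration of the target-side behavior, the sketch below stores each unique block once and keeps an index that maps every (VM, disk, block) position to its digest. This is a simplified model under my own assumptions, not the actual backup file format: `write_backup_file`, the tuple layout, and the SHA-256 digest are hypothetical.

```python
import hashlib


def write_backup_file(transported):
    """Deduplicate transported blocks for all VMs in one job.

    transported: iterable of (vm, disk, block_index, data) tuples.
    Returns (store, index), where store maps digest -> block data
    (unique blocks only) and index maps position -> digest.
    """
    store = {}
    index = {}
    for vm, disk, block_index, data in transported:
        d = hashlib.sha256(data).hexdigest()
        store.setdefault(d, data)  # keep only the first copy of each block
        index[(vm, disk, block_index)] = d
    return store, index


# Two VMs in the same job share an identical OS block.
incoming = [
    ("vm1", "disk0", 0, b"os-block"),
    ("vm1", "disk0", 1, b"vm1-data"),
    ("vm2", "disk0", 0, b"os-block"),
]
store, index = write_backup_file(incoming)
print(len(incoming), "blocks received,", len(store), "blocks stored")
# 3 blocks received, 2 blocks stored
```

Every position can still be restored by looking up its digest in `index` and then fetching the data from `store`.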

This happens per job, so it's not global across multiple jobs. The same data movers are used both for backups to the Performance Tier and for offloads (copy and move) to the Capacity Tier, though the behavior is a bit different: since offloads to the Capacity Tier work from backups that are already deduplicated and compressed, the data movers won't yield much additional savings there.
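For the two-jobs-from-one-template question specifically, a toy calculation shows what the per-job scope means. The block contents and the `unique_blocks` helper are made up for illustration, and the single-job case assumes both VMs end up in the same backup file.

```python
import hashlib


def unique_blocks(blocks):
    """Return the set of digests, i.e. what dedup would actually store."""
    return {hashlib.sha256(b).digest() for b in blocks}


# Both VMs were deployed from the same template and share its OS blocks.
template = [b"os-block-1", b"os-block-2", b"os-block-3"]
job_a = template + [b"vm-a-data"]
job_b = template + [b"vm-b-data"]

# Two separate jobs: each job deduplicates only within itself.
two_jobs_total = len(unique_blocks(job_a)) + len(unique_blocks(job_b))

# One job containing both VMs: shared template blocks are stored once.
one_job_total = len(unique_blocks(job_a + job_b))

print(two_jobs_total, one_job_total)  # 8 5
```

So in the two-job layout the template blocks are stored once per job, and the shared data is only collapsed when both VMs are processed by the same job.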

Post by **FrenchBlue** » Jul 31, 2024 12:14 pm

Thanks, all clear now.
