I've read the doc here: https://helpcenter.veeam.com/docs/backu ... ml?ver=120 but it's not a simple topic.
- I understand it's only online dedup, not offline, right?
- The doc says that dedup is done by the Veeam data mover, so it effectively happens on the repository, not the backup proxy? (That makes sense, I just want to be sure.)
- At which level does dedup apply: within a given backup job, or globally? For example, if I have 2 backup jobs, each one backing up a single VM deployed from the same template, will dedup work there?
- Does dedup apply in the same way to performance and capacity tiers?
Did the wrong link maybe get copied and pasted? That looks like a topic about how to import backups from Object Storage repositories in a disaster recovery event, and I don't see deduplication and compression being discussed there.
I think it answers your questions, and as noted in the document, there is space reduction happening both on the proxy and the repository to maximize the efficiency and potential savings.
David Domask | Product Management: Principal Analyst
Source side and target side refer to the backup proxy and the repository, respectively. I think that may be what is causing the confusion, so read the passage below with that mapping in mind:
Veeam Backup & Replication uses Veeam Data Movers to deduplicate VM data:
Veeam Data Mover in the source side deduplicates VM data at the level of VM disks. Before the source-side Veeam Data Mover starts processing a VM disk, it obtains digests for the previous restore point in the backup chain from Veeam Data Mover in the target side. The source-side Veeam Data Mover consolidates this information with CBT information from the hypervisor and filters VM disk data based on it. If some data block exists in the previous restore point for this VM, the source-side Veeam Data Mover does not transport this data block to the target. In addition, in the case of thin disks, the source-side Veeam Data Mover skips unallocated space.
Veeam Data Mover in the target side deduplicates VM data at the level of the backup file. It processes data for all VM disks of all VMs in the job. The target-side Veeam Data Mover uses digests to detect identical data blocks in transported data and stores only unique data blocks in the resulting backup file.
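To make the two stages concrete, here is a minimal Python sketch of the flow described above. It is not Veeam code: the function names, the use of SHA-256 as the digest, and modelling the previous restore point as a plain set of digests are all assumptions made for illustration.

```python
import hashlib

def digest(block: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever digest the data movers use.
    return hashlib.sha256(block).hexdigest()

def source_side_filter(disk_blocks, previous_digests, changed_offsets):
    """Proxy side: decide which blocks of one VM disk get transported.

    disk_blocks      -- {offset: bytes, or None for unallocated thin-disk space}
    previous_digests -- digests of the previous restore point (from the target side)
    changed_offsets  -- offsets the hypervisor's CBT reports as changed
    """
    to_send = []
    for offset, block in disk_blocks.items():
        if block is None:
            continue  # unallocated space on a thin disk is skipped
        if offset not in changed_offsets:
            continue  # CBT says this block did not change
        if digest(block) in previous_digests:
            continue  # block already exists in the previous restore point
        to_send.append((offset, block))
    return to_send

def target_side_store(transported):
    """Repository side: keep one copy of each unique block in the backup file.

    transported -- iterable of (disk_name, offset, block) for all disks in the job
    """
    unique_blocks = {}  # digest -> block data actually written
    index = []          # (disk_name, offset, digest) so every disk can be rebuilt
    for disk_name, offset, block in transported:
        d = digest(block)
        unique_blocks.setdefault(d, block)
        index.append((disk_name, offset, d))
    return unique_blocks, index
```

In this sketch the proxy only reduces what is sent for a single disk, while the repository sees the blocks from every disk of every VM in the job and can collapse duplicates between them.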
This happens per job, so it's not global across multiple jobs. The same data movers are used both for backups to the Performance Tier and for offloads (copy and move) to the Capacity Tier, though the behavior is a bit different: since offloads to the Capacity Tier work from backups that are already deduplicated and compressed, the data movers won't find much additional savings there.
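For the "two jobs, two VMs from the same template" question, a rough way to picture the per-job scope is the toy comparison below. It is only an illustration: the block contents, job names, and the "global" function are hypothetical, and the global case is shown purely for contrast, since that is not how the jobs behave.

```python
import hashlib

def unique_digests(blocks):
    # Assumption: SHA-256 stands in for the real digest.
    return {hashlib.sha256(b).hexdigest() for b in blocks}

def stored_per_job(jobs):
    # Each job deduplicates only against its own data.
    return sum(len(unique_digests(blocks)) for blocks in jobs.values())

def stored_if_global(jobs):
    # Hypothetical global dedup across all jobs, shown only for comparison.
    all_blocks = [b for blocks in jobs.values() for b in blocks]
    return len(unique_digests(all_blocks))

# Two VMs cloned from the same template, each with a little data of its own.
template = [b"os-block-1", b"os-block-2", b"os-block-3"]
two_jobs = {
    "job_a": template + [b"vm-a-data"],
    "job_b": template + [b"vm-b-data"],
}
one_job = {"job_ab": two_jobs["job_a"] + two_jobs["job_b"]}

blocks_two_jobs = stored_per_job(two_jobs)  # template blocks stored once per job
blocks_one_job = stored_per_job(one_job)    # template blocks stored once in total
```

With separate jobs the shared template blocks are stored once in each job, whereas placing both VMs in the same job lets the target-side data mover see the duplicates.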