JaquesC
Influencer
Posts: 12
Liked: never
Joined: Jan 22, 2014 5:46 pm
Contact:

Veeam Dedup question

Post by JaquesC »

Hi there,

My setup has two proxies, a master B&R server, and a replication B&R server (which only runs replication jobs).

My question is, where exactly does DEDUPE take place? Is it something I can specify?

Who carries the workload and handles the dedupe process?

For example, if I kick off a backup job from the master over the back-end into the EQL storage, and it grabs the VM directly from the datastore and brings it across into the attached MD storage, where is the dedupe happening?

If I use the proxies to handle the replications, who is doing the dedupe?

Need some clarity on that.
foggy
Veeam Software
Posts: 21144
Liked: 2143 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam Dedup question

Post by foggy »

Deduplication is performed by the source (proxy) and target (repository) agents. Dedupe on the source side is done within the processed disk only: the content of virtual disks is consolidated, overlapping blocks, zero blocks and swap file blocks are filtered out. If the block already exists in some existing restore point for this job, it is not sent over to the repository again. Dedupe on the target side is done between blocks belonging to virtual disks of all VMs in the job (if the disk block already exists in the restore point currently being created, it is not saved again).
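
A rough way to picture the two stages described above is the Python sketch below. It is not Veeam's implementation: the block size, the SHA-256 digests, and the names `source_filter` and `Repository` are assumptions made for illustration only, and swap file filtering is left out.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only


def digest(block: bytes) -> str:
    """Fingerprint a block; SHA-256 is an assumption, not Veeam's method."""
    return hashlib.sha256(block).hexdigest()


def source_filter(disk_blocks, known_digests):
    """Proxy side (hypothetical): works on one disk at a time.

    Skips zero blocks, repeats within this disk, and blocks already
    present in an earlier restore point of the job (known_digests).
    """
    seen_in_disk = set()
    to_send = []
    for block in disk_blocks:
        if block == bytes(len(block)):  # all-zero block
            continue
        d = digest(block)
        if d in known_digests or d in seen_in_disk:
            continue
        seen_in_disk.add(d)
        to_send.append(block)
    return to_send


class Repository:
    """Target side (hypothetical): keeps each distinct block once
    across all disks written into the current restore point."""

    def __init__(self):
        self.stored = {}

    def receive(self, blocks):
        for block in blocks:
            self.stored.setdefault(digest(block), block)


# One block (AAAA) already exists in an earlier restore point.
previous_restore_point = {digest(b"AAAA")}
vm1_disk = [b"AAAA", b"\x00" * BLOCK_SIZE, b"BBBB", b"BBBB", b"CCCC"]
vm2_disk = [b"BBBB", b"DDDD"]

repo = Repository()
sent_vm1 = source_filter(vm1_disk, previous_restore_point)
sent_vm2 = source_filter(vm2_disk, previous_restore_point)
repo.receive(sent_vm1)
repo.receive(sent_vm2)
```

With this toy data, four blocks cross the network (BBBB is sent once per disk, because each disk is filtered on its own), but the repository keeps only three.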

There's no target-side deduplication in replication jobs.

A couple of useful links to follow:

User Guide - Backup Architecture section
FAQ - Deduplication section
MAA
Expert
Posts: 101
Liked: 3 times
Joined: Apr 27, 2013 12:10 pm
Contact:

[MERGED] Backup Deduplication question

Post by MAA »

Hello,

I read the article http://helpcenter.veeam.com/backup/80/v ... ation.html
but I have a question:
Before sending a data block over the network, does the Backup Proxy check for matching data blocks only on the source side?
Or, before sending a data block over the network, does it also check whether the same data block is already present in the Backup Repository?
foggy
Veeam Software
Posts: 21144
Liked: 2143 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam Dedup question

Post by foggy »

Seems that my post above answers your question in detail.
MAA
Expert
Posts: 101
Liked: 3 times
Joined: Apr 27, 2013 12:10 pm
Contact:

Re: Veeam Dedup question

Post by MAA »

As I understand from your links, BEFORE sending a data block over the network, you check for matching blocks ONLY on the SOURCE SIDE (source VM and Backup Proxy).
And on the TARGET SIDE (Backup Repository) you check for matching blocks ONLY AFTER transmission over the network.

Why doesn't the Backup Proxy check whether the same data block is already present in the Backup Repository before sending it over the network? This could further reduce the network traffic.
foggy
Veeam Software
Posts: 21144
Liked: 2143 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Veeam Dedup question

Post by foggy »

If you read my post more carefully, you will find that blocks are checked on both sides, before and after being transferred:
foggy wrote:If the block already exists in some existing restore point for this job, it is not sent over to the repository again.
dellock6
VeeaMVP
Posts: 6166
Liked: 1971 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Veeam Dedup question

Post by dellock6 »

MAA wrote:Why doesn't the Backup Proxy check whether the same data block is already present in the Backup Repository before sending it over the network? This could further reduce the network traffic.
It's always a trade-off between speed and efficiency: the only location that can see all the blocks coming into a given job is the repository, so deduplication between different virtual disks within a job can happen there. Because of parallel processing, with multiple proxies writing blocks into the backup files, if complete dedup happened at the source, every new block created by any proxy would require a "block map" to be updated and redistributed to each proxy. And in the meantime, no writes could happen, in order to guarantee consistency.

As it is today, a proxy can safely dedup the content it is processing (a given virtual disk at a certain point in time), and then the repository compares all the blocks it receives with already existing blocks and finishes the dedup activity. You could argue it could be even more efficient, but it gives a good balance between deduplication and performance.
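
To put rough numbers on that trade-off, here is a toy Python model. It is only a sketch under stated assumptions: blocks are represented by short strings, the function names `per_disk_dedup` and `global_source_dedup` are made up for this example, and the "map update" count is a simplified stand-in for the coordination cost described above, not a measurement of anything in the product.

```python
def per_disk_dedup(disks):
    """Design as described in this thread: each disk is deduped on its
    own at the source, so a block shared by two disks is sent twice."""
    return sum(len(set(blocks)) for blocks in disks.values())


def global_source_dedup(disks, proxy_count):
    """Hypothetical alternative: one shared block map for all proxies.

    Every new block must be announced to the other proxies, which is
    counted here as (proxy_count - 1) map updates per new block.
    """
    shared_map = set()
    sent = 0
    map_updates = 0
    for blocks in disks.values():
        for block in blocks:
            if block not in shared_map:
                shared_map.add(block)
                sent += 1
                map_updates += proxy_count - 1
    return sent, map_updates


# Two disks that share blocks "b" and "c".
disks = {
    "vm1.vmdk": ["a", "b", "c", "a"],
    "vm2.vmdk": ["b", "c", "d"],
}

sent_per_disk = per_disk_dedup(disks)
sent_global, updates = global_source_dedup(disks, proxy_count=2)
print(sent_per_disk, sent_global, updates)
```

For this input the per-disk approach sends 6 blocks, while the hypothetical shared map sends 4 but needs 4 map updates pushed to the other proxy. In the scenario described above, writes would have to wait on each of those updates, which is the cost the current design avoids.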
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1