kan_two
Lurker
Posts: 2
Liked: never
Joined: Jan 09, 2018 2:35 pm
Full Name: Kan Two
Contact:

Dedupe rate question

Post by kan_two »

Is there any data on Veeam efficiency wrt deduplication vs other deduplication methods?

As I understand it, Veeam dedupe is inline as opposed to post-process. I imagine that a post-process dedupe could greatly reduce the space usage of e.g. two separate full backups of two separate but similar VMs (e.g. same OS). How efficient would Veeam inline dedupe be in this scenario? Is the checksumming and duplicate block checking so efficient that one would achieve almost the same dedupe rates, or is the inline dedupe mainly able to match against data in the same backup job/stream?

I imagine that changed block tracking, deltas + synthetic fulls, ReFS and so on used by Veeam are contributing to a LOT of space savings, but these are not strictly dedupe tools even though the outcome is much the same, so I wonder how much dedupe in itself contributes, compared to these others.

Cheers,
foggy
Veeam Software
Posts: 21069
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Dedupe rate question

Post by foggy »

Hi Kan, you're correct, Veeam B&R performs inline deduplication within its backup files, which is completely different from what dedupe appliances or Windows Deduplication do (and typically doesn't affect their dedupe rate). Here are a couple of links that should give you a better understanding:

Deduplication
Data Compression and Deduplication
kan_two
Lurker
Posts: 2
Liked: never
Joined: Jan 09, 2018 2:35 pm
Full Name: Kan Two
Contact:

Re: Dedupe rate question

Post by kan_two »

Hi, thanks for the info!

It would be nice to know how Veeam inline dedupe compares with post-process dedupe.

Is it so efficient at checksumming and comparing blocks that it approaches post-process dedupe, or are there situations where Veeam dedupe would be significantly less efficient (e.g. across different backup jobs or runs, after manual full backups and so on)? As I understand it, post-processing will scour every stored bit for duplicate occurrences, whereas inline dedupe somehow needs to keep track of data already in the pipeline or previously processed.
tdewin
Veeam Software
Posts: 1775
Liked: 646 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: Dedupe rate question

Post by tdewin »

Veeam is inline dedupe
tdewin
Veeam Software
Posts: 1775
Liked: 646 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: Dedupe rate question

Post by tdewin »

Sorry, I didn't complete the previous post. Veeam is inline dedupe. It is based on a default 1 MB block size (LAN target), although you can change this to WAN target (256 KB). This keeps the deduplication overhead quite low while still giving good results, for example when encountering empty blocks or VM OS data deployed from the same template. Veeam does keep track of blocks in the current chain. The good thing is that this way Veeam does really fast backups and really fast restores, since data fragmentation is low, with a limited amount of resources.
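To make the block size trade-off concrete, here is a minimal sketch of fixed-size block dedupe. It is not Veeam's actual implementation: the SHA-256 fingerprint, the function names and the test data are all assumptions chosen for illustration. It only shows that a smaller block size needs more index entries (more overhead) but can match more data when changes are small and scattered.

```python
import hashlib

LAN_TARGET = 1024 * 1024   # 1 MB block size mentioned above
WAN_TARGET = 256 * 1024    # 256 KB block size mentioned above


def index_entries(data_len: int, block_size: int) -> int:
    """Number of fixed-size blocks (and so index lookups) for data_len bytes."""
    return -(-data_len // block_size)  # ceiling division


def unique_bytes(data: bytes, block_size: int) -> int:
    """Bytes that must be stored after deduplicating fixed-size blocks.

    Hypothetical helper: the SHA-256 fingerprint is an assumption,
    not a statement about the hash Veeam uses.
    """
    seen = set()
    stored = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(block)
    return stored


# 4 MB of zeros with one changed byte at the start of every 1 MB region.
sample = bytearray(4 * LAN_TARGET)
for k in range(4):
    sample[k * LAN_TARGET] = k + 1
sample = bytes(sample)

for name, size in (("LAN target", LAN_TARGET), ("WAN target", WAN_TARGET)):
    print(name, index_entries(len(sample), size), unique_bytes(sample, size))
# LAN target 4 4194304
# WAN target 16 1310720
```

With 1 MB blocks every block differs by one byte, so nothing dedupes; with 256 KB blocks the untouched zero-filled sub-blocks collapse into one, at the cost of four times as many index entries.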

Global dedupe with post-processing might give you more gain. This is exactly why we integrate with StoreOnce and DataDomain. However, the rehydration process has a significant impact during restore. That's why it is mostly recommended to use them as a second tier.
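The difference between tracking blocks per chain and deduplicating globally can be sketched in a few lines. This is a toy model under stated assumptions, not how Veeam or any dedupe appliance works internally: each byte string stands in for one backup chain, blocks are a scaled-down 1 KiB, and the fingerprint is SHA-256. The function and variable names are hypothetical.

```python
import hashlib


def stored_bytes(chains, block_size, global_index):
    """Bytes written when fixed-size blocks are deduplicated.

    global_index=False: each chain gets its own block index
                        (a stand-in for per-chain inline dedupe).
    global_index=True:  one index is shared by all chains
                        (a stand-in for global dedupe on an appliance).
    """
    stored = 0
    index = set()
    for chain in chains:
        if not global_index:
            index = set()  # start over: other chains are not visible
        for offset in range(0, len(chain), block_size):
            block = chain[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in index:
                index.add(digest)
                stored += len(block)
    return stored


BLOCK = 1024  # scaled-down block size, for illustration only

# Two VMs deployed from the same template: 64 KiB of identical "OS" data,
# 16 KiB of empty blocks, and 4 KiB of data unique to each VM.
template = b"".join(bytes([i + 1]) * BLOCK for i in range(64))
empty = bytes(16 * BLOCK)
vm_a = template + empty + b"A" * 4096
vm_b = template + empty + b"B" * 4096

print(len(vm_a) + len(vm_b))                                   # 172032
print(stored_bytes([vm_a, vm_b], BLOCK, global_index=False))   # 135168
print(stored_bytes([vm_a, vm_b], BLOCK, global_index=True))    # 68608
```

In this model the per-chain index still removes the empty and repeated blocks inside each VM, but only the shared index notices that the template data in the second chain is already stored. Putting both VMs in the same chain would give the same result as the shared index here.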