Theoretical question about deduplication

Post by **tntteam** » Sep 14, 2015 3:56 pm

Hi,

I have some questions regarding how deduplication is supposed to work with Veeam.

Veeam deduplication works on a "per job" basis.
I set up a backup job of 117 VMs (a mix of Windows 2008, Windows 2012, and Linux).
I scheduled a full backup every Saturday and started the job for the first time on a Thursday.
So I got a .vbk (full) on Thursday, one .vib on Friday, then another .vbk on Saturday.
Deduplication is on and set to "local target".
Compression is on, with default settings.

Why is my first full backup 3.4 TB, and the second full backup, taken on Saturday (day+2), 3.4 TB too?
The data can't have changed that much, can it?

What am I missing? Does "per job" deduplication actually mean "per run of each job", so that deduplication doesn't work between subsequent runs of the same job?

Sorry if this is not very clear.

Post by **PTide** » Sep 14, 2015 4:14 pm

Hi,

tntteam wrote: Why is my first full backup 3.4 TB, and the second full backup, taken on Saturday (day+2), 3.4 TB too?

Deduplication takes place inside a job, between VMs. If many VMs in the same job contain similar data (OS files, database files, etc.), the resulting full backup file will be smaller than the total amount of data on all VMs, because the similar data is stored only once. So, the statement

tntteam wrote:<...>deduplication doesn't work between each subsequent run of the same job<...>

is correct.
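
To illustrate what "deduplication inside a job, between VMs" means, here is a toy Python sketch. It is not how Veeam is implemented: the VM contents, the block names, and the use of SHA-256 digests to identify identical blocks are all assumptions made for the example.

```python
import hashlib

def dedupe(blocks):
    """Return the unique blocks (by SHA-256 digest), keeping first-seen order."""
    seen = {}
    for block in blocks:
        seen.setdefault(hashlib.sha256(block).hexdigest(), block)
    return list(seen.values())

# Hypothetical VMs: each one is a list of "blocks" (bytes).
os_blocks = [b"os-%d" % i for i in range(8)]   # OS data shared by both VMs
vm_a = os_blocks + [b"a-data-1", b"a-data-2"]
vm_b = os_blocks + [b"b-data-1", b"b-data-2"]

source = vm_a + vm_b       # everything one run of the job reads
stored = dedupe(source)    # what a single backup file would keep

ratio = len(source) / len(stored)
print(f"read {len(source)} blocks, stored {len(stored)}, dedup {ratio:.2f}x")
```

The shared OS blocks are stored once, so the file is smaller than the sum of the two VMs. Nothing in this model looks at blocks from an earlier run.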

Thank you.

Post by **dellock6** » Sep 14, 2015 4:17 pm

Since you are running a "full" backup weekly, each full backup has to be independent, with NO relation to the previous chain. For this reason it only deduplicates within itself and doesn't look at blocks stored in previous restore points. As an alternative, you may look at forever forward incremental or reverse incremental.
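
A rough Python model of why the Saturday full is as large as the Thursday one. This is only a sketch under stated assumptions: restore points are modelled as sets of block IDs, the 2% change rate is invented, and the function names are hypothetical rather than anything from the product.

```python
def full_backup(blocks):
    """An independent full: stores every block, deduplicated only against itself."""
    return set(blocks)

def incremental(blocks, previous_blocks):
    """An incremental: stores only blocks that were not in the previous state."""
    return set(blocks) - set(previous_blocks)

# Thursday state: 1000 distinct blocks (hypothetical).
thursday = {f"block-{i}" for i in range(1000)}

# Assume 20 blocks (2%) were replaced by new data before Saturday.
replaced = {f"block-{i}" for i in range(20)}
saturday = (thursday - replaced) | {f"new-{i}" for i in range(20)}

full_thu = full_backup(thursday)
full_sat = full_backup(saturday)             # independent of full_thu
incr_sat = incremental(saturday, thursday)   # what an incremental would store

print(len(full_thu), len(full_sat), len(incr_sat))
```

Both fulls hold the same number of blocks even though only a small part of the data changed, while an incremental taken at the same point would hold only the changed blocks.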

Post by **tntteam** » Sep 15, 2015 7:16 am

Thank you guys, I was misunderstanding the internal mechanics. Now I understand better why Veeam produced whitepapers about combining Veeam with the built-in deduplication in Windows Server 2012.

Also, when I see dedup 1.0x or 1.1x in the backup results, can I conclude that deduplication is not worthwhile?

Sep 15, 2015 10:08 am

Again, it depends on the kind of operation being performed.
Those numbers usually show up during an incremental run. Because an incremental only extracts the blocks that changed since the previous run, chances are those blocks are all new and unique, so there are no similar blocks among them to deduplicate.
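
As a small worked example of where a 1.0x or 1.1x figure can come from, here is a Python sketch. It assumes the ratio is simply blocks read divided by unique blocks stored; how the product actually computes the displayed number is not stated in this thread, and the block names are made up.

```python
def dedup_ratio(blocks):
    """Blocks read divided by unique blocks stored (assumed definition)."""
    return len(blocks) / len(set(blocks))

# An incremental run where every changed block is different from the others.
changed = [f"changed-{i}" for i in range(50)]
all_unique = dedup_ratio(changed)

# The same run, but 5 of the changed blocks also appear a second time.
few_duplicates = dedup_ratio(changed + changed[:5])

print(f"{all_unique:.1f}x, {few_duplicates:.1f}x")
```

With nothing but unique changed blocks the ratio stays at 1.0x, which says little about how well a full backup of the same VMs would deduplicate.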

I know sometimes deduplication can be tricky to understand...
