How De-duplication and Compression works

Hyper-V specific discussions


by LMS » Mon May 29, 2017 5:42 am

Hi

We are new to Veeam: we have just completed the implementation and started taking Hyper-V VM backups (on-host backup is configured). We are looking for some clarity on how de-duplication and compression work on VM backups.

Consider one server with 40 GB of used disk space, with an Active Full backup scheduled every Saturday, a Synthetic Full every Wednesday, and incrementals on weekdays, keeping 60 restore points. According to the Veeam documentation, source-side de-duplication ensures that only unique data blocks not already present in the previous restore point are transferred across the network, and target-side de-duplication checks the received blocks against other virtual machine (VM) blocks already stored in the backup file. When we check the backup file sizes, each weekly full is around 22 GB. What we expected is that, since one full backup's data is already on the SAN, subsequent full jobs would not transfer and store the full data again, given that we keep multiple restore points. This is causing problems with SAN space utilization, so we changed many jobs to Forever Forward Incremental, but that is not accepted by our organization. We just want to know whether VBR is working as expected, or whether we need to make changes to the schedule.

One more point needs clarification. At present we have configured a separate scheduled job for each individual VM. If we instead create a single job containing multiple VMs with similar retention and schedule, will this improve the de-duplication ratio? (The documentation says target-side deduplication checks the received blocks against other VM blocks already stored in the backup file, thus providing global deduplication across all VMs included in the backup job.) We tried this for a few VMs, but found that it still creates a separate file for each VM included in the job.

Looking for clarification and best practice to follow

Thanks in advance
LMS
Influencer
 
Posts: 11
Liked: never
Joined: Mon May 29, 2017 5:13 am
Full Name: MS Sunil

Re: How De-duplication and Compression works

by Mike Resseler » Mon May 29, 2017 4:39 pm

Hi,

First: Welcome to the forums.

I am not sure I understand everything you are asking, but I will give it a try. Feel free to tell me that I am wrong :-)

1. Our deduplication only works per job, which means that one VM per job will not give you much benefit.
2. When you moved multiple VMs into one single job, was per-VM backup files enabled? Because if it was, you can't take advantage of having multiple VMs in one single backup file (see here: https://helpcenter.veeam.com/docs/backu ... tml?ver=95)

Cheers
Mike
Mike Resseler
Veeam Software
 
Posts: 3342
Liked: 379 times
Joined: Fri Feb 08, 2013 3:08 pm
Location: Belgium, the land of the fries, the beer, the chocolate and the diamonds...
Full Name: Mike Resseler

Re: How De-duplication and Compression works

by foggy » Mon May 29, 2017 4:58 pm

Moreover, Veeam B&R deduplication works within a backup file, so all full backups for the given job will have comparable size, since data is not deduplicated between them.
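A quick way to see this (a simplified Python sketch with made-up 4 KB blocks, not Veeam's actual on-disk format): duplicate blocks collapse inside one backup file, but a second full backup file stores its own copy of every block, which is why all fulls for a job come out at a comparable size.

```python
import hashlib

# Hypothetical 4 KB blocks of a VM disk; the 'A' block appears twice.
vm_blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096]

def build_full_backup(blocks):
    """Dedup within a single backup file: each unique block is stored once."""
    stored = {}
    for blk in blocks:
        stored.setdefault(hashlib.sha256(blk).hexdigest(), blk)
    return stored

# This Saturday's active full and next Saturday's are separate, self-contained files.
full_1 = build_full_backup(vm_blocks)
full_2 = build_full_backup(vm_blocks)

print(len(full_1))                 # 3: the duplicate 'A' block is stored once
print(len(full_1) + len(full_2))   # 6: the second full repeats every block, no cross-file dedup
```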
foggy
Veeam Software
 
Posts: 15083
Liked: 1110 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: How De-duplication and Compression works

by LMS » Mon May 29, 2017 9:33 pm

Thanks a lot, everyone. I will explain the current configuration in detail and what I understood from your replies.

We are running a Hyper-V 2012 R2 environment only, with a single physical VBR 9.5 server, using on-host backups and in-line de-duplication.

We were confused by the Veeam statement "source side de-duplication ensures only unique data blocks not already present in the previous restore point are transferred across the network and target side de-duplication checks the received blocks against other virtual machine (VM) blocks already stored in the backup file", while Mike mentioned "our deduplication only works per job" and foggy mentioned "deduplication works within a backup file". What I understand now from the "source side" part of the statement is this: if data blocks already exist in previous restore points, they won't be transferred over the network. For example, a full job won't transfer the full data if the blocks are already present in the previous full backup, but the size of the new full backup file will still be the same as (or larger than) the old full that is already there, even though multiple restore points exist. Am I right?

(We thought that once a full backup exists, later full backups would be smaller than the initial full because of de-duplication; now I understand how de-duplication works.)

As per the best-practice recommendation, "per-VM backup files" is enabled, and I read the link you provided. Should we disable this option for better de-duplication and configure jobs with multiple VMs? If we configure jobs with multiple VMs in a single file, how many VMs should be selected per job, and how can we calculate the number of VMs to process at a time, given that we are using on-host backup?

Thanks a lot
LMS

Re: How De-duplication and Compression works

by Mike Resseler » Tue May 30, 2017 4:47 am

It looks like you are correct. Source-side dedup lowers the traffic across the network; target-side dedup is responsible for saving storage.

The deduplication is indeed per backup file. I apologize, saying "per job" is an old habit; per-VM backup files indeed produces multiple backup files per job, and the deduplication is then per file.

Per-VM backup files has advantages, one of them being that it works great when you use Windows Server 2016 Deduplication on your repository. But if your storage is not a dedupe appliance and does not run software dedup, it might be more interesting to run a few jobs with multiple VMs in them.
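As a rough sketch of the two stages (illustrative Python with made-up blocks; in reality a real incremental lands in its own VIB file rather than being merged into the full like this):

```python
import hashlib

def digests(blocks):
    return [hashlib.sha256(b).hexdigest() for b in blocks]

previous_point = [b"A" * 4096, b"B" * 4096, b"C" * 4096]  # last restore point
current_disk   = [b"A" * 4096, b"B" * 4096, b"D" * 4096]  # one block changed since

# Source side: only blocks absent from the previous restore point cross the network.
known = set(digests(previous_point))
to_transfer = [b for b in current_disk
               if hashlib.sha256(b).hexdigest() not in known]
print(len(to_transfer))  # 1: only the changed 'D' block is sent

# Target side: received blocks are checked against blocks the file already holds.
backup_file = dict(zip(digests(previous_point), previous_point))
for blk in to_transfer:
    backup_file.setdefault(hashlib.sha256(blk).hexdigest(), blk)
print(len(backup_file))  # 4 unique blocks stored in total
```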
Mike Resseler

Re: How De-duplication and Compression works

by foggy » Tue May 30, 2017 2:53 pm

LMS wrote: [...] a full job won't transfer the full data if the data blocks are present in the previous full backup, but the size of the full backup file will be the same or more as the old full. Am I right?

Full backup resets the backup chain and is a self-contained backup file, where data is not deduped against previous restore points (only within the processed VM disk).
foggy

Re: How De-duplication and Compression works

by LMS » Mon Jun 05, 2017 6:06 pm

Thank you all.

We tried both options on the repository (per-VM backup files enabled and disabled) against a set of VMs, but it didn't make any difference in size, so we will stick with the per-VM backup files option.
LMS

Re: How De-duplication and Compression works

by sg_sc » Mon Jun 05, 2017 9:48 pm

Full backup files (VBK) will always take up the full space (unless you use ReFS block cloning), no matter whether previous full backup files are still present.

Veeam does the magic on the source side, using changed block tracking (or its Hyper-V equivalent) to transfer only changed blocks; on the target side, in-file deduplication saves storage space when you have multiple VMs with the same blocks of data. For instance, 10 Windows 2012 R2 VMs will definitely have a lot of blocks containing OS files in common, and those will be deduped within the backup file.
If you enable per-VM backup files you lose that last benefit; likewise if you create a job per VM.
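A sketch of that effect (hypothetical Python with made-up "OS" and data blocks, assuming plain hash-based dedup): one backup file for the whole job dedupes the shared OS blocks across VMs, while per-VM files each keep their own copy.

```python
import hashlib

def unique_count(blocks):
    """Number of distinct blocks, i.e. blocks actually stored after dedup."""
    return len({hashlib.sha256(b).hexdigest() for b in blocks})

# Three hypothetical 2012 R2 VMs: 10 shared OS blocks plus 2 unique data blocks each.
os_blocks = [bytes([i]) * 4096 for i in range(10)]
vms = [os_blocks + [bytes([100 + v, d]) * 2048 for d in range(2)]
       for v in range(3)]

# Single backup file for the whole job: the OS blocks are stored only once.
single_file = unique_count([blk for vm in vms for blk in vm])
print(single_file)  # 16 = 10 shared + 3 * 2 unique

# Per-VM backup files: every file carries its own copy of the OS blocks.
per_vm_files = sum(unique_count(vm) for vm in vms)
print(per_vm_files)  # 36 = 3 * (10 + 2)
```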

If you want huge space savings without the need for special deduplication processes or appliances, you should look into ReFS 3.1 with 64K clusters and synthetic fulls.
As a test I have 9 TB of backup copies (GFS: quarterly, monthly, weekly synthetic full VBK files) on a 2 TB disk, thanks to ReFS and Veeam's fast-clone magic.
One downside is that ReFS needs a beefy server (lots of RAM) if you intend to put a lot of TBs on it, and remember it must be ReFS 3.1 (Windows Server 2016) with a 64K cluster size, otherwise things will not go smoothly.
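Back-of-envelope arithmetic for why that works (the numbers below are illustrative assumptions, not figures from this thread): with ReFS fast clone, a synthetic full physically writes only the blocks that changed since the previous full and references the rest.

```python
full_tb     = 1.0   # logical size of one synthetic full (assumed)
change_rate = 0.05  # fraction of blocks new since the previous full (assumed)
fulls       = 9     # synthetic full restore points kept (assumed)

# Classic repository: every full is a self-contained copy on disk.
classic_tb = fulls * full_tb

# ReFS 3.1 fast clone: the first full is written out in full; later fulls
# reference unchanged blocks and physically write only the changed ones.
refs_tb = full_tb + (fulls - 1) * full_tb * change_rate

print(classic_tb)         # 9.0 TB consumed
print(round(refs_tb, 2))  # 1.4 TB consumed
```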
sg_sc
Enthusiast
 
Posts: 42
Liked: 8 times
Joined: Tue Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: How De-duplication and Compression works

by LMS » Tue Jun 06, 2017 4:17 am

Thanks sg.

As I mentioned before, we created a job that includes 4 VMs, both with and without the per-VM backup files option, but it didn't save a single bit when comparing the two. All the forum posts and Veeam documentation mention disabling per-VM backup files for better de-dup, so we will open a case to check this.

Regards
LMS

Re: How De-duplication and Compression works

by foggy » Tue Jun 06, 2017 9:58 am

What kind of VMs are they? VMs created from a single template are more likely to have blocks in common. Also, have you verified that the per-VM option took effect (i.e., there were separate backup chains for each VM in the repository)? It takes effect only after an active full backup if the setting is changed on an existing job.
foggy

Re: How De-duplication and Compression works

by BartP » Wed Jun 07, 2017 1:07 pm

Keep in mind that deduplication often works best on (active) full backups.
Incremental backups use CBT, and the changed blocks are, more often than not, unique blocks.
This changes when backing up a file server or mail server: only low dedupe ratios can be achieved there.
Bart Pellegrino,
Veeam Certified Trainer & Architect
twitter: @bpellegrino

Check http://backitup.online for VMCE and VMCE-ADO study materials and practice exams
BartP
Certified Trainer
 
Posts: 35
Liked: 11 times
Joined: Mon Aug 31, 2015 8:24 am
Location: Netherlands
Full Name: Bart Pellegrino

Re: How De-duplication and Compression works

by LMS » Sun Jun 11, 2017 4:39 pm

Hi

The VMs are Windows 2012 R2 servers with SQL databases (the 4 servers whose backups we tested use shared disks / are VMs in a cluster). We tried both options, and with the per-VM option it creates separate files for each VM. All jobs were created fresh, meaning we tried only active full backups.
LMS

