Host-based backup of Microsoft Hyper-V VMs.
JRRW
Enthusiast
Posts: 78
Liked: 46 times
Joined: Dec 10, 2019 3:59 pm
Full Name: Ryan Walker
Contact:

Windows Dedupe vs SAN Dedupe - insight sought

Post by JRRW »

Currently we have around 50TB of VHDX files with Windows dedupe enabled on them; unoptimized, this is around 100TB raw, 99% of it file server data.

Thus, Veeam backs up 50TB of data, which then goes to tape.

However, we are moving to a SAN that does dedupe and is supposed to do it better than the file servers (logical, as it's global rather than per volume). But when crossing our I's and dotting our T's, the observation came up that in moving from Windows dedupe to SAN dedupe you unoptimize the VHDXs, which in turn makes Veeam back up more data (and doubles tape usage as well).

Does Veeam have any ability to dedupe that sort of file server inline, to make the backups smaller than raw? Or is it pointless to use SAN dedupe for these volumes, as it will blow backup sizes out of the water?

The SAN gives plenty of other benefits to make it worth it no matter what - we don't back up QA/Dev/Sandbox envs and thus can have identical data sets for all of them matching prod with minimal overhead on storage - but obviously we'll need to size the SAN larger if we're not going to get dedupe on the file servers that already run Windows dedupe.

Note: Most SAN dedupe can't see/dedupe the blocks inside a deduped volume to further dedupe/compress it.
HannesK
Product Manager
Posts: 14844
Liked: 3086 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria
Contact:

Re: Windows Dedupe vs SAN Dedupe - insight sought

Post by HannesK »

Hello,
in many cases, in-guest deduplication creates larger backups than "normal" VMs. That's because more blocks are moving around. So it's hard to predict which backup size will be smaller.

To answer your question: the hypervisor, and any software backing up at the hypervisor level, doesn't know about storage-level deduplication. They see the full size.
JRRW wrote: "Does Veeam have any ability to dedupe that sort of file server inline, to make the backups smaller than raw?"
Yes. Full size is okay, because Veeam can then apply its own data reduction methods (usually around 2:1), and overall the result should be very similar.
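
To make the "very similar" claim concrete, here is a minimal sizing sketch. The function name and the 1:1 ratio for already-deduplicated VHDX data are assumptions for illustration (the 1:1 figure only mirrors the 50TB backed up today); the 2:1 ratio is the rule of thumb quoted above. Real ratios depend on the data.

```python
def estimated_backup_tb(source_tb: float, reduction_ratio: float) -> float:
    """Estimate stored backup size from the source size seen at the hypervisor level."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return source_tb / reduction_ratio


# Figures from the thread: 50TB with Windows dedupe on, about 100TB unoptimized.
in_guest_dedupe_source_tb = 50.0
unoptimized_source_tb = 100.0

# Assumption: roughly 1:1 on already-deduplicated VHDX data (the thread reports
# 50TB backed up today), and the 2:1 rule of thumb on unoptimized data.
backup_today_tb = estimated_backup_tb(in_guest_dedupe_source_tb, 1.0)
backup_on_san_tb = estimated_backup_tb(unoptimized_source_tb, 2.0)

print(f"in-guest dedupe: {backup_today_tb:.0f}TB, SAN dedupe: {backup_on_san_tb:.0f}TB")
```

Under these assumptions both cases land at about 50TB, which is why the backup size may not change much after the move.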

In general: I would go for SAN dedupe, because production SAN is more expensive than backup storage.

Best regards,
Hannes

Re: Windows Dedupe vs SAN Dedupe - insight sought

Post by JRRW »

Production SAN is more expensive than backup storage... BUT RPO/RTO is hard to meet with that much data if Veeam can't get the backup sizes down. And tapes are bloody expensive all the same =D

So if it can dedupe at 2:1, I'll check what the current dedupe ratio is on the Windows dedupe volumes. I assume it's not gaining much there, as Microsoft has already done a lot.

Right now I'm running a test on my smallest server (only 5TB), and initially I'm surprised to see that the SAN is still seeing a nearly 2.5:1 reduction (Veeam restore of the prod VM into a dev environment), which is darn decent. Then I'll unoptimize the volume (let it grow to its full size of ~8TB) and compare backup sizes.

Feels like this would be a good whitepaper for Veeam to do, as almost every vendor is realizing Dedupe is a huge benefit.

Re: Windows Dedupe vs SAN Dedupe - insight sought

Post by HannesK »

JRRW wrote: "if Veeam can't get the backup sizes down."
Vendor-independent: the amount of data restored is the uncompressed size. Example: the storage compresses 10TB of data 10:1. The backup will be 10TB. The restore will be 10TB.

Reading compressed data from tape is a different story. Example: the backup software compresses 10TB of source data 10:1. 1TB will be written to tape. 1TB will be read from tape for the restore. But 10TB will still be written to the production storage.
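
The two examples above can be sketched as a small calculation. The names here are hypothetical and it only models the data volumes moved, not the time taken.

```python
from dataclasses import dataclass


@dataclass
class RestoreVolumes:
    read_from_tape_tb: float         # compressed data read back from tape
    written_to_production_tb: float  # uncompressed data landing on production storage


def restore_volumes(source_tb: float, backup_compression_ratio: float) -> RestoreVolumes:
    """Data moved during a full restore, given the backup software's compression ratio."""
    if backup_compression_ratio <= 0:
        raise ValueError("backup_compression_ratio must be positive")
    return RestoreVolumes(
        read_from_tape_tb=source_tb / backup_compression_ratio,
        written_to_production_tb=source_tb,  # always the full, uncompressed size
    )


# The example from the post: 10TB of source data compressed 10:1 by the backup software.
volumes = restore_volumes(10.0, 10.0)
print(volumes)
```

Whatever the ratio on tape, the production storage still has to absorb the full 10TB during the restore.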

2.5:1 even sounds like a relatively low value, depending on your data and your production storage. But that depends on the storage vendor. "Global dedupe" is not the same as "global dedupe". Marketing :-)
JRRW wrote: "as almost every vendor is realizing Dedupe is a huge benefit."
From the Veeam perspective, the 2:1 rule has proven to be a conservative, easy-to-use rule of thumb. Backup storage cost and speed are what customers usually care about. Deduplication ratio is more of a marketing thing.

Example: a fancy super duper global dedupe backup software can compress 10TB of data 10:1 down to 1TB. But it requires a large amount of CPU, RAM and flash drives for the deduplication database to reach 1GByte/s backup and restore speed. Another software with no compression can do 1GByte/s with a cheap computer storing the 10TB on a handful of cheap spinning disks.

Now you spend for example 10k on the fancy super duper global dedupe backup software including hardware vs. 7k for the other combination (also including hardware).

You get the same speed for a lower price. One software stores 1TB on disk, the other 10TB. What would you choose? :-)
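
As a rough sketch of that comparison: the option names are made up, the prices are the 10k and 7k from the example with no currency implied, and decimal units (1TB = 1000GB) are an assumption.

```python
from dataclasses import dataclass


@dataclass
class BackupOption:
    name: str
    cost: float                 # software plus hardware, as in the example
    throughput_gb_per_s: float  # backup and restore speed
    stored_tb: float            # space used on backup storage


def restore_hours(source_tb: float, throughput_gb_per_s: float) -> float:
    """Hours to restore source_tb at the given throughput (assumes 1TB = 1000GB)."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput_gb_per_s must be positive")
    return source_tb * 1000 / throughput_gb_per_s / 3600


SOURCE_TB = 10.0
options = [
    BackupOption("global dedupe", 10_000, 1.0, 1.0),
    BackupOption("no compression", 7_000, 1.0, 10.0),
]

for option in options:
    hours = restore_hours(SOURCE_TB, option.throughput_gb_per_s)
    print(f"{option.name}: cost {option.cost:.0f}, stores {option.stored_tb:.0f}TB, "
          f"restore takes {hours:.2f}h")

cheapest = min(options, key=lambda option: option.cost)
```

Both options restore in the same time because throughput is identical; only price and space on disk differ, and the dedupe ratio does not appear in the restore time at all.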