Storage Inline Dedupe vs. Veeam Dedupe

Post by **bonzovt** » Feb 22, 2018 8:02 pm

A question that has come up a lot recently on my end involves the settings that should be used when a backup repository resides on a SAN that is capable of inline dedupe and compression. Should this type of storage essentially be treated the same as a dedupe appliance, with all dedupe turned off on the Veeam side and the repository set to decompress data before it is written to disk? We seem to be using a lot of backup repositories now that aren't Data Domain, ExaGrid, etc. but are SAN arrays with similar inline dedupe capabilities, so I was curious what the best option is in that scenario.

Also curious: in the case of HCI, where there may be one overall storage container with global inline dedupe, if the production VM and the backup repository happened to live in the same container (this is slightly theoretical), how big would the backup of a specific VM be? I realize this is mainly up to the storage and how it does dedupe, but has anyone seen this in action? Theoretically, the data for that VM is already written on disk for the production VM, so wouldn't the backup file be significantly smaller because of dedupe? Or does the process of Veeam backing up the VM and changing it from VMDK format to VBK format change the blocks enough that the storage could still dedupe the data a bit, but you would really have two versions of "similar" data on disk, one for the prod VM and one for the backup file?

Post by **DaveWatkins** » Feb 22, 2018 10:00 pm

Assuming your SAN dedupe is any good (and a lot of them really aren't), then yes, you'd let the SAN do it and not Veeam.

Putting your actual backups on the same storage as your VMs would, in theory, give you a huge dedupe rate, but you're getting that because you're then not actually storing a full backup of your VM. If a single block gets corrupted, it could affect both your VM and your backup, because both refer to that block. Additionally, if that storage fails, your backup and your live data are both gone.

Post by **chjones** » Feb 26, 2018 3:32 am

If your storage can perform inline deduplication you should always allow the storage to do this.

Remember, Veeam dedupe only works inside each backup file in a backup chain. If you have this scenario: VBK > VIB > VIB > VBK, dedupe from a Veeam perspective only occurs between blocks inside each of those four files. So you get dedupe between blocks inside the first VBK, then again only on blocks inside the first VIB, and so on. Veeam does not dedupe between backup files.

Deduplication at the storage layer provides deduplication across ALL of the blocks in ALL of the Veeam backup files, so your dedupe rate will be higher (assuming there are blocks to dedupe between files, which there should be if you have multiple full backup VBK files).
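To make the difference between per-file and storage-wide dedupe concrete, here is a minimal Python sketch. It is a toy model only: the block size, the `make_block` helper, and the contents of the chain are all made up for illustration, and it says nothing about how Veeam or any particular array actually lays out or hashes data.

```python
import hashlib

def make_block(n):
    # Hypothetical helper: a distinct 4 KiB block for each integer id.
    return n.to_bytes(8, "big") * 512

def unique_hashes(blocks):
    # The set of distinct block fingerprints in one collection of blocks.
    return {hashlib.sha256(b).digest() for b in blocks}

def stored_with_per_file_dedupe(files):
    # Dedupe only inside each file: duplicates across files are stored again.
    return sum(len(unique_hashes(f)) for f in files)

def stored_with_global_dedupe(files):
    # Dedupe across every block in every file, as a storage layer could.
    seen = set()
    for f in files:
        seen |= unique_hashes(f)
    return len(seen)

# A chain shaped like VBK > VIB > VIB > VBK.
base = [make_block(i) for i in range(100)]          # first full: 100 blocks
vbk1 = base
vib1 = [make_block(100), make_block(101)]           # 2 changed blocks
vib2 = [make_block(102), make_block(103), make_block(104)]  # 3 changed blocks
# Second full: the 5 changed blocks plus the 95 blocks that never changed.
vbk2 = vib1 + vib2 + base[5:]

chain = [vbk1, vib1, vib2, vbk2]
stored_per_file = stored_with_per_file_dedupe(chain)
stored_global = stored_with_global_dedupe(chain)
print("blocks stored, per-file dedupe:", stored_per_file)
print("blocks stored, global dedupe:  ", stored_global)
```

In this model almost all of the saving comes from the second full backup, which mostly repeats blocks already held in the first full and the incrementals.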

You should also ensure the data is decompressed before it is written to disk, just as you would when Veeam writes to a dedupe appliance, as storage-level dedupe cannot work efficiently on compressed blocks.
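A rough way to see why compressed data dedupes poorly is the Python sketch below. It assumes a simplified model that is not taken from Veeam or any array vendor: each source block is compressed on its own with zlib, the results are packed back to back, and the storage dedupes on fixed 4 KiB chunks. The sizes and names (`SOURCE_BLOCK`, `STORAGE_CHUNK`, `pack`) are hypothetical. Changing a single source block changes its compressed length, which shifts everything after it, so the fixed-size chunks no longer line up.

```python
import hashlib
import random
import zlib

SOURCE_BLOCK = 16 * 1024   # hypothetical backup block size
STORAGE_CHUNK = 4 * 1024   # hypothetical fixed dedupe chunk size on the array

def chunk_hashes(stream):
    # Fingerprint the stream in fixed-size chunks, as a simple array might.
    return [hashlib.sha256(stream[i:i + STORAGE_CHUNK]).digest()
            for i in range(0, len(stream), STORAGE_CHUNK)]

def shared_chunks(first, second):
    # How many chunks of `second` already exist somewhere in `first`.
    known = set(chunk_hashes(first))
    return sum(1 for h in chunk_hashes(second) if h in known)

def pack(blocks, compress):
    # Write blocks back to back, optionally compressing each one first.
    return b"".join(zlib.compress(b) if compress else b for b in blocks)

rng = random.Random(0)
alphabet = b"ACGTacgt01234567"  # small alphabet so the data is compressible
blocks_a = [bytes(rng.choices(alphabet, k=SOURCE_BLOCK)) for _ in range(64)]
# Second "backup": identical except that the first block is now all zeros.
blocks_b = [bytes(SOURCE_BLOCK)] + blocks_a[1:]

shared_raw = shared_chunks(pack(blocks_a, False), pack(blocks_b, False))
shared_compressed = shared_chunks(pack(blocks_a, True), pack(blocks_b, True))
print("chunks shared when written uncompressed:", shared_raw)
print("chunks shared when written compressed:  ", shared_compressed)
```

Arrays that use variable-length chunking may cope better with shifted data than this fixed-chunk model suggests, so treat the result as an illustration of the principle rather than a prediction for any specific product.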

I'd also warn against storing your Veeam backup files on the same array as your primary storage. You should try to have at least some level of hardware separation for your production and backup data. I understand this isn't always possible, and typically your production storage is the highest performing storage in your environment, so you'd want to use it for fast restores. In this case, I'd recommend using Backup Copy jobs to ensure you have a copy of the required restore points on different media to maximise your data protection.
