
Deduplication - Feature Enhancement?

Posted: Wed Mar 15, 2017 5:55 pm
by ekisner
Hi there... this post is based entirely on the assumption that my understanding of how Veeam's dedup works is correct.

My understanding is that Veeam will dedup on a per-chain basis... so if you've got multiple jobs, each with per-VM chains enabled, each VBK and VRB will be deduplicated only against itself. If you've got one super-job with no per-VM chains, then it will effectively dedup the entire backup. Or you can use file-system or block-level deduplication.
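To make the difference in dedup scope concrete, here is a toy sketch. It is not how Veeam actually works internally; the block size, the helper names and the sample data are all made up, and it only models "count the distinct blocks within one scope":

```python
import hashlib

BLOCK_SIZE = 4  # tiny, made-up block size so the toy data shows duplicates


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def unique_blocks(streams):
    """Count distinct blocks when all given streams share one dedup scope."""
    seen = set()
    for stream in streams:
        for block in split_blocks(stream):
            seen.add(hashlib.sha256(block).digest())
    return len(seen)


# Two hypothetical VMs that share most of their data (say, the same OS image).
vm_a = b"AAAABBBBCCCCDDDD"
vm_b = b"AAAABBBBCCCCEEEE"

# Per-chain scope: each VM is deduplicated only against itself.
per_chain = unique_blocks([vm_a]) + unique_blocks([vm_b])

# Single-job scope: both VMs are deduplicated against each other.
single_job = unique_blocks([vm_a, vm_b])
```

With this sample data the per-chain scope keeps 8 blocks, while the shared scope keeps 5, because the three blocks common to both VMs are stored once.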

I was recently thinking about how it would be nice to have dedup with ReFS3, without having to resort to using storage hardware with built-in dedup.

While I'm sure it would take a significant re-code, what are the thoughts on this kind of file layout:

1) A tiered "chunk" repository file, stored on nice fast flash storage (i.e. it can be stored "somewhere" defined by the user)... it would of course need to be configured to have a maximum size.
2) Each VBK, VRB, etc, is deduplicated against the chunk file.
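A minimal sketch of the two points above, purely to illustrate the idea. `ChunkStore`, `write_backup` and `read_backup` are hypothetical names, the chunk size is arbitrary, and storing chunks "raw" once the store is full is just one assumed way to handle the size cap:

```python
import hashlib


class ChunkStore:
    """Hypothetical shared chunk repository with a configured maximum size."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.chunks = {}  # digest -> chunk bytes

    def put(self, chunk):
        """Store a chunk and return its digest, or None if the store is full."""
        digest = hashlib.sha256(chunk).digest()
        if digest in self.chunks:
            return digest  # already stored: this is the dedup hit
        if self.used_bytes + len(chunk) > self.max_bytes:
            return None  # size cap reached (assumed behaviour)
        self.chunks[digest] = chunk
        self.used_bytes += len(chunk)
        return digest


def write_backup(data, store, chunk_size=4):
    """Build a backup 'file' as a list of ("ref", digest) or ("raw", bytes)."""
    entries = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = store.put(chunk)
        if digest is not None:
            entries.append(("ref", digest))
        else:
            entries.append(("raw", chunk))  # kept inline when the store is full
    return entries


def read_backup(entries, store):
    """Rebuild the original data; "ref" entries need the chunk store."""
    return b"".join(
        store.chunks[value] if kind == "ref" else value
        for kind, value in entries
    )


store = ChunkStore(max_bytes=12)
vm_a = b"AAAABBBBCCCC"
vm_b = b"AAAABBBBDDDD"
backup_a = write_backup(vm_a, store)
backup_b = write_backup(vm_b, store)
```

Here the second backup reuses the two chunks it shares with the first one, and its last chunk is kept inline because the store has hit its cap. Note that in this sketch a backup containing "ref" entries cannot be read back without the chunk store.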

This way, you get the resilience of ReFS3, with deduplication, on tiered storage (both cost-effective and probably faster, since most people probably don't run their backups to flash). Assuming you were backing everything up to homogeneous storage (i.e. not storing the chunk file on faster storage), you could leverage the ReFS3 fast-clone to make deduplication of duplicate data and rehydration of no-longer-duplicate data almost immediate. You also get ideal-scenario deduplication, giving you both the flexibility of per-VM chains and the storage efficiency of a single blob job.

Re: Deduplication - Feature Enhancement?

Posted: Thu Mar 16, 2017 1:32 am
by Gostev
Hi,

We've evaluated this before but decided against it, mostly because our backups being self-contained files appears to be one of the top features that our customers like about our product. This gives them better reliability (no SPOF of a dedupe pool) and multiple operational benefits. For example, you can easily copy the required backup to some external drive and take it with you, something that would not work with your proposed architecture, where each file requires a dedupe blob.

And in any case, we can expect Microsoft to eventually port their dedupe to ReFS as well (because ReFS is going to completely replace NTFS down the road), which will give you the best of both worlds.

Thanks!