247it
Novice
Posts: 5
Liked: never
Joined: Apr 06, 2010 7:53 pm
Full Name: Tony Melia

Deduplication

Post by 247it »

Just seeking clarification on deduplication. At what level does the dedupe happen: per server, or per job? For example, if I have 3 servers and want to use deduplication, do they have to be backed up in the same job, or does it matter? Can they each be separate jobs going to different destinations?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

All servers must be part of the same job if you want dedupe.
247it
Novice
Posts: 5
Liked: never
Joined: Apr 06, 2010 7:53 pm
Full Name: Tony Melia

Re: Deduplication

Post by 247it »

Thanks for the response, but it seems a bit odd. So just to clarify: servers can only share deduplication if they are part of the same job? Can separate jobs at least share the same 'filename' (in the job properties)? I have a number of systems that I know are 90% identical, so I was hoping to dedupe them, but for logistical and other reasons I don't want them all trying to back up at once.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Correct, deduplication happens only within the same job. Your requirements are a bit contradictory: on one hand, you do not want those VMs to be part of the same backup file; on the other hand, you want dedupe between all these VMs (which requires single storage by definition of deduplication).
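
For what it's worth, here is a minimal sketch (Python, with invented names; not Veeam's actual file format or block size) of why per-job dedupe implies a single backup file: the hash index is scoped to the one file being written, so only machines written into that same file can ever share blocks.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MB blocks; an arbitrary size for the sketch

def backup_job(vm_disks, backup_file):
    """Write every VM of one job into a single backup file, storing
    each unique block only once. This is per-job dedupe: the index
    lives and dies with this one file."""
    seen = {}  # block hash -> offset of the stored copy in backup_file
    with open(backup_file, "wb") as out:
        for vm_name, disk_path in vm_disks.items():
            with open(disk_path, "rb") as disk:
                while block := disk.read(BLOCK_SIZE):
                    digest = hashlib.sha256(block).hexdigest()
                    if digest not in seen:        # new data: store the block
                        seen[digest] = out.tell()
                        out.write(block)
                    # known digest: a real format would record a reference
                    # to seen[digest] instead of writing the block again
    return seen

# Two VMs in *different* jobs write to different files with different
# 'seen' indexes, so their identical blocks can never be matched.
```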
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

Well, they don't back up all at once; they back up sequentially. If you have 10 systems in a job, they'll back up one at a time.

As far as I know, you can't use the same filename for different jobs. I believe the second job will give a message that the file already exists. I think each job uses a UUID to verify that the file actually belongs to that job. I'm admittedly not 100% sure about this point.
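
If the UUID guess is right, the check could look roughly like this. The sidecar metadata file and the function are invented purely for illustration; this is not the real on-disk layout.

```python
import json
import os
import uuid

def open_job_backup_file(path, job_id):
    """Refuse to reuse a backup file that belongs to another job.
    The .meta sidecar is a made-up stand-in for wherever the real
    product embeds the owning job's identifier."""
    meta_path = path + ".meta"
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            owner = json.load(f)["job_id"]
        if owner != job_id:
            raise RuntimeError(f"{path} already exists and belongs to job {owner}")
    else:
        with open(meta_path, "w") as f:
            json.dump({"job_id": job_id}, f)  # first writer claims the file
    return open(path, "ab")

job_a = str(uuid.uuid4())  # each job carries its own identity
f = open_job_backup_file("servers.vbk", job_a)
```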
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Correct, different jobs cannot share the same backup file.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

Gostev wrote: which requires single storage by definition of deduplication.
This is off-topic, but that's not completely true. We have a software package that does dedupe by putting the "deduped" blocks in a common file and the unique blocks in a unique file per host. This system uses a database-like engine and allows simultaneous jobs to run while still sharing a common dedupe pool across all jobs.
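
Roughly what that layout might look like, as I read it: blocks referenced by more than one host migrate to a common pool, single-copy blocks stay per host, and a dict stands in for the database-like index. (A hypothetical sketch, not any actual product.)

```python
import hashlib

class DedupePool:
    """Blocks referenced by more than one host live in a common pool;
    blocks seen from only one host stay in that host's own store.
    Dicts stand in for the index database and the two kinds of files."""

    def __init__(self):
        self.index = {}       # block hash -> "pool" or owning hostname
        self.pool = {}        # the common "deduped blocks" file
        self.host_store = {}  # hostname -> {hash: block} (unique file per host)

    def write(self, host, block):
        h = hashlib.sha256(block).digest()
        owner = self.index.get(h)
        if owner is None:                      # first sighting: host-unique
            self.host_store.setdefault(host, {})[h] = block
            self.index[h] = host
        elif owner not in ("pool", host):      # second host: promote to pool
            self.pool[h] = self.host_store[owner].pop(h)
            self.index[h] = "pool"
        # already pooled, or a repeat from the same host: nothing to store
        return h

pool = DedupePool()
pool.write("host1", b"A" * 4096)
pool.write("host2", b"A" * 4096)  # same block from a second host -> pooled once
```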
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Not sure what is not true then, if you confirm there *is* a single file with all the deduped blocks :) That was exactly my point.

While this is also off-topic, I am interested: how large is this single file, and how do you handle it as part of your operations? Do you put it on tape (does it even fit there)? Do you need to restore this file from tape before you can restore individual host data?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

I think I misread your statement. My takeaway from your comment was that you believed it would not be possible to have a single file shared by multiple jobs. My point was that, while this may be true with the Veeam architecture, it isn't absolutely true in general. Actually, we have three non-Veeam products that do various types of dedupe: two that do file level, and one that's block level. All of these products use a database-like engine where each server is its own "job", and the deduped data is stored in a common pool while unique data is stored in unique files. Veeam is the only product we use that keeps deduped and non-deduped blocks in the same file.

The advantage of the "pool" approach is that multiple jobs can share the same dedupe pool even while running simultaneously. The disadvantage seems to be that there is significant overhead from a performance perspective, and of course you have to have the entire "pool" available to perform restores (see the sketch below).
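
To make that restore-side disadvantage concrete, continuing the hypothetical DedupePool sketch from my earlier post: rebuilding even one host's data reads from both its unique store and the shared pool, so the whole pool has to be online.

```python
def restore_host(pool, host, manifest):
    """Rebuild one host's data from its ordered list of block hashes
    (the values returned by pool.write). Any shared block comes out of
    the common pool, so the *entire* pool must be available even to
    restore a single host."""
    out = bytearray()
    for h in manifest:
        location = pool.index[h]
        source = pool.pool if location == "pool" else pool.host_store[host]
        out += source[h]
    return bytes(out)
```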
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Agreed... an additional disadvantage is putting all your eggs in one basket (actually, there are multiple disadvantages with this). Not that it would be a big problem for us to extend our engine to global dedupe, but does it make sense?

Back then, we made the decision to go with per-job dedupe for the sake of self-contained, self-restoring backup files (for convenience and compliance reasons) and better tape support (a huge global dedupe file is a nightmare to handle both at backup and at restore). We also saw a strong trend in the dedupe storage device market: many customers either already owned such a device, or were planning to buy one to store backups on. And right now there are so many cheaper alternatives to the established dedupe players that it makes little sense to buy a NAS without dedupe (based on cost per TB). Storing backups on such devices gives the best of both worlds: you get both global dedupe (between backup files of different jobs) and the benefits of portable backup files, as the toy model below illustrates.

Do you agree?
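
A toy model of that "best of both worlds" point (an invented class, not any particular appliance): each job still writes its own self-contained file, but the storage layer hashes blocks across all files and keeps only one copy of each.

```python
import hashlib

class DedupeNAS:
    """Toy dedupe storage device: each file is kept as an ordered list
    of block hashes, while block data is stored once across ALL files,
    giving global dedupe between otherwise independent backup files."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # hash -> block data, shared by every file
        self.files = {}   # filename -> ordered list of block hashes

    def store(self, name, data):
        refs = []
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            h = hashlib.sha256(chunk).digest()
            self.blocks.setdefault(h, chunk)  # written once, globally
            refs.append(h)
        self.files[name] = refs

    def read(self, name):  # each file still restores on its own, as usual
        return b"".join(self.blocks[h] for h in self.files[name])

nas = DedupeNAS()
nas.store("job1.vbk", b"mostly common data" * 1000)
nas.store("job2.vbk", b"mostly common data" * 1000)
print(len(nas.blocks))  # far fewer unique blocks than two full copies
```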