247it
Novice
Posts: 5
Liked: never
Joined: Apr 06, 2010 7:53 pm
Full Name: Tony Melia

Deduplication

Post by 247it »

Just seeking clarification on deduplication. At what level does the dedupe happen: per server, or per job? For example, if I have 3 servers and want to use deduplication, do they have to be backed up in the same job, or does it matter? Can they each be separate jobs going to different destinations?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

All servers must be part of the same job if you want dedupe.
247it
Novice
Posts: 5
Liked: never
Joined: Apr 06, 2010 7:53 pm
Full Name: Tony Melia

Re: Deduplication

Post by 247it »

Thanks for the response, but it seems a bit odd. So just to clarify: servers can only share deduplication if they are part of the same job? Can separate jobs at least share the same 'filename' (in the job properties)? I have a number of systems that I know are 90% identical, so I was hoping to dedupe them, but for logistical and other reasons I don't want them all trying to back up at once.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Correct, deduplication happens only within the same job. Your requirements are a bit contradictory: on one hand, you do not want those VMs to be part of the same backup file; on the other hand, you want dedupe between all these VMs (which requires single storage by definition of deduplication).
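
For what it's worth, here is a minimal sketch (Python, with invented names; not Veeam's actual file format or block size) of why per-job dedupe implies a single backup file: the hash index is scoped to the one file being written, so only machines written into that same file can ever share blocks.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # 1 MB blocks; an arbitrary size for the sketch

def backup_job(vm_disks, backup_file):
    """Write every VM of one job into a single backup file, storing
    each unique block only once. This is per-job dedupe: the index
    lives and dies with this one file."""
    seen = {}  # block hash -> offset of the stored copy in backup_file
    with open(backup_file, "wb") as out:
        for vm_name, disk_path in vm_disks.items():
            with open(disk_path, "rb") as disk:
                while block := disk.read(BLOCK_SIZE):
                    digest = hashlib.sha256(block).hexdigest()
                    if digest not in seen:        # new data: store the block
                        seen[digest] = out.tell()
                        out.write(block)
                    # known digest: a real format would record a reference
                    # to seen[digest] instead of writing the block again
    return seen

# Two VMs in *different* jobs write to different files with different
# 'seen' indexes, so their identical blocks can never be matched.
```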
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

Well, they don't back up all at once; they back up sequentially. If you have 10 systems in a job, they'll back up one at a time.

As far as I know, you can't use the same filename for different jobs. I believe the second job will give a message that the file already exists. I think each job uses a UUID to verify that the file actually belongs to that job. I'm admittedly not 100% sure about this point.
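
If the UUID guess is right, the check could look roughly like this. The sidecar metadata file and the function are invented purely for illustration; this is not the real on-disk layout.

```python
import json
import os
import uuid

def open_job_backup_file(path, job_id):
    """Refuse to reuse a backup file that belongs to another job.
    The .meta sidecar is a made-up stand-in for wherever the real
    product embeds the owning job's identifier."""
    meta_path = path + ".meta"
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            owner = json.load(f)["job_id"]
        if owner != job_id:
            raise RuntimeError(f"{path} already exists and belongs to job {owner}")
    else:
        with open(meta_path, "w") as f:
            json.dump({"job_id": job_id}, f)  # first writer claims the file
    return open(path, "ab")

job_a = str(uuid.uuid4())  # each job carries its own identity
f = open_job_backup_file("servers.vbk", job_a)
```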
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Correct, different jobs cannot share the same backup file.
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

Gostev wrote: which requires single storage by definition of deduplication.
This is off-topic, but that's not completely true. We have a software package that does dedupe by putting the "deduped" blocks in a common file and the unique blocks in a unique file per host. This system uses a database-like engine and allows simultaneous jobs to run while still sharing a common dedupe pool across all jobs.
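
Roughly what that layout might look like, as I read it: blocks referenced by more than one host migrate to a common pool, single-copy blocks stay per host, and a dict stands in for the database-like index. (A hypothetical sketch, not any actual product.)

```python
import hashlib

class DedupePool:
    """Blocks referenced by more than one host live in a common pool;
    blocks seen from only one host stay in that host's own store.
    Dicts stand in for the index database and the two kinds of files."""

    def __init__(self):
        self.index = {}       # block hash -> "pool" or owning hostname
        self.pool = {}        # the common "deduped blocks" file
        self.host_store = {}  # hostname -> {hash: block} (unique file per host)

    def write(self, host, block):
        h = hashlib.sha256(block).digest()
        owner = self.index.get(h)
        if owner is None:                      # first sighting: host-unique
            self.host_store.setdefault(host, {})[h] = block
            self.index[h] = host
        elif owner not in ("pool", host):      # second host: promote to pool
            self.pool[h] = self.host_store[owner].pop(h)
            self.index[h] = "pool"
        # already pooled, or a repeat from the same host: nothing to store
        return h

pool = DedupePool()
pool.write("host1", b"A" * 4096)
pool.write("host2", b"A" * 4096)  # same block from a second host -> pooled once
```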
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Not sure what is not true then, if you confirm there *is* a single file with all the deduped blocks :) That was exactly my point.

While this is also off-topic, I am interested: how large is this single file, and how do you handle it as part of your operations? Do you put it on tape (does it even fit there)? Do you need to restore this file from tape before you can restore individual host data?
tsightler
VP, Product Management
Posts: 6009
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: Deduplication

Post by tsightler »

I think I misread your statement. My takeaway from your comment was that you believed it would not be possible to have a single file shared by multiple jobs. My point was that, while this may be true with the Veeam architecture, it isn't absolutely true in general. Actually, we have three non-Veeam products that do various types of dedupe: two that do file level, and one that's block level. All of these products use a database-like engine where each server is its own "job", and the deduped data is stored in a common pool while unique data is stored in unique files. Veeam is the only product we use that keeps deduped and non-deduped blocks in the same file.

The advantage of the "pool" approach is that multiple jobs can share the same dedupe pool even while running simultaneously. The disadvantage seems to be that there is significant overhead from a performance perspective, and of course you have to have the entire "pool" available to perform restores (see the sketch below).
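
To make that restore-side disadvantage concrete, continuing the hypothetical DedupePool sketch from my earlier post: rebuilding even one host's data reads from both its unique store and the shared pool, so the whole pool has to be online.

```python
def restore_host(pool, host, manifest):
    """Rebuild one host's data from its ordered list of block hashes
    (the values returned by pool.write). Any shared block comes out of
    the common pool, so the *entire* pool must be available even to
    restore a single host."""
    out = bytearray()
    for h in manifest:
        location = pool.index[h]
        source = pool.pool if location == "pool" else pool.host_store[host]
        out += source[h]
    return bytes(out)
```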
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication

Post by Gostev »

Agreed... an additional disadvantage is putting all your eggs in one basket (actually, there are multiple disadvantages with this). Not that it would be a big problem for us to extend our engine to global dedupe, but does it make sense?

Back then, we made the decision to go with per-job dedupe for the sake of self-contained, self-restoring backup files (for convenience and compliance reasons) and better tape support (a huge global dedupe file is a nightmare to handle both at backup and at restore). We also saw a strong trend in the dedupe storage device market: many customers either already owned such a device, or were planning to buy one to store backups on. And right now there are so many cheaper alternatives to the established dedupe players that it makes little sense to buy a NAS without dedupe (based on cost per TB). Storing backups on such devices gives the best of both worlds: you get both global dedupe (between backup files of different jobs) and the benefits of portable backup files, as the toy model below illustrates.

Do you agree?
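
A toy model of that "best of both worlds" point (an invented class, not any particular appliance): each job still writes its own self-contained file, but the storage layer hashes blocks across all files and keeps only one copy of each.

```python
import hashlib

class DedupeNAS:
    """Toy dedupe storage device: each file is kept as an ordered list
    of block hashes, while block data is stored once across ALL files,
    giving global dedupe between otherwise independent backup files."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # hash -> block data, shared by every file
        self.files = {}   # filename -> ordered list of block hashes

    def store(self, name, data):
        refs = []
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            h = hashlib.sha256(chunk).digest()
            self.blocks.setdefault(h, chunk)  # written once, globally
            refs.append(h)
        self.files[name] = refs

    def read(self, name):  # each file still restores on its own, as usual
        return b"".join(self.blocks[h] for h in self.files[name])

nas = DedupeNAS()
nas.store("job1.vbk", b"mostly common data" * 1000)
nas.store("job2.vbk", b"mostly common data" * 1000)
print(len(nas.blocks))  # far fewer unique blocks than two full copies
```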