rasmusan
Enthusiast
Posts: 47
Liked: never
Joined: Jan 20, 2015 9:03 pm
Full Name: Rasmus Andersen

Deduplication appliance recommendation

Post by rasmusan »

Hello

I am looking for some recommendations regarding deduplication solutions. Which products/solutions do you have experience with (good or bad)?

Specifically, I have a case with a customer who has around 40TB of data, a big part of which is graphical/CAD data. The deduplication solution is to be used for longer-term retention, as they have another storage solution for the first copy of backup data. We have been looking at EMC Data Domain DD2500 with DDBoost as a possible solution, however this is quite expensive...

Also, what about Windows Server 2012 R2 deduplication for these amounts of data? Does anyone have experience with this?

Gostev
SVP, Product Management
Posts: 26676
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication appliance recommendation

Post by Gostev »

The most optimistic Windows Server 2012 R2 dedupe scalability limit I have seen reported was around 10TB, and I personally recommend no more than 5TB. That said, are you sure you want deduplication in this particular case? Most likely, this data will not dedupe very well, so raw storage may end up much cheaper and will guarantee the performance levels too. Remember, with Veeam you can just go with an industry-standard server stuffed with large hard drives.

rasmusan
Enthusiast
Posts: 47
Liked: never
Joined: Jan 20, 2015 9:03 pm
Full Name: Rasmus Andersen

Re: Deduplication appliance recommendation

Post by rasmusan »

Hi Gostev

Yes, I was not that keen on Server 2012 R2 dedup either; I just could not find any documentation stating the limits, so thanks for pointing that out :)

Well, I have a traditional storage array for short-term retention (also for performance reasons); this is for the longer retention. If you want to retain a year's worth of data at a monthly interval, and Veeam can compress it to, let's say, 25TB, you would need quite a lot of raw disk space. So the purpose of the dedup is to hold these multiple archive copies... makes sense?
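To put rough numbers on that: twelve monthly fulls of about 25TB each on plain disk need roughly 12 x 25TB, while an idealised dedupe target stores the first full once plus only each later month's unique data. The sketch below assumes a 10% month-over-month unique-data rate purely for illustration; CAD data may dedupe far worse, which is exactly the concern raised in the replies.

```python
def raw_capacity_tb(full_tb: float, copies: int) -> float:
    """Every monthly full stored as-is on plain disk."""
    return full_tb * copies


def deduped_capacity_tb(full_tb: float, copies: int, change_rate: float) -> float:
    """Idealised dedupe: the first full is stored once, each later full adds only its unique part.

    change_rate is an ASSUMED fraction of unique data per monthly full.
    """
    return full_tb + (copies - 1) * full_tb * change_rate


if __name__ == "__main__":
    full, months = 25.0, 12  # 25TB compressed full, one kept per month for a year
    print(f"raw:     {raw_capacity_tb(full, months):.1f} TB")
    print(f"deduped: {deduped_capacity_tb(full, months, 0.10):.1f} TB (assuming 10% change)")
```

The gap between the two results shrinks quickly as the assumed change rate rises, so measuring the real rate on a sample of the customer's data matters more than the formula.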

Gostev
SVP, Product Management
Posts: 26676
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication appliance recommendation

Post by Gostev »

Yes, it certainly does. I did not know if you were thinking GFS, or something else.

By the way, depending on the time scale of this project, you may also want to consider evaluating Windows Server 2016 deduplication, which has major deduplication engine enhancements coming, many of which are specifically aimed at improving performance and scale.

Matte
Lurker
Posts: 1
Liked: never
Joined: Oct 27, 2014 10:29 am

Re: Deduplication appliance recommendation

Post by Matte »

Looking at the enhancements Microsoft has made in Windows Server 2016 deduplication, I'm not sure they are enough to make it viable for such a case.

The limits in Windows Server 2012 R2 are volumes under 10TB, and files approaching 1TB aren't good candidates.
The limits in Windows Server 2016 are volumes up to 64TB (as it is now multi-threaded) and files up to 1TB.

Can Windows Server deduplication handle larger files? Probably, but it would require some case-specific testing to see the actual results and performance. I have personally seen Windows Server 2012 R2 have issues with .vbk files larger than 1TB, and it isn't pretty when that happens. Remember, the official Microsoft recommendation is that these files "aren't good candidates".

Of course, you could split your VMs into multiple backup jobs to keep the .vbk size down, but considering it's graphical/CAD data, the customer probably has large file server(s), which wouldn't make that a good option. That's just an assumption, though.
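Before pointing Windows dedup at an existing repository, it may help to list which backup files already exceed the roughly 1TB guidance quoted above. The following Python sketch walks a folder and reports .vbk/.vib/.vrb files at or above a size threshold; the function name and the decimal-TB threshold are illustrative assumptions, not anything Veeam or Microsoft ships.

```python
import os
from pathlib import Path

ONE_TB = 10**12  # decimal terabyte; use 2**40 if you think in TiB
BACKUP_EXTS = {".vbk", ".vib", ".vrb"}  # Veeam full, forward incremental, reverse incremental


def oversized_backup_files(root, limit_bytes=ONE_TB):
    """Return (path, size) for backup files at or above limit_bytes, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in BACKUP_EXTS:
                size = path.stat().st_size
                if size >= limit_bytes:
                    hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

For example, `oversized_backup_files(r"E:\Backups")` (a hypothetical repository path) returns the candidates that would need either splitting into smaller jobs or case-specific testing.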

dellock6
Veeam Software
Posts: 5926
Liked: 1743 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy

Re: Deduplication appliance recommendation

Post by dellock6 »

Just a hint from a Linux guy... If you are looking for a cheap/free solution, give opendedup a try; as I remember, it has no file size limit ;)
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2020
Veeam VMCE #1

Gostev
SVP, Product Management
Posts: 26676
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication appliance recommendation

Post by Gostev »

Matte, my recommendation above is also based on the fact that there is a new backup storage option in B&R v9 which goes hand in hand with Windows Server 2016 deduplication needs and requirements. It's a part of one top secret feature that we will not announce until closer to the release though. Thanks!

rasmusan
Enthusiast
Posts: 47
Liked: never
Joined: Jan 20, 2015 9:03 pm
Full Name: Rasmus Andersen

Re: Deduplication appliance recommendation

Post by rasmusan »

I am not necessarily looking for a free/cheap solution, just the "optimal" solution for deduplication with fairly large amounts of data... What do other customers do? What are your experiences?

hans_lenze
Service Provider
Posts: 17
Liked: 4 times
Joined: Sep 07, 2012 7:07 am

Re: Deduplication appliance recommendation

Post by hans_lenze » 2 people like this post

We mostly use standard rack servers with Windows Server 2012 R2 deduplication and a lot of local storage. It works fine, but you have to tick all the boxes when you set it up or you'll be in a world of hurt down the road. The maximum file fragmentation count for NTFS volumes is a nasty little detail that can cause big problems (format with /L, and edit the registry to quickly defragment individual files).

We store GFS backups offsite, and you can expect a 60 to 80% reduction on top of the Veeam dedup and compression. The full backup in most chains is over 2.5TB, so big files work kind of okay with Windows Server dedup. The biggest problem is that it's a single-threaded process, and it slows down when it runs out of memory or when the file is very big. In time it will deduplicate the whole file, but it will take a while to process that first full backup. The deduplication process remembers which blocks were processed and picks up where it left off; you just won't see any additional free disk space until the whole file has been processed.
Remember that you have to keep sufficient free disk space to fill the chunk store, or the process will fail and you're stuck.
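The behaviour described here (resumable processing, no space reclaimed until a file is fully processed, and a chunk store that needs its own free space) can be modelled in a few lines. This is a toy sketch under stated assumptions, not how Windows Server dedup is implemented: it uses fixed 4KB chunks and SHA-256 hashes, whereas the real engine uses variable-size chunking, and the class names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # toy fixed-size chunks; the real engine chunks variably


class PendingFile:
    """A backup file that has landed on disk but is not yet fully deduplicated."""

    def __init__(self, data: bytes):
        self.data = data
        self.offset = 0   # resume point: bytes already chunked
        self.recipe = []  # ordered chunk hashes needed to rebuild the file

    @property
    def done(self) -> bool:
        return self.offset >= len(self.data)


class ChunkStore:
    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes, each unique chunk stored once

    def optimize(self, f: PendingFile, max_chunks: int) -> bool:
        """Process up to max_chunks chunks of f, resuming where the last run stopped."""
        for _ in range(max_chunks):
            if f.done:
                break
            piece = f.data[f.offset:f.offset + CHUNK_SIZE]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)
            f.recipe.append(digest)
            f.offset += len(piece)
        return f.done

    def used_bytes(self, files) -> int:
        """Chunk store size plus every file that is not yet completely processed.

        A file is only replaced by its chunk references once fully processed,
        so until then the raw file and its new chunks both occupy disk space.
        """
        store = sum(len(c) for c in self.chunks.values())
        pending = sum(len(f.data) for f in files if not f.done)
        return store + pending

    def rehydrate(self, f: PendingFile) -> bytes:
        """Rebuild the original bytes from the chunk store."""
        return b"".join(self.chunks[d] for d in f.recipe)
```

After a partial `optimize` run, `used_bytes` is larger than the original file, which is the free-space trap: the volume must hold both the landed backup and the growing chunk store until processing completes.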

We've been looking at dedicated appliances and got some ExaGrid boxes. So far they are a dream to work with. The onboard Veeam data mover and dedicated "landing zone" work as advertised, and the performance is very good: I've seen 600MB/s when restoring VMs to the virtualization platform. They don't need any additional licenses, and they work as a repository straight out of the box.

damasta
Lurker
Posts: 1
Liked: 1 time
Joined: Jun 21, 2015 9:43 am

Re: Deduplication appliance recommendation

Post by damasta » 1 person likes this post

Hi,

If you prefer a turn-key appliance, you might want to check out the Fujitsu ETERNUS CS800 - which is cheaper and faster than the EMC DD boxes.

But if you prefer a DIY solution, what's wrong with Windows native dedup? It's free and included in every 2012 R2 server you buy. Just switch it on.

Microsoft officially supports dedup with their own product DPM, provided you follow these tweaking guidelines:
https://technet.microsoft.com/en-us/lib ... 91438.aspx
In short, they tell you to split your backup repository across multiple volumes between 5 and 7 TB in size, change dedup operation so it is better suited to very large container files, and set up scheduling so that dedup doesn't collide with your data protection jobs.
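As a quick sizing aid for that guidance, the hypothetical helper below splits a repository into the fewest equally sized volumes that stay under a chosen maximum. Equal sizing is a simplifying assumption; in practice each volume also needs free headroom for the post-process free-space issue raised earlier in this thread.

```python
import math


def split_repository(total_tb: float, max_volume_tb: float) -> list:
    """Fewest equally sized volumes, each no larger than max_volume_tb."""
    if total_tb <= 0 or max_volume_tb <= 0:
        raise ValueError("sizes must be positive")
    count = math.ceil(total_tb / max_volume_tb)
    return [total_tb / count] * count


if __name__ == "__main__":
    for cap in (7, 5):
        vols = split_repository(40, cap)
        print(f"max {cap}TB: {len(vols)} volumes of {vols[0]:.2f}TB")
```

For the 40TB in this thread, a 7TB cap gives six volumes of about 6.7TB each, and a 5TB cap gives eight volumes of 5TB.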

As Matte pointed out, most Windows dedup limitations have been addressed in Server 2016:
http://blogs.technet.com/b/filecab/arch ... iew-2.aspx
I am evaluating that as I write this - with great results so far.

Personally, I agree with hans_lenze's warning: Windows dedup is offline (post-process), and the target must always have enough free space to ingest a full backup. And if for some reason dedup can't finish processing all data before the next dump, the problem is vastly exacerbated.
The DD and CS appliances don't have that issue. They dedup in memory, before anything hits the drives, so you don't need to worry about free disk space, or at least not until much later than with any offline dedup scheme.
Also, a full backup of 40TB will be... a challenge for any target. So get something that performs well! In three to five years, those 40TB will easily swell to 80TB...
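The 40TB to 80TB projection implies a compound annual growth rate, which is handy when sizing a target for its whole service life. A small Python sketch; the function names are illustrative:

```python
def implied_annual_growth(start_tb: float, end_tb: float, years: float) -> float:
    """Compound annual growth rate that turns start_tb into end_tb over `years` years."""
    return (end_tb / start_tb) ** (1 / years) - 1


def projected_tb(start_tb: float, annual_growth: float, years: float) -> float:
    """Capacity after `years` years of compound growth."""
    return start_tb * (1 + annual_growth) ** years


if __name__ == "__main__":
    for years in (3, 5):
        rate = implied_annual_growth(40, 80, years)
        print(f"40TB -> 80TB in {years} years = {rate:.1%} per year")
```

Doubling in three years works out to roughly 26% growth per year, and doubling in five years to roughly 14.9% per year.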

At the end of the day, evaluating Windows dedup is only going to cost you some time and a bunch of TB to scratch around on. Spin up a Windows Server VM and test dedup on a volume of your choice. Since dedup doesn't depend on Storage Spaces or Hyper-V, you can run it comfortably inside a VM, for testing and production purposes. In fact, my current test setup is a ~1.5 TB VHDX residing on a QNAP NAS box, which I mounted on my laptop and passed through client Hyper-V to my Server 2016 test VM. No, it's not fast. ;-) But I am more interested in compression rates than performance right now.

rgarrison
Novice
Posts: 7
Liked: never
Joined: Jan 08, 2015 1:45 pm
Full Name: Ryan Garrison

Re: Deduplication appliance recommendation

Post by rgarrison »

I'm a fan of Data Domain, which has been working very well for us with DDBoost and Veeam. We have 12 Data Domain appliances receiving backup data and replicating and after tweaking the settings, it "just works". It took some time and trial and error to find the optimal settings for the Veeam jobs, but once we did it's been smooth sailing. Dedup ratio is good for our data, but that is obviously very dependent on the nature of the source material.

DD and DDBoost is definitely expensive, but it does work very well with Veeam. Once Veeam adds the managed file replication capability, it will be close to perfect in my eyes.

In no way am I trying to bad mouth Windows dedup (I don't have experience with it), but I thought I'd at least share a positive experience with Data Domain and Veeam.

jian17
Novice
Posts: 3
Liked: never
Joined: Jun 26, 2015 2:18 pm

Re: Deduplication appliance recommendation

Post by jian17 »

Hello rgarrison, what kind of restore speeds are you seeing from your DD? Both full VM and file level recovery?

We are seeing pretty slow restores from our DD2500 and have tried every setting suggested in the other threads.

smd32
Service Provider
Posts: 14
Liked: never
Joined: Jan 24, 2016 4:34 am
Full Name: Scott Drassinower

Re: Deduplication appliance recommendation

Post by smd32 »

Gostev wrote: Matte, my recommendation above is also based on the fact that there is a new backup storage option in B&R v9 which goes hand in hand with Windows Server 2016 deduplication needs and requirements. It's a part of one top secret feature that we will not announce until closer to the release though. Thanks!
Did this secret feature turn out to be the scale-out backup repositories or something else?

Gostev
SVP, Product Management
Posts: 26676
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication appliance recommendation

Post by Gostev »

Per-VM backup file chains. This option lets you keep individual backup file size small (according to each individual VM size) without having to create a dedicated job for every VM. Microsoft does not recommend giving Windows dedupe large files to work with.
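The effect of per-VM chains on file size can be shown with simple arithmetic. This sketch ignores compression and Veeam's own dedupe, and the job's VM sizes are hypothetical; the 1TB limit follows the Microsoft guidance quoted earlier in the thread.

```python
def backup_file_sizes_tb(vm_sizes_tb, per_vm):
    """Full backup file sizes a job would produce, ignoring compression and Veeam dedupe.

    per_vm=False: one .vbk holds every VM in the job.
    per_vm=True:  each VM gets its own chain, so one file per VM.
    """
    sizes = list(vm_sizes_tb)
    return sizes if per_vm else [sum(sizes)]


def files_over_limit(vm_sizes_tb, per_vm, limit_tb=1.0):
    """Backup files at or above the ~1TB size flagged as a poor dedupe candidate."""
    return [s for s in backup_file_sizes_tb(vm_sizes_tb, per_vm) if s >= limit_tb]


if __name__ == "__main__":
    # Hypothetical job: one large file server plus three small VMs (sizes in TB).
    job = [2.0, 0.5, 0.25, 0.75]
    print("single chain: ", files_over_limit(job, per_vm=False))
    print("per-VM chains:", files_over_limit(job, per_vm=True))
```

Per-VM chains shrink the files for the small VMs, but the 2TB file server still produces a file above the guidance, which matches Matte's caveat about large CAD file servers.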

smd32
Service Provider
Posts: 14
Liked: never
Joined: Jan 24, 2016 4:34 am
Full Name: Scott Drassinower

Re: Deduplication appliance recommendation

Post by smd32 »

So per-VM backup file chains, scale-out repository, and leave dedupe to Windows Server 2012 R2 or 2016 instead of Veeam -- gets you most of the functionality of a dedupe appliance? Or go with the first two and stick with Veeam dedupe but just spend the extra cash for more disk?

foggy
Veeam Software
Posts: 19427
Liked: 1762 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Deduplication appliance recommendation

Post by foggy »

I don't think you can get anything comparable to a dedupe appliance with a Windows box with deduplication enabled; however, per-VM backup chains and scale-out repository definitely allow you to get the most out of deduplicating storage in terms of backup job performance. As for Veeam inline deduplication, you can leave it enabled, but disable compression to achieve better deduplication rates.

kte
Expert
Posts: 176
Liked: 7 times
Joined: Jul 02, 2013 7:48 pm
Full Name: Koen Teugels

Re: Deduplication appliance recommendation

Post by kte »

StoreOnce VSA is also a software solution; it runs as a VM and can be installed with up to 50TB of net storage capacity. I just don't know whether CAD data deduplicates well.

Gostev
SVP, Product Management
Posts: 26676
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication appliance recommendation

Post by Gostev »

foggy wrote: I don't think you can get anything comparable to a dedupe appliance with a Windows box with deduplication enabled, however, per-VM backup chains and scale-out repository definitely allow you to get maximum of deduplicating storage in terms of backup jobs performance.
Actually, the dedupe ratio is very comparable (I would even say identical), as all the same algorithms are used anyway. And performance will always be great because Windows Server dedupe is not inline but post-process, so backups always land on raw storage at full storage speed (and are chunked into the dedupe store later). Same principle as ExaGrid.

What is not comparable is functionality: there is no source-side dedupe client (like DDBoost or Catalyst), no deduped volume replication, etc.
smd32 wrote: So per-VM backup file chains, scale-out repository, and leave dedupe to Windows Server 2012 R2 or 2016 instead of Veeam -- gets you most of the functionality of a dedupe appliance? Or go with the first two and stick with Veeam dedupe but just spend the extra cash for more disk?
Windows Server 2012 R2 dedupe does not scale well, and is typically not recommended by our users who actually tried to use it on more than 5TB of VMs. I've seen occasional reports about 10TB and even more, but this required lots of tweaking.

Windows Server 2016 dedupe is yet to be released and seen in action - until then, no recommendations can be made.

mkaec
Expert
Posts: 348
Liked: 79 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K

Re: Deduplication appliance recommendation

Post by mkaec »

The missing piece on Microsoft's side is post-deduplication replication. If one were to try to stay within the Windows tools, DFS replication would be used to replicate to an off-site location. Unfortunately, DFS-R currently rehydrates before replicating. I think even using Veeam to do the replication would still require rehydration from a Windows dedup store.

But I think Windows dedup would work very well in cases where you don't need to replicate (such as if you are backing up files that have been received from remote locations via DFS-R).

Gostev
SVP, Product Management
Posts: 26676
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Deduplication appliance recommendation

Post by Gostev »

Correct. I would use it as a secondary backup repository to target Backup Copy jobs to.

chimera
Enthusiast
Posts: 57
Liked: 3 times
Joined: Apr 09, 2009 1:00 am
Full Name: J I

Re: Deduplication appliance recommendation

Post by chimera »

rasmusan wrote: We have been looking at EMC Data Domain DD2500 with DDBoost as a possible solution, however this is quite expensive
EMC DD with DD Boost gets my vote (yes, they are expensive, but you get what you pay for).
They are brilliant boxes. Set and forget. We get insanely fast backups to it from Veeam.

stephane.duprat
Lurker
Posts: 1
Liked: never
Joined: Jul 24, 2014 7:30 am
Full Name: Stéphane DUPRAT

Re: Deduplication appliance recommendation

Post by stephane.duprat »

Hi,
I'm using two DD620s, and they get my vote too.
No problem with replication. Deduplication is very good. EMC support is OK.
I'm wondering whether to change to a DD2500.
But they're only good for backup; restores take a long, long time.

Chimera, I would like to know whether restores are better with the DD2500 (granular files, email, or SharePoint objects).

Thanks

lightsout
Expert
Posts: 222
Liked: 59 times
Joined: Apr 10, 2014 4:13 pm

Re: Deduplication appliance recommendation

Post by lightsout »

I've got a new Quantum DXi 4700, and it performs really pretty well. I am only doing backup copies to it, but it ingests data as fast as I can throw it at it!

It doesn't have the cool integration that Data Domain does, but 10Gb CIFS connectivity works well enough for my needs. I actually really like the look of ExaGrid's appliances too.

rreed
Expert
Posts: 354
Liked: 72 times
Joined: Jun 30, 2015 6:06 pm

Re: Deduplication appliance recommendation

Post by rreed »

Having had nothing but trouble from our DDs, we thought we'd give the Dell DRs (DR4100 in our case) a try. It has since been found that the problem may not have been quite as much w/ the devices directly, but then again we do still see occasional issues w/ ours that are resolved w/ nothing more than a reboot. At a fraction of the up-front cost, a fraction of the yearly support renewals, native 10Gb, and better performance once the data gets onto the device, we will not be renewing our sky-high DD support and will simply let them die on the vine as unsupported misc. archival storage. Granted, Dell has recently bought EMC, so I'm not sure how much of a moot point it would be anyway. :wink: Our Dells came in at about 1/3 of the cost of our old EMCs and have been much more stable and much faster.

Now, as far as dedupe appliances go, based on my experience I wouldn't recommend buying into the fad for any manufacturer. Before walking in the door here, I had about 4-5 years' experience w/ Veeam and the old classic approach: staging to plain old Windows storage first, then pulling that off to tape for archival. It just worked, rock solid, for years. We'd keep about two weeks on disk, since that's where most of our restore requests would fall, plus of course last night's backup for disaster recovery. We set a very clear expectation w/ users that if we had to go to tape, it would take longer, especially if it was off-site tape. No problem, everybody was happy.

My current company had long since bought into the dedupe sensation w/ two pairs of DD620s and a pair of DD2200s (two data centers w/ matching sets, cross-replicated at the DD level) before I walked in the door. As mentioned, there were lots of write problems that have since turned out to be a VMware NIC driver problem, but the DDs still occasionally like to fail authentication, which fails the backups for that night (or weekend), albeit a simple reboot the next morning fixes it until the next time. A few times, when they got *really* broken, we have had to remove and re-add them to AD authentication, but it's a quick fix. Still, "reliability."

Now, speed. Dedupe devices will ingest really well, and that's what the marketing pushes. Try a restore and you'll be waiting around for a very long time. Setting your jobs to the largest block size (the 16TB+ option) will help, but it's nowhere near pulling directly from disk. Start a VM from backup? No, though Veeam indicates there's a better chance of it happening w/ v9. Everyone here has been appalled w/ restore speeds; we once had an emergency restore of about 2TB of our main file server, and it took a couple of days. File-level restore? What used to take around 10-15 min. back in the day of plain old disk storage now takes no less than half a work day from dedupe, maybe more if it bombs out halfway through, the user says no, that's not the file, etc. I've since been able to tune and tweak to get it better, but it's still dismal. And here we're writing straight to dedupe, which is not best practice. I've tried pushing for some staging disk space to help mitigate emergency restores, but no, there's still the expectation that dedupe devices are perfect. In the end, they are effectively a one-way street of data storage. They'll store great; just don't expect to get much back out of them, at least not any time soon.
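For a sense of scale, the throughput figures in this thread can be converted into comparable units. The sketch below assumes "a couple of days" means 48 hours and uses decimal units; it compares that 2TB emergency restore with the 600MB/s restore rate hans_lenze reported for ExaGrid. Different environments and bottlenecks are involved, so treat it as an order-of-magnitude comparison only.

```python
def average_mb_per_s(size_tb: float, hours: float) -> float:
    """Average throughput in MB/s, decimal units (1 TB = 1,000,000 MB)."""
    return size_tb * 1_000_000 / (hours * 3600)


def hours_at(size_tb: float, mb_per_s: float) -> float:
    """Hours needed to move size_tb at a sustained mb_per_s."""
    return size_tb * 1_000_000 / mb_per_s / 3600


if __name__ == "__main__":
    print(f"2TB in 48h     = {average_mb_per_s(2, 48):.1f} MB/s")
    print(f"2TB at 600MB/s = {hours_at(2, 600):.2f} h")
```

Under those assumptions, the 2TB restore averaged roughly 11.6MB/s, while the same amount at 600MB/s would take a little under an hour.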

I get the dedupe/compression aspect: about 86%-88% reduction real-world (despite what the sales manager tells you). That's fantastic; we're storing hundreds of TBs in just around 30-40TB of raw dedupe storage. I completely get the attraction. Veeam's dedupe and compression may not quite hit those marks, but v9 has really slugged the crap out of it: I've seen our v9 final .vbk and .vib files shrink to 1/3-1/4 of their previous v8 size in some cases, due to dirty blocks being written to storage in previous versions. However, what is the cost of a large JBOD vs. even a Dell DR, or especially an EMC DD? I bet a big tray crammed full of disks plus v9's much better storage capabilities would win way out over a dedupe box, and you can actually restore files/VMs, run a VM from backup(!!!), manage it more easily, etc. I would recommend staging to JBOD w/ v9, and final archival to tape. Walk away from the Kool-Aid of dedupe devices. They're marketed quite well but just don't deliver when you need your data back out of them. You asked for recommendations; these are my opinions based on experience.
VMware 6
Veeam B&R v9
Dell DR4100's
EMC DD2200's
EMC DD620's
Dell TL2000 via PE430 (SAS)
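Vendors and users quote savings both as a percentage and as an N:1 ratio; converting between them makes the figures in this thread comparable. A minimal sketch with illustrative function names:

```python
def reduction_to_ratio(reduction: float) -> float:
    """Convert a space saving (0.87 means 87% saved) into an N:1 dedupe ratio."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


def logical_tb(physical_tb: float, reduction: float) -> float:
    """Logical (pre-dedupe) data that fits in physical_tb at the given saving."""
    return physical_tb * reduction_to_ratio(reduction)


if __name__ == "__main__":
    for saving in (0.86, 0.88):
        print(f"{saving:.0%} saved = {reduction_to_ratio(saving):.1f}:1")
    print(f"30TB at 86% ~ {logical_tb(30, 0.86):.0f}TB logical")
    print(f"40TB at 88% ~ {logical_tb(40, 0.88):.0f}TB logical")
```

An 86-88% saving is roughly 7.1:1 to 8.3:1, so 30-40TB of physical capacity holds about 214-333TB of logical data, consistent with "hundreds of TBs". The same conversion turns hans_lenze's 60 to 80% into roughly 2.5:1 to 5:1.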

ChuckS42
Expert
Posts: 145
Liked: 21 times
Joined: Apr 24, 2013 8:53 pm
Full Name: Chuck Stevens

Re: Deduplication appliance recommendation

Post by ChuckS42 »

All I will say is to NOT get a Dell DR4100 or DR6000, until and unless Veeam officially supports them. Very painful to use with Veeam right now.

rreed
Expert
Posts: 354
Liked: 72 times
Joined: Jun 30, 2015 6:06 pm

Re: Deduplication appliance recommendation

Post by rreed »

Dang, hate to hear that, Chuck. We've had the opposite experience here w/ the Dells vs. EMC. Our DR4100s have been solid and reliable vs. the DDs, but admittedly our old DDs are hanging off Cisco 3750 switches, which do not have adequate buffers to handle the throughput (high output drops at the DD switch ports). And we're piping (2x) 10G LAG from our SAN/VMware infrastructure to the 3750s, and from there just 1G ports to the DDs, so it's probably not a fair comparison. I'm in the process of moving the DDs' 1G links over to our 10G core via mini-GBICs, which has solved the output buffer problem, but we still get the occasional "I'm not going to authenticate anyone until someone reboots me" w/ the DDs. The Dells are hanging off the same 10Gbps core and seem to keep up w/ ingesting our backups just fine.

Anyway, if you're going to buy into deduplicating devices, please make sure your infrastructure can handle the throughput required for backups to run. Or any storage device, for that matter. We had a lot of long-standing backup issues that had us pointing our fingers at the storage device until we figured out we were using desktop wiring-closet switches in our data center, and some old pNIC drivers on the VMware side were plaguing us w/ endless disconnects.
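A quick way to sanity-check whether the network can carry a backup is to turn the data size and the backup window into a required line rate. This sketch uses the 40TB full from the original question; the 8-hour window is an assumption, and nominal link speeds ignore protocol overhead.

```python
def required_gbit_per_s(full_tb: float, window_hours: float) -> float:
    """Sustained rate needed to move full_tb within the window (decimal units, no protocol overhead)."""
    return full_tb * 8_000 / (window_hours * 3600)  # 1 TB = 8,000 Gbit


def links_that_fit(required_gbps: float, links: dict) -> list:
    """Names of links whose nominal speed covers the requirement (real throughput will be lower)."""
    return [name for name, gbps in links.items() if gbps >= required_gbps]


if __name__ == "__main__":
    need = required_gbit_per_s(40, 8)  # 40TB full; 8-hour window is an assumption
    print(f"need {need:.1f} Gbit/s")
    print(links_that_fit(need, {"1GbE": 1, "10GbE": 10, "2x10GbE LAG": 20}))
```

A 40TB full in 8 hours needs about 11.1 Gbit/s sustained, so only the LAG clears it on paper; note that a LAG typically does not give a single flow more than one member link's speed, and a 1G hop anywhere in the path caps the whole transfer.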

pirx
Enthusiast
Posts: 72
Liked: 9 times
Joined: Dec 20, 2015 6:24 pm

Re: Deduplication appliance recommendation

Post by pirx »

Any opinions on StoreOnce, especially the 6500? We already use some smaller models at remote sites without Veeam, but we have zero experience with other vendors.

ChrisSnell
Technology Partner
Posts: 126
Liked: 18 times
Joined: Feb 28, 2011 5:20 pm
Full Name: Chris Snell

Re: Deduplication appliance recommendation

Post by ChrisSnell »

The most important consideration is not backup performance, but restore performance. Because ExaGrid is the only dedupe device to use a landing zone, it can recover from this 'normal disk' area without having to rehydrate deduped VMs. This gives much quicker recovery, and so reduced downtime. The integration of a Veeam data mover into the appliance also massively helps in terms of backup speed (1.6x faster than to CIFS) and synthetic full creation (6x faster than using a proxy).

The landing zone, and the post-backup dedupe process into a retention zone, also give it some of the quickest backup performance; this obviously depends on the appliance.
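The landing-zone idea boils down to a routing decision at restore time: recent backups are still stored undeduplicated and can be read directly, while older ones must be rebuilt from chunks. The toy model below illustrates only that decision; the "newest N points stay in the landing zone" policy, the class, and the function names are hypothetical and are not ExaGrid's actual logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RestorePoint:
    taken: date
    in_landing_zone: bool  # hypothetical flag: still stored undeduplicated


def place_restore_points(dates, landing_zone_slots):
    """Assumed policy: the newest `landing_zone_slots` points stay in the landing zone."""
    newest = set(sorted(dates, reverse=True)[:landing_zone_slots])
    return [RestorePoint(d, d in newest) for d in sorted(dates)]


def restore_path(point):
    """Describe where a restore would read from under this toy model."""
    if point.in_landing_zone:
        return "landing zone: read backup file directly"
    return "retention zone: rehydrate deduplicated chunks first"


if __name__ == "__main__":
    points = place_restore_points([date(2016, 1, day) for day in range(1, 8)], landing_zone_slots=2)
    for p in points:
        print(p.taken, "->", restore_path(p))
```

The design point is that most restore requests target recent points, so keeping those undeduplicated avoids the rehydration cost where it matters most.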

I was an SE at Veeam for 4 years and am now at ExaGrid, so hopefully folks will be able to take my word. Happy to discuss further with anyone.

sdelacruz
Enthusiast
Posts: 50
Liked: never
Joined: Feb 01, 2011 8:09 pm
Full Name: Sam De La Cruz

[MERGED] Re: Recommendations for backup storage, backup target

Post by sdelacruz »

Need to decide on a new data repository. My goal is fast recoveries.

Dell Windows Server 2016 ReFS with 16 x 4TB NL-SATA drives vs. RAID 10 vs. EMC Data Domain 2200.
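Part of that comparison is raw-to-usable capacity. The sketch below computes usable space for 16 x 4TB drives under RAID 10 and, as an assumed alternative layout, parity RAID; it ignores filesystem overhead and hot spares, and the function name is illustrative.

```python
def usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity before filesystem overhead and hot spares (decimal TB)."""
    if level == "RAID10":
        if drives < 2 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        return drives // 2 * drive_tb  # half the drives hold mirror copies
    if level == "RAID6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb  # two drives' worth of parity
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb  # one drive's worth of parity
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    for level in ("RAID10", "RAID6", "RAID5"):
        print(f"16 x 4TB {level}: {usable_tb(16, 4, level):.0f} TB usable")
```

RAID 10 yields 32TB usable versus 56TB for RAID 6; the trade-off is that mirroring generally gives better random I/O and faster rebuilds, which matters for restore-heavy use.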

Do deduplication appliances perform well on recovery speed?

Sam

DGrinev
Expert
Posts: 1943
Liked: 248 times
Joined: Dec 01, 2016 3:49 pm
Full Name: Dmitry Grinev
Location: St.Petersburg

Re: Recommendations for backup storage, backup target

Post by DGrinev »

Hi Sam,

In general, a deduplication appliance has a negative impact on recovery performance, since deduplicated data blocks must be rehydrated before the restore (unless it has a so-called landing zone, which speeds up restores from the most recent restore points).
Please review this thread for additional information. Thanks!
