Best Practice for MS Server 2012 DeDup Repo

Post by **jfarr2008** » Oct 29, 2012 6:57 pm

Are there any "best practices" when it comes to using MS Server 2012 as a backup target/repo with the new Server 2012 DeDup featured in the Veeam Blog post? Or do the same principles apply that would with a dedup appliance?

Oct 29, 2012 7:25 pm

Pretty much the same principles apply. However, it's important to note that Windows 2012 dedupe is "post-process", which means you must always have enough space on the storage to hold at least one full, uncompressed backup. This is an important design consideration. Also, I believe the default schedule for the Windows dedupe process is to start at 10PM and run for 6 hours, which is likely to conflict with the backup window. Finally, the Microsoft recommendation is to design around the fact that Windows dedupe will process ~100GB/hr for a single volume (you can scale to multiple volumes).

Basically, this just hasn't been out long enough for "best practices" to develop around it yet. However, here are the things to think about:

1. Remember that you will always need at least enough space to store a single, uncompressed full backup
2. If using SureBackup, and for faster restores, you may want to configure Windows 2012 dedupe to only dedupe files older than a given number of days (the default is 5 days, I believe).
3. Schedule your dedupe window so that it runs when backups are not running.
4. Remember that Windows 2012 dedupe will process only about 100GB of new data per hour. This is probably not a big deal for incremental backups, but can be impactful for full backups.
5. You must use forward incremental, and disable Veeam compression to get the best results.

Because of the "post-process" requirement to have enough space to store a full, uncompressed backup, this will work against the feature for moderate-term retention (i.e. < 30 days) when compared to simply using Veeam compression/dedupe. This is especially true for reverse incremental backups, which are generally very space efficient anyway. For customers looking for longer-term retention, however, there can be significant benefits.
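A minimal sketch of the space arithmetic behind points 1 and 2, in Python. It assumes a simplified model in which one uncompressed full, plus every incremental still younger than the minimum file age, sits on disk at full size until dedupe processes it. The function name and the sample sizes are hypothetical, chosen only for illustration.

```python
def landing_space_gb(full_uncompressed_gb, daily_incremental_gb, min_file_age_days):
    """Estimate the space taken by backup files dedupe has not processed yet.

    Simplified model (an assumption, not a Microsoft or Veeam formula):
    one uncompressed full plus all incrementals younger than the
    minimum file age are stored at their full size.
    """
    return full_uncompressed_gb + daily_incremental_gb * min_file_age_days


# Hypothetical job: 2 TB uncompressed full, 100 GB of incrementals per day.
print(landing_space_gb(2000, 100, 5))  # files wait 5 days before dedupe
print(landing_space_gb(2000, 100, 0))  # files are eligible immediately
```

Lowering the file age shrinks the un-deduplicated landing area, at the cost of restoring recent backups from deduplicated files.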

Post by **jfarr2008** » Oct 29, 2012 7:46 pm

Thanks Tom. With regard to the schedule of the dedup job and the Veeam backup window: is there any reason we couldn't kick off the dedup process with a Veeam post-job script? Also, is there any reason NOT to use Server 2012 dedup in conjunction with my local reverse incremental backups? Any space savings I can get are a plus.

Post by **tsightler** » Oct 29, 2012 7:52 pm

I'm assuming you could schedule the dedup job with a post job script if you really wanted to.

As for reverse incremental: assuming you are using Veeam compression, you are unlikely to see much savings for your effort. First, unless you configure Windows 2012 dedupe to process files that are "0 days" old, it will never actually dedupe the VBK, since that file is changed every night. Second, if you do configure this and happen to get good dedupe, you will likely see significant performance degradation of your backups. Dedupe is not recommended for files that see lots of I/O, such as a VBK file in reverse incremental mode.

Post by **mongie** » Nov 13, 2012 8:36 pm

I've seen everyone saying how well Windows 2012 dedup works with Veeam backups, and I've seen hints in my own environment, but I can't seem to get it just right.

I have two repository servers at the moment. Both Dell servers with direct-attached disk. One ~30TB and one ~60TB. Both servers have 48GB of ram, and aren't really doing much apart from running Veeam.

I find that both background processing jobs, and scheduled jobs take FOREVER. On the smaller server, I am seeing 3.5TB saved (13%) which is pretty good, but on the bigger server only ~400GB (0%).

I've also noticed that I'm getting an event log entry that says "Optimization job on volume D: (\\?\Volume{be0c51af-5496-4c14-8df8-bfd76532924f}\) was configured with insufficient memory." I believe Windows uses 25% of available ram for background processing, and 50% for manual jobs by default. Is there a way to increase this?

In cases where you have seen great reductions, do you notice dedup jobs running along fairly quickly? or do they take forever? How do you have your schedule set? How many days? for how many hours? (do you use 2 jobs?)

Also, what are the specs of your hardware? How much RAM?

Any suggestions would be appreciated.

Post by **mongie** » Nov 13, 2012 8:37 pm

Oh, and also - Has anyone seen a difference between using Dedup Friendly compression in Veeam and No Compression?

Post by **jpeake** » Nov 13, 2012 9:28 pm

I just started using Win 2012 dedupe also. When I was running it in the lab, with fewer machines in a backup job, the results were great. Now in the real world the results are not. I think the problem is that I had all my servers in a single backup job, and the resulting files were too large for the Win dedupe engine to ever get through in its scheduled window.

I have a group of Linux servers, much smaller, and see about 80% savings there.

On my Windows server job, it showed 0% savings.

So just yesterday I split my Windows machines up into 4 different jobs. Will see how that goes. Trying to keep each full backup file under 1TB. I know I will lose a little Veeam dedupe there.

You can run manual jobs using PowerShell, or just modify the scheduled tasks that are set up when you enable dedupe. See the cmdlets here:
http://technet.microsoft.com/en-us/libr ... 48450.aspx

My Veeam server has 8 x 2.5GHz cores and 44GB of RAM. Storage is a Dell MD3600i. I was running dedupe as a background task (Windows marks this as low priority) and as a normal job from 12AM - 4PM each weekday, and all day on Sunday. It will use up to 50% of RAM during those times.

The fastest rate I would see was about 100 GB/hr for dedupe processing. After a week, it still hadn't got through three 3TB+ files. So that really screws the stats. The % saved in the GUI is based on the size of the whole volume, including files that have not been fully processed yet.

If you run:
Get-DedupStatus | fl
you can see more accurate stats. Look at the "Optimized..." counters to see how well it is really performing.

Resource Monitor will show you that it's working also. Look under the "Disk" tab at the System and fsdmhost processes. It will also show reads and writes to the ChunkStore folders.

Post by **jpeake** » Nov 13, 2012 9:33 pm

I tried dedupe friendly compression and saw about 10-15% compression rate from Veeam. But that was when all my Win Servers were in a massive job, so dedupe never finished.

Now that I have split to separate jobs, I turned compression off.

Post by **mongie** » Nov 13, 2012 9:35 pm

Thanks for the reply...

How did you get the stat of 100GB/hr?

I've read up on most of the PowerShell commands, but that "fl" command is very helpful.

Post by **jpeake** » Nov 13, 2012 9:40 pm

That rate was just an average. My Linux servers job is 49GB for a full, and it took about half an hour to finish processing each of them.

Post by **jpeake** » Nov 13, 2012 9:52 pm

Dedupe just finished a pass on my Linux Servers folder.

OptimizedFilesSize is 177GB
SavedSpace is 157GB
OptimizedFilesSavingsRate is 88%

I have mine set to 5 days, so will take another week before I can tell how it works with smaller files on the Windows Servers jobs. And another week after that for it to really shine with another set of fulls. I am hoping for big things. It seems to be pretty solid, just maybe needs smaller (under 1TB) files. Or I just need to be more patient.

It's great that you can pretty much just set-it-and-forget-it though.

Post by **mongie** » Nov 13, 2012 9:56 pm

It looks like it has finished at least one full pass on one of my servers, because all data has been "optimised". The problem is, it's only showing an optimised savings rate of 16% (on 27TB of data).

If I run another job now, it will take hours to finish. I basically lose track of jobs because I can't tell when one finishes and the next one starts.

Post by **jpeake** » Nov 13, 2012 10:36 pm

That seems pretty low. Is that with compression off in Veeam? And LAN target? How many full backups are in that 27TB?

Post by **mongie** » Nov 13, 2012 11:00 pm

23 in total, I think there are 2-3 for each backup job. Some are pretty big too (7.8TB).

Post by **jpeake** » Nov 13, 2012 11:52 pm

If you look at the file properties on one of those big files, what attributes does it show? I think if the dedupe process has finished, the file attributes will include PL (sparse and sym link).

Just curious if those big monsters have actually been completed. I couldn't find any docs from MS regarding this, or whether there is a file size limit for dedupe.

Post by **mongie** » Nov 14, 2012 12:41 am

Some of them have been, some haven't.

Size: 8.33, Size on Disk: 7.6
Size: 6.6, Size on Disk: 4.2

Not too bad...

I've re-done my scheduled tasks and I'm going to try to get a full pass on both. I found that I could increase memory usage up to 80% from the scheduled tasks, and I've disabled the time restriction... so we'll see what happens I guess.

Nov 14, 2012 1:15 am

It's VERY critical to understand that Windows 2012 dedupe is not really designed for high data ingest rates. In general, the Microsoft recommendation is 100GB/hr, so that means, assuming you use the default 8 hour dedupe process, you can only process about 800GB a day. You can tweak the default job schedule to run more, but even at 24 hours that's only 2.4TB/day. How big are your full backups?

Windows dedupe can scale past 100GB/hr by using multiple datastores and running dedupe jobs concurrently (each dedupe job will only use a single core) but of course that would be a separate dedupe pool.

In other words, comments that it "works great" don't really take into account the impacts of scaling beyond smallish repositories (say 10TB or less), and with your repositories (30TB and 60TB) I'd have to assume that you are ingesting a significant amount of data. I doubt that you have completed a full pass at this point based on your savings rate.

I'm working on a whitepaper with some guidelines, but it probably won't be ready for a few more weeks, as it takes time to test various setups. For data of your size, it would likely involve splitting up the volumes into smaller chunks (perhaps 16TB each) and running a dedupe job on each. Note that this might not save significant space compared to Veeam compression with reverse incremental, since you always have to have enough free space to store at least one pass of uncompressed full backups. The primary use case is long-term archival (months), in which case Windows dedupe can be a huge win.
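The daily figures above can be reproduced with a short Python sketch. It takes the ~100GB/hr per-volume planning rate quoted in this post and assumes that concurrent jobs on separate volumes do not slow each other down, which is an idealisation. The function name and the four-volume example are hypothetical.

```python
RATE_GB_PER_HOUR = 100  # per-volume planning figure quoted in this thread


def daily_capacity_gb(window_hours, volumes=1, rate_gb_per_hour=RATE_GB_PER_HOUR):
    """New data (GB) that dedupe can process per day.

    Assumes one dedupe job per volume and perfect scaling across volumes.
    """
    return rate_gb_per_hour * window_hours * volumes


print(daily_capacity_gb(8))              # 800 GB with an 8 hour window
print(daily_capacity_gb(24))             # 2400 GB when running around the clock
print(daily_capacity_gb(24, volumes=4))  # 9600 GB across four volumes
```

Each volume is still its own dedupe pool, so splitting data across volumes raises throughput but gives up dedupe between them.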

Post by **mongie** » Nov 14, 2012 3:02 am

I figured out how to make it run at a higher priority, and I'm now seeing _some_ progress...

Running with 80% memory (39GB) and high priority has started to yield some reduction in data... I'll see how it goes.

Post by **mongie** » Nov 14, 2012 4:42 am

Looking in Resource Monitor shows that fsdmhost.exe is reading from a VBK at ~ 14MB/s on both servers.

It appears to be writing at around 30MB/s at the same time.

I guess that gives a processing rate of ~50GB/hr. Not BRILLIANT, but not horrible either.
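The conversion behind that estimate, as a small Python sketch. It treats the observed read rate as the processing rate, as this post does, and assumes decimal units (1 GB = 1000 MB). The function name is made up for the example.

```python
def mb_per_s_to_gb_per_hour(mb_per_s):
    """Convert a sustained MB/s rate to GB/hr, using 1 GB = 1000 MB."""
    return mb_per_s * 3600 / 1000


rate = mb_per_s_to_gb_per_hour(14)
print(f"{rate:.1f} GB/hr")  # about half the 100GB/hr planning figure
```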

Post by **Gostev** » Nov 14, 2012 2:03 pm

mongie wrote:I have two repository servers at the moment. One ~30TB and one ~60TB.
I find that both background processing jobs, and scheduled jobs take FOREVER.

Quote from Veeam forums digest 3 weeks ago:

There seems to be a lot of anxiety among our customers and partners regarding using Windows Server 2012 deduplication in conjunction with Veeam B&R. Many of you are already testing it and sharing some pretty impressive results. Now, while I do think the technology is great, and have been promoting it heavily myself, I want to make sure you do not start implementing it for production use without fully understanding its scalability aspects. For example, from what I gather, you cannot just rip and replace a single-pool deduplicating storage appliance with a WS2012 deduplicating file server. While Microsoft deduplication is designed to scale towards a datacenter environment, you can only scale it in a very specific way. Here is a great resource to start with > Plan to Deploy Data Deduplication.

Post by **TheJourney** » Nov 15, 2012 11:11 pm

Testing win 2012 dedup. This is many different Veeam jobs.

Volume                             : H:
VolumeId                           : \\?\Volume{f7ceba25-cc4d-48be-adfe-38855f2681c1}\
Capacity                           : 8.95 TB
FreeSpace                          : 4.81 TB
UsedSpace                          : 4.14 TB
UnoptimizedSize                    : 22.43 TB
SavedSpace                         : 18.29 TB
SavingsRate                        : 81 %
OptimizedFilesCount                : 275
OptimizedFilesSize                 : 22.44 TB
OptimizedFilesSavingsRate          : 81 %
InPolicyFilesCount                 : 275
InPolicyFilesSize                  : 22.44 TB
LastOptimizationTime               : 11/15/2012 3:45:34 PM
LastOptimizationResult             : 0x00000000
LastOptimizationResultMessage      : The operation completed successfully.
LastGarbageCollectionTime          : 11/10/2012 4:56:39 PM
LastGarbageCollectionResult        : 0x00000000
LastGarbageCollectionResultMessage : The operation completed successfully.
LastScrubbingTime                  : 11/10/2012 5:10:26 PM
LastScrubbingResult                : 0x00000000
LastScrubbingResultMessage         : The operation completed successfully.
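The figures in this output are consistent with each other, which can be checked with a few lines of Python. The relationships used here (savings rate = SavedSpace / UnoptimizedSize, and UsedSpace = UnoptimizedSize - SavedSpace) are inferred from the numbers shown, not taken from Microsoft documentation, and the function name is hypothetical.

```python
def savings_rate_percent(saved_space, unoptimized_size):
    """Share of the logical (unoptimized) data that dedupe saved, in percent."""
    return 100 * saved_space / unoptimized_size


# Values from the H: volume output above, in TB.
unoptimized_tb = 22.43
saved_tb = 18.29

print(round(savings_rate_percent(saved_tb, unoptimized_tb), 1))  # reported as 81 %
print(round(unoptimized_tb - saved_tb, 2))                       # reported UsedSpace, in TB
```

The computed rate is 81.5%, so the cmdlet output appears to truncate the percentage rather than round it.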

Post by **ryan1212** » Feb 26, 2013 5:52 pm

Has anyone measured the difference between "Dedup friendly" and "no compression" in regards to de-duplication with server 2012?

Post by **tsightler** » Feb 26, 2013 7:04 pm

I would expect it to be very similar to other dedupe appliances. Typically, "dedupe friendly" compression provides only a 10-20% reduction in the initial size of the VBK and VIB files, while costing roughly that same amount in dedupe savings, perhaps slightly more. Saving 10-20% may not sound like much. However, for customers backing up tens or hundreds of TB, this can be a significant savings in network bandwidth. It also generally makes for faster restores, and sometimes even slightly faster instant recovery, since 10-20% less data must be read from the backup repository. So it can be a reasonable compromise. Effectively, you are trading some disk space overall (because of less dedupe) for some up-front network and disk bandwidth savings. If you're happy with your current performance and want to maximize dedupe, I would leave compression disabled.

Post by **baatch** » May 24, 2013 2:34 pm

Have read through most of this thread but can't seem to wrap my head around the Server 2012 dedup function.

It says 100GB per hour, and that equals 2.4 TB per day.

So if my full backup is 3.5 TB, should I not use the Server 2012 dedup function then?

Post by **Vitaliy S.** » May 24, 2013 3:54 pm

Why not? Since you're using forward incremental backup mode, subsequent incremental job runs will not touch this file, so it shouldn't be an issue to offload a 3.5 TB backup file to a dedupe volume. It will just take a bit more time.
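To put "a bit more time" into numbers, here is a minimal Python sketch using the 100GB/hr figure from this thread. It assumes that a dedupe job which runs out of its allowed duration continues on the same file in the next run, which has not been confirmed in this thread. The function name is hypothetical.

```python
import math


def time_to_process(file_gb, hours_per_day, rate_gb_per_hour=100):
    """Return (hours of dedupe run time, calendar days) for one backup file.

    Assumes the job resumes where it stopped on the following day.
    """
    hours = file_gb / rate_gb_per_hour
    days = math.ceil(hours / hours_per_day)
    return hours, days


hours, days = time_to_process(3500, hours_per_day=24)
print(hours, days)  # 35.0 hours of processing, spread over 2 days
```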

Post by **baatch** » May 24, 2013 4:00 pm

Ok so the dedup can resume on that big file even after it has stopped processing it because I set the dedup duration for only 24 hours?

Post by **Vitaliy S.** » May 24, 2013 4:09 pm

Hmm...good question. I believe it needs to be verified, but why not specify more than 24 hours?

Post by **baatch** » May 24, 2013 4:15 pm

Going to try more than 24 hours. Wonder if I can set it to 0? Will it then always run until it completes?

Post by **Vitaliy S.** » May 24, 2013 4:22 pm

Wait a minute...are you referring to the file age parameter that can be set to 0? Just want to make sure we are on the same page right now.

b. File Age: Deduplication has a setting called MinimumFileAgeDays that controls how old a file should be before processing the file. The default setting is 5 days. This setting is configurable by the user and can be set to “0” to process files regardless of how old they are.

Here is the link for more info: http://blogs.technet.com/b/filecab/arch ... -2012.aspx

Post by **baatch** » May 24, 2013 5:15 pm

I was referring to the optimization throughput duration.
