
Recommended Backup Job Settings for EMC Data Domain

Posted: Nov 06, 2014 8:01 pm
by brupnick
Good afternoon-

Now that v8 with DD Boost support has been released, are there new setting recommendations when using a Data Domain as the storage target? I've come across the following, but there might be more:

Storage Target Settings
  • Align backup file data blocks (Yes/No)
  • Decompress backup data blocks before storing (Yes/No)
Backup Job Settings
  • Enable inline deduplication (Yes/No)
  • Compression level
  • Storage optimization
Thank you!

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 07, 2014 8:53 am
by v.Eremin
The storage target settings are the following:
Align backup file data blocks: NO
Decompress backup data blocks before storing: YES

Backup job settings:
Enable inline deduplication: YES
Compression level: Optimal
Storage optimization: Local target
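
For quick reference, here is the same matrix as a minimal plain-Python sketch (purely illustrative; the key names are mine, and these options are configured in the B&R repository and job wizards, not through any Python API):

# Illustrative summary only -- these options live in the Veeam B&R
# repository and job wizards; there is no Python API behind them.
DATA_DOMAIN_SETTINGS = {
    # Storage target (repository) settings
    "align_backup_file_data_blocks": False,  # Align ... data blocks: NO
    "decompress_before_storing": True,       # Decompress ... : YES
    # Backup job settings
    "inline_deduplication": True,            # Enable inline dedup: YES
    "compression_level": "Optimal",
    "storage_optimization": "Local target",  # see Gostev's note below
}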

Thanks.

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 07, 2014 12:25 pm
by brupnick
Thanks, Vladimir!

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 07, 2014 12:30 pm
by v.Eremin
You're welcome. Feel free to reach out in case additional clarification is needed. Thanks.

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 07, 2014 2:38 pm
by v.Eremin
I've edited my answer, so please double-check the provided recommendations (specifically, the part about the compression level). Thanks.

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 07, 2014 7:45 pm
by Gostev
Lately our architects have started recommending the Local target (16+ TB backup files) storage optimization for deduplicating storage. This significantly improves restore performance without impacting backup performance too much. Please consider this as well. Thanks!
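
For a feel of why the larger block helps restores, here is a quick back-of-the-envelope sketch in Python (the 256 KB / 512 KB / 8 MB figures are quoted later in this thread; the 1 MB value for the plain Local target is my assumption):

# Rough arithmetic: restoring the same amount of data needs far fewer
# read operations against the dedupe appliance with larger blocks.
RESTORE_BYTES = 1 * 1024**4  # a 1 TB restore, for illustration

for label, block_size in [
    ("WAN target (256 KB)", 256 * 1024),
    ("LAN target (512 KB)", 512 * 1024),
    ("Local target (1 MB)", 1024 * 1024),      # assumed value
    ("Local 16+ TB (8 MB)", 8 * 1024 * 1024),
]:
    reads = RESTORE_BYTES // block_size
    print(f"{label:22s} -> {reads:>10,} block reads")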

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 10, 2014 5:54 pm
by jj281
A couple questions... isn't this contradictory?

Decompress backup data blocks before storing: YES

Compression level: Optimal

Wouldn't this take extra CPU cycles on the Gateway to decompress, not to mention the wasted cycles on the Proxy to compress it in the first place? Also, if you're utilizing DDBoost, wouldn't it be better to use no compression, since the only blocks processed by the DD (vs. Proxy/Gateway) that would eat CPU cycles (for local compression) would be the changed blocks?

Also, what's the point of inline deduplication (again, with DDBoost)? Is it so a duplicate block isn't transmitted from the Proxy (Transport) to the Gateway? If that's the case, the assumption is that transmission of the block is the resource to be concerned about, not the compute on the proxy, right?

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 10, 2014 6:51 pm
by tsightler
jj281 wrote:A couple questions... isn't this contradictory?

Decompress backup data blocks before storing: YES
Compression level: Optimal

Wouldn't this take extra CPU cycles on the Gateway to decompress, not to mention the wasted cycles on the Proxy to compress it in the first place? Also, if you're utilizing DDBoost, wouldn't it be better to use no compression, since the only blocks processed by the DD (vs. Proxy/Gateway) that would eat CPU cycles (for local compression) would be the changed blocks?
Certainly you can manually disable compression on your jobs if you like; indeed, this will slightly lower the amount of CPU used by the proxy and repository for compression/decompression. However, in most real-world deployments, proxy/repo CPU is rarely the bottleneck, especially when using the "optimal" compression setting. In far more cases network bandwidth is the bottleneck, and not just bandwidth itself, but the extra overhead of sending and receiving 2-3x as much data across the network. That's 2x as many CPU interrupts and 2x as much data being copied around in the network drivers, and all of that uses CPU as well.

In most environments the benefit of using compression between the agents is worth the extra CPU overhead, especially if the CPU capacity is otherwise available, and especially in environments with 1GbE networks between proxy and gateway. Veeam optimal compression uses the LZ4 algorithm, which is designed for high throughput and is very light on CPU, especially on the decompression side (a single decompression thread on a single core can decompress at GB/s rates). So indeed, while the overall CPU usage might go up somewhat, the 2-3x bandwidth savings is worth it for the vast majority of environments. This effectively turns a gateway with a single 10GbE port into a gateway with 20-30Gb of inbound bandwidth.
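
If you want to see the trade-off for yourself, here is a minimal sketch using the same LZ4 algorithm via the third-party python-lz4 package (the data is synthetic and compresses far better than real VM data, so treat the numbers as illustrative only):

# Rough illustration of the bandwidth-vs-CPU trade-off described above,
# using the same LZ4 algorithm (requires the third-party 'lz4' package).
import os
import time

import lz4.frame

# ~4 MB of a repeating 1 KB pattern; synthetic, so it compresses far
# better than real VM data would
data = os.urandom(1024) * 4096

t0 = time.perf_counter()
compressed = lz4.frame.compress(data)
t1 = time.perf_counter()
lz4.frame.decompress(compressed)
t2 = time.perf_counter()

print(f"on the wire: {len(data):,} -> {len(compressed):,} bytes")
print(f"compress: {t1 - t0:.4f}s, decompress: {t2 - t1:.4f}s")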

But of course every environment is different: you may have plenty of bandwidth even without using compression on the jobs, or perhaps you are CPU constrained instead, in which case, yes, disabling compression at the job level might be beneficial. That's the problem with generic "one size fits all" recommended settings, and it's why the settings are there in the first place. If the exact same options worked perfectly for every environment, you wouldn't need those knobs. :D
jj281 wrote:Also, what's the point of inline deduplication (again, with DDBoost)? Is it so a duplicate block isn't transmitted from the Proxy (Transport) to the Gateway? If that's the case, the assumption is that transmission of the block is the resource to be concerned about, not the compute on the proxy, right?
I personally have no problem with disabling dedupe in Veeam, and I've changed that in the best practice papers and deployment guides I've written or had input on. But it really makes very little difference to the backup process itself, as the Veeam dedupe engine is very, very light on CPU. Leaving it on can reduce the amount of traffic that has to be processed by DDboost overall, and may reduce CPU and bandwidth very slightly. I always use this simple example:

Block1 -- AA:BB:CC:DD:EE:FF
Block2 -- BB:AA:DD:CC:FF:EE
Block3 -- AA:BB:CC:DD:EE:FF

So, using a simplistic explanation, DDboost will recognize that there are only 6 unique data patterns across these blocks and reduce them down. This will occur whether Veeam dedupe is enabled or not; however, if Veeam dedupe is disabled, DDboost has to analyze the contents of the third block and thus use CPU to do it. On the other hand, if Veeam dedupe is enabled, DDboost never even sees that third block, because it would already be recognized by Veeam as an exact duplicate of Block1 and thus never be written to the repository in the first place for DDboost to process. The total data savings is exactly the same either way, but DDboost has less work to do because Veeam has already eliminated that block from the stream.
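
A minimal sketch of that source-side dedupe idea, using the same three blocks (illustrative only; Veeam's actual engine is not exposed as a Python API):

# Hash each block and only ship blocks whose hash hasn't been seen
# earlier in this stream -- duplicates never reach the repository.
import hashlib

blocks = [
    b"AA:BB:CC:DD:EE:FF",  # Block1
    b"BB:AA:DD:CC:FF:EE",  # Block2
    b"AA:BB:CC:DD:EE:FF",  # Block3, exact duplicate of Block1
]

seen = set()
for i, block in enumerate(blocks, 1):
    digest = hashlib.sha256(block).hexdigest()
    if digest in seen:
        print(f"Block{i}: duplicate, never written, DDboost never sees it")
    else:
        seen.add(digest)
        print(f"Block{i}: unique, written to the repository")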

In previous documents I always recommended just leaving Veeam dedupe on, since it had almost no negative impact and could have a slight positive impact in saved bandwidth, but I've more recently started telling people to turn it off, mainly because it just confuses people and leads to long discussions about something that will have very little overall impact one way or the other. There can also be a slight benefit to turning it off, as there is less metadata, which can lead to fewer read operations from the DD during job startup, but once again, this is usually a minor impact.

Re: Recommended Backup Job Settings for Data Domain

Posted: Nov 11, 2014 4:25 pm
by jj281
Thanks for the detailed explanation, it does help. I know we're talking about slight degrees of resource consumption, but it's nice to know the reasoning and the more deep-dive aspects of Veeam.

Re: Recommended Backup Job Settings for EMC Data Domain

Posted: Jan 25, 2015 7:01 pm
by BeThePacket
Why would Veeam architects recommend Local target (16+ TB backup files) for storage optimization with a DD appliance? The block size of this option is 8 MB, vs. LAN target (512 KB) or WAN target (256 KB). Last I checked, the smaller the block size of a file, the better the dedupe rate, which is something those of us with Data Domain products really want in order to get the most out of our investment.

The "change advanced settings to recommended for repository type" prompt is also extremely annoying when creating or modifying a job, since IMO it's suggestion is completely wrong.

Re: Recommended Backup Job Settings for EMC Data Domain

Posted: Jan 25, 2015 11:24 pm
by Gostev
BeThePacket wrote:Last I checked, the smaller the block size of a file, the better the dedupe rate, which is something those of us with Data Domain products really want in order to get the most out of our investment.
That is absolutely correct, but you are wrongly applying the B&R block size to the Data Domain dedupe engine, where it plays no role.

Regardless of the B&R block size, Data Domain will dedupe Veeam backup files with much smaller blocks (of variable length, by the way), getting you the best dedupe ratio possible. Because of that, a smaller B&R block size will have no impact on Data Domain dedupe efficiency. Reading with a larger block size on the B&R side does help restore performance, though.

Without Data Domain in the picture (e.g. when backing up to raw disk), for best dedupe ratio you indeed would want to go with small block sizes, as B&R will be the only dedupe engine.
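
To illustrate the "variable length" part, below is a toy content-defined chunker in Python. Data Domain's actual algorithm is proprietary, so this is only a sketch of the general technique; it shows why boundaries derived from the content itself make the appliance's dedupe indifferent to how the upstream writer sized its blocks.

# Toy content-defined chunking: cut a boundary wherever a signature of
# the last few bytes matches a pattern. Because boundaries come from
# the content itself, the same data produces (mostly) the same chunks
# even when it is shifted or packaged into different block sizes.
# Illustrative only -- Data Domain's real algorithm is proprietary.
import random

def chunk(data: bytes, window: int = 16, mask: int = 0xFF,
          min_size: int = 32) -> list[bytes]:
    chunks, start = [], 0
    for i in range(len(data)):
        if i + 1 < window or i - start + 1 < min_size:
            continue
        sig = sum(data[i - window + 1:i + 1])  # crude windowed signature
        if sig & mask == mask:                 # ~1/256 chance per byte
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(0)
content = bytes(random.randrange(256) for _ in range(8000))

a = chunk(content)
b = chunk(b"prefix!!" + content)  # same content behind an 8-byte shift
shared = set(a) & set(b)
print(f"{len(shared)} of {len(a)} chunks identical despite the shift")

With a fixed-block scheme, that same 8-byte shift would change every single block, which is the intuition for why the B&R block size choice doesn't hurt the DD's ratio.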

Re: Recommended Backup Job Settings for EMC Data Domain

Posted: Jan 26, 2015 1:36 am
by BeThePacket
So what you're saying is that regardless of the block size a backup file is stored with on the DD, the space savings will be the same? What about the initial VBK? It would be great for EMC to validate the job settings being proposed/used.

Re: Recommended Backup Job Settings for EMC Data Domain

Posted: Jan 26, 2015 3:35 pm
by Gostev
Correct. And we did validate this as a part of the mandatory certification testing that EMC requires all backup vendors to perform.

[MERGED] Data Domain Backup Storage Optimization

Posted: May 05, 2015 1:05 pm
by jkowal99
Hello,
I'm trying to get a feel for setting up the best backup job for backing up to a Data Domain 200 device from Veeam. In the advanced settings, under Storage, there are some storage optimization options. The "recommended" option for the DD appliance is "Local target (16 TB + backup files)". The description says "Lowest deduplication ratio and larger incremental backups. Recommended for jobs producing full backup files larger than 16 TB". My question is: if the VMs I'm backing up aren't anywhere near 16 TB, not even 1 TB, should I be choosing a different option with better deduplication? Thanks

Re: Recommended Backup Job Settings for EMC Data Domain

Posted: May 05, 2015 2:48 pm
by foggy
Jeremy, please review the thread above for the recommended settings and some deeper considerations; it should answer your questions. Thanks!