jj281 wrote:A couple questions... isn't this contradictory?
Decompress backup data blocks before storing: YES
Compression level: Optimal
Wouldn't this take extra CPU cycles on the Gateway to decompress, not to mention the wasted cycles on the Proxy to do it in the first place...? Also, if you're utilizing DDBoost, wouldn't it be better to do no compression since the only blocks processed by the DD (vs Proxy/Gateway) that would eat CPU cycles (for Local Compress) would be the changed blocks?
Certainly you can manually disable compression on your jobs if you like; indeed, this will slightly lower the amount of CPU the proxy and repository spend on compression/decompression. However, in most real-world deployments, proxy/repo CPU is rarely the bottleneck, especially when using the "Optimal" compression setting. Far more often the network is the bottleneck, and not just raw bandwidth, but the extra overhead of sending and receiving 2-3x as much data across the wire. That's 2-3x as many CPU interrupts and 2-3x as much data being copied around by the network drivers, and all of that uses CPU as well.
In most environments the benefit of using compression between the agents is worth the extra CPU overhead, especially if the CPU capacity is otherwise available, and especially in environments with 1GbE networks between proxy and gateway. Veeam's Optimal compression uses the LZ4 algorithm, which is designed for high throughput and is very light on CPU, especially on the decompression side (a single decompression thread on a single core can decompress at GB/s rates). So while overall CPU usage might go up somewhat, the 2-3x bandwidth savings is worth it for the vast majority of environments. It effectively turns a gateway with a single 10GbE port into a gateway with 20-30Gb/s of inbound bandwidth.
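If you want a feel for the numbers, here's a minimal sketch (nothing from the Veeam code path; it assumes the third-party python-lz4 package and uses synthetic data, so the ratio and speeds are only illustrative) that compresses a buffer with LZ4 and works out the effective bandwidth of a 10GbE link at that ratio:

```python
# Rough illustration of the LZ4 trade-off: a little CPU for a lot of
# wire bandwidth. Assumes the third-party "lz4" package (pip install lz4).
import os
import time
import lz4.frame

# Synthetic "backup block": half repetitive, half random data, so it
# compresses somewhere in the ballpark of 2x. Real VM data varies.
block = b"".join(b"A" * 2048 + os.urandom(2048) for _ in range(2048))  # ~8 MiB

t0 = time.perf_counter()
compressed = lz4.frame.compress(block)
t1 = time.perf_counter()
restored = lz4.frame.decompress(compressed)
t2 = time.perf_counter()

assert restored == block
ratio = len(block) / len(compressed)
print(f"ratio:      {ratio:.1f}x")
print(f"compress:   {len(block) / (t1 - t0) / 1e9:.2f} GB/s")
print(f"decompress: {len(block) / (t2 - t1) / 1e9:.2f} GB/s")

# At a 2-3x ratio, a single 10GbE proxy-to-gateway link carries the
# equivalent of 20-30Gb/s of logical backup data.
link_gbps = 10
print(f"effective inbound: ~{link_gbps * ratio:.0f} Gb/s of logical data")
```

Run it on your own proxy hardware if you want to see how cheap the decompression side really is relative to the link speed.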
But of course every environment is different. You may have plenty of bandwidth even without compression on the jobs, and perhaps you are CPU constrained instead, in which case, yes, disabling compression at the job level might be beneficial. That's the problem with generic "one size fits all" recommended settings, and it's why the settings are there in the first place. If the exact same options worked perfectly for every environment, you wouldn't need those knobs.
jj281 wrote:Also, what's the point of Inline deduplication (again with DDBoost)? So it isn't transmitted to the Gateway from the Proxy (Transport)?? If that's the case, the assumption is the transmission of the block is the resource to be concerned about not the compute on the proxy right?
I personally have no problem with disabling dedupe in Veeam, and I've changed that recommendation in the best-practice papers and deployment guides I've written or had input on. But it really makes very little difference to the backup process itself, as the Veeam dedupe engine is very, very light on CPU. Leaving it on can reduce the amount of traffic that has to be processed by DDBoost overall and may reduce CPU and bandwidth very slightly. I always use this simple example:
Block1 -- AA:BB:CC:DD:EE:FF
Block2 -- BB:AA:DD:CC:FF:EE
Block3 -- AA:BB:CC:DD:EE:FF
So, using a simplistic explanation, DDBoost will recognize that there are only six unique data patterns across all three blocks and reduce them down. This happens whether Veeam dedupe is enabled or not; however, if Veeam dedupe is disabled, DDBoost has to analyze the contents of the third block and spend CPU to do it. On the other hand, if Veeam dedupe is enabled, DDBoost never even sees that third block, because Veeam already recognizes it as an exact duplicate of Block1 and it never gets sent to the repository for DDBoost to process in the first place. The total data savings is exactly the same either way, but DDBoost has less work to do because Veeam has already eliminated that block from the stream.
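To make that concrete, here's a toy sketch of source-side dedupe in the spirit of the example above. It is purely illustrative, not how the Veeam engine is actually implemented; the block contents and the SHA-256 fingerprinting are my own assumptions for the demo:

```python
# Toy source-side dedupe: fingerprint each block and only ship blocks
# we haven't seen before. Not Veeam's actual engine.
import hashlib

blocks = [
    b"AA:BB:CC:DD:EE:FF",  # Block1
    b"BB:AA:DD:CC:FF:EE",  # Block2 (same patterns, different order)
    b"AA:BB:CC:DD:EE:FF",  # Block3 (exact duplicate of Block1)
]

seen = set()
to_send = []
for block in blocks:
    digest = hashlib.sha256(block).digest()
    if digest in seen:
        continue             # duplicate block never leaves the proxy
    seen.add(digest)
    to_send.append(block)

print(f"{len(to_send)} of {len(blocks)} blocks sent to the repository")
# -> 2 of 3 blocks sent; DDBoost then dedupes the sub-block patterns
#    (AA..FF) inside those two blocks on its own.
```

Whether the third block is dropped at the proxy (Veeam dedupe on) or collapsed later by DDBoost (Veeam dedupe off), the data landing on disk is the same; the only difference is where the work happens.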
In previous documents I always recommended just leaving Veeam dedupe on, since it had almost no negative impact and could have a slight positive impact in saved bandwidth, but more recently I've started telling people to turn it off, mainly because it confuses people and leads to long discussions about something that will have very little impact one way or the other. There can also be a slight benefit to turning it off, as there is less metadata, which can lead to fewer read operations from the DD during job startup, but once again, this is usually a minor impact.