- Influencer (Posts: 11, Liked: 2 times, Joined: Oct 03, 2016 4:31 pm)
Backup copy job data deduplication.
I'm running Veeam B&R v9.5. I have two backup jobs that run hourly and back up to one iSCSI LUN (formatted ReFS), plus two backup copy jobs that write to a separate LUN (also ReFS). As an example, one of my VMs is ~7 TB including empty allocated space.
If I look at the backup copy restore points, my oldest full backup for the above VM shows the following:
Data size = 7.11 TB
Backup size = 3.79 TB
Deduplication = 1.9X
Compression = 1.0X
These numbers seem pretty much in line with what I would expect.
This VM contains a lot of data that will not compress well (pictures and video). The rate of file change for this VM is probably less than 1% per week; it's virtually all static data that does not change. However, subsequent backup copies are almost exactly the same size, with no further dedupe. They are all 7.11 TB data, 3.85 TB backup, 1.8X deduplication, and 1.0X compression. So, 6 weekly backups and a handful of VIBs are using nearly 24 TB of data. Size on disk in Explorer shows the same thing. Every single weekly backup copy is over 4,000,000,000 KB.
Given that this machine is backing up the same exact data over and over, I would expect every backup after the first full to be drastically smaller. Virtually every single block in the backup should get deduped, and a dedupe ratio of 200X or even higher should easily be possible. I see this exact same behavior for my other backup copy job too, with every single full (not incremental) backup of every single VM being the same size as the first, with seemingly no deduplication happening.
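To make that expectation concrete, here's a rough back-of-the-envelope sketch in Python using the figures above (the ~1% weekly change rate is my own estimate, and the "ideal" case assumes unchanged blocks would be stored only once across the whole chain):

```python
# Rough numbers from the figures above; the 1% weekly change rate is an estimate.
full_backup_tb = 3.85      # size of each weekly full backup copy
weekly_fulls = 6           # retained weekly full backup copies
weekly_change_rate = 0.01  # roughly 1% of the data changes per week

# What the chain consumes when every full is stored in its entirety:
observed_tb = weekly_fulls * full_backup_tb
print(f"Without cross-file dedupe: ~{observed_tb:.1f} TB")  # ~23.1 TB

# What it could consume if unchanged blocks were stored only once
# (one full, plus roughly 1% of new data for each additional week):
ideal_tb = full_backup_tb * (1 + (weekly_fulls - 1) * weekly_change_rate)
print(f"With ideal cross-file dedupe: ~{ideal_tb:.1f} TB")  # ~4.0 TB
```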
I checked the backup jobs themselves: "Enable inline data deduplication" is enabled, "Exclude swap file blocks" is enabled, "Exclude deleted file blocks" is enabled, and compression level is set to Dedupe-friendly. Storage optimization is set to Local target. There are no such settings for backup copy jobs, so I assume they inherit the parent job's settings.
Why aren't these backups deduping to one another? Do backup copies not dedupe?
- Veeam Software: Alexander Fogelson (Posts: 21139, Liked: 2141 times, Joined: Jul 11, 2011 10:22 am)
Re: Backup copy job data deduplication.
There's no deduplication between different backup files; data is deduped within a backup file only. If you're referring to the spaceless fulls that come with ReFS, check whether Fast Clone is effectively being used during your jobs. Please note that if the chain wasn't initially created on ReFS, you need to create a new active full for Fast Clone to take effect.
- Influencer
Re: Backup copy job data deduplication.
Data block alignment is enabled on the repository and Fast Clone is working on the synthetic full backups for both jobs. However, that doesn't seem to be doing us any good on our backup copies, which is how we do long-term retention: each of our monthly backup copies is full size on disk.
Is there anything we can do to allow backup copies to dedupe against each other? We used ReFS on our repository because Veeam touted new features like resiliency and Fast Clone, but Windows Server deduplication does not work on ReFS volumes, and our disk space is disappearing fast, since every single backup uses very nearly the full amount of space it consumes on the source VM.
- Veeam Software (Alexander Fogelson)
Re: Backup copy job data deduplication.
Looks strange, I'd suggest asking support to take a closer look.
- Influencer
Re: Backup copy job data deduplication.
Apparently, Veeam dedupe doesn't work this way:
"How deduplication functions within a backup or copy job is that it will create, deduplicate, and store our backup files. Deduplication is done per file, and the files will not deduplicate between each other, as deduplication is only applied within the blocks of the current backup file being written."
Veeam cannot dedupe between different backup jobs, cannot dedupe between different VMs in the same job if you are using per-VM backup files, and cannot even dedupe between the EXACT SAME VM AND THE EXACT SAME JOB, as it's limited to a single backup file. These limitations make Veeam's dedupe virtually worthless for my organization, as the amount of dedupe-able data in a single file is going to be minimal, and because we use per-VM backups.
To get any sort of ACTUAL dedupe requires either a deduplicating appliance or running something like Windows Server dedupe. Unfortunately, the latter doesn't work on ReFS (which we used because it was the recommended platform for B&R version 9). This is really frustrating. My company is now stuck between a rock and a hard place. Our backups are consuming a ton of space, and the only way to get any deduplication is to move to a new appliance (not gonna happen) or to migrate all of our data to another location and reformat our current repositories with NTFS.
Very disappointing.
"How deduplication functions within a backup or copy job is that it will create, deduplicate, and store our backup files. Deduplication is done per file, and the files will not deduplicate between each other, as deduplication is only applied within the blocks of the current backup file being written."
Veeam cannot dedupe between different backup jobs, cannot dedupe between different VMs in the same job if you are using per-VM backup files, and cannot even dedupe between the EXACT SAME VM AND THE EXACT SAME JOB, as it's limited to a single backup file. These limitations make Veeam's dedupe virtually worthless for my organization, as the amount of dedupe-able data in a single file is going to be minimal, and because we use per-VM backups.
To get any sort of ACTUAL dedupe requires either a deduplicating appliance or running something like Windows Server dedupe. Unfortunately, the latter doesn't work on ReFS (which we used because it was the recommended platform for B&R version 9). This is really frustrating. My company is now stuck between a rock and a hard place. Our backups are consuming a ton of space and the only way to get any deduplication is to move to a new applicance (not gonna happen) or to migrate all of our data to another location and reformat our current repositories using NTFS.
Very disappointing.
- Veeam Software (Alexander Fogelson)
Re: Backup copy job data deduplication.
The quote is correct, it is how Veeam inline deduplication works. Please note that it is not a replacement for a dedupe appliance or Windows Deduplication - Veeam dedupe uses large block sizes (1024, 512 or 256 KB), while appliances use small (4 KB) or variable block sizes to get maximum data reduction rates. Please review this FAQ section for some considerations regarding Veeam inline deduplication.
While Veeam inline deduplication works within a single file, Fast Clone avoids storing blocks that are shared between multiple files of the same backup chain more than once. This effectively means that the GFS restore points produced by your backup copy jobs, for example, shouldn't occupy any considerable additional space. If that isn't happening, it should be investigated.
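As a conceptual sketch only (not Veeam's actual implementation), fixed-block deduplication scoped to a single file behaves roughly like this: identical blocks are collapsed within one backup file, but the index is discarded when the next file is written, so the same data landing in a second file is stored again.

```python
import hashlib

BLOCK_SIZE = 1024 * 1024  # illustrative block size only


def blocks_stored(data: bytes) -> int:
    """Count blocks that would actually be written for a single backup file.

    The 'seen' index exists only while this one file is being written, which
    is why identical data in two different backup files is stored twice.
    """
    seen = set()
    stored = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        digest = hashlib.sha256(data[offset:offset + BLOCK_SIZE]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            stored += 1
    return stored


payload = bytes(10 * BLOCK_SIZE)      # ten identical (all-zero) blocks
print(blocks_stored(payload))         # 1 block stored for backup file A
print(blocks_stored(payload))         # 1 block stored again for backup file B
```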
- Influencer
Re: Backup copy job data deduplication.
foggy wrote: Fast Clone avoids storing blocks that are shared between multiple files of the same backup chain more than once. This effectively means that the GFS restore points produced by your backup copy jobs, for example, shouldn't occupy any considerable additional space. If that isn't happening, it should be investigated.
This is good to hear, as this is what I expected and is not what's happening. Is this true for backup copies using GFS retention, or only for backup jobs? I was under the impression from all of the documentation that backup copies using GFS retention should dedupe using Fast Clone, but that is not what I'm seeing. I've got a case open and have provided logs, but haven't made any headway yet.
- Veeam Software (Alexander Fogelson)
Re: Backup copy job data deduplication.
This is true for backup copy GFS restore points. Could you please share your case ID?
- Influencer
Re: Backup copy job data deduplication.
Case # 03172999. I have a WebEx scheduled for today to investigate.
- Veeam Software (Alexander Fogelson)
Re: Backup copy job data deduplication.
Ok, keep us posted on the results.
- Product Manager: Vladimir Eremin (Posts: 20415, Liked: 2302 times, Joined: Oct 26, 2012 3:28 pm)
Re: Backup copy job data deduplication.
By the way, how exactly did you determine that there were no savings for GFS restore points? By looking at the file system (it doesn't provide information regarding that)? Are you aware of this tool? Thanks.
- Influencer
Re: Backup copy job data deduplication.
I keep trying to write up a full summary, but keep getting 403 Forbidden from the forum.
- Influencer
Re: Backup copy job data deduplication.
Worked with support yesterday to resolve the issue, and it turns out there was no issue. Everything is working as designed and files are deduping; it's just difficult to tell that it's happening. Veeam reports Fast Clone in my incrementals, but it's hard to see that it's working in GFS backup copies. Even the engineer I worked with was confused.
Because the dedupe is happening at the file-system level, Veeam doesn't report any information for it. My jobs show that Fast Clone is happening, but the dedupe and compression statistics reported by Veeam do not take ReFS block clone into account. The dedupe number shown by Veeam is the dedupe of blocks against other blocks in that same job, before the stream is sent to disk. So, Veeam says I'm getting 1.8X dedupe, but that is before taking ReFS block clone into account.
Windows doesn't report any info on the dedupe either. The only real way to see it in Windows is to manually check the drive space itself, then check the individual folders and compare. All files in Windows show their full size and do not report a deduped size; they also show the size on disk as the full size, even if the file is 100% deduped.
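For anyone wanting to do that comparison quickly, here's a rough sketch (the repository path is just a placeholder, and the result is only an estimate because the volume's used space includes everything on it, not just the backup files):

```python
import os
import shutil

REPO = r"D:\Backups"  # placeholder; point this at the ReFS repository folder

# Logical size of all backup files, i.e. what Explorer reports per file.
logical_bytes = sum(
    os.path.getsize(os.path.join(root, name))
    for root, _dirs, files in os.walk(REPO)
    for name in files
)

# Space actually consumed on the volume hosting the repository.
used_bytes = shutil.disk_usage(REPO).used

tb = 1000 ** 4
print(f"Logical size of backup files: {logical_bytes / tb:.2f} TB")
print(f"Space used on the volume:     {used_bytes / tb:.2f} TB")
print(f"Estimated block-clone saving: {(logical_bytes - used_bytes) / tb:.2f} TB")
```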
Once we manually calculated the size of the files and compared them to the space used on the disk, we found that block clone is definitely working; otherwise, my backups would already be consuming over 100% of my available disk space. I was unaware of the Blockclone utility until yesterday and ran it after the engineer helped me sort out what was happening. I thought I'd been pretty thorough in researching Veeam and ReFS, but had not heard of the utility until I talked to support. It took quite a long time to run, but was incredibly useful. I found the following:
We've stored ~80 TB of data so far, but have only written about 25 TB to disk. Over 50 TB of data has been deduped using ReFS block clone, so we are only using about a third of the space that would be used without ReFS (honestly, I'm surprised the ratio isn't even more favorable). Several TB of data is referenced 6 times (1 block written, 5 pointers to the block). As our backups continue, that number will get higher and our dedupe ratio should improve further. I expect that by next February (when we reach 1 year of monthly backups in this repository) we will be deduping 80% of our data and only storing ~20% on disk (quite possibly less).
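For reference, the reduction those figures work out to (the one-year projection above is just my estimate):

```python
stored_tb = 80   # logical data backed up so far
written_tb = 25  # data actually written to disk
print(f"Reduction: {stored_tb / written_tb:.1f}:1 "
      f"({1 - written_tb / stored_tb:.0%} of backed-up data not re-written)")
# Reduction: 3.2:1 (69% of backed-up data not re-written)
```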