- acolad
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Mar 08, 2017 6:40 am
- Full Name: Hervé Moser
- Contact:
Deduplication - Best Veeam settings for ZFS (FreeNAS)
Hello community,
It's a great day! Management is about to order the full Availability Suite, as our trial licence expires in about 12 days and our backups have been running smoothly for a couple of weeks. So much more smoothly than with the prior solutions we had, but that's another story... THANK YOU Veeam!
I will start with a TL;DR and give more information about our infrastructure later in this post.
As said, our backup strategy has now more or less matured (backups, backup copies and all that...), but I still have one big open question:
What are the best Veeam settings for deduplicating Veeam backups of VMware VMs as efficiently as possible on a ZFS backup repository?
The background to this question is that I currently see a really ridiculous dedup ratio (1.03:1) for the roughly 10 VMs we back up (for the time being; we will have around 70 to back up once fully in production). We do forward incremental with weekly synthetic fulls, but I naturally first tried to get to a more pleasant dedup ratio by running several "Active Fulls", sadly without reaching the dedup ratio I was expecting.
We had our partner on-site on day 1 to help us tweak Veeam with the best possible settings, and this is what we have so far:
1- Completely disabled Veeam compression (in the jobs and also on the repo, in case we forget to disable it in future jobs), as FreeNAS also compresses the data. Having looked at different posts, I plan to switch to "dedupe friendly" instead.
2- Disabled Veeam's inline data deduplication, as ZFS on FreeNAS does dedup itself. Here I also plan to re-enable Veeam's dedup to take advantage of "source dedup" (less traffic and so on...).
3- Left FreeNAS pretty much at its defaults, i.e. default block size (which means variable block size).
I must say I'm rather sadly surprised that two or three consecutive "Active Fulls" didn't get us a decent dedup ratio. ZFS dedup on FreeNAS is quite efficient in our experience on another FreeNAS box we also use for backup purposes, but there we exclusively use scripts with Robocopy or the like. It seems there's something here we aren't doing right, and I'd gladly welcome any advice you could give us in this regard.
Regarding the details of our infrastructure:
- Primary storage is located on 4 ESXi hosts with SimpliVity hyperconverged storage, roughly 50TB and 1TB RAM, vSphere Enterprise Plus
- The new FreeNAS box we installed a month ago currently holds 12x 10TB NL-SAS disks in two striped RAIDZ2 vdevs of 6 disks each, thus around 70TB of usable space (until we fill the remaining 12 slots with more disks when needed), and 256GB RAM
- All these devices talk to each other via 10Gb optical links and Cisco Nexus top-of-rack switches
I'd like to add that we are well aware of the limitations of ZFS dedup, and particularly of the tremendous amounts of RAM it may need to perform well. The dedup table (DDT) is something we will definitely keep an eye on. It may very well be that the DDT will grow out of control before we have the chance to add more disks, but it's so hard to predict that we really want to give it a try for the time being.
So we're currently looking for advice on the best possible settings (mainly points 1, 2 and 3 above) to get the most out of ZFS dedup. I naturally found some answers regarding Data Domain (we also had one 3 years ago, but not anymore) and ExaGrid, but didn't find any really precise input regarding FreeNAS or even ZFS in general. And I think this info could also benefit other users, as FreeNAS definitely did great here with workloads other than Veeam. So I'm convinced it will also do well with Veeam once we get to the right configuration.
Thanks a lot for reading, and for any input you may have!
As we will soon have our paid Veeam subscription ready, I'll perhaps also open a ticket regarding this question, but I thought asking the community couldn't hurt...
Cheers
Hervé
- DonZoomik
- Service Provider
- Posts: 378
- Liked: 124 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
We used ZFS with dedupe for a while and it was not a good experience. We had 256G RAM for a 120T repository (with some quite big SSDs for metadata caching) and it just performed badly. Initial performance was tolerable, but at one point it dropped off a cliff. We played around with various flags a lot (rebuilding several times), but performance never recovered to a tolerable level. And at this drop-off point, the dedup tables still fit comfortably in memory.
One thing that really killed it was file delete performance with very big files (multi-TB). It essentially kills the ZFS machine for hours, blocking almost all IO.
On the hardware side, it had 16 cores of low-midrange Broadwell and 24*8T SAS disks in 3*8 RAIDZ2. I didn't administer the system, but I was heavily involved in tuning.
IMHO if you want cheap, more-or-less reliable dedupe, get Windows 2016 with NTFS. It has its limitations (4T file limit: anything over that is skipped from processing; 64T file system limit: just create multiple volumes and a SOBR). RAM requirements are much better (0.3G per T minimum documented, and it works reasonably at that), performance is usually better (by virtue of post-processing), and the dedupe ratio is really good (variable block size); however, you need more buffer space (the downside of post-processing). We seriously considered converting the ZFS machine to Windows but settled on compress-only ZFS (with forever-incremental) for various reasons.
- acolad
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Mar 08, 2017 6:40 am
- Full Name: Hervé Moser
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
Thank you for your input, DonZoomik! It's not really reassuring, but at least very informative. I fear the worst now...
We also gave Windows NTFS dedupe a try back with Windows 2012 R2, but at that time we didn't really like the post-processing side of things, given that we were successfully running a FreeNAS box with inline dedupe. In the worst case we'll naturally also have to switch to compression-only ZFS, but the 256GB RAM wasn't purchased with that in mind, if you see what I mean.
The fact is, aside from the backups, this repo should also provide archiving for as long as possible (to be determined by the test results). Dedup would therefore have come in handy.
Is there maybe anyone else running a deduped ZFS repo more or less successfully?
Thanks DonZoomik! Your information is much appreciated.
- DonZoomik
- Service Provider
- Posts: 378
- Liked: 124 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
Some more memories:
At my previous employer, I did the initial Veeam PoC on Windows 2012R2 with NTFS dedupe (on an old SAN). It was considered a great success performance-wise; we could easily saturate FC links during backup windows. Restore was somewhat slower but still good. Single-threaded dedupe processing throughput was an issue, but with reasonable buffer space it was manageable.
Now I'm on iSCSI and 10G Ethernet (multipathing etc.), and we could barely hit 300MB/s with dedupe while performance was still good. I don't remember the details, but performance dropped when the pool was maybe 1/3 full. And it dropped to less than 100MB/s, which was simply unacceptable. Compression-only resulted in saturated 10G links.
Lack of defrag (perhaps exacerbated by forever-incremental merges) has resulted in fragmentation on the ZFS pools over a year, at least I think so. Backup validation and restore have become much slower over time (still acceptable though). The IO profile looks like random with large blocks. Merge has also slowed down and seems read-limited.
We used an old Synology over iSCSI with a WS2016 frontend as temporary space during ZFS rebuilds. It was network-limited (4*1G if I remember correctly) but in practice worked much more predictably performance-wise.
I'd say: create 2 pools. One for archive, where performance doesn't matter, and another for Veeam, possibly with no compression at all. I think Veeam compression could be better (larger blocks). If you have multiple proxies, it could also be more scalable and save network bandwidth, if that is a problem. No facts, just opinions.
- acolad
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Mar 08, 2017 6:40 am
- Full Name: Hervé Moser
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
Thank you, DonZoomik, for your further insights on the matter! We definitely plan to segregate backups from archives to mitigate the possible issues you explained, which might very well arise here too.
For the time being, we're nevertheless in the process of clarifying which Veeam parameters would be best for a satisfactory ZFS deduplication ratio. Isn't there some fact sheet that says whether Veeam dedup is better left on or off, and which compression level should be chosen to make the most of ZFS's inline deduplication?
Thanks for the help and have a pleasant day.
- DonZoomik
- Service Provider
- Posts: 378
- Liked: 124 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
I poked my grey matter a bit more about your questions, and these were the results of our tests:
ZFS deduplication is fixed-block and *very* sensitive to block boundary alignment. Because of the fixed block size, to achieve reasonable dedupe ratios you should limit/lock the ZFS block size to something like 16K or less (ideally 4K). Variable block size makes it even worse, as the same data can be represented as (example, worst case) 4*4K or 1*16K (depending on how it's written), but dedupe doesn't see them as the same. If, for example, a 16K block shifts in the backup data, you could end up with 2 new unique 128K blocks in ZFS. These are just examples of how ZFS will bite you.
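To make the alignment point concrete, here is a minimal Python sketch of fixed-block deduplication. It is a toy model and not ZFS code: the function names, the 4 MiB random test buffer and the 512-byte shift are all assumptions chosen for illustration. It hashes each fixed-size block and checks how many blocks of a second copy already exist in the first.

```python
import hashlib
import os


def block_hashes(data: bytes, block_size: int) -> set:
    """Hash every fixed-size block, the way a fixed-block deduplicator sees the data."""
    return {
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }


def shared_fraction(first: bytes, second: bytes, block_size: int) -> float:
    """Fraction of the second copy's unique blocks that already exist in the first copy."""
    known = block_hashes(first, block_size)
    incoming = block_hashes(second, block_size)
    return len(known & incoming) / len(incoming)


BLOCK = 128 * 1024                      # 128K blocks, as discussed in this thread
original = os.urandom(4 * 1024 * 1024)  # 4 MiB of random data standing in for a backup

identical = original                        # same data, same alignment
byte_shifted = bytes(512) + original        # same data, pushed 512 bytes forward
block_shifted = bytes(BLOCK) + original     # same data, pushed exactly one block forward

print("identical copy:    ", shared_fraction(original, identical, BLOCK))
print("shifted 512 bytes: ", shared_fraction(original, byte_shifted, BLOCK))
print("shifted one block: ", shared_fraction(original, block_shifted, BLOCK))
```

In this model, a copy shifted by a whole block still dedupes almost completely, while a copy shifted by only 512 bytes shares no blocks at all, even though the payload is identical. A smaller block size does not fix a misaligned shift by itself; it only helps when the shift happens to be a multiple of the block size.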
However, lowering the block size limit will blow up the dedupe tables badly; we stuck to 128K in the end just to fit in memory. I think Gostev also recommended a maximum of 16K in some long-forgotten thread. 4K will make inter-VM dedupe a real option (realistically mitigating the hard-alignment problem), but at a huge, unrealistic memory cost.
On the Veeam side, the Veeam dedupe block size doesn't really matter unless you're using ZFS blocks larger than 128K. Last time I checked, 1M blocks were experimental. However, the smallest Veeam dedup block is 256K, so it's usually irrelevant.
In the repository settings, set:
- Align backup file data blocks - enabled - mitigates alignment issues so Veeam can write data in larger, better-aligned blocks
- Decompress backup data blocks before storing - enabled - any compression will mess hard with ZFS dedupe, even RLE
- Use per-VM backup files - enabled - mitigates the file delete blocking problem somewhat
Still, you'll get nowhere close to the dedupe ratio of NTFS or appliances unless you heavily limit ZFS block sizes (thus blowing up memory requirements).
- DonZoomik
- Service Provider
- Posts: 378
- Liked: 124 times
- Joined: Nov 25, 2016 1:56 pm
- Full Name: Mihkel Soomere
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
DonZoomik wrote: Still, you'll get nowhere close to dedupe ratio of NTFS or appliances unless you heavily limit ZFS block sizes (thus blowing up memory requirements).
I looked up some old blog posts for reference.
Dedupe efficiency at different block sizes: https://trae.sk/view/26/ https://trae.sk/view/33/
However, at an example block size of 2K, you'd need ~11TB(!) of memory for a 70TB pool (320B per 2K block). At 128K it'd be 175G, but the dedupe ratio will be quite bad.
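The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough worst-case estimate under stated assumptions: it uses the 320 bytes per block quoted above as a rule of thumb, assumes every block in the pool is unique, and counts in binary units (TiB/GiB). The function and constant names are made up for the example.

```python
DDT_ENTRY_BYTES = 320  # per-block figure quoted in this thread; a rule of thumb, not a measurement

TIB = 1024 ** 4
GIB = 1024 ** 3


def ddt_bytes(pool_bytes: int, block_bytes: int, entry_bytes: int = DDT_ENTRY_BYTES) -> int:
    """Worst case: every block is unique, so each block needs its own dedup table entry."""
    return (pool_bytes // block_bytes) * entry_bytes


pool = 70 * TIB  # the 70TB pool from this thread

for block_kib in (2, 4, 16, 128):
    size = ddt_bytes(pool, block_kib * 1024)
    print(f"{block_kib:>4}K blocks: {size / GIB:>8,.0f} GiB of dedup table")
```

With these assumptions the 128K case comes out at 175 GiB and the 2K case at roughly 11 TiB, matching the numbers in the post. Each halving of the block size doubles the table, which is why 16K or 4K is unrealistic on a 256GB box unless the pool holds far less unique data.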
Veeam with NTFS dedupe: https://www.craigrodgers.co.uk/index.ph ... on-part-3/
I remember similar results with our 2012R2 PoC, which for an irrelevant reason had 3*16T pools (we used 64K uncompressed; make sure to look at all the semi-broken PowerBI graphs). The NTFS deduplication engine is quite good, even with compressed data (see the "optimized" compression results), and you can turn some knobs to make it process data more aggressively.
I talked to some fellow engineers who also worked on this project and the consensus was: don't try it, you're very likely to hit serious problems, if not immediately then a few weeks or months in. A fellow engineer reminded me that we also had one integrity problem where we needed to force-mount the volume just to evacuate data (I don't remember the underlying issue), and again pointed at the blocking I wrote about previously (deleting a 4T file would almost completely block the filesystem for 4-5 hours). It might have been related to our setup (Debian 9 with ZoL 0.7.something), but the filesystem is the same overall, so I doubt it.
Again, we might have been stupid or poking the wrong buttons, and some consultant might know how to work with it. But from my experience, for Veeam I strongly recommend avoiding ZFS dedup (compression-only is tolerable), or going with an appliance (StoreOnce/Data Domain...) or NTFS.
We do still use dedupe successfully, but for completely different applications (almost static archive pools with near-zero performance requirements, just as you're planning).
- acolad
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Mar 08, 2017 6:40 am
- Full Name: Hervé Moser
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
Sorry for the late response. We're currently super busy with different projects going on. You know what it's like...
Thanks a lot for the extensive research and the interesting links you posted, DonZoomik! I completely understand the point you discussed with your fellow engineers and can only agree. The big game-changing piece of information for me is that ZFS deduplication works on fixed-length blocks; I really wasn't aware of that. I had always thought it was variable, but your links confirm what you say.
In the meantime, we have configured Veeam exactly the way you described in your second-to-last post. Now I'm waiting for some "Fulls" (active or incremental) to see if that finally makes any difference, but I doubt it. The "issue" lies elsewhere, as you described very clearly.
Anyway, we do plan to have our "landing zone" (for Veeam backups) on a dataset without deduplication and will see what happens with the "archive zone" (for Backup Copy jobs) being deduped. I think I will try changing the ZFS record size just to get familiar with the consequences, but we'll plan our strategy according to your various pointers.
What's certain is that we now have a much more precise overview of the Veeam options to use in our setup! Thank you very much for all the much-appreciated help on this matter! I'll keep this thread updated with our findings in a couple of weeks or months. Now it's almost holiday time, and that will do some good too!!!
Thanks, y'all!!!
- capitangiaco
- Novice
- Posts: 4
- Liked: 1 time
- Joined: Jan 29, 2013 7:16 am
- Full Name: Giacomo Marconi
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
I needed a long-term repo (third copy). I started with FreeNAS but shifted to Linux + ZFS using only ZFS compression.
ZFS dedup destroys performance (in my case speed dropped from 300MB/s to 10MB/s). The worst pain comes when you try to delete big deduplicated backups: the system becomes stuck for a long time.
And, in my case, while Veeam can reach a deduplication factor of around 3x, ZFS dedup only reaches 1.02x.
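To put those ratios in terms of disk space, a small Python sketch (the function name is made up for the example): a dedup ratio of r means the stored data is 1/r of the logical data, so the fraction saved is 1 - 1/r.

```python
def space_saved(ratio: float) -> float:
    """Fraction of the logical data that does not need to be stored at a given dedup ratio."""
    if ratio < 1.0:
        raise ValueError("a dedup ratio cannot be below 1.0")
    return 1.0 - 1.0 / ratio


# Ratios mentioned in this thread: 1.02x and 1.03:1 for ZFS dedup, around 3x for Veeam.
for ratio in (1.02, 1.03, 3.0):
    print(f"{ratio}x saves {space_saved(ratio):.1%} of the space")
```

So the 1.02x and 1.03:1 ratios reported here save only about 2 to 3 percent of the space, while a 3x ratio saves about two thirds.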
PS: I shifted to Linux because the Veeam data mover doesn't like Samba.
- acolad
- Novice
- Posts: 8
- Liked: 1 time
- Joined: Mar 08, 2017 6:40 am
- Full Name: Hervé Moser
- Contact:
Re: Deduplication - Best Veeam settings for ZFS (FreeNAS)
I can only second capitangiaco's point about ZFS dedupe completely destroying performance. I don't know if this should be attributed solely to FreeNAS or to ZFS in general, but it has gradually come to the point where I'm wondering why this option even exists and is still maintained. We disabled dedupe months ago when our box was almost crawling, not even able to handle a single backup stream! And the huge amount of RAM that everyone claims you need for ZFS dedupe to work properly certainly wasn't the culprit here. We have 256GB RAM in our FreeNAS box, and our dedupe table wasn't even 60GB when we nearly had to destroy the volume and start from scratch with dedupe OFF. What a pity!