REFS performance issue "workaround"

Re: REFS performance issue "workaround"

by kubimike » Fri Jun 30, 2017 2:06 pm

@tom
Thanks for the explanation. I just see a problem with that logic, and here's why:
But that's entirely what block clone is, each VBK is sharing any unchanged blocks from the prior VBK, when you delete the older VBK, that just means the blocks aren't shared anymore.

I create a new backup job set for 99 restore points. It creates its first VBK; let's call this file BACKUP1.VBK. Time warp into the future: I am now up to 99 restore points; let's call this file BACKUP99.VBK. Are you saying file BACKUP99.VBK still has references to file BACKUP1.VBK? If that's true, it could never be deleted.

BACKUP1.VBK has blocks A1 B2 C3 D4. For simplicity, we can say that BACKUP99.VBK has A2 B3 C4 D4, so the only common block between BACKUP1 and BACKUP99 is D4. If BACKUP1.VBK was deleted due to retention, where would BACKUP99.VBK reference block D4?
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS performance issue "workaround"

by tsightler » Fri Jun 30, 2017 2:45 pm

kubimike wrote: I create a new backup job set for 99 restore points. It creates its first VBK; let's call this file BACKUP1.VBK. Time warp into the future: I am now up to 99 restore points; let's call this file BACKUP99.VBK. Are you saying file BACKUP99.VBK still has references to file BACKUP1.VBK? If that's true, it could never be deleted.

BACKUP1.VBK has blocks A1 B2 C3 D4. For simplicity, we can say that BACKUP99.VBK has A2 B3 C4 D4, so the only common block between BACKUP1 and BACKUP99 is D4. If BACKUP1.VBK was deleted due to retention, where would BACKUP99.VBK reference block D4?

ReFS tracks which specific cloned blocks are referenced, down to the cluster size (64K under Veeam's current best practice for ReFS). If BACKUP1.VBK and BACKUP99.VBK each have block D4, then block D4 would be referenced by every single backup from 1 to 99. When you delete BACKUP1.VBK, ReFS uses the file metadata to determine which blocks are referenced by that file and only frees (deletes) blocks that have no more references in any other file. If you delete BACKUP1.VBK, only blocks that are totally unique to that file will actually be freed, but ReFS has to update the reference count for every block that is still used by any other backup file.

Perhaps an even simpler example:

BACKUP1 - A1 B1 C1 D1

Now I use a command line tool to block clone that file to a file called BACKUP2, so BACKUP2 is this:

BACKUP2 - A1 B1 C1 D1

In other words, both BACKUP1 and BACKUP2 reference the exact same blocks on disk; ReFS simply keeps track that each block is referenced more than once. That's where the space savings for synthetic fulls come from: the files share the same blocks with each other. Otherwise ReFS would have no space savings benefit for synthetic fulls over any other filesystem.

But that doesn't keep me from deleting the file BACKUP1. If I delete BACKUP1, it will free up exactly zero disk space, because BACKUP2 is still referencing all of those blocks, so ReFS could not free them. Deleting the file simply removes the file from the directory, and reduces the reference count for each block from 2 to 1 (previously 2 files referenced each block now only 1 file does). If you then also delete BACKUP2, the reference count for those blocks drops to 0, which means no files on the system are referencing them, so ReFS can free that space.
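
The BACKUP1/BACKUP2 example can be sketched as a toy reference-counting model. This is only an illustration of the idea, not how ReFS is actually implemented: the class `ToyVolume` and its methods are hypothetical names made up for this sketch, and each "block" is just a string label.

```python
from collections import Counter


class ToyVolume:
    """Toy model of a block-cloning filesystem (hypothetical, not real ReFS)."""

    def __init__(self):
        self.files = {}            # file name -> list of block ids
        self.refcount = Counter()  # block id -> number of files referencing it

    def _add_file(self, name, blocks):
        # Register the file and bump the reference count of each block it uses.
        self.files[name] = list(blocks)
        for block in blocks:
            self.refcount[block] += 1

    def write(self, name, blocks):
        """Create a file out of the given blocks."""
        self._add_file(name, blocks)

    def clone(self, src, dst):
        """Block clone: dst references the same blocks as src, nothing is copied."""
        self._add_file(dst, self.files[src])

    def delete(self, name):
        """Remove a file and return the blocks that could actually be freed."""
        freed = []
        for block in self.files.pop(name):
            self.refcount[block] -= 1      # every shared block needs an update
            if self.refcount[block] == 0:  # no file references it any more
                del self.refcount[block]
                freed.append(block)
        return freed


vol = ToyVolume()
vol.write("BACKUP1", ["A1", "B1", "C1", "D1"])
vol.clone("BACKUP1", "BACKUP2")

freed_first = vol.delete("BACKUP1")   # BACKUP2 still references every block
freed_second = vol.delete("BACKUP2")  # last reference gone, blocks are freed
print(freed_first, freed_second)
```

Note that `delete` has to touch the counter of every block in the file even when nothing is freed, which is the metadata work being discussed in this thread.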

In simple terms, just because I use block clone to share data between backup files, doesn't mean I can't delete any file at any time, but when I do, ReFS has to update the reference count for every single block that is shared with other files. Actually, it's technically at the cluster level, so that 5TB file would require updating potentially 80,000,000 clusters worth of metadata when using 64K clusters.
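
For anyone who wants to check the cluster arithmetic, here is a quick sketch. It assumes binary units (5 TiB and 64 KiB) and a file whose clusters are all shared; the 4 KiB line is added only for comparison and is not a figure from this thread.

```python
TIB = 2**40
KIB = 2**10


def clusters_in_file(file_bytes, cluster_bytes):
    """Number of clusters needed to hold file_bytes (rounded up)."""
    return -(-file_bytes // cluster_bytes)  # ceiling division


clusters_64k = clusters_in_file(5 * TIB, 64 * KIB)
clusters_4k = clusters_in_file(5 * TIB, 4 * KIB)

print(f"64K clusters: {clusters_64k:,}")  # 83,886,080, i.e. roughly 80 million
print(f"4K clusters:  {clusters_4k:,}")   # 1,342,177,280
```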
tsightler
Veeam Software
 
Posts: 4841
Liked: 1787 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: REFS performance issue "workaround"

by kubimike » Fri Jun 30, 2017 2:54 pm

YESSSSSSSSSSSS, we are in total alignment now. A+ Now, how to get all of this to work under the conditions I need :)
Now that I have my backup copies jiving, I can test out that Microsoft refs.sys and play with 'RefsProcessedDeleteQueueEntryCountThreshold' + 'RefsDisableCachedpins' (not sure on this one). Are you using this driver during your testing?

Re: REFS performance issue "workaround"

by jazzoberoi » Sun Jul 02, 2017 11:42 pm

tsightler wrote: ReFS drastically changes the equation, since we don't have to move blocks, so the time is highly reduced, but it also reduces the benefit, since the new "chain" is still 100% dependent on blocks from the prior chain as it's sharing all of those blocks. This provides effectively no benefit from an integrity perspective, because the VBK you create today is still dependent on blocks from VBK files you created weeks/months ago.


Hi Tsightler,

Is reverse incremental a better option, then, since ReFS no longer takes such a bad hit on merging?
jazzoberoi
Enthusiast
 
Posts: 64
Liked: 13 times
Joined: Wed Oct 08, 2014 9:07 am
Full Name: Jazz Oberoi

Re: REFS performance issue "workaround"

by ian0x0r » Mon Jul 03, 2017 8:19 am

Not that you would, but I assume that manually deleting VBK files via Windows Explorer, in this example, would also keep track of the block clones?

Ian
https://www.snurf.co.uk
ian0x0r
Veeam Vanguard
 
Posts: 189
Liked: 32 times
Joined: Thu Nov 11, 2010 11:53 am
Location: UK
Full Name: Ian Sanderson

Re: REFS performance issue "workaround"

by tdewin » Mon Jul 03, 2017 9:28 am

Once the block clone call is made, the filesystem is in control. So yes, if you delete a file, it is ReFS's responsibility to track block usage across the different files.
tdewin
Veeam Software
 
Posts: 1091
Liked: 380 times
Joined: Fri Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin

Re: REFS performance issue "workaround"

by kubimike » Mon Jul 03, 2017 1:55 pm

Here is something I just thought of. Using the BACKUP1 - BACKUP99 example again, let's say it took 3 months to get to BACKUP99, with BACKUP1 being the active full and everything up to BACKUP99 being incrementals and synthetic fulls. Given that a majority of the blocks reference BACKUP1, what happens when another active full starts, let's say with BACKUP100? Going forward, will the filesystem map any new backups, i.e. BACKUP101 - BACKUP1xx, to the new active full created at BACKUP100?

Re: REFS performance issue "workaround"

by tsightler » Mon Jul 03, 2017 3:19 pm

kubimike wrote: Now that I have my backup copies jiving, I can test out that Microsoft refs.sys and play with 'RefsProcessedDeleteQueueEntryCountThreshold' + 'RefsDisableCachedpins' (not sure on this one). Are you using this driver during your testing?


My testing has been focused almost exclusively on consistently reproducing the issue on standard Windows without any private hotfixes. Only once we can consistently reproduce the issue can we truly test whether the available fixes, and which specific settings, actually help. Unfortunately, this has been a far more elusive goal than one would hope.

Re: REFS performance issue "workaround"

by tsightler » Mon Jul 03, 2017 3:41 pm

jazzoberoi wrote: Is reverse incremental a better option, then, since ReFS no longer takes such a bad hit on merging?


I've done very limited testing with reverse incremental but, at least in theory, I would expect it to be better, simply because it only deletes a single set of restore points daily, and those points are small. To this point, the single most reliable mode for ReFS seems to be forever forward incremental, with regular maintenance scheduled. Forever forward makes the minimum use of block clone (only for accelerating the merge of the oldest VIB into the VBK), and thus has the minimum amount of data that has to be persistently tracked by the ReFS filesystem.

Re: REFS performance issue "workaround"

by tsightler » Mon Jul 03, 2017 3:44 pm

kubimike wrote: Here is something I just thought of. Using the BACKUP1 - BACKUP99 example again, let's say it took 3 months to get to BACKUP99, with BACKUP1 being the active full and everything up to BACKUP99 being incrementals and synthetic fulls. Given that a majority of the blocks reference BACKUP1, what happens when another active full starts, let's say with BACKUP100? Going forward, will the filesystem map any new backups, i.e. BACKUP101 - BACKUP1xx, to the new active full created at BACKUP100?


Since the active full would contain new copies of every block, future synthetics would reference blocks in the new active full. While it may seem obvious, I want to make sure to note that running an active full requires the additional space to store these new fulls since they will not share any blocks with the prior backups on disk.
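
To make the space implication concrete, here is a rough sketch that counts distinct on-disk blocks before and after an active full. The four-block files, the `(origin_file, offset)` block ids and the helper `physical_blocks` are all made up for illustration; real backup files are of course far larger and the actual on-disk layout is not modelled.

```python
def physical_blocks(files):
    """Count distinct on-disk blocks across all files (shared blocks count once)."""
    return len({block for blocks in files.values() for block in blocks})


# A block id is (file that first wrote it, offset within that file).
old_full = [("BACKUP1", i) for i in range(4)]
old_synthetic = old_full[:3] + [("BACKUP2", 3)]      # shares 3 blocks, 1 changed

chain = {"BACKUP1": old_full, "BACKUP2": old_synthetic}
before = physical_blocks(chain)                      # 4 + 1 new block

# An active full writes fresh copies of every block, so nothing is shared.
active_full = [("BACKUP100", i) for i in range(4)]
chain["BACKUP100"] = active_full
after_active = physical_blocks(chain)                # grows by a whole full

# The next synthetic full clones from the new active full instead.
chain["BACKUP101"] = active_full[:3] + [("BACKUP101", 3)]
after_synthetic = physical_blocks(chain)             # grows by 1 changed block

print(before, after_active, after_synthetic)
```

In this toy chain the active full adds four blocks while the synthetic full after it adds only one, which is the extra space requirement mentioned above.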

Re: REFS performance issue "workaround"

by DaveWatkins » Mon Jul 03, 2017 5:14 pm

This is something I've wondered about a few times. It's good to know an active full doesn't use block cloning at all.
DaveWatkins
Expert
 
Posts: 271
Liked: 68 times
Joined: Sun Dec 13, 2015 11:33 pm

Re: REFS performance issue "workaround"

by dellock6 » Tue Jul 11, 2017 8:00 am

Back from vacation. Thanks to everyone who posted their repository setup, keep them coming!

Thanks,
Luca
Luca Dell'Oca
EMEA Cloud Architect @ Veeam Software

@dellock6
http://www.virtualtothecore.com
vExpert 2011-2012-2013-2014-2015-2016
Veeam VMCE #1
dellock6
Veeam Software
 
Posts: 5117
Liked: 1360 times
Joined: Sun Jul 26, 2009 3:39 pm
Location: Varese, Italy
Full Name: Luca Dell'Oca

Re: REFS performance issue "workaround"

by Delo123 » Tue Jul 11, 2017 8:26 am

We do synthetic fulls every week and active fulls every month; maybe that's why we have not seen any performance issues until now, not even when we delete files. Currently 180TB is allocated on the ReFS repository.
Delo123
Expert
 
Posts: 351
Liked: 101 times
Joined: Fri Dec 28, 2012 5:20 pm
Full Name: Guido Meijers

Re: REFS performance issue "workaround"

by mkretzer » Tue Jul 11, 2017 8:39 am

@Delo123
Very interesting. Do you use per-VM? What is your biggest backup file?
I find it interesting that such a big repo does not show the same issues we have with a similarly big repo...

Especially with your active fulls: do you have merges running at the same time as the active fulls? In our case that led to active fulls running at single-digit MB/s values.
mkretzer
Expert
 
Posts: 337
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS performance issue "workaround"

by Delo123 » Tue Jul 11, 2017 9:52 am

Yes, per-VM jobs. The biggest VBK is 6.8TB. I scheduled everything so active fulls and synthetic fulls never run on the same day. Then again, we only have 250 VMs and thus only 8 jobs. Speeds vary; especially with incremental runs without much changed data, throughput is average, but always over 200MB/s.
Yesterday I did see a "Backup files health check has been completed" with a red x and a duration of 3 hours on one of the big backup files, with the error being "ChannelError: ConnectionReset"; I'm currently investigating what caused that... This morning, however, the same backup file health check completed in 6 hours.
Delo123
