ReFS Summary...

by SyNtAxx » Thu Jul 06, 2017 2:11 pm

I've been following the threads on ReFS as best I can. I understand the filesystem should be formatted with 64 KB clusters, but there appeared to be additional issues even when using large clusters.

Can we get a complete summary of the ReFS situation? Is it stable enough to use? Is there a 'fix'?

thanks,

Nick
SyNtAxx
Expert
 
Posts: 127
Liked: 13 times
Joined: Fri Jan 02, 2015 7:12 pm

Re: ReFS Summary...

by Gostev » Thu Jul 06, 2017 9:03 pm (1 person likes this post)

Hi, Nick

Indeed, it seems stable enough for most users with 64KB volumes. Now we can be sure about that, because after nearly two months of purposely killing multiple ReFS repositories in our labs with dozens of continuous jobs producing and deleting hundreds of restore points, and a specially created Clonezilla tool on top of those, it's been stable for all of our test backup repositories but one. This is all without any private fixes or tweaks, just vanilla Windows Server 2016 with all updates.

Even the repository on which we finally reproduced the issue had been working fine for weeks, until a few days ago. This is good news, as it finally lets us analyze one and try to understand what is different about this particular repository - as well as test some possible workarounds we can implement on our side (we have some ideas on what could help). Of course, our main hope is that Microsoft fixes this on their side - I know they've been working hard investigating this one (I asked them for another update a few days ago).

The issue for sure happens only during the retention processing, when backup files are being deleted from the disk - some ReFS metadata update operation seems to be "long running" and preventing other I/O to the same volume - this is the essence of the issue.

One recommendation we can give based on our observations so far is to avoid scheduling synthetic fulls too often (or disable them completely), and not to use per-VM chains. Both measures reduce the number of files with cloned blocks that are deleted at once. In fact, one of the workarounds we're testing right now is simply "throttling" backup file deletions by adding a timeout after issuing each file deletion command. And I've already heard that the first results were promising.

Thanks!
Gostev
Veeam Software
 
Posts: 21364
Liked: 2336 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: ReFS Summary...

by kubimike » Fri Jul 07, 2017 3:08 am

The only way I was able to fix retention processing was with the private fix. Not sure how you got it to work otherwise.
kubimike
Expert
 
Posts: 229
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

by mkretzer » Fri Jul 07, 2017 8:11 am

What do I have to do to get the private fix from MS? Is there a ticket number I can give them?
mkretzer
Expert
 
Posts: 304
Liked: 67 times
Joined: Thu Dec 17, 2015 7:17 am

Re: ReFS Summary...

by kubimike » Fri Jul 07, 2017 11:30 am

Veeam has it. Start a ticket in their Helpdesk.
kubimike
Expert
 
Posts: 229
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

by mkretzer » Fri Jul 07, 2017 2:59 pm

@kubimike No. I did just that, and they told me I have to get the patch from MS; they do not have the hotfix. Case 02179620.
mkretzer
Expert
 
Posts: 304
Liked: 67 times
Joined: Thu Dec 17, 2015 7:17 am

Re: ReFS Summary...

by nmdange » Fri Jul 07, 2017 4:33 pm

One thing I'm curious about... if you enable periodic synthetic fulls and also enable the option "transform previous backup chains into rollbacks" is that transform process accelerated by ReFS? And if so, would that improve the issue with deletes like it does with forward-forever incremental? With the previous chain converted into rollbacks, a single rollback is deleted every day instead of an entire chain once a week.
nmdange
Expert
 
Posts: 180
Liked: 53 times
Joined: Thu Aug 20, 2015 9:30 pm

Re: ReFS Summary...

by Gostev » Fri Jul 07, 2017 10:09 pm

Yes, it is accelerated by ReFS.
Gostev
Veeam Software
 
Posts: 21364
Liked: 2336 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: ReFS Summary...

by mkretzer » Sat Jul 08, 2017 6:41 am

@Gostev But it would be even worse for the merge and delete issues, correct? The full backup file will still have to be deleted after the modification, and there is even more load on the ReFS volume.
mkretzer
Expert
 
Posts: 304
Liked: 67 times
Joined: Thu Dec 17, 2015 7:17 am

Re: ReFS Summary...

by Gostev » Sat Jul 08, 2017 7:27 pm

There are no "merge issues" that I am aware of, and the "delete issue" does not seem to happen on just any deletion - but rather only when the job has to delete multiple backup files at once according to retention.
Gostev
Veeam Software
 
Posts: 21364
Liked: 2336 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: ReFS Summary...

by kubimike » Sat Jul 08, 2017 8:23 pm

Wrong.
kubimike
Expert
 
Posts: 229
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

by Gostev » Sat Jul 08, 2017 10:04 pm

I've certainly been wrong before, but this is what we're seeing in our lab, where the issue reproduces consistently. Even adding a 1-second sleep between individual backup file deletions solves the issue when deleting the same bunch of files. And this is in a clean test (before each test, we roll back the repository server to a snapshot containing the state in which mass backup file deletion reliably reproduces the issue). As the next step, we will validate this workaround with a couple of users our support has been closely engaged with, and I will update.
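
For readers who want to picture the throttling idea described above, here is a minimal Python sketch. It is not Veeam's actual code: the function name `delete_throttled`, its parameters, and the injectable `sleep` callable are all hypothetical and chosen for illustration only. It simply deletes files one at a time and waits between deletions instead of issuing them back to back.

```python
import os
import time
from typing import Callable, Iterable, List


def delete_throttled(
    paths: Iterable[str],
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Delete files one by one, pausing between consecutive deletions.

    Hypothetical helper: the pause gives the filesystem time to finish
    its metadata updates for one file before the next deletion arrives.
    """
    deleted = []
    for index, path in enumerate(paths):
        if index > 0:
            # Wait before every deletion except the first one.
            sleep(pause_seconds)
        os.remove(path)
        deleted.append(path)
    return deleted
```

The `sleep` argument exists only so the pause can be replaced in a test; in normal use the default `time.sleep` applies and the 1.0 second default mirrors the value mentioned in the post.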
Gostev
Veeam Software
 
Posts: 21364
Liked: 2336 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: ReFS Summary...

by kubimike » Sat Jul 08, 2017 11:20 pm

In my real-world lab I couldn't delete a single file from a job that contained 5 TB VBKs. Mind you, I had my retention set to 200. Every day I would have to extend my retention period by another few points to prevent a retention cleanup from happening. Thankfully I had plenty of disk space, or else it would have all been over. That all changed last week when I loaded the test ReFS driver. I'm now backpedaling my job: every day I reduce it by 2 retention points to clean up my disk and get my space back. I thought I was going to lose my backups again, either by running out of space or by having to switch to NTFS. The fix couldn't come soon enough for me.

In case you missed it, this will fill in any questions: https://forums.veeam.com/veeam-backup-replication-f2/refs-performance-issue-workaround-t43892.html#p246449
kubimike
Expert
 
Posts: 229
Liked: 22 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

by mkretzer » Sun Jul 09, 2017 6:34 am

@Gostev: Sure there are merge issues. As soon as there have been deletions on the ReFS volume, the filesystem never recovers, and merges are as slow as on NTFS, with the difference that you cannot write much to a ReFS volume during a merge!

I would like to show you the problem in our job statistics; you can contact me directly!
mkretzer
Expert
 
Posts: 304
Liked: 67 times
Joined: Thu Dec 17, 2015 7:17 am

Re: ReFS Summary...

by MSMSMSMSMS » Mon Jul 10, 2017 8:29 am (3 people like this post)

Although the following problem is not directly related to ReFS cluster size, it is still Veeam/ReFS related, so I am mentioning it here. Veeam doesn't have a ReFS equivalent of its NTFS BitLooker technology, so it backs up dirty ReFS blocks. We are seeing that our Exchange VMs with ReFS volumes have backups that are almost twice the size of the data visible on the file system. E.g. our databases + OS total 8 TB, while our Veeam full backup is 16 TB.
MSMSMSMSMS
Novice
 
Posts: 5
Liked: 3 times
Joined: Tue Mar 28, 2017 9:14 am
