ReFS Summary...

Post by **SyNtAxx** » Jul 06, 2017 2:11 pm

I've been following the threads on ReFS as best I can. I understand the filesystem should be formatted with 64 KB clusters, but there still appear to be additional issues even when using large clusters.

Can we get a complete summary of the ReFS situation? Is it stable enough to use? Is there a 'fix'?

thanks,

Nick

Jul 06, 2017 9:03 pm

Hi, Nick

Indeed, it seems stable enough for most users on volumes formatted with 64 KB clusters. We can now be sure about that: after nearly two months of deliberately trying to kill multiple ReFS repositories in our labs, with dozens of continuous jobs producing and deleting hundreds of restore points and a specially created Clonezilla tool on top of those, it has been stable for all of our test backup repositories but one. This is all without any private fixes or tweaks, just vanilla Windows Server 2016 with all updates.

Even the repository on which we finally reproduced the issue had been working fine for weeks, until a few days ago. This is good news, as it finally lets us analyze a failing repository and try to understand what is different about it - as well as test some possible workarounds we can implement on our side (we have some ideas on what can potentially help). Of course, our main hope is that Microsoft fixes this on their side - I know they've been working hard investigating this one (I asked them for another update a few days ago).

The issue definitely happens only during retention processing, when backup files are being deleted from disk - some ReFS metadata update operation seems to be "long running" and blocks other I/O to the same volume - this is the essence of the issue.

One recommendation we can give based on our observations so far is to avoid scheduling synthetic fulls too often (or disable them completely), and not to use per-VM chains. Both measures reduce the number of files with cloned blocks that are deleted at once. In fact, one of the workarounds we're testing right now is simply "throttling" backup file deletions by adding a pause after issuing each file deletion command. And I've already heard that the first results are promising.

Thanks!

Post by **kubimike** » Jul 07, 2017 3:08 am

The only way I was able to fix retention processing was with the private fix. Not sure how you got it to work otherwise.

Post by **mkretzer** » Jul 07, 2017 8:11 am

What do I have to do to get the private fix from MS? Is there a ticket number I can give them?

Post by **kubimike** » Jul 07, 2017 11:30 am

Veeam has it. Start a ticket in their Helpdesk.

Post by **mkretzer** » Jul 07, 2017 2:59 pm

@kubimike No. I did just that, and they told me I have to get the patch from MS; they do not have the hotfix. Case 02179620.

Post by **nmdange** » Jul 07, 2017 4:33 pm

One thing I'm curious about... if you enable periodic synthetic fulls and also enable the option "transform previous backup chains into rollbacks" is that transform process accelerated by ReFS? And if so, would that improve the issue with deletes like it does with forward-forever incremental? With the previous chain converted into rollbacks, a single rollback is deleted every day instead of an entire chain once a week.
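To make the difference in deletion pattern concrete, here is a rough sketch. It is a deliberately simplified, hypothetical model rather than Veeam's actual retention logic: the function name `deletions_per_day` and its parameters are made up for illustration, and it assumes a whole chain expires at once when no rollback transform is used.

```python
def deletions_per_day(days, chain_length, transform_to_rollbacks):
    """Return how many backup files retention removes on each day.

    Simplified, hypothetical model (not Veeam's real retention logic):
    - with rollbacks, one file is removed every day;
    - without them, a whole chain is removed at once when it expires.
    """
    if transform_to_rollbacks:
        return [1] * days
    return [
        chain_length if (day + 1) % chain_length == 0 else 0
        for day in range(days)
    ]


if __name__ == "__main__":
    # Two weeks with a weekly chain of 7 files (assumed values).
    print(deletions_per_day(14, 7, transform_to_rollbacks=False))
    print(deletions_per_day(14, 7, transform_to_rollbacks=True))
```

Both schedules remove the same number of files overall; what changes in this model is only the size of the largest single burst of deletions.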

Post by **Gostev** » Jul 07, 2017 10:09 pm

Yes, it is accelerated by ReFS.

Post by **mkretzer** » Jul 08, 2017 6:41 am

@Gostev But it would be even worse for the merge and delete issues, correct? The full backup file will still have to be deleted after the modification, and there is even more load on ReFS.

Post by **Gostev** » Jul 08, 2017 7:27 pm

There are no "merge issues" that I am aware of, and the "delete issue" does not seem to happen on every deletion - but rather only when the job has to delete multiple backup files at once according to retention.

Post by **kubimike** » Jul 08, 2017 8:23 pm

Wrong.

Post by **Gostev** » Jul 08, 2017 10:04 pm

I've certainly been wrong before, but this is what we're seeing in our lab, where the issue reproduces consistently. Even adding a 1 second sleep between individual backup file deletions solves the issue when deleting the same bunch of files. And this is in a clean test (before each test, we roll back the repository server to a snapshot containing the state in which mass backup file deletion reliably reproduces the issue). As the next step, we will validate this workaround with a couple of users our support has been closely engaged with, and I will update.
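As a rough illustration of the idea of pausing between deletions, a minimal sketch might look like the following. This is not Veeam's implementation: the function name `delete_with_throttle` and its parameters are hypothetical, and only the 1 second pause comes from the description above.

```python
import os
import tempfile
import time
from pathlib import Path


def delete_with_throttle(paths, delay_seconds=1.0, sleep=time.sleep):
    """Delete files one at a time, pausing after each deletion.

    Hypothetical helper: the pause is meant to avoid issuing a burst of
    deletions against the same volume all at once.
    """
    deleted = []
    for path in paths:
        Path(path).unlink()      # remove a single file
        deleted.append(str(path))
        sleep(delay_seconds)     # wait before the next deletion
    return deleted


if __name__ == "__main__":
    # Demo on throwaway temp files (the names are made up).
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for i in range(3):
            p = os.path.join(tmp, f"restore_point_{i}.bak")
            Path(p).touch()
            files.append(p)
        print(delete_with_throttle(files, delay_seconds=1.0))
```

The `sleep` argument is injectable only so the behaviour can be checked without actually waiting; in normal use the default `time.sleep` applies.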

Post by **kubimike** » Jul 08, 2017 11:20 pm

In my real-world lab I couldn't delete even one file from a job that contained 5TB VBKs. Mind you, I had my retention set to 200. Every day I had to extend my retention by another few points to prevent a retention cleanup from happening. Thankfully I had plenty of disk space, or else it would have all been over. That all changed last week when I loaded the test ReFS driver. I'm now backpedaling on my job: every day I reduce it by 2 retention points to clean up my disk and get my space back. I thought I was going to lose my backups again, either by running out of space or by having to switch to NTFS. The fix couldn't come soon enough for me.

In case you missed it, this will fill in any questions: veeam-backup-replication-f2/refs-perfor ... ml#p246449

Post by **mkretzer** » Jul 09, 2017 6:34 am

@Gostev: Sure there are merge issues. As soon as there have been deletions on the ReFS volume, the filesystem never recovers from that, and merges are as slow as on NTFS, with the difference that you cannot write much to ReFS during a merge!

I would like to show you the problem in our job statistics; you can contact me directly!

MSMSMSMSMS · Jul 10, 2017 8:29 am

Although the following problem is not directly related to ReFS cluster size, it is still Veeam/ReFS related, so I am mentioning it here. Veeam doesn't have a ReFS equivalent to its NTFS BitLooker technology, so it is backing up dirty ReFS blocks. We are seeing that our Exchange VMs with ReFS volumes have backups that are almost twice the size of the data visible on the file system. E.g. our databases + OS are 8 TB, while our Veeam full backup is 16 TB.

Post by **m.novelli** » Jul 10, 2017 10:39 am

MSMSMSMSMS wrote: Although the following problem is not directly related to ReFS cluster size, it is still Veeam/ReFS related, so I am mentioning it here. Veeam doesn't have a ReFS equivalent to its NTFS BitLooker technology, so it is backing up dirty ReFS blocks. We are seeing that our Exchange VMs with ReFS volumes have backups that are almost twice the size of the data visible on the file system. E.g. our databases + OS are 8 TB, while our Veeam full backup is 16 TB.

Wow

sg_sc · Jul 10, 2017 5:26 pm

Nice to know, but it has nothing to do with ReFS as a backup repository.

ReFS with 64K clusters and enough RAM, without per-VM chains and with no more than weekly synthetic fulls, runs well.

Post by **BGA-Robert** » Jul 10, 2017 5:46 pm

Thanks for the update!

We've been holding off on deploying to ReFS. This would be the back-end repo for our Cloud Connect target. I feel like I'm hearing that it could fail. I understand synthetic fulls and lots of deletes could be bad. That sounds like what we do.

Seems like when this topic comes up, I'm hearing "ReFS is good to go. Except for one more issue..."
I'm confused...

Any recommendation for a service provider's Cloud Connect repo???

Post by **sg_sc** » Jul 10, 2017 6:54 pm

As a service provider, I would suggest sticking with NTFS and maybe testing a few select clients on ReFS.

Post by **ferrus** » Jul 13, 2017 10:08 am

I've been a Veeam user for a couple of years now, on Windows 2012 and NTFS.
The common recommendation for Tier 1 backups in this configuration, from these forums and consultants we dealt with - was a Forever Forward Incremental strategy.
Now ReFS appears to have changed the game significantly, in terms of the footprint of both storage and backup window.

There has been much discussion on the forum about the stability and configuration of ReFS, but I can't find anything in terms of any effect on the recommended Backup Job Configuration.
Previously we avoided synthetic fulls - as a rolling 28 day Forever Forward strategy was much more efficient; but are Synthetic Fulls, or even Active Fulls the better choice now?
(Presuming ReFS IS now the officially recommended option.)

Post by **Gostev** » Jul 13, 2017 11:43 am

You can find backup job configuration recommendations in the first post. Thanks!

Jul 13, 2017 12:35 pm

BGA-Robert wrote: Any recommendation for a service provider's Cloud Connect repo???

I use ReFS for my Cloud Connect repos. There are some limitations though: for example, you can't use quotas in Veeam and still use Fast Cloning to save space (I give customers individual VHDs instead). ReFS doesn't support mount points, so the number of VHDs is limited per server.

I also had issues with the Repo Server keeling over, until I gave it tons of memory.

That said, it is ticking over nicely now, and the Fast Cloning is worth the effort (IMO)

Jul 13, 2017 2:32 pm

Gostev wrote: You can find backup job configuration recommendations in the first post. Thanks!

Didn't find much relevant information there (other than the basics), but there's a lot of interesting discussion in this thread - veeam-backup-replication-f2/refs-perfor ... 92-30.html

If I've understood the thread correctly, with ReFS there's no data-integrity benefit from running synthetic fulls - as they just reference blocks from previous restore points, rather than recreating them.
Forever Forward Incremental remains the most efficient method in terms of least block clone usage, and Active Full backups reset the block clone usage - hopefully returning fast-clone performance.

So my question is - is there still a reason for using synthetic full backups with ReFS? I can't see one from reading that thread.

Perhaps the strategy we're currently on - 28 day Forever Forward Incremental (without Fulls), is actually still the most efficient on ReFS.

Post by **SBarrett847** » Jul 13, 2017 2:34 pm

ferrus wrote: So my question is - is there still a reason for using synthetic full backups with ReFS? I can't see one from reading that thread.

GFS storage space saving is the only reason, really. However, if the backup is unencrypted, NTFS and Dedup might yield similar results.

Post by **SyNtAxx** » Jul 14, 2017 3:05 am

I think I'll be holding off on ReFS until it is stable and a bit more mature.

Post by **Gostev** » Jul 16, 2017 7:20 pm

You are right. We will look into this once ReFS gets better adoption on application servers specifically - right now, its use cases there are pretty limited. I think ReFS will skyrocket once Microsoft enables the ability to boot from ReFS volumes; until then, NTFS will still be the king inside VMs.

Post by **ferrus** » Jul 21, 2017 1:46 pm

I feel as though a statement of Veeam's current position on ReFS could be beneficial.

My gut feeling at the moment is to hold off on ReFS across most of our repositories - apart from one which has its own major performance issues. Currently the only job on that repo is 85% of the way into a 44 hour merge - so even a broken ReFS may be better than the status quo.

Post by **mkretzer** » Jul 21, 2017 2:06 pm

@ferrus: Did you install the newest update? I am interested in whether it restores the good merge speed for you too...

Post by **ferrus** » Jul 21, 2017 2:41 pm

Haven't migrated yet - that's the 2012/NTFS duration.

I'm planning on migrating one Proxy/Repository server over, within the next 1-4 weeks.

Post by **sg_sc** » Aug 14, 2017 5:18 pm

I feel it would be better not to discuss ReFS for repos and ReFS as a filesystem for production VMs next to each other; it will just confuse everyone.
