SyNtAxx
Expert
Posts: 149
Liked: 15 times
Joined: Jan 02, 2015 7:12 pm

ReFS Summary...

Post by SyNtAxx »

I've been following the threads on ReFS as best I can. I understand the filesystem should be formatted with 64 KB clusters, but there still appear to be additional issues even when using large clusters.

Can we get a complete summary of the ReFS situation? Is it stable enough to use? Is there a 'fix'?

thanks,

Nick
Gostev
Chief Product Officer
Posts: 32737
Liked: 7958 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Summary...

Post by Gostev » 1 person likes this post

Hi, Nick

Indeed, it seems stable enough for most users with 64KB volumes. Now we can be sure about that, because after nearly two months of purposely killing multiple ReFS repositories in our labs with dozens of continuous jobs producing and deleting hundreds of restore points, and a specially created Clonezilla tool on top of those, it's been stable for all of our test backup repositories but one. This is all without any private fixes or tweaks, just vanilla Windows Server 2016 with all updates.

Even the repository for which we've finally reproduced the issue had been working fine for weeks, until a few days ago. This is good news, as it finally lets us analyze one and try to understand what is different about this particular repository, as well as test some possible workarounds we can implement from our side (we have some ideas on what can potentially help). Of course, our main hope is that Microsoft fixes this on their side. I know they've been working hard investigating this one (I asked them for another update a few days ago).

The issue definitely happens only during retention processing, when backup files are being deleted from disk: some ReFS metadata update operation seems to be "long running" and blocks other I/O to the same volume. This is the essence of the issue.

One recommendation we can give based on our observations so far is to avoid scheduling synthetic fulls too often (or disable them completely), and not to use per-VM chains. Both measures reduce the number of files with cloned blocks that are deleted at once. In fact, one of the workarounds we're testing right now is simply "throttling" backup file deletions by adding a timeout after issuing each file deletion command. And I've already heard that the first results were promising.

Thanks!
kubimike
Veteran
Posts: 395
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

Post by kubimike »

The only way I was able to fix retention processing was with the private fix. Not sure how you got it to work otherwise.
mkretzer
Veeam Legend
Posts: 1289
Liked: 464 times
Joined: Dec 17, 2015 7:17 am

Re: ReFS Summary...

Post by mkretzer »

What do I have to do to get the private fix from MS? Is there a ticket number I can give them?
kubimike
Veteran
Posts: 395
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

Post by kubimike »

Veeam has it. Start a ticket in their Helpdesk.
mkretzer
Veeam Legend
Posts: 1289
Liked: 464 times
Joined: Dec 17, 2015 7:17 am

Re: ReFS Summary...

Post by mkretzer »

@kubimike No. I did just that and they told me I have to get the patch from MS, as they do not have the hotfix. Case 02179620.
nmdange
Veteran
Posts: 536
Liked: 149 times
Joined: Aug 20, 2015 9:30 pm

Re: ReFS Summary...

Post by nmdange »

One thing I'm curious about... if you enable periodic synthetic fulls and also enable the option "transform previous backup chains into rollbacks" is that transform process accelerated by ReFS? And if so, would that improve the issue with deletes like it does with forward-forever incremental? With the previous chain converted into rollbacks, a single rollback is deleted every day instead of an entire chain once a week.
Gostev
Chief Product Officer
Posts: 32737
Liked: 7958 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Summary...

Post by Gostev »

Yes, it is accelerated by ReFS.
mkretzer
Veeam Legend
Posts: 1289
Liked: 464 times
Joined: Dec 17, 2015 7:17 am

Re: ReFS Summary...

Post by mkretzer »

@Gostev But it would be even worse for the merge and delete issues, correct? The full backup file will still have to be deleted after the modification, and there is even more load on ReFS.
Gostev
Chief Product Officer
Posts: 32737
Liked: 7958 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Summary...

Post by Gostev »

There are no "merge issues" that I am aware of, and the "delete issue" does not seem to be triggered by just any deletion, but rather only when the job has to delete multiple backup files at once according to retention.
kubimike
Veteran
Posts: 395
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

Post by kubimike »

Wrong.
Gostev
Chief Product Officer
Posts: 32737
Liked: 7958 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Summary...

Post by Gostev »

I've certainly been wrong before, but this is what we're seeing in our lab, where the issue reproduces consistently. Even adding a 1-second sleep between individual backup file deletions solves the issue when deleting the same bunch of files. And this is in a clean test (before each test, we roll back the repository server to a snapshot containing the state in which mass backup file deletion reliably reproduces the issue). As the next step, we will validate this workaround with a couple of users our support has been closely engaged with, and I will update.
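
For readers who want to picture the throttling workaround described above, here is a minimal sketch in Python. It is only an illustration, not Veeam's actual implementation: the function name `delete_throttled`, its parameters, and the injectable `sleep` callback are all assumptions made for the example. Only the idea of pausing (for example, 1 second) after each file deletion comes from the discussion.

```python
import os
import time
from typing import Callable, Iterable, List


def delete_throttled(
    paths: Iterable[str],
    pause_seconds: float = 1.0,                  # assumed default, mirrors the 1-second sleep mentioned above
    sleep: Callable[[float], None] = time.sleep,  # injectable so the pause can be stubbed out in tests
) -> List[str]:
    """Delete files one at a time, pausing after each deletion.

    Hypothetical helper: instead of issuing all deletions back to back,
    it gives the filesystem some time after every single delete.
    """
    deleted: List[str] = []
    for path in paths:
        os.remove(path)        # issue one deletion
        deleted.append(path)
        sleep(pause_seconds)   # throttle before the next deletion
    return deleted
```

The pause length is the only tuning knob here; whether 1 second is enough on a given repository is something only testing on that repository could show.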
kubimike
Veteran
Posts: 395
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: ReFS Summary...

Post by kubimike »

In my real-world lab I couldn't delete a single file from a job that contained 5 TB VBKs. Mind you, I had my retention set to 200. Every day I would have to extend my retention period by another few points to prevent a retention cleanup from happening. Thankfully I had plenty of disk space, or else it would have all been over. That all changed last week when I loaded that test ReFS driver. I'm now backpedaling my job: every day I reduce it by 2 retention points to clean up my disk and get my space back. I thought I was going to lose my backups again, either by running out of space or by having to switch to NTFS. The fix couldn't come soon enough for me.

In case you missed it, this will fill in any questions: veeam-backup-replication-f2/refs-perfor ... ml#p246449
mkretzer
Veeam Legend
Posts: 1289
Liked: 464 times
Joined: Dec 17, 2015 7:17 am

Re: ReFS Summary...

Post by mkretzer »

@Gostev: Sure there are merge issues. As soon as there have been deletions on ReFS, the filesystem never recovers from that, and merges are as slow as on NTFS, with the difference that you cannot write much to a ReFS volume during a merge!

I would like to show you the problem using our job statistics. You can contact me directly!
MSMSMSMSMS
Novice
Posts: 5
Liked: 3 times
Joined: Mar 28, 2017 9:14 am

Re: ReFS Summary...

Post by MSMSMSMSMS » 3 people like this post

Although the following problem is not directly related to ReFS cluster size, it is still Veeam/ReFS related, so I am mentioning it here. Veeam doesn't have a ReFS equivalent to their NTFS BitLooker technology, so it is backing up dirty ReFS blocks. We are seeing that our Exchange VMs with ReFS volumes have backups that are almost twice the size of the data that is visible on the file system. E.g. our databases + OS total 8 TB, while our Veeam full backup is 16 TB.
m.novelli
Veeam ProPartner
Posts: 598
Liked: 117 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy

Re: ReFS Summary...

Post by m.novelli »

MSMSMSMSMS wrote: Although the following problem is not directly related to ReFS cluster size, it is still Veeam/ReFS related, so I am mentioning it here. Veeam doesn't have a ReFS equivalent to their NTFS BitLooker technology, so it is backing up dirty ReFS blocks. We are seeing that our Exchange VMs with ReFS volumes have backups that are almost twice the size of the data that is visible on the file system. E.g. our databases + OS total 8 TB, while our Veeam full backup is 16 TB.
Wow :shock:
Ciao,

Marco
sg_sc
Enthusiast
Posts: 61
Liked: 8 times
Joined: Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: ReFS Summary...

Post by sg_sc » 1 person likes this post

Nice to know, but it has nothing to do with ReFS as a backup repository.

ReFS with 64K clusters and enough RAM, no per-VM chains, and synthetic fulls no more than weekly runs well.
BGA-Robert
Service Provider
Posts: 60
Liked: 8 times
Joined: Feb 03, 2016 5:06 pm
Full Name: Robert Wakefield

Re: ReFS Summary...

Post by BGA-Robert »

Thanks for the update!

We've been holding off on deploying to ReFS. This would be the back-end repo for our Cloud Connect target. I feel like I'm hearing that it could fail. I understand synthetic fulls and lots of deletes could be bad. That sounds like what we do.

Seems like when this topic comes up, I'm hearing "ReFS is good to go. Except for one more issue..."
I'm confused... :roll:

Any recommendation for a service provider's Cloud Connect repo???
sg_sc
Enthusiast
Posts: 61
Liked: 8 times
Joined: Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: ReFS Summary...

Post by sg_sc »

As a service provider, I would suggest sticking with NTFS and maybe testing a few select clients on ReFS.
ferrus
Veeam ProPartner
Posts: 301
Liked: 44 times
Joined: Dec 03, 2015 3:41 pm
Location: UK

[MERGED] Optimum Backup Job strategy, post ReFS

Post by ferrus »

I've been a Veeam user for a couple of years now, on Windows 2012 and NTFS.
The common recommendation for Tier 1 backups in this configuration, from these forums and consultants we dealt with - was a Forever Forward Incremental strategy.
Now ReFS appears to have changed the game significantly, in terms of the footprint of both storage and backup window.

There has been much discussion on the forum about the stability and configuration of ReFS, but I can't find anything in terms of any effect on the recommended Backup Job Configuration.
Previously we avoided synthetic fulls - as a rolling 28 day Forever Forward strategy was much more efficient; but are Synthetic Fulls, or even Active Fulls the better choice now?
(Presuming ReFS IS now the officially recommended option :?: )
Gostev
Chief Product Officer
Posts: 32737
Liked: 7958 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Summary...

Post by Gostev »

You can find backup job configuration recommendations in the first post. Thanks!
SBarrett847
Service Provider
Posts: 315
Liked: 41 times
Joined: Feb 02, 2016 5:02 pm
Full Name: Stephen Barrett

Re: ReFS Summary...

Post by SBarrett847 » 1 person likes this post

BGA-Robert wrote:Any recommendation for a service provider's Cloud Connect repo???
I use ReFS for my Cloud Connect repos. There are some limitations though: for example, you can't use quotas in Veeam and still use Fast Cloning to save space (I give customers individual VHDs instead). ReFS doesn't support mount points, so the number of VHDs is limited per server.

I also had issues with the Repo Server keeling over, until I gave it tons of memory.

That said, it is ticking over nicely now, and the Fast Cloning is worth the effort (IMO)
ferrus
Veeam ProPartner
Posts: 301
Liked: 44 times
Joined: Dec 03, 2015 3:41 pm
Location: UK

Re: ReFS Summary...

Post by ferrus » 1 person likes this post

Gostev wrote:You can find backup job configuration recommendations in the first post. Thanks!
Didn't find much relevant information there (other than the basics), but there's a lot of interesting discussion in this thread - veeam-backup-replication-f2/refs-perfor ... 92-30.html

If I've understood the thread correctly, with ReFS there's no data-integrity benefit from running synthetic fulls - as they just reference blocks from previous restore points, rather than recreating them.
Forever Forward Incremental remains the most efficient method in terms of least block clone usage, and Active Full backups reset the block clone usage - hopefully returning fast-clone performance.

So my question is - is there still a reason for using synthetic full backups with ReFS? I can't see one from reading that thread.

Perhaps the strategy we're currently on - 28 day Forever Forward Incremental (without Fulls), is actually still the most efficient on ReFS.
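
To make the point about synthetic fulls referencing existing blocks more concrete, here is a toy Python model of reference-counted block cloning. It is a deliberately simplified sketch and does not describe how ReFS or Veeam Fast Clone are actually implemented: the `ToyVolume` class, its methods, and the file names are all made up for illustration.

```python
from collections import Counter
from typing import Dict, List


class ToyVolume:
    """Hypothetical volume where files are lists of shared, reference-counted blocks."""

    def __init__(self) -> None:
        self.refcount: Counter = Counter()       # block id -> number of files referencing it
        self.files: Dict[str, List[int]] = {}    # file name -> block ids
        self._next_block = 0

    def write_file(self, name: str, n_blocks: int) -> None:
        """Write new data: allocates fresh blocks (like an active full)."""
        blocks = list(range(self._next_block, self._next_block + n_blocks))
        self._next_block += n_blocks
        for block in blocks:
            self.refcount[block] += 1
        self.files[name] = blocks

    def clone_file(self, source: str, name: str) -> None:
        """Clone a file: no data is copied, only reference counts go up."""
        blocks = list(self.files[source])
        for block in blocks:
            self.refcount[block] += 1
        self.files[name] = blocks

    def delete_file(self, name: str) -> int:
        """Delete a file and return how many blocks were actually freed."""
        freed = 0
        for block in self.files.pop(name):
            self.refcount[block] -= 1            # every block needs a metadata update
            if self.refcount[block] == 0:
                del self.refcount[block]
                freed += 1
        return freed

    def blocks_in_use(self) -> int:
        return len(self.refcount)
```

In this toy model, cloning a 4-block "full" into a "synthetic full" leaves 4 blocks in use, and deleting the original frees nothing until the clone is deleted too. It also shows why the clone adds no independent copy of the data: both files point at the same blocks.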
SBarrett847
Service Provider
Posts: 315
Liked: 41 times
Joined: Feb 02, 2016 5:02 pm
Full Name: Stephen Barrett

Re: ReFS Summary...

Post by SBarrett847 »

ferrus wrote: So my question is - is there still a reason for using synthetic full backups with ReFS? I can't see one from reading that thread.
GFS storage space saving is the only reason really. However, if the backup is unencrypted, NTFS and Dedup might yield similar results.
SyNtAxx
Expert
Posts: 149
Liked: 15 times
Joined: Jan 02, 2015 7:12 pm

Re: ReFS Summary...

Post by SyNtAxx »

I think I'll be holding off on ReFS until it is stable and a bit more mature.
Gostev
Chief Product Officer
Posts: 32737
Liked: 7958 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS Summary...

Post by Gostev »

You are right. We will look into this once ReFS gets better adoption when it comes to application servers specifically. Right now, its use cases there are pretty limited. I think ReFS will skyrocket once Microsoft enables the ability to boot from ReFS volumes; until then, NTFS will still be the king inside VMs.
ferrus
Veeam ProPartner
Posts: 301
Liked: 44 times
Joined: Dec 03, 2015 3:41 pm
Location: UK

Re: ReFS Summary...

Post by ferrus »

I feel as though a statement of Veeam's current position on ReFS could be beneficial.

My gut feeling at the moment is to hold off on ReFS across most of our repositories - apart from one which has its own major performance issues. Currently the only job on that repo is 85% into a 44 hour merge - so even a broken ReFS may be better than the status quo.
mkretzer
Veeam Legend
Posts: 1289
Liked: 464 times
Joined: Dec 17, 2015 7:17 am

Re: ReFS Summary...

Post by mkretzer »

@ferrus: Did you install the newest update? I'd be interested to know whether it restores the good merge speed for you too....
ferrus
Veeam ProPartner
Posts: 301
Liked: 44 times
Joined: Dec 03, 2015 3:41 pm
Location: UK

Re: ReFS Summary...

Post by ferrus »

Haven't migrated yet - that's the 2012/NTFS duration.

I'm planning on migrating one Proxy/Repository server over, within the next 1-4 weeks.
sg_sc
Enthusiast
Posts: 61
Liked: 8 times
Joined: Mar 29, 2016 4:22 pm
Full Name: sg_sc

Re: ReFS Summary...

Post by sg_sc »

I feel it would be better not to discuss ReFS for repos and ReFS as a filesystem for production VMs next to each other; it will just confuse everyone.