YoMarK
Enthusiast
Posts: 55
Liked: 8 times
Joined: Jul 13, 2009 12:50 pm
Full Name: Mark
Location: The Netherlands

Re: REFS 4k horror story

Post by YoMarK »

Our Veeam repositories (a SAN with one 2008R2 and one Windows 2012R2 server), holding ~50TB of Veeam backups, are going to need replacement very soon.

Same question as David, but I'm also wondering whether it's smart to replace them with a single repository on Windows Server 2016 with ReFS (64K).
Is it ready for prime time with volumes of 70-80TB?
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers

Re: REFS 4k horror story

Post by Delo123 » 2 people like this post

Hi Mark,

We have been using a 2016 ReFS 64K repository for about 2 months now: a 189TB volume with 23.4TB of data on it so far. No glitches, everything looks stable.
However, we still make a second backup to 2012R2 Dedupe volumes and another copy job to be sure... I wouldn't trust ReFS 100% yet if that's your "only" repository, it's too early...
Squish
Novice
Posts: 6
Liked: 2 times
Joined: Feb 16, 2017 12:25 pm
Full Name: Ondřej Kraus

Re: REFS 4k horror story

Post by Squish » 2 people like this post

Same here, wondering if I should switch to ReFS or not. Looking forward to today's Veeam webinar: "Scaling backup repositories with Veeam & Microsoft ReFS"
alesovodvojce
Enthusiast
Posts: 61
Liked: 9 times
Joined: Nov 29, 2016 10:09 pm

Re: REFS 4k horror story

Post by alesovodvojce »

Confirming that the bug still exists despite the update. It just killed our VBR server again, even though we updated all servers 20 hours ago with a patch that also seems to target ReFS, just not this issue. We are going to invest time in finding other solutions. We regret having chosen the ReFS filesystem.
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: REFS 4k horror story

Post by mkretzer »

@alesovodvojce Killed how? Bluescreen or hang/blackscreen?
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

Post by kubimike »

@alesovodvojce I'm assuming you're on 4k?
graham8
Enthusiast
Posts: 59
Liked: 20 times
Joined: Dec 14, 2016 1:56 pm

Re: REFS 4k horror story

Post by graham8 »

I'm with you, alesovodvojce. I switched to Veeam + Server 2016 + ReFS because I wanted a more supported/mainstream (space-efficient) solution that didn't rely as much on me personally to support unix systems and elaborate custom scripting, but what was in place before (ZFS + ZFS Send + GFS Snapshots) is looking better and better, in spite of fewer people being able to support it.

I guess it'll get resolved eventually, but it's disheartening when you make a move to try to be more responsible and put things in a more mainstream, supported place with major companies like Microsoft (obviously this isn't Veeam's fault of course) and end up with something far hackier and flakier than the "custom scripted with free stuff" setup that was in place before.
alesovodvojce
Enthusiast
Posts: 61
Liked: 9 times
Joined: Nov 29, 2016 10:09 pm

Re: REFS 4k horror story

Post by alesovodvojce » 1 person likes this post

ReFS 4k, and a VM halt that required a forced power-off. That answers the questions from @mkretzer and @kubimike.

Btw, after months of investigation and recently finding out that this is a bug in Microsoft's product,
here is how to lower the frequency of trouble if you are on ReFS:
- Use 64k cluster size. I have no experience with it myself, but IT pros here are seeing fewer problems with it now.
- Use a minimum of concurrent jobs. This is our experience: the fewer jobs run in parallel, the less stress on ReFS, and so fewer timeouts and hangs.
- Impose rate limits on storage in Veeam. In the backup repository properties you can limit read/write rates.
We have limited ours to 1/10th of the speed = 70MB/s. This also reduces stress on the repo.
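
To put the 70MB/s limit in perspective, here is a quick back-of-the-envelope sketch in Python. The backup sizes are hypothetical, and the 700MB/s unthrottled figure is only what "1/10th of speed" implies, not a measured value.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_per_s (decimal units)."""
    size_mb = size_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_s / 3600


THROTTLED_MB_S = 70.0                   # limit quoted in the post
UNTHROTTLED_MB_S = THROTTLED_MB_S * 10  # assumption: implied by "1/10th of speed"

# Hypothetical amounts of data written to the repository in one run.
for size_tb in (1, 5, 20):
    slow = transfer_hours(size_tb, THROTTLED_MB_S)
    fast = transfer_hours(size_tb, UNTHROTTLED_MB_S)
    print(f"{size_tb:>3} TB: {slow:6.1f} h throttled vs {fast:5.1f} h unthrottled")
```

The point is simply that the throttle trades a tenfold longer backup window for less pressure on the repository, so it only works if your window can absorb that.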

In our case, this bug first "kills" the guest with the remote repo, during the bkf file consolidation phase. The waiting transaction then stresses, and later "kills", the primary backup server that initiated the backup copy job. When we impose concurrency and rate limits and disable backup jobs, ReFS can work for a week or even longer without a hang.

I am writing here to warn you, and in the hope that somebody might come across a solution...
Mike Resseler
Product Manager
Posts: 8044
Liked: 1263 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium

Re: REFS 4k horror story

Post by Mike Resseler »

Hi Alesovodvojce,

Thank you for your golden tips! They are really appreciated. As some stated here, we are waiting for MSFT to fix this. This seems to be the only real solution to wait for. That and your tips (and please use 64K if you are starting now).

@graham8: As said, this is really painful but I still believe (but please let it be fixed soon :-)) that ReFS has a bright future, not only in combination with Veeam but also as a file system that hosts VMs (the checkpoint merge is extremely impressive and fast).

From our side, Gostev will continue to "bug" MSFT for this for sure.

This thread will be continued...
mkretzer
Veeam Legend
Posts: 1140
Liked: 387 times
Joined: Dec 17, 2015 7:17 am

Re: REFS 4k horror story

Post by mkretzer »

One more thing; our feeling is that it works better without per-VM chains because at least the filesystem does not have to track so many files. Overall slowness of the filesystem seems better that way.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

Post by Gostev »

All, great news! It appears that KB4013429 does include the fix for this issue, however you need to enable the newly added registry value to activate this new behavior. This makes sense, we also like to introduce major behavior modifiers this way before making them default.

I have the instructions, but I was asked not to share them broadly, because the ReFS dev team is planning to release the detailed blog post today that will have full context and details. If you absolutely cannot wait, shoot me a PM.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

Post by kubimike »

@Gostev great news!
Ctek
Service Provider
Posts: 83
Liked: 13 times
Joined: Nov 11, 2015 3:50 pm
Location: Canada

Re: REFS 4k horror story

Post by Ctek »

As a side question, why would someone use a 4K allocation size for a ReFS repository? Doesn't 64K make more sense, since pretty much all the files in there will be of considerable size? Am I missing something? Unless it was left at the default while creating the repository partitions.
VMCE
HJAdams123
Enthusiast
Posts: 72
Liked: 16 times
Joined: Jul 16, 2012 1:54 pm
Full Name: Harold Adams

Re: REFS 4k horror story

Post by HJAdams123 »

I guess I can wait until the blog post. Gostev, I hope you will give us a link to that blog post on this thread? (Thanks as always)
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

Post by Gostev »

Sure thing, I will share the link unless someone beats me to it.

@Ctek because of 10% more used space with 64K clusters due to alignment requirement of BlockClone API. Otherwise you're right, 64KB makes sense considering the workload and very large volume sizes.
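
A rough way to see where that extra space comes from (a sketch under assumptions, not how Veeam or the BlockClone API actually lays out data): if every stored block has to start on a cluster boundary, each block is effectively rounded up to a multiple of the cluster size. The block-size range below is made up, so the printed percentages illustrate the mechanism only and should not be expected to match the 10% figure.

```python
import random


def aligned_size(length: int, cluster: int) -> int:
    """Round length up to the next multiple of cluster (both in bytes)."""
    return -(-length // cluster) * cluster


def overhead(block_sizes, cluster: int) -> float:
    """Fraction of extra space used when every block is cluster-aligned."""
    raw = sum(block_sizes)
    stored = sum(aligned_size(b, cluster) for b in block_sizes)
    return stored / raw - 1.0


# Assumption: compressed backup blocks land anywhere between 256 KiB and 1 MiB.
rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
blocks = [rng.randint(256 * 1024, 1024 * 1024) for _ in range(10_000)]

for cluster in (4 * 1024, 64 * 1024):
    print(f"{cluster // 1024:>2}K clusters: {overhead(blocks, cluster):.2%} extra space")
```

On average each block wastes about half a cluster, so the smaller the blocks are relative to the cluster size, the larger the penalty for 64K.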
skumflum
Service Provider
Posts: 33
Liked: 1 time
Joined: Jun 13, 2016 6:51 am
Full Name: Søren Emig

Re: REFS 4k horror story

Post by skumflum »

Great news :D

However, I am a little confused. I have concluded (from reading this forum) that 64K has the same problem, although the likelihood of running into the bug is less. Am I correct?

...will this hotfix + registry fix this as well?

I hope so since I am planning to build a large REFS repository 8)
Mike Resseler
Product Manager
Posts: 8044
Liked: 1263 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium

Re: REFS 4k horror story

Post by Mike Resseler »

@Soren,

Yes, the likelihood will be lower to run into this bug. And yes, this hotfix + registry should fix both so...
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

Post by kubimike »

@Mike Resseler "Fix both soon" can you explain what you mean by that please?
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

Post by Gostev »

He did not say what you are quoting ;) the issue may impact any server regardless of ReFS cluster size, so the fix applies to all ReFS deployments.
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

Post by kubimike »

Huh? From what I read, it seems that's what he typed :) So for those of us on 64k, will we need any registry tweaks, or can we just install the latest KB? TIA
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

Post by Gostev »

KB + registry tweaks
kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

Post by kubimike »

@Gostev, cool, and where can I find these top secret registry fixes please, sir?
DaveWatkins
Veteran
Posts: 370
Liked: 97 times
Joined: Dec 13, 2015 11:33 pm

Re: REFS 4k horror story

Post by DaveWatkins »

Gostev wrote:All, great news! It appears that KB4013429 does include the fix for this issue, however you need to enable the newly added registry value to activate this new behavior. This makes sense, we also like to introduce major behavior modifiers this way before making them default.

I have the instructions, but I was asked not to share them broadly, because the ReFS dev team is planning to release the detailed blog post today that will have full context and details. If you absolutely cannot wait, shoot me a PM.

I guess we'll find out with the blog post, but I'm confused that the fix for something that's causing blue screens and system crashes requires the user to make a registry change. Assuming that's the actual case, it seems insane to me.
ds2
Enthusiast
Posts: 82
Liked: 19 times
Joined: Jul 16, 2015 6:31 am
Full Name: Rene Keller

Re: REFS 4k horror story

Post by ds2 »

I can't understand why this is such a secret.

If it is a fix for an issue, why isn't it enabled by default? Why is there a need to change a reg key?

I'm afraid that there will be side effects from enabling the key.
alesovodvojce
Enthusiast
Posts: 61
Liked: 9 times
Joined: Nov 29, 2016 10:09 pm

Re: REFS 4k horror story

Post by alesovodvojce »

We are trying the fix as a remedy and evaluating it to avoid further speculation. Thanks for offering private sharing of the fix.

As we are really affected, anyone, please go ahead and share vital information regarding remedies. While I understand others' frustration (mine is big also), it does not help, and there will be enough time to discuss opinions later. Thanks, and wish us brighter backups soon!
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

Post by Gostev »

All, here is the official KB article from Microsoft: "FIX: Heavy memory usage in ReFS on Windows Server 2016 and Windows 10"
Please don't forget to install KB4013429 before applying the registry values, and remember to reboot the server after doing so.

Finally, please do remember to share what option has worked for you!
skumflum
Service Provider
Posts: 33
Liked: 1 time
Joined: Jun 13, 2016 6:51 am
Full Name: Søren Emig

Re: REFS 4k horror story

Post by skumflum »

@Gostev

Thank you :D

It would be great if Veeam could spell out some recommendations on this one, perhaps in conjunction with some guidelines for building a repository server.

We are planning a 350TB ReFS repository and I'm a little scared to move forward.
WimVD
Service Provider
Posts: 60
Liked: 19 times
Joined: Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

Post by WimVD »

I was expecting a blog post from Microsoft with some in-depth explanation and guidance on the different options. Is this still upcoming?
Richardrichard
Novice
Posts: 5
Liked: never
Joined: Mar 07, 2017 5:57 am
Full Name: Rich

Re: REFS 4k horror story

Post by Richardrichard »

@skumflum

I'm in exactly the same position, weighing up different options for a similarly sized repository, and would lean towards ReFS if this is fixed.
Gostev
Chief Product Officer
Posts: 31457
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

Post by Gostev »

@Søren, Richard - we tested a 500TB repository before this fix was available, and it worked like a charm. Don't be scared just based on this forum thread - keep in mind the huge number of users Veeam has, and the fact that people without issues rarely or never come to forums to share that all works well for them ;) I can tell you that, compared to the number of customers testing/using ReFS, the number of actual issues is fairly low as long as there's enough RAM on the repository server and 64K clusters are used.

@WimVD my bad for calling it blog post, I expected some article - I did not know what format it will be published in.