WinstonWolf
Expert
Posts: 187
Liked: 4 times
Joined: Jan 06, 2011 8:33 am

Re: REFS issues (server lockups, high CPU, high RAM)

Post by WinstonWolf » Jun 26, 2018 12:09 pm

gm2783 wrote: Hi all

see below the commands to export the REFS driver:

mkdir C:\%Path%\Expand\2
expand -f:* C:\%Path%\windows10.0-kb4093120-x64_72c7d6ce20eb42c0df760cd13a917bbc1e57c0b7.msu C:\%Path%\Expand
expand -f:*.cab C:\%Path%\Expand\*.cab c:\%Path%\expand\2
expand -f:refs.sys c:\%Path%\expand\2\*.cab c:\%Path%\expand\2\
expand -f:refsv1.sys c:\%Path%\expand\2\*.cab c:\%Path%\expand\2\
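
A minimal sketch of the same extraction sequence, written in Python so that the placeholder folder becomes an explicit argument. The function name `build_expand_commands`, the example folder and the example file name are assumptions for illustration; only the `expand -f:` invocations come from the post. The function just builds the command lines and does not run anything; executing them (for example with `subprocess.run`) would have to happen on a Windows host, with the target folders already created.

```python
from __future__ import annotations

from pathlib import PureWindowsPath


def build_expand_commands(work_dir: str, msu_name: str) -> list[list[str]]:
    """Return the expand.exe command lines from the post for one working folder.

    work_dir and msu_name are illustrative placeholders, not values from the thread.
    """
    work = PureWindowsPath(work_dir)
    stage1 = work / "Expand"   # receives the contents of the .msu
    stage2 = stage1 / "2"      # receives the contents of the inner .cab
    return [
        ["expand", "-f:*", str(work / msu_name), str(stage1)],
        ["expand", "-f:*.cab", str(stage1 / "*.cab"), str(stage2)],
        ["expand", "-f:refs.sys", str(stage2 / "*.cab"), str(stage2)],
        ["expand", "-f:refsv1.sys", str(stage2 / "*.cab"), str(stage2)],
    ]


if __name__ == "__main__":
    # Hypothetical folder and file name, printed for review rather than executed.
    for command in build_expand_commands(r"C:\Temp\refs", "update.msu"):
        print(" ".join(command))
```

Printing the commands first makes it easy to check the paths before anything is extracted.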

Microsoft® Update Catalog:
https://www.catalog.update.microsoft.co ... =KB4093120
How can I replace refs.sys after this on a running system? Thanks

mkaec
Expert
Posts: 302
Liked: 65 times
Joined: Jul 16, 2015 1:31 pm
Full Name: Marc K

Post by mkaec » Jun 26, 2018 2:02 pm

WinstonWolf wrote: How can I replace refs.sys after this on a running system? Thanks
If it is a VM, you could power off the VM and mount the VHDX in the host. If it is a physical system, you could boot into a WinPE environment. Another option would be to populate the PendingFileRenameOperations registry value, hope you got it right, and reboot.
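
For the registry route, a rough sketch of what the value would need to contain. To my knowledge, `PendingFileRenameOperations` is a REG_MULTI_SZ value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager` that holds source/destination pairs in NT path form (`\??\` prefix), with a `!` in front of the destination when an existing file should be overwritten. Treat that layout as an assumption to verify against the MoveFileEx documentation before relying on it. The helper name and the paths below are hypothetical, and the function only builds the strings; it does not touch the registry.

```python
from __future__ import annotations


def pending_rename_entries(source: str, destination: str,
                           replace_existing: bool = True) -> list[str]:
    """Build one source/destination pair for PendingFileRenameOperations.

    Assumed layout: NT-style paths, '!' prefix on the destination to overwrite.
    """
    nt_prefix = "\\??\\"
    nt_source = nt_prefix + source
    nt_destination = ("!" if replace_existing else "") + nt_prefix + destination
    return [nt_source, nt_destination]


if __name__ == "__main__":
    # Hypothetical paths: an extracted driver replacing the installed one at next boot.
    for entry in pending_rename_entries(r"C:\Temp\refs.sys",
                                        r"C:\Windows\System32\drivers\refs.sys"):
        print(entry)
```

Any entries already present in the value belong to other pending operations, so a new pair should be appended rather than written over them.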

Jo_Seph_B
Lurker
Posts: 2
Liked: 1 time
Joined: Jun 26, 2018 2:37 pm
Full Name: Joseph Baldwin

Post by Jo_Seph_B » Jun 26, 2018 2:45 pm 1 person likes this post

We're in the process of migrating back to NTFS. It might be slow but at least it'll be reliable.

I appreciate this is mostly an MS issue but my biggest gripe with Veeam is the fact they still recommend ReFS as the best file system to use. The software even recommends it if I try and create an NTFS repository! Veeam need to react quicker to tell customers to stop using a technology if it's not reliable, and this is a prime example.

Even if this gets fixed, my confidence in ReFS is shot. 4 months of looking daft in front of a customer because backups are failing just isn't an acceptable place to be. Once migrated, we'll have to manually check every backup for consistency, as I need to be sure we've not lost data. I'm disappointed in both MS for pushing a technology that was obviously not ready, and Veeam for not stopping customers deploying it, with ReFS as the file system of choice, and even pushing it as the best file system to be on. It's a shame, as Veeam is the best backup product I've ever used and we've been a customer for 6+ years. I'm not going to lie: at renewal time I'll be taking a look at the market again, which I've never felt the need to do before with Veeam.

Gostev
SVP, Product Management
Posts: 23860
Liked: 3209 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » Jun 26, 2018 3:23 pm

Jo_Seph_B wrote: Veeam need to react quicker to tell customers to stop using a technology if it's not reliable and this is a prime example.
Wait, how is this particular issue a prime example when it is NOT a reliability issue? All this latest regression does is make ReFS as slow as NTFS (which you're migrating back to) :D actually, still significantly faster - especially on low-end backup repositories. Or, are you talking about some other issue?

LBegnaud
Service Provider
Posts: 18
Liked: 7 times
Joined: Jan 24, 2018 12:08 am

Post by LBegnaud » Jun 26, 2018 3:40 pm 1 person likes this post

I agree with Gostev. For smaller environments, the ReFS issues are negligible / non-existent, and in larger systems ReFS is the only sane way to do things (ReFS + reverse incremental is approximately one million times better than the old way of doing it). It has been a long time since ReFS has had show-stopping issues for us.

For context, we have about 12 remote B&R servers running WS2016 ReFS repos and one primary datacenter B&R server with ~500 VMs.

antipolis
Enthusiast
Posts: 71
Liked: 9 times
Joined: Oct 26, 2016 9:17 am

Post by antipolis » Jun 26, 2018 3:50 pm

actually I think jo_seph_b has a point here...

while I'm not considering moving back to ntfs myself, waiting 6+ months for some fixes from MS was already terrible by itself; but having a regression like this only 2 months after getting said fixes, and then having to wait two more months... ?? (unless of course you manually roll back the driver... which I did) that's really really awful for a technology pushed by both MS and veeam as production ready (and I'm not intending to point any fingers here...)

great fun when you find out on monday morning that the weekend patching caused regression and half your backup jobs are still running...

then again... yeah... ReFS benefits are so huge that I will just deal with it but meh


Post by Gostev » Jun 26, 2018 4:04 pm

Well, to be fair to Microsoft, the initial ReFS issue was extremely complex (as we found out after every aspect of it was finally understood), so yes, that fix took very long. But for example, with this latest performance regression, we had a private hotfix from Microsoft within 3 days of reporting the issue back in May. It was simply already too late to include it in their June updates - I get it, we have these situations ourselves when a hotfix is too late to make the immediate update (code freeze has already happened) - just bad timing. And the issue is not critical enough for them to ship an out-of-band patch - there's no data corruption problem.

Overall, bugs and teething issues around new technologies happen to every vendor. This is absolutely normal, and so early adopters will always struggle. I recommend those who are mad at Microsoft for ReFS teething issues to look no further than VVols, which VMware has been pushing hard for the last few years "as the best file system to be on", quoting Joseph. Well guess what, it is only now that we know that almost everyone who is using VVols has their VMs, backups and replicas corrupted. Which is arguably a 10x bigger issue than everything ReFS has experienced to date - ironically, just about everything EXCEPT actual data corruption. Something we really want to see from a file system, by the way!

But somehow, this is considered "okay" - I mean, look at the VMware subforum. No one there blames Veeam for supporting VVols since day 1 and promoting this support in its release documentation, just like we did with our ReFS support. Nor does anybody feel the need to look into replacing Veeam because our backups and replicas were equally impacted by this terrible VVols bug. Double standards at their best?

Mgamerz
Enthusiast
Posts: 74
Liked: 10 times
Joined: Sep 29, 2017 8:07 pm

Post by Mgamerz » Jun 26, 2018 5:08 pm

To replace the file, for those above who asked: I rebooted the server using Shift + Restart and booted to the command prompt from the options. Then I renamed refs.sys in C:\Windows\system32\drivers to something like refs_june2018.sys, and put in the one from February (pulled from the April cumulative update using the expand instructions above). Just rebooted it, and so far it seems to be working.

FrancWest
Enthusiast
Posts: 81
Liked: 5 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc

Post by FrancWest » Jun 26, 2018 5:15 pm

No need, you can simply rename the original file to .old or something and copy the extracted refs.sys to the C:\Windows\System32\drivers folder.

For some reason, it's not locked even when the driver is loaded.

After replacing the driver, reboot the server to activate it.

Franc.
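
The rename-then-copy step described in the last two posts could be sketched like this. The function name `swap_driver` and the `.old` suffix default are illustrative assumptions; on a real server this would need an elevated session, and the reboot afterwards is still required. The sketch works on any folder, so it can be tried against a scratch directory first.

```python
import shutil
from pathlib import Path


def swap_driver(drivers_dir: Path, replacement: Path,
                name: str = "refs.sys", backup_suffix: str = ".old") -> Path:
    """Rename the current driver file to a backup and copy the replacement in.

    Returns the backup path so the original can be restored later.
    """
    current = drivers_dir / name
    backup = drivers_dir / (name + backup_suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup of the original driver.
        raise FileExistsError(f"backup already exists: {backup}")
    current.rename(backup)               # keep the original for rollback
    shutil.copy2(replacement, current)   # put the extracted driver in place
    return backup
```

Keeping the backup next to the new file means rolling back is just the same two steps in reverse.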

kubimike
Expert
Posts: 328
Liked: 37 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Post by kubimike » Jun 26, 2018 6:48 pm

I still have my server in test mode, still on the private fix released months ago. I knew this wasn't over. No crashes with the private fix, no speed problems. Geeez :roll:

KFM
Service Provider
Posts: 13
Liked: 1 time
Joined: May 14, 2013 1:46 am
Full Name: KFM

Post by KFM » Jun 27, 2018 1:56 am

SBarrett847 wrote: I see this hard lockup if my Repo VM hasn't been assigned enough memory - increase memory and the issue doesn't occur for me.
Hi Stephen,

Thanks for your suggestion but my repo VMs all have 8 vCPU and 32 GB RAM. It locks up even when it's not doing any ingestion of data and the server is simply idling. i.e. no backups/restores/jobs are running to it during the deletion of the files. A long time ago I made sure that if I was doing any bulk big deletes that there would be no jobs running to minimise the impact to them if I had to reboot the repos.

billcouper
Service Provider
Posts: 62
Liked: 17 times
Joined: Dec 18, 2017 8:58 am
Full Name: Bill Couper

Post by billcouper » Jun 27, 2018 3:26 am

@KFM
I also have issues deleting files causing repository server to freeze/lock up with REFS volume. But only if I use the Veeam console to do it!

Every time I have used the Veeam console to delete backup chains the associated repository server locks solid. 100% cpu for hours and hours if you let it before hitting reset.

I have found the only reliable way to delete files is through the operating system. I just log in to a repository server and delete the files/folders/whatever I need using Explorer, then run a rescan on the associated SOBR in Veeam. When I delete files, the server runs high CPU/RAM for a while, and if I keep refreshing Disk Management I can see the amount of free space going up slowly. This always works. I have never had a repo server freeze doing it through Explorer.

gm2783
Service Provider
Posts: 6
Liked: 2 times
Joined: Apr 10, 2017 12:42 pm
Full Name: Giuseppe Marchese
Location: Mägenwil

Post by gm2783 » Jun 27, 2018 7:17 am 1 person likes this post

I replaced the ReFS drivers yesterday:
refs.sys
refsv1.sys

Our B2D are now 10x faster during fast cloning. :D

REFS Driver 10.0.14393.2312
Fast Cloning Time:
06:19 h

REFS Driver 10.0.14393.2097
Fast Cloning Time:
00:31 h

There are 5 file servers in this job. The total size is 5.3 TB.
The VIBs are about 80-90 GB.
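
Taking the two fast clone times above at face value, the speedup works out a little above the quoted 10x. A quick arithmetic check (the helper name is made up for this example):

```python
def hhmm_to_minutes(value: str) -> int:
    """Convert an 'HH:MM' duration string to whole minutes."""
    hours, minutes = value.split(":")
    return int(hours) * 60 + int(minutes)


slow = hhmm_to_minutes("06:19")   # fast clone time with driver 10.0.14393.2312
fast = hhmm_to_minutes("00:31")   # fast clone time with driver 10.0.14393.2097
speedup = slow / fast

print(f"{slow} min vs {fast} min -> {speedup:.1f}x")   # 379 min vs 31 min -> 12.2x
```

This is a single job on a single repository, so the ratio is a data point rather than a benchmark.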

Just for your information...


Post by Jo_Seph_B » Jun 27, 2018 3:17 pm

Gostev wrote: Wait, how is this particular issue a prime example when it is NOT a reliability issue? All this latest regression does is make ReFS as slow as NTFS (which you're migrating back to) :D actually, still significantly faster - especially on low-end backup repositories. Or, are you talking about some other issue?
Feel free to track my issue over the last 4-6 months (0306889 and 0274109) since go-live on a greenfield setup in Jan, following all of Veeam's best practice guides on ReFS. 6 months down the road and the server still crashes with 100% CPU once a week, with failed jobs constantly having to be rerun. Veeam support has really not been able to offer much at all.

We resell Veeam, multiple other customers I've not even needed to log into for 6 months on NTFS. I get the success email each day and move on.

With ReFS I'm working 12-hour weekends to get consistent backups: rescanning datastores, moving files around that have got removed, even doing a ReFS disk restore as it failed. A separate repository, still on ReFS, also resulted in server crashes; it just moved the issue.

Tried turning off the ReFS features and it still fails. Everything results in a fail. Veeam support is just pointing the finger elsewhere and not even willing to really guide or recommend when we asked if we should switch back, and we constantly have to chase for information as they are slow to respond. In the end we decided to get shot of ReFS; needless to say, we'll be in touch further down the road about this at a higher level than a forum or ticket.

As above, I know the underlying issue isn't Veeam's fault. My gripe is the fact Veeam are still using ReFS: if I deploy a repository on NTFS, the software actually recommends I use ReFS!! Really! It's been unreliable since day 1, so why is Veeam software still suggesting I switch to it? That's my issue here. Veeam support for ReFS should have been pulled months ago when the issues started flooding in, until MS resolved it and you guys checked it worked properly. Take a look at this thread, the impact wasn't small!

Not interested in the VVol story, nice as it is; this thread is for Veeam and ReFS issues. Stop pushing focus onto unrelated issues.


Post by Gostev » Jun 27, 2018 3:41 pm

Jo_Seph_B wrote: Feel free to track my issue over the last 4-6 months (0306889 and 0274109) since go-live on a greenfield setup in Jan, following all of Veeam's best practice guides on ReFS. 6 months down the road and the server still crashes with 100% CPU once a week, with failed jobs constantly having to be rerun.
OK, so you're talking about the original issue fixed in the February ReFS.sys update - not about the latest performance regression being discussed on the last few pages.
Jo_Seph_B wrote: My gripe is the fact Veeam are still using ReFS: if I deploy a repository on NTFS, the software actually recommends I use ReFS!! Really! It's been unreliable since day 1, so why is Veeam software still suggesting I switch to it? That's my issue here. Veeam support for ReFS should have been pulled months ago when the issues started flooding in, until MS resolved it and you guys checked it worked properly.
Well, then you have to know that the majority of our customers actually had success with ReFS since day 1. I have noted this multiple times in the beginning of this thread, just like the fact that it was actually quite hard for us to reproduce the issue in our own lab to demonstrate it to Microsoft. This is because there were too many variables involved in running into the issue, which became apparent when the issue was fully understood - namely the usage of per-VM chains, backup modes with periodic fulls, 4KB ReFS cluster size, backup repository with low RAM size etc.

If ReFS was unreliable for the majority of users, of course we'd remove the recommendation from the UI. But instead, we opted to tweak it in U2 to suggest 64KB cluster size, as this was clearly one of the culprits - and continued working with those customers for whom the integration did not work reliably. As you can imagine, simply removing the recommendation would be even easier for us to do, if there were good reasons to do this.

I realize these facts probably do not change anything for you specifically, but I wanted to provide the bigger picture behind this so that you understand why we did not pull ReFS support completely. If we did, we would not be able to iterate on this integration to get it usable for everyone. Just like, for example, VMware would not have been able to keep iterating on VSAN and get it to where it is today if it had pulled it completely during the reliability chaos period of the initial VSAN releases.
