Nick-SAC
Enthusiast
Posts: 76
Liked: 16 times
Joined: Oct 27, 2017 5:42 pm
Full Name: Nick
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Nick-SAC »

“I decided to go for the latest update (KB4457131, refs.sys 2457) and after 2 days I can definitely say that it's slower.”
With the way these KB Updates are now packaged it’s often difficult, if not impossible, to come to valid cause & effect conclusions because they change multiple variables simultaneously. For example, in the description below (from TenForums), several of the items listed at least look like they could have an effect on performance and/or stability.
September 11, 2018 KB4457131 (OS Build 14393.2485)
Applies to: Windows 10 version 1607, Windows Server 2016

Improvements and fixes

This update includes quality improvements. No new operating system features are being introduced in this update. Key changes include:

Security updates to Internet Explorer, Microsoft Edge, Microsoft scripting engine, Microsoft Graphics Component, Windows media, Windows Shell, Device Guard, Windows Hyper-V, Windows catacenter networking, Windows kernel, Windows virtualization and kernel, Microsoft JET Database Engine, Windows MSXML, and Windows Server.
I mean, after all, once you start fooling with the catacenter... :wink:

Nick
sullie
Lurker
Posts: 1
Liked: never
Joined: Sep 19, 2018 1:30 pm
Full Name: Kevin Sullivan
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by sullie »

I have followed this thread for several months now and I wanted to say "thank you" for starting it and pass along my information. I switched to VEEAM last fall from another backup software. I started off with a brand new server (Dell PE 730xd) w/128GB RAM and 35TB of storage on 10k rpm drives, with the backup drives running REFS, and I backed up data without issue from around October 2017 until June 2018. In June 2018, I ran into the issue of the backup running along fine until it hit the "Synthetic full fast clone" part of the job. At that point, my backup times went from about 90 minutes (24 servers, about 5.9 TB processed on average, running over two 10GB twinax connections using MS NIC teaming) to about 12 or 14 hours. These are nightly Synthetic Full backup jobs, and the main issue was the job getting stuck on the "Synthetic full backup created successfully (fast clone)" part. I also observed the weekend backup jobs (Active Full backup jobs) going from around 5.5 hours to well over 48 hours.

I called VEEAM support (again this is back in June when the issue was new) and after the technician looked over the server jobs and what was happening, I was advised to remove the last Windows Server patch, reboot and update the ticket. I did that and that resolved the issue. I figured the issue would be resolved with July updates. I applied July updates, the issue returned, I uninstalled July patches and the issue (of long backups) disappeared again.

In August, again, I applied August patches, the issue returned, I removed the patches and I got a BSOD upon reboot. At that point I also had an 'oh crap' moment when I realized that I was not backing up my OS partition on my server. That's 100% on me, I've been doing this long enough that I should have taken that into consideration. I called Dell support, they assisted me in getting that Windows patch cleaned up and we got my server back up and running. The next day I immediately implemented an OS backup procedure.

September updates have come out, I verified I had good OS partition backups, I held my breath, applied the following patches (KB4457131 and KB4091664), I rebooted the server, the server came back up w/no issues (victory 1) and last night I observed normal backup times for nightly synthetic full backups (90 minutes) - (victory 2) - so I "think/hope/pray" all is well for the time being. I will update this thread if that changes but I wanted to relay my experience on here and thank everyone for updating their experiences.
Gostev
Chief Product Officer
Posts: 31624
Liked: 7120 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

KB4343884 is looking good so far in our testing in the specific scenario that reliably caused server lockups in early days. According to RAMMap, metafile never grows over 10GB even during large file deletions. We're planning to test the same on the large scale and with reduced physical RAM next.
Mgamerz
Expert
Posts: 159
Liked: 28 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mgamerz »

Does KB4457131 (Sept 11 update) include this? I may not be getting enough sleep, but this Aug 30 one doesn't look like it's a cumulative one.
Gostev
Chief Product Officer
Posts: 31624
Liked: 7120 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

AFAIK, all Windows Updates are still cumulative these days? They are only starting to talk about "delta updates" coming in the future.
Mgamerz
Expert
Posts: 159
Liked: 28 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mgamerz »

Cool, thanks. I think I have started installing the sept updates on a few of our (not veeam) servers. I've learned to wait on installing updates on the veeam server.
mmonroe
Enthusiast
Posts: 75
Liked: 3 times
Joined: Jun 16, 2010 8:16 pm
Full Name: Monroe
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mmonroe »

I just built a fresh WinSer 2016 server for testing. I updated the system and see that KB4132216 was automatically installed. I created a ReFS D-drive for my Veeam repository according to the best practices. I checked the driver version and see this: 10.0.14393.1613.

I understand that this is not the latest as provided with KB4343884 - 10.0.14393.2457. I checked that on the MSFT web site and it says that KB4132216 must be loaded - check. I pulled down the Server 2016 64-bit version of KB4343884. When I run it I get a pop-up "This is not applicable to your computer". I have double checked and am pretty sure I pulled down the correct file.

I suspect I am doing something silly/goofy. Any tricks on getting KB4343884 - 10.0.14393.2457 to load on a fresh 2016 box?
mmonroe
Enthusiast
Posts: 75
Liked: 3 times
Joined: Jun 16, 2010 8:16 pm
Full Name: Monroe
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mmonroe »

It looks like the server has a newer release (KB4457127) that supersedes KB4343887 and that is why it fails when I try to load it.

Maybe I am not checking the ReFS driver version in the correct place? I checked the properties in the Disk Manager and see 10.0.14393.1613.

Do I need to check it some place else? I just need to make sure the latest ReFS driver 2457 or newer is in place according to Gostev.

Thanks!
nmdange
Veteran
Posts: 527
Liked: 142 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by nmdange »

You need to look at the actual file C:\Windows\system32\drivers\refs.sys
mmonroe
Enthusiast
Posts: 75
Liked: 3 times
Joined: Jun 16, 2010 8:16 pm
Full Name: Monroe
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mmonroe »

Thanks, checking the file shows 10.0.14393.2515. Looks like I am good.
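For anyone repeating this check, the comparison can be sketched as below. This is only an illustration: the helper names are hypothetical, the version string is assumed to have been read from the properties of refs.sys as described above, and the 2457 threshold is simply the build this thread associates with KB4343884.

```python
def parse_version(text):
    """Split a dotted version such as '10.0.14393.2515' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_at_least(installed, minimum):
    """True when the installed version is the same as or newer than the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Assumption: the build reported in this thread for KB4343884.
MINIMUM = "10.0.14393.2457"

# The two values mmonroe reported: Disk Manager properties, then the file itself.
for reported in ("10.0.14393.1613", "10.0.14393.2515"):
    status = "at or above 2457" if is_at_least(reported, MINIMUM) else "older than 2457"
    print(reported, "->", status)
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where "10.0.14393.999" would sort after "10.0.14393.2457".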
mkretzer
Veeam Legend
Posts: 1192
Liked: 412 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mkretzer »

.2515? Sounds like a new version!
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by JimmyO »

mkretzer wrote: Sep 23, 2018 9:03 am .2515? Sounds like a new version!
Correct - it's the latest "2018-09 Cumulative Update for Windows Server 2016 for x64-based Systems (KB4457127)". I've just installed it - let's see how it goes...
ejenner
Veteran
Posts: 636
Liked: 100 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by ejenner »

How many people who are seeing crashes have a CCM (System Center Configuration Manager) agent on their machines?

I have two identical repository servers. Everything is the same apart from two things. The server which is crashing has the very latest ReFS driver and a CCM agent.

As an experiment a few months ago, I removed CCM agent from one of the repositories. That server has stopped crashing. We have hundreds of servers which don't crash and they all run the same set of system management agents.

If anybody is still having trouble it would be interesting to see if CCM has anything to do with it.
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by JimmyO »

I have the agent - no crashes..
ejenner
Veteran
Posts: 636
Liked: 100 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by ejenner »

Mine are very occasional. It's not as if the repositories crash every night. One of them has been totally stable for months. The one which has STOP errors is only protecting 900GB of data because I've not put much on it yet. Since it has been in service it has only crashed 4 times. The other server was crashing more often but hasn't crashed since the CCM agent was removed and now that is the only difference in the configuration between the two ReFS repositories.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

Let us know, JimmyO! Just for the record, which issue were you having here, please?
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

@Gostev I see KB4457127 includes refs.sys 10.0.14393.2515. Any ideas what was fixed in this version of the driver? KB4343884 = 2457
mkretzer
Veeam Legend
Posts: 1192
Liked: 412 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mkretzer » 1 person likes this post

kubimike wrote: Sep 24, 2018 8:30 pm @Gostev I see KB4457127 includes refs.sys 10.0.14393.2515. Any ideas what was fixed in this version of the driver? KB4343884 = 2457
Info from MS support: "most of the memory usage issues have been corrected in this version".

Nevertheless, we had our second BSOD today with REFS. Our NTFS repo had one BSOD in years with ~320 TB of backups, and the REFS repo had 2 BSODs in 3 weeks with ~20 TB of backups.
JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by JimmyO » 2 people like this post

kubimike wrote: Sep 24, 2018 8:27 pm Let us know, JimmyO! Just for the record, which issue were you having here, please?
Until refs.sys .2097 (-ish) was released I had huge performance issues, merging backups could take more than 40 hours, but now the same job takes 2 hours.
I also had issues with unresponsive disks in windows, but no crashes. It may be related to the fact that I always had 384GB of RAM in my Repos.

Guess I was one of the early adopters of ReFS, and about a year+ ago I was working closely with MS to resolve the issues. At some point I actually "gave up" and converted all my repos back to NTFS (took 2 weeks), but now I feel confident that ReFS is the way to go. Actually - it's the ONLY way, since merging backups on NTFS would simply take too long and render VBR unusable (daily backup job on NTFS would take more than 24 hours).

Also - the latest refs.sys .2515 seems stable...
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

w00T! Awesome, maybe I'll upgrade. I'm still on the 2nd beta release lol. No issues with it, I just don't want to be on an experimental driver. :shock:
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

We had issues in the very beginning with REFS like most, but at some point the REFS driver was replaced and this resolved our issues with the backup repo locking up during merges due to CPU/Memory. The working REFS driver version I was on was 10.0.14393.2248. I installed KB4284880 a week ago and the REFS driver was replaced with 10.0.14393.2312, which causes issues.

Only one of my backup jobs was affected, a 7 TB full. It went from doing fast clone merges of 50-70 GB incrementals in 10-15 minutes prior to this update to 4-5 hours after KB4284880 was installed. I didn't notice this was happening until two days ago, when it tried to merge a 1.2 TB incremental and the merge had been running for 15 hours and was only at 41%. I thought it was just hung, so I rebooted the server and performed a retry on the job, and the same thing happened. I uninstalled KB4284880, verified my REFS driver version had reverted, then kicked off another retry, and this time it finished in 45 minutes.

The strange thing with the merge issue this time around is that it was completely opposite to the original CPU/Memory REFS lockup issues back at the first of this year. This time the backup repo didn't lock up, and it was hardly using any resources at all. The CPU was holding steady at 15% and memory at around 2GB. We use fibre attached storage presented as RDMs to the backup repo VM, and these LUNs were also doing next to nothing from a performance standpoint.

I just read a few posts up that the September cumulative update is looking good, but I wanted to share my experience in case there are others who, like us, typically stay a few months behind on Windows updates.
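To put those merge figures in perspective, here is a rough back-of-the-envelope sketch. It assumes merge progress is roughly linear, that 1 TB = 1000 GB, and that the 45-minute retry merged the same 1.2 TB incremental; none of that is confirmed in the post, and the function names are made up for illustration.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Estimate the total runtime, assuming progress stays linear."""
    return elapsed_hours / fraction_done


def throughput_gb_per_hour(size_gb, hours):
    """Average amount of data merged per hour."""
    return size_gb / hours


INCREMENT_GB = 1200  # 1.2 TB incremental, taking 1 TB = 1000 GB (assumption)

# With refs.sys .2312: 41% done after 15 hours.
total_hours = projected_total_hours(15, 0.41)
slow = throughput_gb_per_hour(INCREMENT_GB * 0.41, 15)

# After rolling back to .2248: finished in 45 minutes.
fast = throughput_gb_per_hour(INCREMENT_GB, 0.75)

print(f"projected total with .2312: {total_hours:.1f} h")
print(f"throughput with .2312: {slow:.1f} GB/h, after rollback: {fast:.0f} GB/h")
print(f"slowdown factor: about {fast / slow:.0f}x")
```

Under those assumptions the stalled merge would have needed roughly a day and a half to finish, which fits the observation that the job was slow rather than hung.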
Gostev
Chief Product Officer
Posts: 31624
Liked: 7120 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Bryan, actually this is a well-known performance regression in the corresponding Windows Update. It was much discussed earlier in this thread - back in June after the update was released. You definitely want to skip the summer updates and go straight to the most recent one, especially since the September update from a couple of weeks ago patches the worst security issue of the year so far (privilege escalation via Task Scheduler). Thanks!
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

wow omg, is this it?! It's been so long in the making. I have a thumbdrive on the way so I can use Veeam Agent for Windows to do a bare metal backup of my OS before installing the latest patch! 8)
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

Gostev wrote: Sep 26, 2018 7:52 pm Bryan, actually this is a well-known performance regression in the corresponding Windows Update. It was much discussed earlier in this thread - back in June after the update was released. You definitely want to skip the summer updates and go straight to the most recent one, especially since the September update from a couple of weeks ago patches the worst security issue of the year so far (privilege escalation via Task Scheduler). Thanks!
Gostev

Thanks for the clarification! I followed and posted in this thread near the beginning, but thought we were past all the REFS issues at this point. Since this thread has now reached 82 pages, it would probably be extremely beneficial to others if a REFS cheat sheet were compiled from this thread, with the most up-to-date need-to-know information on this topic, and posted on the Veeam website.
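As a starting point for such a cheat sheet, the update-to-driver pairs mentioned on this page could be collected as in the sketch below. Every entry is a poster's report from this thread rather than vendor documentation, the notes are paraphrased, and the structure and function names are purely illustrative.

```python
# Revision = last field of the refs.sys file version 10.0.14393.<revision>.
# Every entry is a forum report from this thread, not vendor documentation.
THREAD_REPORTS = {
    "KB4284880": (2312, "fast clone merges slowed from minutes to hours"),
    "KB4343884": (2457, "looked good in testing of the lockup scenario"),
    "KB4457131": (2457, "mixed reports: one slower, one back to normal"),
    "KB4457127": (2515, "seems stable; memory usage fixes per MS support"),
}


def updates_with_revision(revision):
    """Return the KB numbers reported to carry the given refs.sys revision."""
    return sorted(kb for kb, (rev, _) in THREAD_REPORTS.items() if rev == revision)


def newest_reported():
    """Return (kb, revision) for the highest revision mentioned in the thread."""
    kb = max(THREAD_REPORTS, key=lambda k: THREAD_REPORTS[k][0])
    return kb, THREAD_REPORTS[kb][0]


# Print the reports ordered by driver revision, oldest first.
for kb, (rev, note) in sorted(THREAD_REPORTS.items(), key=lambda item: item[1][0]):
    print(f"{kb}: refs.sys 10.0.14393.{rev} - {note}")
```

Keeping the revision as an integer makes it easy to compare against whatever version the refs.sys file on a given repository actually reports.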
Gostev
Chief Product Officer
Posts: 31624
Liked: 7120 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

@MikeO not sure what you are talking about.

@Bryan, yes I will be locking this thread down soon anyway, since current state of ReFS is very far from where this thread started, especially after the latest Windows updates.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

@Gostev, I was celebrating because it seems .2515 is the one I want to run, after hearing the news from other folks who have been through the wringer with this driver. That brings me to another question: can I just run Windows Update and expect it to stamp out my beta driver?
Gostev
Chief Product Officer
Posts: 31624
Liked: 7120 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Good question. I would certainly expect it to do so, however I am ReFS MVP - not Windows Update MVP :D
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

So what do you guys tell customers that were on the beta driver then ? lol
Gostev
Chief Product Officer
Posts: 31624
Liked: 7120 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

I would just run Windows Update and make sure the driver is updated by verifying its version.
ejenner
Veteran
Posts: 636
Liked: 100 times
Joined: Mar 23, 2018 4:43 pm
Full Name: EJ
Location: London
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by ejenner »

One of my repositories crashed on 3/10/18.

I still think it's worth pursuing the possibility of incompatibility with the System Centre agent.

So what I've done today is to remove CCM from the crashing repository and put it back onto the stable repository. Then I can see if the currently stable repository starts to crash and the crashing one stops crashing.