Locked
suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by suprnova »

Gostev wrote:If I were you, I would just start from clean OS install - this is the only way to really make sure you're using the patch in the way that was tested by Microsoft QC.
Interestingly enough the one that froze on the block clone synthetic full was a fresh OS build with all updates and the full backup was from 2/23. This one also did not have any registry keys when it froze.

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

suprnova wrote:Interestingly enough the one that froze on the block clone synthetic full was a fresh OS build with all updates and the full backup was from 2/23. This one also did not have any registry keys when it froze.
So it sounds like the patch made zero difference for you, which is totally unexpected based on the prior results of other affected users, and that is what makes me suspect it simply did not install properly. I assume you did reboot after installing? Maybe it is worth having Microsoft support check whether it installed properly... because the only other guess I have is that this may be something server-specific, such as lack of RAM?

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

jameskilbynet wrote:We are still seeing some stability issues after this patch. Ours is a 160 TB ReFS volume with approx. 80 TB in use. We have 128 GB of RAM, and this is a storage space (mirror setup) with NVMe cache. We see issues with large data ingestion, i.e. an active full or the evacuation of another repo to the ReFS one. We will open another call with Veeam/MS tomorrow.
James, actually the issues discussed in this thread are caused by the ReFS block cloning API, so they only appear either during synthetic fulls (in the form of very slow performance) or, more typically, during mass deletion of backup files constructed from cloned blocks (in the form of server lockups).

It sounds like your issue happens during the streaming write I/O during creation of brand new files (active fulls or backup copies), and so it is completely unrelated to this discussion. Your issue hasn't been reported by anyone on these forums before, so it looks like some S2D-specific problem. So, please create a separate topic to track this particular issue, once you open the support cases.

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

@suprnova just one more thought, since this thread may have introduced some confusion earlier... please make sure you're installing KB4077525 and not the other one mentioned here by someone a few days ago?

dellock6
Veeam Software
Posts: 6065
Liked: 1875 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by dellock6 »

operations wrote:It would also be nice to know the repo size and largest VM size for those who upgraded.
It may be useful at some point (a suggestion for @gostev) to lock this thread, which has by now become huge and messy, and open a new one where we can collect information about working setups, now that it seems we have a stable driver. I'd really like to see short posts there with just the useful information, like storage size, type, used space, memory and CPU configuration, and the time it takes to run both backups and fast clones. This way we can collect enough information "from the field" to tune our best practice guide even better, and new users can look at this information to get ideas for their new repositories.

Luca
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1

jozne
Influencer
Posts: 17
Liked: 7 times
Joined: Apr 18, 2012 6:55 pm
Full Name: Jari Haikonen
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by jozne » 1 person likes this post

Just installed this patch (KB4077525) on 8 servers that were affected by the issues. The last one went from 100% CPU (it could not boot if the data disk was attached) to this after installation, so "something" is going on, but it is not completely frozen as it was before the update. I'm waiting for a while to see if it settles. If not, I'll once again drop the data disk, check that Windows is "sane", re-apply the patch if needed, check the registry settings once again (they should already be in place) and re-attach the disk.

Image

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

dellock6 wrote:It may be useful at some point (a suggestion for @gostev) to lock this thread, which has by now become huge and messy, and open a new one where we can collect information about working setups, now that it seems we have a stable driver. I'd really like to see short posts there with just the useful information, like storage size, type, used space, memory and CPU configuration, and the time it takes to run both backups and fast clones. This way we can collect enough information "from the field" to tune our best practice guide even better, and new users can look at this information to get ideas for their new repositories.
Yes, I will do this 1-2 weeks from now, after receiving the first wave of confirmations of improvements, such as the previous post.

suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by suprnova »

Gostev wrote:@suprnova just one more thought, since this thread may have introduced some confusion earlier... please make sure you're installing KB4077525 and not the other one mentioned here by someone a few days ago?
Yes that's the one, available from built-in Windows updates in control panel.

Rush700
Service Provider
Posts: 4
Liked: 1 time
Joined: Feb 27, 2018 8:02 pm
Full Name: Derek Zylka
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Rush700 » 1 person likes this post

No luck here. I installed the patch yesterday and found the server hung up this morning with the CPU at 100%.

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

suprnova wrote:Yes that's the one, available from built-in Windows updates in control panel.
Actually, the update you need is NOT available through Windows Update (and it won't be published there any time soon). You can only get it as a standalone package from Microsoft Update Catalog at this time, just as KB4077525 explains at the bottom. I guess that explains why you're seeing no changes :wink:

suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by suprnova »

Interesting...not sure what everyone else's experience was, but I've installed it on 9 repos and every time it's available through Windows Update. I also just downloaded it manually like you suggested, but I'm getting that it's already installed. Also, my refs.sys driver was updated 2/22.

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Strange indeed, because I was explicitly warned that this update won't be published on Windows Update and needs to be downloaded separately. And even the wording is different for this patch and the February cumulative update:
KB4074590 (February) wrote:This update will be downloaded and installed automatically from Windows Update. To get the standalone package for this update, go to the Microsoft Update Catalog website.
KB4077525 (ReFS) wrote:To get the standalone package for this update, go to the Microsoft Update Catalog website.
Anyway, at this point I have exhausted all my ideas on why you're not seeing any difference after applying this update. If you want to continue troubleshooting yourself, then perhaps try reformatting the volume with the ReFS patch installed (as it might simply have been corrupted during the freezes caused by the original driver). But I would rather open a case with Microsoft, as they may have already seen the same while giving out a pre-release driver to the affected customers.

jzilak
Influencer
Posts: 19
Liked: 1 time
Joined: May 10, 2017 9:01 am
Full Name: Josef Zilak
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by jzilak »

Hello,
KB4077525 includes all the performance patches developed over the last 6 months. But there is still one regression in memory usage: large systems still need at least 64 GB of RAM to work around system crashes during peak I/O operations. A fix for this regression will be released a few months later.

GA for KB4077525 will be available with the WS 2016 CU 2018-03.
Anyway, the base code was backported from RS4, so RS4 or WS 1803, when released, can be used as a repository server with many other ReFS improvements not yet available for RS1.

regards
josef

jozne
Influencer
Posts: 17
Liked: 7 times
Joined: Apr 18, 2012 6:55 pm
Full Name: Jari Haikonen
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by jozne » 1 person likes this post

Rush700 wrote:No luck here. I installed the patch yesterday and found the server hung up this morning with the CPU at 100%.
Yup, same thing for us. It complained about the repo not being in sync, so I rescanned and rebooted, and after 1-2 min the system hung with 100% CPU usage.

Now I'm beginning to think that we should change the backup type for our onsite jobs. They are currently using incremental with synthetic transforms, but we only need 40 days of backups onsite. Which method would work best with ReFS at the moment, given these problems? Reverse incremental? How about health checks, should those be turned on or off then? I've seen problems while performing health checks on the ReFS system too.

jozne
Influencer
Posts: 17
Liked: 7 times
Joined: Apr 18, 2012 6:55 pm
Full Name: Jari Haikonen
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by jozne »

By the way, can someone explain why synthetic fulls that leverage the ReFS API work in an instant using fast clone (https://www.veeam.com/blog/advanced-ref ... suite.html), but GFS using synthetic full to a cloud repository takes forever?

I just checked the speed at which GFS is creating the synthetic full with fast clone, and it's only about 45 MB/s. Does it not leverage the ReFS API on the cloud side?

mweissen13
Service Provider
Posts: 85
Liked: 45 times
Joined: Dec 28, 2017 3:22 pm
Full Name: Michael Weissenbacher
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mweissen13 »

Gostev wrote:Actually, the update you need is NOT available through Windows Update (and it won't be published there any time soon). You can only get it as a standalone package from Microsoft Update Catalog at this time, just as KB4077525 explains at the bottom. I guess that explains why you're seeing no changes :wink:
On our systems the update KB4077525 was available through regular Windows Updates (via 'Check online for updates from Microsoft Update'). But not via WSUS, where we had to manually add it through the Update Catalog. Maybe this is causing the confusion?

Rush700
Service Provider
Posts: 4
Liked: 1 time
Joined: Feb 27, 2018 8:02 pm
Full Name: Derek Zylka
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Rush700 »

KB4077525 was installed automatically via Windows Update without WSUS for me as well. The catalog states it is the 2018-02 Cumulative update. I checked and it does not appear to have updated the ReFS driver. In fact the driver for that particular disk says 6/21/2006 and is version 10.0.14393.1613. Is there another place I should be checking the ReFS driver version?

jayscarff
Service Provider
Posts: 111
Liked: 11 times
Joined: Nov 15, 2016 6:56 pm
Location: Cayman Islands
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by jayscarff »

This is where I got my download from - https://www.catalog.update.microsoft.co ... ?q=2018-02
Jason
VMCE v9

Cullan
Service Provider
Posts: 136
Liked: 19 times
Joined: May 21, 2014 8:47 am
Location: New Zealand
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Cullan »

Out of curiosity is everyone running Veeam 9.5 U3?
I was holding off updating to this version due to rumors of ReFS fast clone being disabled.

Can we confirm the Veeam recommended version of B&R when using ReFS repos?

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Cullan wrote:I was holding off updating to this version due to rumors of ReFS fast clone being disabled.
Not true.
Cullan wrote:Can we confirm the Veeam recommended version of B&R when using ReFS repos?
Advanced ReFS integration was added in 9.5 release.

wlcu
Novice
Posts: 9
Liked: never
Joined: Apr 13, 2017 7:05 pm
Full Name: WLCU
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by wlcu »

For those unaware: I believe the 2018-02 Cumulative Update requires your server to have a registry key (either entered manually or created automatically by your AV software) to mitigate any incompatibilities between the update and the AV software; otherwise the update will not be presented to the server. The KB article explains this at the bottom. The update will not appear in WU unless the key exists. If you don't see it in your registry, add it and check again. The update will then appear.

Key="HKEY_LOCAL_MACHINE" Subkey="SOFTWARE\Microsoft\Windows\CurrentVersion\QualityCompat"
Value Name="cadca5fe-87d3-4b96-b7fb-a231484277cc"

A better explanation can be read here: https://support.microsoft.com/en-us/hel ... s-software
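
For anyone who wants to script this, the following is a minimal Python sketch that only builds the `reg add` command line for the key and value above; it does not execute anything. The value type (REG_DWORD) and data (0) are not stated in this post. They are assumed here from the Microsoft article linked above and should be verified there before use. The function name `quality_compat_command` is hypothetical.

```python
# Key and value name as given in the post above.
KEY_PATH = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\QualityCompat"
VALUE_NAME = "cadca5fe-87d3-4b96-b7fb-a231484277cc"


def quality_compat_command(value_type="REG_DWORD", data="0"):
    """Build the argument list for `reg add` that creates the compatibility value.

    value_type and data are assumptions (not given in the thread); check them
    against the Microsoft article before running the command.
    """
    return [
        "reg", "add", KEY_PATH,
        "/v", VALUE_NAME,   # value name
        "/t", value_type,   # value type (assumed REG_DWORD)
        "/d", data,         # value data (assumed 0)
        "/f",               # overwrite without prompting
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(" ".join(quality_compat_command()))
```

The printed command would need to be run from an elevated prompt on the affected server, followed by another check for updates.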

suprnova
Enthusiast
Posts: 38
Liked: never
Joined: Apr 08, 2016 5:15 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by suprnova »

Rush700 wrote:KB4077525 was installed automatically via Windows Update without WSUS for me as well. The catalog states it is the 2018-02 Cumulative update. I checked and it does not appear to have updated the ReFS driver. In fact the driver for that particular disk says 6/21/2006 and is version 10.0.14393.1613. Is there another place I should be checking the ReFS driver version?
That is really strange. My refs.sys was updated in all cases to 10.0.14393.2097.

tsightler
VP, Product Management
Posts: 5964
Liked: 2817 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by tsightler » 2 people like this post

Rush700 wrote:KB4077525 was installed automatically via Windows Update without WSUS for me as well. The catalog states it is the 2018-02 Cumulative update. I checked and it does not appear to have updated the ReFS driver. In fact the driver for that particular disk says 6/21/2006 and is version 10.0.14393.1613. Is there another place I should be checking the ReFS driver version?
I believe you may be looking in Device Manager at the disk driver, which is indeed that version (and that old date), but this is not the filesystem driver. Look in C:\Windows\System32\Drivers for Refs.sys and see what version that file is; it should show 10.0.14393.2097 and be dated 2/12/2018.
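
If you are checking several repositories, the comparison itself is easy to script. The following is a minimal Python sketch that only compares dotted version strings; it assumes you have already read the file version of Refs.sys by some other means (for example from the file's Properties dialog). The names `parse_version` and `has_patched_refs_driver` are hypothetical, and the threshold is simply the version reported in this thread for build 14393, not a value taken from official documentation.

```python
# refs.sys version reported in this thread after installing KB4077525
# on build 14393 (an assumption taken from the posts, not from Microsoft docs).
PATCHED_REFS_VERSION = "10.0.14393.2097"


def parse_version(text):
    """Turn a dotted version string such as '10.0.14393.2097' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_patched_refs_driver(installed, minimum=PATCHED_REFS_VERSION):
    """Return True if the installed refs.sys version is at least `minimum`.

    Tuples of ints compare field by field, so '...10000' correctly sorts
    after '...2097', which a plain string comparison would get wrong.
    """
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # An older build-14393 version string and the patched one from this thread.
    for version in ("10.0.14393.1613", "10.0.14393.2097"):
        print(version, has_patched_refs_driver(version))
```

Comparing integer tuples rather than raw strings avoids mis-ordering revisions with different digit counts.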

Phorward
Influencer
Posts: 15
Liked: never
Joined: Oct 30, 2013 1:13 pm
Full Name: Hans Hedman
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Phorward »

Has the update been withdrawn for Windows Server 2016? I can find it for Windows 10 1607, but it doesn't show up for Win 2016 in the Microsoft Update Catalog, and the links on Google pointing to the catalog are dead.

adapterer
Expert
Posts: 227
Liked: 45 times
Joined: Oct 12, 2015 11:24 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by adapterer »

I noticed this and for what it's worth the filenames and sizes were exactly the same for W10 and 2016.

Phorward
Influencer
Posts: 15
Liked: never
Joined: Oct 30, 2013 1:13 pm
Full Name: Hans Hedman
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Phorward »

Ah, I tried the Win 10 update and failed but I notice now that I had the x86 version. The Win 10 x64 update worked.

JimmyO
Enthusiast
Posts: 55
Liked: 9 times
Joined: Apr 27, 2014 8:19 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by JimmyO » 3 people like this post

Installed KB4077525 on one of my smaller backup servers with a 58 TB volume (backup file: 6 TB full, 200 GB increment), and merge time went from 8h to 1h after the first daily backup.
Of course - the server has just been restarted, but first impressions are good :)

Bacon
Novice
Posts: 3
Liked: 1 time
Joined: Jan 26, 2018 9:35 am
Full Name: Alexander
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Bacon » 1 person likes this post

I still have the problems with KB4077525 installed. I think it even got worse on the second destination server. The job crashes while it is running and not after.

operations
Service Provider
Posts: 12
Liked: never
Joined: Nov 25, 2017 6:49 pm
Full Name: operations
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by operations »

From what I understand, the rollup that contains the fix for ReFS also contains patches for Meltdown and Spectre. Has anyone tried to extract the patch or updates and install just the ReFS fix, so as to avoid the Meltdown/Spectre patches?

Gostev
SVP, Product Management
Posts: 30027
Liked: 5936 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

Actually, this is an interesting thought. The pre-release driver was provided by itself, and I know for a fact that it helped every single person who installed it... but now we're installing a huge patch to get the same driver, so it is certainly a possibility that something else in this patch is misbehaving. Although this is not very likely, because in that case it would misbehave for everyone. Usually, when experiences are so dramatically different, this indicates some environment-specific trigger such as computer configuration (lack of system resources) or a 3rd party software conflict (where antivirus is the primary suspect). I hope those still having issues after installing the patch are able to investigate them with Microsoft.
