bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

[MERGED] REFS 3.1 backup repo locking up high CPU

Post by bryand82487 »

I've seen that there have been a lot of memory issues with REFS, but I'm having a high CPU issue that locks up the backup repo server and requires me to reboot it. One job is causing the issue. It had been running fine since we set up the REFS 3.1 repo, until the retention policy kicked in and the job started trying to remove backup files at the end of its run. The job shows that the backup completed successfully and the old backup files were removed, but the backup files still remain in the job's backup folder on the repository. Every night when the job runs, it tries to remove those same backup files and locks up the backup repo server.

I've monitored the backup repo during this process. As soon as the Veeam job shows that it's removing restore points per the retention policy, the SYSTEM process on the backup repository server jumps to around 75% CPU, and shortly after the server locks up and I can't access it or do anything with it other than reboot it. I've left it that way overnight thinking the CPU might eventually come back down, and it doesn't.

My backup repository server is a virtual Server 2016 VM with 8 CPUs and 12 GB of memory. It was previously running fine with 4 CPUs; I doubled them once this issue came up and that didn't help. I'm using REFS 3.1 with 64k blocks. The backup repository is 4 LUNs presented from a Nimble array over Fibre Channel. The LUNs are presented to the backup repository server as Raw Device Mappings, and within Windows they form a 30 TB spanned volume. Although I'm not having a memory issue, I tried applying the RefsEnableLargeWorkingSetTrim=1 registry entry to see if it would make a difference, and it did not. The Windows patch 4013429 has been applied. The backup job having the issue has two 3 TB file servers. I'm running incremental backups once a day and a synthetic full weekly. The retention for the job is 30.

I'm not sure what process Veeam uses to clean up old backup files, but as a test I cloned the 4 LUNs that make up my backup repository and presented them to another 2016 server VM with the same resources as my backup repo server. There I manually deleted, all at once, the backup files that the problem job tries to delete from the repo when it runs, and I was able to delete them without any issues. All my other jobs, which aren't having any issues, also have a 30 day retention and do a synthetic full weekly, but none of the VMs in those jobs are anywhere near the size of my 3 TB file servers.

I opened a case with Veeam on Monday and haven't heard anything back yet. I gave them a call this morning and had them change it to a sev2. I was hoping someone on here has encountered this issue or something similar and may have some insight.
Gostev
Chief Product Officer
Posts: 31806
Liked: 7300 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Gostev »

WimVD wrote:Just read Anton's digest and finally feeling a bit confident things will get fixed. Thank you all and merry christmas :D
And just in case anyone missed one, copying the referenced part here:
Gostev wrote:I had a great call with the ReFS team at Microsoft last week, and finally there is some good news to share on the infamous ReFS issue. Apparently, the reason it took so long to fix is that there are multiple separate issues! For about 20 min, I just listened to the lead developer going through all of them, and their findings made perfect sense. I can see now how hard it was to identify and separate all these individual issues, when some were even outside of ReFS – for example, caused by an NTFS-specific optimization in the OS memory management. And it also makes perfect sense now why only some of our customers were affected – for example, small ReFS volumes should be much less impacted by these bugs. Anyway, long story short – all known issues seem to be fully fixed in RS4 (as validated by multiple impacted customers who Microsoft worked with directly), and they are now waiting for approval to include the backport into RS1 – so if all is well, the fixes should arrive in one of the next cumulative updates. Meanwhile, I suppose it would be a good idea for everyone affected to get the private fix from Microsoft Support – we're getting one for our labs too.
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

I didn't read the whole thread since it's 47 pages long, but should this resolve my REFS issue? I'm already using 64k blocks, and my problem is high CPU usage locking up the backup repo server when Veeam does deletes per the retention policy. Or is the fix only for high memory usage with 4k blocks?
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS 4k horror story

Post by bryand82487 »

thomas.raabo wrote:With this new update I was told to remove all the ReFS registry keys I already had.

Along with the update came the new recommendation:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"RefsProcessedDeleteQueueEntryCountThreshold"=dword:00010000
"RefsEnableLargeWorkingSetTrim"=dword:00000001
"RefsCheckpointSampleInterval"=dword:00000001
"RefsContainerRotationSampleInterval"=dword:00000001
"RefsLogFileFullSampleInterval"=dword:00000001
;"RefsDisableRefCountParallelDelete"=dword:00000001
;"RefsNumberOfChunksToTrim"=dword:0
;"RefsEnableInlineTrim"=dword:00000001
;"RefsDisableCachedPins"=dword:00000001
Could you please provide the update file name so that I can call and request it? I tried calling and referencing the case number you provided, and they think I want information regarding your case. If I can get the filename, I will call and try referencing that without mentioning your case number. It's not just replacing the refs.sys driver and creating those reg keys, is it? I'm assuming there's an actual exe that you had to run?
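For anyone comparing the hex values in the quoted .reg fragment with decimal values given elsewhere in this thread, here is a minimal Python sketch that decodes the active (uncommented) dword lines. The helper name parse_dwords is made up for illustration, and the fragment is copied from the quote above. It only reads text and does not touch the registry.

```python
import re

# Copied from the quoted recommendation; lines starting with ';' are comments.
REG_FRAGMENT = '''\
"RefsProcessedDeleteQueueEntryCountThreshold"=dword:00010000
"RefsEnableLargeWorkingSetTrim"=dword:00000001
"RefsCheckpointSampleInterval"=dword:00000001
"RefsContainerRotationSampleInterval"=dword:00000001
"RefsLogFileFullSampleInterval"=dword:00000001
;"RefsDisableRefCountParallelDelete"=dword:00000001
;"RefsNumberOfChunksToTrim"=dword:0
;"RefsEnableInlineTrim"=dword:00000001
;"RefsDisableCachedPins"=dword:00000001
'''

# "Name"=dword:<1 to 8 hex digits>
DWORD_LINE = re.compile(r'^"([^"]+)"=dword:([0-9a-fA-F]{1,8})$')


def parse_dwords(text: str) -> dict[str, int]:
    """Return {value name: integer} for every active dword line."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blanks and commented-out entries
        match = DWORD_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2), 16)
    return values


for name, value in parse_dwords(REG_FRAGMENT).items():
    print(f"{name} = {value} (decimal)")
```

In a .reg file the digits after dword: are hexadecimal, so 00010000 decodes to 65536 in decimal, not ten thousand.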
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

I'm not having any luck obtaining this private fix from Microsoft. We don't have an MS support agreement, so I can't get anyone at MS on the phone who can help me get my hands on the update.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

I have an open ticket with Microsoft and they won't give me a copy of the latest beta either. I am still running the previous private fix. I'm still trying to figure out why I can't test it out. Glad to see this thread is properly renamed to what it should be! :shock:
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike » 1 person likes this post

@bryand82487 Filename? It's REFS. The version is what counts: '10.0.14393.1934'
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

bryand82487 wrote:I didn't read the whole thread since it's 47 pages long, but should this resolve my REFS issue? I'm already using 64k blocks, and my problem is high CPU usage locking up the backup repo server when Veeam does deletes per the retention policy. Or is the fix only for high memory usage with 4k blocks?
Do an active full; this stops the crashing for me (hot add mode will get this done quickly). Wait a few days afterwards, then reduce your restore points by one each day so it's a bit more manageable for the OS (say, down to a max of 10 or so). Also note the time between your last active full and the one you're doing now. That will give you a sense of how often you need to do it to prevent freezing/crashing.

Also, if you're running the old private fix, or perhaps even the latest public release of the refs driver, make sure you have "RefsProcessedDeleteQueueEntryCountThreshold"=dword:00000200 set :mrgreen:
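The gradual retention reduction suggested above can be laid out as a day-by-day plan. This is only a rough Python sketch: it assumes the 30 restore points mentioned earlier in the thread as the starting value and reads "max 10 or so" as the target, and the function name retention_steps is hypothetical. It just prints a schedule; applying each value in the job settings is still a manual step.

```python
def retention_steps(current: int, target: int) -> list[int]:
    """Retention value to set on each successive day, dropping by one per day."""
    if target > current:
        raise ValueError("target must not exceed the current retention")
    return list(range(current - 1, target - 1, -1))


# Assumed numbers: 30 restore points today, aiming for about 10.
for day, points in enumerate(retention_steps(30, 10), start=1):
    print(f"day {day}: set retention to {points} restore points")
```

With these assumed numbers the plan runs for 20 days, so each nightly run only has one extra restore point to remove.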
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

kubimike wrote:@bryand82487 Filename? It's REFS. The version is what counts: '10.0.14393.1934'
Thanks! It looks like MS actually released a cumulative update yesterday. I just installed it hoping for this refs driver to be included but it was the 1770 driver.
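One way to check whether an installed refs.sys is older than the private-fix build mentioned above is to compare the dotted version numbers numerically rather than as text. A minimal Python sketch follows. It assumes you read the version string yourself (for example from the file's properties), and it assumes the build component is 14393, as in the other Server 2016 versions quoted in this thread. The function names are illustrative.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version such as '10.0.14393.1770' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed: str, reference: str) -> bool:
    """True when `installed` sorts strictly before `reference`."""
    return parse_version(installed) < parse_version(reference)


# Version quoted in this thread for the private fix (build 14393 assumed).
PRIVATE_FIX = "10.0.14393.1934"

for candidate in ("10.0.14393.1770", "10.0.14393.1934"):
    status = "older than" if is_older(candidate, PRIVATE_FIX) else "at or above"
    print(f"{candidate} is {status} {PRIVATE_FIX}")
```

Comparing tuples of integers avoids the trap of string comparison, where a revision like 999 would sort after 1934.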
Lunatic Magnet
Influencer
Posts: 17
Liked: 3 times
Joined: Oct 18, 2017 6:40 pm
Contact:

Re: REFS 4k horror story

Post by Lunatic Magnet »

Lunatic Magnet wrote: No, mine is currently 10.0.14393.1770. Windows 2016 build 1607 (14393.1884) with all latest patches. I'll see what support has to say
After speaking to Microsoft, they confirmed what Gostev had posted (see above): an update is coming, ETA unknown. Since we just passed Patch Tuesday, hopefully January. They're not going to release a beta driver if the proper fix is forthcoming. Still, I couldn't get it any earlier.
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

One thing that I don't understand about my issue: my backup repo server locks up with high CPU usage when Veeam tries to remove backup files for a specific job per the retention policy. But if I select all the files that Veeam wasn't able to remove from within Windows and hit delete, there is some high CPU usage for several minutes, yet they all get deleted without the server locking up.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

bryand82487 wrote:One thing that I don't understand about my issue: my backup repo server locks up with high CPU usage when Veeam tries to remove backup files for a specific job per the retention policy. But if I select all the files that Veeam wasn't able to remove from within Windows and hit delete, there is some high CPU usage for several minutes, yet they all get deleted without the server locking up.
Best guess: shortly after a job runs (cloning, merging), that data hasn't been released from RAM yet, because a checkpoint hasn't occurred or because of some other issue with the refs driver. Then along comes a delete operation where it has to remove the file. That file is probably pointing to thousands of references and the OS has to weed all that out. With memory in short supply (because it just ran a job), it doesn't have enough to complete this operation. Please apply the registry keys and do an active full. Let us know how it goes.

HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512

veeam-backup-replication-f2/refs-4k-hor ... ml#p254231

Also, how much RAM does your box have? I'm at 196 GB.
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

LOL, I only have 12 GB on my server, but about 11 GB of active memory is the highest it ever gets. I will give those registry entries a try and see what happens. I'm already using the first one but not the others. I appreciate the help!
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

Well, I can tell you that amount isn't going to cut it. Even with 196 GB, doing synthetic fulls on 8 TB daily, I start to freeze 2-1/2 months later because I run out of resources. You're going to need a lot more memory to use REFS. Even with the registry keys it's still likely not going to work. Have you done an active full yet? Tell me about the job that causes the freeze. How is it configured?

BTW
those are D E C I M A L values. Pay attention :)
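Since a .reg file expects hexadecimal after dword:, the decimal values posted earlier in the thread need converting before they go into one. The sketch below is a rough Python illustration, not a vetted tool: the key path and the four value names are the ones listed in this thread, while to_reg_dword and render_reg are hypothetical helper names. It only prints text and does not modify the registry.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"

# Decimal values as posted in this thread.
DECIMAL_VALUES = {
    "RefsEnableLargeWorkingSetTrim": 1,
    "RefsNumberOfChunksToTrim": 32,
    "RefsDisableCachedPins": 1,
    "RefsProcessedDeleteQueueEntryCountThreshold": 512,
}


def to_reg_dword(value: int) -> str:
    """Format a decimal integer the way a .reg file writes a DWORD (8 hex digits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"dword:{value:08x}"


def render_reg(values: dict[str, int]) -> str:
    """Build the text of a .reg fragment for the given decimal values."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    lines += [f'"{name}"={to_reg_dword(number)}' for name, number in values.items()]
    return "\n".join(lines) + "\n"


print(render_reg(DECIMAL_VALUES))
```

Decimal 512 comes out as dword:00000200 and decimal 32 as dword:00000020, which is why typing 512 or 32 straight into a hex field would set a much larger value than intended.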
bryand82487
Influencer
Posts: 14
Liked: 2 times
Joined: Dec 13, 2017 3:31 pm
Full Name: Bryan Dennison
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by bryand82487 »

The only active full I've done was at the end of October when we set up the REFS backup repository. I really don't want to add more space to the repository just so that I can run another active full, since we designed it with synthetic fulls in mind precisely to avoid needing that space.

The job consists of two file servers and processes 7.3 TB. When this job starts, all other jobs have been completed for over an hour. I'm running an incremental with a weekly synthetic full, and have a backup file health check once a month. Compression is set to optimal and inline dedupe is disabled on the job. The retention is set to 30 days.

The backup repository is configured to align backup file data blocks, decompress backup data blocks before storing, and use per-VM backup files. These were the recommended best-practice backup repository settings when using array-side dedupe, which we are. The backup repository is a Server 2016 VMware VM with a 30 TB REFS volume that consists of four 7.5 TB LUNs presented to it from a hybrid Nimble array as raw device mappings over 8 Gb Fibre Channel.

All my backup jobs finish within 30 minutes, and the synthetic fulls using fast clone take on average 2 or 3 minutes for all my jobs except the one with the issue. That one normally takes only 20-55 minutes, which doesn't seem bad compared to the times a lot of people on here are seeing.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike »

That's really not how synthetic fulls are supposed to be used. You need space to run an active full, or else you run into the condition you see now. Add more space, run a full, and see if it stops. You need to do this every so often anyway.
WimVD
Service Provider
Posts: 60
Liked: 19 times
Joined: Dec 23, 2014 4:04 pm
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by WimVD » 7 people like this post

For those interested, I can confirm that the private fix solves all issues we had.
We mainly had issues with simultaneous merges that would bring our backups to a crawl.
We obtained the fix (refs.sys 1934) after opening a MS support case and after installation merges are no longer a problem.
Merges are fast and backups run at full speed :)
righter
Influencer
Posts: 11
Liked: 1 time
Joined: May 21, 2015 9:01 am

Re: REFS issues (server lockups, high CPU, high RAM)

Post by righter »

Can anyone share the patch from MS?
BramV
Novice
Posts: 4
Liked: never
Joined: May 11, 2015 3:10 pm
Full Name: Bram
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by BramV »

Really annoying that you need to create a case at MS to receive the fix. We don't have any kind of support contract with them.
I have an upcoming merge that will probably break one of our jobs.
tdewin
Veeam Software
Posts: 1818
Liked: 655 times
Joined: Mar 02, 2012 1:40 pm
Full Name: Timothy Dewin
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by tdewin »

Unfortunately, we cannot share these kinds of fixes, as they are not our software.

BTW, in general, the reason cases and patches go hand in hand is that support can then follow up on and contain any issues that occur because of the hotfix.
kubimike
Veteran
Posts: 391
Liked: 56 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by kubimike » 1 person likes this post

I have a ticket with Microsoft (mind you, I PAID) and still can't get a copy.
florian.meier
Service Provider
Posts: 53
Liked: never
Joined: Dec 01, 2014 11:40 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by florian.meier »

It's a disaster!!
I absolutely disagree with Veeam's behavior. REFS is recommended in your best practices!

Please put more pressure on Microsoft. Many here need a solution, urgently.
Mike Resseler
Product Manager
Posts: 8191
Liked: 1322 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mike Resseler »

Florian,

I can understand that you disagree with us, but believe me, we are putting on as much pressure as we can. If you read Anton's community forums digest from a few weeks ago, he even joined a call with the ReFS PM team where the lead dev explained all the issues (there are several) and how much work it took to find them. The solution is already in RS4, but it needs to be backported to the previous version of Windows 2016, and they need to get permission for that. In the meantime there is a private fix, and I honestly have no idea why some get it and some don't. I can only hope that it is in the next cumulative update.

Cheers
Mike
mark_e
Novice
Posts: 8
Liked: 2 times
Joined: Oct 10, 2016 10:13 am
Full Name: Mark Edmonds
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by mark_e »

So we logged a call with MS and they said we can't have the fix because it was supplied under Premier Support. We have Professional support.
Hopefully it comes in next Tuesday's updates.
Mike Resseler
Product Manager
Posts: 8191
Liked: 1322 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Mike Resseler »

Mark,

Thanks for letting us know. This is something new for me, that you can't get a hotfix unless you are at another level of support...
And yes, I am crossing my fingers also for the next update.
florian.meier
Service Provider
Posts: 53
Liked: never
Joined: Dec 01, 2014 11:40 am
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by florian.meier »

Mike,

Thank you for your response. Just to clarify, what is really annoying for me is that we are an MS Gold Partner and also a Veeam Platinum Partner,
but we cannot get the patch! We have the same problem as mark_e. Microsoft denies it because we have no Premier Support.
(As far as I know, Premier Support is about 50k-100k per year.)

For us it's not a "partnership" if we are on the highest certification level and we can't get a really urgent fix.
It's not important whether it's Veeam's or Microsoft's fault. It's important that both companies try to help their customers in any way,
especially Veeam, because you recommend REFS in all your documents.
As we all know, there is a hotfix, so please hand it out; some customers really have serious problems.

Hope you understand my situation.
thomas.raabo
Service Provider
Posts: 28
Liked: 11 times
Joined: Oct 31, 2016 6:27 pm
Full Name: Thomas Raabo
Location: infrastructure guy
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by thomas.raabo »

florian.meier wrote:Mike,

Thank you for your response. Just to clarify, what is really annoying for me is that we are an MS Gold Partner and also a Veeam Platinum Partner,
but we cannot get the patch! We have the same problem as mark_e. Microsoft denies it because we have no Premier Support.
(As far as I know, Premier Support is about 50k-100k per year.)

For us it's not a "partnership" if we are on the highest certification level and we can't get a really urgent fix.
It's not important whether it's Veeam's or Microsoft's fault. It's important that both companies try to help their customers in any way,
especially Veeam, because you recommend REFS in all your documents.
As we all know, there is a hotfix, so please hand it out; some customers really have serious problems.

Hope you understand my situation.
Lol, I totally get where you are coming from...

What's your email?
Ctek
Service Provider
Posts: 84
Liked: 13 times
Joined: Nov 11, 2015 3:50 pm
Location: Canada
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by Ctek »

Anyone want to share that hotfix? :D
VMCE
adrenaline_x
Influencer
Posts: 17
Liked: 2 times
Joined: May 03, 2016 4:24 am
Full Name: Mike Fuller
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by adrenaline_x »

I got a call back from MS support today after I escalated the case, as they would not give me the patch for our 3 repos. They told me that they have since pulled the patch from the premium support users, because after reboots the fix stops working and they are working to resolve it. They have added me to the parent case so I will be notified when they resolve it, but as of RIGHT NOW MS is recommending that we stop using REFS and go back to NTFS.

wow..

So that's the rest of my day: moving backup files to different servers so I can go back to NTFS.
thomas.raabo
Service Provider
Posts: 28
Liked: 11 times
Joined: Oct 31, 2016 6:27 pm
Full Name: Thomas Raabo
Location: infrastructure guy
Contact:

Re: REFS issues (server lockups, high CPU, high RAM)

Post by thomas.raabo »

I think you were just fed bullshit. I'm one of the first on the patch from MS, and my patch has not been pulled.

And I have rebooted many times and the fix is still working.