dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Jan 31, 2020 6:24 am 1 person likes this post

They are all on a single 200TB volume. It's actually a normal backup repository, not a SOBR, so there is no chance that they could be placed on another volume by mistake.

I would appreciate hearing your feedback by the start of next week. I have been contacted by the Microsoft ReFS team, and they want to find out what is going wrong on our system. I will update you guys if there are any findings that could be of general interest.

Gostev
SVP, Product Management
Posts: 26119
Liked: 4066 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev » Feb 03, 2020 12:04 am

Guys, big ask!

If anyone:
• Is still in the process of migrating to Server 2019, AND
• Has some repositories on 2019 while others are still on 2016, AND
• Is seeing worse performance on 2019 repositories compared to 2016,

the ReFS dev team at Microsoft really needs you to do a few memory/performance dumps to confirm their suspicions. There's a theory now that ReFS latency optimizations in Server 2019 (for VM workloads on ReFS) may have adversely affected throughput for Veeam-type workloads, because latency and throughput are directly connected with one another. The good news is that most of those ReFS improvements are registry tweakable, so they might not even have to update the binary. But they want to confirm first by looking at the two systems (2016 and 2019) side by side in the same environment.
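
One way to picture how latency and throughput can pull against each other is a toy batching model. This is only an illustration and not a description of ReFS internals: it assumes each flush costs a fixed overhead plus a per-item cost, and all numbers are made up.

```python
def batch_stats(batch_size, fixed_overhead_ms=5.0, per_item_ms=0.1):
    """Toy model: one flush handles `batch_size` items and costs a fixed
    overhead plus a per-item cost. All parameters are hypothetical."""
    flush_ms = fixed_overhead_ms + batch_size * per_item_ms
    latency_ms = flush_ms  # an item is done only when its whole flush is done
    throughput_per_s = batch_size / flush_ms * 1000.0
    return latency_ms, throughput_per_s


for size in (1, 10, 100, 1000):
    latency, throughput = batch_stats(size)
    print(f"batch={size:5d}  latency={latency:7.1f} ms  throughput={throughput:8.0f} items/s")
```

In this model, small batches give the lowest latency but the worst throughput, which is the kind of trade-off a latency-oriented tuning could run into with large sequential backup workloads.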

Thank you in advance!

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 03, 2020 5:38 am

It would be excellent if someone could help Microsoft with the info Gostev requested - we just informed them last week how big a disaster 2019 with the new driver is for us.

Our current situation is that our primary repo is on 1903 and works absolutely perfectly. Our remote copy target was on 2016 up until last week - then we read the info from MS and upgraded to 2019 in the hope that we could replace our additional 1903 server.
Now it's the old ReFS horror story all over again: nearly all our backup copy jobs are "hanging" and time out with each copy interval.

The only way out for us now is "upgrading" this system to 1903 if Microsoft cannot help us soon...

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 03, 2020 6:57 am 1 person likes this post

I was contacted by the ReFS devs on Saturday and have already provided the requested logs and dumps. Let's see what they are able to find out. It would be nice if it can be fixed by "just" another registry tweak!

JeremiahDDS
Service Provider
Posts: 11
Liked: 4 times
Joined: Mar 28, 2019 12:52 am
Full Name: Jeremiah Glover
Contact:

Re: Windows 2019, large REFS and deletes

Post by JeremiahDDS » Feb 03, 2020 3:10 pm

I uninstalled the 1/2020 patches because they were killing my Cloud Connect server. I have four 50TB volumes and a couple hundred servers. The ReFS changes were causing 40GB of additional RAM usage and making jobs that used to take a few hours take 24+ hours. I had also tried the ReFS registry changes, which didn't make a difference.

bbuchan
Service Provider
Posts: 8
Liked: 5 times
Joined: May 19, 2016 3:45 pm
Full Name: Bryan Buchan
Contact:

Re: Windows 2019, large REFS and deletes

Post by bbuchan » Feb 03, 2020 6:57 pm

Do I need to reboot after applying these registry changes?

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 04, 2020 9:48 am

Yes, you have to reboot the server.

DonZoomik
Expert
Posts: 184
Liked: 38 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by DonZoomik » Feb 04, 2020 3:06 pm

fsutil behavior set disableDeleteNotify refs 1
I always wondered what effect it would have, as I imagine that the vast majority of real-world deployments are neither on thin-provisioned nor on SSD-backed storage.
Or maybe it affects some internal processing that decides whether space reclamation should even be attempted. Just thinking out loud.
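
For anyone who wants to check the current value before changing it, here is a small sketch that only builds the fsutil command lines and does not run them. The helper name is made up for illustration. A value of 1 disables delete notifications (TRIM/unmap) and 0 enables them.

```python
def delete_notify_commands(disable=True, filesystem="refs"):
    """Return (query, set) fsutil command lines as argument lists.

    Hypothetical helper: it only builds the commands. Running them needs an
    elevated prompt on the Windows host.
    """
    value = "1" if disable else "0"  # 1 = delete notifications (TRIM/unmap) off
    query_cmd = ["fsutil", "behavior", "query", "disableDeleteNotify"]
    set_cmd = ["fsutil", "behavior", "set", "disableDeleteNotify", filesystem, value]
    return query_cmd, set_cmd


query_cmd, set_cmd = delete_notify_commands()
print(" ".join(query_cmd))
print(" ".join(set_cmd))
```

Running the query form first makes it easy to record the previous value, so the change can be reverted if it makes no difference.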

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 04, 2020 5:29 pm

We also got new registry settings from MS - which did not really help.
I am currently creating new dumps and perfmon data...

JeremiahDDS
Service Provider
Posts: 11
Liked: 4 times
Joined: Mar 28, 2019 12:52 am
Full Name: Jeremiah Glover
Contact:

Re: Windows 2019, large REFS and deletes

Post by JeremiahDDS » Feb 05, 2020 7:55 pm

So I applied the 1/2020 updates and experienced performance issues even with the registry changes. I removed the updates and the registry changes but I'm still experiencing performance issues. I was not having any of these performance issues before initially installing these updates. Anyone have any ideas?

Gostev
SVP, Product Management
Posts: 26119
Liked: 4066 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev » Feb 05, 2020 9:39 pm

Based on what you said, there's only one possibility I guess: your issues are simply not connected to the ReFS metadata processing performance. To be fair, there are probably hundreds of other reasons why storage may be acting slow. People using ReFS tend to associate every issue they see with ReFS; however, after spending 12 years at Veeam I can tell you that backup repository performance issues existed well before ReFS was a thing :D

That is not to say Server 2019 LTSC does not have regressions with ReFS performance!

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 08, 2020 1:59 am

We just got a private refs.sys and will test it on our 2019 system... Let's see if it helps.

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 10, 2020 6:26 am

Looking forward to hearing about your experience. I provided LiveKD dumps of both the 2019 and 2016 systems last Friday, but haven't received any feedback yet.

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 10, 2020 6:31 am 2 people like this post

It looks VERY good! I don't know if it is faster than 2016, but from what I can tell after nearly 2 days it is definitely much faster than 2019, no matter which driver version. I asked them if there is hope of getting this into an official update. I will keep you updated.

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 10, 2020 6:45 am

That's good Monday morning news :-)
I may ask them if they can also provide me with that private fix to test it. I guess you're working with the same people at MS as I do, so they should know about it :-)

Gostev
SVP, Product Management
Posts: 26119
Liked: 4066 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev » Feb 10, 2020 10:36 pm

Yes, I can confirm both of you are working with the same people :)

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 11, 2020 10:06 am

I have also been provided with the private ReFS driver. I'm pretty excited to see if it makes a big difference.
Gonna update you guys probably tomorrow.

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 11, 2020 1:42 pm

For us it made a HUGE difference. Like day and night - maybe even a little bit faster than 1903! Did you also get the version with the signature from 07.02.2020?

DonZoomik
Expert
Posts: 184
Liked: 38 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by DonZoomik » Feb 11, 2020 2:16 pm

When would it be released to the public?
When WS2016 had deduplication corruption early in its life, it took something like 2 months from private hotfix to public release, if I remember correctly.

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 11, 2020 2:43 pm

@mkretzer
Yes, it has version 10.0.17763.10000 and is dated 07.02.2020. So far I don't see big differences in creating GFS restore points, but it seems like merging of primary backup files is a little bit faster. I guess I have to wait a little longer to draw a final conclusion.

cmcgee
Lurker
Posts: 2
Liked: never
Joined: Apr 24, 2019 2:28 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by cmcgee » Feb 12, 2020 5:07 pm

Just to add to this thread: I have 2 x Win2019-1809 physical servers, each with 174 TB of ReFS DAS storage for repositories, multi-site with a 1GB WAN. Each server has 128 GB RAM and dual Xeon 4110 CPUs. The ReFS volumes use a 64k cluster size. Backup Copy Jobs run from one site to the other for redundancy, plus tape for air-gapped protection of recent backups.

I've been using it for 1 year and never made any registry tweaks or anything special before. Merging and synth fulls were decent. The exception is a 14 TB file server VM that sometimes seemed to take 24+ hrs.

After applying the 1/2020 patches and the KB4534321 patch, the same big file server job is taking 3 days to complete! Merging other backup files is also frequently taking 7+ hours, when it used to take ~1 hr. This is causing some headaches since it interferes with other jobs... I opened a ticket and it was suggested that I set RefsEnableLargeWorkingSetTrim and DisableDeleteNotify.

Is this still the recommendation? Or should I uninstall the patches mentioned above? Or wait for a hotfix?

Thanks.
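
For reference, a sketch of how those two settings are usually applied. It only builds and prints the commands. The registry path and the DWORD value of 1 are an assumption based on Microsoft's published ReFS memory-usage guidance and are not stated in this thread, so verify them against the current article or your support case first. As noted earlier in the thread, a reboot is needed afterwards.

```python
# Assumed location of the ReFS tuning values; verify before use.
FILESYSTEM_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"


def refs_tuning_commands():
    """Build (not run) the commands for the two settings named in the post."""
    trim_cmd = [
        "reg", "add", FILESYSTEM_KEY,
        "/v", "RefsEnableLargeWorkingSetTrim",
        "/t", "REG_DWORD", "/d", "1", "/f",
    ]
    delete_notify_cmd = ["fsutil", "behavior", "set", "disableDeleteNotify", "refs", "1"]
    return [trim_cmd, delete_notify_cmd]


for cmd in refs_tuning_commands():
    print(" ".join(cmd))
```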

Mgamerz
Expert
Posts: 149
Liked: 25 times
Joined: Sep 29, 2017 8:07 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by Mgamerz » Feb 12, 2020 6:15 pm

We've recently upgraded our backup server (which houses the repos too) to 2019. I remember that back in 2016 there were issues where the system would essentially lock up completely (the mouse would still work, apps effectively would not) when doing a synthetic merge or a large delete, if I recall correctly. We've been having the same issues recently. I'm not sure whether this is related to this thread, but it looks like a regression to that old behavior.

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 14, 2020 6:24 am 3 people like this post

So, having had the private ReFS.sys in place for a few days now, I have observed slightly better behavior in terms of stability. Jobs don't lock up completely as they did before. There is always some activity, but it's still extremely slow most of the time. Merges of copy jobs in particular still take several days instead of minutes. Also, as soon as I pause the scheduled ClearFSCache script, memory consumption starts to rise by about 15-20GB per hour until everything locks up. Leaving it disabled for several hours leads to almost zero disk activity.

After I provided several more system dumps in different states to MS, they informed me that another bug related to ReFS metadata processing has been found in the code. In the dumps I provided, they were able to observe operations that took 40s instead of being processed almost instantly. That finding sounds quite promising to me.

I should receive another private driver tomorrow and will install it as soon as possible. Will give you guys another update by the start of next week.

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 14, 2020 4:35 pm

@dasfliege: How much RAM do you have? I wonder if the new fix just requires a lot of RAM to work. We have 384 GB for ~360 TB of storage.

I will request the new private driver as well :-)

FrancWest
Expert
Posts: 193
Liked: 19 times
Joined: Sep 17, 2017 3:20 am
Full Name: Franc
Contact:

Re: Windows 2019, large REFS and deletes

Post by FrancWest » Feb 14, 2020 9:37 pm

Same issue here. KB4534321 is installed, and the server has 96GB of RAM but only 41% is in use. ReFS merges take ages: I have a GFS merge that has been running for 51 hours and its progress is currently only at 71%.
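
As a rough sanity check on those numbers, and assuming the merge keeps progressing at its average pace so far (which it may well not), 71% after 51 hours extrapolates to roughly 72 hours in total:

```python
def estimate_remaining_hours(elapsed_hours, progress_fraction):
    """Linear extrapolation: assumes the merge keeps its average pace."""
    if not 0 < progress_fraction <= 1:
        raise ValueError("progress_fraction must be in (0, 1]")
    total_hours = elapsed_hours / progress_fraction
    return total_hours - elapsed_hours


remaining = estimate_remaining_hours(51, 0.71)
print(f"about {remaining:.1f} hours left, {51 + remaining:.1f} hours in total")
```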

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 16, 2020 8:09 am

@dasfliege: I got the new private driver as well... One question: how much RAM do you have? We have 384 GB in the 2019 repo for ~320 TB. I wonder if you have less RAM per TB.

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 16, 2020 8:29 am

@mkretzer
We have 192GB of RAM for a ~200TB repo. It's never fully saturated, but consumption rises rapidly when the ClearFSCache script isn't running. Do you still have the script running?
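
To put the RAM per TB question in numbers, here is a quick calculation using only figures posted in this thread. They are approximate, as given by each poster, and mkretzer's repository is quoted as ~320 TB in one post and ~360 TB in another.

```python
# (RAM in GB, repository size in TB), as quoted in the thread
repos = {
    "mkretzer": (384, 320),
    "dasfliege": (192, 200),
    "cmcgee": (128, 174),
}


def ram_per_tb(ram_gb, size_tb):
    """GB of RAM available per TB of repository capacity."""
    return ram_gb / size_tb


for name, (ram_gb, size_tb) in repos.items():
    print(f"{name:10s} {ram_per_tb(ram_gb, size_tb):.2f} GB RAM per TB")
```

The ratios are close enough that RAM per TB alone does not obviously explain the different results, though three data points are not much to go on.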

mkretzer
Expert
Posts: 661
Liked: 150 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » Feb 16, 2020 8:25 pm

Sorry, but what is the "ClearFSCache" script?? :-)
Never heard of it...

dasfliege
Service Provider
Posts: 61
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » Feb 17, 2020 7:08 am

Haha, seriously? It was a hot topic in this thread some pages earlier. It's a script that can be used to automate the RAMMap commands and can be found here: http://www.toughdev.com/content/2015/05 ... -metafile/

Actually, we wouldn't have had any backups for weeks if we weren't running it on a schedule every five minutes. It immediately revives disk activity when it has dropped to zero.
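
For anyone without the script, a similar five-minute schedule could in principle be set up with RAMMap's own command-line switch for emptying the system working set. The sketch below is not the linked ClearFSCache script; it only builds a schtasks command line, and the RAMMap path and task name are placeholders.

```python
def rammap_task_command(rammap_path=r"C:\Tools\RAMMap.exe", every_minutes=5,
                        task_name="EmptySystemWorkingSet"):
    """Build (not run) a schtasks command line for a recurring RAMMap call.

    The path and task name are placeholders. -Es asks RAMMap to empty the
    system working set.
    """
    action = f'"{rammap_path}" -Es'
    return [
        "schtasks", "/Create", "/F",
        "/TN", task_name,
        "/TR", action,
        "/SC", "MINUTE", "/MO", str(every_minutes),
        "/RU", "SYSTEM",
    ]


print(rammap_task_command())
```

This is a workaround that hides the symptom; it does not replace the driver fix being tested.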

DonZoomik
Expert
Posts: 184
Liked: 38 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by DonZoomik » Feb 17, 2020 7:45 am

My new repo with ~300TB of disk space and 128GB of RAM hit its first large compact overnight (~50-60TB). It was pretty much stuck in the morning, with next to no progress. After clearing the system working set in RAMMap, it started making progress again at ~25GB/s (rough estimate). While it was stuck, memory utilization was ~40-50% and CPU ~40-50% (all kernel time, on one socket/NUMA node).
