REFS 4k horror story

Re: REFS 4k horror story

by WimVD » Fri Mar 17, 2017 12:27 pm

@Gostev, no problem, the KB actually contains most of the information we require.
Not sure if I missed it the first time I read the KB or if it has been revised in the meantime, but there is some guidance on the registry keys:

Recommendation

If a large active working set causes poor performance, first try to set RefsEnableLargeWorkingSetTrim = 1.
If this setting doesn’t produce a satisfactory result, try different values for RefsNumberOfChunksToTrim, such as 8, 16, 32, and so on.
If this still doesn’t provide the desired effect, set RefsEnableInlineTrim = 1.
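
For anyone who wants to script the escalation order above, here is a minimal sketch that builds the text of a .reg file for a given step. It assumes the three values are REG_DWORDs under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem (please check the KB for the exact location before importing anything) and that each step keeps the settings from the previous one. The function name and the step numbering are made up for illustration.

```python
# Assumption: the KB places these REG_DWORD values under this key; verify first.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"


def build_reg_file(step, chunks_to_trim=8):
    """Return .reg file text for escalation step 1, 2 or 3.

    Assumption: steps are cumulative, i.e. step 2 keeps the step 1 setting.
    """
    if step not in (1, 2, 3):
        raise ValueError("step must be 1, 2 or 3")

    values = {"RefsEnableLargeWorkingSetTrim": 1}
    if step >= 2:
        # The KB suggests trying 8, 16, 32, and so on.
        values["RefsNumberOfChunksToTrim"] = chunks_to_trim
    if step >= 3:
        values["RefsEnableInlineTrim"] = 1

    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    for name, value in values.items():
        # .reg files encode DWORD values as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(2, chunks_to_trim=16))
```

The output is only text, so nothing changes on the host until the file is reviewed and imported by hand, followed by a reboot as several people in this thread did.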


I currently have two 80TB ReFS proxies that have been running for one week now.
They have been running flawlessly, but I'm patching them as we speak and will proactively implement option 1: RefsEnableLargeWorkingSetTrim.
Will report back should I encounter any issues.
WimVD
Service Provider
 
Posts: 48
Liked: 10 times
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by graham8 » Fri Mar 17, 2017 12:37 pm 3 people like this post

I'm also implementing "option 1: RefsEnableLargeWorkingSetTrim" on a copy destination. We'll see how it goes.

Also, it's worth mentioning that people might want to check out the Sysinternals RAMMap utility to keep an eye on the ReFS metadata memory usage before and after implementing this (and its various options). Driver memory usage such as the ReFS metadata memory mapping doesn't show up in anything conventional like Task Manager:
https://technet.microsoft.com/en-us/sys ... ammap.aspx

The pink "metafile" segment is what you'll want to keep an eye on - specifically the "active" portion of it. When memory exhaustion has occurred, it's been because the "active" portion of the metadata mapping grows to a point where it's taking up nearly 100% and the system becomes unresponsive.
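
To make "nearly 100%" concrete, here is a small sketch that turns two numbers read off RamMap by hand (the active Metafile figure and total physical RAM) into a share and a simple warning flag. The function names and the 50% threshold are arbitrary illustrative choices, not figures from Microsoft or Veeam.

```python
def metafile_active_share(active_metafile_mb, total_ram_mb):
    """Fraction of physical RAM held by the active part of RamMap's Metafile row."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return active_metafile_mb / total_ram_mb


def needs_attention(active_metafile_mb, total_ram_mb, threshold=0.5):
    """True when the active metafile share reaches the (illustrative) threshold."""
    return metafile_active_share(active_metafile_mb, total_ram_mb) >= threshold


if __name__ == "__main__":
    # Hypothetical reading: 3200 MB active metafile on a 65536 MB host.
    share = metafile_active_share(3200, 65536)
    print(f"{share:.1%} of RAM is active metafile")
```

Logging the share once a day before and after the registry change gives a simple trend to compare.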
graham8
Enthusiast
 
Posts: 47
Liked: 16 times
Joined: Wed Dec 14, 2016 1:56 pm

Re: REFS 4k horror story

by WimVD » Fri Mar 17, 2017 12:41 pm

Good info Graham!
I'm going to implement the regkey on only one of my two repositories so I can compare the metadata usage with/without it.
Will report back...
WimVD
Service Provider
 
Posts: 48
Liked: 10 times
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by kubimike » Fri Mar 17, 2017 12:44 pm

@Gostev, you mentioned this fix was also for users of the 64K size. I never experienced the high memory usage others have described in this thread. If that's the case, is it Microsoft's recommendation to just install the update without the registry tweaks?
kubimike
Expert
 
Posts: 133
Liked: 20 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by kubimike » Fri Mar 17, 2017 2:14 pm

@graham8, NICE find on the tool. Running it now, going to monitor my memory situation. I know I mentioned above that I didn't have memory issues, but perhaps, like you said, you can't see them in Task Manager. Thanks again!
kubimike
Expert
 
Posts: 133
Liked: 20 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by WimVD » Fri Mar 17, 2017 3:43 pm 1 person likes this post

I just patched one proxy and set the RefsEnableLargeWorkingSetTrim registry key.
I then ran a test job on both proxies to simulate some metadata updates.
I can confirm the update and key seem to work as expected.
In rammap the patched proxy is releasing memory from active to standby, while the unpatched proxy releases almost nothing.
If my findings are confirmed during a full backup cycle this evening, I will implement the key on both proxies.

Quick side note: during patching of the host everything installed fine.
The first reboot after the install was also okay, but when I set the registry key and rebooted the host, it kept hanging on "Preparing Windows updates".
After 40 minutes it finally rebooted and everything seemed fine. I could not reproduce it with a third reboot.
WimVD
Service Provider
 
Posts: 48
Liked: 10 times
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by kubimike » Fri Mar 17, 2017 4:08 pm

@WimVD, are your repositories configured with 64K?
kubimike
Expert
 
Posts: 133
Liked: 20 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by WimVD » Fri Mar 17, 2017 4:12 pm

Yes, they are
WimVD
Service Provider
 
Posts: 48
Liked: 10 times
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by graham8 » Fri Mar 17, 2017 4:16 pm

Okay, I installed the update, set the "RefsEnableLargeWorkingSetTrim" registry key, rebooted, and after some time got an avalanche of errors that seemingly started with:

"Bus reset occured on storport adapter (Port Number: 2)" (from Source:StorPort)

Following that, a bunch of disks dropped out, Storage Spaces unmounted itself, and then a cascade of further errors kicked up. I rebooted the server, and everything looks normal again. This may have been a wild coincidence, but... doubtful. I'll wait and see if it recurs, but... yeah. Be careful, folks. For sure, don't set this on your primary servers until there's more feedback.

Incidentally, I'm using a custom view in Event Viewer that might be helpful. It gives an easy view of everything related to ReFS, Storage Spaces, disks, etc., only a small fraction of which shows up by default in the normal system logs:

Event Viewer -> Custom Views -> Right Click -> Create Custom View...
Event Level: Critical, Error, Warning
Event logs:
Microsoft-Windows-DataIntegrityScan/Admin
Microsoft-Windows-DataIntegrityScan/CrashRecovery
Microsoft-Windows-Storage-Disk/Admin
Microsoft-Windows-Storage-Disk/Operational
Microsoft-Windows-Ntfs/Operational
Microsoft-Windows-Ntfs/WHC
ReFS/Operational
Microsoft-Windows-StorageManagement/Operational
Microsoft-Windows-StorageSpaces-Driver/Diagnostic
Microsoft-Windows-StorageSpaces-Driver/Operational
Microsoft-Windows-StorageSpaces-ManagementAgent/WHC
Microsoft-Windows-StorageSpaces-SpaceManager/Diagnostic
Microsoft-Windows-Storage-ClassPnP/Admin
Microsoft-Windows-Storage-ClassPnP/Operational
Microsoft-Windows-Storage-Storport/Admin
Microsoft-Windows-Storage-Storport/Operational
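
If ticking all of those logs in the dialog gets tedious, the same filter can be expressed as a query for the XML tab of the Create Custom View dialog. Below is a minimal sketch that generates such a QueryList from a list of log names. The function name is hypothetical, only two of the logs are shown in the example call, and the generated XML should be reviewed in Event Viewer before relying on it.

```python
import xml.etree.ElementTree as ET

# Standard Windows event levels: 1 = Critical, 2 = Error, 3 = Warning.
LEVELS = {"Critical": 1, "Error": 2, "Warning": 3}


def build_custom_view_query(log_names, levels=("Critical", "Error", "Warning")):
    """Build an Event Viewer QueryList selecting the given levels from each log."""
    if not log_names:
        raise ValueError("at least one log name is required")

    level_expr = " or ".join(f"Level={LEVELS[name]}" for name in levels)
    xpath = f"*[System[({level_expr})]]"

    query_list = ET.Element("QueryList")
    query = ET.SubElement(query_list, "Query", Id="0", Path=log_names[0])
    for log in log_names:
        # One Select element per log, all sharing the same level filter.
        select = ET.SubElement(query, "Select", Path=log)
        select.text = xpath
    return ET.tostring(query_list, encoding="unicode")


if __name__ == "__main__":
    print(build_custom_view_query([
        "ReFS/Operational",
        "Microsoft-Windows-Storage-Storport/Admin",
    ]))
```

Passing the full list of log names from this post yields one Select element per log, which makes the view easy to recreate on several servers.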
graham8
Enthusiast
 
Posts: 47
Liked: 16 times
Joined: Wed Dec 14, 2016 1:56 pm

Re: REFS 4k horror story

by Gostev » Sun Mar 19, 2017 7:31 pm

Hi Graham, these new registry parameters should not be related to this type of error. However, the ReFS team kindly offered to look at your logs, just to be sure. Please use StorDiag to collect and package them, and PM me the download link. Thanks!
Gostev
Veeam Software
 
Posts: 21047
Liked: 2266 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: REFS 4k horror story

by kubimike » Mon Mar 20, 2017 2:54 am

Took a quick peek at RamMap after a full synthetic ran: "Mapped File" was consuming just about all the memory on the machine, about 10 gigs active and 40 gigs in standby. Anyone else notice that?
kubimike
Expert
 
Posts: 133
Liked: 20 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by j.forsythe » Mon Mar 20, 2017 7:41 am

kubimike wrote: @j.forsythe are you also running verifier? Sad to say my box froze today; even with verifier on, it failed to create a dump file. It was just frozen at the login screen (Press CTRL + ALT + DEL to login).


@kubimike No, I am not running verifier. Would it help if I ran it?

So far my two "ReFS" jobs have been running fine for almost two weeks.
I will install the patch, make the registry change, and report later how it behaves.

Cheers...
j.forsythe
Influencer
 
Posts: 10
Liked: 3 times
Joined: Wed Jan 06, 2016 10:26 am
Full Name: John P. Forsythe

Re: REFS 4k horror story

by WimVD » Mon Mar 20, 2017 9:57 am

kubimike wrote: Took a quick peek at RamMap after a full synthetic ran: "Mapped File" was consuming just about all the memory on the machine, about 10 gigs active and 40 gigs in standby. Anyone else notice that?

Yes, I noticed exactly the same, but active memory is a lot lower in my case (steady at around 400MB), so it never worried me.
WimVD
Service Provider
 
Posts: 48
Liked: 10 times
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by WimVD » Mon Mar 20, 2017 10:16 am 2 people like this post

Some further feedback comparing my patched and unpatched repository:

The metafile in rammap has been growing steadily on my unpatched host and is now around 3.2GB active.
As expected it doesn't seem to release much memory to standby.
The patched host fluctuates between 0.5GB and 1GB active and always quickly releases memory if it is rising.
So my metadata usage is low for the moment but I can see how we would run into issues in say 30 days or so on the unpatched host.

Performance does not seem to be impacted by the registry key.
Everything is stable and the fast clone is amazing: 200GB incremental merges complete in under 2 minutes :)

Our backup window has been reduced from nearly 24 hours to 7 hours.
New hardware was a big factor in this but the ReFS integration definitely helped to switch from reverse incremental to forever incremental without introducing long merges.
And together with integrity streams the ReFS integration is just too good to ignore.
After further validation in the field ReFS will definitely be my default choice for Veeam repositories :)
WimVD
Service Provider
 
Posts: 48
Liked: 10 times
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by Pikok » Mon Mar 20, 2017 1:08 pm

We recently replaced our backup server with a Server 2016 machine. As Veeam 9.5 had just been released, we decided to try out a ReFS partition and, at first, saw much higher backup speeds thanks to the various new features.
We saw speeds of 150+ Mb/s and were very satisfied. However, after a while we noticed that backup times had increased, and on investigation discovered that the ReFS partition wasn't performing as well as it used to. On that same machine, an NTFS partition still performs at similar speeds.

After reading how Microsoft resolved various ReFS issues with the latest update, I applied it and made the registry changes one at a time. However, I haven't noticed any change in speed.
I first set the RefsEnableLargeWorkingSetTrim key, then set RefsNumberOfChunksToTrim to 8, and have just set RefsEnableInlineTrim. None of these made a noticeable difference.
I believe I may still be able to resolve my issue by tuning RefsNumberOfChunksToTrim, but I'm unsure how to choose that value.

The partition is formatted with a 4k allocation size and the size of the partition is 15 TB.
Pikok
Novice
 
Posts: 3
Liked: never
Joined: Tue Mar 22, 2016 7:24 am
Full Name: Peter Lemmens
