REFS issues (server lockups, high CPU, high RAM)


Re: REFS issues (server lockups, high CPU, high RAM)

by Mike Resseler » Fri Jan 12, 2018 7:04 am

Hi Vishal,

First: Welcome to the forums!
Second: Unfortunately, there is no link to the planned timeline. We are also waiting for the release to happen.

Mike
Mike Resseler
Veeam Software
 
Posts: 3649
Liked: 402 times
Joined: Fri Feb 08, 2013 3:08 pm
Location: Belgium, the land of the fries, the beer, the chocolate and the diamonds...
Full Name: Mike Resseler

Re: REFS issues (server lockups, high CPU, high RAM)

by NLBnoorlandt » Mon Jan 15, 2018 9:12 am (2 people like this post)

Like many, we have a larger environment: multiple nodes of up to 180TB.

We raised a call with Microsoft and got a test version of the ReFS driver along with some registry keys. It helped our environment a lot.

But more interesting for you guys: we got a date for the official fix, the 3rd week of February.
NLBnoorlandt
Service Provider
 
Posts: 1
Liked: 2 times
Joined: Mon Jan 23, 2017 9:59 am
Full Name: Bas Noorlandt

Re: REFS issues (server lockups, high CPU, high RAM)

by jayscarff » Mon Jan 15, 2018 10:29 pm

Any chance I can get a copy of that update, please?
jayscarff
Lurker
 
Posts: 2
Liked: never
Joined: Tue Nov 15, 2016 6:56 pm

Re: REFS issues (server lockups, high CPU, high RAM)

by jayscarff » Tue Jan 16, 2018 5:31 pm

NLBnoorlandt wrote: Like many, we have a larger environment: multiple nodes of up to 180TB.

We raised a call with Microsoft and got a test version of the ReFS driver along with some registry keys. It helped our environment a lot.

But more interesting for you guys: we got a date for the official fix, the 3rd week of February.



I've raised a call with MS to get the refs3.3 driver; the current Windows build has it at 3.1. Out of interest, did your system work at all before? Our throughput drops to a terrible level, which causes delays and missed RPOs.
Would the fix coming in the 3rd week of Feb mean we'd have to rebuild the repo's OS, or can we simply patch it and it'll be upgraded?

Jason
jayscarff
Lurker
 
Posts: 2
Liked: never
Joined: Tue Nov 15, 2016 6:56 pm

Re: REFS issues (server lockups, high CPU, high RAM)

by Neil Flanagan » Thu Jan 18, 2018 9:18 am

KB4057142 (https://support.microsoft.com/en-us/help/4057142), now available from Windows Update and Microsoft Catalog (but not, it seems, on WSUS), includes Refs.sys 10.0.14393.2035 dated 11 January 2018.
The notes say "Addresses synchronization issue where backing up large Resilient File System (ReFS) volumes may lead to errors 0xc2 and 7E." There is no word on setting Registry keys.

Is this what everyone is waiting for?
Neil Flanagan
Lurker
 
Posts: 2
Liked: never
Joined: Fri Aug 26, 2011 8:24 am
Full Name: Neil Flanagan

Re: REFS issues (server lockups, high CPU, high RAM)

by aleeri » Thu Jan 18, 2018 10:02 am

Hi,

Next week, we are about to create our new Veeam backup repos (moving from MS DPM to Veeam).
Should we go with REFS and disable block clones (does this work?) to be prepared for the fix, or should we go with NTFS?

We are about to implement 2 x 150TB usable disk RAID60 repos for about 1000 VM backups.

Regards
Alexander
aleeri
Novice
 
Posts: 4
Liked: never
Joined: Thu Dec 14, 2017 1:47 pm
Full Name: Alexander Eriksson

Re: REFS issues (server lockups, high CPU, high RAM)

by JaySt » Thu Jan 18, 2018 10:17 am

I hope you'll get more votes besides mine, but I'd go the way you suggest: choose ReFS and disable fast clone for the time being. I feel the fix is too close to give up on ReFS completely at this point. I still think the potential of ReFS and the advantages of fast clone, once fixed, are interesting enough to consider.
But hey... I get why trusting ReFS is hard at this time; it's a bit of a gamble.
Veeam Certified Engineer
JaySt
Service Provider
 
Posts: 73
Liked: 16 times
Joined: Tue Jun 09, 2015 7:08 pm
Full Name: JaySt

Re: REFS issues (server lockups, high CPU, high RAM)

by edoyen » Thu Jan 18, 2018 2:19 pm

aleeri wrote:Hi,

Next week, we are about to create our new Veeam backup repos (moving from MS DPM to Veeam).
Should we go with REFS and disable block clones (does this work?) to be prepared for the fix, or should we go with NTFS?

We are about to implement 2 x 150TB usable disk RAID60 repos for about 1000 VM backups.

Regards
Alexander


MS Dev suggested to us 10GB of RAM for every 1TB of space used in the REFS repo. If you have enough RAM to spare, go REFS, but follow the best practice guide.

In our case, with the REFS deletes (they called this a delete storm), the process of flushing the metadata to disk cannot keep up, so the driver queues the metadata writes in RAM so as not to slow down the process. However, there is no check to ensure that all available RAM is not used up, and this has caused our repo to lock up.
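
The mechanism described above can be pictured with a toy model. This is only a sketch and not how refs.sys is implemented: the function name, the tick-based rates and the `ram_limit` parameter are all hypothetical, chosen to show how a backlog grows without bound when work arrives faster than it is flushed, and how a cap (backpressure) would hold it flat.

```python
def simulate_backlog(writes_per_tick, flushes_per_tick, ticks, ram_limit=None):
    """Toy model of queued metadata writes; returns the backlog after each tick.

    ram_limit=None models a queue with no upper bound (hypothetical stand-in
    for the "no check" behaviour described in the post). With a limit set,
    new writes are throttled once the backlog reaches the cap.
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        incoming = writes_per_tick
        if ram_limit is not None:
            # Backpressure: accept only as much work as still fits under the cap.
            incoming = min(incoming, max(0, ram_limit - backlog))
        backlog += incoming
        # Flush what the disk can absorb in this tick.
        backlog = max(0, backlog - flushes_per_tick)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Writes arrive faster than they can be flushed.
    print(simulate_backlog(10, 4, 5))                # keeps growing
    print(simulate_backlog(10, 4, 5, ram_limit=20))  # levels off under the cap
```

In the uncapped run the backlog rises every tick, which is the pattern that would eventually exhaust RAM; in the capped run the incoming work slows down instead.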

Prior to increasing RAM on our repo, MS had us test a plethora of registry changes, some of which haven't been mentioned in this forum. However, these registry changes disable certain optimizations that REFS provides, and could heavily affect the REFS performance that we have come to appreciate. In the end, none of the registry tweaks fixed our issue. Only by quadrupling the RAM on the repo, with the registry tweaks in place, were we able to get the delete storm to finish.

We have an 18 TB repo, and had only 16GB of RAM. With the private fix refs.sys, the registry tweaks enabled, and RAM increased to 64 GB, the delete storm finished; usage peaked at 44GB of RAM, then fell back to normal expected levels.
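
To put the 10GB-per-1TB suggestion next to these numbers, here is a minimal sketch of the arithmetic. It assumes the guideline is applied to the full 18 TB as if all of it were in use, and the function and constant names are made up for illustration.

```python
# Guideline quoted in this post (10 GB of RAM per 1 TB of used ReFS space).
# Treat it as a rule of thumb from this thread, not an official figure.
GB_RAM_PER_TB_USED = 10


def suggested_ram_gb(used_tb, gb_per_tb=GB_RAM_PER_TB_USED):
    """Return the RAM (in GB) the rule of thumb would suggest for used_tb."""
    return used_tb * gb_per_tb


if __name__ == "__main__":
    repo_tb = 18          # repo size from this post, assumed fully used
    provisioned_gb = 64   # RAM after the upgrade
    peak_gb = 44          # peak usage observed during the delete storm

    print("rule of thumb:", suggested_ram_gb(repo_tb), "GB")
    print("provisioned:  ", provisioned_gb, "GB")
    print("observed peak:", peak_gb, "GB")
```

By that rule the 18 TB repo would call for 180 GB, well above the 64 GB that was enough to get through the delete storm here.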

RAM is expensive. I would rather there be healthier memory management without crippling REFS optimizations. I'm not a dev, so I'm sure it's easier said than done.
edoyen
Novice
 
Posts: 5
Liked: 2 times
Joined: Thu Jan 22, 2015 1:29 pm
Full Name: Eric Doyen

Re: REFS issues (server lockups, high CPU, high RAM)

by operations » Thu Jan 18, 2018 2:47 pm

For me, that would require 2.4TB of RAM :)

I do not see excessive RAM usage, but merges still take days to finish, and backup speed drops from 1gb/s to 5mbs even if only one VM is being backed up, so there must be more to it than RAM.
operations
Service Provider
 
Posts: 7
Liked: never
Joined: Sat Nov 25, 2017 6:49 pm
Full Name: operations

Re: REFS issues (server lockups, high CPU, high RAM)

by kubimike » Thu Jan 18, 2018 9:12 pm (1 person likes this post)

All my problems were solved with '10.0.14393.1934'. It should be out soon. Synthetic fulls went from 3 hours to 30 mins, and the entire disk subsystem is much faster. Even tape backups have benefited from a bump in speed. Let's not forget Veeam can now prune old backups without freezing. That's probably the biggest bonus.
kubimike
Expert
 
Posts: 285
Liked: 29 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS issues (server lockups, high CPU, high RAM)

by veeeammeupscotty » Thu Jan 18, 2018 10:39 pm

kubimike wrote: All my problems were solved with '10.0.14393.1934'. It should be out soon. Synthetic fulls went from 3 hours to 30 mins, and the entire disk subsystem is much faster. Even tape backups have benefited from a bump in speed. Let's not forget Veeam can now prune old backups without freezing. That's probably the biggest bonus.

I believe the correct version, as previously mentioned, is 10.0.14939.1934. Can you confirm?

Neil Flanagan wrote:KB4057142 (https://support.microsoft.com/en-us/help/4057142) now available from Windows Update and Microsoft Catalog (but not it seems on WSUS) includes Refs.sys 10.0.14393.2035 dated 11 January 2018.
The notes say "Addresses synchronization issue where backing up large Resilient File System (ReFS) volumes may lead to errors 0xc2 and 7E." There is no word on setting Registry keys.

Is this what everyone is waiting for?


I don't think so; I think the refs.sys has to be 10.0.14939.1934, which is higher. Somebody posted earlier that Microsoft said the fix would be coming in February. However, based on the wording, it sounds like that CU might be a good one to have anyway, unless it makes things worse.
veeeammeupscotty
Influencer
 
Posts: 14
Liked: never
Joined: Fri May 05, 2017 3:06 pm
Full Name: JP

Re: REFS issues (server lockups, high CPU, high RAM)

by Cicadymn » Thu Jan 18, 2018 11:13 pm

kubimike wrote: All my problems were solved with '10.0.14393.1934'. It should be out soon. Synthetic fulls went from 3 hours to 30 mins, and the entire disk subsystem is much faster. Even tape backups have benefited from a bump in speed. Let's not forget Veeam can now prune old backups without freezing. That's probably the biggest bonus.


Is this the new test patch due out in a month or two? I've waited all of 2017 for a real fix, but management is planning on making us flip back to NTFS this month. I'd hate to lose ReFS after all the work and horror I've gone through, but I'm hesitant to keep waiting for just one more patch.
Cicadymn
Influencer
 
Posts: 22
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS issues (server lockups, high CPU, high RAM)

by Neil Flanagan » Fri Jan 19, 2018 11:34 am

10.0.14939.xxxx is not a build number for a current release of Windows Server 2016. All Windows Server 2016 files start with 10.0.14393, so to my mind the refs.sys 10.0.14393.2035 put on general release on 17 January in KB4057142 must be the very latest and supersede 10.0.14393.1934.
It is not unusual for Microsoft to produce a bug-fix only, manual-download, cumulative update for Server 2016 a couple of weeks before the complete general and security-fix update on Patch Tuesday, so this could be the fix that is expected to appear on automatic update in February.
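
The reasoning above can be sketched as a version comparison. This is an illustration only: the helper names are made up, and the `10.0.14393` prefix for Windows Server 2016 files is taken from this post.

```python
# Prefix shared by Windows Server 2016 file versions, as stated in this post.
SERVER_2016_PREFIX = (10, 0, 14393)


def parse_version(text):
    """Turn a dotted version string such as '10.0.14393.2035' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_server_2016_build(version):
    """True if the first three fields match the Server 2016 prefix."""
    return parse_version(version)[:3] == SERVER_2016_PREFIX


def supersedes(candidate, baseline):
    """True if candidate is a higher version than baseline (field-by-field compare)."""
    return parse_version(candidate) > parse_version(baseline)


if __name__ == "__main__":
    for v in ("10.0.14393.1934", "10.0.14393.2035", "10.0.14939.1934"):
        print(v, "Server 2016 build:", is_server_2016_build(v))
    print("2035 supersedes 1934:", supersedes("10.0.14393.2035", "10.0.14393.1934"))
```

A plain field-by-field comparison would rank 10.0.14939.1934 above both of the others, which is why the prefix check is worth doing first.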
Neil Flanagan
Lurker
 
Posts: 2
Liked: never
Joined: Fri Aug 26, 2011 8:24 am
Full Name: Neil Flanagan

Re: REFS issues (server lockups, high CPU, high RAM)

by kubimike » Fri Jan 19, 2018 2:00 pm

10.0.14939.1934 must have been a typo by Thomas. I have version 10.0.14393.1934.
kubimike
Expert
 
Posts: 285
Liked: 29 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS issues (server lockups, high CPU, high RAM)

by kubimike » Fri Jan 19, 2018 2:02 pm

Neil Flanagan wrote: 10.0.14939.xxxx is not a build number for a current release of Windows Server 2016. All Windows Server 2016 files start with 10.0.14393, so to my mind the refs.sys 10.0.14393.2035 put on general release on 17 January in KB4057142 must be the very latest and supersede 10.0.14393.1934.
It is not unusual for Microsoft to produce a bug-fix only, manual-download, cumulative update for Server 2016 a couple of weeks before the complete general and security-fix update on Patch Tuesday, so this could be the fix that is expected to appear on automatic update in February.


If that's true, I wonder if they did away with the registry key tweaks, as there is no mention of using them. I have not tested .2035 yet.
kubimike
Expert
 
Posts: 285
Liked: 29 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO
