-
- Service Provider
- Posts: 12
- Liked: never
- Joined: Nov 25, 2017 6:49 pm
- Full Name: operations
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Driver Version is 10.0.14393.0
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
@suprnova
I don't think any of us can really explain why the production driver won't work, other than from personal experience. Merges, synthetic fulls and deletes are broken and cause the machine to freeze with the production driver. Only MSFT has the secret sauce on how the driver works and why.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Totally agree here.
thomas.raabo wrote: That will not work! Contact MS and get them to help you.
-
- Lurker
- Posts: 2
- Liked: never
- Joined: Dec 13, 2017 10:04 pm
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
The 'resolution' we got was to wipe and reinstall the OS on our Veeam server. So far so good (other than the massive amount of wasted time, and having to reconfigure all our Agents!).
Mike Resseler wrote: Hi James,
First: Welcome to the forums
Second: The issue that is discussed here is that ReFS becomes very unstable if there is a lot of activity on it and the volume is large. Not being able to boot is not something I have heard of with this issue. Something might be related, but I am not sure. Please keep working with MSFT support for now and keep us posted. Who knows, this could be a new problem with ReFS (I hope not, though).
Mike
It's on the January 2018-01 patch now, whereas we only got up to 2017-12 before, so maybe there is some improvement there; otherwise I would've expected the issue to reappear when we remounted the ReFS volume and/or ran a backup.
Just to confirm, you've seen complete hangs shortly after or during boot?
GarethUK wrote: James is indeed correct. This is behaviour I have observed. We have 16 backup repo servers, 5 of which are 70TB ReFS-enabled Windows 2016 servers.
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
A confirmation from the field: we have observed many stable ReFS installations where the memory was from 512MB to (more likely) 1GB for each TB of backups. So, with 240TB of disk, full at say 80%, that makes 192GB, so you may need to plan for 192GB of memory. It sounds a bit like overkill, but this has proved to be a good solution. Not verified or confirmed by anyone at Microsoft, but observed in many working deployments.
Gostev wrote: From what I know based on the conversation with ReFS devs, it may be possible to work around this particular bug around huge volumes by adding lots of RAM to the backup repository server. If you can't do this, then I'm afraid the only option is to fall back to NTFS until Microsoft ships that patch.
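The rule of thumb above (512MB to 1GB of RAM per TB of backups stored, observed in the field but not confirmed by Microsoft) reduces to simple arithmetic. A minimal Python sketch that reproduces the 240TB example; the function name and the `fill_ratio` parameter are illustrative, not from any Veeam or Microsoft tool:

```python
def refs_ram_estimate_gb(volume_tb: float, fill_ratio: float, gb_per_tb: float = 1.0) -> float:
    """Field rule of thumb (not a Microsoft figure): gb_per_tb of RAM per TB of backups stored."""
    if not 0.0 <= fill_ratio <= 1.0:
        raise ValueError("fill_ratio must be between 0 and 1")
    used_tb = volume_tb * fill_ratio  # TB of backup data actually on the volume
    return used_tb * gb_per_tb

# 240TB volume, 80% full
print(refs_ram_estimate_gb(240, 0.8))       # upper end, 1GB per TB: 192.0
print(refs_ram_estimate_gb(240, 0.8, 0.5))  # lower end, 512MB per TB: 96.0
```

Several replies in this thread report lockups persisting at or above this ratio, so treat the result as a starting point rather than a guarantee.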
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Enthusiast
- Posts: 57
- Liked: 8 times
- Joined: May 09, 2011 12:43 pm
- Full Name: Sebastian
- Location: Germany
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I have 192GB of RAM and 200TB (net) of disk and massive problems with ReFS. I upgraded from 64GB to 192GB and see no difference in the ReFS behavior.
So, with 240TB of disk, full at say 80%, makes it 192GB, so you may need to plan to have 192GB of Memory. It sounds a bit like an overkill, but this has proved to be a good solution.
-
- Service Provider
- Posts: 12
- Liked: never
- Joined: Nov 25, 2017 6:49 pm
- Full Name: operations
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I was running 256GB of RAM with a 240TB volume; now I am running 512GB of RAM with the same 240TB volume and still have the same problem. Everything seems fine until I run a few merges and back up a couple of large servers (5TB+), and then suddenly I'm down from 1.2gb/sec to 5mb/s and it never recovers until I reboot.
Gostev wrote: A confirmation from the field, we have observed many stable ReFS installations where the memory was from 512MB to (more likely) 1GB for each TB of backups. So, with 240TB of disk, full at say 80%, makes it 192GB, so you may need to plan to have 192GB of Memory. It sounds a bit like an overkill, but this has proved to be a good solution. Not verified nor confirmed by anyone at Microsoft, but from many working deployments we have observed.
-
- Influencer
- Posts: 22
- Liked: 2 times
- Joined: Mar 21, 2014 11:41 am
- Full Name: Gareth
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Yes, I've seen this repeatedly. However, things have moved on.
jamesmay wrote: Just to confirm, you've seen complete hangs shortly after / during boot?
I attempted to get some of the data off one of the servers and experienced further hangs. In sheer desperation (because I couldn't copy the data off), I hard-reset it (because it was locked up), quickly removed all the REFS keys, then applied
"RefsEnableLargeWorkingSetTrim"=dword:00000001
then rebooted.
This seems to have stabilised the server concerned, and I was then able to start moving backup data off, though this could be coincidence. I've artificially filled the disk to stop Veeam using it and will slowly move data off. However, last night it didn't lock up, it appears more stable and all backups have worked.
On a different server I removed a 50TB single-VM backup (yeah, not a good idea, but we have no choice) in an attempt to change the block size from 4K to 64K. I agree 4K probably isn't right, but it has performed adequately for 6 months. I then started to copy off a small number of VM backups that were also on the repo. However, it became unresponsive very quickly. I hard rebooted, removed all REFS keys and applied the above key. I was then able to start copying off all the data without further issues, but the RAM usage jumped up very quickly. I stopped the file copy and applied the following key.
"RefsNumberOfChunksToTrim"=dword:00000080
This appears to have stabilised the memory usage (again, could be coincidence) and I got all the data off. Once that was done, I reformatted the disk with a 64K block size, removed the repo from the scale-out pool and dedicated it to the 50TB VM.
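Going from 4K to 64K clusters means 16 times fewer clusters for the file system to address for the same amount of data. A rough Python sketch of that arithmetic for the 50TB backup above, assuming decimal terabytes; the post itself does not establish that the cluster count was the cause of the earlier problems.

```python
def cluster_count(size_bytes: int, cluster_bytes: int) -> int:
    """Clusters needed to hold size_bytes, rounding up to whole clusters."""
    return -(-size_bytes // cluster_bytes)  # integer ceiling division

backup_bytes = 50 * 10**12  # 50TB in decimal units (assumption)
for label, cluster in (("4K", 4 * 1024), ("64K", 64 * 1024)):
    print(f"{label}: {cluster_count(backup_bytes, cluster):,} clusters")
# 4K: 12,207,031,250 clusters
# 64K: 762,939,454 clusters
```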
The full is running now at 7Gbps. RAM usage is increasing so I will have to keep an eye on it.
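In .reg syntax, `dword:` data is written as eight hexadecimal digits, so `dword:00000080` sets RefsNumberOfChunksToTrim to 128, not 80. The Python sketch below only parses the two lines applied above; it does not read or write the registry, and since the post does not give the key path these values go under, none is assumed here.

```python
import re

# "ValueName"=dword:XXXXXXXX, where XXXXXXXX is 8 hex digits
DWORD_LINE = re.compile(r'^"(?P<name>[^"]+)"=dword:(?P<hex>[0-9a-fA-F]{8})$')

def parse_dword(line: str) -> tuple[str, int]:
    """Return (value name, decimal value) for one .reg dword line."""
    match = DWORD_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a .reg dword line: {line!r}")
    return match.group("name"), int(match.group("hex"), 16)

for line in ('"RefsEnableLargeWorkingSetTrim"=dword:00000001',
             '"RefsNumberOfChunksToTrim"=dword:00000080'):
    name, value = parse_dword(line)
    print(f"{name} = {value}")
# RefsEnableLargeWorkingSetTrim = 1
# RefsNumberOfChunksToTrim = 128
```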
Regards,
Gareth
-
- Service Provider
- Posts: 158
- Liked: 9 times
- Joined: Dec 05, 2014 2:13 pm
- Full Name: Iain Green
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Hi,
Been following this for a long while. We ourselves have converted back to NTFS and will be staying on it for the foreseeable future.
Question for Veeam - as REFS is currently a no-go, is it wise to continue pointing users to REFS as best practice when creating a repo, as you currently do?
(I am updating to Update 3 next week, so apologies if this is no longer the case.)
Many thanks
Iain Green
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Iain, absolutely - unless Microsoft ships the fix in the currently planned timelines, we will of course remove this suggestion in the next update.
Although it's not really fair to say it is a complete no-go for everyone, because it works well for many smaller customers, of which B&R has a lot for historical reasons... the issues are quite isolated to bigger ReFS volumes and big backup files. In general, such scalability problems are pretty usual for any new technology. B&R had its own back when it was at v3, and just like ReFS today, we too were usable for small customers only.
-
- Veeam ProPartner
- Posts: 300
- Liked: 44 times
- Joined: Dec 03, 2015 3:41 pm
- Location: UK
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Not just for small customers either.
To quote my post from a few weeks back (because there's still calls for ReFS to be removed/discredited) ...
For some customers (us included), NTFS is not sufficient for our needs at the moment, and keeping the status quo brings with it as many issues as the ones currently reported with ReFS.
Some of the posts make an assumption that people are migrating from a position of stability and high performance, to something much worse. The opposite can be true.
NTFS is sufficient on four of our five Veeam repositories, but the nature of the VMs being backed up on the fifth means multiple synthetic/active fulls break the storage capacity for our current RPO strategy, and merge jobs break the backup window.
ReFS/Fast Clone provides a solution for both of these, and if an Active Full is required occasionally to reset the performance, so be it. It's no worse than our current position; in fact, it's much, much better.
-
- Enthusiast
- Posts: 38
- Liked: never
- Joined: Apr 08, 2016 5:15 pm
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
It's interesting that it works for some small customers. I have a 10TB repository that backs up one VM (incrementals around 30GB, full backup around 5TB). As soon as the fast clone merge starts, the repository drive drops offline, often causing a merge to take days. Even when copying a file off this repository, the speed swings between 0 and 1MBps every other second.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
@suprnova I have 192GB of RAM with a 48TB repo and still have issues.
-
- Lurker
- Posts: 2
- Liked: never
- Joined: Dec 04, 2017 3:02 pm
- Full Name: Vishal Chotoe
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Can you share a link to the planned timeline?
Gostev wrote: Iain, absolutely - unless Microsoft ships the fix in the currently planned timelines, we will of course remove this suggestion in the next update.
Although it's not really fair to say it is completely no go for everyone, because it works well for many smaller customers, which for historical reasons B&R has a lot... the issues are quite isolated to bigger ReFS volumes and big backup files. In general, such scalability problems are pretty usual for any new technology. B&R had its own back when it was at v3, and just like ReFS today we too were usable for small customers only.
-
- Service Provider
- Posts: 28
- Liked: 11 times
- Joined: Oct 31, 2016 6:27 pm
- Full Name: Thomas Raabo
- Location: infrastructure guy
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I think it's safe to say that it does not work... maybe it works for small customers because they do fulls, maybe it works because they have disabled block clone.
suprnova wrote: It's interesting it works for some small customers. I have a 10TB repository that backs up one VM (incrementals around 30GB, full backup around 5TB). As soon as the fast clone merge starts, the repository drives drops offline often causing a merge to take days. Even copying a file off this repository the speed goes from 0 to 1MBps every other second.
I'm 100% sure that in my case this does not work - the new drivers work and are much better.
But I'm not going to touch block clone for years! Just disable it in the registry, use ReFS and you're golden! Then, when Veeam and Microsoft get their act together, we can start to try the new stuff.
-
- Product Manager
- Posts: 8191
- Liked: 1322 times
- Joined: Feb 08, 2013 3:08 pm
- Full Name: Mike Resseler
- Location: Belgium
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Hi Vishal,
First: Welcome to the forums!
Second: Unfortunately there is no link to the planned timeline. We are also waiting for the release.
Mike
-
- Service Provider
- Posts: 1
- Liked: 2 times
- Joined: Jan 23, 2017 9:59 am
- Full Name: Bas Noorlandt
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Like many, we have a larger environment, with multiple nodes of up to 180TB.
We raised a call with Microsoft and got a test version of the ReFS driver along with some registry keys. It helped our environment a lot.
But more interesting for you guys: we got a date for the official fix: the 3rd week of February.
-
- Service Provider
- Posts: 114
- Liked: 12 times
- Joined: Nov 15, 2016 6:56 pm
- Location: Cayman Islands
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Any chance I can get a copy of that update, please?
Jason
VMCE
-
- Service Provider
- Posts: 114
- Liked: 12 times
- Joined: Nov 15, 2016 6:56 pm
- Location: Cayman Islands
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
NLBnoorlandt wrote: Like many we have an larger enviroment, multiple nodes up to 180TB.
We did raise an call with Microsoft and got an test version of the ReFS driver along with some reg keys. it did help our environment a lot.
But more interesting for you guys: we got an date on the official fix: 3rd week of February.
I've raised a call with MS to get the ReFS 3.3 driver; the current Windows build has it at 3.1. Out of interest, did your system work at all before? Our throughput drops to a terrible level and causes delays plus loss of RPOs.
The fix that is coming in the 3rd week of Feb: would that mean we'd have to rebuild the repo OS, or simply patch it and it'll be upgraded?
Jason
VMCE
-
- Novice
- Posts: 4
- Liked: never
- Joined: Aug 26, 2011 8:24 am
- Full Name: Neil Flanagan
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
KB4057142 (https://support.microsoft.com/en-us/help/4057142) is now available from Windows Update and the Microsoft Update Catalog (but, it seems, not on WSUS) and includes Refs.sys 10.0.14393.2035 dated 11 January 2018.
The notes say "Addresses synchronization issue where backing up large Resilient File System (ReFS) volumes may lead to errors 0xc2 and 7E." There is no mention of registry keys.
Is this what everyone is waiting for?
-
- Novice
- Posts: 6
- Liked: never
- Joined: Dec 14, 2017 1:47 pm
- Full Name: Alexander Eriksson
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Hi,
Next week, we are about to create our new Veeam Backup Repos. (Moving from MS DPM to Veeam)
Should we go with ReFS and disable block cloning (does this work?) to be prepared for the fix, or should we go with NTFS?
We are about to implement 2 x RAID60 repos with 150TB of usable disk each, for about 1000 VM backups.
Regards
Alexander
-
- Service Provider
- Posts: 454
- Liked: 86 times
- Joined: Jun 09, 2015 7:08 pm
- Full Name: JaySt
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I hope you'll get more votes besides mine, but I'd go the way you suggest: choose ReFS and disable fast clone for the time being. I feel the fix is too close to completely forget about ReFS at this time. I still think the potential of ReFS and the advantages of fast clone, once fixed, are interesting enough to consider.
But hey... I get why trusting ReFS is hard at this time; it's a bit of a gamble.
Veeam Certified Engineer
-
- Novice
- Posts: 5
- Liked: 2 times
- Joined: Jan 22, 2015 1:29 pm
- Full Name: Eric Doyen
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
A Microsoft developer suggested to us 10GB of RAM for every 1TB of space used in the ReFS repo. If you have enough RAM to spare, go ReFS, but follow the best practice guide.
aleeri wrote: Hi,
Next week, we are about to create our new Veeam Backup Repos. (Moving from MS DPM to Veeam)
Should we go with REFS and disable block clones (does this work?) to be prepared for the fix or should we go with NTFS.
We are about to implement 2 x 150TB usable disk RAID60 repos for about 1000 VM backups.
Regards
Alexander
In our case, with the ReFS deletes (they called this a delete storm), the process of flushing the metadata to disk cannot keep up, so the driver queues the metadata writes in RAM so as not to slow down the process. However, there is no check to ensure that all available RAM is not used up, and this caused our repo to lock up.
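The mechanism described here is a producer/consumer mismatch: deletes generate metadata writes faster than they can be flushed, and nothing caps the backlog held in RAM. The Python toy model below only illustrates that general mechanism with made-up rates and a hypothetical `simulate_metadata_queue` function; it is not how the ReFS driver is implemented.

```python
from typing import Optional, Tuple

def simulate_metadata_queue(produce_per_tick: int, flush_per_tick: int,
                            ticks: int, cap: Optional[int] = None) -> Tuple[int, int]:
    """Toy producer/consumer model, not the refs.sys implementation.

    Each tick the producer (the deletes) tries to queue produce_per_tick
    metadata writes and the flusher writes out up to flush_per_tick of them.
    Returns (peak queue length, total writes accepted from the producer).
    """
    queued = peak = accepted = 0
    for _ in range(ticks):
        if cap is None:
            admitted = produce_per_tick  # unbounded: always accept, backlog keeps growing
        else:
            admitted = max(0, min(produce_per_tick, cap - queued))  # throttle at the cap
        queued += admitted
        accepted += admitted
        peak = max(peak, queued)
        queued -= min(queued, flush_per_tick)
    return peak, accepted

print(simulate_metadata_queue(10, 4, 100))          # (604, 1000)
print(simulate_metadata_queue(10, 4, 100, cap=50))  # (50, 446)
```

Without a cap, the backlog (standing in for RAM) grows every tick the flusher falls behind; with a cap, the deletes are slowed to the flush rate instead. That trade-off, slower deletes in exchange for bounded memory, is roughly the healthier memory management wished for at the end of this post.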
Prior to increasing RAM on our repo, MS had us test a plethora of registry changes, some of which haven't been mentioned in this forum. However, these registry changes disable certain optimizations that ReFS provides, and could heavily affect the ReFS performance we have come to appreciate. In the end, none of the registry tweaks fixed our issue. Only by quadrupling the RAM on the repo, with the registry tweaks in place, were we able to get the delete storm to finish.
We have an 18TB repo and had only 16GB of RAM. With the private fix refs.sys, the registry tweaks enabled and RAM increased to 64GB, the delete storm finished, peaking at 44GB of RAM used, then fell back to normal expected levels.
RAM is expensive. I would rather see healthier memory management without crippling ReFS optimizations. I'm not a dev, so I'm sure it's easier said than done.
-
- Service Provider
- Posts: 12
- Liked: never
- Joined: Nov 25, 2017 6:49 pm
- Full Name: operations
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
For me that would require 2.4TB of RAM.
I do not see excessive RAM usage, but merges still take days to finish, and backups drop from 1gb/s to 5mb/s even if only one VM is being backed up, so there must be more to it than RAM.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
All my problems were solved with '10.0.14393.1934'. It should be out soon. Synthetic fulls went from 3 hours to 30 minutes, and the entire disk subsystem is much faster. Even tape backups got a bump in speed. And let's not forget that Veeam can now prune old backups without freezing; that's probably the biggest bonus.
-
- Enthusiast
- Posts: 33
- Liked: 2 times
- Joined: May 05, 2017 3:06 pm
- Full Name: JP
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
I believe the correct version, as previously mentioned, is 10.0.14939.1934. Can you confirm?
kubimike wrote: All my problems were solved with '10.0.14393.1934'. It should be out soon, synthetic fulls went from 3 hours to 30 mins, the entire disk subsystem is much faster. Even tape backups have benefited a bump in speed. Let's not forget I Veeam can now prune old backups without freezing. Thats probably the biggest bonus.
I don't think so; I think refs.sys has to be 10.0.14939.1934, which is higher. Somebody posted earlier that Microsoft said the fix would be coming in February. However, based on the wording, it sounds like that CU might be a good one to have anyway, unless it makes things worse.
Neil Flanagan wrote: KB4057142 (https://support.microsoft.com/en-us/help/4057142) now available from Windows Update and Microsoft Catalog (but not it seems on WSUS) includes Refs.sys 10.0.14393.2035 dated 11 January 2018.
The notes say "Addresses synchronization issue where backing up large Resilient File System (ReFS) volumes may lead to errors 0xc2 and 7E." There is no word on setting Registry keys.
Is this what everyone is waiting for?
-
- Enthusiast
- Posts: 26
- Liked: 12 times
- Joined: Jan 30, 2017 7:42 pm
- Full Name: Sam
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
Is this the new test patch due out in a month or two? I've waited all of 2017 for a real fix, but management is planning on making us flip back to NTFS this month. I'd hate to lose ReFS after all the work and horror I've gone through, but I'm hesitant to keep waiting for just one more patch.
kubimike wrote: All my problems were solved with '10.0.14393.1934'. It should be out soon, synthetic fulls went from 3 hours to 30 mins, the entire disk subsystem is much faster. Even tape backups have benefited a bump in speed. Let's not forget I Veeam can now prune old backups without freezing. Thats probably the biggest bonus.
-
- Novice
- Posts: 4
- Liked: never
- Joined: Aug 26, 2011 8:24 am
- Full Name: Neil Flanagan
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
10.0.14939.xxxx is not a build number for a current release of Windows Server 2016. All Windows Server 2016 files start with 10.0.14393, so to my mind the refs.sys 10.0.14393.2035 put on general release on 17 January in KB4057142 must be the very latest and supersede 10.0.14393.1934.
It is not unusual for Microsoft to produce a bug-fix only, manual-download, cumulative update for Server 2016 a couple of weeks before the complete general and security-fix update on Patch Tuesday, so this could be the fix that is expected to appear on automatic update in February.
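The argument above boils down to two checks on a refs.sys file version: does it carry the 10.0.14393 prefix that all Server 2016 files share, and is it higher than the version it is compared with. A small Python sketch, assuming the version strings are copied by hand from the file properties; the helper names are illustrative. Comparing integer tuples rather than raw strings matters, because text comparison would rank a hypothetical revision .999 above .1934.

```python
SERVER_2016_PREFIX = (10, 0, 14393)  # major.minor.build shared by Windows Server 2016 files

def parse_version(version: str) -> tuple[int, ...]:
    """Turn '10.0.14393.2035' into (10, 0, 14393, 2035) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))

def is_server_2016_build(version: str) -> bool:
    return parse_version(version)[:3] == SERVER_2016_PREFIX

def is_newer(a: str, b: str) -> bool:
    """True if version a is strictly newer than version b."""
    return parse_version(a) > parse_version(b)

print(is_server_2016_build("10.0.14393.2035"))         # True
print(is_server_2016_build("10.0.14939.1934"))         # False
print(is_newer("10.0.14393.2035", "10.0.14393.1934"))  # True
```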
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
10.0.14939.1934 must have been a typo by thomas. I have version 10.0.14393.1934.
-
- Veteran
- Posts: 391
- Liked: 56 times
- Joined: Feb 03, 2017 2:34 pm
- Full Name: MikeO
- Contact:
Re: REFS issues (server lockups, high CPU, high RAM)
If that's true, I wonder if they did away with the registry key tweaks, as there is no mention of using them. I have not tested .2035 yet.
Neil Flanagan wrote: 10.0.14939.xxxx is not a build number for a current release of Windows Server 2016. All Windows Server 2016 files start 10.0.14393, so to my mind the refs.sys 10.14393.2035 put on general release on 17 January in KB4057142 must be the very latest and supersede 10.0.14393.1934.
It is not unusual for Microsoft to produce a bug-fix only, manual-download, cumulative update for Server 2016 a couple of weeks before the complete general and security-fix update on Patch Tuesday, so this could be the fix that is expected to appear on automatic update in February.