REFS issues (server lockups, high CPU, high RAM)

Re: REFS issues (server lockups, high CPU, high RAM)

by operations » Wed Jan 10, 2018 3:39 pm

Driver Version is 10.0.14393.0
operations
Service Provider
 
Posts: 7
Liked: never
Joined: Sat Nov 25, 2017 6:49 pm
Full Name: operations

Re: REFS issues (server lockups, high CPU, high RAM)

by kubimike » Wed Jan 10, 2018 4:05 pm

@suprnova

I don't think any of us can really explain why the production driver won't work, other than from personal experience. Merges, synthetic fulls, and deletes are broken with the production driver and cause the machine to freeze. Only MSFT has the secret sauce on how the driver works and why.
kubimike
Expert
 
Posts: 285
Liked: 29 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS issues (server lockups, high CPU, high RAM)

by kubimike » Wed Jan 10, 2018 4:08 pm

thomas.raabo wrote:That will not work! Contact MS and get them to help you.


Totally agree here
kubimike
Expert
 
Posts: 285
Liked: 29 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS issues (server lockups, high CPU, high RAM)

by jamesmay » Thu Jan 11, 2018 12:54 am

Mike Resseler wrote:Hi James,
First: Welcome to the forums
Second: The issue discussed here is that ReFS becomes very unstable if there is a lot of activity on it and the volume is large. Not being able to boot is not something I have heard of with this issue. It might be related, but I am not sure. Please keep working with MSFT support for now and keep us posted. Who knows, this may be a new problem with ReFS (I hope not, though).
Mike


The 'resolution' we got was to wipe and reinstall the OS on our Veeam server. So far so good (other than the massive amount of wasted time, and having to reconfigure all our Agents!).

It's on the 2018-01 (January) patch now, whereas we only got up to 2017-12 before, so maybe there is some improvement there; otherwise I would have expected the issue to reappear when we remounted the ReFS volume and/or ran a backup.

GarethUK wrote:James is indeed correct. This is behaviour I have observed. We have 16 backup repo servers, 5 of which are 70TB ReFS-enabled Windows 2016 servers.


Just to confirm, you've seen complete hangs shortly after / during boot?
jamesmay
Lurker
 
Posts: 2
Liked: never
Joined: Wed Dec 13, 2017 10:04 pm

Re: REFS issues (server lockups, high CPU, high RAM)

by dellock6 » Thu Jan 11, 2018 7:11 am (1 person likes this post)

Gostev wrote:From what I know based on the conversation with ReFS devs, it may be possible to work around this particular bug with huge volumes by adding lots of RAM to the backup repository server. If you can't do this, then I'm afraid the only option is to fall back to NTFS until Microsoft ships that patch.


A confirmation from the field: we have observed many stable ReFS installations where the memory was from 512MB to (more likely) 1GB for each TB of backups. So 240TB of disk, full at, say, 80%, holds 192TB of backups, which at 1GB per TB means you may need to plan for 192GB of memory. It sounds a bit like overkill, but this has proved to be a good solution. This is not verified or confirmed by anyone at Microsoft; it comes from the many working deployments we have observed.
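The sizing arithmetic above can be written out as a small worked example. This is only a sketch of the rule of thumb from this post (512MB to 1GB of RAM per TB of stored backups); the function and parameter names are invented for illustration and do not come from any Veeam or Microsoft tool.

```python
def recommended_ram_gb(volume_tb, fill_percent, mb_per_tb):
    """Estimate repository RAM from the per-TB rule of thumb in this thread.

    volume_tb    -- raw size of the ReFS volume in TB
    fill_percent -- how full the volume is expected to be (0-100)
    mb_per_tb    -- RAM per TB of stored backups (512 to 1024 per the post)
    """
    used_tb = volume_tb * fill_percent / 100   # TB of backups actually stored
    return used_tb * mb_per_tb / 1024          # convert MB of RAM to GB


# The example from the post: 240TB volume, 80% full.
low = recommended_ram_gb(240, 80, 512)     # lower bound, 512MB per TB
high = recommended_ram_gb(240, 80, 1024)   # "more likely" bound, 1GB per TB
print(f"Plan for roughly {low:.0f}GB to {high:.0f}GB of RAM")
```

For the 240TB example this gives 96GB at the low end and 192GB at the high end. As noted, the rule is a field observation, not a vendor-confirmed requirement.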
Luca Dell'Oca
EMEA Cloud Architect @ Veeam Software

@dellock6
http://www.virtualtothecore.com
vExpert 2011-2012-2013-2014-2015-2016
Veeam VMCE #1
dellock6
Veeam Software
 
Posts: 5195
Liked: 1401 times
Joined: Sun Jul 26, 2009 3:39 pm
Location: Varese, Italy
Full Name: Luca Dell'Oca

Re: REFS issues (server lockups, high CPU, high RAM)

by soehl » Thu Jan 11, 2018 8:18 am

dellock6 wrote:So 240TB of disk, full at, say, 80%, holds 192TB of backups, which at 1GB per TB means you may need to plan for 192GB of memory. It sounds a bit like overkill, but this has proved to be a good solution.


I have 192GB of RAM and 200TB (net) of disk, and massive problems with ReFS. I upgraded from 64GB to 192GB and see no difference in the ReFS behavior.
soehl
Enthusiast
 
Posts: 38
Liked: 4 times
Joined: Mon May 09, 2011 12:43 pm
Location: Germany
Full Name: Sebastian

Re: REFS issues (server lockups, high CPU, high RAM)

by operations » Thu Jan 11, 2018 9:20 am

dellock6 wrote:A confirmation from the field: we have observed many stable ReFS installations where the memory was from 512MB to (more likely) 1GB for each TB of backups. So 240TB of disk, full at, say, 80%, holds 192TB of backups, which at 1GB per TB means you may need to plan for 192GB of memory. It sounds a bit like overkill, but this has proved to be a good solution. This is not verified or confirmed by anyone at Microsoft; it comes from the many working deployments we have observed.


I was running 256GB of RAM with a 240TB volume; now I am running 512GB of RAM with the same 240TB volume and still have the same problem. Everything seems fine until I run a few merges and back up a couple of large servers (5TB+), and then suddenly I'm down from 1.2gb/sec to 5mb/s, and it never recovers until I reboot.
operations
Service Provider
 
Posts: 7
Liked: never
Joined: Sat Nov 25, 2017 6:49 pm
Full Name: operations

Re: REFS issues (server lockups, high CPU, high RAM)

by GarethUK » Thu Jan 11, 2018 12:48 pm

jamesmay wrote:Just to confirm, you've seen complete hangs shortly after / during boot?


Yes, I've seen this repeatedly. However, things have moved on.

I attempted to get some of the data off one of the servers and experienced further hangs. In sheer desperation (because I couldn't copy the data off), I hard reset the server (because it was locked up), quickly removed all the ReFS registry keys, then applied

"RefsEnableLargeWorkingSetTrim"=dword:00000001

then rebooted.

This seems to have stabilised the server concerned, and I was then able to start moving backup data off, though this could be coincidence. I've artificially filled the disk to stop Veeam using it and will slowly move data off. Last night it didn't lock up; it appears more stable, and all backups have worked.

On a different server I removed a single 50TB VM backup (yeah, not a good idea, but we have no choice) in an attempt to change the block size from 4k to 64k. I agree the 4k size probably isn't right, but it has performed adequately for 6 months. I then started to copy off a small number of VM backups that were also on the repo. However, it became unresponsive very quickly. I hard rebooted, removed all ReFS registry keys, and applied the above key. I was then able to start copying off all the data without further issues, but the RAM usage jumped up very quickly. I stopped the file copy and applied the following key.

"RefsNumberOfChunksToTrim"=dword:00000080

This appears to have stabilised the memory usage (again, could be coincidence) and I got all the data off. Once that was done, I reformatted the disk with a 64K block size, removed the repo from the scale-out pool, and dedicated it to the 50TB VM.
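For anyone wanting to apply the same two values, here is a minimal sketch that renders them as a .reg file. The value names and data are the ones quoted in this post; the registry key path is an assumption (the post does not state it), so check it against Microsoft's ReFS memory-usage guidance before importing anything. The function and constant names are made up for this example.

```python
# ASSUMPTION: key path is not given in the thread; verify before use.
ASSUMED_KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"

# Values exactly as quoted in the post (0x80 == 128).
REFS_VALUES = {
    "RefsEnableLargeWorkingSetTrim": 0x1,
    "RefsNumberOfChunksToTrim": 0x80,
}


def render_reg_file(values, key_path=ASSUMED_KEY_PATH):
    """Return .reg file text that sets each name to a DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, number in values.items():
        # .reg syntax writes DWORDs as eight hex digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\n".join(lines) + "\n"


print(render_reg_file(REFS_VALUES))
```

The script only prints the text; saving and importing it is left as a manual step, and the post applied the values one at a time with a reboot after the first. As the post itself says, the improvement may have been coincidence.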

The full is running now at 7Gbps. RAM usage is increasing so I will have to keep an eye on it.

Regards,

Gareth
GarethUK
Influencer
 
Posts: 20
Liked: 2 times
Joined: Fri Mar 21, 2014 11:41 am
Full Name: Gareth

Re: REFS issues (server lockups, high CPU, high RAM)

by Iain_Green » Thu Jan 11, 2018 1:01 pm

Hi,

I've been following this for a long while. We ourselves have converted back to NTFS and will be staying on it for the foreseeable future.

Question for Veeam: as ReFS is currently a no-go, is it wise to continue pointing users to ReFS as best practice when creating a repo, as you currently do?
(I am upgrading to Update 3 next week, so apologies if this is no longer the case.)
Many thanks

Iain Green
Iain_Green
Service Provider
 
Posts: 89
Liked: 4 times
Joined: Fri Dec 05, 2014 2:13 pm
Full Name: Iain Green

Re: REFS issues (server lockups, high CPU, high RAM)

by Gostev » Thu Jan 11, 2018 2:28 pm

Iain, absolutely - unless Microsoft ships the fix in the currently planned timelines, we will of course remove this suggestion in the next update.

Although it's not really fair to say it is a complete no-go for everyone, because it works well for many smaller customers, of which, for historical reasons, B&R has a lot... The issues are quite isolated to bigger ReFS volumes and big backup files. In general, such scalability problems are pretty usual for any new technology. B&R had its own back when it was at v3, and just like ReFS today, we too were usable for small customers only.
Gostev
Veeam Software
 
Posts: 21728
Liked: 2459 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS issues (server lockups, high CPU, high RAM)

by ferrus » Thu Jan 11, 2018 3:23 pm (1 person likes this post)

Not just for small customers either.
To quote my post from a few weeks back (because there are still calls for ReFS to be removed/discredited) ...

For some customers (us included), NTFS is not sufficient for our needs at the moment, and keeping the status quo brings with it as many issues as the ones currently reported with ReFS.
Some of the posts make an assumption that people are migrating from a position of stability and high performance, to something much worse. The opposite can be true.

NTFS is sufficient on four of our five Veeam repositories, but the nature of the VMs being backed up on the fifth means multiple synthetic/active fulls break the storage capacity for our current RPO strategy, and merge jobs break the backup window.

ReFS/Fast Clone provides a solution for both of these, and if an Active Full is required occasionally to reset the performance, so be it. It's no worse than our current position; in fact, it's much, much better.
ferrus
Veeam ProPartner
 
Posts: 161
Liked: 21 times
Joined: Thu Dec 03, 2015 3:41 pm
Location: UK

Re: REFS issues (server lockups, high CPU, high RAM)

by suprnova » Thu Jan 11, 2018 4:50 pm

It's interesting that it works for some small customers. I have a 10TB repository that backs up one VM (incrementals around 30GB, full backup around 5TB). As soon as the fast clone merge starts, the repository drive drops offline, often causing a merge to take days. Even when copying a file off this repository, the speed alternates between 0 and 1MBps every other second.
suprnova
Service Provider
 
Posts: 28
Liked: never
Joined: Fri Apr 08, 2016 5:15 pm

Re: REFS issues (server lockups, high CPU, high RAM)

by kubimike » Thu Jan 11, 2018 5:08 pm (1 person likes this post)

@suprnova I have 192GB of RAM with a 48TB repo and still have issues.
kubimike
Expert
 
Posts: 285
Liked: 29 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS issues (server lockups, high CPU, high RAM)

by vishal.chotoe » Thu Jan 11, 2018 8:28 pm

Can you share a link to the planned timeline?

Gostev wrote:Iain, absolutely - unless Microsoft ships the fix in the currently planned timelines, we will of course remove this suggestion in the next update.

Although it's not really fair to say it is a complete no-go for everyone, because it works well for many smaller customers, of which, for historical reasons, B&R has a lot... The issues are quite isolated to bigger ReFS volumes and big backup files. In general, such scalability problems are pretty usual for any new technology. B&R had its own back when it was at v3, and just like ReFS today, we too were usable for small customers only.
vishal.chotoe
Lurker
 
Posts: 1
Liked: never
Joined: Mon Dec 04, 2017 3:02 pm
Full Name: Vishal Chotoe

Re: REFS issues (server lockups, high CPU, high RAM)

by thomas.raabo » Fri Jan 12, 2018 6:36 am (1 person likes this post)

suprnova wrote:It's interesting that it works for some small customers. I have a 10TB repository that backs up one VM (incrementals around 30GB, full backup around 5TB). As soon as the fast clone merge starts, the repository drive drops offline, often causing a merge to take days. Even when copying a file off this repository, the speed alternates between 0 and 1MBps every other second.


I think it's safe to say that it does not work... maybe it works for small customers because they do fulls, maybe it works because they have disabled blockclone.

I'm 100% sure in my case: this does not work. New drivers work and are much better.

But I'm not going to touch blockclone for years! Just disable it in the registry and use ReFS, and you're golden! Then, when Veeam and Microsoft get their act together, we can start to try the new stuff.
thomas.raabo
Service Provider
 
Posts: 28
Liked: 11 times
Joined: Mon Oct 31, 2016 6:27 pm
Location: infrastructure guy
Full Name: Thomas Raabo
