REFS 4k horror story

Re: REFS 4k horror story

by Cicadymn » Thu Jun 15, 2017 7:11 pm

kubimike wrote: @cicadymn are you able to tell if the job is finishing but simply hanging when deleting the older restore points? My issue is deleting large restore points, 4TB+

The job hangs at whatever % it had reached when the server locks up at 99% CPU. Interestingly enough, after a hard reboot, if I leave it alone, the job may advance a few % until the server locks up again. The jobs have never completed, however; they eventually fail, and then I disable them to try to get the server stable again. The lockup always happens during the "creating synthetic full" stage.
Cicadymn
Influencer
 
Posts: 20
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS 4k horror story

by kubimike » Thu Jun 15, 2017 7:21 pm

How much RAM do you have? I had 16GB but went to 192GB just to be safe.
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by Cicadymn » Thu Jun 15, 2017 8:02 pm

32GB on this one, which should be overkill for what it's processing.
Cicadymn
Influencer
 
Posts: 20
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS 4k horror story

by mkretzer » Thu Jun 15, 2017 8:41 pm

How much do you process? We see RAM usage peaks of 100+ GB when there is a merge of a 3 TB backup file. Our system only stopped crashing once we increased to 384 GB.
mkretzer
Expert
 
Posts: 337
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by Cicadymn » Fri Jun 16, 2017 2:24 pm

WOW! That's a lot of RAM! Our primary backup host had the RAM issue, but for some reason our backup copy host maxes out the CPU instead, with RAM sitting at around 50% usage during the ordeal.

When it locks up, it's usually processing two jobs, merging backup files that are both 4.5TB.
Cicadymn
Influencer
 
Posts: 20
Liked: 5 times
Joined: Mon Jan 30, 2017 7:42 pm
Full Name: Sam

Re: REFS 4k horror story

by mkretzer » Sun Jun 18, 2017 9:16 am

Here you can see how the RAM is used while a bigger merge is running:

http://imgur.com/GtfjmH8

The problem is that once the RAM has gone down to this point, all of the following merges, backups and so on get extremely slow...
mkretzer
Expert
 
Posts: 337
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by Hauke » Sun Jun 18, 2017 10:29 am

My issues are now solved, so I'm giving ReFS a second chance.
I temporarily moved all data to another NAS, deleted the ReFS volume, recreated it (with 64k clusters, of course), moved the data back to ReFS, and ran an active full to activate fast clone again. Now the issues are gone.
Speed is back to normal at the full 2 Gbit/s (thanks to Veeam for being able to fully load LACP EtherChannels), with no memory or CPU issues.
Let's see if it keeps working over a longer period.
The first ReFS volume was created on a stock Server 2016 install; this time the server was fully patched before the volume was created. Maybe that makes a difference.

The ReFS volume runs on a SuperMicro board with an 8-core Intel Xeon and only 24GB RAM, an Areca 1880 RAID controller, and a 16-bay box with 10x WD Red 6TB SATA drives in RAID6.
(Yep, I know that you will cry because of that low memory, but it ran very well for the first 2 months without any issues. And come on, needing more than 100GB RAM to keep a file system stable is crazy.)
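
For a rough sense of scale on the 4k versus 64k cluster size mentioned above, here is a small Python sketch that counts how many clusters one of the 4.5TB backup files from this thread would occupy at each size. It assumes binary units (1TB taken as 1024^4 bytes), and the helper name `cluster_count` is made up for illustration. Whether ReFS memory and CPU load actually scale with the cluster count is an assumption on my part, not something this thread establishes.

```python
# Rough cluster-count comparison for one large backup file.
# Assumption: sizes use binary units (1 TB treated as 1024**4 bytes).

KIB = 1024
TIB = 1024 ** 4


def cluster_count(file_bytes: int, cluster_bytes: int) -> int:
    """Number of clusters needed to hold file_bytes, rounded up."""
    return -(-file_bytes // cluster_bytes)


# One of the 4.5TB backup files mentioned earlier in the thread.
file_bytes = int(4.5 * TIB)

clusters_4k = cluster_count(file_bytes, 4 * KIB)
clusters_64k = cluster_count(file_bytes, 64 * KIB)

print(f"4k clusters:  {clusters_4k:,}")
print(f"64k clusters: {clusters_64k:,}")
print(f"ratio:        {clusters_4k // clusters_64k}x")
```

The ratio is simply 64/4 = 16, so a 4k volume has sixteen times as many clusters to track for the same file.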
Hauke
Novice
 
Posts: 8
Liked: 1 time
Joined: Thu Apr 16, 2015 11:25 am
Full Name: Hauke Ihnen

Re: REFS 4k horror story

by thomas.raabo » Tue Jun 20, 2017 8:17 am (1 person likes this post)

lepphce1 wrote: I also checked versions and realized that I have an older version of ReFS that was included in the April 2017 Roll-up (10.0.14393.953). The ReFS version released yesterday (June 2017) is the same as released in May 2017 as mentioned earlier. I'm about to update my servers to the June 2017 release but I am wondering if anybody noticed any improvements after the May 2017 update.

As an aside, I've had a ticket open with Microsoft for some time now, and frankly they've been less than helpful / responsive. They seemed to immediately know what was going on after I submitted my crash dump but have been unwilling to provide any information up to this point. After having my main Veeam server stuck in a boot loop all weekend, I'm close to bypassing MS for now and reaching out to Veeam support for the experimental fix. FWIW, I've not had CPU or memory issues. But the server goes into these "reboot fits", it seems, when there is some kind of block cloning operation (as others have said, perhaps when it is doing a massive amount of deletes).


The FIX does work in our system.

We had issues every day before applying the exp patch - and now our system is running 100% as expected.

The new ReFS file is the right way to go.

ReFS.sys - build 14939.1100
thomas.raabo
Service Provider
 
Posts: 14
Liked: 5 times
Joined: Mon Oct 31, 2016 6:27 pm
Location: infrastructure guy
Full Name: Thomas Raabo

Re: REFS 4k horror story

by mkretzer » Tue Jun 20, 2017 8:49 am

@thomas.raabo
Did it also fix the merge performance issues or did you not have any issues like that?
mkretzer
Expert
 
Posts: 337
Liked: 74 times
Joined: Thu Dec 17, 2015 7:17 am

Re: REFS 4k horror story

by thomas.raabo » Tue Jun 20, 2017 12:34 pm

mkretzer wrote: @thomas.raabo
Did it also fix the merge performance issues or did you not have any issues like that?


Yes, we had that one too, and it is also fixed.
thomas.raabo
Service Provider
 
Posts: 14
Liked: 5 times
Joined: Mon Oct 31, 2016 6:27 pm
Location: infrastructure guy
Full Name: Thomas Raabo

Re: REFS 4k horror story

by nmdange » Tue Jun 20, 2017 1:43 pm

thomas.raabo wrote: The FIX does work in our system.

We had issues every day before applying the exp patch - and now our system is running 100% as expected.

The new ReFS file is the right way to go.

ReFS.sys - build 14939.1100


Just thought I'd mention that I just applied the June cumulative update and Refs.sys on my systems is at build 14939.1198 dated 4/27/2017. I wonder if that means the "experimental" fix has been included in the June updates? I've been holding off on moving my backup repositories to ReFS but maybe it's finally ready :)
nmdange
Expert
 
Posts: 214
Liked: 59 times
Joined: Thu Aug 20, 2015 9:30 pm

Re: REFS 4k horror story

by kubimike » Tue Jun 20, 2017 2:26 pm

@nmdange I have the 4/27/2017 file on another fully patched Windows 2016 box. The version is 14393.1198. Is that a typo?

The experimental file from MSFT has version 14393.1100.
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by nmdange » Tue Jun 20, 2017 2:54 pm

1198 is higher than 1100 so that's why I'm asking. Usually higher build numbers include fixes from lower build numbers.
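
The comparison being made here can be sketched in a few lines of Python. This is only an illustration: `parse_version` and `revision_at_least` are hypothetical helper names, and the sketch assumes a version is a dot-separated list of integers whose last field is the revision. It only compares numbers; it cannot tell you whether the experimental fix was actually rolled into the public update.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted version such as '14393.1198' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def revision_at_least(installed: str, reference: str) -> bool:
    """True if installed has the same build as reference and an equal or higher revision.

    Raises ValueError when the leading build fields differ, because revisions
    from different builds are not comparable (and a mismatch may be a typo).
    """
    inst = parse_version(installed)
    ref = parse_version(reference)
    if inst[:-1] != ref[:-1]:
        raise ValueError(f"build mismatch: {installed} vs {reference}")
    return inst[-1] >= ref[-1]


print(revision_at_least("14393.1198", "14393.1100"))  # True
```

Checking the build field first is a deliberate choice: a transposed build number such as 14939 raises an error instead of silently comparing only the last four digits.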
nmdange
Expert
 
Posts: 214
Liked: 59 times
Joined: Thu Aug 20, 2015 9:30 pm

Re: REFS 4k horror story

by kubimike » Tue Jun 20, 2017 3:17 pm

You mentioned you applied the June update and your file version is now 14939, which I find unusual.
kubimike
Expert
 
Posts: 243
Liked: 23 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by nmdange » Tue Jun 20, 2017 3:31 pm

Sorry, yes, that was a typo. I was just looking at the last 4 digits :)
nmdange
Expert
 
Posts: 214
Liked: 59 times
Joined: Thu Aug 20, 2015 9:30 pm
