REFS 4k horror story

Re: REFS 4k horror story

by Gostev » Mon Mar 20, 2017 1:25 pm

@Peter the discussed fix and registry keys are not supposed to change the performance in any way. All they do is prevent ReFS from consuming all available memory with its metadata cache - the problem that eventually resulted in the server hosting ReFS volume "locking up", becoming unresponsive for extended time periods.
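For anyone applying the registry setting discussed here, the sketch below only *builds* the `reg.exe` command lines to add or remove it and does not run them. It assumes the value is a REG_DWORD named `RefsEnableLargeWorkingSetTrim` under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`, as described in Microsoft's KB article for this ReFS update. Verify the path and any companion values against that article before running anything, from an elevated prompt. The helper names are hypothetical.

```python
# Hypothetical helpers: build (but do not execute) reg.exe command lines for
# the ReFS working-set trim setting discussed in this thread.
# Assumption: the value is a REG_DWORD under the key below, per Microsoft's
# KB article for the March 2017 ReFS update. Check the article before use.

KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE = "RefsEnableLargeWorkingSetTrim"


def enable_command(data: int = 1) -> str:
    """Return a reg.exe command that creates or overwrites the DWORD value."""
    return f'reg add "{KEY}" /v {VALUE} /t REG_DWORD /d {data} /f'


def remove_command() -> str:
    """Return a reg.exe command that deletes the value entirely."""
    return f'reg delete "{KEY}" /v {VALUE} /f'


if __name__ == "__main__":
    print(enable_command())
    print(remove_command())
```

Deleting the value (rather than setting it to 0) is what "removing the regkey" later in the thread most likely refers to.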
Gostev
Veeam Software
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

by graham8 » Mon Mar 20, 2017 1:31 pm

Update on the post-patch issue I had: it's only been a few days, of course, but so far I haven't had a recurrence of the disastrous issue I posted about previously. I got logs over to Gostev to forward to the ReFS team. I'll keep everyone updated.

@Pikok re performance:

I've noticed the same issue. Backup performance has gotten terribly slow: in initial tests it was easily doing 200-300 MB/s, and now it's ranging from 0.25 to 1 MB/s (yes, seriously). I haven't been worrying about it as much since I've been more concerned about general stability, but it's something else I'll have to deal with at some point.

I see a lot of figures like "Data read: 100GB ... Transferred: 3GB", with Dedupe always at 1.0x and Compression ranging from ~1.5x to ~3.5x. Not sure what's up with the huge read vs. transferred discrepancy, but even if the backup amount was 100GB, the transfer rate would still be horribly slow over the 1-2 hours the job generally runs.
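A quick back-of-the-envelope check on these numbers, assuming decimal units (1 GB = 1000 MB) and that the 1-2 hour run time is the full elapsed time: 100 GB read works out to roughly 14-28 MB/s, while 3 GB transferred works out to roughly 0.4-0.8 MB/s, which falls in the same range as the 0.25-1 MB/s quoted above. The helper name is hypothetical.

```python
# Rough throughput check for the figures quoted above.
# Assumptions: decimal units (1 GB = 1000 MB) and that the 1-2 hour job
# duration is the whole elapsed time for the data in question.

def avg_rate_mb_s(gigabytes: float, hours: float) -> float:
    """Average rate in MB/s for moving `gigabytes` over `hours`."""
    return gigabytes * 1000 / (hours * 3600)


if __name__ == "__main__":
    for label, gb in (("read", 100), ("transferred", 3)):
        for hours in (1, 2):
            rate = avg_rate_mb_s(gb, hours)
            print(f"{gb} GB {label} in {hours} h -> {rate:.2f} MB/s")
```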

Anyway, tangent, sorry - there are probably other threads for performance issues. But no, you're not alone in having issues there.

EDIT: Actually, I don't see any threads on these forums about performance degradation with ReFS. Unless I'm missing something, maybe I should call support, have them review these logs to make sure I'm not misunderstanding something, and then start a new thread...
graham8
Enthusiast
Joined: Wed Dec 14, 2016 1:56 pm

Re: REFS 4k horror story

by SyNtAxx » Mon Mar 20, 2017 1:41 pm

I've been following the thread and there now seems to be a solution. I have a few questions.

1) Is the recommendation still to use a 64K cluster size (my proxy server has 384 GB RAM)?
2) Is a ReFS repo fast enough to use as a primary tier of storage for backups, supporting Instant Restore, etc?
3) Any other pointers or best practices (pardon pun)?

Thanks,

Nick
SyNtAxx
Expert
Joined: Fri Jan 02, 2015 7:12 pm

Re: REFS 4k horror story

by WimVD » Mon Mar 20, 2017 3:02 pm

Just my 2 cents:

1) 64K seems like a logical choice to me, given that a Veeam repository works with big files. There is a 10% space tradeoff to consider, however.
2) Sure. I haven't seen any comparison against NTFS, but in my (albeit brief) personal experience, ReFS performance is really good.
3) Check out https://www.veeam.com/veeamlive/best-practice-scaling-backup-repositories-microsoft-refs.html. It has some good info on the inner workings of ReFS.
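To make the 64K-versus-4K choice concrete, here is a small sketch counting clusters. It assumes a Veeam block is 1024 KiB before compression (the default "Local target" storage optimization; other settings use different block sizes) and uses a made-up 500 KiB compressed block to show slack. With 64K clusters, each block maps to 16x fewer allocation units for ReFS to track, at the cost of unused space in each block's last cluster. This sketch does not attempt to reproduce the 10% figure above. All names are hypothetical.

```python
import math

# Assumption: uncompressed Veeam block size of 1024 KiB ("Local target"
# default). The 500 KiB compressed size below is an illustrative value only.
VEEAM_BLOCK_KIB = 1024


def clusters_for(size_kib: int, cluster_kib: int) -> int:
    """Number of clusters needed to store `size_kib` (rounded up)."""
    return math.ceil(size_kib / cluster_kib)


def slack_kib(size_kib: int, cluster_kib: int) -> int:
    """Unused space left in the last allocated cluster."""
    return clusters_for(size_kib, cluster_kib) * cluster_kib - size_kib


if __name__ == "__main__":
    for cluster in (4, 64):
        print(f"{cluster}K clusters: "
              f"{clusters_for(VEEAM_BLOCK_KIB, cluster)} per uncompressed block, "
              f"{slack_kib(500, cluster)} KiB slack for a 500 KiB compressed block")
```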
WimVD
Service Provider
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by WimVD » Mon Mar 20, 2017 3:05 pm

@Graham8: What are your jobs indicating as the bottleneck?
WimVD
Service Provider
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by graham8 » Mon Mar 20, 2017 5:23 pm

WimVD wrote: @Graham8: What are your jobs indicating as the bottleneck?

Source 99%. That said, a CrystalDiskMark run against a network share on the same remote volume shows transfer speeds of ~300 MB/s, and ~3-4 MB/s at 4K Q32T1 (which has dropped by roughly half since it was first deployed).

I spoke with someone in support just now. It turns out I had disabled parallel processing a while ago as a way of reducing the frequency of the server lockups caused by ReFS metadata memory exhaustion (and it did seem to help). With that setting, of course, everything runs single-threaded, so the performance I'm seeing probably makes sense, especially since these are incrementals with highly random disk access. Also, the 10:1 difference between "read" and "transferred" is likely due to the Veeam block size being tracked versus the 4K filesystem block size.

In short, my bad: I forgot a checkbox. I'll wait a week or two on the other server where I have the new ReFS patch installed and enabled, and if nothing else crops up, I'll re-enable parallel processing, which I think will improve things.
graham8
Enthusiast
Joined: Wed Dec 14, 2016 1:56 pm

Re: REFS 4k horror story

by kubimike » Mon Mar 20, 2017 5:56 pm

@Pikok, are you doing forever forward incremental? That might be the cause if you're not chopping up your backup chains with synthetic fulls.
kubimike
Expert
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by Limey_005 » Mon Mar 20, 2017 8:23 pm

Setup: Windows 2016 16 GB RAM, 4 Cores / Veeam 9.5 U1 - 36 TB ReFS [Running in a VM ESXi 6.5] - New installation / 4 Remote Proxies - All 10 Gb Network connectivity

I applied the same Windows ReFS patch today with the RefsEnableLargeWorkingSetTrim regkey. I was seeing crazy transfer rates of 266-953 MB/s (50 mins for 2.7 TB processed, 1.2 TB read, 1.4 TB transferred (0.8x)), at which point the hardware reported an issue and dropped the RAID card. This server had been working fine until I applied this patch. I rebooted and resolved the RAID array issue, but it seems highly coincidental: I had only applied the patch an hour or so earlier, and this was the first backup afterwards. I have since removed the regkey, so we'll see what happens next...
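As a sanity check on these figures, assuming decimal units (1 TB = 1,000,000 MB) and that the 50 minutes covers the whole 2.7 TB processed, the average works out to about 900 MB/s, near the top of the reported 266-953 MB/s range. The helper name is hypothetical.

```python
# Assumptions: decimal units (1 TB = 1,000,000 MB) and that the 50 minutes
# is the full elapsed time for the 2.7 TB processed.

def avg_rate_mb_s(terabytes: float, minutes: float) -> float:
    """Average rate in MB/s for processing `terabytes` over `minutes`."""
    return terabytes * 1_000_000 / (minutes * 60)


if __name__ == "__main__":
    print(f"{avg_rate_mb_s(2.7, 50):.0f} MB/s")  # prints "900 MB/s"
```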

It seems similar to graham8's issue; maybe the floodgates opened and overwhelmed the hardware?
Limey_005
Service Provider
Joined: Mon Oct 17, 2016 1:03 am

Re: REFS 4k horror story

by Gostev » Mon Mar 20, 2017 11:11 pm

Well, this patch has 2 months' worth of changes, so of course there may be unrelated bugs... however, I don't see how enabling working set trimming can have any impact on data transfer performance.
Gostev
Veeam Software
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: REFS 4k horror story

by WimVD » Mon Mar 20, 2017 11:44 pm

Limey_005 wrote: Setup: Windows 2016 16 GB RAM, 4 Cores / Veeam 9.5 U1 - 36 TB ReFS [Running in a VM ESXi 6.5]

Can't see how a Windows patch in a virtual machine would bring down a RAID controller in the ESXi host...
But granted, that would be very coincidental, seeing as kubimike reported similar issues.
WimVD
Service Provider
Joined: Tue Dec 23, 2014 4:04 pm

Re: REFS 4k horror story

by david.buchanan » Mon Mar 20, 2017 11:58 pm

My repo had been working fine, but after seeing this post about the issues being fixed, I figured why not patch it to minimize any potential issues. However, after installing the patch and applying the "RefsEnableLargeWorkingSetTrim" registry setting, I can no longer access my ReFS volume, and the server crashes every 30-45 minutes!

Is there some sort of initial scan that ReFS does with this patch or registry change? I can still see reads and writes happening on my volume even though I can't access it via Explorer.

I'll be logging a ticket with MS shortly but figured I'd add info here in case I'm not the only one.
david.buchanan
Enthusiast
Joined: Tue Jun 02, 2015 12:44 am
Full Name: David

Re: REFS 4k horror story

by kubimike » Tue Mar 21, 2017 12:39 am

My Veeam box is hard down with a failed RAID controller now. I can't catch a break. That's 35 grand in hardware, and HP doesn't have the part handy. So much for a 4-hour SLA.
kubimike
Expert
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: REFS 4k horror story

by david.buchanan » Tue Mar 21, 2017 1:05 am

I didn't get far enough to log a ticket with MS before finding that my issue appears to be caused by an interaction between our AV (Webroot) and something in this new patch. I will talk to Webroot support about it, but this may help others if they run into it.
david.buchanan
Enthusiast
Joined: Tue Jun 02, 2015 12:44 am
Full Name: David

Re: REFS 4k horror story

by Limey_005 » Tue Mar 21, 2017 2:42 am

I agree, I can't see how a Windows patch could bring down a RAID controller in an ESXi host, but the only changes made today were the Windows patch and the regkey to enable it. I was seeing transfer rates of 266 MB/s, which I thought was great, but I can't explain the super high transfer rates: my jobs are reporting a 91-99% proxy bottleneck, and I limited the proxies to 2 vCPU/4 GB RAM. Since removing the regkey, the RAID hasn't failed again, and I'm still seeing great transfer rates: 767 MB/s read and 440 MB/s transferred.
Limey_005
Service Provider
Joined: Mon Oct 17, 2016 1:03 am

Re: REFS 4k horror story

by HJAdams123 » Tue Mar 21, 2017 5:12 am

So has anyone actually had success with this latest patch and the registry settings?
HJAdams123
Enthusiast
Joined: Mon Jul 16, 2012 1:54 pm
Full Name: Harold Adams
