9.5/ReFS/Server 2016 Memory Consumption


Re: 9.5/ReFS/Server 2016 Memory Consumption

by dellock6 » Tue Jan 31, 2017 6:18 am

I'm sad to hear about these issues, but at least I'm glad that you are all confirming they come from volumes formatted with 4K clusters, and that no issue has been reported with 64K clusters. The main problem at this point is that the default cluster size is 4 KB, so many people will simply read about our new integration, go straight to formatting a volume with ReFS, and risk ending up with these issues. Hopefully the news about our suggestion to use 64 KB will spread more and more.
Luca Dell'Oca
EMEA Cloud Architect @ Veeam Software

@dellock6
http://www.virtualtothecore.com
vExpert 2011-2012-2013-2014-2015-2016
Veeam VMCE #1
dellock6
Veeam Software
 
Posts: 4876
Liked: 1280 times
Joined: Sun Jul 26, 2009 3:39 pm
Location: Varese, Italy
Full Name: Luca Dell'Oca

Re: 9.5/ReFS/Server 2016 Memory Consumption

by christiankelly » Wed Feb 01, 2017 8:34 am

If it's this important couldn't you do a check when adding a ReFS repo and warn to format with 64k?
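
A minimal sketch of what such a check might look like, purely as an illustration. It assumes the file system name and cluster size of the volume are already known; the function name, the constant and the warning text are hypothetical and are not taken from Veeam.

```python
from typing import Optional

# 64 KB, the cluster size suggested in this thread for ReFS repositories
RECOMMENDED_CLUSTER_BYTES = 64 * 1024


def refs_cluster_warning(filesystem: str, cluster_bytes: int) -> Optional[str]:
    """Return a warning for a ReFS volume not using 64 KB clusters, else None.

    Hypothetical helper: the caller is assumed to have read the file system
    type and cluster size from the volume beforehand.
    """
    if filesystem.strip().lower() != "refs":
        return None  # the suggestion only concerns ReFS volumes
    if cluster_bytes == RECOMMENDED_CLUSTER_BYTES:
        return None  # already formatted as suggested
    return (
        f"ReFS volume uses {cluster_bytes // 1024} KB clusters; "
        "consider formatting it with 64 KB clusters."
    )


if __name__ == "__main__":
    print(refs_cluster_warning("ReFS", 4096))
    print(refs_cluster_warning("ReFS", 65536))
```

A warning rather than a hard block leaves the decision with the administrator, which matches the "warn" wording in the question.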
christiankelly
Service Provider
 
Posts: 115
Liked: 7 times
Joined: Sun May 06, 2012 6:22 pm
Full Name: Christian Kelly

Re: 9.5/ReFS/Server 2016 Memory Consumption

by dellock6 » Wed Feb 01, 2017 9:15 am

Christian, I was almost about to post the same idea... ;)

Re: 9.5/ReFS/Server 2016 Memory Consumption

by DaveWatkins » Wed Feb 01, 2017 7:02 pm

The issues will (presumably) get fixed by Microsoft. That said, 2016 has been out for some time now and they aren't fixed yet, so it may still be some time.
DaveWatkins
Expert
 
Posts: 234
Liked: 60 times
Joined: Sun Dec 13, 2015 11:33 pm

Re: 9.5/ReFS/Server 2016 Memory Consumption

by Gostev » Wed Feb 01, 2017 9:14 pm

2016 is still really, really fresh - the few months it has been out are nothing in Windows terms. It's a huge and complex piece of software, so Microsoft needs time to prioritize and address all major issues gradually (by the way, opening support cases is one thing that really does help to raise the priority of addressing a particular issue - at least at Veeam).

But generally speaking, there's nothing unexpected here - there's a reason why most companies practice the good old "no upgrade until SP1" rule with new major releases of any software at all, Veeam included. Early adopters should always be prepared to run into those harder-to-find bugs that have slipped by QC.

christiankelly wrote: If it's this important couldn't you do a check when adding a ReFS repo and warn to format with 64k?

We have this penciled in for Update 2 - this is a simple change, so we won't add it immediately, but rather keep monitoring the feedback for the next couple of months, and make a more educated decision on whether we should make this change closer to the actual update release.

The whole memory issue does look pretty simple at first sight (there's an apparent lack of system resource consumption management for some ReFS maintenance process), and hopefully should be easy for Microsoft to fix.
Gostev
Veeam Software
 
Posts: 21166
Liked: 2304 times
Joined: Sun Jan 01, 2006 1:01 am
Full Name: Anton Gostev

Re: 9.5/ReFS/Server 2016 Memory Consumption

by mkretzer » Wed Feb 01, 2017 10:14 pm

@Gostev: But is it really an issue of not enough RAM? On our system RAM usage was never over 60-70% and still the system crashed badly. Is there a "hard limit" which cannot be overcome even if more RAM is available?
mkretzer
Expert
 
Posts: 251
Liked: 61 times
Joined: Thu Dec 17, 2015 7:17 am

Re: 9.5/ReFS/Server 2016 Memory Consumption

by DaveWatkins » Wed Feb 01, 2017 10:51 pm

Gostev wrote: But generally speaking, there's nothing unexpected here - there's the reason why most companies practice good old "no upgrade until SP1" rule with new major releases of any software at all, Veeam included. Early adopters should always be prepared to run into those harder to find bugs that has slipped by QC.

The slight wrinkle there is of course that there will never be an SP1 for 2016, nor any service pack :). Cumulative updates help when updating a new install, so that is at least no longer a problem, but picking when to start the migration isn't quite as easy as waiting for SP1 anymore.

Re: 9.5/ReFS/Server 2016 Memory Consumption

by Gostev » Wed Feb 01, 2017 11:24 pm

@mkretzer not necessarily, it can be some other system resource like handles (or just a deadlock on some shared resource). What I am saying is that these sorts of massive issues are usually very easy to reproduce, troubleshoot and fix (unlike intermittent issues, which are the real evil).

@Dave correct, but nevertheless there will still be "feature updates" at roughly the same cadence as service packs were previously.

Re: 9.5/ReFS/Server 2016 Memory Consumption

by rgarvelink » Thu Feb 02, 2017 2:31 pm

I don't want to throw this thread into madness, but shouldn't we be careful before we immediately state that 64k is the only recommendation for ReFS?

The recommendation from Microsoft is still 4k for the majority of workloads: https://blogs.technet.microsoft.com/fil ... -and-ntfs/
Granted, they do state that "64k clusters are applicable when working with large, sequential IO, but otherwise, 4K should be the default cluster size." Veeam likely falls within that 64K recommendation, but why then was the initial recommendation from Veeam to use 4K unless the volume was large (100 TB, I believe)? IIRC, Gostev pointed that out in his video here: https://www.youtube.com/watch?v=V3vrsonuLE8&t=1841s

Due to the large IO size from Veeam, we're sacrificing 5-10% of space for a problem that could potentially be resolved by sizing the repository server properly. As stated in this thread, 4 GB of memory per core is the recommendation. Wouldn't OP need 64 GB just for the Veeam operations, assuming he's at 16 threads and is hitting the recommendation of 1 core for every thread? We know that ReFS prioritizes data availability over everything else, and it appears to do so via memory consumption. We might just need to take that into consideration when sizing repositories.
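
The sizing arithmetic above can be sketched as follows. This is only an illustration, assuming the figures quoted in this thread (4 GB of memory per core, one core per concurrent thread); the function name and parameters are hypothetical and not part of any Veeam tool.

```python
GB_PER_CORE = 4  # per-core memory figure quoted in this thread


def repo_ram_gb(concurrent_threads: int,
                cores_per_thread: int = 1,
                gb_per_core: int = GB_PER_CORE) -> int:
    """Estimate repository RAM in GB from the number of concurrent threads."""
    cores = concurrent_threads * cores_per_thread
    return cores * gb_per_core


if __name__ == "__main__":
    # 16 threads, 1 core each, 4 GB per core
    print(repo_ram_gb(16))  # 64
```

This covers only the Veeam side of the sizing; whatever ReFS itself consumes would come on top of it.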

https://technet.microsoft.com/en-us/lib ... 24(v=ws.11).aspx

Availability. ReFS prioritizes the availability of data. Historically, file systems were often susceptible to data corruption that would require the system to be taken offline for repair. With ReFS, if corruption occurs, the repair process is both localized to the area of corruption and performed online, requiring no volume downtime. Although rare, if a volume does become corrupted or you choose not to use it with a mirror space or a parity space, ReFS implements salvage, a feature that removes the corrupt data from the namespace on a live volume and ensures that good data is not adversely affected by nonrepairable corrupt data. Because ReFS performs all repair operations online, it does not have an offline chkdsk command.

Proactive Error Correction. The integrity capabilities of ReFS are leveraged by a data integrity scanner, which is also known as a scrubber. The integrity scanner periodically scans the volume, identifying latent corruptions and proactively triggering a repair of that corrupt data.
rgarvelink
Lurker
 
Posts: 2
Liked: never
Joined: Fri May 01, 2015 11:45 pm
Full Name: Ryan Garvelink

Re: 9.5/ReFS/Server 2016 Memory Consumption

by graham8 » Thu Feb 02, 2017 3:27 pm

rgarvelink wrote: As stated in this thread, 4Gb of memory per core is the recommendation wouldn't OP need 64 Gb just for the Veeam operations assuming he's at 16 threads and is hitting the recommendation of 1 core for every thread? We know that ReFS prioritizes data availability over everything else and it appears to do so via memory consumption. We might just need to take that into consideration when sizing repositories.

Nothing should ever result in crashes due to memory availability. Performance should suffer, but a system crash should never occur. If a system crashes, it's bad programming: code designed with poor assumptions that didn't take scaling into account, failure to properly clean up memory allocations (leaks), etc. In our case, we had double the 4 GB per core, even with Veeam completely disabled, and ReFS integrity scans alone literally crashed the system repeatedly due to memory overconsumption. This appears to have been fixed with recent updates for us, but because these are production servers, I can't exactly test it repeatedly. I've been carefully monitoring ReFS driver memory consumption (via Sysinternals RAMMap) and I'm seeing it eat up huge amounts of memory during anything that hits a large segment of integrity-enabled data. That's fine and perfectly understandable and desirable if the memory it's filling is unused, but my confidence is now low that it will always yield gracefully to other memory demands, since it already failed to do so once (and I've had other ReFS problems as well... expansion, etc.).

I do agree that 4K shouldn't be a *stability* problem if the ReFS code is good, and it appears that MS did fix some memory allocation issues in the ReFS driver code with post-2016 RTM updates. Here's hoping. To be fair, we're talking about a small sample size of people, and the 4K cluster size is new. I agree that it's too early to put a nail in the coffin of its recommendation - but I think the early feedback is pertinent for anyone trying to make a decision on maximizing stability for a production system at this time.
graham8
Enthusiast
 
Posts: 54
Liked: 20 times
Joined: Wed Dec 14, 2016 1:56 pm

Re: 9.5/ReFS/Server 2016 Memory Consumption

by mkretzer » Thu Feb 02, 2017 4:09 pm

rgarvelink wrote: I don't want to throw this thread into madness, but shouldn't we be careful before we immediately state that 64k is the only recommendation for ReFS?

Please read our ReFS 4K horror story: veeam-backup-replication-f2/refs-4k-horror-story-t40629.html

We have 128 GB of RAM and 16 processor cores. We never had more than 80 GB consumption. The server still crashed 3 times last night. Now with NTFS we have not had a single crash.

This is not fixable with resources!

Re: 9.5/ReFS/Server 2016 Memory Consumption

by lando_uk » Fri Feb 03, 2017 5:41 pm

For those experiencing issues, are they all ReFS/Storage Spaces systems? What about ReFS on RAID 10 using a RAID controller and skipping shoddy Storage Spaces?
lando_uk
Expert
 
Posts: 228
Liked: 14 times
Joined: Thu Oct 17, 2013 10:02 am
Full Name: Mark

Re: 9.5/ReFS/Server 2016 Memory Consumption

by suprnova » Fri Feb 03, 2017 7:01 pm

I do not use Storage Spaces and had the issue.
suprnova
Service Provider
 
Posts: 8
Liked: never
Joined: Fri Apr 08, 2016 5:15 pm

Re: 9.5/ReFS/Server 2016 Memory Consumption

by tsightler » Fri Feb 03, 2017 7:44 pm

Definitely happens without storage spaces.
tsightler
Veeam Software
 
Posts: 4687
Liked: 1698 times
Joined: Fri Jun 05, 2009 12:57 pm
Full Name: Tom Sightler

Re: 9.5/ReFS/Server 2016 Memory Consumption

by mk2311 » Mon Feb 06, 2017 9:10 am

Well.....

We don't use ReFS, but having upgraded to Veeam 9.5 we have had lots of out-of-memory issues on the Veeam B&R server. These are the result of the VeeamAgent.exe module using memory for every proxy task we had.

So, we had a lot of proxies, and the proxy maximum concurrent tasks value was set to 16 and in some cases 24. In Veeam 9.0, this was never a problem. In 9.5, we started having memory issues

Ticket opened, lots of logs sent for analysis over a 3 week period

Our Veeam B&R server had 16 GB RAM and 8 CPUs

We had to double to 32 GB RAM and 10 CPUs. Still had issues. So we had to increase to 40 GB RAM and 16 CPUs

So, at a particular time of the evening, we start around 20 jobs, each with many VMs. Many proxies will be used, and we can see from the Veeam Resource log that many of the proxies used the full 16 tasks. A VeeamAgent.exe process runs for each task, so we had about 18 proxies running, each using up to 16 threads, and we could see around 120-150 VeeamAgent processes running at any one time. Each one uses c. 250 MB, but initially spikes to over 500 MB for a few seconds - not all at the same time. Later in the evening we run 26 backup jobs and, when they complete, we run the last backups, around 22 of them.

So with other system related memory usage, it was killing the Veeam server
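
As a rough back-of-the-envelope check of the figures above, here is a small sketch. It assumes a steady-state footprint of about 250 MB per VeeamAgent process and ignores the short startup spikes and all other memory use on the server; the function name is hypothetical.

```python
def agent_memory_gb(processes: int, mb_per_process: float = 250.0) -> float:
    """Approximate total memory, in GB, used by a number of agent processes."""
    return processes * mb_per_process / 1024


if __name__ == "__main__":
    # Range of concurrent VeeamAgent processes observed in the post
    for count in (120, 150):
        print(count, "processes:", round(agent_memory_gb(count), 1), "GB")
```

With 120 to 150 processes this comes to roughly 29 to 37 GB, which leaves little headroom on a 40 GB server once the startup spikes and other system usage are added.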

We were advised by Veeam to reduce the number of tasks to 8, which we have done, and we have not had the memory issues since. We can see now that memory usage appears to peak at about 34 GB.

Since upgrading to Update 1, still no problems

May, or may not, be of help?
mk2311
Novice
 
Posts: 3
Liked: never
Joined: Tue Apr 28, 2015 2:12 pm
Full Name: Jeff White
