REFS issues (server lockups, high CPU, high RAM)

Re: REFS issues (server lockups, high CPU, high RAM)

by soehl » Sun Feb 25, 2018 6:09 pm

Our repos range in size from around 50TB to around 100TB, all based on RAID 60 with 2 stripes. The newest repos have an SSD read/write cache (HPE SmartCache).
The largest VMs that we have in Veeam are around 14TB.
soehl
Enthusiast
 
Posts: 40
Liked: 6 times
Joined: Mon May 09, 2011 12:43 pm
Location: Germany
Full Name: Sebastian

by jayscarff » Sun Feb 25, 2018 7:49 pm

operations wrote: It would also be nice to know the repo size and largest VM size for those that upgraded.

If there is a recommended LUN size for Veeam, that would be great to know. I have a 400TB LUN for my ReFS backup volume. The MS sizing limits are listed here:
https://docs.microsoft.com/en-us/window ... s-overview
Jason
VMCE v9
jayscarff
Service Provider
 
Posts: 52
Liked: 2 times
Joined: Tue Nov 15, 2016 6:56 pm
Location: Cayman Islands

by Gostev » Sun Feb 25, 2018 10:12 pm

Before the patch, you really wanted to avoid large LUNs like the one you mentioned. The patch changes the game, but no one can tell you the new recommendations yet, as it has only just been released. I do know that one of the Veeam users that Microsoft was working with very closely had a 400TB ReFS volume.
Gostev
Veeam Software
 
Posts: 22395
Liked: 2673 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

by DaveWatkins » Sun Feb 25, 2018 10:17 pm (1 person likes this post)

We're running 3 x ~62TB LUNs. We specifically kept them under 64TB because various MS/Windows features have issues with volumes bigger than that. The patch seems to have helped fairly dramatically with latency on the LUNs. Before the patch we had set various registry entries that got us stable (if not really very fast); I've removed them after applying the patch.

Merge times also seem improved using Fast/Block Clone.
DaveWatkins
Expert
 
Posts: 309
Liked: 79 times
Joined: Sun Dec 13, 2015 11:33 pm

by billcouper » Mon Feb 26, 2018 12:40 am

I have only had one repo server lock up since the February patch and I feel it was unrelated to REFS anyway.

Fast clones were always fast and are still fast, but overall the performance of the volumes feels lower than before the patch. Read/write speeds feel slow. Moving data from one volume to another is painfully slow. Perhaps MS rate-limited them to avoid locking up servers? I will see if I still have any detailed performance data from before the update and compare.
billcouper
Enthusiast
 
Posts: 38
Liked: 9 times
Joined: Mon Dec 18, 2017 8:58 am
Full Name: Bill Couper

by kubimike » Mon Feb 26, 2018 2:28 am

I noticed that Gostev's Sunday email says to uninstall the pre-release driver. I assume the process would be to stop all Veeam services, boot to recovery mode, move out the pre-release driver and copy the original driver back, but which version of the driver? Or does that not matter? Then reboot and patch?
kubimike
Expert
 
Posts: 306
Liked: 37 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

by anton » Mon Feb 26, 2018 7:48 am (1 person likes this post)

A major improvement in our environment as well.
Merges went down from 1 hour to 15 minutes, and stayed there on the second run.

Before the patch we had the following registry settings on all repositories (2 backup repositories of 210TB and 90TB, 1 copy repository of 112TB):
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512

I noticed that this morning I only saw the first setting:
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1

The others were gone.
I also removed this setting this morning; will let you know tomorrow if this influenced the performance.
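
For anyone auditing the same four values, here is a minimal Python sketch that only prints the matching `reg query` and `reg delete` command lines; it does not touch the registry itself. The key path and value names are copied from the list above. The helper name `reg_commands` is made up for illustration, and whether any given value should actually be removed on a patched system is an assumption to confirm with Microsoft or Veeam support.

```python
# Key path and value names as listed in the post above.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"
VALUES = [
    "RefsEnableLargeWorkingSetTrim",
    "RefsNumberOfChunksToTrim",
    "RefsDisableCachedPins",
    "RefsProcessedDeleteQueueEntryCountThreshold",
]


def reg_commands(action):
    """Return one reg.exe command line per value.

    action is "query" (read-only check) or "delete" (remove the value).
    This hypothetical helper only builds strings; nothing is executed.
    """
    if action not in ("query", "delete"):
        raise ValueError("action must be 'query' or 'delete'")
    # /f suppresses the confirmation prompt of "reg delete".
    suffix = " /f" if action == "delete" else ""
    return [f'reg {action} "{KEY}" /v {name}{suffix}' for name in VALUES]


if __name__ == "__main__":
    # Print the read-only checks first; review before running any delete.
    for line in reg_commands("query"):
        print(line)
```

Running the printed `reg query` lines in an elevated prompt shows which of the values are still present; the `delete` variant can be generated the same way once you have decided to remove them.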
anton
Novice
 
Posts: 4
Liked: 1 time
Joined: Tue Oct 04, 2011 7:22 am
Full Name: Anton van der Linden

by rhiem » Mon Feb 26, 2018 9:10 am (2 people like this post)

The patch definitely fixed the slow Fast Clone process in our environment.

Before the Patch:

Job1: FastClone -> 1 to 2 Hours

After the Patch:

Job1: FastClone -> 6 to 8 Minutes

I will let you know if it is stable.
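
To put the numbers above in perspective, a rough sketch of the speedup arithmetic, assuming the quoted durations are representative ranges. The function name `speedup_range` is purely illustrative.

```python
def speedup_range(before_min, after_min):
    """Return (conservative, optimistic) speedup factors.

    before_min and after_min are (low, high) durations in minutes.
    Conservative pairs the shortest "before" with the longest "after";
    optimistic pairs the longest "before" with the shortest "after".
    """
    b_lo, b_hi = before_min
    a_lo, a_hi = after_min
    return b_lo / a_hi, b_hi / a_lo


# Job1: 1 to 2 hours before the patch, 6 to 8 minutes after.
print(speedup_range((60, 120), (6, 8)))  # (7.5, 20.0)
```

So the Fast Clone step in this report got somewhere between 7.5 and 20 times faster.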
rhiem
Novice
 
Posts: 7
Liked: 2 times
Joined: Mon Feb 22, 2016 8:49 pm

by antipolis » Mon Feb 26, 2018 9:45 am (1 person likes this post)

Seems better here as well: from ~8 hours down to ~2 hours.

I need to confirm this over the next few weeks. I'm tempted to temporarily re-enable synthetics on my biggest job to get a better idea of the improvements.
antipolis
Enthusiast
 
Posts: 68
Liked: 9 times
Joined: Wed Oct 26, 2016 9:17 am

by mweissen13 » Mon Feb 26, 2018 10:07 am (1 person likes this post)

Luckily enough we never had any lock-ups before the patch, but our Repos are fairly small (20TB max) and we always used 64KiB cluster size and plenty of RAM. Now with the patch applied the performance seems to be better, but that was always the case for some days after a reboot. We will see after a few weeks if the performance stays good for a prolonged time.
mweissen13
Service Provider
 
Posts: 15
Liked: 6 times
Joined: Thu Dec 28, 2017 3:22 pm
Full Name: Michael Weissenbacher

by LeoKurz » Mon Feb 26, 2018 1:23 pm

WSUS offers "2018-02 Cumulative Update... (KB4074590)"
Update Catalog offers "2018-02 Cumulative Update... (KB4077525)"

Is it safe to import the latter patch into WSUS and deploy it from there?

__Leo
LeoKurz
Veeam ProPartner
 
Posts: 23
Liked: 6 times
Joined: Wed Mar 16, 2011 8:36 am
Full Name: Leonhard Kurz

by suprnova » Mon Feb 26, 2018 3:06 pm

Seems like everyone has better luck than me. I migrated most of my backups back to NTFS after 8 months of battling ReFS so it's difficult for me to test overall improvement, but I just tested out a delete on a patched repo. While it's a slight improvement (I didn't need to reset the repo), the repository drive goes offline for the duration of the delete. I am still using the usual registry keys. I also tested out a synthetic full with block clone to another repo and it completely froze up 30 hours into it with CPU at 100% (I did have to reset this one).

I'm definitely not comfortable enough to recommend ReFS even with this latest patch.
suprnova
Service Provider
 
Posts: 33
Liked: never
Joined: Fri Apr 08, 2016 5:15 pm

by Gostev » Mon Feb 26, 2018 3:46 pm

@suprnova if you are still observing system freezes, then most likely you did not install the patch correctly. From what I remember, this freeze issue was caused by a bug in the OS memory manager (an NTFS-specific optimization that was acting up with ReFS volumes), and the patch does address it.

Also, the presence of "the usual registry keys" indicates you may have tried some older version of the patch at some point, and it may still be there messing things up (I guess this is why the ReFS team insisted that all the old stuff must be removed/uninstalled before installing the patch).

If I were you, I would just start from a clean OS install. This is the only way to really make sure you're using the patch in the way that was tested by Microsoft QC.
Gostev
Veeam Software
 
Posts: 22395
Liked: 2673 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

by Ctek » Mon Feb 26, 2018 4:47 pm (1 person likes this post)

I applied the patch to 1 server, the one used for DEV servers. I can't really comment on performance as I have not run fulls with it yet (I'll do that next week), but what I do see in my monitoring is that the RAM dips during the night are less drastic. This means that on my end at least, on only 1 server, there is lower RAM usage overall during an intensive backup window. Once it is properly tested, I'll report back on bigger production servers.
VMCE 9 Certified - Systems Administrator
Ctek
Service Provider
 
Posts: 55
Liked: 7 times
Joined: Wed Nov 11, 2015 3:50 pm
Location: Canada

by jameskilbynet » Mon Feb 26, 2018 5:21 pm

We are still seeing some stability issues after this patch. Ours is a 160TB ReFS volume with approx. 80TB in use. We have 128GB of RAM, and this is a Storage Spaces setup (mirror) with NVMe cache. We see issues with large data ingestion, i.e. an active full or the evacuation of another repo to the ReFS one. We will open another call with Veeam/MS tomorrow.
jameskilbynet
Veeam Vanguard
 
Posts: 24
Liked: 4 times
Joined: Wed Jan 14, 2015 11:18 am
Full Name: James Kilby
