soehl
Enthusiast
Posts: 46
Liked: 7 times
Joined: May 09, 2011 12:43 pm
Full Name: Sebastian
Location: Germany

Re: REFS issues (server lockups, high CPU, high RAM)

Post by soehl » Feb 25, 2018 6:09 pm

Our repos range in size from around 50TB to around 100TB, all based on RAID 60 with 2 stripes. The newest repos have an SSD read/write cache (HPE SmartCache).
The largest VMs that we have in Veeam are around 14TB.

jayscarff
Service Provider
Posts: 74
Liked: 3 times
Joined: Nov 15, 2016 6:56 pm
Location: Cayman Islands

Post by jayscarff » Feb 25, 2018 7:49 pm

operations wrote: It would also be nice to know the repo size and largest VM size for those that upgraded.
If there is a recommended LUN size for Veeam, that would be great to know. I have a 400TB LUN for my ReFS backup volume. MS sizing limits are...
https://docs.microsoft.com/en-us/window ... s-overview
Jason
VMCE v9

Gostev
Veeam Software
Posts: 23215
Liked: 2977 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » Feb 25, 2018 10:12 pm

Before the patch, you really wanted to avoid large LUNs like the one you mentioned. The patch changes the game, but no one will be able to tell you the new recommendations yet, as it has only just been released. I do know that one of the Veeam users that Microsoft was working with very closely had a 400TB ReFS volume.

DaveWatkins
Expert
Posts: 322
Liked: 86 times
Joined: Dec 13, 2015 11:33 pm

Post by DaveWatkins » Feb 25, 2018 10:17 pm 1 person likes this post

We're running 3 x ~62TB LUNs. We specifically kept them under 64TB because of the various MS/Windows features that have issues with anything bigger than that. The patch seems to have helped fairly dramatically with latency on the LUNs. Before the patch we had set various registry entries, which got us stable (if not really very fast); I've removed them after applying the patch.

Merge times also seem improved using Fast/Block Clone.

billcouper
Service Provider
Posts: 56
Liked: 13 times
Joined: Dec 18, 2017 8:58 am
Full Name: Bill Couper

Post by billcouper » Feb 26, 2018 12:40 am

I have only had one repo server lock up since the February patch, and I feel it was unrelated to ReFS anyway.

Fast clones were always fast and are still fast, but overall the performance of the volumes feels lower than before the patch. Read/write speeds feel slow. Moving data from one volume to another is painfully slow. Perhaps MS rate limited them to avoid locking up servers? I will see if I still have any detailed performance data from before the update and compare.

kubimike
Expert
Posts: 324
Liked: 37 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO

Post by kubimike » Feb 26, 2018 2:28 am

I noticed that Gostev's Sunday email says to uninstall the pre-release driver. I assume the process would be to stop all Veeam services, boot to recovery mode, move out the pre-release driver and copy the original driver back, but which version of the driver? Or does that not matter? Then reboot and patch?

anton
Novice
Posts: 5
Liked: 1 time
Joined: Oct 04, 2011 7:22 am
Full Name: Anton van der Linden

Post by anton » Feb 26, 2018 7:48 am 1 person likes this post

A major improvement in our environment as well.
Merges went down from 1 hour to 15 minutes, and that held on the second run too.

Before the patch we had the following registry settings on all repositories (2 backup repositories of 210TB and 90TB, and 1 copy repository of 112TB):
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsNumberOfChunksToTrim = 32
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsDisableCachedPins = 1
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsProcessedDeleteQueueEntryCountThreshold = 512

I noticed that this morning I only saw the first setting:
HKLM\SYSTEM\CurrentControlSet\Control\FileSystem\RefsEnableLargeWorkingSetTrim = 1

The others were gone.
I also removed this setting this morning; will let you know tomorrow if this influenced the performance.
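
For anyone who wants to check which of these values are still present after patching, here is a minimal Python sketch. It only reads the four value names listed above from the FileSystem key using the standard `winreg` module; the function names `read_refs_tuning_values` and `leftover_report` are made up for illustration, and nothing here is an official Veeam or Microsoft tool.

```python
import sys

# Key path and value names are taken from the settings listed in this post.
FILESYSTEM_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"
REFS_TUNING_VALUES = (
    "RefsEnableLargeWorkingSetTrim",
    "RefsNumberOfChunksToTrim",
    "RefsDisableCachedPins",
    "RefsProcessedDeleteQueueEntryCountThreshold",
)


def read_refs_tuning_values():
    """Return {name: data} for the tuning values that exist (Windows only)."""
    import winreg  # imported lazily so the module still loads on other platforms

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FILESYSTEM_KEY) as key:
        for name in REFS_TUNING_VALUES:
            try:
                data, _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                continue  # value is not set on this server
            found[name] = data
    return found


def leftover_report(found):
    """Build one human-readable line per known tuning value."""
    lines = []
    for name in REFS_TUNING_VALUES:
        if name in found:
            lines.append(f"{name} = {found[name]} (still set)")
        else:
            lines.append(f"{name} (not present)")
    return lines


if __name__ == "__main__" and sys.platform == "win32":
    print("\n".join(leftover_report(read_refs_tuning_values())))
```

The script is read-only on purpose; whether to remove a leftover value is a decision to make per server.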

rhiem
Novice
Posts: 8
Liked: 2 times
Joined: Feb 22, 2016 8:49 pm

Post by rhiem » Feb 26, 2018 9:10 am 2 people like this post

The patch definitely fixed the slow Fast Clone process in our environment.

Before the Patch:

Job1: FastClone -> 1 to 2 Hours

After the Patch:

Job1: FastClone -> 6 to 8 Minutes

I will let you know if it is stable.
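
To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the two time ranges quoted above (1 to 2 hours before, 6 to 8 minutes after); the helper name `speedup_range` is just for illustration.

```python
def speedup_range(before_min, after_min):
    """Return (worst_case, best_case) speedup for two (low, high) ranges in minutes."""
    before_low, before_high = before_min
    after_low, after_high = after_min
    # Worst case: shortest old run versus longest new run.
    # Best case: longest old run versus shortest new run.
    return before_low / after_high, before_high / after_low


worst, best = speedup_range((60, 120), (6, 8))
print(f"FastClone speedup: {worst:.1f}x to {best:.1f}x")  # 7.5x to 20.0x
```

This is a single job on a single repository, so treat it as an observation rather than a benchmark.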

antipolis
Enthusiast
Posts: 68
Liked: 9 times
Joined: Oct 26, 2016 9:17 am

Post by antipolis » Feb 26, 2018 9:45 am 1 person likes this post

Seems better here as well: from ~8 hours down to ~2 hours.

I need to confirm this over the next few weeks. I'm tempted to temporarily re-enable synthetics on my biggest job to get a better idea of the improvements.

mweissen13
Service Provider
Posts: 18
Liked: 7 times
Joined: Dec 28, 2017 3:22 pm
Full Name: Michael Weissenbacher

Post by mweissen13 » Feb 26, 2018 10:07 am 1 person likes this post

Luckily we never had any lock-ups before the patch, but our repos are fairly small (20TB max) and we always used a 64KiB cluster size and plenty of RAM. Now with the patch applied the performance seems to be better, but that was always the case for some days after a reboot. We will see after a few weeks whether the performance stays good for a prolonged time.

LeoKurz
Veeam ProPartner
Posts: 25
Liked: 6 times
Joined: Mar 16, 2011 8:36 am
Full Name: Leonhard Kurz

Post by LeoKurz » Feb 26, 2018 1:23 pm

WSUS offers "2018-02 Cumulative Update... (KB4074590)"
Update Catalog offers "2018-02 Cumulative Update... (KB4077525)"

Is it safe to import the latter patch into WSUS and deploy it from there?

__Leo

suprnova
Service Provider
Posts: 33
Liked: never
Joined: Apr 08, 2016 5:15 pm

Post by suprnova » Feb 26, 2018 3:06 pm

Seems like everyone has better luck than me. I migrated most of my backups back to NTFS after 8 months of battling ReFS so it's difficult for me to test overall improvement, but I just tested out a delete on a patched repo. While it's a slight improvement (I didn't need to reset the repo), the repository drive goes offline for the duration of the delete. I am still using the usual registry keys. I also tested out a synthetic full with block clone to another repo and it completely froze up 30 hours into it with CPU at 100% (I did have to reset this one).

I'm definitely not comfortable enough to recommend ReFS even with this latest patch.

Gostev
Veeam Software
Posts: 23215
Liked: 2977 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev » Feb 26, 2018 3:46 pm

@suprnova if you are still observing system freezes, then most likely you did not install the patch correctly. From what I remember, this freeze issue was caused by a bug in the OS memory manager (an NTFS-specific optimization that was acting up with ReFS volumes), and the patch does address that one.

Also, the presence of "the usual registry keys" indicates you may have tried some older version of the patch at some point, and it may still be there messing things up (I guess this is why the ReFS team insisted that all the old stuff must be removed/uninstalled before installing the patch).

If I were you, I would just start from a clean OS install. This is the only way to really make sure you're using the patch in the way that was tested by Microsoft QC.

Ctek
Service Provider
Posts: 63
Liked: 9 times
Joined: Nov 11, 2015 3:50 pm
Location: Canada

Post by Ctek » Feb 26, 2018 4:47 pm 1 person likes this post

I applied the patch to 1 repository server that handles DEV servers. I can't really comment on performance as I have not run fulls on it yet (I'll do that next week), but what I do see in my monitoring is that the RAM dips during the night are less drastic. This means that on my end at least, on only 1 server, there is lower RAM usage overall during an intensive backup window. Once it is properly tested, I'll report back on bigger production servers.
VMCE 9 Certified - Systems Administrator

jameskilbynet
Veeam Vanguard
Posts: 29
Liked: 4 times
Joined: Jan 14, 2015 11:18 am
Full Name: James Kilby

Post by jameskilbynet » Feb 26, 2018 5:21 pm

We are still seeing some stability issues after this patch. Ours is a 160TB ReFS volume with approx. 80TB in use. We have 128GB of RAM, and this is a storage space (mirror setup) with an NVMe cache. We see issues with large data ingestion, i.e. an active full or the evacuation of another repo towards the ReFS one. We will open another call with Veeam/MS tomorrow.
