Andrew@MSFT
Technology Partner
Posts: 10
Liked: 17 times
Joined: Nov 19, 2019 5:31 pm
Full Name: Andrew Hansen
Contact:

Re: Windows 2019, large REFS and deletes

Post by Andrew@MSFT »

Thanks everyone for sharing your experiences with the hotfix! It's good to see it is helping!

dbr
Enthusiast
Posts: 77
Liked: 5 times
Joined: Apr 06, 2017 9:48 am
Full Name: Daniel Brase
Contact:

Re: Windows 2019, large REFS and deletes

Post by dbr »

Glad to hear the issue seems to be solved for many of us. Yesterday I installed the patch and configured the settings recommended by Andrew. However, both before and after installing the patch I have had performance issues on a ReFS partition, as mentioned here. The next synthetic run will start on Saturday, but I'm wondering whether there are still some issues on ReFS. On NTFS there are no problems. I would have expected a diskspd read throughput of more than 120MB/s. My questions are:

1. Does anyone still have performance issues after installing the patch, even with the given settings?
2. Is diskspd the right tool to check and compare throughput? To me it looks quite strange that on a test refs partition (30TB) with some test runs read throughput switches between 5GB/s and 100-120MB/s.
3. Is it possible to check for refs specific background activity on a refs partition?

On Monday I will compare the fast clone times for the coming synthetic full with those from last weekend. Maybe fast clone will run much faster than last weekend, even though diskspd tells me something different with regard to read throughput.

Andrew@MSFT
Technology Partner
Posts: 10
Liked: 17 times
Joined: Nov 19, 2019 5:31 pm
Full Name: Andrew Hansen
Contact:

Re: Windows 2019, large REFS and deletes

Post by Andrew@MSFT »

@dbr, would love to hear how things go after the weekend synthetic full. Keep me posted and if you still have issues, I'm happy to look into it together.

mkretzer
Expert
Posts: 678
Liked: 159 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

@dbr Andrew can really help here! A few others and I invested only a little bit of time and provided a few live dumps, and they were able to create this update, which has now helped a lot of people already :-)

dbr
Enthusiast
Posts: 77
Liked: 5 times
Joined: Apr 06, 2017 9:48 am
Full Name: Daniel Brase
Contact:

Re: Windows 2019, large REFS and deletes

Post by dbr »

Last weekend our synthetic full ran quite fast. It seems that the patch speeds up the fast clone process significantly in our environment as well. But I'm still surprised that diskspd throughput varies so extremely on our system. It wouldn't matter, but because of the infinite backup chain when using synthetic full or forward incremental forever, we run monthly health checks on backup files and they take an unusually long time. To do some further tests I formatted the test partition (30TB) with NTFS; I will run some backups on it, check again with diskspd and report back here.

dbr
Enthusiast
Posts: 77
Liked: 5 times
Joined: Apr 06, 2017 9:48 am
Full Name: Daniel Brase
Contact:

Re: Windows 2019, large REFS and deletes

Post by dbr » 1 person likes this post

After testing a bit I saw a performance impact for the first time on an NTFS volume. Maybe it's too early to be sure, but at the moment I assume the ups and downs in performance are related to RAID controller background tasks like patrol read and consistency checks. It seems that these kinds of issues also occur on NTFS and therefore aren't ReFS specific.

SDI
Lurker
Posts: 1
Liked: never
Joined: Apr 03, 2020 8:53 am
Full Name: Dieter Schragner
Contact:

Re: Windows 2019, large REFS and deletes

Post by SDI »

Hello everyone,
I think I have a similar issue.
My synthetic full job with 40TB ran for about 33 hours, and I think this is too long.
I have Windows Server 2019 Standard 1809 Core installed with a 500TB volume. About 200TB of this volume is used by the Veeam repository.

Are the following settings recommended for me too?
Andrew@MSFT wrote: Mar 18, 2020 12:43 am Yes. The registry tweaks are still recommended.

• Ensure Trim is disabled

 fsutil behavior set DisableDeleteNotify ReFS 1

• Set RefsEnableLargeWorkingSetTrim = 1

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem
Value Name: RefsEnableLargeWorkingSetTrim
Value Type: REG_DWORD
Value Data: 1

Today I installed the Update "kb4541331" but at the moment I don't know if there is a performance benefit. I will tell you as soon as I can.
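
The registry value quoted above could also be applied through a .reg file. The following is a minimal Python sketch that only renders the file text for the key and value named in the quote. The function name render_reg_file is made up for illustration, and importing the file on the server (and whether a reboot is needed afterwards) is outside what this sketch covers.

```python
# Key path and value name are taken from the quoted recommendation.
KEY_PATH = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem"


def render_reg_file(key_path: str, dword_values: dict) -> str:
    """Render .reg file text that sets REG_DWORD values under one key.

    Hypothetical helper for illustration: it only builds the text and
    does not touch the registry.
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in dword_values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: value does not fit in a DWORD")
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


reg_text = render_reg_file(KEY_PATH, {"RefsEnableLargeWorkingSetTrim": 1})
print(reg_text)
```

Rendering the text first makes it easy to review the exact key and value before anything is changed on a repository server.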

AuGL
Enthusiast
Posts: 51
Liked: 3 times
Joined: May 07, 2019 12:22 am
Full Name: Glenn
Contact:

Re: Windows 2019, large REFS and deletes

Post by AuGL »

I would also like to know whether I should implement these settings?

• Ensure Trim is disabled
• Set RefsEnableLargeWorkingSetTrim = 1

I still see some merge operations taking a longer time than expected, with high memory usage. I have to reboot the repository server once a week to keep things going OK.
I am using Server 2019 build 1809 and I have kb4541331 installed.

Thanks

Gostev
SVP, Product Management
Posts: 26887
Liked: 4360 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

Yes, Microsoft recommends these settings to be enabled.

AuGL
Enthusiast
Posts: 51
Liked: 3 times
Joined: May 07, 2019 12:22 am
Full Name: Glenn
Contact:

Re: Windows 2019, large REFS and deletes

Post by AuGL »

Thanks for the reply Gostev.

What is the recommendation from Veeam these days for setting up new Windows based ReFS repositories, in terms of the OS and how large the volumes should be?
I am going to buy two HPE Apollo servers to be used as Veeam Repository servers

Should I use the latest build of Server 2019 (1909)?
Should I create one large ReFS volume, e.g. 280TB, or split them into smaller 60TB volumes for example?

mkretzer
Expert
Posts: 678
Liked: 159 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » 1 person likes this post

From my experience, you get the best performance with server core 1909. But with the latest patches of W2019 it really might be "good enough" for you. We use both and the difference is not that huge.

With multiple volumes you usually get better performance, as any ReFS issues there might be are always "per volume". After talking to Microsoft we decided to use 3x 200 TB instead of 1x 600 TB for our primary repo and have had no issues since - but with the latest patches we also have one remote backup copy target with 360 TB without issues.

So splitting your volume in two medium size parts might be a good "middle way" if you want to be more safe from issues.

JaySt
Service Provider
Posts: 210
Liked: 32 times
Joined: Jun 09, 2015 7:08 pm
Full Name: JaySt
Contact:

Re: Windows 2019, large REFS and deletes

Post by JaySt »

mkretzer, just curious. What's the physical (RAID/HDD) design for each of those 200TB volumes?
Veeam Certified Engineer

mkretzer
Expert
Posts: 678
Liked: 159 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » 2 people like this post

They all come from one big Hitachi G200 dedicated SAN system. It has 12x 16-Disk RAID 6 in one big 600 TB Pool.

From that Pool we carved 24 virtual disks, so 8 per REFS volume which leads to 200 TB REFS volumes.

This setup works extremely well for us and we never see a target bottleneck when writing to the system.
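
The numbers in this layout can be checked with a few lines of arithmetic. This is only a sketch: it assumes all 24 virtual disks are the same size, which the post does not state explicitly, and the variable names are illustrative.

```python
# Figures taken from the post above.
RAID_GROUPS = 12          # 12x RAID 6 groups
DISKS_PER_GROUP = 16      # 16 disks in each group
POOL_TB = 600             # one big pool
VIRTUAL_DISKS = 24        # carved from the pool
DISKS_PER_VOLUME = 8      # virtual disks per ReFS volume

physical_disks = RAID_GROUPS * DISKS_PER_GROUP           # 192
# Assumption: the pool is split evenly across the virtual disks.
tb_per_virtual_disk = POOL_TB / VIRTUAL_DISKS            # 25.0
volumes = VIRTUAL_DISKS // DISKS_PER_VOLUME              # 3
tb_per_volume = tb_per_virtual_disk * DISKS_PER_VOLUME   # 200.0

print(f"{physical_disks} physical disks, {volumes} volumes of {tb_per_volume:.0f} TB")
```

Under that assumption each virtual disk is 25 TB, and eight of them give the 200 TB per ReFS volume mentioned in the post.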

JaySt
Service Provider
Posts: 210
Liked: 32 times
Joined: Jun 09, 2015 7:08 pm
Full Name: JaySt
Contact:

Re: Windows 2019, large REFS and deletes

Post by JaySt » 1 person likes this post

Good to know, thanks! I have worked a lot with Hitachi stuff in the past, solid stuff.
Veeam Certified Engineer

EricB
Influencer
Posts: 12
Liked: 1 time
Joined: Jul 13, 2016 6:55 am
Full Name: Eric
Location: The Netherlands
Contact:

Re: Windows 2019, large REFS and deletes

Post by EricB »

Hi guys, another case of ReFS issues....

Windows 2019 LTSC build 1809 physical server
dual Xeon CPU but only 64GB memory (I know it's not enough, an upgrade to 192GB is on the way)

4*90TB ReFS repos 64k cluster size formatted DAS storage connected (no dedupe)
We've not yet applied any registry tweaks or fsutil modifications.

Using short <30 day chains of Forever Incrementals only.

I have a case (Veeam case 04154452): performance had been very good for weeks/months, but several weeks after applying KB4541331 we now have the issue that ReFS fastclone performance is terrible (holding up all other job processing).
I see the physical box working at 110% CPU (half going to the System process, the other half to Veeam) with very little I/O activity for hours on end. Memory consumption (with just 64GB available) is only at ~50%!

Performance looks similar to what PeterC mentioned: during fastclone activity all other I/O drops, and outside of fastclone it is a little better (although it never again reaches normal performance).
The issue is mostly that there are now always some running fastclone actions...

About a week after applying that KB, since 02-05-2020 4:19 AM (which is approximately when our current issues started), we have had hourly storage warnings in the Windows event logs on that server, in two flavours:
ReFS eventid 149
In the past 3649 seconds we had IO failures. (summary report)

High latency IO count: 4
Failed writes: 0
Failed reads: 0
Volume Id: {b8928d33-8461-456a-a89a-0e6dfd8956e8}
Volume name: U:

Also, on a lower frequency but still several times a day.
An IO took more than 30000 ms to complete: (direct issue)
ReFS eventid 147
Process Id: 8792
Process name: VeeamAgent.exe (interchangeable with System or VeeamDeploymen)
File name: 000000000000070E 0000000000000F36
File offset: 4096
IO Type: Write: Paging, NonCached, Sync
IO Size: 524288 bytes
0 cluster(s) starting at cluster 0
Latency: 48743 ms
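
To track how often these warnings occur and how bad the latency gets, the event text can be split into fields. The following is a minimal Python sketch that assumes the message layout is the "Name: value" form pasted above; the helper names are hypothetical, and collecting the messages from the event log is not shown.

```python
# Sample messages copied from the post above (layout assumed to match
# what the event log actually emits).
EVENT_149 = """In the past 3649 seconds we had IO failures.
High latency IO count: 4
Failed writes: 0
Failed reads: 0
Volume name: U:"""

EVENT_147 = """An IO took more than 30000 ms to complete:
Process Id: 8792
Process name: VeeamAgent.exe
File offset: 4096
IO Type: Write: Paging, NonCached, Sync
IO Size: 524288 bytes
0 cluster(s) starting at cluster 0
Latency: 48743 ms"""


def parse_event_fields(message: str) -> dict:
    """Split 'Name: value' lines into a dict, skipping lines without a value."""
    fields = {}
    for line in message.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def latency_ms(fields: dict) -> int:
    """Return the latency of an event 147 message in milliseconds."""
    return int(fields["Latency"].split()[0])


slow_io = parse_event_fields(EVENT_147)
summary = parse_event_fields(EVENT_149)
print(slow_io["Process name"], latency_ms(slow_io), summary["High latency IO count"])
```

Comparing these values before and after a configuration change gives a clearer picture than the hourly warnings alone.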

Before that moment I only have three entries, on 29-04-2020 and 30-04-2020, and prior to those we never had any, although the system was already under the same load.
The mentioned patch was installed on 26-04-2020.

From a storage perspective the load is under control and no significant latency is observed.

In the lifetime of this solution, about half a year, we've once before (before applying KB4541331) seen a similar situation where eventually a reboot of the box helped.
To be clear, the solution has been working fine for months apart from those two "incidents" which really doesn't add up for me.

So far the Veeam case hasn't helped much.
Any recommendations you can give me?
To be clear, if I go down the route of modifying the regkey (RefsEnableLargeWorkingSetTrim = 1) and fsutil (disable Trim), I expect I first need to halt all Veeam jobs.
Can you confirm whether a reboot is required? (I hope not, because I expect the issue will be gone again for several weeks anyway, and I would like to get a clear before-and-after picture with those changes.)

PS: I've absolutely been very conservative going forward to ReFS and have been following these threads and the word from Gostev closely, I finally felt like it was OK to move forward but at the moment I'm regretting this decision.

Gostev
SVP, Product Management
Posts: 26887
Liked: 4360 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev » 1 person likes this post

In general, as you can see from the above feedback, we have multiple customers with much larger ReFS deployments happy with the performance - so the issue seems to be specific to your backup repository server configuration. Veeam support will not help you here indeed, as without the source code we cannot debug Windows or ReFS issues... you should open a case with Microsoft instead.

I would also suggest trying the following:
1. Make sure you have real-time antivirus protection disabled (including Windows Defender).
2. Reduce concurrent tasks on the repository significantly. In case lack of memory is the issue, it will be multiplied by not using the recommended ReFS registry settings you mentioned above. The RefsEnableLargeWorkingSetTrim key reduces memory pressure, and is highly recommended for RAM-restricted configurations.
3. Enable synthetic fulls in the jobs. I noticed you do forever incremental, which is a non-default setting, and is unusual to use for ReFS - and makes no sense as synthetic fulls are "free" on ReFS. So currently, the workload you're putting on ReFS is very different from what most customers do.

Finally, if you're still uncomfortable with ReFS, you can always switch to XFS - which provides the same exact capabilities.

Thanks!

EricB
Influencer
Posts: 12
Liked: 1 time
Joined: Jul 13, 2016 6:55 am
Full Name: Eric
Location: The Netherlands
Contact:

Re: Windows 2019, large REFS and deletes

Post by EricB » 1 person likes this post

Hi Gostev,

Thanks for your quick feedback, much appreciated!
At the moment, Windows Defender is enabled but has all recommended files and folders excluded. I will disable it to see what happens, but we already tried that on the earlier occasion we had this issue several months ago and it didn't make a difference (likely because the exclusions are correct).

Regarding point two, reducing the amount of concurrent tasks is somewhat of a challenge to be able to meet RPO's under normal circumstances.
As mentioned, memory is on the way but will also prepare to implement the following modifications to the system:
• Ensure Trim is disabled --> fsutil behavior set DisableDeleteNotify ReFS 1
• Set RefsEnableLargeWorkingSetTrim - mem restricted configs recommended- reduces memory pressure.
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem
Value Name: RefsEnableLargeWorkingSetTrim
Value Type: REG_DWORD
Value Data: 1

Still, in this regard I think it's strange that the configuration had been working fine for months until a sudden slowdown. I'm also pretty sure that the issue will be temporarily resolved after a reboot, as this was the case at the previous occurrence of the issue.

Regarding the synthetic fulls, I agree there is no benefit to using forever incremental, nor was I aware that there was a con to it. I had never read anything about that up to now.
At the moment we had that Forever Incremental short scheme as default for most jobs and a 150 day Forward scheme with synthetic full backups for a few specific jobs.

Just for my understanding, is it a basic recommendation when using ReFS you must use forward incremental with synthetic fulls?
Does this then also mean it is not recommended to use ReFS on a backup copy target where the primary chain always is a forever incremental? (besides GFS)

Regarding comfort level, one could get the feeling ReFS is still a little too experimental due to all issues and fixes known.

Btw, support also followed up today and they basically confirm what you say: that a support case with Microsoft is recommended.

Thanks again

Andrew@MSFT
Technology Partner
Posts: 10
Liked: 17 times
Joined: Nov 19, 2019 5:31 pm
Full Name: Andrew Hansen
Contact:

Re: Windows 2019, large REFS and deletes

Post by Andrew@MSFT »

Hi EricB,

Please PM me the support case number when you open it. I'll see what I can do from my side.

Thanks!

EricB
Influencer
Posts: 12
Liked: 1 time
Joined: Jul 13, 2016 6:55 am
Full Name: Eric
Location: The Netherlands
Contact:

Re: Windows 2019, large REFS and deletes

Post by EricB »

Hi Andrew,

Thanks for the support!
Just a quick update, last night I couldn't bear the perspective of another night of missed backups and failed jobs so I decided to give the box a reboot and directly after that reboot the job performance was back to normal (low CPU utilization and high I/O).
Some backup health checks and one problematic fastclone merge are still running, but I expect them to finish in the coming hours, after which I will also re-enable backup copies and replica jobs.

I will discuss the options with the customer today and, depending on a green light, I'll open a Microsoft support case.
As things stand I'm not confident the issue won't return after several weeks/months, because the same happened once before around 12-03-2020.

EricB
Influencer
Posts: 12
Liked: 1 time
Joined: Jul 13, 2016 6:55 am
Full Name: Eric
Location: The Netherlands
Contact:

Re: Windows 2019, large REFS and deletes

Post by EricB »

Situation is that everything still runs fine.
The Microsoft support case has now been created and I've sent you the case number.

AuGL
Enthusiast
Posts: 51
Liked: 3 times
Joined: May 07, 2019 12:22 am
Full Name: Glenn
Contact:

Re: Windows 2019, large REFS and deletes

Post by AuGL »

EricB wrote: May 06, 2020 7:16 am Just a quick update, last night I couldn't bear the perspective of another night of missed backups and failed jobs so I decided to give the box a reboot and directly after that reboot the job performance was back to normal (low CPU utilization and high I/O).
I have this situation too. The server has 128GB memory. One 50TB ReFS volume.
The merge jobs started taking 2 to 3 times longer on the jobs with larger VMs, such as Exchange or SQL VMs. So every 10 to 14 days I have to reboot. This only started happening a few months back, so I'm thinking it crept in with one of the recent Windows Updates.

What I noticed after rebooting: if there are no jobs running, the memory usage is less than 8GB.
When the issue occurs, and no jobs are running, the memory usage is around 40GB.

Rebooting always gets it back to normal for another 10 to 14 days.

EdgarRicharte
Service Provider
Posts: 51
Liked: 10 times
Joined: Jul 17, 2019 10:06 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by EdgarRicharte »

I believe the same thing is happening to us that is happening to EricB and AuGL. Resources in use are low, but we have a bunch of jobs in Cloud Connect just "frozen". A reboot fixed it in the past (and gave me an excuse to update to patch 1), but it seems to have come back a few weeks later. I really don't want to restart the VCC server and my repos. But I can't have my offsites just "sitting".

evilaedmin
Expert
Posts: 160
Liked: 26 times
Joined: Jul 26, 2018 8:04 pm
Full Name: Eugene V
Contact:

Re: Windows 2019, large REFS and deletes

Post by evilaedmin »

Gostev wrote: May 05, 2020 11:37 am 1. Make sure you have real-time antivirus protection disabled (including Windows Defender).
Does this apply to legacy file signature based products and ‘next generation anti malware’ both?

Gostev
SVP, Product Management
Posts: 26887
Liked: 4360 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

Well, any extra real-time data processing will obviously slow things down - the only question is by how much.

Larissa@MSFT
Technology Partner
Posts: 1
Liked: 2 times
Joined: May 11, 2020 5:58 pm
Full Name: Larissa Knight
Contact:

Re: Windows 2019, large REFS and deletes

Post by Larissa@MSFT » 2 people like this post

Hi AuGL and EdgarRicharte,

I am a peer of Andrew’s at Microsoft - please PM me the support case number when you open it. I'll see what I can do from my side.

Thanks!

Cullan
Service Provider
Posts: 120
Liked: 17 times
Joined: May 21, 2014 8:47 am
Location: New Zealand
Contact:

Re: Windows 2019, large REFS and deletes

Post by Cullan »

I am in a similar situation to EricB. I have had a Veeam case (#04133657) for a few weeks and recently ended by testing using a provided tool block-clone-spd.exe which essentially confirmed it was a ReFS issue. They recommended opening a MS case but before I do that I might try the registry settings recommended above by Gostev.

bbuchan
Service Provider
Posts: 8
Liked: 5 times
Joined: May 19, 2016 3:45 pm
Full Name: Bryan Buchan
Contact:

Re: Windows 2019, large REFS and deletes

Post by bbuchan »

I am experiencing the exact issues described by the other people on here. Performance is great for a few weeks then acts like everything hits a wall. Started seeing 114% CPU usage and all synthetic operations are crawling. Server 2019 1809 with latest patches, 3 x 50TB ReFS volumes on H740P RAID card, with 192GB of RAM and 1x Xeon Silver 4110 8c16T, and all suggested registry settings.

mkretzer
Expert
Posts: 678
Liked: 159 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

What I find confusing... High CPU usage was never an issue with the various ReFS problems we had in the last 2 years. Perhaps this is something new?

AuGL
Enthusiast
Posts: 51
Liked: 3 times
Joined: May 07, 2019 12:22 am
Full Name: Glenn
Contact:

Re: Windows 2019, large REFS and deletes

Post by AuGL »

It's not high CPU that's a problem for me; it's more that when I notice things crawling, memory usage is higher at idle. During operations everything looks fine CPU and memory wise, just that the Veeam operations have slowed to a crawl.

AuGL
Enthusiast
Posts: 51
Liked: 3 times
Joined: May 07, 2019 12:22 am
Full Name: Glenn
Contact:

Re: Windows 2019, large REFS and deletes

Post by AuGL »

Out of interest, all of you posting recently about having problems, what version of Server 2019 are you using?
Our Veeam repository is on Server 2019 "1809" with the latest patches.
