dasfliege
Service Provider
Posts: 70
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege »

Still no success here. Even with the second private refs.sys, performance is still poor and memory gets eaten over time. MS has found another thing that may be causing these issues and told me that it's not a big deal to fix. So maybe I will receive another private build for testing in the next few days.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

@dasfliege: Have you ever tried adding more RAM? Both privates work absolutely perfectly for us!
I will still request the next one as well!

spiritie
Expert
Posts: 103
Liked: 12 times
Joined: Mar 01, 2016 10:16 am
Full Name: Gert van Niekerk
Location: Denmark
Contact:

Re: Windows 2019, large REFS and deletes

Post by spiritie »

Anyone know what the new recommended amount of RAM is?

I can't really justify 700GB RAM for our 700TB repo.

Gostev
SVP, Product Management
Posts: 26706
Liked: 4277 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

The core issue with Server 2019 LTSC cannot be fixed with any amount of RAM.
mkretzer has basically infinite RAM, but still needed a private fix to make his LTSC deployment work.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

Unlimited RAM... Indeed :-)

Seriously: my feeling is that anything > 384 GB does not help a lot. We had a repo server with 128 GB, then 384 GB, then 768 GB, and now our primary repo has 2.2 TB. The biggest effect from RAM alone came with the jump from 128 GB to 384 GB.

Still, our backup copy repo with 384 GB and the special hotfix from MS is running slightly better than the 2.2 TB system.

dasfliege
Service Provider
Posts: 70
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege »

Hmm. I actually haven't seen any reason to extend our memory so far. Even though the ReFS processes take a lot, memory was never fully saturated. But I may consider it if we can't come to a final solution here within a reasonable amount of time. I guess I should have a few bricks lying around somewhere :-)

I just received private #3 today. They have located another "memory hogger", which should be removed now. I'm going to install the private today and hope to have some observations by tomorrow. I'm on vacation for the next three weeks, so there may not be much attention to this forum from my side. My colleague will continue handling the case with MS in the meantime.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

Ok. I also got it. Let's see how it behaves with "unlimited RAM" :-)

dasfliege
Service Provider
Posts: 70
Liked: 16 times
Joined: Nov 17, 2014 1:48 pm
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » 1 person likes this post

I've just upgraded our server from 192GB to 384GB of RAM. I will hold off installing private #3 until tomorrow. I want to see whether upgrading memory alone has any impact.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » 1 person likes this post

Sounds interesting! Looking forward to hearing your results. Private #3 is installed, first backups should start in an hour.

shane.gray.cd
Lurker
Posts: 2
Liked: 2 times
Joined: Apr 22, 2019 8:56 pm
Full Name: Shane Gray
Contact:

Re: Windows 2019, large REFS and deletes

Post by shane.gray.cd »

Thank goodness for this article on REFS issues on Server 2019... I was ready to toss the solution out the window. :twisted:

I found the following link to a Microsoft article https://support.microsoft.com/en-us/hel ... is-running specifying registry settings to add. I implemented all of the registry settings as listed, with these exceptions:
  • RefsNumberOfChunksToTrim > Set value to 128 as my volume is larger than 10TB
  • DuplicateExtentBatchSizeinMB (Only applicable to Microsoft Data Protection Manager) > I did not apply this as I am not using DPM
I also made sure that I had the following version of Server 2019 that contains the REFS improvements > Release Date: January 23, 2020, Version: OS Build 17134.1276

After applying the registry changes and restarting, our Cloud Connect copy jobs started working properly again. They have been running overnight and there appear to be no issues so far. CPU and RAM usage has dropped by half with the changes.

I am running a Server 2019 guest VM on Hyper-V 2019. My repository is a Synology RS2418+ set up as a thick-provisioned iSCSI LUN with MPIO over 1GbE. Total available space is 87.2TB, of which 69.3TB is used, and over 50 clients actively use our Cloud Connect service.

DerOest
Enthusiast
Posts: 60
Liked: 27 times
Joined: Oct 30, 2015 10:10 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by DerOest » 8 people like this post

Just writing some off-topic info here on Read-Only Friday ;-)
This could just as well go into the ReFS Horror Story thread, or I could start an XFS-reflink-love thread :D

When we got new non-NetApp backup storage last year, we started with Server 2019 ReFS and experienced really bad performance.
We reinstalled with Server 2016 ReFS -> performance was way better, in fact OK enough that we started using it for Veeam storage (while continuing to run NetApp Snapshots & SnapVault to secondary; previously we wrote the Veeam backups to NetApps, too, via SMB3).

Server 1 - Where are we today
  • ~180 TB Backups, with GFS reaching back ~9 months now
  • Still Windows Server 2016 ReFS
  • FastClone still works very well, but Tape backup speed sucks hard
  • --> write performance is good, we can easily write 500+MB/s
  • --> read performance sucks, we only get ~200 MB/s when running Tape backups ;-( - and the bad read speeds would absolutely KILL US in a disaster restore scenario; with such bad performance, we'd be copying for weeks!
We recently added some more disks to that storage server and configured those as a separate raid6/ReFS volume.
Dumping new backups there delivers the same good write performance, while also having way better read performance (also 500+MB/s)

My conclusion: read speed degrades the longer the backup chain reaches back (e.g. a 5+TB fileserver VM with weeklies/monthlies going back a year)
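
To put the "copying for weeks" worry into numbers, here is a rough back-of-the-envelope sketch. It assumes the whole ~180 TB would have to be read back at the sustained rates seen during tape backups, and uses decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). The function name is made up for illustration; a real restore would depend on many other factors.

```python
def transfer_days(size_tb: float, rate_mb_s: float) -> float:
    """Days needed to read size_tb terabytes at rate_mb_s megabytes per second."""
    if rate_mb_s <= 0:
        raise ValueError("rate must be positive")
    seconds = (size_tb * 1e12) / (rate_mb_s * 1e6)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # Figures from the post: ~180 TB of backups, ~200 MB/s on the aged
    # ReFS volume versus 500+ MB/s on the freshly created one.
    for rate in (200, 500):
        print(f"180 TB at {rate} MB/s: {transfer_days(180, rate):.1f} days")
```

Under these assumptions the aged volume would need roughly ten and a half days of uninterrupted reading, against a little over four days at 500 MB/s.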


Server 2 - Along comes Veeam V10 and the Linux XFS reflink goodness ;-)
Instead of Windows Server & ReFS horror stories, we've been waiting for the Linux love... personally, I trust XFS/BTRFS/ZFS far more than the ReFS mess.
As we had planned to get a secondary storage server for replicas all along, we now got a second storage server (same as above) and are currently testing Ubuntu 19.10 as a storage repository with XFS reflinks.
That first test worked out nicely, just as expected.

Now for speed tests, I've added the Linux repo to our current v9.5u4b server, performed a backup copy, and then ran tape speed tests.
  • Tapes are attached to Server1 (2016 ReFS)
  • backups sent over the network from Server2 (Ubuntu, XFS)
  • -> Tape speed is 400MB/s+ from XFS, where ReFS only delivered ~200MB/s!
    (I now hope for XFS to not suffer the same performance degradation that ReFS shows with aging/long GFS backup chains)
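
For readers who have not used reflinks before, the following is a minimal sketch of the underlying Linux primitive, not Veeam's implementation. It asks the kernel to clone one file into another with the `FICLONE` ioctl, so both files share the same blocks, and falls back to an ordinary copy when the filesystem does not support it. The helper name `clone_or_copy` is hypothetical; the ioctl number is the standard Linux value.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int) in <linux/fs.h>.
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> bool:
    """Reflink src to dst if possible, otherwise make a full copy.

    Returns True when the blocks were shared via FICLONE and False when
    the function had to fall back to copying the data.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # On a reflink-capable filesystem this shares extents
            # instead of duplicating the data.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Not supported here (e.g. ext4, or src and dst on different filesystems).
            pass
    shutil.copyfile(src, dst)
    return False
```

On an XFS volume created with reflink support the call should return True almost instantly regardless of file size, which is the property that makes synthetic fulls cheap.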

Yes, I feel bad; I've complained (just recently, too...) that Veeam delivers poor tape backup speeds. Turns out the biggest problem is ReFS being garbage....
I'm sorry for that, Anton ;-)

For everyone else... go try XFS with reflinks. IMHO it is absolutely time to trash ReFS.

Our strategy is now:
  • Establish Veeam B&R Server v10 as a new VM with storage-repo on Server 2 (Ubuntu/XFS)
  • -> Start using the v10 "primary and GFS in one job" to ramp up required retention periods
  • Old Server 1 (ReFS):
    a) wait for retention period to expire on the previous primary backup chains to free up space
    b) start dumping backup copies from the new server here
    c) wait for the retention period of GFS to expire (1 year+ :( :? )
    d) trash Windows Server ReFS, reinstall with Ubuntu & XFS :D

shane.gray.cd
Lurker
Posts: 2
Liked: 2 times
Joined: Apr 22, 2019 8:56 pm
Full Name: Shane Gray
Contact:

Re: Windows 2019, large REFS and deletes

Post by shane.gray.cd » 2 people like this post

shane.gray.cd wrote: Feb 20, 2020 8:23 pm Thank goodness for this article on REFS issues on Server 2019... I was ready to toss the solution out the window. :twisted:

Still working great after the registry changes. If anyone is looking for a fix, the above worked for me. :D

evilaedmin
Expert
Posts: 159
Liked: 24 times
Joined: Jul 26, 2018 8:04 pm
Full Name: Eugene V
Contact:

Re: Windows 2019, large REFS and deletes

Post by evilaedmin »

DerOest wrote: Feb 21, 2020 2:25 pm My conclusion: read speed degrades the longer the backup chain reaches back (e.g. a 5+TB fileserver VM with weeklies/monthlies going back a year)
Many experience this and it's a real shame that there is no solution. It appears that, over time, what used to be mostly sequential reads become mostly random reads.

Has anyone tried accelerating a Veeam ReFS repository with Flash?

DonZoomik
Expert
Posts: 226
Liked: 57 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by DonZoomik »

I have LSI CacheCade with RAID60. 291TB usable filesystem with ~1.5TB RAID1 cache.
AFAIK Storage Spaces parity sucks even with flash cache. I had a few ideas in that regard but never got my hands on enough hardware to really test it out.

evilaedmin
Expert
Posts: 159
Liked: 24 times
Joined: Jul 26, 2018 8:04 pm
Full Name: Eugene V
Contact:

Re: Windows 2019, large REFS and deletes

Post by evilaedmin »

DonZoomik wrote: I have LSI CacheCade with RAID60. 291TB usable filesystem with ~1.5TB RAID1 cache.
How helpful is CacheCade? Can it be turned on / off for easy comparison?

ND40oz
Influencer
Posts: 14
Liked: 1 time
Joined: Nov 17, 2010 12:52 am
Full Name: ND40oz
Contact:

Re: Windows 2019, large REFS and deletes

Post by ND40oz » 1 person likes this post

I have two all-SSD arrays, a 6-disk and an 8-disk, in one of my repositories, made up of 2TB Micron 1300s, just to see how things would perform with all flash. I still run into issues where it takes 72 hours to do a full backup chain transformation on a single 1.9TB VM backup with a change rate of about 300GB a day. I also have dedupe enabled and I imagine that's also contributing.
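
As a rough illustration of how slow that is, the sketch below converts the figures from this post into an effective throughput. It assumes the transformation touches about the size of the backup (1.9 TB, decimal units) exactly once. The real amount of I/O is unknown, so treat the result as an order-of-magnitude number only.

```python
def effective_mb_per_s(size_tb: float, hours: float) -> float:
    """Average throughput in MB/s when size_tb terabytes take the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (size_tb * 1e12) / (hours * 3600) / 1e6


if __name__ == "__main__":
    # 1.9 TB processed in 72 hours, as reported above.
    print(f"{effective_mb_per_s(1.9, 72):.2f} MB/s")
```

That works out to single-digit MB/s on all-flash storage, which suggests the bottleneck is not raw disk speed.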

DonZoomik
Expert
Posts: 226
Liked: 57 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by DonZoomik » 1 person likes this post

evilaedmin wrote: Feb 22, 2020 5:42 pm How helpful is CacheCade? Can it be turned on / off for easy comparison?
It can be turned off.
There was a whitepaper about HPE Apollo 4200 Gen9 (that I can't find anymore) that said that SmartCache (MicroSemi maxCache) improved performance (I don't remember exactly in what regard). This was pre-ReFS but I'm quite sure that it can't hurt. So I just drank the Kool-Aid. The real test would be large instant restores but I hope I don't have to do that too often. :lol:
I also have a CacheCade array at homelab/server - 2*12TB RAID1 (HGST Helium SATA) + 2*400GB SAS SSD (Some random HGST/EMC ones from eBay) and it makes quite a difference. It's older (9261 vs 93something at work) but technology is the same.

Some random reference from this forum: post358698.html#p358698 You can probably find more.

softflame
Novice
Posts: 3
Liked: 1 time
Joined: May 29, 2019 5:32 am
Full Name: Peter Jackson
Contact:

Re: Windows 2019, large REFS and deletes

Post by softflame » 1 person likes this post

I see they have also started on the path of allowing block cloning for deduplication appliances:

From the release notes:
Quantum DXi block cloning integration: Added experimental support for Quantum DXi native block cloning functionality that is based on v10 advanced XFS integration. This functionality is pending Quantum’s internal validation, and must be enabled on the storage side by Quantum support.

mweissen13
Service Provider
Posts: 53
Liked: 20 times
Joined: Dec 28, 2017 3:22 pm
Full Name: Michael Weissenbacher
Contact:

Re: Windows 2019, large REFS and deletes

Post by mweissen13 » 2 people like this post

DerOest wrote: Feb 21, 2020 2:25 pm Server 2 - Along comes Veeam V10 and the Linux XFS reflink goodness ;-)
Just wanna thank you for this first-hand experience. I have been eager to try out XFS support once it's available, and after reading your story I am convinced I should try it out ASAP.

poulpreben
Veeam Vanguard
Posts: 1003
Liked: 431 times
Joined: Jul 23, 2012 8:16 am
Full Name: Preben Berg
Contact:

Re: Windows 2019, large REFS and deletes

Post by poulpreben »

mkretzer wrote: Feb 19, 2020 3:36 pm Sounds interesting! Looking forward to hearing your results. Private #3 is installed, first backups should start in an hour.
Hey! How is it going with the most recent patch? We're really struggling with some of our synthetic full backups, especially since installing the 2020-02 CU, even though it should have improved things.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

2020-02 should improve something??

The fix still works perfectly in our system!! Today we will do a bigger test - we will merge data from 3 days all in one day and at the same time delete some GFS points. If the system survives this, I really think the issues are fixed!

Do you have a case open with MS?

poulpreben
Veeam Vanguard
Posts: 1003
Liked: 431 times
Joined: Jul 23, 2012 8:16 am
Full Name: Preben Berg
Contact:

Re: Windows 2019, large REFS and deletes

Post by poulpreben »

The public ReFS patch released in January is included in 2020-02.

I don't have a case. We are an MSP, and the MS contracts are owned by our customers. To this date, I still haven't been able to successfully log a case with Microsoft. I guess I will have to try again, because this is really starting to become a huge issue for us. I was naively hoping that your work would make faster progress.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer »

Progress is one thing - the other thing is the Microsoft patch team.

Gostev tried to explain to them why it is imperative to get the patch out fast. But it might take some time....

Perhaps Gostev (or even I) can ask whether they can provide you with the hotfix, but officially it is not 100% safe for production. For us there was simply no alternative, as we had upgraded from 2016 to 2019 and there was no way back for our backup copy target other than reformatting and transferring everything again over the WAN.

mkretzer
Expert
Posts: 675
Liked: 156 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by mkretzer » 1 person likes this post

Wow, this driver is impressive. Currently several backups are merging and several backup files are being deleted, while at the same time backups get transferred at 800 MB/s with bottleneck "network".

Just while this all was going on REFS "garbage collected" 8 TB of heavily fragmented data in one hour!

And now the best part: of our 384 GB of RAM, only 11 GB are being used.

DonZoomik
Expert
Posts: 226
Liked: 57 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by DonZoomik »

This ReFS problem also seems to hugely slow down defragmentation, especially its analyze phase. The analyze phase blocks some Veeam actions and causes jobs to fail. It ran for ~16 hours until I canceled it and added the RefsEnableLargeWorkingSetTrim registry key for testing. Today it took less than 4 hours, including defragmentation.
Defrag helps with long-term slowdown. However, you shouldn't use it with synthetic fulls (only with forever incrementals), as it cancels out block sharing.

PeterC
Enthusiast
Posts: 25
Liked: 7 times
Joined: Apr 10, 2018 2:24 pm
Full Name: Peter Camps
Contact:

Re: Windows 2019, large REFS and deletes

Post by PeterC »

Yesterday we patched our HPE Apollo 4200 (Windows Server 2019 (1809 - 17763.1075), 218 TB storage and 256 GB RAM) with KB4537818, which gave us a newer refs.sys (10.0.17763.1075). We have been using this as a Veeam repository server, which in the beginning worked like a charm.
But over time, jobs kept taking longer. What we mostly see is that network traffic degrades dramatically during merge operations.
Normally, when running several jobs, we see 2 to 10 Gbps of traffic, but it degrades to Kbps when merges start. We also see it spiking the whole time, going from 0 to 300-500 Mbps. When the merges are done, it becomes faster and consistent again.
We have been moving jobs to other repositories (Server 2016) but don't have enough storage left to drain the system and eventually reinstall it. Even with fewer jobs, it is still processing backup jobs in the morning.

And unfortunately this morning looked the same as the days before, so the newer refs.sys did not help us in this situation. For completeness, we are also using the following settings:

ReFS DisableDeleteNotify = 1 (Enabled)
RefsDisableDeleteNotification = 1
RefsDisableLastAccessUpdate = 1
RefsEnableLargeWorkingSetTrim = 1
RefsNumberOfChunksToTrim = 128
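
For anyone wanting to reproduce this list on another repository server, here is a small sketch that turns the Refs* values above into `reg add` command lines. It only builds the strings and runs nothing. The registry path is an assumption based on Microsoft's ReFS troubleshooting article, so verify both the key and the value names against that article before applying anything. The first entry in the list (DisableDeleteNotify) appears to be the `fsutil behavior` setting rather than one of these values, so it is left out.

```python
# Assumption: the ReFS tuning values live under this key; check the
# Microsoft article before using it on a production server.
FILESYSTEM_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\FileSystem"

# Values copied from the list in the post above.
SETTINGS = {
    "RefsDisableDeleteNotification": 1,
    "RefsDisableLastAccessUpdate": 1,
    "RefsEnableLargeWorkingSetTrim": 1,
    "RefsNumberOfChunksToTrim": 128,
}


def reg_add_commands(settings, key=FILESYSTEM_KEY):
    """Return one `reg add` command line per DWORD setting (nothing is executed)."""
    commands = []
    for name, value in settings.items():
        # REG_DWORD holds an unsigned 32-bit integer.
        if not isinstance(value, int) or not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name} is not a valid DWORD: {value!r}")
        commands.append(f'reg add "{key}" /v {name} /t REG_DWORD /d {value} /f')
    return commands


if __name__ == "__main__":
    for line in reg_add_commands(SETTINGS):
        print(line)
```

The printed lines can be reviewed and then pasted into an elevated command prompt. As noted earlier in the thread, a restart was needed before the changes showed an effect.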

YouGotServered
Service Provider
Posts: 104
Liked: 20 times
Joined: Mar 11, 2016 7:41 pm
Full Name: Cory Wallace
Contact:

Re: Windows 2019, large REFS and deletes

Post by YouGotServered » 2 people like this post

In my technology career, I am usually the one to adopt new methods and technologies "too quickly", so I don't think I can be accused of being one of those people stuck in the past. I say that because every time the community quiets down about ReFS, I think that it's finally time to give it a shot, then something else happens, and we end up with a 250+ post forum thread about it.

NTFS space utilization kills me, but it seems better than dealing with ReFS issues. I know we suffer from the bias of only hearing the problems and not the success stories, but I see way more ReFS related issues than I do NTFS. I understand NTFS is significantly older and more mature, but that's the reason I feel that I need to just keep on waiting on ReFS - it is apparently still very immature at this point.

Being a service provider with several clients, if I greenlight ReFS to my clients, it's inevitable that we're going to have issues somewhere, and it always seems like resolutions from MS take a lot of time (that I don't have) and a lot of arm twisting.

Maybe one day I'll switch, but until then, this will have to be my first-ever technology hold-out.

poulpreben
Veeam Vanguard
Posts: 1003
Liked: 431 times
Joined: Jul 23, 2012 8:16 am
Full Name: Preben Berg
Contact:

Re: Windows 2019, large REFS and deletes

Post by poulpreben » 3 people like this post

Hehe, I can totally relate to your post, Cory. I kept on testing ReFS with almost every CU that came out for Server 2016, and told everyone and their dog to stay off of ReFS because I had some quite serious issues with it initially. I think it was around CU 2018-09 that it started to work pretty well. Then we had to set up some new repository servers and naively went with Server 2019. Oh, man... what a ride it has been.

That said, I managed to get the private hotfix that @mkretzer and @dasfliege had been testing through Microsoft support. They cannot make this stuff public soon enough. We had some weekly synthetic fulls that would easily take 36+ hours to complete, and now they finish in less than 10 minutes. It is pretty amazing.

I am trying to stay positive towards ReFS, but since v10 we are deploying Linux repositories with XFS whenever we can. Our experiences are similar to those of @DerOest.

YoMarK
Enthusiast
Posts: 51
Liked: 7 times
Joined: Jul 13, 2009 12:50 pm
Full Name: Mark
Location: The Netherlands
Contact:

Re: Windows 2019, large REFS and deletes

Post by YoMarK » 2 people like this post

YouGotServered wrote: Mar 03, 2020 10:49 pm Being a service provider with several clients, if I greenlight ReFS to my clients, it's inevitable that we're going to have issues somewhere, and it always seems like resolutions from MS take a lot of time (that I don't have) and a lot of arm twisting.
We've been using Windows 2016 with ReFS (synthetic fulls) for years now. It's great. Just don't use Windows 2019. Problem solved until end of life for Windows 2016 Enterprise on 1/11/2027.

JeremiahDDS
Service Provider
Posts: 11
Liked: 4 times
Joined: Mar 28, 2019 12:52 am
Full Name: Jeremiah Glover
Contact:

Re: Windows 2019, large REFS and deletes

Post by JeremiahDDS »

Can you provide this private patch?
