Adrian_C
Service Provider
Posts: 21
Liked: 5 times
Joined: Sep 10, 2013 6:56 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by Adrian_C »

Hi Team!

I've stumbled across this thread today, and I think I too am experiencing this issue on our Cloud Connect repository server since we upgraded to 2019 to take advantage of ReFS. Boy, what a disappointing experience that was. We have had nothing but stability and performance issues since.

I've had a support ticket open with Veeam for 2 months now, and not once has this thread been mentioned, nor the fact that there are known performance issues with 2019 and ReFS :(

We are also running 2019 LTS, so we're on 1809. I've tried to apply KB 4534321, however I'm told it is not applicable to my server. I assume it is because we have the most recent cumulative updates installed (Windows is reporting no updates available). Is it safe to say that the fix from that update is already in place because I have a more recent cumulative update installed?

Today I was able to resolve the issue by using RAMMap to clear the system working set, which is a step in the right direction. But I need a more robust solution. Is there any way that I can get my hands on the 1909 release? Downgrading to 2016 isn't an option as my volumes are ReFS 3.4 :(

Any assistance would be greatly appreciated.
dasfliege
Service Provider
Posts: 238
Liked: 53 times
Joined: Nov 17, 2014 1:48 pm
Full Name: Florin
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege »

Hi Adrian_C

If you run any cumulative update later than April 2019, you should also have the ReFS fix in place. Check the version of your refs.sys driver; the latest LTS version is 10.0.17763.1369.

Did you also apply LargeWorkingSetTrim and DeleteNotify? If not, execute the following two commands in an elevated command prompt:
fsutil behavior set DisableDeleteNotify ReFS 1
REG ADD HKLM\System\CurrentControlSet\Control\FileSystem /v RefsEnableLargeWorkingSetTrim /t REG_DWORD /d 1
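If it helps, here's a quick way to double-check that everything landed (just a sketch, assuming a default Windows install path):
fsutil behavior query DisableDeleteNotify
reg query HKLM\System\CurrentControlSet\Control\FileSystem /v RefsEnableLargeWorkingSetTrim
powershell -command "(Get-Item C:\Windows\System32\drivers\refs.sys).VersionInfo.FileVersion"
The first should report ReFS DisableDeleteNotify = 1, the second should show the DWORD set to 1, and the third prints the refs.sys driver version to compare against 10.0.17763.1369.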
Adrian_C
Service Provider
Posts: 21
Liked: 5 times
Joined: Sep 10, 2013 6:56 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by Adrian_C »

Hi dasfliege,

My refs.sys version is 10.0.17763.1369, DisableDeleteNotify is set to 1, and the registry setting is applied :(

I'm thinking that Core 1909 might be our only option if I can work out how to get my hands on it.
dasfliege
Service Provider
Posts: 238
Liked: 53 times
Joined: Nov 17, 2014 1:48 pm
Full Name: Florin
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege »

We've ordered two completely new backup stacks, each consisting of a 32-core ProLiant DL360 and a NetApp E-Series with 60x 14TB NL-SAS. It seemed to be the only solution to get rid of the problems without losing one year of backup history on our old backup servers. Parts should arrive in the next days.
We now have to think about how to set them up, and I guess I won't go with 1809 anymore. I thought about setting up the physical hosts with 1909 or 2004, attaching the E-Series via iSCSI to serve the repository, and using them as Hyper-V servers running the Veeam server as an 1809 VM. Does anyone have a similar setup?
Seve CH
Enthusiast
Posts: 69
Liked: 32 times
Joined: May 09, 2016 2:34 pm
Full Name: JM Severino
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Seve CH »

Hi Dasfliege,
1. Why are you virtualizing Veeam in that setup? I think that ReFS is resource-intensive and delicate enough without adding more factors to the equation.
2. Your ReFS problems with Veeam will be on the Veeam server VM. I think your Veeam VM should be the one running the modern Windows edition, not your hypervisor. In fact, you do not need ReFS for your Hyper-V machine at all.

If you are going virtual, I would consider using XFS instead of ReFS: a Veeam server VM without storage and a Linux VM (Ubuntu?) with XFS storage as a Linux repo on the same Hyper-V host.
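If you go that route, the Linux-side prep is essentially just formatting the data volume with reflink support before adding it as a repository. A minimal sketch, assuming a hypothetical dedicated device /dev/sdb1 mounted at /backups (check the Veeam documentation for the currently recommended options):
mkfs.xfs -b size=4096 -m reflink=1,crc=1 /dev/sdb1
mkdir -p /backups
mount /dev/sdb1 /backups
With reflink enabled, Veeam's fast clone gives you the same space-saving synthetic fulls as ReFS block cloning.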
dasfliege
Service Provider
Posts: 238
Liked: 53 times
Joined: Nov 17, 2014 1:48 pm
Full Name: Florin
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege »

@Seve CH
I would virtualize the Veeam server because 2004 is Core only.
The ReFS problems won't be on the Veeam server VM, as the physical 2004 server has the storage attached and acts as the repository server for the virtual Veeam installation.
I didn't consider XFS, as its integration is still quite new and I do not have any experience with it. I therefore doubt that it is the best solution for a backup platform that needs to be absolutely bulletproof, because we have to guarantee SLAs to the 30+ customers running on this platform. Even though we've encountered several problems with ReFS, we do have MS Premier Support at hand to help us out. I don't know about the quality and speed of Ubuntu support if something goes terribly wrong.

I've created a big picture of how the solution could look. Hope that helps to explain my thoughts. I'm also open to any other input from people with similarly sized infrastructure who have configurations that work perfectly. Just for reference, we currently have a backup volume of 500TB (>1PB with ReFS savings).

[Image: diagram of the proposed setup]
Adrian_C
Service Provider
Posts: 21
Liked: 5 times
Joined: Sep 10, 2013 6:56 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by Adrian_C »

Hi Team,

I'm getting a copy of Windows Server 20H2. Has anyone run this yet?

I was planning on removing my OS HDDs from the server and installing Server 20H2 on fresh OS drives, while keeping my repository drives as they are. This would hopefully give me a rollback to Server 2019 (1809) if it all goes belly up.

Is the ReFS version the same (3.4) between Server 2019 and 20H2?
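(For reference, if I remember the command right, fsutil fsinfo refsinfo <drive>: should print the ReFS version of a mounted volume, so that would be an easy way to compare the two builds.)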

How hard will it be to get my VBR server to pick up the existing repositories on the "rebuilt" repository server?

Thanks in advance
Seve CH
Enthusiast
Posts: 69
Liked: 32 times
Joined: May 09, 2016 2:34 pm
Full Name: JM Severino
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Seve CH » 1 person likes this post

dasfliege wrote: Jan 06, 2021 9:58 am I would virtualize the Veeam server because 2004 is Core only.
The ReFS problems won't be on the Veeam server VM, as the physical 2004 server has the storage attached and acts as the repository server for the virtual Veeam installation.
I didn't consider XFS, as its integration is still quite new and I do not have any experience with it. I therefore doubt that it is the best solution for a backup platform that needs to be absolutely bulletproof, because we have to guarantee SLAs to the 30+ customers running on this platform.
Hi dasfliege. For a bulletproof solution, I wouldn't go to an "untested" 2004 Windows Server. Windows 2016 is the bulletproof system reference at this moment. In my experience, it works like a charm with several 200-300TB ReFS repos on the same server after years of use. 2004 also has a very short lifespan, so you will need to upgrade it several times compared to W2016. Just make sure you have enough RAM (192GB in my servers, but 128GB should do too) and that the System Center agents are uninstalled (not just disabled, but uninstalled).

There could be other reasons I am not aware of, but the only reason I see for what you are building is to run several client Veeam VMs on the same host while sharing the same backend repo. Other than that, I see it as a very complicated system, and complications = less reliability.

XFS+reflink repos are still very new. We will see. We are deploying a 400TB XFS repo this month. Our tests with a 120TB one were very good (synthetic fulls every 2 days for more than a month, working fine...).
d3nz
Expert
Posts: 130
Liked: 14 times
Joined: Mar 20, 2018 12:47 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by d3nz »

Windows 2016 is the bulletproof system reference at this moment. In my experience, it works like a charm with several 200-300TB ReFS repos on the same server after years of use.
Seve CH,
Microsoft has another opinion about it and recommends using Win Svr 2019 LTS instead of Win Svr 2016 LTS (post394383.html#p394383).
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey
Contact:

Re: Windows 2019, large REFS and deletes

Post by soncscy » 1 person likes this post

@d3nz,

Sorry, but I think that MS engineer is full of nonsense.

The conditions for that article are very specific: https://docs.microsoft.com/en-us/troubl ... sive#cause
DPM uses loopback-mounted-VHDs. These appear like normal disks to the OS. Therefore, these disks are displayed in Windows Explorer, Diskmgt, and other GUI tools. These tools periodically poll the disks to make sure that they are functioning correctly. This causes IOs to be sent down the loopback stack to the ReFS volume. If the ReFS volume is busy, these IOs will have to wait. Therefore, when ReFS performs a long-duration operation, such as flushing or a large block-clone call, these IOs will have to wait longer. When these IOs are stuck, the UI of Explorer or Diskmgt won't be refreshed. As a result, it appears like the disks are hung or dismounted.
Additionally, the loopback mount miniport driver (vhdmp) starts generating warning events if any IOs don't complete within 30 seconds.
This is certainly not the setup that I've seen with clients who had similar issues, and furthermore, it looks like there are tweaks for the very issue linked.

Unless there's something I'm fundamentally misunderstanding and somehow the loopback mounted VHD issue would persist to DAS or other similar setups, I think that MS engineer is blowing smoke up someone's ass; it sounds fancy and makes someone think they know what they're talking about, but it's complete garbage afaict. Furthermore, I cannot understand why they wouldn't recommend available tweaks.

edit: the Veeam kb doesn't even exist anymore https://www.veeam.com/kb3136. I get a 404.
Gostev
Chief Product Officer
Posts: 31524
Liked: 6700 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

Please read my replies to the post linked by d3nz. They also explain why the KB is gone ;)
Adrian_C
Service Provider
Posts: 21
Liked: 5 times
Joined: Sep 10, 2013 6:56 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by Adrian_C »

I'd like to provide an update on my situation.

Firstly, I'm very grateful to have found this thread, as I wasn't getting very far with tech support after 2 months.

I'm still running the 1809 build of Server 2019, with all the recommended registry settings applied. I have also implemented the script mentioned earlier to clear the system working set every 15 minutes.

So far things have been stable. I'm no longer having issues every Monday night / Tuesday morning (the highest load on my repository server, due to no backups on weekends). I've had 2 weeks straight where I haven't had to reboot my repository to resolve the issue and allow client backups to complete in an acceptable time.

It is a scabby disgusting band-aid, but it is working for now :)
d3nz
Expert
Posts: 130
Liked: 14 times
Joined: Mar 20, 2018 12:47 pm
Contact:

Re: Windows 2019, large REFS and deletes

Post by d3nz »

I'm still running the 1809 build of Server 2019, with all the recommended registry settings applied. I have also implemented the script mentioned earlier to clear the system working set every 15 minutes.
Adrian_C,
Thanks for your feedback!
Could you specify which registry settings you applied?
What script are you using? Is it a script for cleaning RAM?
Adrian_C
Service Provider
Posts: 21
Liked: 5 times
Joined: Sep 10, 2013 6:56 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by Adrian_C » 2 people like this post

Hi d3nz,

I've set the following:
fsutil behavior set DisableDeleteNotify ReFS 1
REG ADD HKLM\System\CurrentControlSet\Control\FileSystem /v RefsEnableLargeWorkingSetTrim /t REG_DWORD /d 1

And I'm running the ClearFSCache script every 15 minutes
http://www.toughdev.com/content/2015/05 ... -metafile/

So far so good. That is 2 months of my life I won't get back!
popjls
Enthusiast
Posts: 54
Liked: 5 times
Joined: Jun 25, 2018 3:41 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by popjls »

And I'm running the ClearFSCache script every 15 minutes
I think this will fix my issue too. Would be nice to see Veeam implement this "feature" natively.
dasfliege
Service Provider
Posts: 238
Liked: 53 times
Joined: Nov 17, 2014 1:48 pm
Full Name: Florin
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » 3 people like this post

So, we've set up our new backup stacks last weekend. We'll keep our old stack for another year for restore purposes.
On the new stack, we're using Server 2004 as the primary repo and a hardened XFS repo for copy jobs. All repo servers are running on a Hyper-V host and get their disks passed through from a NetApp E-Series.
It's working quite well so far, but no synthetic operations have been performed yet. Will let you know if any problems appear in the coming weeks.
PeterC
Enthusiast
Posts: 45
Liked: 12 times
Joined: Apr 10, 2018 2:24 pm
Full Name: Peter Camps
Contact:

Re: Windows 2019, large REFS and deletes

Post by PeterC » 1 person likes this post

I would just like to share the results of some tests we did on our HPE Apollo 4200 with Server 2019 (1809). We are still having major performance issues during merge operations.

These are the results of the tests we did yesterday on the 200 TB ReFS volume.

C:\Temp\block-clone-spd-0.3.1>block-clone-spd.exe S:\temp\ 50
block-clone-spd utility, v0.3.1. Vsevolod Zubarev 2018-19.
Volume file system is ReFS.
Block cloning is available.
Cluster size is 65536 bytes.
Free space available: 75676.949 GiB.
Will create three files 50 GiB each, for a total of 150 GiB.
Writing random file "S:\temp\01.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing random file "S:\temp\02.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing new file "S:\temp\cloned.data" via block cloning...
All block cloning took 56.988s.
Average speed: 898.438 MiB/s


After this we installed the Hyper-V role on this server and created a VM (16 vCPUs and 48 GB RAM) with Server 2016 (all latest updates). We placed this VM on the big ReFS volume we are using for the backups. We added a 5 TB volume to this VM (also hosted on the same local volume) and formatted it as 64K ReFS.

After this we tested with the same tool and settings in the VM.

C:\Temp\block-clone-spd-0.3.1>block-clone-spd.exe s:\temp\ 50
block-clone-spd utility, v0.3.1. Vsevolod Zubarev 2018-19.
Volume file system is ReFS.
Block cloning is available.
Cluster size is 65536 bytes.
Free space available: 5090.705 GiB.
Will create three files 50 GiB each, for a total of 150 GiB.
Writing random file "s:\temp\01.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing random file "s:\temp\02.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing new file "s:\temp\cloned.data" via block cloning...
All block cloning took 4.211s.
Average speed: 12159.057 MiB/s

The performance gain is almost unbelievable! We did the tests several times and every time we ended up above 10000 MiB/s. So we are going to figure out a way to create a couple of VMs and move the 150 TB of data from the host into the VMs.
Steve-nIP
Service Provider
Posts: 123
Liked: 52 times
Joined: Feb 06, 2018 10:08 am
Full Name: Steve
Contact:

Re: Windows 2019, large REFS and deletes

Post by Steve-nIP »

That's completely insane. I can barely believe it's still so bad in the long-term support build of 2019.
dasfliege
Service Provider
Posts: 238
Liked: 53 times
Joined: Nov 17, 2014 1:48 pm
Full Name: Florin
Location: Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by dasfliege » 1 person likes this post

@PeterC
As I reported above, we run quite a similar setup now. The physical hosts just serve raw storage and act as Hyper-V servers, running several VMs for the B&R console and repos.
Instead of Server 2016, we run the 2004 build for the repo. Still nothing to complain about in terms of performance.

This setup seems like the way to go for the optimal cost-effectiveness ratio right now.
gummett
Veteran
Posts: 404
Liked: 106 times
Joined: Jan 30, 2017 9:23 am
Full Name: Ed Gummett
Location: Manchester, United Kingdom
Contact:

Re: Windows 2019, large REFS and deletes

Post by gummett »

A word of caution - it's possible block-clone-spd isn't reporting data correctly from inside a VM. For example, Microsoft's diskspd certainly doesn't (or didn't).
Ed Gummett (VMCA)
Senior Specialist Solutions Architect, Storage Technologies, AWS
(Senior Systems Engineer, Veeam Software, 2018-2021)
JaySt
Service Provider
Posts: 415
Liked: 75 times
Joined: Jun 09, 2015 7:08 pm
Full Name: JaySt
Contact:

Re: Windows 2019, large REFS and deletes

Post by JaySt »

@gummett
I'm not aware of such issues. Do you have a source or more information?
Veeam Certified Engineer
gummett
Veteran
Posts: 404
Liked: 106 times
Joined: Jan 30, 2017 9:23 am
Full Name: Ed Gummett
Location: Manchester, United Kingdom
Contact:

Re: Windows 2019, large REFS and deletes

Post by gummett »

@JaySt I can't find any good comments online about this, just as I couldn't when I hit the issue a few years back.
It is noted in the FAQ section of https://www.veeam.com/kb2014, however, so it can't have just been me!

Any diskspd run within a Hyper-V VM produced insanely high figures, and I noted that the dynamic VHDX I was using didn't grow to the size specified with the -c flag - I wondered if somehow the zeros that diskspd writes by default were being ignored. IOMeter didn't have the same discrepancy.

Anyway, my point is just to be aware that benchmarks inside a VM don't always give a valid result, so be careful. I do hope your work in this area helps.
Ed Gummett (VMCA)
Senior Specialist Solutions Architect, Storage Technologies, AWS
(Senior Systems Engineer, Veeam Software, 2018-2021)
NightBird
Expert
Posts: 242
Liked: 57 times
Joined: Apr 28, 2009 8:33 am
Location: Strasbourg, FRANCE
Contact:

Re: Windows 2019, large REFS and deletes

Post by NightBird » 1 person likes this post

On a Dell R7515 all-flash server (AMD EPYC 7302P (16 cores), 64GB and 11x 3.84TB SSD), Windows 2019 1809 fully patched (March):
block-clone-spd utility, v0.3.1. Vsevolod Zubarev 2018-19.
Volume file system is ReFS.
Block cloning is available.
Cluster size is 65536 bytes.
Free space available: 10798.100 GiB.
Will create three files 50 GiB each, for a total of 150 GiB.
Writing random file "d:\test\01.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing random file "d:\test\02.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing new file "d:\test\cloned.data" via block cloning...
All block cloning took 11.683s.
Average speed: 4382.474 MiB/s

The same on a Dell R7415 (for BCJ) (AMD EPYC 7281 (16 cores), 64GB and 12x 4TB RAID6 (10+2)), Windows 2019 1809 fully patched (March):
block-clone-spd utility, v0.3.1. Vsevolod Zubarev 2018-19.
Volume file system is ReFS.
Block cloning is available.
Cluster size is 65536 bytes.
Free space available: 16422.121 GiB.
Will create three files 50 GiB each, for a total of 150 GiB.
Writing random file "d:\test\01.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing random file "d:\test\02.data"...
███████████████████████████████████████████████████████████████████████████████████████████████████████████ 51200/51200
Writing new file "d:\test\cloned.data" via block cloning...
All block cloning took 56.663s.
Average speed: 903.581 MiB/s
JRRW
Enthusiast
Posts: 76
Liked: 45 times
Joined: Dec 10, 2019 3:59 pm
Full Name: Ryan Walker
Contact:

Re: Windows 2019, large REFS and deletes

Post by JRRW »

Has anyone seen issues with running fsutil behavior set DisableDeleteNotify ReFS 1 on Storage Spaces tiered storage? How does that influence a tiered system with an SSD 'fast tier' if TRIM is disabled?
christer.bergstrom
Lurker
Posts: 1
Liked: 1 time
Joined: Oct 14, 2019 10:58 am
Contact:

Re: Windows 2019, large REFS and deletes

Post by christer.bergstrom » 1 person likes this post

As an alternative to running the ClearFSCache script every 15 minutes, we have scheduled a task that runs RAMMap every 15 minutes: RAMMap.exe -Et
That works well for emptying the ReFS cached memory.
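For anyone wanting to replicate this, a rough sketch of the scheduled task (it assumes RAMMap.exe from Sysinternals sits in C:\Tools - adjust the path - and that its EULA has already been accepted once interactively):
schtasks /Create /TN "EmptyRefsCache" /TR "C:\Tools\RAMMap.exe -Et" /SC MINUTE /MO 15 /RU SYSTEM /F
The -Et switch tells RAMMap to empty the standby list.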
PeterC
Enthusiast
Posts: 45
Liked: 12 times
Joined: Apr 10, 2018 2:24 pm
Full Name: Peter Camps
Contact:

Re: Windows 2019, large REFS and deletes

Post by PeterC »

Just curious: we upgraded to v11 (P20210324) and over the last few days we see that all backups finish much earlier than they used to.
Maybe it is a bit too soon to start celebrating, but I was just wondering if any of you have already upgraded and see the same performance improvement.
Gostev
Chief Product Officer
Posts: 31524
Liked: 6700 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

Of course, because synthetic fulls are much faster on ReFS in V11... see the What's New document.
yara12
Service Provider
Posts: 13
Liked: 2 times
Joined: Oct 25, 2018 11:33 am
Full Name: Yaroslav
Contact:

Re: Windows 2019, large REFS and deletes

Post by yara12 »

And what about regular merges (forever forward incremental)?
Gostev
Chief Product Officer
Posts: 31524
Liked: 6700 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

These will see some benefit, but not a significant one. However, you should not be using those with ReFS in any case. The default backup mode with periodic synthetic fulls is the recommended one for both ReFS and XFS.
PeterC
Enthusiast
Posts: 45
Liked: 12 times
Joined: Apr 10, 2018 2:24 pm
Full Name: Peter Camps
Contact:

Re: Windows 2019, large REFS and deletes

Post by PeterC »

All our jobs are forever forward incremental and we have seen a big improvement since upgrading to v11.
In the last week almost all jobs have finished by 07:30, where in the past they would finish at 11:00 - 13:30.
So that is, in our case, pretty significant.
In the past we tested a little with synthetic fulls, but we noticed a significant increase in storage usage, so we went back to forward incrementals. I don't know if this has also been improved.