Gostev
Veeam Software
Posts: 23215
Liked: 2977 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

ReFS state post September 2018 Windows Updates

Post by Gostev » Oct 15, 2018 2:37 pm 4 people like this post

I am starting a new thread for ReFS feedback now that all the teething issues with its stability and memory consumption seem to have been resolved as of refs.sys driver version 2457.

We want to keep this discussion consumable and on-topic, so before you post your feedback or issues in this thread, please be sure your backup repository meets the following requirements:

Operating System
Using the following OS versions will ensure you're not running into any known ReFS issues:
• Windows Server 2019 (not officially supported until Update 4, so use it for lab environments only please).
• Windows Server 2016 patched to at least September 2018 updates (KB4343884 or any later one, since Windows Updates are cumulative).
• Windows 10 Pro for Workstations
You can double-check that you're on the right patch level by verifying that your refs.sys driver version is 2457 or later.
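
As a rough illustration of the version check described above, here is a minimal Python sketch. It assumes you have already read the "File version" string from the refs.sys properties (for example 10.0.14393.2515) and that the last dotted component is the driver version this post refers to. The function name and constants are hypothetical, and the comparison is only meant for Windows Server 2016 (build 14393) version strings.

```python
# Hypothetical helper: compares a refs.sys file version string against the
# minimum driver revision named in this post (2457).
WS2016_BUILD = 14393        # Windows Server 2016 build number
MIN_REFS_REVISION = 2457    # first revision with the memory management fixes


def is_patched(file_version: str) -> bool:
    """Return True if a 'major.minor.build.revision' string is at or above 2457."""
    parts = [int(p) for p in file_version.strip().split(".")]
    if len(parts) != 4:
        raise ValueError(f"expected four dotted components, got {file_version!r}")
    build, revision = parts[2], parts[3]
    if build != WS2016_BUILD:
        # Assumption: revisions are only comparable within the same OS build.
        raise ValueError(f"build {build} is not Windows Server 2016 ({WS2016_BUILD})")
    return revision >= MIN_REFS_REVISION


if __name__ == "__main__":
    for version in ("10.0.14393.2363", "10.0.14393.2457", "10.0.14393.2515"):
        print(version, "OK" if is_patched(version) else "needs patching")
```

Note that a later revision such as 2515 passes the check, since Windows Updates are cumulative.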

Hardware
Make sure your backup repository server meets minimum system requirements, particularly around:
• CPU: we recommend allocating at least 1 core for each concurrent backup proxy task, and at least 1 core for every two concurrent repository tasks.
• RAM: if the backup repository server is running multiple Veeam roles, please add up memory requirements of each individual role.
• Storage: we recommend formatting the ReFS volume with 64KB block size.
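
To make the CPU guideline above concrete, here is a minimal Python sketch of the arithmetic. The function name is hypothetical. Adding the two figures together for a server that runs both roles, and rounding an odd number of repository tasks up to a whole core, are assumptions on my part rather than something stated in the requirements.

```python
import math


def recommended_cores(proxy_tasks: int, repository_tasks: int) -> int:
    """Rough core count: 1 per concurrent proxy task, 1 per two repository tasks."""
    if proxy_tasks < 0 or repository_tasks < 0:
        raise ValueError("task counts must not be negative")
    # Assumption: an odd number of repository tasks is rounded up to a full core.
    return proxy_tasks + math.ceil(repository_tasks / 2)


if __name__ == "__main__":
    # Example: 4 concurrent proxy tasks and 8 concurrent repository tasks.
    print(recommended_cores(4, 8))
```

Treat the result as a lower bound ("at least"), not as a sizing target.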

3rd Party Software
Uninstall the following 3rd party software, which has been reported to cause ReFS stability and/or performance issues:
• 3rd party antiviruses (but not Windows Defender)
• Microsoft Configuration Manager Client
If at all possible, we recommend that you start troubleshooting ReFS issues by performing a clean installation of the operating system from the original Windows installation ISO from MSDN, to ensure no bloatware is installed along with it from vendor-provided installation media. Please create dedicated topics if you'd like to discuss specific incompatibilities with specific 3rd party software, and we will update this master post with the results of the corresponding discussions.

For more information on what we know so far, I am reposting a snippet from the "Word" section of my Veeam forum digest from a few weeks ago:
Gostev wrote: We have now completed the stress testing of ReFS driver version 2457 (released 1 month ago as a part of KB4343884). As a reminder, this was a milestone update that brought the backport of ReFS driver memory management fixes from the Windows Server 2019 branch. So we decided to really put this driver through its paces in our stress testing lab to validate those changes. In case you're wondering why we tested the original version and not the most current one from the latest Windows update: this sort of stability testing takes a few weeks. But future driver versions of course include all these changes too. And long story short, I'm happy to report that ReFS remained stable no matter what challenges we threw at it.

The last stress test was particularly impressive, because it was done on a 40TB ReFS repository with just 8GB RAM in the server, which is our minimum system requirement and way below the current recommendation of 1GB per 1TB. The backup job configuration was also the toughest for ReFS to handle: over a hundred VMs, per-VM backup file chains and weekly synthetic fulls enabled, meaning the retention process had to delete hundreds of backup files with a total virtual size of 60TB at once, a configuration which pretty much guaranteed server lockups before. But the latest ReFS driver chewed through this like a piece of cake, with no spikes in either CPU or RAM usage and no system freezes, which is very impressive with just 8GB of physical RAM. In fact, top RAM usage was just 5.4GB (metafile maximums were 4.5GB total and 2.1GB active), while CPU load of the 4 vCPU VM was hovering at around 10%.

Does this mean this is finally the end of the ReFS troubles, at least at a wide scale? Anecdotally, we're already starting to receive such confirmations. For example, one of our solution architects has two identical ReFS repositories in his lab with vastly different behavior. The one on Windows 10 Pro for Workstations (which is based on a newer Windows build) has been rock solid, so much so that he was quite tempted to suggest customers use Windows 10 instead of Windows Server 2016! But an identical setup with Windows Server 2016 was still locking up every now and then, up until KB4343884 was installed. So, I'm very optimistic about it, even if of course only field experience will be able to confirm the resolution for sure. But until then, I recommend "business as usual", particularly with regard to physical RAM on your backup repository servers.
Thanks!
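
For readers who want to apply the RAM figures quoted in the digest above (an 8GB minimum and a recommendation of 1GB per 1TB of repository), here is a minimal Python sketch. The function name is hypothetical, and treating the 8GB minimum as a floor under the per-TB recommendation is an assumption of this example. It does not account for other Veeam roles on the same server, whose memory requirements need to be added on top.

```python
MIN_RAM_GB = 8      # minimum system requirement quoted in the digest
RAM_GB_PER_TB = 1   # recommendation quoted in the digest


def recommended_ram_gb(repository_tb: float) -> float:
    """Rough RAM recommendation in GB for a ReFS repository of the given size in TB."""
    if repository_tb < 0:
        raise ValueError("repository size must not be negative")
    # Assumption: never recommend less than the stated 8GB minimum.
    return max(MIN_RAM_GB, repository_tb * RAM_GB_PER_TB)


if __name__ == "__main__":
    # The 40TB repository from the stress test would call for 40GB by this rule,
    # even though the test itself ran successfully with 8GB.
    print(recommended_ram_gb(40))
```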

Mgamerz
Enthusiast
Posts: 62
Liked: 8 times
Joined: Sep 29, 2017 8:07 pm

Re: ReFS state post September 2018 Windows Updates

Post by Mgamerz » Oct 15, 2018 4:32 pm

Maybe pin this thread so it's at the top?

Iain_Green
Service Provider
Posts: 137
Liked: 8 times
Joined: Dec 05, 2014 2:13 pm
Full Name: Iain Green

Re: ReFS state post September 2018 Windows Updates

Post by Iain_Green » Oct 16, 2018 7:41 am

Running Windows Server 2016 OS Build 14393.2551 fully patched.

Yet refs.sys shows the following:

last modified 30/08/2018
File Version 10.0.14393.2515
Product version 10.0.14393.2515

If I try to run the KB4343884 installer from https://www.catalog.update.microsoft.co ... =KB4343884, I get "The update is not applicable to your computer".

How do I get to REFS version .2457?
Many thanks

Iain Green

opg70
Influencer
Posts: 21
Liked: 3 times
Joined: Oct 06, 2013 8:48 am

Re: ReFS state post September 2018 Windows Updates

Post by opg70 » Oct 16, 2018 10:32 am

.2515 is more recent and includes all the specified fixes, so you don't need .2457.

Iain_Green
Service Provider
Posts: 137
Liked: 8 times
Joined: Dec 05, 2014 2:13 pm
Full Name: Iain Green

Re: ReFS state post September 2018 Windows Updates

Post by Iain_Green » Oct 16, 2018 10:35 am

@opg70 thanks for confirming.
Many thanks

Iain Green

dimaslan
Service Provider
Posts: 34
Liked: 5 times
Joined: Jul 01, 2017 8:02 pm
Full Name: Dimitris Aslanidis

Re: ReFS state post September 2018 Windows Updates

Post by dimaslan » Oct 16, 2018 5:38 pm

I am checking the refs.sys properties and I see nothing in the details tab. Do I need to change something first?

Gostev
Veeam Software
Posts: 23215
Liked: 2977 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS state post September 2018 Windows Updates

Post by Gostev » Oct 16, 2018 9:15 pm 1 person likes this post

Wow! Something is seriously messed up with your OS then... I've no explanation for this.

akornow
Lurker
Posts: 1
Liked: 1 time
Joined: Mar 13, 2018 6:56 pm
Full Name: Alessandro Kornowski

Re: ReFS state post September 2018 Windows Updates

Post by akornow » Oct 18, 2018 10:52 pm 1 person likes this post

Hi there

I'm new to the forum, but every Monday morning I take a few minutes to read Gostev's news very attentively.
It is exactly what we expect from technical news. Good job, Gostev, you make me proud to be a Veeam partner.

Now back to real life... I'd like to ask whether anyone has had a ReFS volume marked as RAW after a sudden Windows Server 2016 restart (it is a 10TB VHDX attached to the Veeam VM and used as a backup repository), and whether there is any way to bring it back to life.
As I said, it is a backup repository for long-term backups, so... there is no backup of the backup.
We opened a ticket with MS, waited, waited, and waited a little bit more, something like 28 hours, and after that they simply told us in a 5-minute call that there is nothing they can do except restore from backup or try to find some third-party rescue tool... :( :evil: :evil:

Thanks in advance

Alessandro

Gostev
Veeam Software
Posts: 23215
Liked: 2977 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS state post September 2018 Windows Updates

Post by Gostev » Oct 19, 2018 12:08 pm

Thank you for your kind words, Alessandro!

Yes, I heard about these types of complete volume loss situations from the ReFS team; they saw them in support cases with non-HCL repository server hardware (specifically, RAID controllers without battery-backed write cache). The issue occurs when certain critical ReFS metadata that is already in flight is lost due to a sudden restart, before it actually lands on disk. Enterprise-grade RAID controllers take care of this by automatically committing any pending/unconfirmed writes sitting in the BBWC to disk once the server restarts.

anton
Novice
Posts: 5
Liked: 1 time
Joined: Oct 04, 2011 7:22 am
Full Name: Anton van der Linden

Re: ReFS state post September 2018 Windows Updates

Post by anton » Oct 29, 2018 2:26 pm

We've installed KB4462928 on three ReFS-based repository servers.
On two of them this didn't cause any issues, but on our largest repository we noticed that everything slowed down after installation.

The response times on the ReFS volume went really high (>1000 ms).

After we removed the KB from this repository server, the performance was OK again.
With KB4462928 the ReFS driver was at version 2515; we are currently on version 2363, which is stable.

We are using Windows Storage Spaces (JBOD); the virtual disk has a mirrored layout and thin provisioning.

Anyone else experiencing this after installing KB4462928?

Gostev
Veeam Software
Posts: 23215
Liked: 2977 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: ReFS state post September 2018 Windows Updates

Post by Gostev » Oct 29, 2018 2:40 pm

Version 2515 is from the mid-September Windows update, so it's been around for a while and has no known issues.
However, not many Veeam users use Storage Spaces for backup repositories, so this may be the culprit in your case.

Frenchyaz
Novice
Posts: 6
Liked: 2 times
Joined: Nov 01, 2018 8:32 pm

Re: ReFS state post September 2018 Windows Updates

Post by Frenchyaz » Nov 01, 2018 8:54 pm

Hi everyone,

I am a new Veeam customer, and of course during the installation Veeam recommends ReFS with 64K... This is the first server we have installed with ReFS, so I was a little anxious about how it might mess up the backups.

So far, I can't say whether it has been good or bad, as we haven't started production backups on it yet.

I just wanted to share some information on our configuration in case somebody has a similar one and has run into issues.

Standalone ESXi 6.7U1 host containing only one VM

VM Guest: Windows Server 2016 with up-to-date patches from Windows Update
Veeam Backup and Replication 9.5U3
SQL Server 2016 SP2 CU2 standard
64GB memory reserved
16vCPU
System OS on 1st Paravirtual controller, VMDK 4K
SQL on 2nd Paravirtual controller, VMDK 64K
Pass through HBA 16GB (*2)
RDM Disks (3 * 64TB volume formatted with ReFS)

Preliminary tests show a sustained backup speed of 800MB/s up to 1GB/s; it seems our slowest link is our SAN...

We haven't tested multiple backups yet; however, we should pretty soon, after we set up our backup policy. Coming from NetBackup, it is a bit of a learning curve; however, speed-wise, Veeam blows NetBackup away.

So if you are using Veeam in this configuration and have issues with ReFS, feel free to post here so we can see what can be done to avoid pitfalls. :D
I'll continue to update here after further backups, hopefully soon...

olavl
Novice
Posts: 4
Liked: 1 time
Joined: Jan 23, 2018 8:21 am
Full Name: Olav Langeland

Re: ReFS state post September 2018 Windows Updates

Post by olavl » Nov 05, 2018 1:10 pm

Anyone with experience running Veeam + ReFS with deduplication?

micha2k6
Lurker
Posts: 1
Liked: never
Joined: Aug 22, 2018 12:30 pm
Full Name: Michael Wameling

Re: ReFS state post September 2018 Windows Updates

Post by micha2k6 » Nov 12, 2018 10:51 am

akornow wrote:
Oct 18, 2018 10:52 pm
Hi there

I'm new to the forum, but every Monday morning I take a few minutes to read Gostev's news very attentively.
It is exactly what we expect from technical news. Good job, Gostev, you make me proud to be a Veeam partner.

Now back to real life... I'd like to ask whether anyone has had a ReFS volume marked as RAW after a sudden Windows Server 2016 restart (it is a 10TB VHDX attached to the Veeam VM and used as a backup repository), and whether there is any way to bring it back to life.
As I said, it is a backup repository for long-term backups, so... there is no backup of the backup.
We opened a ticket with MS, waited, waited, and waited a little bit more, something like 28 hours, and after that they simply told us in a 5-minute call that there is nothing they can do except restore from backup or try to find some third-party rescue tool... :( :evil: :evil:

Thanks in advance

Alessandro

Hi Alessandro,

This has happened to me twice so far. My Veeam jobs failed with the error: "Synthetic full backup creation failed Error: The volume repair was not successful. Failed to create or open file [....vbk]." and after a reboot the volume was gone and marked as RAW.
We tried to resolve the issue with Microsoft and Dell (as the repository is on a Dell physical machine), but each more or less pointed to the other. We haven't been able to recover anything from the disks, so we ended up updating all firmware and drivers to the latest versions, fully reinitializing the RAID volume (not a quick initialize), and afterwards doing a full slow format of the whole disk. Now we're hoping for the best. Both incidents happened on our biggest repositories (~80TB), which were both only roughly 50% full; both run Server 2016 on identical hardware configurations.
Repository servers are Dell R730 (2x Xeon E5-2630, 96GB RAM, 80TB RAID volume on a PERC H730P Mini).
The second time this happened to me was at the beginning of November, exactly 2 days after I installed the 2018-10 and 2018-09 Cumulative Updates, but we still have AV from McAfee installed on the servers.

Gostev wrote:
Oct 19, 2018 12:08 pm
Yes, I heard about these types of complete volume loss situations from the ReFS team; they saw them in support cases with non-HCL repository server hardware (specifically, RAID controllers without battery-backed write cache). The issue occurs when certain critical ReFS metadata that is already in flight is lost due to a sudden restart, before it actually lands on disk. Enterprise-grade RAID controllers take care of this by automatically committing any pending/unconfirmed writes sitting in the BBWC to disk once the server restarts.

After reading this I had a closer look at the event log of our repository server. There was a clean reboot after the patches got installed, and ~2 hours later I have the first entries from ReFS in the event log. First there were checksum errors that the system was able to correct, and right after that checksum errors on the same file that could not be corrected.

Michael

l0stb@ackup
Influencer
Posts: 14
Liked: 4 times
Joined: Jul 19, 2018 2:10 am

Re: ReFS state post September 2018 Windows Updates

Post by l0stb@ackup » Nov 29, 2018 1:05 am

Gostev wrote:
Oct 15, 2018 2:37 pm
3rd Party Software
Uninstall the following 3rd party software that have been reported to cause ReFS stability and/or performance issues:
• Microsoft Configuration Manager Client

Do we have more information about this issue, its symptoms and/or workarounds?
We have two Veeam servers with ReFS repositories and use SCCM to patch all our Windows servers. Because of this note, we've so far been holding off on installing the agent on the Veeam servers and have been patching them manually. Thanks
