Host-based backup of VMware vSphere VMs.
nunciate
Veteran
Posts: 257
Liked: 40 times
Joined: May 21, 2013 9:08 pm
Full Name: Alan Wells
Contact:

Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

Things have been running great for months and I don't understand what is happening. Last week my backup repository server suddenly started rebooting. We run an HP Apollo 4510 with ReFS-formatted volumes and Windows Server 2019. I have deduplication enabled as a General File server, and like I said it has been fine, and then Boom!, blue screens about every 1-2 hours for the last 5 days. Analysis of the dump files shows the info below. Specifically, I noticed IMAGE_NAME: dedup.sys and FAILURE_BUCKET_ID: AV_dedup!DdpStreamStorePrepareForPagingIoEx.

Does anyone have any ideas here? This server is in my DR location, and since I already have backups of all this data in production, I decided to take it offline and disable deduplication on all volumes. That didn't help. I have run the Smart Array diagnostics and don't see any issues, and I don't see any signs of bad drives or anything like that. I also don't see anything in the event logs that gives me any hint as to the issue. I then tried running an unoptimization job on all volumes, but it looked like it would take over a week to run. Finally, I took the nuclear option: I reformatted all of my volumes so they are clean and empty, with deduplication still disabled. I restarted active full backups and within an hour got another blue screen. Currently, I have removed the deduplication services completely from the server and rebooted, so I'll try again, but I don't have much faith that it will help.


#Logs removed by Mod (mildur)
It's not allowed to post Logs in the forums. Thanks for following the rules.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

BTW, I forgot to post that my Veeam version is 11.0.1.1261.
My Repository OS Version is 17763.3046 Server 2019 v1809.
The server is fully patched and up to date.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

Removing deduplication from the server did not resolve the issue so I don't know what else to do here other than reload the entire O/S. I can't find any hardware issues so I guess I'll just reinstall Windows at this point but I don't actually think that is going to fix it.
Andreas Neufert
VP, Product Management
Posts: 7081
Liked: 1511 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by Andreas Neufert »

Mildur
Product Manager
Posts: 9848
Liked: 2607 times
Joined: May 13, 2017 4:51 pm
Full Name: Fabian K.
Location: Switzerland
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by Mildur »

Hi Alan

I have removed the bluescreen output. Uploading logs is not allowed in this forum.
Please try out Andreas' link.

If that does not help, open a support case with Veeam support, as required for technical issues, and provide the case number.
Without case number, the topic will eventually be deleted by moderators.

Best regards,
Fabian

PS: support can only help if you upload logs https://www.veeam.com/kb1832
Product Management Analyst @ Veeam Software

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

Ah, sorry about that. Well, at this point I have removed the OS partitions and reinstalled Windows. I am patching the OS now and will redeploy the Veeam services. Crossing my fingers that this fixes the issue.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by Andreas Neufert »

I would not go down that path, as it is clearly a filesystem issue. Once you have finished installing Windows and have updated the OS to at least a version/patch level higher than the old operating system, run the commands to repair the filesystem as listed in my link.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

Well I have run out of things to try here. I have tried all of the following with no luck figuring this out.
Any time I run backup jobs, my HPE Apollo 4510 G9 just reboots, with no sign of why.
I get no hardware alerts, no event log entries, and no dump files. Nothing that points to an issue.

I built a SAN volume and attached it directly to this server.
I copied a 500 GB file from the SAN volume to the physical drives with no issue. That means I can copy data over the fiber connection via the PCI bus to the local disks without problems.
I also copied data over the network with no issues.
BTW, this physical server also acts as my proxy for replication jobs. The replication jobs appear to run without issue.

I booted into the HPE diagnostics console and ran all the tests there. No issues with anything reported. Memory and CPU are solid.
Looking at the iLO, there are no issues reported there either. The logs just show the reboots when they happen.

I cleared out the entire Smart Array controller, reinstalled the entire OS, rebuilt the disk arrays from scratch, and reformatted the partitions (trying both ReFS and NTFS), and I still get reboots. Yes, I had to remove and recreate the repositories in Veeam when I switched between ReFS and NTFS.

I am at a complete loss here. It feels like a hardware issue but honestly, I cannot find one anywhere.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

I opened Case #05507694

Re: Blue Screen with Server 2019 REFS and Dedup

Post by Mildur »

Thanks for the case number. I'm glad that you found a workaround with the network mode.
Let's wait for the log analysis from the support team.
Regnor
VeeaMVP
Posts: 1007
Liked: 314 times
Joined: Jan 31, 2011 11:17 am
Full Name: Max
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by Regnor »

We've seen a similar issue with an Apollo G9 many years ago, but we did have a different bluescreen message and also no deduplication.
The problem also started after the system ran stable for some time, without any change which could be matched to the issue.
We also suspected the hardware or a bug with ReFS, but Microsoft and HPE were pointing at each other, and only Veeam Support analyzed the bluescreen.
Unfortunately (or fortunately?) the problems disappeared after a long time and never came back, so our final conclusion was that they were solved by some Windows update.

While it's a different case than yours, I thought that perhaps it's interesting to share this.
Just make sure that firmware/driver/software from HPE is up-to-date and also install any pending Windows updates.
At some point we also thought it could be a load problem. I'm not sure how far a backup job gets in your case, but you could try to limit the storage load inside the repository configuration.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate »

Some new info we discovered yesterday: changing the Proxy Transport Setting to Network Only mode allows backups to run fine. With it set to Direct Access (SAN) mode, the server reboots within minutes of starting active full backup jobs.

So it has something to do with the fiber connectivity but I still can't figure out what. I have fully tested my connection to the SAN by manually copying files across from SAN volumes to local volumes.
That server has a single dual-port fiber card, with one port going to one fiber switch and the other port going to a second fiber switch. The SAN is cabled the same way: multiple controllers with 4 ports each, 2 going to each fiber switch. It has been this way for 10 years. I think I will pull one cable at a time on my backup server, test my fiber connection again, and see if I can identify anything that way. Otherwise I may replace that card.
kikoen
Lurker
Posts: 1
Liked: never
Joined: Jun 13, 2017 9:38 am
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by kikoen »

I have a similar case, analysis of the crash dumps returns the same results.
One backup server with Windows Server 2019, ReFS, and dedup enabled also mysteriously reboots once every 24h.
Another backupserver, similar setup but with NTFS, that does not reboot.
Both servers have their backup storage attached by iSCSI, and are VMs running on VMware vSphere using the hotadd backup method.

The one that reboots has all windows updates installed.
The one that doesn't has the following updates pending to be installed:
- Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.369.510.0)
- Windows Malicious Software Removal Tool x64 - v5.102 (KB890830)
- 2022-06 Cumulative Update for .NET Framework 3.5, 4.7.2 and 4.8 for Windows Server 2019 for x64 (KB5014805)
- 2022-05 Cumulative Update for Windows Server 2019 (1809) for x64-based Systems (KB5013941)

We suspect that a Windows update introduced the problem.
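
To narrow down which update to look at first, the comparison above can be reduced to a set of KB numbers. This is a minimal Python sketch, assuming the pending-update list from the stable server is pasted in as plain text lines. The function name `kb_ids` is made up for illustration, and the result is only a list of candidates, not proof that any of them causes the reboots.

```python
import re

# Matches identifiers such as KB5013941 anywhere in a line of text.
KB_PATTERN = re.compile(r"KB\d+")


def kb_ids(lines):
    """Return the set of KB identifiers mentioned in the given text lines."""
    return {kb for line in lines for kb in KB_PATTERN.findall(line)}


# Updates still pending on the server that does NOT reboot (from the post above).
pending_on_stable = [
    "Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.369.510.0)",
    "Windows Malicious Software Removal Tool x64 - v5.102 (KB890830)",
    "2022-06 Cumulative Update for .NET Framework 3.5, 4.7.2 and 4.8 for Windows Server 2019 for x64 (KB5014805)",
    "2022-05 Cumulative Update for Windows Server 2019 (1809) for x64-based Systems (KB5013941)",
]

# Assumption: both servers share the same baseline, so anything pending on the
# stable server is installed on the rebooting one and is therefore a candidate.
candidates = sorted(kb_ids(pending_on_stable))
for kb in candidates:
    print(kb)
```

If both servers can export their installed update lists, the same helper could be run on each list and the two sets subtracted instead of relying on the shared-baseline assumption.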
joern.schonhoff
Lurker
Posts: 2
Liked: 1 time
Joined: Feb 03, 2020 9:50 pm
Full Name: Jörn S.
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by joern.schonhoff » 1 person likes this post

I had a very similar problem with a ReFS + Dedup repository. According to the crash report the error was in FLTMGR.SYS.
The "solution" for the affected server was to extend the Windows Defender exceptions to the entire drive. This is not a problem because the drive only has Veeam backup data on it.
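
Before relying on an exclusion like that, it can help to confirm that the backup paths really fall under it. Below is a minimal Python sketch of that check. The drive letter `E:\`, the file names, and the helper `is_excluded` are all hypothetical and chosen only for illustration. It compares path strings and does not query Defender itself.

```python
from pathlib import PureWindowsPath


def is_excluded(path, exclusions):
    """Return True if path equals or lies beneath any exclusion path.

    PureWindowsPath compares case-insensitively, matching Windows behaviour.
    """
    target = PureWindowsPath(path)
    for exclusion in exclusions:
        root = PureWindowsPath(exclusion)
        if target == root or root in target.parents:
            return True
    return False


# Hypothetical exclusion list: the whole repository drive.
exclusions = ["E:\\"]

# Hypothetical backup file on the repository drive, and a file elsewhere.
print(is_excluded("E:\\Backups\\Job1\\vm01.vbk", exclusions))
print(is_excluded("D:\\Other\\file.bin", exclusions))
```

Excluding the whole drive is the broadest possible rule, which is only reasonable here because the drive holds nothing but backup data.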
m.novelli
Veeam ProPartner
Posts: 566
Liked: 103 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by m.novelli »

nunciate wrote: Jun 26, 2022 5:57 pm
BTW, I forgot to post that my Veeam version is 11.0.1.1261.
My Repository OS Version is 17763.3046 Server 2019 v1809.
The server is fully patched and up to date.

Please update all BIOS / firmware on the server: BIOS, RAID controller, backplane, CPLD, disk firmware, network...
It could be faulty RAM; check whether your BIOS has a "memory testing" feature.
Also check if there is an updated driver for your RAID controller.

Marco
mkeating44
Influencer
Posts: 13
Liked: 3 times
Joined: Jun 07, 2022 10:57 pm
Full Name: Michael Keating
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by mkeating44 »

We've been seeing the same after last month's updates, specifically when a dedup job finishes. There have been ongoing issues with Windows and ReFS over the last year, but this is the first time we have been affected.

Unfortunately we are moving off the dedup storage, so I don't have an answer. We did install this month's update and haven't seen a BSOD since, but we have also stopped all our jobs.
hayliz
Service Provider
Posts: 35
Liked: 5 times
Joined: Jun 27, 2022 8:12 am
Full Name: Abdull
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by hayliz »

joern.schonhoff wrote: Jul 04, 2022 7:38 am
I had a very similar problem with a ReFS + Dedup repository. According to the crash report the error was in FLTMGR.SYS.
The "solution" for the affected server was to extend the Windows Defender exceptions to the entire drive. This is not a problem because the drive only has Veeam backup data on it.

We had the same problem. As you described, we excluded the folder where the backups are stored, and then it was quiet again.
After a detailed investigation it turned out that Defender had crashed the dedup driver.
govi
Expert
Posts: 101
Liked: 8 times
Joined: Sep 26, 2017 11:38 am
Full Name: Govinda Naik
Contact:

Re: Blue Screen with Server 2019 REFS and Dedup

Post by govi »

Hi,

I have experienced a similar issue with Windows 11 Pro 23H2.

We were simulating a disaster recovery procedure. We installed the latest version of Veeam Backup & Replication on Windows 11 and tried to attach an iSCSI drive (Synology NAS) formatted with ReFS.

We have not seen this issue on Windows 10 Pro; we tried it from a Windows 10 Pro laptop as well.

Re: Blue Screen with Server 2019 REFS and Dedup

Post by nunciate » 1 person likes this post

I realized I never posted about fixing this.

Ultimately I did fix this issue, but it took replacing pretty much every piece of electronics in the chassis.
We have a 3rd party warranty on this, and they replaced the network cards, the FC cards, and the backplane of the chassis (where the server plugs in).
None of that seemed to help. Then they replaced the main board on the server module itself, and that seemed to stabilize everything.
This thing has been running since my last post, so ultimately it was not a Veeam issue nor a dedup issue but a hardware issue, as I expected.
We just could never figure out which part it was until we threw parts at it.