-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Blue Screen with Server 2019 REFS and Dedup
Well, things have been running great for months now and I don't understand what is happening. Suddenly last week my backup repository server just started rebooting. We run an HP Apollo 4510 with ReFS-formatted volumes and Windows Server 2019. I have deduplication enabled as a General File server and, like I said, it has been fine, and then boom: blue screens about every 1-2 hours for the last 5 days. In the analysis of the dump files I specifically noticed IMAGE_NAME: dedup.sys and FAILURE_BUCKET_ID: AV_dedup!DdpStreamStorePrepareForPagingIoEx.
Does anyone have any ideas here? This server is in my DR location, and since I already have backups of all this data in production, I decided to take the thing offline and disable deduplication on all volumes. That didn't help. I have run Smart Array diagnostics and don't see any issues. I don't see any signs of bad drives or anything like that. I also don't see anything in the event logs that gives me any hint as to the issue. I then tried running an unoptimize command on all volumes, but that looked like it would take over a week to run. Finally, I took the nuclear option: I reformatted all of my volumes so they are clean and clear, with no deduplication, and it is still disabled. I restarted active full backups and within an hour got another blue screen. Currently, I have removed the deduplication services completely from the server and have rebooted, so I'll try again, but I don't have much faith that it will help me.
#Logs removed by Mod (mildur)
It's not allowed to post Logs in the forums. Thanks for following the rules.
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
BTW, I forgot to post that my Veeam version is 11.0.1.1261.
My Repository OS Version is 17763.3046 Server 2019 v1809.
The server is fully patched and up to date.
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Removing deduplication from the server did not resolve the issue so I don't know what else to do here other than reload the entire O/S. I can't find any hardware issues so I guess I'll just reinstall Windows at this point but I don't actually think that is going to fix it.
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Hi Alan
I have removed the bluescreen output. Uploading logs is not allowed in this forum.
Please try Andreas's link.
If it doesn't help, open a support case with Veeam support, as required for technical issues, and provide the case number.
Without case number, the topic will eventually be deleted by moderators.
Best regards,
Fabian
PS: support can only help if you upload logs https://www.veeam.com/kb1832
Product Management Analyst @ Veeam Software
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Ah, sorry about that. Well, at this point I have removed the OS partitions and reinstalled Windows. I am patching the OS now and will redeploy the Veeam services. Crossing my fingers that this fixes the issue.
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
I would not go down that path, as it is clearly a filesystem issue. Once you have finished installing Windows and updating the OS to a version/patch level at least as high as the old operating system's, run the commands to repair the filesystem as listed in my link.
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Well I have run out of things to try here. I have tried all of the following with no luck figuring this out.
Anytime I run backup jobs my HPE Apollo 4510 G9 just reboots. No sign of why.
I get no hardware alerts, no event logs, no dump files. Nothing that points to an issue.
I built a SAN volume and attached it directly to this server.
I copied a 500 GB file from the SAN volume to the physical drives. No issue. So that means I can copy data over the fiber connection via the PCI bus to the local disk with no issues.
I also copied data over the network with no issues.
BTW, this physical server also acts as my proxy for replication jobs. The replication jobs appear to run without issue.
I booted into the HPE diagnostics console and ran all the tests there. No issues with anything reported. Memory and CPU are solid.
Looking at the iLO connection, there are no issues reported there. The logs there just show the reboots when they happen.
I cleared out the entire Smart Array controller, reinstalled the entire OS, rebuilt the disk arrays from scratch and reformatted the partitions (both ReFS and NTFS), and I still get reboots. Yes, I had to remove and recreate the repositories in Veeam when I switched between ReFS and NTFS.
I am at a complete loss here. It feels like a hardware issue but honestly, I cannot find one anywhere.
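The manual copy test described above can be made a little stricter by checking that what arrived matches what was sent. A minimal Python sketch, assuming nothing about the actual volumes: the function names, the chunk size and the temporary demo files are all hypothetical, and real use would point `src` at a file on the SAN volume and `dst` at the local disk.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def sha256_of(path):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst in chunks, then re-read both files and compare SHA-256."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for block in iter(lambda: fin.read(CHUNK), b""):
            fout.write(block)
        fout.flush()
        os.fsync(fout.fileno())  # ask the OS to push the data to the device
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Demo on a small temporary file (hypothetical stand-in for the real volumes).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(3 * CHUNK + 17))
        print(copy_and_verify(src, dst))
```

Two caveats on this sketch: the re-read of the destination may be served from the OS cache rather than the disk, and a single sequential copy may load the path far less than several parallel backup streams, so a clean result narrows things down but does not rule the path out.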
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
I opened Case #05507694
-
- Product Manager
- Posts: 9848
- Liked: 2607 times
- Joined: May 13, 2017 4:51 pm
- Full Name: Fabian K.
- Location: Switzerland
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Thanks for the case number. I'm glad that you found a workaround with network mode.
Let's wait for the log analysis from the support team.
Product Management Analyst @ Veeam Software
-
- VeeaMVP
- Posts: 1007
- Liked: 314 times
- Joined: Jan 31, 2011 11:17 am
- Full Name: Max
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
We've seen a similar issue with an Apollo G9 many years ago, but we did have a different bluescreen message and also no deduplication.
The problem also started after the system ran stable for some time, without any change which could be matched to the issue.
We also suspected the hardware or a bug with ReFS, but Microsoft and HPE were pointing at each other, and only Veeam Support analyzed the bluescreen.
Unfortunately (or fortunately?) the problems disappeared after a long time and never came back, so our final conclusion was that it got solved by some Windows update.
While it's a different case than yours, I thought that perhaps it's interesting to share this.
Just make sure that firmware/driver/software from HPE is up-to-date and also install any pending Windows updates.
At some point we also thought it could be a load problem. I'm not sure how far a backup job gets in your case, but you could try to limit the storage load inside the repository configuration.
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Some new info we discovered yesterday. Changing the Proxy Transport Setting to Network Only mode allows backups to run fine. With it set to Direct Access (SAN) mode, the server reboots within minutes of starting active full backup jobs.
So it has something to do with the fiber connectivity, but I still can't figure out what. I have fully tested my connection to the SAN by manually copying files across from SAN volumes to local volumes.
What we have in that server is a single dual-port fiber card. We have 1 port going to one fiber switch and the other going to another fiber switch. Same with the SAN: we have multiple controllers with 4 ports on each controller, 2 going to each fiber switch. It has been this way for 10 years. I think I will pull 1 cable at a time on my backup server and test my fiber connection again and see if I can identify anything that way. Otherwise I'll maybe replace that card.
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Jun 13, 2017 9:38 am
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
I have a similar case; analysis of the crash dumps returns the same results.
One backup server with Windows Server 2019, ReFS and dedup enabled also mysteriously reboots once every 24h.
Another backup server, with a similar setup but NTFS, does not reboot.
Both servers have their backup storage attached by iSCSI, and are VMs running on VMware vSphere using the hotadd backup method.
The one that reboots has all Windows updates installed.
The one that doesn't has the following updates pending to be installed:
- Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.369.510.0)
- Windows Malicious Software Removal Tool x64 - v5.102 (KB890830)
- 2022-06 Cumulative Update for .NET Framework 3.5, 4.7.2 and 4.8 for Windows Server 2019 for x64 (KB5014805)
- 2022-05 Cumulative Update for Windows Server 2019 (1809) for x64-based Systems (KB5013941)
We suspect that a Windows update introduced the problem.
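One way to turn that suspicion into a short list is to diff the update lists of the two servers. A minimal Python sketch, assuming each server's installed updates have been exported by hand into a set of KB identifiers; the function name and the `KB-COMMON` placeholder are hypothetical, and only the four KB numbers come from the list above.

```python
def suspect_updates(installed_on_crashing, installed_on_stable):
    """Return updates present on the crashing host but missing on the stable one."""
    return sorted(set(installed_on_crashing) - set(installed_on_stable))


# The four KB numbers are the updates still pending on the stable server.
# "KB-COMMON" is a hypothetical placeholder for everything both servers share.
crashing = {"KB-COMMON", "KB2267602", "KB890830", "KB5014805", "KB5013941"}
stable = {"KB-COMMON"}

for kb in suspect_updates(crashing, stable):
    print(kb)
```

The result is only a list of candidates: the two servers also differ in file system (ReFS vs NTFS), so the update difference alone does not prove which one is responsible.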
-
- Lurker
- Posts: 2
- Liked: 1 time
- Joined: Feb 03, 2020 9:50 pm
- Full Name: Jörn S.
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
I had a very similar problem with a ReFS + Dedup repository. According to the crash report the error was in FLTMGR.SYS.
The "solution" for the affected server was to extend the Windows Defender exceptions to the entire drive. This is not a problem because the drive only has Veeam backup data on it.
-
- Veeam ProPartner
- Posts: 566
- Liked: 103 times
- Joined: Dec 29, 2009 12:48 pm
- Full Name: Marco Novelli
- Location: Asti - Italy
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Please update all BIOS and firmware on the server: BIOS, RAID controller, backplane, CPLD, disk firmware, network...
It could be faulty RAM; check whether your BIOS has a "memory testing" feature.
Also check whether there is an updated driver for your RAID controller.
Marco
-
- Influencer
- Posts: 13
- Liked: 3 times
- Joined: Jun 07, 2022 10:57 pm
- Full Name: Michael Keating
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
We've been seeing the same after last month's updates, specifically when a dedup job finishes. There have been ongoing issues with Windows and ReFS over the last year, but this is the first time we have been affected.
Unfortunately we are moving off the dedup storage, so I don't have an answer. We did install this month's update and haven't seen a BSOD since, but we have also stopped all our jobs.
-
- Service Provider
- Posts: 35
- Liked: 5 times
- Joined: Jun 27, 2022 8:12 am
- Full Name: Abdull
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
joern.schonhoff wrote: ↑Jul 04, 2022 7:38 am
I had a very similar problem with a ReFS + Dedup repository. According to the crash report the error was in FLTMGR.SYS.
The "solution" for the affected server was to extend the Windows Defender exceptions to the entire drive. This is not a problem because the drive only has Veeam backup data on it.
We had the same problem. As you described, we excluded the folder where the backups are stored, and then it was quiet again.
After a detailed investigation it turned out that Defender had crashed the dedup driver.
-
- Expert
- Posts: 101
- Liked: 8 times
- Joined: Sep 26, 2017 11:38 am
- Full Name: Govinda Naik
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
Hi,
I have experienced a similar issue with Windows 11 Pro 23H2.
We were simulating a disaster recovery procedure. We installed the latest version of Veeam Backup & Replication on Windows 11 and tried to attach an iSCSI drive (Synology NAS) formatted with ReFS.
We have not seen this issue on Windows 10 Pro; we tried the same from a Windows 10 Pro laptop as well.
-
- Veteran
- Posts: 257
- Liked: 40 times
- Joined: May 21, 2013 9:08 pm
- Full Name: Alan Wells
- Contact:
Re: Blue Screen with Server 2019 REFS and Dedup
I realized I never posted about fixing this.
Ultimately I did fix this issue, but it took replacing pretty much every piece of electronics in the chassis.
We have a 3rd-party warranty on this and they replaced the network cards, the FC cards and the backplane to the chassis (where the server plugs in).
None of that seemed to help. Then they replaced the main board on the server module itself and that seemed to stabilize everything.
This thing has been running since my last post, so ultimately it was not a Veeam issue nor a dedup issue but a hardware issue, as I expected.
We just could never figure out which part it was until we threw parts at it.