ChrisRoad
Service Provider
Posts: 20
Liked: 4 times
Joined: Oct 19, 2010 8:24 am
Full Name: Christoph Roethlisberger
Contact:

Windows Server 2022 ReFS: in-place upgrade issue (BSOD boot loop)

Post by ChrisRoad » 1 person likes this post

Whelp, I finally migrated one of our Veeam Backup/Repository servers from Windows 2019 to 2022.
And just so you know, this will also upgrade the ReFS version of existing volumes to v3.7, so you cannot go back to Windows 2019.
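
For anyone who wants to check a volume before upgrading: `fsutil fsinfo refsinfo <drive>:` reports the on-disk ReFS version. Below is a minimal Python sketch for reading that value, assuming the output contains a line shaped like `REFS Version : 3.4`. The helper names and the sample text are hypothetical, and the 3.7 target is simply the version reported in this post for Windows Server 2022.

```python
import re
from typing import Optional, Tuple

# Assumed shape of one line of `fsutil fsinfo refsinfo <drive>:` output,
# e.g. "REFS Version   :                  3.4". Adjust if your output differs.
_VERSION_RE = re.compile(r"REFS\s+Version\s*:\s*(\d+)\.(\d+)", re.IGNORECASE)


def parse_refs_version(fsutil_output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) found in the fsutil output text, or None."""
    match = _VERSION_RE.search(fsutil_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def would_be_upgraded(current: Tuple[int, int],
                      target: Tuple[int, int] = (3, 7)) -> bool:
    """True if the volume is older than the target on-disk version.

    The default target of 3.7 is the version this post reports for
    Windows Server 2022; it is not read from the system.
    """
    return current < target


# Sample text standing in for real command output (illustrative values only).
sample = (
    "REFS Volume Serial Number :       0x1234\n"
    "REFS Version   :                  3.4\n"
)
version = parse_refs_version(sample)
if version is not None:
    print(version, would_be_upgraded(version))
```

A volume for which `would_be_upgraded` returns True is one that can no longer be mounted by the older OS once the newer one has touched it, so it is worth having a separate copy of the data first.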

As it happens, one of the first things we did was to delete some old/archived Veeam backup chains.
And what can I say - it seems as slow as before, but it also just "killed" our server :shock:

Well, let's go into some details...
We have two independent volumes of ~50TB in this server and deleted about 10TB of Veeam backups on each at more or less the same time.
We checked progress (i.e. looked at how fast disk space was released) and after 2-3 minutes the system just crashed with a bluescreen showing a "refs.sys - thread exception not handled" message.
Unfortunately, the server would also no longer start after that: on bootup it just crashes again with the same message and reboots itself. (I assume this happens the moment Windows tries to auto-mount these ReFS volumes during startup)
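
A rough way to put numbers on "how fast disk space was released" is to sample the free space twice and divide. This is only a sketch: the function names are hypothetical, and `shutil.disk_usage` is the standard-library call assumed for taking the samples.

```python
import shutil
import time
from typing import List, Tuple


def sample_free(path: str) -> Tuple[float, int]:
    """Take one (timestamp_seconds, free_bytes) sample for the given path."""
    return time.monotonic(), shutil.disk_usage(path).free


def release_rate(samples: List[Tuple[float, int]]) -> float:
    """Average bytes freed per second between the first and last sample.

    samples must be in chronological order. A negative result means the
    volume is filling up rather than being released.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, free0), (t1, free1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (free1 - free0) / (t1 - t0)


# Synthetic samples: 500 bytes released over 10 seconds.
print(release_rate([(0.0, 100), (10.0, 600)]))
```

In practice you would call `sample_free` on the repository drive every minute or so and feed the collected list to `release_rate`.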

After several unsuccessful attempts to boot Windows again with safe boot or the repair options, we disabled these ReFS volumes on the RAID controller.
Somewhat unsurprisingly, Windows then started up again without any issues.
After checking the volume consistency and everything else we could think of, we enabled these volumes again one after the other.
And while one of them works without any issues, as soon as we enable the other and Windows tries to access it --> refs.sys bluescreen and system reboot

At the moment we are still looking into options for recovering the data from this obviously corrupted ReFS filesystem.
Of course we are also quite surprised that Windows would lock up/reboot (and be caught in a loop) when such a filesystem corruption occurs on a non-system volume.

Well, our "incident" may be pure (bad) luck and have nothing to do with Windows 2022/ReFS 3.7, but for now we have put all plans to upgrade to or install Windows 2022 on hold wherever ReFS is involved.

kaffeine
Influencer
Posts: 14
Liked: 7 times
Joined: Jun 04, 2018 8:03 am
Full Name: Espresso Doppio
Location: Austria
Contact:

Re: Windows 2019, large REFS and deletes

Post by kaffeine »

Is your environment already fully compatible with W2022, both on the software and hardware layer (like RAID Controller drivers and so on)?

ChrisRoad

Re: Windows 2019, large REFS and deletes

Post by ChrisRoad »

Who knows? I guess maybe
If you mean "fully certified", then of course not, as neither the RAID controller (with its current firmware) nor its driver is certified. (because way too old)

In this system we are using an LSI 2208 ROC with latest firmware (23.34.0-0019) and driver (6.714.18.0)
Yeah, this controller is "problematic" with newer Windows versions (read Win10 2004 +) but this combination of controller, firmware and driver is known to run just fine.
Moreover, if there were a problem with the controller, we would get a storport.sys or megasas2.sys related BlueScreen and not refs.sys

So even if this whole hardware/controller stack were not compatible, a refs.sys BlueScreen is not what anyone should ever get - not ever, regardless of how broken the hardware is

Btw. if using a different driver like 6.714.05.0 (older) or 6.714.19.0 (newer, but from a dubious IBM source) the system properly locks up with a storport.sys BlueScreen - regardless of enabled or disabled RAID arrays/volumes
This is the expected behavior though, if there is something wrong with the hardware or driver.

Gostev
SVP, Product Management
Posts: 29368
Liked: 5486 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Windows 2019, large REFS and deletes

Post by Gostev »

I've split this into the new topic.

@ChrisRoad if you end up opening a support case with Microsoft, please share with me as the ReFS dev lead would love to look at the crash dump.

Thanks!

ChrisRoad

Re: Windows Server 2022 ReFS megathread

Post by ChrisRoad »

I will most likely not open a support case, because you know - despite being a service provider with an active SPLA partner agreement and paying north of 100k $ in license fees every month, we don't get any support or sh** from this company.
Yeah, I could go through official business support and pay 500 bucks upfront for even opening a support case....but I don't know if it's worth the hassle.

I can already see myself fighting a tedious battle just to get past M$'s 1st-level support, because we don't use certified hardware/drivers, yada-yada....

ChrisRoad

Re: Windows Server 2022 ReFS megathread

Post by ChrisRoad » 1 person likes this post

If of any interest - a screenshot of the BSOD and the Memory/Crash dump (!analyze -v output of it)
[Images: BSOD screenshot and !analyze -v output of the crash dump]

@Gostev I can also provide the full crash dump if needed

...and I did now also go through with the Microsoft Support and paid these 449 bucks out of my own pocket, just to report this ReFS problem/bug (yeah yeah, I know, maybe I will get a refund, if they decide it's really a bug worth reporting and not a normal support/help request)

Gostev

Re: Windows Server 2022 ReFS megathread

Post by Gostev »

Can you share the Microsoft support case ID? I assume ReFS devs will need full crash dump, but now they can easily request everything that is needed through the case you've opened.

markusmobius
Lurker
Posts: 2
Liked: never
Joined: Sep 11, 2021 2:43 am
Full Name: MARKUS M MOBIUS
Contact:

Re: Windows Server 2022 ReFS megathread

Post by markusmobius »

We are seeing exactly the same BSOD and the same crash dump after upgrading from Win 2019 to 2022 (we have two REFS volumes - also on old hardware but running just fine up to now):

Code:

DUMP_FILE_ATTRIBUTES: 0x1000

BUGCHECK_CODE:  7e

BUGCHECK_P1: ffffffff80000003

BUGCHECK_P2: fffff8061d178474

BUGCHECK_P3: ffffa4032b77f9e8

BUGCHECK_P4: ffffa4032b77f200

EXCEPTION_RECORD:  ffffa4032b77f9e8 -- (.exr 0xffffa4032b77f9e8)
ExceptionAddress: fffff8061d178474 (ReFS!SmsAllocationRegionEx::AdjustNonCompactibleAllocated+0x0000000000000020)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 0000000000000000

CONTEXT:  ffffa4032b77f200 -- (.cxr 0xffffa4032b77f200)
rax=ffffffffffffffef rbx=ffffc707296ff640 rcx=ffffc707296ff640
rdx=ffffffffffffffe6 rsi=ffffc70727e8f010 rdi=00000000ffffffe6
rip=fffff8061d178474 rsp=ffffa4032b77fc28 rbp=000000000000001a
 r8=ffffbf055878e7f0  r9=000000000000001a r10=fffff80615160d08
r11=ffffa4032b77fc60 r12=0000000000000000 r13=ffffc70727e8f010
r14=ffffc70727e9a210 r15=ffffc7072ac9a008
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
ReFS!SmsAllocationRegionEx::AdjustNonCompactibleAllocated+0x20:
fffff806`1d178474 cc              int     3
Resetting default scope

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  System

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE_STR:  80000003

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_STR:  0x80000003

STACK_TEXT:  
ffffa403`2b77fc28 fffff806`1d178382     : ffffc707`296ff640 65565fcc`39fc4b5f 795c0f5e`95f3ec81 6b14dc41`f93d386c : ReFS!SmsAllocationRegionEx::AdjustNonCompactibleAllocated+0x20
ffffa403`2b77fc30 fffff806`1d159895     : 00000000`00000000 ffffa403`2b77fd81 ffffbf05`5ad92e58 00000000`00000000 : ReFS!CmsAllocator::UpdateCountersForDeleteQueue+0xf2
ffffa403`2b77fca0 fffff806`1d17a404     : ffffc707`27e9a210 ffffc707`2ac9a008 ffffc707`292cebe8 ffffbf05`570f38a8 : ReFS!CmsAllocator::ProcessNonCompactionDeleteQueueEntry+0x1d9
ffffa403`2b77fdd0 fffff806`1d11f112     : ffffc707`27e8f470 ffffbf05`570f38a8 ffffbf05`570f38a8 fffff806`1d116275 : ReFS!CmsAllocator::ProcessDeleteQueueEntryParallel+0xe4
ffffa403`2b77fe60 fffff806`1d11eff2     : 00000000`00000000 ffffc707`292cebe8 00000000`0000001d ffffc707`292d1a58 : ReFS!CmsContainerRangeMap::ForEachContainerBucket+0xba
ffffa403`2b77fef0 fffff806`1d11ef28     : 00000000`00000000 ffffc707`292ce8e0 fffff806`1d17a320 ffffa403`2b77ffc0 : ReFS!CmsContainerRangeMap::ParallelForEachBucket+0x8a
ffffa403`2b77ff90 fffff806`1d12a936     : ffffc707`27e8f010 ffffc707`2ac9a008 ffffc707`2ac9a008 fffff806`1d12abb0 : ReFS!CmsAllocator::ProcessDeleteQueue+0x44
ffffa403`2b77ffe0 fffff806`1d13098d     : 00000000`00000000 ffffc707`2ac9a008 00000000`00000000 ffffc707`292d1948 : ReFS!CmsBPlusTable::SpecialPreProcessingForGlobalTables+0x1ce
ffffa403`2b780040 fffff806`1d130794     : 00000000`00000000 00000000`00000000 ffffc707`279fd290 fffff806`1d11f252 : ReFS!CmsBPlusTable::FailableTreeUpdate+0x7d
ffffa403`2b7800d0 fffff806`1d12e1b2     : ffffc707`27e8f010 00000209`0001441f 00000000`000083b1 ffffc707`27e8f010 : ReFS!CmsBPlusTable::RunFailableTreeUpdateForAllDirtyTrees+0x274
ffffa403`2b7801b0 fffff806`1d12110d     : 00000000`00000000 ffffa403`2b7807b0 00000000`00000001 00000000`00000000 : ReFS!CmsBPlusTable::UpdateBoundTrees+0x762
ffffa403`2b7806b0 fffff806`1d1657af     : ffffc707`27e8f010 ffffa800`36ac8b80 ffffc707`27e8f9a0 00000000`00000000 : ReFS!CmsVolume::Checkpoint+0x601
ffffa403`2b7808e0 fffff806`1d1744b6     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ReFS!MspCheckpointVolume+0x4f
ffffa403`2b780990 fffff806`150678d1     : ffffc707`274068b0 ffffc707`2c84e280 ffffa800`00000000 ffffc707`00000000 : ReFS!MspWorkerRoutine+0x46
ffffa403`2b7809e0 fffff806`150e8335     : ffffc707`2c84e280 00000000`00000001 ffffc707`2c84e280 08000197`00000001 : nt!ExpWorkerThread+0x161
ffffa403`2b780bf0 fffff806`15219fd8     : ffffa800`36f84180 ffffc707`2c84e280 fffff806`150e82e0 08000198`c0080001 : nt!PspSystemThreadStartup+0x55
ffffa403`2b780c40 00000000`00000000     : ffffa403`2b781000 ffffa403`2b77b000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME:  ReFS!SmsAllocationRegionEx::AdjustNonCompactibleAllocated+20

MODULE_NAME: ReFS

IMAGE_NAME:  ReFS.SYS

IMAGE_VERSION:  10.0.20348.1

Did you find out anything more?
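
For comparing dumps between affected systems, it can help to reduce STACK_TEXT to a list of module!function names. In the dump above, the frames run from the checkpoint worker (`MspCheckpointVolume`, `CmsVolume::Checkpoint`) through the delete queue (`CmsAllocator::ProcessDeleteQueue`) down to the `int 3` in `AdjustNonCompactibleAllocated`, which at least by name fits a crash that shows up after deleting a lot of data. The sketch below is a hedged example of extracting those names in Python; `Frame` and `parse_stack_text` are hypothetical helpers, and it assumes the symbol is always the last " : "-separated field on a frame line, as it is in this dump.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    module: str
    function: str
    offset: int


def parse_stack_text(stack_text: str) -> List[Frame]:
    """Extract module, function and offset from WinDbg STACK_TEXT lines."""
    frames = []
    for line in stack_text.splitlines():
        # The symbol is the last " : "-separated field of each frame line.
        symbol = line.rsplit(" : ", 1)[-1].strip()
        if "!" not in symbol or "+0x" not in symbol:
            continue  # skip blank lines and anything that is not a frame
        module, rest = symbol.split("!", 1)
        function, _, offset = rest.rpartition("+0x")
        frames.append(Frame(module, function, int(offset, 16)))
    return frames


# Three frames copied from the dump above (innermost first).
SAMPLE = """\
ffffa403`2b77fc28 fffff806`1d178382     : ffffc707`296ff640 65565fcc`39fc4b5f 795c0f5e`95f3ec81 6b14dc41`f93d386c : ReFS!SmsAllocationRegionEx::AdjustNonCompactibleAllocated+0x20
ffffa403`2b77ff90 fffff806`1d12a936     : ffffc707`27e8f010 ffffc707`2ac9a008 ffffc707`2ac9a008 fffff806`1d12abb0 : ReFS!CmsAllocator::ProcessDeleteQueue+0x44
ffffa403`2b7806b0 fffff806`1d1657af     : ffffc707`27e8f010 ffffa800`36ac8b80 ffffc707`27e8f9a0 00000000`00000000 : ReFS!CmsVolume::Checkpoint+0x601
"""

frames = parse_stack_text(SAMPLE)
for frame in frames:
    print(f"{frame.module}!{frame.function} +{frame.offset:#x}")
```

Two systems hitting the same bug should produce the same function list even though the addresses differ.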

DonZoomik
Service Provider
Posts: 293
Liked: 95 times
Joined: Nov 25, 2016 1:56 pm
Contact:

Re: Windows Server 2022 ReFS megathread

Post by DonZoomik »

Marking the thread.
Reminds me of the early days of WS2016. While testing before GA (but after RTM), I found the dedup data corruption, which took a while to get fixed with MS support.

markusmobius

Re: Windows Server 2022 ReFS megathread

Post by markusmobius »

markusmobius wrote: Sep 11, 2021 2:58 am We are seeing exactly the same BSOD and the same crash dump after upgrading from Win 2019 to 2022 (we have two REFS volumes - also on old hardware but running just fine up to now) [...]
Some update: our server crashed every hour, but it stayed online long enough to copy stuff. I emptied one of the spaces (each is 27TB and about 90% full) and recreated it from scratch (again formatting the single disk on it with REFS). I then took the disk in the other space offline - sure enough, the crashes stopped.

I then brought it online again to copy files - the crashes started again, but once it was down to only 80% full they stopped, and it has now been copying for 12 hours without crashing.

When it is empty I will recreate the storage space.

The only conclusion I can draw from all of this is that the error is more likely to occur the fuller the storage space is, and (maybe) that reformatting might help. Both of the original spaces had been created under Windows Server 2016 - I don't know how Windows upgrades the REFS version and whether this upgrade is as thorough as reformatting from scratch under Server 2022.
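
If fullness really is a factor, a quick check of how full a volume is before touching it might be worthwhile. The sketch below uses Python's standard `shutil.disk_usage`; the 80% default is only the figure observed on this one server, not a documented limit, and the function names are hypothetical.

```python
import shutil


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Used space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def above_threshold(path: str, threshold_pct: float = 80.0) -> bool:
    """True if the volume holding `path` is fuller than threshold_pct.

    The 80% default mirrors the single observation in this post and is
    an assumption, not a known trigger point.
    """
    usage = shutil.disk_usage(path)
    return percent_used(usage.total, usage.used) > threshold_pct


# Illustrative numbers: 90 units used out of 100.
print(percent_used(100, 90))
```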

pesos
Expert
Posts: 142
Liked: 10 times
Joined: Nov 12, 2014 9:40 am
Full Name: John Johnson
Contact:

Re: Windows Server 2022 ReFS megathread

Post by pesos »

Subscribe.

I have yet to upgrade any of our major servers, however I did stand up a couple of new standalone 2022 hyper-v hosts and set up very basic community b&r setups to back up a couple of VMs each to a usb drive with refs. No issues with that. I do find it super annoying that MS autoupgrades refs volumes with no warning (bit me at home when I tested out win11).

m.novelli
Veeam ProPartner
Posts: 401
Liked: 44 times
Joined: Dec 29, 2009 12:48 pm
Full Name: Marco Novelli
Location: Asti - Italy
Contact:

Re: Windows Server 2022 ReFS megathread

Post by m.novelli »

Subscribe the thread

ChrisRoad

Re: Windows Server 2022 ReFS megathread

Post by ChrisRoad » 1 person likes this post

@gostev: our Microsoft support case number is 121091024003588

And to a point that @markusmobius mentioned:
Both our volumes may have been created originally on Windows Server 2016 as well.
Unfortunately I cannot say that for sure, as we did not keep track of when this server was upgraded from 2016 to 2019 and when these volumes were created. (or maybe reformatted)

Gostev

Re: Windows Server 2022 ReFS megathread

Post by Gostev » 3 people like this post

We're not seeing this issue in our own labs, which are clean installs, so there's a good chance it has to do with upgrading existing ReFS volumes.

Andrew@MSFT
Technology Partner
Posts: 15
Liked: 31 times
Joined: Nov 19, 2019 5:31 pm
Full Name: Andrew Hansen
Contact:

Re: Windows Server 2022 ReFS megathread

Post by Andrew@MSFT » 5 people like this post

Gostev is correct - it has to do with upgrading installs. We already have a fix and are working on a release timeline. In the meantime, we recommend holding off on upgrading systems. I'll post back soon with more information.

Andrew@MSFT

Re: Windows Server 2022 ReFS megathread

Post by Andrew@MSFT » 2 people like this post

To be more clear, the data is still intact. And if you are stuck in a "boot loop" scenario, you can try taking your ReFS volume offline or read-only. Another workaround to try is to disconnect the storage backing the ReFS volume.

Again, we have a fix releasing soon and are actively working on a KB with all this info in one place.
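
The post does not say which tool to use for the offline/read-only workaround. One possibility, sketched here purely as an assumption, is a diskpart script applied at the disk level (`select disk`, `attributes disk set readonly`, `offline disk`). The Python helper below only generates the script text; the function name and the disk number are hypothetical, and you would need to confirm the right disk number with `list disk` first.

```python
def diskpart_script(disk_number: int, mode: str = "readonly") -> str:
    """Build a diskpart script that marks one disk read-only or offline.

    mode is "readonly" or "offline". This works at the disk level, which
    is an assumption; the post itself only speaks of the ReFS volume.
    """
    if disk_number < 0:
        raise ValueError("disk_number must be non-negative")
    lines = [f"select disk {disk_number}"]
    if mode == "readonly":
        lines.append("attributes disk set readonly")
    elif mode == "offline":
        lines.append("offline disk")
    else:
        raise ValueError("mode must be 'readonly' or 'offline'")
    return "\n".join(lines) + "\n"


# Disk 3 is a made-up example number.
print(diskpart_script(3))
```

The generated text can be saved to a file and run from an elevated prompt with `diskpart /s <file>`. Whether this can be done before the volume is mounted depends on the environment you boot into, so disconnecting the storage remains the fallback.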

Ciso_2021
Influencer
Posts: 17
Liked: 4 times
Joined: Sep 13, 2021 7:19 pm
Full Name: Julien Ange
Contact:

Re: Windows Server 2022 ReFS megathread

Post by Ciso_2021 » 1 person likes this post

We updated one Veeam server today from 2019 to 2022.
I must say everything is running fine: CDP, replication, backup, etc.
I am on a ProLiant DL360 Gen9 / Smart Array P440ar Controller, firmware version 7.00 / System ROM P89 v2.76 (10/21/2019)

Gostev

Re: Windows Server 2022 ReFS megathread

Post by Gostev » 3 people like this post

Please keep in mind Veeam does not support Server 2022 yet.

mkretzer
Veeam Legend
Posts: 891
Liked: 274 times
Joined: Dec 17, 2015 7:17 am
Contact:

Re: Windows Server 2022 ReFS megathread

Post by mkretzer » 1 person likes this post

Andrew@MSFT wrote: Sep 13, 2021 5:50 pm Gostev is correct - it has to do with upgrading installs. We already have a fix and are working on a release timeline. In the meantime, we recommend holding off on upgrading systems. I'll post back soon with more information.
Andrew, thank you for responding this quickly!
Does that mean customers can get a custom fix (refs.sys) from you already?

COCONET
Service Provider
Posts: 7
Liked: 6 times
Joined: Jan 10, 2018 6:16 pm
Full Name: COCONET
Contact:

Re: Windows Server 2022 ReFS megathread

Post by COCONET » 1 person likes this post

Hi to all busted ones!

Today I slipped into exactly the same scenario, but with much more impact.

I created 2 new Hyper-V clusters in the past 3 weeks, one of them with 3 nodes and the other one with only 2 nodes.

All 5 nodes are built on HPE BL460c Gen9 blades with Intel Xeon E5-26xx v3/v4 CPUs; the ROM version is 2.90 (latest 15.09.2021).
Every node has 256GB DDR4 ECC RAM, a 16GB FC controller and an Emulex 20Gbit adapter.

Our setup was simple: install all 5 nodes with Server 2019, upgrade to the latest patch from Microsoft, and install all drivers from the HPE SPP and the rest of the newest updates from the website.
All was fine at this point. Then we started to in-place upgrade all nodes to Server 2022 (I know there is no full support for the hardware right now, but somehow we have to set up our hosting environment).
Server 2022 worked like a charm, every part was really stable and we had no issues. When we started to build a cluster, we connected our FC storage to the FC switches and started to build up the LUNs.
All initiators were grouped and mapped to the LUNs.
We started to build the cluster within an existing domain and had only 2 problems. 1st: the remote display driver is unsigned; this happens when connected to the host via RDP from a Windows 10 machine.
When connecting through iLO, the remote display driver warning disappeared. 2nd: one security update differed between 1 host and the others, so we smiled and said that's fine, let's go ahead.

The cluster was created smoothly without any problems, so we started to move existing Server 2019 VMs to the cluster. Up to this point everything was fine and working as expected.
But today I thought I'd move the Veeam VCSP environment to the cluster and map the LUNs from the second FC storage. All those LUNs are about 10TB in size and were formatted with ReFS under Server 2019 and Server 2016.
The moment I added those LUNs to the cluster manager, the RDP session froze and the complete cluster bluescreened (all 3 active nodes at the same time). Worst-case scenario!!!

The bluescreen is the same refs.sys one. It boot-loops until you disconnect the LUNs, and then you have to regenerate all those broken VMs. The point is: I cannot go back to the Server 2019 cluster with those LUNs because of the upgrade.
Andrew@MSFT wrote: Sep 13, 2021 7:07 pm To be more clear, the data is still intact. And if you are stuck in a "boot loop" scenario, you can try taking your ReFS volume offline or read-only. Another workaround to try is to disconnect the storage backing the ReFS volume.

Again, we have a fix releasing soon and are actively working on a KB with all this info in one place.
Good to know, and please let us know when we can hold this holy kb update in our hands! If you need additional logs, please let me know.

Thanks,
Martin
best regards,

Martin
COCONET - we connect everything

Veeam VCSP | Microsoft SPLA, CSP | Lenovo | QNAP | Synology | 3CX | HPE | Snom

Andrew@MSFT

Re: Windows Server 2022 ReFS megathread

Post by Andrew@MSFT » 1 person likes this post

For those impacted by this issue, please DM me and I can work to get you a private fix.

karsten123
Service Provider
Posts: 95
Liked: 23 times
Joined: Apr 03, 2019 6:53 am
Full Name: Karsten Meja
Contact:

Re: Windows Server 2022 ReFS megathread

Post by karsten123 » 2 people like this post

WTF? What is this inplace upgrade thing? And why do you deploy unsupported configurations?

Gostev

Re: Windows Server 2022 ReFS megathread

Post by Gostev » 1 person likes this post

Yeah, I would advise against doing that. Please remember that the current Veeam version does not support Windows Server 2022.
By adopting it you render your Veeam install unsupported (at least until 11a is released and you are able to upgrade to it).

COCONET

Re: Windows Server 2022 ReFS megathread

Post by COCONET »

Because we are a service provider for both Microsoft and Veeam, we have to test the environment early to be ready when the final Veeam version is available. And in our case all Veeam servers are running Server 2019; only our new host cluster is running Server 2022.

Gustav
Influencer
Posts: 17
Liked: 3 times
Joined: May 29, 2020 2:12 pm
Full Name: Gustav Brock
Contact:

Re: Windows Server 2022 ReFS megathread

Post by Gustav » 1 person likes this post

> installing all 5 nodes with Server 2019. .. Then .. in place upgrade all nodes to Server 2022.

But why not a clean install of Windows Server 2022?
We did that on a small test server (MicroServer Gen10 Plus) using the current drivers, and Veeam 11.0.0.837 P20210525 runs fine with that combo.

Ciso_2021

Re: Windows Server 2022 ReFS megathread

Post by Ciso_2021 »

Gustav,
I noticed CDP is broken on Server 2022; it shows an error once a day.
I tried restoring one of the CDP machines and it seems to work. Only the warnings show this error.

Gostev

Re: Windows Server 2022 ReFS megathread

Post by Gostev »

Guys, this thread is about Server 2022 ReFS, so please don't post unrelated issues.
In any case, Server 2022 is not currently supported by Veeam, so errors are expected.

I will remove these posts later.

brockp
Lurker
Posts: 1
Liked: never
Joined: Sep 20, 2021 5:37 pm
Full Name: Paul Brock
Contact:

Re: Windows Server 2022 ReFS megathread

Post by brockp »

Hi, Andrew,

I'm too new to PM you, but I too have fallen into this same bucket of a "boot loop".

I'm trying to build my reputation so that I can PM you, in the meantime are there keywords I can give to MSFT Premier support to find the fix?

Thanks!

COCONET

Re: Windows Server 2022 ReFS megathread

Post by COCONET » 1 person likes this post

Gustav wrote: Sep 20, 2021 8:37 am > installing all 5 nodes with Server 2019. .. Then .. in place upgrade all nodes to Server 2022.

But why not a clean install of Windows Server 2022?
We did that on a small test server (MicroServer Gen10 Plus) using the current drivers, and Veeam 11.0.0.837 P20210525 runs fine with that combo.
Because for HPE Gen9 servers there are no update packs that run on Server 2022. All drivers are for Server 2016 or 2019. And rewriting the xml files for all drivers and firmware updates to run under Server 2022 was not an option. So we decided to go the in-place upgrade route.

COCONET

Re: Windows Server 2022 ReFS megathread

Post by COCONET »

brockp wrote: Sep 20, 2021 6:09 pm Hi, Andrew,

I'm too new to PM you, but I too have fallen into this same bucket of a "boot loop".

I'm trying to build my reputation so that I can PM you, in the meantime are there keywords I can give to MSFT Premier support to find the fix?

Thanks!
Same here, our PMs go to nowhere...
