Server 2016 BSOD Bugcheck

Re: Server 2016 BSOD Bugcheck

by kubimike » Tue Feb 28, 2017 8:22 pm

Which bugcheck are you getting? I turned off teaming/LACP and the issue still occurs. I got rid of the Broadcom NICs and the issue still occurs. If you have an HP server with the hardware listed in this article, it might point to something. I know it says Windows 2012 R2, but I'm sure the code branching works the same between 2012 and 2016.
http://h20566.www2.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271241&docLocale=en_US&docId=emr_na-c05308316
kubimike
Expert
 
Posts: 133
Liked: 20 times
Joined: Fri Feb 03, 2017 2:34 pm
Full Name: MikeO

Re: Server 2016 BSOD Bugcheck

by Mike Resseler » Wed Mar 01, 2017 7:38 am

Are you guys using teaming in Windows Server 2016? I have read quite a few reports of serious issues with it (probably partly MSFT, partly the vendors, as they are all slow in updating drivers...)

If you think it is network related, I can only advise working with your partner or MSFT to try to solve it. The more pressure...
Mike Resseler
Veeam Software
 
Posts: 2638
Liked: 315 times
Joined: Fri Feb 08, 2013 3:08 pm
Location: Belgium, the land of the fries, the beer, the chocolate and the diamonds...
Full Name: Mike Resseler

Re: Server 2016 BSOD Bugcheck

by kubimike » Wed Mar 01, 2017 2:10 pm

I thought it was the NICs, then I thought it was teaming. Teaming is turned off, and I replaced the Broadcoms with Intels, running on a single 10GbE interface at the moment. Where did you see these reports?

Re: Server 2016 BSOD Bugcheck

by lepphce1 » Wed Mar 01, 2017 6:43 pm

Bug Check Code: 0x00000133
Caused by address: ntoskrnl.exe+14a6f0

Dell server.

Thanks!
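For anyone collecting these across several dumps, here is a minimal sketch in Python that pulls the fields out of a summary like the one above. The function name and the lookup table are hypothetical and deliberately incomplete: the table holds only the one code from this post (0x133 is DPC_WATCHDOG_VIOLATION), and the text layout is assumed to match the two lines quoted here.

```python
import re

# Deliberately tiny lookup: only the bug check code quoted in this post.
BUGCHECK_NAMES = {0x133: "DPC_WATCHDOG_VIOLATION"}


def parse_crash_summary(text):
    """Extract the bug check code and the module+offset from a crash summary."""
    code = re.search(r"Bug Check Code:\s*(0x[0-9a-fA-F]+)", text)
    addr = re.search(r"Caused by address:\s*(\S+)\+([0-9a-fA-F]+)", text)
    if code is None or addr is None:
        raise ValueError("summary does not contain the expected fields")
    value = int(code.group(1), 16)
    return {
        "code": value,
        "name": BUGCHECK_NAMES.get(value, "unknown"),
        "module": addr.group(1),
        "offset": int(addr.group(2), 16),
    }


summary = "Bug Check Code: 0x00000133\nCaused by address: ntoskrnl.exe+14a6f0"
print(parse_crash_summary(summary))
```

Grouping the results by module makes it easier to see whether the same driver keeps showing up or whether it is a different one each time.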
lepphce1
Influencer
 
Posts: 23
Liked: 2 times
Joined: Tue Jun 28, 2016 4:40 pm

Re: Server 2016 BSOD Bugcheck

by kubimike » Wed Mar 01, 2017 7:11 pm

@lepphce1 I bet next time it crashes it will be another exe; these are all victim processes. When Microsoft looked at my dump, it was not a user-mode thing. Something low-level is causing the kernel to crash. I have verifier running now to find out what it might be. Next time it crashes, see if you can view any video over your DRAC.

Re: Server 2016 BSOD Bugcheck

by Mike Resseler » Wed Mar 01, 2017 8:05 pm

@kubimike: MVPs talking to each other, so not really official reports, but nevertheless a lot of issues ;-)

Re: Server 2016 BSOD Bugcheck

by kubimike » Wed Mar 01, 2017 8:22 pm

@Mike Resseler, Roger.

Re: Server 2016 BSOD Bugcheck

by lepphce1 » Wed Mar 01, 2017 8:28 pm

@kubimike I've set up crash captures so I can grab the actual screen. Last time, the crash came 10 days after the previous one.

I have about a dozen dump files. I did open them in BlueScreenView and two of those crashes pointed to the Intel NDIS driver. I updated to the latest driver from Intel and that did not help, still crashed. All of the other dumps do not indicate any issue/driver other than ntoskrnl.exe.

I also did a bit of a cross-post to the TechNet forums to see if there is anybody there who can help. If nothing else, if it is a kernel bug, hopefully it does a small part to escalate the issue. https://social.technet.microsoft.com/Forums/en-US/4861e81a-97ad-4645-8bf0-cacf5d7e6667/0x00000133-blue-screen-server-2016?forum=ws2016

Obviously, I will continue to monitor both forums and post feedback as needed.

Re: Server 2016 BSOD Bugcheck

by kubimike » Wed Mar 01, 2017 8:33 pm

@lepphce1 Yes, mine also pointed to network drivers. I had Broadcom adapters and, knowing their past history, I decided to get Intel adapters. Same issue after the swap-out. I'll jump on your TechNet post and share my Microsoft case #. I am currently in a holding pattern waiting for my box to fail again. Without special pool turned on, it's going to be a rough ride trying to figure out which driver is at fault. It was different for me every time. Thanks for the heads up!

Re: Server 2016 BSOD Bugcheck

by golmic » Thu Mar 02, 2017 6:40 am

@kubimike We use Intel 82599 10GbE NICs, and yes, they are teamed. But in our case it looks like the machine crashes when too many parallel tasks are running against the storage.
golmic
Lurker
 
Posts: 2
Liked: never
Joined: Fri Nov 25, 2016 6:02 am
Full Name: Michael Goll

Re: Server 2016 BSOD Bugcheck

by Delo123 » Thu Mar 02, 2017 7:59 am

Just a wild guess: did you guys check whether the memory gets exhausted? We have 384GB of RAM in the box. Our 2016 (Intel S2600GZ based) repo / proxy still runs fine without any crash, with Intel 10GbE NICs teamed in Windows (LACP). In the last few weeks we have copied over nearly 200TB of Veeam archives, have multiple dedupe jobs running on these, and in parallel use the same box as ReFS repositories for our secondary Veeam jobs. Throughput on the system is continuously over 1GB/s due to all the things running in parallel. Not sure, but are all these cases connected to Dell boxes? Maybe swap the "Dell" drivers for the OEM's?
Delo123
Expert
 
Posts: 305
Liked: 83 times
Joined: Fri Dec 28, 2012 5:20 pm
Full Name: Guido Meijers

Re: Server 2016 BSOD Bugcheck

by lepphce1 » Thu Mar 02, 2017 2:16 pm

@kubimike Looks like the MSFT forums were less help than I was expecting... In your trace, I did see that it might be related to the storage driver? Are you by chance running ReFS? Because we are, and that's the only major change to this hardware besides our upgrade from 2012 R2 to 2016. (I digress, but the ReFS storage savings we are getting on our full backups are insane! If only the box was stable...)

Re: Server 2016 BSOD Bugcheck

by kubimike » Thu Mar 02, 2017 2:54 pm

@lepphce1 H*ll yes, I'm using ReFS! A 48TB volume formatted with 64K clusters. Yes, I use it for the storage advantages as well. Yes, pretty insane, the box I built is extremely fast. I was mentioning in another thread that I can do a Full Synthetic on one particular job that is 11TB in 8 mins! :shock: It's just missing fuzzy dice hanging from the disk array. If only the box were stable :roll: I might be on to something here and want you to test this. I want you to load verifier on the drivers loaded in memory. This will help trap the issue: it puts each verified driver's allocations in a special pool, so the driver that corrupts memory gets caught in the act.

So click Start - Run, type verifier.exe and click OK
Click 'Create custom settings (for code developers)'
Check only 'Special Pool' and 'Pool Tracking'
Then I would pick everything that is loaded in memory.
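If you would rather script it than click through the GUI, here is a rough sketch in Python that only builds the equivalent command line and prints it; it does not run anything. It assumes the flag values from Microsoft's Driver Verifier documentation (0x1 for special pool, 0x8 for pool tracking) and the documented `/flags`, `/all` and `/driver` switches. The helper name `verifier_command` is made up for illustration.

```python
# Flag bits as documented for verifier.exe (assumed here, check against your OS build).
SPECIAL_POOL = 0x1
POOL_TRACKING = 0x8


def verifier_command(flags, drivers=None):
    """Build a verifier.exe command line; nothing is executed."""
    # No driver list means "verify every driver", which matches picking
    # everything that is loaded in memory in the GUI.
    target = "/all" if not drivers else "/driver " + " ".join(drivers)
    return f"verifier /flags {flags:#x} {target}"


print(verifier_command(SPECIAL_POOL | POOL_TRACKING))
```

The settings only take effect after a reboot, and `verifier /reset` clears them again if the slowdown gets to be too much.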

I've made two changes to my box and, oddly enough, it's been stable for a day and a half now, which is weird.
The first change is that I updated HP's Intelligent Provisioning: http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=7271242&swItemId=MTX_061e8ee804284704998e7a3b49&swEnvOid=4231 . This is the UEFI (Unified Extensible Firmware Interface) layer that sits between the hardware and the OS. I have an embedded NAND flash that stores this image. It didn't get updated by HP's 'ProLiant Support Pack' because the update came out after the pack was released. Does your Dell server do something like this? If so, maybe check whether there is an update for it.

The second change is running verifier. I have no proof, but could running it cause the drivers to NOT crash? It doesn't make sense, but I can't explain it either, which is why I want you to try this out. It will slow the box down a bit because of what's going on under the covers, but if your box does bomb out, the dump will show what's happening in the kernel. I could take your dump and submit it to Microsoft, since I have an open ticket. :mrgreen:

Re: Server 2016 BSOD Bugcheck

by Delo123 » Thu Mar 02, 2017 2:57 pm

Another thing that came to mind is interrupt remapping, which can usually be disabled in the BIOS. It caused tons of issues for us in the past, on ESX as well as Windows hosts (BSOD/PSOD)...

Re: Server 2016 BSOD Bugcheck

by kubimike » Thu Mar 02, 2017 3:03 pm

http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04912076

*Gen9 servers are not affected and were removed from the document.*


whew!
