kubimike
Veteran
Posts: 373
Liked: 41 times
Joined: Feb 03, 2017 2:34 pm
Full Name: MikeO
Contact:

Server 2016 BSOD Bugcheck

Post by kubimike » 2 people like this post

Just a heads up to anyone running Server 2016 w/ SQL Express 2012 who then installs SQL 2016. My Veeam box became very unstable and would only stay up for an hour before hitting Bugcheck 0x00000133 (0x0000000000000001, 0x0000000000001e00, 0x0000000000000000, 0x0000000000000000). Mind you, this was on brand new hardware: HP DL380 G9 + Veeam 9.5. I do have a ticket open with Microsoft. They're doing a crash dump analysis to find out exactly why.
After installing the following updates it stopped:

Installation Successful: Windows successfully installed the following update: Security Update for SQL Server 2012 Service Pack 3 GDR (KB3194721)

Installation Successful: Windows successfully installed the following update: Security Update for SQL Server 2016 RTM GDR (KB3194716)

Installation Successful: Windows successfully installed the following update: Definition Update for Windows Defender - KB2267602 (Definition 1.235.2160.0)

Installation Successful: Windows successfully installed the following update: Update for Windows Server 2016 for x64-based Systems (KB3211320)
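To compare another box against the same patch level, the KB numbers can be pulled out of the log lines above. This is only a minimal sketch: the `extract_kbs` helper and the `LOG_LINES` list are hypothetical names made up for illustration, and the log text is simply copied from the post.

```python
import re

# Update log lines copied from the post above.
LOG_LINES = [
    "Installation Successful: Windows successfully installed the following update: Security Update for SQL Server 2012 Service Pack 3 GDR (KB3194721)",
    "Installation Successful: Windows successfully installed the following update: Security Update for SQL Server 2016 RTM GDR (KB3194716)",
    "Installation Successful: Windows successfully installed the following update: Definition Update for Windows Defender - KB2267602 (Definition 1.235.2160.0)",
    "Installation Successful: Windows successfully installed the following update: Update for Windows Server 2016 for x64-based Systems (KB3211320)",
]


def extract_kbs(lines):
    """Return the KB identifiers found in the lines, in order, without duplicates."""
    kbs = []
    for line in lines:
        for kb in re.findall(r"KB\d+", line):
            if kb not in kbs:
                kbs.append(kb)
    return kbs


print(extract_kbs(LOG_LINES))
```

The resulting list can then be checked by hand against the installed-updates history on the other server.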

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000001, The system cumulatively spent an extended period of time at
DISPATCH_LEVEL or above. The offending component can usually be
identified with a stack trace.
Arg2: 0000000000001e00, The watchdog period.
Arg3: 0000000000000000
Arg4: 0000000000000000
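For anyone reading the dump above, here is a rough sketch of how the first two arguments can be interpreted. It assumes Arg2 is a tick count, as Microsoft's bug check reference describes it, and it assumes a 15.625 ms clock tick, which is a common default but not something the dump itself confirms. The function name `describe_dpc_watchdog` is made up for this example.

```python
# Interpret Arg1/Arg2 of a DPC_WATCHDOG_VIOLATION (0x133) bugcheck.
# ASSUMPTION: one clock tick is 15.625 ms; the real interval depends on the system.
TICK_MS = 15.625

ARG1_MEANING = {
    0: "a single DPC or ISR exceeded its time allotment",
    1: "cumulative time at DISPATCH_LEVEL or above exceeded the watchdog period",
}


def describe_dpc_watchdog(arg1, arg2):
    """Return a one-line, human-readable summary of the two arguments."""
    meaning = ARG1_MEANING.get(arg1, "unrecognized Arg1 value")
    seconds = arg2 * TICK_MS / 1000
    return f"{meaning}; Arg2 = {arg2} ticks (~{seconds:.0f} s at {TICK_MS} ms/tick)"


# Values taken from the dump in this post.
print(describe_dpc_watchdog(0x1, 0x1E00))
```

Under those assumptions, 0x1e00 is 7680 ticks, or roughly two minutes spent at DISPATCH_LEVEL or above before the watchdog fired.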

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

Hello, I know Veeam has fingers in back channels at Microsoft. Could someone on your staff point out Microsoft Case ID #117020615276541 (Windows 2016 crashes)? Not sure if it's related to the other threads I see about ReFS / dedupe etc., but not having a stable box for backups isn't fun. They do have a memory.dmp from me and I've been waiting since Monday to hear about their findings. Thank you.
Gostev
Chief Product Officer
Posts: 31459
Liked: 6648 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Server 2016 BSOD Bugcheck

Post by Gostev »

Mike, I'd love to help - but as soon as I start leveraging my back channel for all sorts of support case escalations, they will obviously cut me off in no time ;) So I don't escalate even our very own support cases with Microsoft, and keep this channel only for internally confirmed critical issues impacting a large number of customers.

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

@Gostev OK, we wouldn't want that. Figured I'd ask. As a thread update, I've reinstalled the OS w/o SQL 2016. Box still freezes and dies after about 14+ hours of use. HP's embedded tools turned up no hardware issues.

Re: Server 2016 BSOD Bugcheck

Post by Gostev »

Did you check if some process might be leaking memory?

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

Well, MSFT called back and said they did indeed find a problem they thought was fixed in Win 2016: https://support.microsoft.com/en-us/hel ... er-2012-r2

So in order for them to write a new fix, we went into verifier.exe and turned on a bunch of switches, and also made sure we get a kernel-mode dump this time. Now we have to sit back and wait for it to crash again so they can collect that data and write a new fix. Joys of 2016 I guess :roll: :( :D
golmic
Novice
Posts: 3
Liked: never
Joined: Nov 25, 2016 6:02 am
Full Name: Michael Goll
Contact:

Re: Server 2016 BSOD Bugcheck

Post by golmic »

Hi,
have you heard anything from MS?
I think we have nearly the same problem.
Delo123
Veteran
Posts: 361
Liked: 109 times
Joined: Dec 28, 2012 5:20 pm
Full Name: Guido Meijers
Contact:

Re: Server 2016 BSOD Bugcheck

Post by Delo123 » 1 person likes this post

We have Server 2016 running with SQL 2016 and 9.5 with the latest patch & MS updates for just over a month now on a physical Intel barebone. Not a single issue. Are you guys seeing issues when backing up / restoring, or something else? Or also when idle?

Re: Server 2016 BSOD Bugcheck

Post by kubimike » 1 person likes this post

@golmic, what type of network adapters do you have? Are they teamed? Which bugcheck error are you getting? Looks like my Broadcom adapter drivers are the issue. Bought some HP/Intel 10Gb adapters to team with now. So far so good. I have another HP box with integrated Broadcom adapters running Windows 2012 R2 that is also teamed, and it crashes as well.

@Delo123 Same question: what type of network adapter do you have? And is it teamed?

Re: Server 2016 BSOD Bugcheck

Post by Gostev »

kubimike wrote: Broadcom adapters
Ah, our technical support's favorite. I like to say that instead of building complex "Chaos Monkey" type of tests for your networks, you could just swap a few NICs to ones from Broadcom, and you're good to go.

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

Why did HP move away from QLogic and Intel for onboard stuff? It's not like their servers got any CHEAPER! So far so good on the Intel 561Ts!
nmdange
Veteran
Posts: 527
Liked: 142 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Server 2016 BSOD Bugcheck

Post by nmdange »

I would definitely agree with using Intel NICs over Broadcom! If you are looking to do SMB Direct, then Mellanox is the way to go.

Re: Server 2016 BSOD Bugcheck

Post by Delo123 »

We also use only Intel NICs (X540-T2 and onboard X540-AT2) with Intel drivers, but teamed with Windows in Server 2016 (LACP), and QLogic FC HBAs.
Not sure about the Broadcoms, but maybe there are things like offloading worth checking for issues.

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

Came into the office this morning and the Veeam box was hosed again. Even through the iLO the screen was black. Power cycled it and now I'm up again. Opening the case again with Microsoft. Thinking that even though I've disabled the onboard Broadcom NICs, perhaps the driver is still loading? I could try reinstalling the OS again. What a hassle.
lepphce1
Enthusiast
Posts: 31
Liked: 2 times
Joined: Jun 28, 2016 4:40 pm
Contact:

Re: Server 2016 BSOD Bugcheck

Post by lepphce1 »

@kubimike
I am seeing this issue as well, always happens during backups. Intel i350-t NICs, latest drivers from Intel. I do have MS LACP teaming enabled on the access side. But the major throughput is via SAN iSCSI NICs that are not teamed. Server was performing flawlessly on 2008R2 and 2012R2 (Dell R720XD).

Is there reason to believe that teaming is the issue? Because I can easily turn this off while we wait for a fix...

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

Which bugcheck are you getting? I turned off teaming/LACP. Issue still occurs. I got rid of the Broadcom NICs, issue still occurs. If you have an HP server with the hardware listed in this article, it might point to something. I know it says Windows 2012 R2 but I'm sure the code branching works the same between 2012 and 2016.
http://h20566.www2.hpe.com/hpsc/doc/pub ... -c05308316
Mike Resseler
Product Manager
Posts: 8044
Liked: 1263 times
Joined: Feb 08, 2013 3:08 pm
Full Name: Mike Resseler
Location: Belgium
Contact:

Re: Server 2016 BSOD Bugcheck

Post by Mike Resseler »

Are you guys using teaming in Windows Server 2016? I have read quite a few reports that there are serious issues with it (probably partly MSFT, partly the vendors, as they are all slow in updating drivers...)

If you think it is network related, I can only advise working with your partner or MSFT to try to solve it. The more pressure...

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

I thought it was the NICs, I thought it was teaming. Teaming is turned off, and I replaced the Broadcoms with Intels, running on a single 10Gb interface at the moment. Where did you see these reports?

Re: Server 2016 BSOD Bugcheck

Post by lepphce1 »

Bug Check Code: 0x00000133
Caused by address: ntoskrnl.exe+14a6f0

Dell server.

Thanks!

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

@lepphce1 I bet next time it crashes it will be another exe; these are all victim processes. When Microsoft looked at my dump it was not a user-mode thing. Something low-level is causing the kernel to crash. I have verifier running now to find out what it might be. Next time it crashes, see if you can view any video over your DRAC.

Re: Server 2016 BSOD Bugcheck

Post by Mike Resseler »

@kubimike: MVPs talking to each other, so not really official reports, but nevertheless a lot of issues ;-)

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

@Mike Resseler, Roger.

Re: Server 2016 BSOD Bugcheck

Post by lepphce1 »

@kubimike I've set up crash captures so I can grab the actual screen. Last time, 10 days passed between the previous crash and the next one.

I have about a dozen dump files. I did open them in BlueScreenView and two of those crashes pointed to the Intel NDIS driver. I updated to the latest driver from Intel and that did not help, still crashed. All of the other dumps do not indicate any issue/driver other than ntoskrnl.exe.

I also did a bit of a cross-post to the TechNet forums to see if there is anybody there who can help. If nothing else, if it is a kernel bug, hopefully it does a small part to escalate the issue. https://social.technet.microsoft.com/Fo ... rum=ws2016

Obviously, I will continue to monitor both forums and post feedback as needed.

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

@lepphce1 Yes, mine also pointed to network drivers. I had Broadcom adapters and, knowing their history, I decided to get Intel adapters. Same issue after the swap-out. I'll jump on your social TechNet post and share my Microsoft case #. I am currently in a holding pattern waiting for my box to fail again. Without special pool turned on, it's going to be a rough ride trying to figure out which driver is at fault. It was different for me every time. Thanks for the heads up!

Re: Server 2016 BSOD Bugcheck

Post by golmic »

@kubimike we use Intel 82599 10GbE NICs, and yes, they are teamed. But in our case it looks like the machine crashes when too many parallel tasks are running against the storage.

Re: Server 2016 BSOD Bugcheck

Post by Delo123 »

Just a wild guess. Did you guys check if maybe the memory gets exhausted? We have 384GB of RAM in the box. Our 2016 (Intel S2600GZ based) repo / proxy still runs fine without any crash. Intel 10Gb NICs teamed in Windows (LACP). In the last few weeks we have copied over nearly 200TB of Veeam archives, have multiple dedupe jobs running on these, and in parallel use the same box as ReFS repositories for our secondary Veeam jobs. Throughput on the system is continuously over 1GB/s due to all the things running in parallel. Not sure, but are all these cases connected to Dell boxes? Maybe swap the "Dell" drivers for the OEM ones?

Re: Server 2016 BSOD Bugcheck

Post by lepphce1 »

@kubimike looks like the MSFT forums were less help than I was expecting... In your trace, I did see that it might be related to the storage driver? Are you by chance running ReFS? Because we are. And that's the only major change to this hardware besides our upgrade from 2012R2 to 2016. (I digress, but the ReFS storage savings we are getting on our full backups are insane! If only the box was stable...)

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

@lepphce1 H*ll yes I'm using ReFS! Formatted 64K, 48TB volume. Yes, I use it for the storage advantages as well. Yes, pretty insane, the box I built is extremely fast. I was mentioning in another thread that I can do a Full Synthetic on one particular 11TB job in 8 mins! :shock: It's just missing fuzzy dice hanging from the disk array. If only the box were stable :roll: I might be on to something here and want you to test this. I want you to load verifier on the drivers loaded in memory. This will help trap the issue: it puts each verified driver's allocations into a special pool with guard pages, so a bad access gets caught in the act.

So click Start - Run, type verifier.exe, click OK
Click 'Create custom settings (for code developers)'
Click 'Special Pool & Pool Tracking Only'
Then I would pick everything that is loaded in memory.
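The same settings can also be applied from an elevated command prompt instead of the GUI. Below is a small sketch that builds the command line; the two flag values are the documented Driver Verifier option bits for special pool and pool tracking, while the `verifier_command` helper and the driver names in the second call are hypothetical and only there for illustration.

```python
from typing import List, Optional

# Driver Verifier option bits used with the /flags switch.
SPECIAL_POOL = 0x1
POOL_TRACKING = 0x8


def verifier_command(flags: int, drivers: Optional[List[str]] = None) -> str:
    """Build a verifier.exe command line; with no driver list, target all drivers."""
    target = ("/driver " + " ".join(drivers)) if drivers else "/all"
    return f"verifier /flags {flags:#x} {target}"


# Special pool + pool tracking on every loaded driver, as in the steps above.
print(verifier_command(SPECIAL_POOL | POOL_TRACKING))

# Narrowing to specific drivers (placeholder names, not taken from this thread).
print(verifier_command(SPECIAL_POOL | POOL_TRACKING, ["example1.sys", "example2.sys"]))
```

The settings only take effect after a reboot, and `verifier /reset` followed by another reboot turns verification back off once the dump has been captured.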

I've made two changes to my box, and oddly enough it's been stable for a day and a half now, which is weird.
First change: I updated HP's Intelligent Provisioning ( http://h20564.www2.hpe.com/hpsc/swd/pub ... nvOid=4231 ). This is the UEFI (Unified Extensible Firmware Interface) layer that sits between the hardware and OS. I have an embedded NAND flash that stores this image. This didn't get updated with HP's 'ProLiant Support Pack' because the update came out after the pack was released. Does your Dell server do something like this? If so, maybe check to see if there is an update for it.

Second change is running verifier. I have no proof, but could running this process cause the drivers to NOT crash? Doesn't make sense, but I can't explain it either, which is why I want you to try this out. Now this will slow down the box a bit because of what's going on under the covers. But if your box does bomb out, it will give away what's happening with the kernel. I could take your dump and submit it to Microsoft since I have an open ticket. :mrgreen:

Re: Server 2016 BSOD Bugcheck

Post by Delo123 »

Another thing which came to my mind is interrupt remapping; this can usually be disabled in the BIOS. It caused tons of issues for us in the past, on ESX as well as Windows hosts (BSOD/PSOD)...

Re: Server 2016 BSOD Bugcheck

Post by kubimike »

http://h20564.www2.hpe.com/hpsc/doc/pub ... -c04912076

*Gen9 servers are not affected and were removed from the document.*


whew!