Random VM reboots with Veeam B&R

Post by **MrJH** » Aug 19, 2015 7:41 am

ISSUE RAISED VIA SUPPORT PORTAL - CASE ID: 01000359

I’m wondering if anyone can help us with the issue below.

We are currently running around 12 HA VMs, on a 2-node Windows Server 2012 R2 Hyper-V cluster. VM storage is housed on SMB 3.0 shares, which are running from a 2-node Windows Server 2012 Scale-out File Server cluster.

2 SMB 3.0 shares have been provisioned from the 2 CSVs presented by the SOFS cluster. Each SOFS cluster node is an owner of a CSV. Storage hardware is a Lenovo ThinkServer JBOD.

For VM backup, we are utilising Veeam Backup and Replication 8, with update 2b applied. The backup run begins at 10pm each evening, by means of a scheduled PowerShell script.

The backup completes successfully, but over the past week or so we have found, upon arriving at the office the next day, multiple 1069 cluster events for VMs which have rebooted at random.

Which VMs reboot varies from one evening to the next.

In an effort to find the root cause of the problem, we disabled all Veeam VM backups one evening. The following morning, our Hyper-V cluster reported that it had gone the entire period without having any issues.

We then manually ran the backup script during office hours, and waited for any issues. The backup ran without issue until it came to one of the last few VMs.

At that point, 3 of the VMs restarted: EMAIL, PRINTSRV & TS4. The VMs rebooted during the TS4 backup.

These restarted between 12.47pm and 12.48pm. All 3 came back up. There doesn’t appear to be any link between the three (apart from the fact that all 3 were running on the same HV node). What’s more odd is that the reboots occurred well after the backups of 2 of the VMs had completed.

I should add that there were other VMs running on that same HV node.

Backup completion times:

EMAIL – 10.49am
PRINTSRV – 12.08pm
TS4 – 12.55pm
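
To make the timing concrete, here is a minimal Python sketch of the gap between each backup completing and the reboots. It assumes the PRINTSRV and TS4 times are afternoon times (12.08pm and 12.55pm), since the run took place during office hours, and it takes 12.47pm as the reboot time. The helper name `to_minutes` is made up for illustration.

```python
def to_minutes(stamp: str) -> int:
    """Convert a time such as '10.49am' into minutes since midnight."""
    clock, suffix = stamp[:-2], stamp[-2:].lower()
    hour, minute = (int(part) for part in clock.split("."))
    # 12am maps to hour 0 and 12pm to hour 12.
    hour = hour % 12 + (12 if suffix == "pm" else 0)
    return hour * 60 + minute


# Completion times from the post (pm for the last two is an assumption).
completed = {"EMAIL": "10.49am", "PRINTSRV": "12.08pm", "TS4": "12.55pm"}
reboot = "12.47pm"

# Minutes between each backup finishing and the reboots.
# A negative value means that backup was still running at reboot time.
gaps = {vm: to_minutes(reboot) - to_minutes(done) for vm, done in completed.items()}

for vm, gap in gaps.items():
    print(f"{vm}: {gap} min")
```

Under those assumptions, EMAIL had finished almost two hours before the reboots and PRINTSRV more than half an hour before, while the TS4 backup was still in progress.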

The backup then proceeded, until reaching the penultimate VM. Suddenly, I then noticed that all VMs on all HV nodes lost connection to their storage, and were either turning off or starting up on another node of the HV cluster. A few seconds after seeing this, I checked the logs for our SOFS cluster, and noticed that RHS had stopped unexpectedly, which caused the file cluster to restart and VMs to bomb out.

Amazingly, the Veeam backup proceeded to back up the last VM once both clusters had returned to a normal running state.

Does anyone have any ideas what is causing this problem? I keep reading about disabling ODX in Server 2012, for storage hardware that doesn’t support it.

All I know is that running the backup causes problems.

Many thanks.

Post by **MrJH** » Aug 19, 2015 10:39 am

I have a question after finding http://www.veeam.com/kb1838: the updates listed there are recommended for 'Hyper-V' servers.
We installed Backup and Replication 8 on one of our SOFS nodes, NOT on a HV node. It made more sense to install it on the server which contains the 'source' of the VMs.
As we are doing this, do any of the updates listed on this page apply? Should we still install the recommended updates on our HV nodes, despite the backup being initiated from one of the SOFS nodes?

Post by **foggy** » Aug 19, 2015 5:22 pm

The mentioned updates for Hyper-V hosts are recommended regardless of where Veeam B&R itself is installed. It doesn't actually matter where it is installed, since all the processing is performed by data movers running on the hosts.

Post by **nmdange** » Aug 20, 2015 9:39 pm

Microsoft added 3 new hotfixes to the recommended hotfix list. At least one of them references RHS deadlock crashes. Look through Microsoft's KB article and make sure you have everything listed applied https://support.microsoft.com/en-us/kb/2920151

Also I'm not sure I'd recommend running Veeam B&R directly on one of the SOFS nodes. You should not have any applications running directly on any of your Hyper-V or SOFS hosts. If you don't have another server you can use, I would run it within a VM on Hyper-V. In my case, I have a separate physical server with local storage that has Veeam B&R installed on it, and also acts as the backups repository and off-host backup proxy.

Post by **MrJH** » Aug 21, 2015 9:33 am

nmdange wrote:Microsoft added 3 new hotfixes to the recommended hotfix list. At least one of them references RHS deadlock crashes. Look through Microsoft's KB article and make sure you have everything listed applied https://support.microsoft.com/en-us/kb/2920151

Also I'm not sure I'd recommend running Veeam B&R directly on one of the SOFS nodes. You should not have any applications running directly on any of your Hyper-V or SOFS hosts. If you don't have another server you can use, I would run it within a VM on Hyper-V. In my case, I have a separate physical server with local storage that has Veeam B&R installed on it, and also acts as the backups repository and off-host backup proxy.

Thanks for the reply. All updates listed at your link were installed on our Hyper-V cluster the other day.
You are the first person to actively comment on the placement of the B&R install. I think it may be worth moving it away from the SOFS cluster to see if anything improves. We placed it on the SOFS node because it saved pulling 320-odd GB across the network to another server/machine holding the repository.

Have you encountered any issues when backing up Server 2008 R2 VMs?

Post by **foggy** » Aug 21, 2015 2:48 pm

MrJH wrote:We placed it on the SOFS node, as it saved pulling 320 odd GB across the network to another server/machine which held the repository.

I wonder how it could save anything since, in any case, data is retrieved from the storage by the Veeam data mover agent installed on the Hyper-V host and is then transferred to the repository.

Post by **MrJH** » Aug 24, 2015 7:21 am

The issue appears to have been resolved. The solution was down to one or both of the following:

1. Excluding Server 2008 R2 SP1 machines from the backup schedule.
2. Patching our SOFS cluster with recommended hotfixes/updates.

I am just running a test backup with only the 2008 boxes, to confirm. Our clusters went the entire weekend without a single error logged in FCM.

Post by **foggy** » Aug 24, 2015 10:16 am

It would be much appreciated if you could get back with a similar report after adding the 2008 R2 SP1 servers back to the backup job.

Post by **MrJH** » Aug 24, 2015 1:58 pm

I can confirm that the backup completed successfully after adding the 2008 boxes back in. Resolution of the issue must therefore have been down to the recommended hotfixes/updates.

Recommended hotfixes and updates for Windows Server 2012 based failover clusters -
https://support.microsoft.com/en-us/kb/2784261

Recommended hotfixes and updates for Windows Server 2012 R2-based failover clusters -
https://support.microsoft.com/en-us/kb/2920151

Thanks for all the replies on this one. What a relief!

Post by **davidpollock** » Sep 24, 2015 1:16 am

This sounds very much like an issue we're currently seeing, except that installing those recommended hotfixes has not resolved it.

I posted my results in this thread: http://forums.veeam.com/microsoft-hyper ... 28620.html

Post by **MrJH** » Sep 24, 2015 7:58 am

I also disabled ODX, as we are running a JBOD with no such support. We also disabled CSV caching as part of the process, though I doubt that one had any effect. It was purely the updates in our case.
