R2: Virtual machines slow down or go offline during backup

Post by **Oletho** » Nov 29, 2013 3:32 pm

Support case ID: 00484197

This is a new three-node Windows Server 2012 R2 cluster managed with VMM R2. The virtual machines were created on CSVs of the old Server 2012 cluster, and those CSVs were attached to the new cluster after the upgrade.

VM storage is an HP 3PAR array over FC. The backup target is a QNAP mapped over iSCSI to a virtual machine running Veeam 7 R2. The exact same backup setup ran fine on the 2012 cluster.

Shortly after the first job started, users complained about slowness, freezing desktops and machines going offline. When the job finished, things started to behave better, although the cluster remained in limbo for several hours afterwards. Later that night we started the job again, just to confirm that the backup was to blame and that it was not a coincidence. The same thing happened.

There are only a few errors that are not simply a consequence of things going wrong. One of them is:

Event id 1
vds basic provider
Unexpected failure. Error code: 48F@01000003

I remember this error from the early days of Server 2012 Hyper-V, when CSVs went offline during almost every backup. That was eventually solved by a hotfix (a series of hotfixes, actually) from Microsoft.

Another error is this:

Event id 113
Hyper-V-VmSwitch
Failed to allocate VMQ for NIC 183BF80F-566B-4441-9AC9-260B0DF93CFE--374E77C8-D19B-48AC-AE74-C5F8AE23CFD4 (Friendly Name: udv-ab02) on switch 13042E4B-C900-41BA-B649-749EE89B6D90 (Friendly Name: lan logical switch). Reason - The OID failed. Status = Unknown NTSTATUS Error code: 0xc0231001

I have now turned off VMQ in both software and hardware, but whether that helps remains to be seen.

I will be testing backup on a dedicated cluster node with a dedicated CSV later this weekend.

Post by **Gostev** » Nov 29, 2013 8:49 pm

Perhaps this is a newly introduced Server 2012 R2 issue, as the actual backup engine remained unchanged in B&R v7 R2. Do you see anything unusual on the host where the affected VMs reside during backup (in Task Manager or PerfMon)?

Post by **Oletho** » Nov 30, 2013 6:39 am

Nothing unusual. The job runs at excellent speed, as long as the CSVs stay connected, of course.

Based on my experience with the initial release of Server 2012 Hyper-V and backup (DPM/Veeam), I suspect the OS. But you never know; maybe it is a misconfiguration somewhere.

There is a chance Veeam support will see something I don't.

Post by **Oletho** » Dec 02, 2013 5:40 am

This weekend I created a new job with the VMs isolated on their own cluster node and a separate CSV.

Those VMs still stop ungracefully and are restarted on other hosts, but the other VMs seem to be fine during and after the backup.

After the job had been running for about half an hour, the following events started appearing:

Event ID 1230
Error
FailoverClustering
A component on the server did not respond in a timely fashion. This caused the cluster resource 'Virtual Machine vm-name' (resource type 'Virtual Machine', DLL 'vmclusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.

Event ID 1146
Critical
FailoverClustering
The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.

Event ID 1069
Error
FailoverClustering
Cluster resource 'Virtual Machine vm-name' of type 'Virtual Machine' in clustered role 'vm-name' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID 21502
Error
Hyper-V-High-Availability
'Virtual Machine Configuration vm-name' failed to unregister the virtual machine configuration during the initialization of the resource: The wait operation timed out. (0x00000102).

After many of these events, all virtual machines in the job end up being restarted on other hosts.

These errors are only seen during backup; when I initiate a migration in VMM, everything runs fast and smoothly.

Post by **Oletho** » Dec 03, 2013 4:52 am

It was ODX (Offloaded Data Transfer). Turning it off solved the instability problem.

I was aware that ODX could be a problem when the storage system does not support it, but in this case the 3PAR does.

Well, lesson learned. And kudos to Veeam support.

Post by **Gostev** » Dec 03, 2013 4:16 pm

Oletho wrote: It was ODX. Turning it off solved the instability problem.

Sadly for ODX, disabling it seems to be the universal recipe these days, whether or not the storage supports ODX...

R&D Forums
