Host-based backup of Microsoft Hyper-V VMs.
Oletho
Enthusiast
Posts: 67
Liked: 2 times
Joined: Sep 17, 2010 4:37 am
Full Name: Ole Thomsen

R2: Virtual machines slow down or go offline during backup

Post by Oletho »

Support case ID: 00484197

This is a new three-node Windows Server 2012 R2 cluster managed with VMM R2. The virtual machines were created on the CSVs of the old Server 2012 cluster, and those CSVs were attached to the new cluster after the upgrade.

Storage for the VMs is HP 3PAR over FC; the backup target is a QNAP mapped via iSCSI from a virtual Veeam 7 R2 machine. The exact same backup setup ran fine on the 2012 cluster.

Shortly after the first job started, users complained about things running slowly, desktops freezing and machines going offline. When the job finished, things started to behave better, even though the cluster remained in limbo for several hours afterwards. Later that night we started the job again, just to be sure that the backup was to blame and that it was not a coincidence. The same thing happened.

There are only a few errors that are not simply a result of things going wrong. One of these is:

Event id 1
vds basic provider
Unexpected failure. Error code: 48F@01000003

I remember having this error back in the early days of Server 2012 Hyper-V, when CSVs went offline during almost every backup. This was eventually solved by a hotfix (a series of hotfixes, actually) from MS.

Another error is this:

Event id 113
Hyper-V-VmSwitch
Failed to allocate VMQ for NIC 183BF80F-566B-4441-9AC9-260B0DF93CFE--374E77C8-D19B-48AC-AE74-C5F8AE23CFD4 (Friendly Name: udv-ab02) on switch 13042E4B-C900-41BA-B649-749EE89B6D90 (Friendly Name: lan logical switch). Reason - The OID failed. Status = Unknown NTSTATUS Error code: 0xc0231001

I have now turned off VMQ in both software and hardware, but whether that helps remains to be seen.

I will be testing backup on a dedicated cluster node with a dedicated CSV later this weekend.
Gostev
Chief Product Officer
Posts: 31814
Liked: 7302 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Post by Gostev »

Perhaps some newly introduced Server 2012 R2 issue, as the actual backup engine remained unchanged in B&R v7 R2. Do you see anything unusual on the host where affected VMs reside during backup (with Task Manager or PerfMon)?

Post by Oletho »

Nothing unusual. The job runs at excellent speed, as long as the CSVs are still connected, of course.

Based on my experience with initial Server 2012 Hyper-V and backup (DPM/Veeam) I suspect the OS. But you never know. Maybe it is a misconfiguration somewhere.

Hopefully Veeam support will see something I don't.

Post by Oletho »

Over the weekend I created a new job with the VMs isolated on their own cluster node and a separate CSV.

Those VMs still stop ungracefully and are restarted on other hosts, but the others seem to be fine during and after backup.

After the job had been running for about half an hour, the following events started:

Event ID 1230
Error
FailoverClustering
A component on the server did not respond in a timely fashion. This caused the cluster resource 'Virtual Machine vm-name' (resource type 'Virtual Machine', DLL 'vmclusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.

Event ID 1146
Critical
FailoverClustering
The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.

Event ID 1069
Error
FailoverClustering
Cluster resource 'Virtual Machine vm-name' of type 'Virtual Machine' in clustered role 'vm-name' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID 21502
Error
Hyper-V-High-Availability
'Virtual Machine Configuration vm-name' failed to unregister the virtual machine configuration during the initialization of the resource: The wait operation timed out. (0x00000102).

After a lot of these events, all virtual machines in the job end up being restarted on other hosts.

Those errors are only seen during backup; when I initiate a migration in VMM, everything runs fast and smooth.
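
For anyone triaging a similar incident, the event sequence above can be tallied from an exported log. The following is only a minimal Python sketch: it assumes a hypothetical export with one `event_id,source` pair per line (that format and the name `tally_events` are illustrative assumptions, not something Windows or Veeam produces by default). The IDs and sources are the ones quoted in this post; the descriptions are paraphrased.

```python
from collections import Counter

# Event IDs and sources as quoted in this thread; descriptions are paraphrased.
KNOWN_EVENTS = {
    (1230, "FailoverClustering"): "resource exceeded its time-out threshold",
    (1146, "FailoverClustering"): "RHS process terminated and restarted",
    (1069, "FailoverClustering"): "clustered VM resource failed",
    (21502, "Hyper-V-High-Availability"): "VM configuration failed to unregister",
}


def tally_events(lines):
    """Count known events in lines of the assumed form 'event_id,source'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event_id, _, source = line.partition(",")
        try:
            key = (int(event_id), source.strip())
        except ValueError:
            continue  # skip lines whose ID field is not a number
        if key in KNOWN_EVENTS:
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample input, purely to show the call shape.
    sample = [
        "1230,FailoverClustering",
        "1146,FailoverClustering",
        "1230,FailoverClustering",
    ]
    for (event_id, source), n in tally_events(sample).most_common():
        print(f"{event_id} {source}: {n} x {KNOWN_EVENTS[(event_id, source)]}")
```

A rising count of 1230 followed by 1146 and 1069 would match the pattern described here: the VM resource times out, RHS is recycled, and the role fails over.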

Post by Oletho »

It was ODX. Turning it off solved the instability problem.

I was aware that ODX could be a problem when the storage system does not support it, but in this case the 3PAR does.

Well, lesson learned :) and kudos to Veeam support.
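
The thread does not say how ODX was turned off. As far as I know, Windows controls it per host through the `FilterSupportedFeaturesMode` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`, where 0 leaves ODX enabled and 1 disables it. The Python sketch below assumes that mechanism and only reads the value; the function names are hypothetical, and it should be run on each Hyper-V host in the cluster.

```python
import sys

# Registry location assumed to control ODX on the host (not stated in the thread).
ODX_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"
ODX_VALUE = "FilterSupportedFeaturesMode"


def odx_state(mode):
    """Interpret the raw registry value: 0 = enabled, 1 = disabled."""
    if mode is None:
        return "not set"
    if mode == 0:
        return "enabled"
    if mode == 1:
        return "disabled"
    return "unknown"


def read_odx_mode():
    """Return the raw registry value, or None if the value is absent."""
    import winreg  # Windows-only module, imported lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ODX_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, ODX_VALUE)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    if sys.platform.startswith("win"):
        print("ODX is", odx_state(read_odx_mode()))
    else:
        print("Not a Windows host; nothing to check.")
```

Reading rather than writing the value is deliberate: changing it is a host-wide setting and is better done by hand, one node at a time.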

Post by Gostev »

Oletho wrote: It was ODX. Turning it off solved the instability problem.
Sadly for ODX, it seems to be the universal recipe these days - whether or not storage supports ODX...