VM IO pause during backup

Hyper-V specific discussions

by jamesharper-bsol » Wed Jun 10, 2015 5:26 pm

Hi,

I've got a support case open (Case 00918770) that is getting more complicated by the day and moving slowly. I'm hoping someone may have seen this and can provide additional pointers!

We are backing up Exchange 2013 on Hyper-V 2012 R2. There are 2 DAG nodes, one active and one passive; the passive node is the one being backed up, and it experiences IO pauses of up to 90 seconds.

Initially it would affect both DAG nodes and cause database failovers, but we have moved the active node onto its own dedicated LUN (other VMs were sharing it previously). This has stabilized the system: we no longer get DB failovers and customers no longer get disconnected and reconnected. However, the passive node still has the problem.

We have worked through multiple things with support:
http://www.veeam.com/kb1744 (cluster changes made no difference)
8.0 Update 2 installed
SAN firmware upgrade (Dell MD3620f to 8.20 - latencies are low, maxing 30-40ms, nowhere near 90 seconds)
The hardware is Cisco UCS & MDS FC switches.
OS & Hyper-V patches
Manual VSS within Exchange VM was fine with no issues
Only happens during backups; we have moved the backup window and the problem follows it

The VM does not go into a saved state; it keeps running and IO just stops. PerfMon graphs show all disk transfer counters dropping to 0 for the 90 seconds while the disk queue rises slowly. Our current hypothesis is that CSV VSS snapshots are causing the pause.
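One way to pick these freezes out of an exported PerfMon log is to look for runs of samples where disk transfers sit at zero and note how far the queue climbs meanwhile. The following is a minimal Python sketch, assuming the counters have already been loaded as (seconds, Disk Transfers/sec, Current Disk Queue Length) tuples sorted by time; the function name `find_io_stalls` and the 30-second threshold are hypothetical choices, not part of any Veeam or Microsoft tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# One PerfMon sample: (time in seconds, Disk Transfers/sec, Current Disk Queue Length).
# This tuple layout is an assumption about how the exported counters were loaded.
Sample = Tuple[float, float, float]


@dataclass
class Stall:
    start: float       # time of the first sample with zero transfers
    end: float         # time of the last sample with zero transfers
    peak_queue: float  # highest queue length seen during the stall

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_io_stalls(samples: Iterable[Sample], min_duration: float = 30.0) -> List[Stall]:
    """Return runs of consecutive zero-transfer samples lasting at least min_duration seconds."""
    stalls: List[Stall] = []
    start: Optional[float] = None
    last_t = 0.0
    peak = 0.0

    for t, transfers, queue in samples:
        if transfers == 0:
            if start is None:          # a new zero-transfer run begins
                start, peak = t, queue
            peak = max(peak, queue)    # track how far the queue climbs
            last_t = t
        else:
            if start is not None and last_t - start >= min_duration:
                stalls.append(Stall(start, last_t, peak))
            start = None               # IO resumed, reset the run

    # Close a run that was still open when the log ended.
    if start is not None and last_t - start >= min_duration:
        stalls.append(Stall(start, last_t, peak))
    return stalls


if __name__ == "__main__":
    # Synthetic 1-second samples: a 100-second freeze with a rising queue, plus a 3-second dip.
    samples = [
        (float(t),
         0.0 if (50 <= t < 150 or 10 <= t < 13) else 120.0,
         float(t - 49) if 50 <= t < 150 else 1.0)
        for t in range(200)
    ]
    for s in find_io_stalls(samples):
        print(s.start, s.end, s.duration, s.peak_queue)  # prints: 50.0 149.0 99.0 100.0
```

The duration is measured between the first and last zero-transfer samples, so with 1-second sampling it reads one interval short of the wall-clock freeze; short dips below the threshold are ignored.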

We have enabled "Allow processing of multiple VMs with a single volume snapshot", which has reduced the frequency of the pauses (previously every night, now every few days). Even stranger, one night when the backup job containing this VM did not run (it was paused to allow a tape backup), the pause happened while other jobs were running.
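A rough way to see why that option could reduce how often the pause shows up is to count how many CSV snapshots a job triggers. The sketch below is a deliberately simplified model, not Veeam's actual scheduling logic: it assumes each snapshot covers one volume and can include up to `vms_per_snapshot` VMs that live on it (1 meaning a separate snapshot per VM per volume). The function name and the VM/CSV names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set


def count_volume_snapshots(vm_volumes: Dict[str, Set[str]], vms_per_snapshot: int = 1) -> int:
    """Simplified model: number of CSV snapshots needed to process every VM in a job.

    vm_volumes maps a VM name to the set of CSVs holding its virtual disks.
    vms_per_snapshot is how many VMs on the same CSV one snapshot may cover
    (1 = a separate snapshot per VM per volume).
    """
    if vms_per_snapshot < 1:
        raise ValueError("vms_per_snapshot must be at least 1")
    # Count how many VMs in the job have disks on each CSV.
    vms_on_volume = Counter(vol for vols in vm_volumes.values() for vol in vols)
    # Each CSV needs enough snapshots to cover all of its VMs in groups.
    return sum(math.ceil(n / vms_per_snapshot) for n in vms_on_volume.values())


if __name__ == "__main__":
    # Hypothetical job: one VM with OS and DB disks on different CSVs, others on CSV1/CSV2.
    job = {
        "EXCH02": {"CSV1", "CSV2"},
        "APP01": {"CSV1"},
        "APP02": {"CSV1"},
        "FILE01": {"CSV2"},
    }
    print(count_volume_snapshots(job, vms_per_snapshot=1))  # 5
    print(count_volume_snapshots(job, vms_per_snapshot=4))  # 2
```

Under this model, fewer snapshots per night means fewer chances to hit whatever stalls IO during a CSV snapshot, which would be consistent with the pauses becoming less frequent rather than disappearing.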

The VM's OS disk is on CSV1, and its DB is on CSV2. Other VMs that are backed up in other jobs share CSV1, so we think that might be triggering something, even though the pause hits the DB, which is on the other LUN (CSV2). We will move the OS drive after business approval (it's politically sensitive after all the failovers), but that would be a workaround and does not identify the root cause.

Any experience/pointers would be much appreciated.

Thanks,
James
jamesharper-bsol
Novice
 
Posts: 3
Liked: 1 time
Joined: Mon Jan 16, 2012 10:30 am
Full Name: James Harper

Re: VM IO pause during backup

by ptoro » Tue Aug 11, 2015 1:05 pm

bump

I'm also seeing this IO freeze on VMs that are not even being backed up but share the same CSV.

SOFS with SMB3
ptoro
Influencer
 
Posts: 15
Liked: 5 times
Joined: Fri Jul 24, 2015 4:41 pm

Re: VM IO pause during backup

by davidpollock » Wed Sep 23, 2015 6:17 am

We're seeing an issue which sounds similar to this.

We have a hyper-v cluster with 2 nodes. VMs are stored on CSVs.

During backups, VMs have been randomly failing because the cluster service detects them as unresponsive. I believe this is because I/O to their system disk hangs. The VM that fails is always on the same CSV as a VM being backed up at the time. We've also reproduced the issue running backups via DPM, so it seems more likely to be caused by the SAN or Microsoft VSS. The issue occurs during both on-host and off-host backups. From what I've seen, it seems to occur while the SAN is taking a snapshot of the CSV volume, or possibly straight afterwards.
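To check the "during the snapshot, or straight afterwards" observation, one could line up the times VMs were flagged unresponsive against the snapshot windows of their CSV. A minimal Python sketch, assuming snapshot start/end times and failure times have already been pulled (for example from SAN and cluster event logs) into seconds on a common clock; the function name `classify_failure` and the 120-second grace period are hypothetical.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, float]  # (snapshot start, snapshot end) in seconds


def classify_failure(t: float, csv: str,
                     snapshot_windows: Dict[str, List[Window]],
                     grace: float = 120.0) -> str:
    """Label a VM failure at time t on CSV `csv` relative to that CSV's snapshots.

    Returns "during" if t falls inside a snapshot window, "after" if it falls
    within `grace` seconds after a window ends, otherwise "unrelated".
    """
    windows = snapshot_windows.get(csv, [])
    if any(start <= t <= end for start, end in windows):
        return "during"
    if any(end < t <= end + grace for _, end in windows):
        return "after"
    return "unrelated"


if __name__ == "__main__":
    windows = {"CSV1": [(100.0, 160.0)]}
    events = [(130.0, "CSV1"), (200.0, "CSV1"), (500.0, "CSV1"), (130.0, "CSV2")]
    for t, csv in events:
        print(t, csv, classify_failure(t, csv, windows))
```

Only snapshots of the failing VM's own CSV are considered, which matches the pattern described here where the failing VM shares a CSV with the VM being backed up.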

We've logged calls with Microsoft, Veeam and HP. None of them have been able to resolve the issue so far. If you guys got anywhere I'd love to hear about it.

Some more info:
Cluster is running Server 2012 R2
SAN: HP StoreVirtual 4530s running LeftHand OS version 12.0.00.0725.0
HP StoreVirtual DSM MPIO driver v12.0.0.371.1 is installed on the hyper-v hosts.
davidpollock
Novice
 
Posts: 4
Liked: never
Joined: Wed Sep 23, 2015 6:00 am
Full Name: Dave Pollock

Re: VM IO pause during backup

by akselc » Wed Sep 23, 2015 10:54 am

@davidpollock

Your description looks similar to a case I opened today, which is why I am reading the forum posts now.
2 nodes, Hyper-V, HP SAN, CSV.
When Veeam is running, the off-host proxy node (hv02) loses its connection to the cluster, its VM virtual NIC cannot ping the other node, and VMs on different nodes are not able to contact each other.
As a result, cluster volume1 is partially offline: hv02 cannot connect to it and only finds volume2 on the SAN, while hv03 (the other node) has managed to mount volume1 manually as drive D:.
HP technical support has read the SAN logs and cannot find any errors.
So something is happening with the I/O when node hv02 (the off-host proxy) is backing up VMs.

I hope Veeam looks at this ASAP.
akselc
Novice
 
Posts: 8
Liked: 1 time
Joined: Tue Apr 22, 2014 12:52 pm
Full Name: Aksel Celasun

Re: VM IO pause during backup

by foggy » Wed Sep 23, 2015 12:26 pm

In the OP's case, reducing the number of concurrent VSS snapshots has helped to reduce the occurrence of the pauses.

Aksel, have you opened a case with Veeam technical support for this? Dave, posting your case ID here would help us in further tracking of the resolution. Thanks!
foggy
Veeam Software
 
Posts: 15303
Liked: 1133 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: VM IO pause during backup

by akselc » Wed Sep 23, 2015 1:07 pm

Hi Foggy


Yes, Case # 01063492

I am currently looking through the logs on my HV02 node, and I believe that I/O from the Veeam backup is causing the loss of connectivity, or is triggering it.
These 2 patches may be helpful; I will investigate further and may install the patches below.

https://support.microsoft.com/en-us/kb/2870270
https://support.microsoft.com/en-us/kb/2813630
akselc
Novice
 
Posts: 8
Liked: 1 time
Joined: Tue Apr 22, 2014 12:52 pm
Full Name: Aksel Celasun

Re: VM IO pause during backup

by davidpollock » Wed Sep 23, 2015 11:28 pm

Hi Foggy,

Here's our case number: 00937186

We've installed all of the updates on this page: https://support.microsoft.com/en-us/kb/2920151

I believe the issue is most likely somewhere in VSS... the recommendation from Microsoft support was simply to disable heartbeat monitoring on all VMs, which does prevent the VMs from restarting during backups but does not actually solve the issue of them becoming unresponsive.
davidpollock
Novice
 
Posts: 4
Liked: never
Joined: Wed Sep 23, 2015 6:00 am
Full Name: Dave Pollock

Re: VM IO pause during backup

by akselc » Thu Sep 24, 2015 2:54 pm

For both of these links:
https://support.microsoft.com/en-us/kb/2870270
https://support.microsoft.com/en-us/kb/2813630

when I try to install the patch, I get a message saying "Not applicable to your system" or something like that.
When downloading, I see that I can only choose the Windows 8 RTM version, but the description says it installs on Server 2012 Datacenter...
akselc
Novice
 
Posts: 8
Liked: 1 time
Joined: Tue Apr 22, 2014 12:52 pm
Full Name: Aksel Celasun

Re: VM IO pause during backup

by davidpollock » Mon Sep 28, 2015 6:11 am

akselc, are you sure you're not running 2012 R2?
davidpollock
Novice
 
Posts: 4
Liked: never
Joined: Wed Sep 23, 2015 6:00 am
Full Name: Dave Pollock

Re: VM IO pause during backup

by ptoro » Wed Sep 30, 2015 3:04 am

Glad I'm not the only one seeing this issue.

Hyper-V to our SOFS cluster.

We see this only during the VEEAM backup schedule, and it happens to servers that are not being backed up by VEEAM but share the same CSV as servers that are.

Our servers are all Windows 2012 R2 (they all have the latest updates, minus this month's set of patches).

Although it doesn't do much harm to most servers, if you have a server that is in some sort of SQL cluster, it will definitely mess with it. We see most IO freeze errors only on servers that have SQL or AD installed (they are the most sensitive to IO freezes and "complain" the most).
ptoro
Influencer
 
Posts: 15
Liked: 5 times
Joined: Fri Jul 24, 2015 4:41 pm

Re: VM IO pause during backup

by pterpumpkin » Thu Dec 08, 2016 11:51 pm

Sorry for dragging up an old post!

We're also seeing this. When Veeam snapshots a CSV, IO pauses (or queues up with high latency) for up to 30 seconds, affecting all VMs on the CSV.

Is this expected behavior?
pterpumpkin
Influencer
 
Posts: 13
Liked: 1 time
Joined: Tue Jun 14, 2016 9:36 am
Full Name: Pter Pumpkin

Re: VM IO pause during backup

by foggy » Fri Dec 09, 2016 12:29 pm

There's no resolution in any of the support cases mentioned in this thread, so I advise you to open your own case to investigate what is causing this behavior in your particular environment.
foggy
Veeam Software
 
Posts: 15303
Liked: 1133 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: VM IO pause during backup

by dmalishkin » Mon Mar 20, 2017 8:17 pm

We are seeing the same issues on our CSVs. We have different types of SANs and it happens no matter what hardware we use. Our Hyper-V clusters are all 2012 R2.

Once Veeam is backing up a single VM, the entire CSV slows down so badly that in some cases machines crash or reboot.

Working with Veeam currently but not really seeing anything obvious.
dmalishkin
Novice
 
Posts: 9
Liked: never
Joined: Tue Jul 14, 2009 6:36 am
Full Name: Daniel M

Re: VM IO pause during backup

by Mike Resseler » Mon Mar 20, 2017 8:25 pm

Hi Daniel,

Can you let us know your support case ID?

Are your Hyper-V clusters all patched (not only with Windows updates, but also with our recommended hotfixes)?
Mike Resseler
Veeam Software
 
Posts: 3382
Liked: 384 times
Joined: Fri Feb 08, 2013 3:08 pm
Location: Belgium, the land of the fries, the beer, the chocolate and the diamonds...
Full Name: Mike Resseler

Re: VM IO pause during backup

by kcm_aaron » Mon Mar 27, 2017 4:31 pm

We have been having the exact same issue for several months now: during overnight VSS-based snapshots, we are seeing Hyper-V VMs "fail" and reboot, according to policy. The VMs that crash are never part of the group of VMs being backed up, but do share storage on the CSV(s) involved in the snapshotting process.
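Given that pattern, a quick inventory check can list which VMs are exposed on a given night: those outside the backup set that share a CSV with something inside it. A minimal Python sketch, assuming the VM-to-CSV mapping has already been collected by some other means; the function name `exposed_vms` and all VM/CSV names are hypothetical.

```python
from typing import Dict, List, Set


def exposed_vms(vm_volumes: Dict[str, Set[str]], backed_up: Set[str]) -> List[str]:
    """VMs outside the backup set that share at least one CSV with a VM inside it."""
    # Every CSV that will be snapshotted because a backed-up VM has a disk on it.
    snapshotted: Set[str] = set()
    for vm in backed_up:
        snapshotted |= vm_volumes.get(vm, set())
    return sorted(
        vm for vm, vols in vm_volumes.items()
        if vm not in backed_up and vols & snapshotted
    )


if __name__ == "__main__":
    inventory = {
        "SQL01": {"CSV1"},
        "DC01": {"CSV2"},
        "WEB01": {"CSV1"},
        "APP03": {"CSV3"},
    }
    print(exposed_vms(inventory, backed_up={"WEB01"}))  # ['SQL01']
```

A VM with disks on several CSVs (like the OS/DB split described earlier in the thread) is counted as exposed if any one of its CSVs is snapshotted.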

We are actually in the process of moving to VEEAM, because of these issues we are seeing with our current backup product - NetApp SnapManager for Hyper-V (SMHV). SMHV uses a proprietary Data ONTAP VSS hardware provider when taking snapshots of the CSVs housing our VMs. Almost every night, at least one VM will fail during the overnight backup process. So, I started testing with VEEAM and found that I don't have the issue if I use the native Microsoft software VSS provider, but do have the issue when using the ONTAP VSS provider. This seems to point to the VSS provider being the issue, but I don't want to move to VEEAM yet, in case we're just masking the source of an ongoing problem.
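A side-by-side test like this comes down to comparing, per VSS provider, how many backup nights had at least one VM failure. A minimal Python sketch, assuming one (provider label, failed VM count) record per night has been logged by hand; the function name and the provider labels are hypothetical, and the example input is made up for illustration, not measured data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def failure_rate_by_provider(runs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Fraction of backup nights with at least one failed VM, per VSS provider.

    runs holds one (provider label, number of VMs that failed) entry per night.
    """
    nights: Counter = Counter()
    bad_nights: Counter = Counter()
    for provider, failed in runs:
        nights[provider] += 1
        if failed > 0:
            bad_nights[provider] += 1
    # Counter returns 0 for providers that never had a bad night.
    return {p: bad_nights[p] / n for p, n in nights.items()}


if __name__ == "__main__":
    # Made-up example input, not measured data.
    runs = [("software", 0)] * 10 + [("hardware", 1)] * 6 + [("hardware", 0)] * 4
    print(failure_rate_by_provider(runs))  # {'software': 0.0, 'hardware': 0.6}
```

Counting nights rather than failed VMs keeps one bad night with several crashes from dominating the comparison; with only a handful of nights per provider, the difference is a hint rather than proof.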

I have updated all of our server firmware, SAN firmware, Windows updates and recommended hotfixes (as recommended by Microsoft, NetApp and VEEAM) and have even migrated all of our VM-related storage to freshly provisioned volumes/LUNs per NetApp support's recommendation, all to no avail. I have open cases with Cisco, Microsoft and NetApp, but each of them points the finger at another vendor.

Next, I will be disabling ODX on my Hyper-V hosts to see if that has any impact, but I just wanted to post our current situation here in case it helps anyone else. If there are any other suggestions out there, please let me know! Thanks!
kcm_aaron
Lurker
 
Posts: 2
Liked: never
Joined: Mon Mar 27, 2017 4:19 pm
Full Name: Aaron
