Why do Hyper-V VMs have so many snapshot issues ?

by richw » Fri Oct 23, 2015 1:25 am

I'd like to start a general discussion to help in understanding the cause of snapshot issues and the behaviour of Veeam in such cases, and also to discuss methods and best practices for backing up VMs on Hyper-V.

I am managing 4 Hyper-V environments being backed up by Veeam 8, and the frequency of failures and warnings due to VSS issues is a constant pain to me.

I'd like to gain a better understanding of why VSS errors happen, how they disappear within 10 minutes and succeed on a retry, and why the same error can sometimes cause a failure and other times only a warning.

Background:
My approach to backing up VMs on Hyper-V has been to not use application-aware processing unless I need to truncate logs in SQL or Exchange. This is to minimise the likelihood of VSS errors. So I have enabled application-aware processing on the job and disabled it for specific VMs that don't need it. That approach led to feedback from customers that certain VMs hang during backup. This was caused by the VM being suspended in Hyper-V, and led to a recommendation to enable Hyper-V guest quiescence and "Take crash consistent snapshot instead of suspending VM" in the Advanced settings of the job. That in turn causes VMs that would otherwise only get a Hyper-V level snapshot to also have a guest level snapshot, which is a highly unreliable event.

So this brings me to the point of either having to troubleshoot VSS issues, or start breaking jobs out into groups based on common settings and common errors.
This is do-able but not ideal.
I'm wondering what others are doing in this situation. How do you set up Hyper-V jobs for best reliability and low maintenance?

Looking through the Action pane from job History, it occurs to me that Veeam could be more specific in its snapshot/VSS error handling.
If any developers see this, would it be possible to add more detail in the action pane?

Example
Unable to create snapshot (Microsoft CSV Shadow Copy Provider) (mode: Hyper-V child partition snapshot). - Has this error come from the host or the guest? What does the mode mean? What combination of job settings caused this particular type of snapshot?
Details: Unknown status of async operation The shadow copy provider had an error. - What does that mean? Can a link be provided to a KB article?
Check the System and Application event logs for more information. - Could Veeam check the logs and provide the information?
--tr:Failed to create VSS snapshot.
--tr:Failed to perform pre-backup tasks.


Below is the same error from another VM in the same job. The VM below has Veeam App Aware enabled; the VM above does not. The job has Hyper-V guest quiescence enabled. The one below failed (then succeeded on Retry 1). The one above caused a warning, with indications it had failed, but it wasn't retried and did result in a restore point. This behaviour is confusing.
Could the actions pane contain more detail to explain the behaviour? Also, what Veeam error handling rules have been applied to arrive at the job status?

Unable to create snapshot (Microsoft CSV Shadow Copy Provider) (mode: Veeam application-aware processing). Details: Unknown status of async operation
The shadow copy provider had an unexpected error while trying to process the specified operation.
--tr:Failed to create VSS snapshot.
--tr:Failed to perform pre-backup tasks.
richw
Veeam ProPartner
 
Posts: 6
Liked: never
Joined: Mon Aug 13, 2012 4:46 am
Location: Melbourne, VIC, AU
Full Name: Richard Warren

Re: Why do Hyper-V VMs have so many snapshot issues ?

by foggy » Fri Oct 23, 2015 2:24 pm

Richard, details are typically contained in the job logs. Have you ever contacted technical support about those VSS issues? I'm sure they can assist in pinpointing the actual reason for the failure in each case. In the examples you've provided, the failure message comes from the VSS provider, not a Veeam B&R component, so troubleshooting should start with identifying the conditions that caused it (most likely environmental).

"mode: Hyper-V child partition snapshot" means that you have application-aware image processing disabled and Hyper-V guest quiescence enabled. The fact that the job failed in the second case is explained by the "Require successful processing" setting for the application-aware image processing that you most likely have configured.
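The mode explanation above can be turned into a small lookup for triaging job messages. What follows is only a minimal sketch, not anything Veeam ships: the function name `describe_snapshot_error` and the `MODE_NOTES` table are hypothetical, the regular expression assumes the message layout quoted in this thread, and the table contains only what posters here have stated about each mode.

```python
import re

# Explanations taken from this thread only (assumption: nothing here comes
# from Veeam documentation). Modes not discussed in the thread stay unexplained.
MODE_NOTES = {
    "Hyper-V child partition snapshot":
        "application-aware image processing disabled, Hyper-V guest quiescence enabled",
    "Veeam application-aware processing":
        "application-aware image processing enabled for the VM",
}

# Matches: Unable to create snapshot (<provider>) (mode: <mode>)
SNAPSHOT_ERROR = re.compile(
    r"Unable to create snapshot \((?P<provider>[^)]*)\) \(mode: (?P<mode>[^)]*)\)"
)


def describe_snapshot_error(message):
    """Return provider, mode and the thread's explanation, or None if no match."""
    match = SNAPSHOT_ERROR.search(message)
    if match is None:
        return None
    mode = match.group("mode")
    return {
        "provider": match.group("provider"),
        "mode": mode,
        "settings": MODE_NOTES.get(mode, "not explained in this thread"),
    }
```

Feeding it the first message from the opening post returns the provider, the mode, and the job settings foggy describes; any mode nobody here has explained is reported as such rather than guessed.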
foggy
Veeam Software
 
Posts: 14716
Liked: 1075 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Why do Hyper-V VMs have so many snapshot issues ?

by Gostev » Fri Oct 23, 2015 3:23 pm

richw wrote:Why do Hyper-V VMs have so many snapshot issues ?

Because the Hyper-V VSS backup framework is generally not very reliable. I believe that at some point (not long ago) it accounted for about 30% of all Hyper-V support cases for Microsoft. This is the main reason why Microsoft is completely revamping the VM backup API in Windows Server 2016.

That said, fine-tuning usually helps solve most of those intermittent issues, so I suggest you work with some T3 experts in our support to tweak a few things around your deployment.
Gostev
Veeam Software
 
Posts: 21385
Liked: 2348 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Why do Hyper-V VMs have so many snapshot issues ?

by jshiflet » Tue Oct 27, 2015 12:03 am

I'll add that Hyper-V VSS errors are a constant headache for me. I've got so many VMs that constantly give me warnings because the VSS snapshot process couldn't run properly, many of them SQL Server systems, so logs never get truncated unless I do so manually. I know full well it's not a Veeam issue because I've run through it with Veeam support before, but I thought I'd add my $0.02 to the conversation.

For us, about 95% of the time, the error is:

Unable to create snapshot (Microsoft CSV Shadow Copy Provider) (mode: Veeam application-aware processing with failover). Details: Writer 'Microsoft Hyper-V VSS Writer' is failed at 'VSS_WS_FAILED_AT_PREPARE_SNAPSHOT'.
The writer experienced a non-transient error.  If the backup process is retried,
the error is likely to reoccur.
--tr:Failed to verify writers state.
--tr:Failed to perform pre-backup tasks.


With this immediately following:

Make sure VM does not have 'iSCSI Software Target Storage Provider' feature installed.


Which is wholly fun because we don't use Microsoft's iSCSI Software Target Provider ANYWHERE in our systems. At all. We use the initiator on the Host systems, but not in the VMs. None of our virtual machines use anything iSCSI at all.

When I went through it with Veeam support, we never really got anywhere with it other than "it's probably a bug with Hyper-V's VSS management". The end-result suggestion was to "recreate the VM metadata": basically, create a new VM and attach the existing VHDs to it. Sometimes that worked, sometimes it didn't. And the times that it did work, it worked without issue for maybe a week or two before going back to throwing warnings again.

Now that it's been ongoing for probably close to 2 years, I've found some consistency in which machines will almost inevitably have this problem. They usually don't start off with the issue, but at some point, weeks or months down the road, it rears its ugly head.

These are the consistent factors:
  • Version 1 VMs
  • Running Windows Server 2008 R2
  • Running antivirus (we use ESET File Security exclusively, but I've tested with Webroot SecureAnywhere Business Endpoint Protection and had the same result)

Although that last one, antivirus, is only semi-consistent. If the problem happens on a system that has had antivirus installed on it at ANY TIME in the past, it'll keep doing it. A fresh VM likely won't ever do it so long as you NEVER load antivirus on it. Uninstalling typically doesn't fix it. And even with that, running a Windows system without antivirus is just asking for trouble.

I'd honestly love to be able to use VMware rather than Hyper-V in our environment, but it's just not affordable for us yet, and manageability-wise, we're a 100% Microsoft shop, so getting people to learn and become familiar with anything different is a nightmare.

I'm glad to hear that Microsoft is revamping the Hyper-V Backup API in Server 2016. I really, REALLY hope that makes it more robust and fixes a lot of these types of issues.
--
jason shiflet
systems engineer
jshiflet
Service Provider
 
Posts: 9
Liked: never
Joined: Mon Sep 17, 2012 6:35 pm
Location: Dallas, TX
Full Name: Jason Shiflet

Re: Why do Hyper-V VMs have so many snapshot issues ?

by tj.kline » Fri Jul 22, 2016 2:36 pm

Sorry to bring back an old post, but I thought I would share my findings as I had the same issues as @jshiflet.

Same error, same 'but we're not using iSCSI' BS, but different OSes. We're getting this error on a cluster where both the host and guest are 2012 R2, running version 2 VMs. It happens only on application-aware servers, and in this case they are a SharePoint 2013 farm with SQL 2014. We tried recreating the config file as you stated without luck, as well as putting the VMs on a totally different cluster (that's not having the issue, although another instance of the same setup is on it), and still had the same issue. We took antivirus out of the question, as well as GFI. Still no luck.

We are currently working with both MS and Veeam support on this issue, and our only glimmer of hope is running the following command for all of the VSS writers (found here: https://www.veeam.com/kb2041), to basically put each of them in a separate instance of svchost.exe:
sc config "BITS" type= own
We've been running about 5 days of backups every half hour without a failure yet. We used to get about 6-7 backups before it would fail. Funny thing: we found that on our own. MS is flabbergasted that it's working.
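For anyone applying this to more than one service, here is a minimal sketch of scripting it. It assumes an elevated prompt; the helper names `build_sc_commands` and `apply_own_process` are hypothetical, and "BITS" is the only service name taken from this post. The full list of services should come from the KB article, not from this example.

```python
import subprocess


def build_sc_commands(service_names):
    """Build one 'sc config <name> type= own' argument list per service.

    sc.exe expects 'type=' and its value as separate tokens, which is why
    the command in the post has a space after the equals sign.
    """
    return [["sc", "config", name, "type=", "own"] for name in service_names]


def apply_own_process(service_names, dry_run=True):
    """Print the commands (dry run) or execute them on a Windows host."""
    for command in build_sc_commands(service_names):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises if sc.exe reports a failure for this service.
            subprocess.run(command, check=True)
```

Calling `apply_own_process(["BITS"])` only prints the command. Passing `dry_run=False` actually changes the service configuration, so it is worth reviewing the dry-run output first.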

Before the VMs would start failing, we would get an application error in the logs that said something along the lines of 'Faulting application name: svchost.exe'. We came to the conclusion that one of the writers (or another application using svchost.exe) would kill it to the point it couldn't restart, and all the VSS writers would be left in a failed state, to the point where if you ran 'vssadmin list writers', nothing would show up.
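A quick way to spot this condition between backups is to parse the output of 'vssadmin list writers'. The sketch below is an illustration under assumptions: it expects the usual English-language layout ("Writer name:", "State:", "Last error:" lines), the function names are hypothetical, and an empty result corresponds to the "nothing would show up" case described above.

```python
import re
import subprocess


def parse_writers(output):
    """Parse 'vssadmin list writers' text into a list of writer dicts."""
    writers = []
    current = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        name = re.match(r"Writer name: '(.*)'", line)
        if name:
            current = {"name": name.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy_writers(output):
    """Return writers whose state is not Stable or whose last error is set."""
    return [
        writer for writer in parse_writers(output)
        if "Stable" not in (writer["state"] or "")
        or writer["last_error"] != "No error"
    ]


def run_vssadmin():
    """Run vssadmin on a Windows machine (elevated) and return its output."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"], capture_output=True, text=True
    )
    return result.stdout
```

On the machine being checked, `unhealthy_writers(run_vssadmin())` lists the writers to look at, and an empty `parse_writers(run_vssadmin())` means no writers were reported at all.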

Again, this is just our findings so far, and while it seems to be working:
1. This is more of a workaround than a solution.
2. Not tested in the long haul yet.

If you end up trying it, let me know. Curious to see if this works for everyone or just us.
tj.kline
Lurker
 
Posts: 1
Liked: 1 time
Joined: Fri Jul 22, 2016 2:15 pm
Full Name: TJ Kline

Re: Why do Hyper-V VMs have so many snapshot issues ?

by Vitaliy S. » Mon Jul 25, 2016 11:08 am

Thanks for sharing this! Should be useful for future readers.
Vitaliy S.
Veeam Software
 
Posts: 19538
Liked: 1097 times
Joined: Mon Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Why do Hyper-V VMs have so many snapshot issues ?

by nmdange » Wed Jul 27, 2016 1:17 pm

Did you run that command on the hosts or the VMs?
nmdange
Expert
 
Posts: 191
Liked: 57 times
Joined: Thu Aug 20, 2015 9:30 pm

Re: Why do Hyper-V VMs have so many snapshot issues ?

by Vitaliy S. » Mon Aug 01, 2016 12:56 pm

It should be done on the host level (based on the description of the error above).
Vitaliy S.
Veeam Software
 
Posts: 19538
Liked: 1097 times
Joined: Mon Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

