Discussions specific to the Microsoft Hyper-V hypervisor
richw
Veeam ProPartner
Posts: 6
Liked: never
Joined: Aug 13, 2012 4:46 am
Full Name: Richard Warren
Location: Melbourne, VIC, AU

Why do Hyper-V VMs have so many snapshot issues ?

Post by richw » Oct 23, 2015 1:25 am

I'd like to start a general discussion to help in understanding the cause of snapshot issues and the behaviour of Veeam in such cases, and also to discuss methods and best practices for backing up VMs on Hyper-V.

I am managing 4 Hyper-V environments being backed up by Veeam 8, and the frequency of failures and warnings due to VSS issues is a constant pain to me.

I'd like to gain a better understanding of why VSS errors happen, how they disappear within 10 minutes and succeed on a retry, and why the same error sometimes causes a failure and other times only a warning.

Background:
My approach to backing up VMs on Hyper-V has been to not use application-aware processing unless I need to truncate logs in SQL or Exchange. This is to minimise the likelihood of VSS errors. So I have enabled Application Aware on the job and disabled it for the specific VMs that don't need it. That approach led to feedback from customers that certain VMs hang during backup. This was caused by the VM being suspended in Hyper-V, and led to a recommendation to enable Hyper-V guest quiescence and "Take crash consistent snapshot instead of suspending VM" in the Advanced settings of the job. That in turn causes VMs that would otherwise only get a Hyper-V level snapshot to also get a guest-level snapshot, which is a highly unreliable event.

So this brings me to the point of either having to troubleshoot VSS issues, or start breaking jobs out into groups based on common settings and common errors.
This is do-able but not ideal.
I'm wondering what others are doing in this situation, how do you set up Hyper-V jobs for best reliability and low maintenance?

Looking through the Action pane from the job History, it occurs to me that Veeam could be more specific in its snapshot/VSS error handling.
If any developers see this, would it be possible to add more detail in the Action pane? For example:

Example
Unable to create snapshot (Microsoft CSV Shadow Copy Provider) (mode: Hyper-V child partition snapshot). - Has this error come from the host or the guest? What does the mode mean? What combination of job settings caused this particular type of snapshot?
Details: Unknown status of async operation The shadow copy provider had an error. - What does that mean? Can a link be provided to a KB article?
Check the System and Application event logs for more information. - Could Veeam check the logs and provide the information?
--tr:Failed to create VSS snapshot.
--tr:Failed to perform pre-backup tasks.


Below is the same error from another VM in the same job. The VM below has Veeam App Aware enabled; the VM above does not. The job has Hyper-V guest quiescence enabled. The one below failed (then succeeded on Retry 1). The one above caused a warning, with indications it had failed, but it wasn't retried and did result in a restore point. This behaviour is confusing.
Could the actions pane contain more detail to explain the behaviour? Also what Veeam error handling rules have been applied to arrive at the job status?

Unable to create snapshot (Microsoft CSV Shadow Copy Provider) (mode: Veeam application-aware processing). Details: Unknown status of async operation
The shadow copy provider had an unexpected error while trying to process the specified operation.
--tr:Failed to create VSS snapshot.
--tr:Failed to perform pre-backup tasks.

foggy
Veeam Software
Posts: 18257
Liked: 1560 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by foggy » Oct 23, 2015 2:24 pm

Richard, details are typically contained in the job logs. Have you ever contacted technical support about those VSS issues? I'm sure they can assist in pinpointing the actual reason for the failure in each case. In the examples you've provided, the failure message comes from the VSS provider, not from a Veeam B&R component, so troubleshooting should start with identifying the conditions that caused it (most likely environmental).

"mode: Hyper-V child partition snapshot" means that you have application-aware image processing disabled and Hyper-V guest quiescence enabled. The fact that the job failed in the second case is explained by the "Require successful processing" setting for the application-aware image processing that you most likely have configured.
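As a rough illustration of the mapping described above, the sketch below pulls the provider and mode out of an error line like the ones quoted in this thread and reports which job settings that mode corresponds to. The regular expression, the function name `describe_snapshot_error` and the lookup table are assumptions made for illustration; they are not part of Veeam B&R, and only the two modes explained in this thread are included.

```python
import re

# Mode -> job settings, taken only from the explanations in this thread.
# Any mode not listed here is reported as unknown rather than guessed.
MODE_SETTINGS = {
    "Hyper-V child partition snapshot":
        "application-aware image processing disabled, Hyper-V guest quiescence enabled",
    "Veeam application-aware processing":
        "application-aware image processing enabled",
}

# Matches the prefix of the error lines quoted in the posts above.
ERROR_RE = re.compile(
    r"Unable to create snapshot \((?P<provider>[^)]*)\) \(mode: (?P<mode>[^)]*)\)"
)


def describe_snapshot_error(line):
    """Return provider, mode and implied job settings, or None if no match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    mode = m.group("mode")
    return {
        "provider": m.group("provider"),
        "mode": mode,
        "settings": MODE_SETTINGS.get(mode, "unknown (not covered in this thread)"),
    }


if __name__ == "__main__":
    sample = ("Unable to create snapshot (Microsoft CSV Shadow Copy Provider) "
              "(mode: Hyper-V child partition snapshot). "
              "Details: Unknown status of async operation")
    print(describe_snapshot_error(sample))
```

A mode that is not in the table, such as one with a "with failover" suffix, comes back as unknown instead of being matched to the nearest entry, since this thread does not explain it.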

Gostev
SVP, Product Management
Posts: 24789
Liked: 3523 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by Gostev » Oct 23, 2015 3:23 pm

richw wrote:Why do Hyper-V VMs have so many snapshot issues ?
Because the Hyper-V VSS backup framework is generally not very reliable. I believe at some point (not long ago) it accounted for about 30% of all Hyper-V support cases for Microsoft... This is the main reason why Microsoft is completely revamping the VM backup API in Windows Server 2016.

That said, fine-tuning usually helps solve most of those intermittent issues, so I suggest you work with some T3 experts in our support to tweak a few things around your deployment.

jshiflet
Influencer
Posts: 13
Liked: never
Joined: Sep 17, 2012 6:35 pm
Full Name: Jason Shiflet
Location: Dallas, TX

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by jshiflet » Oct 27, 2015 12:03 am

I'll add that Hyper-V VSS errors are a constant headache for me. I've got so many VMs that constantly give me warnings because the VSS snapshot process couldn't run properly, many of them SQL Server systems, so logs never get truncated unless I do so manually. I know full well it's not a Veeam issue because I've run through it with Veeam support before, but I thought I'd add my $0.02 to the conversation.

For us, about 95% of the time, the error is:

Code: Select all

Unable to create snapshot (Microsoft CSV Shadow Copy Provider) (mode: Veeam application-aware processing with failover). Details: Writer 'Microsoft Hyper-V VSS Writer' is failed at 'VSS_WS_FAILED_AT_PREPARE_SNAPSHOT'.
The writer experienced a non-transient error.  If the backup process is retried,
the error is likely to reoccur.
--tr:Failed to verify writers state.
--tr:Failed to perform pre-backup tasks.

With this immediately following:

Code: Select all

Make sure VM does not have 'iSCSI Software Target Storage Provider' feature installed.

Which is wholly fun because we don't use Microsoft's iSCSI Software Target Provider ANYWHERE in our systems. At all. We use the initiator on the Host systems, but not in the VMs. None of our virtual machines use anything iSCSI at all.

When I went through it with Veeam support, we never really got anywhere with it other than "it's probably a bug with Hyper-V's VSS management". The end-result suggestion was "recreate the VM metadata": basically create a new VM and attach the existing VHDs to the new VM. Sometimes that worked, sometimes it didn't. And the times that it did work, it worked without issue for maybe a week or two before going back to throwing warnings again.

Now that it's been ongoing for probably close to 2 years, I've found some consistency in which machines will almost inevitably have this problem. They usually don't start off with the issue, but at some point, weeks or months down the road, it rears its ugly head.

These are the consistent factors:
  • Version 1 VMs
  • Running Windows Server 2008 R2
  • Running antivirus (we use ESET File Security exclusively, but I've tested with Webroot SecureAnywhere Business Endpoint Protection and had the same result)
Although that last one, antivirus, is only semi-consistent. If the problem happens on a system that has had antivirus installed on it at ANY TIME in the past, it'll keep doing it. A fresh VM likely won't ever do it so long as you NEVER load antivirus on it. Uninstalling typically doesn't fix it. And even with that, running a Windows system without antivirus is just asking for trouble.

I'd honestly love to be able to use VMware rather than Hyper-V in our environment, but it's just not affordable for us yet, and manageability-wise, we're a 100% Microsoft shop, so getting people to learn and become familiar with anything different is a nightmare.

I'm glad to hear that Microsoft is revamping the Hyper-V Backup API in Server 2016. I really, REALLY hope that makes it more robust and fixes a lot of these types of issues.
--
jason shiflet
systems engineer

tj.kline
Lurker
Posts: 1
Liked: 1 time
Joined: Jul 22, 2016 2:15 pm
Full Name: TJ Kline

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by tj.kline » Jul 22, 2016 2:36 pm

Sorry to bring back an old post, but I thought I would share my findings as I had the same issues as @jshiflet.

Same error, same 'but we're not using iSCSI' BS, but different OSes. We're getting this error on a cluster where both the host and guest are 2012 R2, running version 2 VMs. It happens only on application-aware servers, and in this case they are a SharePoint 2013 farm with SQL 2014. We tried recreating the config file as you stated without luck, as well as putting the VMs on a totally different cluster (one that's not having the issue, although another instance of the same setup is on it) and still had the same issue. Took antivirus out of the equation, as well as GFI. Still no luck.

We are currently working with both MS and Veeam support on this issue, and the only glimmer of hope we have is running a

Code: Select all

sc config "BITS" type= own

command on all of the VSS writers (found at https://www.veeam.com/kb2041) to basically put each of them in a separate instance of svchost.exe. We've been running about 5 days of backups every half hour without a failure yet. We used to get about 6-7 backups before it would fail. Funny thing: we found that on our own. MS is flabbergasted that it's working.
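If the same change has to be applied to several services, the command lines can be generated rather than typed by hand. The sketch below only builds the command strings; it does not run anything. `BITS` is the one service name given in this post, any other names would have to come from the KB article linked above, and the helper name `build_sc_commands` is made up for illustration. Note that `sc` expects the space after `type=`.

```python
def build_sc_commands(services, svc_type="own"):
    """Return one 'sc config' command line per service name.

    svc_type="own" gives each service its own svchost.exe process;
    "share" is the sc.exe value for a shared process.
    """
    if svc_type not in ("own", "share"):
        raise ValueError("svc_type must be 'own' or 'share'")
    # sc.exe requires the space between 'type=' and the value.
    return ['sc config "{}" type= {}'.format(name, svc_type) for name in services]


# Only BITS is named in this thread; extend the list from the KB article.
services = ["BITS"]
for cmd in build_sc_commands(services):
    print(cmd)
```

Keeping the generation separate from execution makes it easy to review the exact commands, and to produce the `type= share` variants if the change needs to be undone.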

Before the VMs would start failing, we would get an application error in the logs that said something along the lines of 'Faulting application name: svchost.exe'. We came to the conclusion that one of the writers (or another application using svchost.exe) would kill it to the point it couldn't restart, and all the VSS writers would be in a failed state, to the point where if you ran 'vssadmin list writers', nothing would show up.
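To spot this condition without reading the output by eye, the text printed by `vssadmin list writers` can be parsed. The sketch below is a minimal parser that assumes the usual English layout of that output (`Writer name:`, `State:` and `Last error:` lines). The sample text is illustrative, not captured from the systems discussed here, and the function names are made up. An empty result corresponds to the "nothing would show up" case.

```python
import re


def parse_writers(output):
    """Parse 'vssadmin list writers' text into a list of dicts."""
    writers = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        m = re.match(r"Writer name: '(.*)'$", line)
        if m:
            current = {"name": m.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line[len("State:"):].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line[len("Last error:"):].strip()
    return writers


def unhealthy_writers(writers):
    """Writers whose state is not Stable or whose last error is set."""
    return [w for w in writers
            if w["state"] is None
            or "Stable" not in w["state"]
            or w["last_error"] != "No error"]


# Illustrative sample only (assumed layout, not real output from this thread).
SAMPLE = """\
Writer name: 'Microsoft Hyper-V VSS Writer'
   State: [8] Failed
   Last error: Non-retryable error

Writer name: 'BITS Writer'
   State: [1] Stable
   Last error: No error
"""

writers = parse_writers(SAMPLE)
if not writers:
    print("No writers listed")
for w in unhealthy_writers(writers):
    print(w["name"], "|", w["state"], "|", w["last_error"])
```

On a real system the input would be the captured output of the command rather than `SAMPLE`; on a non-English Windows installation the line labels would differ and the parser would need adjusting.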

Again, these are just our findings so far, and while it seems to be working:
1. This is more of a workaround than a solution.
2. It has not been tested over the long haul yet.

If you end up trying it, let me know. Curious to see if this works for everyone or just us.

Vitaliy S.
Product Manager
Posts: 22984
Liked: 1556 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by Vitaliy S. » Jul 25, 2016 11:08 am

Thanks for sharing this! Should be useful for future readers.

nmdange
Expert
Posts: 469
Liked: 113 times
Joined: Aug 20, 2015 9:30 pm

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by nmdange » Jul 27, 2016 1:17 pm

Did you run that command on the hosts or the VMs?

Vitaliy S.
Product Manager
Posts: 22984
Liked: 1556 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Why do Hyper-V VMs have so many snapshot issues ?

Post by Vitaliy S. » Aug 01, 2016 12:56 pm

It should be done on the host level (based on the description of the error above).
