Host-based backup of Microsoft Hyper-V VMs.
rleon
Enthusiast
Posts: 76
Liked: 9 times
Joined: Jun 15, 2017 8:10 am
Full Name: RLeon
Contact:

Strange issue where Hyper-V ver5 VMs running on a specific SMB3 storage will fail backups

Post by rleon »

Hi all,
We have already opened Case # 07050740, but I thought I'd start a thread here too to see if anyone else has encountered a similar issue.

The job failure message for the affected VMs is:


5:04:04 PM	Failed to create VM recovery checkpoint (mode: Hyper-V child partition snapshot) Details: Unknown status of async operation The shadow copy provider had an unexpected error while trying to process the specified operation. --tr:Failed to create VSS snapshot. --tr:Failed to perform pre-backup tasks. 	00:20
5:04:24 PM	Retrying snapshot creation attempt (Unknown status of async operation The shadow copy provider had an unexpected error while trying to process the specified operation. --tr:Failed to create VSS snapshot. --tr:Failed to perform pre-backup tasks.) 	
Environment:
Veeam version: 12.1 running on Windows Server 2022
Hyper-V version: Windows Server 2019

All hosts in the Hyper-V cluster store all VMs on one of two SMB3 paths:
\\NETAPP1\...
\\NETAPP2\...

These two SMB3 storage paths are not backed by Windows Servers; they are backed by two NetApp hardware NAS appliances.
Both NetApp NAS appliances are running ONTAP 9.8.
To the best of my knowledge, both NetApps are configured identically when it comes to SMB3/CIFS-related settings.

Problem:
  • Any VM still using Configuration Version 5 AND stored on \\NETAPP1 will get a backup error.
  • Any VM using Configuration Version 9 AND stored on \\NETAPP1 will back up successfully.
  • Both ver5 and ver9 VMs will back up successfully if they are stored on \\NETAPP2.
  • Even ver5 VMs that got backup errors while on \\NETAPP1 will back up successfully once storage-migrated to \\NETAPP2.
  • If those ver5 VMs are storage-migrated back to \\NETAPP1, they will get the backup error again.
  • If a ver5 VM on \\NETAPP1 is upgraded in place to ver9, its backup will then complete successfully.
Short summary: Only ver5 VMs stored on \\NETAPP1 get backup errors.

We have already tested and ruled out the following:
  • Enabling or disabling CBT in the Job settings does not affect the behavior.
  • Enabling or disabling "Allow processing of multiple VMs with a single volume snapshot" in the Job settings does not affect the behavior.
  • Enabling or disabling Hyper-V guest quiescence, with or without crash-consistent backup, in the Job settings does not affect the behavior.
  • The VM generation does not affect the behavior. It does not matter if the VM is Gen1 or Gen2.
  • Whether the VM is Windows or Linux does not affect the behavior.
  • Whether the VM is powered on or shut down does not affect the behavior.
  • Whether the VM is using .VHD or .VHDX virtual disks does not affect the behavior.
  • Which Hyper-V host in the cluster the VM is running on does not affect the behavior.
  • Whether the VM is a really old ver5 VM or a newly created ver5 VM does not affect the behavior.
  • All VMs can successfully create production checkpoints or standard checkpoints when this is done manually in Hyper-V's management interface.
We have narrowed it down to the following four test VMs to continue troubleshooting this issue:
veeantest1 - ver5 VM running on \\NETAPP1 (backup will always fail, unless storage-migrated to \\NETAPP2)
veeantest2 - ver9 VM running on \\NETAPP1 (backup will always succeed)
veeantest3 - ver5 VM running on \\NETAPP2 (backup will always succeed, unless storage-migrated to \\NETAPP1)
veeantest4 - ver9 VM running on \\NETAPP2 (backup will always succeed)
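
The pattern across the four test VMs can be restated as a small truth-table check. The Python sketch below only encodes the observations listed in this post and compares them against the "ver5 on \\NETAPP1" hypothesis; it does not talk to Hyper-V, Veeam, or the NetApps, and the names `TestVm`, `predicted_to_fail`, and `hypothesis_matches` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestVm:
    name: str
    config_version: int  # Hyper-V VM configuration version (5 or 9 here)
    share: str           # SMB3 path the VM is stored on
    backup_fails: bool   # outcome reported in this thread


# The four test VMs and their outcomes, copied from the list above.
OBSERVED = [
    TestVm("veeantest1", 5, r"\\NETAPP1", True),
    TestVm("veeantest2", 9, r"\\NETAPP1", False),
    TestVm("veeantest3", 5, r"\\NETAPP2", False),
    TestVm("veeantest4", 9, r"\\NETAPP2", False),
]


def predicted_to_fail(config_version: int, share: str) -> bool:
    """Hypothesis: only ver5 VMs stored on the NETAPP1 path fail."""
    return config_version == 5 and share == r"\\NETAPP1"


def hypothesis_matches(observations) -> bool:
    """True if the hypothesis agrees with every observed outcome."""
    return all(
        predicted_to_fail(vm.config_version, vm.share) == vm.backup_fails
        for vm in observations
    )


if __name__ == "__main__":
    for vm in OBSERVED:
        print(vm.name, "fails" if vm.backup_fails else "succeeds")
    print("hypothesis matches all observations:", hypothesis_matches(OBSERVED))
```

Neither factor alone predicts the failure: dropping either half of the condition in `predicted_to_fail` makes it disagree with at least one of the four VMs.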

Currently I'm in the process of collecting guest OS logs for support, which I thought was strange because this issue happens whether the guest OS is powered on or powered off. Anyhow, let's see where this leads us.

Re: Strange issue where Hyper-V ver5 VMs running on a specific SMB3 storage will fail backups

Post by rleon »

Update:
Very interesting development...
We created two new VMs, one ver5 and the other ver9.
This time, the new VMs were not given any virtual disks, so there is not even a guest OS.
We put both VMs' configuration files (their "essence", I suppose) on \\NETAPP1 and ran a backup job: the ver5 VM failed and the ver9 VM succeeded.
We then storage-migrated both VMs to \\NETAPP2 and ran the job again: both VMs succeeded this time.
We then storage-migrated both VMs back to \\NETAPP1 and ran the job a third time: the ver5 VM failed again and the ver9 VM still succeeded.

At least now we know this issue has nothing to do with a VM's virtual disks.
nmdange
Veteran
Posts: 527
Liked: 142 times
Joined: Aug 20, 2015 9:30 pm
Contact:

Re: Strange issue where Hyper-V ver5 VMs running on a specific SMB3 storage will fail backups

Post by nmdange »

VM configuration version 5 corresponds to Windows Server 2012 R2. It does not support RCT (Resilient Change Tracking, Hyper-V's native changed block tracking), and there's no reason to keep any VMs on such an old configuration version at this point. We upgraded all of our VMs to VM config version 8 the minute we migrated from Hyper-V 2012 to Hyper-V 2016.

Re: Strange issue where Hyper-V ver5 VMs running on a specific SMB3 storage will fail backups

Post by rleon »

I so wish I could upgrade all our ver5 VMs to ver9, but that's another red-tape battle for another time.
For as long as Veeam still supports ver5 VMs, I'm forced to find a solution to this problem...
(Veeam 12 supports Hyper-V virtual hardware versions 5.0 to 10.0: https://helpcenter.veeam.com/docs/backu ... ml?ver=120)

Re: Strange issue where Hyper-V ver5 VMs running on a specific SMB3 storage will fail backups

Post by rleon »

For anyone else facing this problem:
In the end, it was the share permissions (not NTFS ACL permissions, those were fine).
The SMB3 share that works for both ver5 and ver9 VMs has in its share permissions all the Hyper-V computer objects at Full Control, as well as "Everyone" also at Full Control.
The SMB3 share that only works for ver9 VMs also has all the Hyper-V computer objects at Full Control, but does not have "Everyone".

This was actually one of the very first things we discovered, before even opening the support case and before creating this thread.
Seeing that the working share has "Everyone" in its share permissions, we also added "Everyone" to the non-working share.
But backups of ver5 VMs still failed.
It turned out that a share permission change does not take effect until the SMB3 session between the Hyper-V host's computer object and the share is reset.
The problem is that, unlike normal share access to, say, a Word document, where the same connection session is rarely kept alive for long, a Hyper-V host's computer object keeps the same connection sessions going for as long as there are VMs still online.
That means the newly added "Everyone" share permission never gets to take effect between the Hyper-V host and the share.
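
The timing problem can be pictured with a toy model in Python. It assumes, purely for illustration, that a session captures the share permissions once when it is opened, which matches the behaviour described above but is not a statement about how SMB3 or ONTAP actually implements permission checks. The `Share` and `Session` classes, the share name, and the host account name are all hypothetical.

```python
class Share:
    """A share with a mutable set of principals granted Full Control."""

    def __init__(self, name, principals):
        self.name = name
        self.principals = set(principals)


class Session:
    """A long-lived connection to a share."""

    def __init__(self, share):
        self.share = share
        # Modelling assumption: permissions are snapshotted at connect time
        # and never re-read while the session stays open.
        self._granted = frozenset(share.principals)

    def sees(self, principal):
        return principal in self._granted


# Hypothetical share and Hyper-V host computer account.
share = Share("example-share", {"HYPERV-HOST$"})

# Session opened before the change and kept alive by running VMs.
old_session = Session(share)

# The admin adds "Everyone" at Full Control to the share permissions.
share.principals.add("Everyone")

# A fresh session, e.g. after the host reboots or via a second share.
new_session = Session(share)

print(old_session.sees("Everyone"))  # False
print(new_session.sees("Everyone"))  # True
```

In this model the only way for the host to pick up the new permission is to open a new session, which is what both workarounds below amount to.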

The following two workarounds worked for us:
1. After adding the "Everyone Full Control" share permission to the NetApp SMB3 share, live-migrate all VMs off a Hyper-V host and then reboot it. After the host reboots, the new share permissions will take effect on the host's new connection sessions. Live-migrate the VMs back to the host, and backups of ver5 VMs will now work.
2. If you don't want to, or cannot, live-migrate your VMs off a Hyper-V host, then on the NetApp, on the NTFS volume with the SMB3 share, create a second SMB3 share (i.e., two shares pointing to the same NTFS volume, and essentially to the same VM files). Give this new share the "Everyone Full Control" permission. Then, in Hyper-V, live-storage-migrate your VMs to this new share. Backups of ver5 VMs will now work. The funny thing about this method is that the VMs essentially get migrated in place inside the same actual storage volume, but Hyper-V thinks it's a different SMB3 storage altogether.

If any Microsoft and SMB3 experts know of ways to "refresh" the share permission changes online without rebooting the Hyper-V host or affecting running VMs, please share! (the pun)

Though we have the workaround, the question remains: why does Hyper-V need the "Everyone Full Control" share permission only when third-party backup software like Veeam attempts to back up ver5 VMs that are running on an SMB3 share? This is not needed when the VM is ver9.