Host-based backup of Microsoft Hyper-V VMs.
Eugen Fournes
Novice
Posts: 4
Liked: 2 times
Joined: Sep 06, 2019 4:10 pm
Full Name: Eugen Fournes

Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by Eugen Fournes »

Keywords: Server 2019 Hyper-V checkpoint stuck broken Veeam backup AVHDX

Hopefully that's enough for our friend Google to find this.

WARNING: Some of these instructions are potentially harmful and can cause data loss, even if they are followed correctly. Use at your own risk. If the VM is running and it has critical data that has not been backed up, use another method (e.g. Windows Server Backup, manual copy, etc. from within the VM) to back it up somewhere else before following these instructions. You've been warned.

Problem: The backup completes normally once, then every later run fails because new checkpoints can't be created. The first backup left behind checkpoints that are only halfway merged.

Cause: The default permissions that Hyper-V assigns to the checkpoint files aren't sufficient for the checkpoint service to delete the old files, so it crashes after completing the merge on the first VHDX file. (Root cause: when setting up the folder structure for the VM storage in a non-default location, Hyper-V isn't smart enough to adjust permissions on the folders.)

Good solution: Give each VM's own security principal (the "NT VIRTUAL MACHINE\<VM GUID>" "account") FULL ACCESS rights to the folders where its VHDX files are stored. (It should have full access to the entire VM.) In PowerShell started with Run As Administrator:

First, get the GUID of the VMs on your Hyper-V server. (You can also get these from the permissions on the "stuck" AVHDX files.)

  get-vm | fl name, id

Output Example:

    Name : MyVM
    Id   : d3599536-222a-4d6e-bb10-a6019c3f2b9b
 
    Name : TheirVM
    Id   : a0af7903-94b4-4a2c-b3b3-16050d5f80f2
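
If you have a lot of VMs, it can help to turn that output into a name-to-GUID lookup and catch copy/paste mistakes (a GUID missing a character, say) before running any icacls commands. A minimal Python sketch, assuming you've saved the output of `get-vm | fl name, id` as text; `parse_vm_list` is a hypothetical helper, not part of any Hyper-V tooling:

```python
import uuid


def parse_vm_list(text: str) -> dict[str, str]:
    """Parse 'Name : X' / 'Id : Y' pairs from Format-List output into {name: guid}."""
    vms: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or unrelated line
        key, value = key.strip(), value.strip()
        if key == "Name":
            name = value
        elif key == "Id" and name is not None:
            # uuid.UUID raises ValueError on a malformed GUID (e.g. a truncated copy/paste)
            vms[name] = str(uuid.UUID(value))
            name = None
    return vms
```

With the example output above, this returns {"MyVM": "d3599536-...", "TheirVM": "a0af7903-..."} with the full GUIDs.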

Second, grant that GUID full access to the VM's folder hierarchy:

  icacls "<folder containing the VHDX files>" /grant "NT VIRTUAL MACHINE\<VM GUID>":(OI)F

Example:

    icacls "E:\Hyper-V\Virtual Machines\MyVM" /grant "NT VIRTUAL MACHINE\d3599536-222a-4d6e-bb10-a6019c3f2b9b":(OI)F
    icacls "Q:\Hyper-V\Virtual Machines\TheirVM" /grant "NT VIRTUAL MACHINE\a0af7903-94b4-4a2c-b3b3-16050d5f80f2":(OI)F
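
If you have more than a couple of VMs, you may prefer to generate these commands and review them before pasting them into an elevated prompt. A rough Python sketch; `icacls_grant_command` is a hypothetical helper, and the folder is whatever you used for that VM's storage. It only builds the command text in the same form as the examples above and doesn't run anything:

```python
import uuid


def icacls_grant_command(vm_folder: str, vm_id: str) -> str:
    """Build the icacls command granting a VM's virtual account access to its folder."""
    guid = str(uuid.UUID(vm_id))  # raises ValueError if the GUID is malformed
    # (OI) = object inherit, F = full access, same flags as the manual example
    return f'icacls "{vm_folder}" /grant "NT VIRTUAL MACHINE\\{guid}":(OI)F'
```

For example, `icacls_grant_command(r"E:\Hyper-V\Virtual Machines\MyVM", "d3599536-222a-4d6e-bb10-a6019c3f2b9b")` reproduces the first example command above.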

Bad solution: If you're in a hurry, like I was, just set the VM folders' permissions to grant FULL ACCESS to "Everyone". Then, after everything's fixed up, set the per-VM permissions as above.


To fix the broken checkpoint merges:

First, set the permissions as detailed above. Then shut down the virtual machine. If you're lucky, the merge should begin after a moment and complete successfully. It might go by quickly if the AVHDX files aren't very large. Check the folder to see whether the checkpoint files are gone. If they are, the VM is good to go.
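
Checking the folder by hand is fine for one VM. For several, a small script can list any checkpoint files still left under each VM's folder. A minimal Python sketch, assuming the checkpoint files carry the .avhdx extension as in the examples in this post; `leftover_checkpoints` is a hypothetical helper:

```python
from pathlib import Path


def leftover_checkpoints(vm_folder: str) -> list[str]:
    """Return the sorted names of any .avhdx files anywhere under vm_folder."""
    return sorted(
        p.name
        for p in Path(vm_folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".avhdx"
    )
```

An empty list for `leftover_checkpoints(r"E:\Hyper-V\Virtual Machines\MyVM")` means no checkpoint files remain under that folder.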

If the merge doesn't happen automatically, or seems to get stuck partway through, start up the VM again. If the merge still doesn't complete on its own, you'll have to merge it manually.

Manual merge: *** USE WITH CAUTION ***

Shut down the VM once more.

Open PowerShell with Run As Administrator.

Run the command:

  Merge-VHD -Path <checkpoint file path> -Destination <parent file path>

CAUTION: IF THERE IS MORE THAN ONE LEVEL OF CHECKPOINT, YOU NEED TO MERGE THE CHECKPOINT FILES IN THE CORRECT ORDER OR YOU *WILL* LOSE DATA!

Find out which checkpoint is the parent of which by using the "Inspect" option on the virtual disk in the VM's settings in Hyper-V. Start with the most recent checkpoint file; that's likely the one showing as the active virtual disk.
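
The ordering rule boils down to: always merge a checkpoint before you merge its parent. If you write down each disk's parent from "Inspect", the order falls out of walking the chain from the active disk back to the base VHDX. A minimal Python sketch; `merge_plan` and the shortened file names are hypothetical, and each pair it returns is the -Path and -Destination for one Merge-VHD run:

```python
def merge_plan(active_disk: str, parent_of: dict[str, str]) -> list[tuple[str, str]]:
    """Return (Path, Destination) pairs for Merge-VHD, newest checkpoint first.

    parent_of maps each checkpoint file to its parent disk, as shown by "Inspect".
    """
    plan: list[tuple[str, str]] = []
    seen = {active_disk}
    current = active_disk
    while current in parent_of:  # stop once we reach the base disk (no parent)
        parent = parent_of[current]
        if parent in seen:
            raise ValueError(f"parent chain loops back to {parent}")
        plan.append((current, parent))
        seen.add(parent)
        current = parent
    return plan
```

For the two-level example that follows, the plan is: merge the 1508F969 checkpoint into 295A473C, then 295A473C into MyVM-Disk1.vhdx.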

Example (merge one checkpoint into another and then into the main VM file):

  Merge-VHD -Path "E:\Hyper-V\Virtual Machines\MyVM\Virtual Hard Disks\MyVM-Disk1_1508F969-7684-48DC-9B9C-2DC6C5A3CEB7.avhdx" -Destination "E:\Hyper-V\Virtual Machines\MyVM\Virtual Hard Disks\MyVM-Disk1_295A473C-1134-37F7-3783-B3A7820D478F.avhdx"

  Merge-VHD -Path "E:\Hyper-V\Virtual Machines\MyVM\Virtual Hard Disks\MyVM-Disk1_295A473C-1134-37F7-3783-B3A7820D478F.avhdx" -Destination "E:\Hyper-V\Virtual Machines\MyVM\Virtual Hard Disks\MyVM-Disk1.vhdx"  

If this gives you a "file in use" error, check Hyper-V again: it might have started a merge on its own. If so, let it run.
If this gives you a permissions error, check that the files have the correct permissions as detailed above, and that the Administrators group also has access to them.
If this gives you some other error, sorry, that's outside the scope of this document.

After they're merged, go back into Hyper-V and remove the entire virtual disk from the VM (if you try to replace the existing one, Hyper-V still thinks there's a checkpoint pending). Click "Continue" on the warning about checkpoints. Re-add the base disk to the VM. Repeat for any additional virtual disks on that VM.

After starting up the VM again, you may have to bring the second (third, fourth, etc.) disks online manually within the VM, using Administrative Tools -> Computer Management -> Disk Management. Everything should be back to normal after that.


References:

https://support.microsoft.com/en-ca/hel ... ack-up-vms
https://docs.microsoft.com/en-us/powers ... w=win10-ps


Note: I didn't bother opening a Veeam support case because this issue isn't specific to Veeam but to Hyper-V. It affects Veeam and any other backup software that uses Hyper-V checkpoints, so I figured I'd post the solution here.
HannesK
Product Manager
Posts: 14316
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by HannesK »

Hello,
and welcome to the forum. Thanks for sharing with the community.

Best regards,
Hannes
Eugen Fournes
Novice
Posts: 4
Liked: 2 times
Joined: Sep 06, 2019 4:10 pm
Full Name: Eugen Fournes

Re: NOT Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by Eugen Fournes »

<sigh> I'm afraid I jumped the gun. After a few days, the problem came back.

All permissions are as recommended, and we're still getting the orphaned checkpoint files.

Time to open a ticket with Veeam and probably Microsoft... I'll keep everyone informed how it goes.
Korbman
Lurker
Posts: 1
Liked: 1 time
Joined: Oct 16, 2013 8:40 pm

Re: Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by Korbman »

I had this exact issue, and similarly granting either each VM or 'Everyone' rights over the Hyper-V folders didn't work. For a while, our solution was to shut down each VM and then restart the 'Hyper-V Virtual Machine Management' (VMMS) service, which would merge the checkpoints. This would last a day or two before the issue returned.

Digging into the logs, we found that Veeam had logged: "NT Virtual Machine\Virtual Machines group on <hyper-v server> does not have the Log on as a Service right. This may prevent backup checkpoints from merging. Please refer to Microsoft KB article 2779204."

As it turns out, this *was* the case for us: a modification to the default domain policy had changed the 'Log on as a Service' rights for all servers. Reverting this change and forcing a 'gpupdate' on the Hyper-V host immediately (and automatically) granted "NT Virtual Machine\Virtual Machines" the Log on as a Service right. We've now gone several days without any issues in Veeam or with the backups or checkpoints, so it looks like this may have been our root cause.
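
One way to confirm whether the group currently holds that right on a host is to export the user-rights assignments from an elevated prompt with `secedit /export /cfg rights.inf /areas USER_RIGHTS` and look for the group's SID on the `SeServiceLogonRight` line. A minimal Python sketch of that check, assuming the SID of "NT VIRTUAL MACHINE\Virtual Machines" is `S-1-5-83-0` (the well-known value as far as I know; verify it on your own host) and that the exported file is UTF-16, as secedit normally writes it. The function names are hypothetical:

```python
from pathlib import Path

# Assumed SID of the "NT VIRTUAL MACHINE\Virtual Machines" group; verify on your host.
VIRTUAL_MACHINES_SID = "S-1-5-83-0"


def has_service_logon_right(inf_text: str, sid: str = VIRTUAL_MACHINES_SID) -> bool:
    """Check a secedit USER_RIGHTS export for `sid` on the SeServiceLogonRight line."""
    for line in inf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "SeServiceLogonRight":
            # Entries look like "*S-1-5-80-0"; the leading * marks a SID.
            return sid in (entry.strip().lstrip("*") for entry in value.split(","))
    return False  # no SeServiceLogonRight line: nobody holds the right in this export


def check_export(path: str) -> bool:
    # secedit normally writes the .inf file as UTF-16.
    return has_service_logon_right(Path(path).read_text(encoding="utf-16"))
```

If the policy fix has applied, `check_export(r"C:\temp\rights.inf")` on a fresh export should return True; False suggests a group policy is still overriding the default.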
Eugen Fournes
Novice
Posts: 4
Liked: 2 times
Joined: Sep 06, 2019 4:10 pm
Full Name: Eugen Fournes

Re: Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by Eugen Fournes »

@Korbman: Thank you very much for sharing this!

It just goes to show that Windows security and permissions have gotten so complex that even Microsoft can't keep track of them anymore.

I found similar group policy entries, both in the default domain policy and in another policy that propagated from the top level. I removed both and ran "gpupdate /force". Not only did the local version of the policy on the Hyper-V server revert to the default (which includes the "Virtual Machines" group), but that entry also became editable in the local policy. While a setting is defined in group policy, it can't be modified locally.

Anyway, I expect that this will solve the problem here too, but it'll be a few days before we know for certain.

-----

To continue the saga from a couple of weeks ago: I contacted Microsoft support and they pretty much blew me off. They said that if I can manually create and delete a checkpoint, then there's nothing wrong with the Hyper-V system because Veeam and the other backup software use exactly the same function calls as the Hyper-V Management interface.

A couple of days later, I did manage to get a manual checkpoint create/delete to fail "on demand", but it required leaving the system running without restarting the Hyper-V service for over 24 hours. That led to discovering a work-around for the problem.

It seems that whatever causes the permissions issue (likely the group policy) gets corrected temporarily whenever the Hyper-V service is restarted. So I created a batch file that stops and starts the Hyper-V Virtual Machine Management (VMMS) service and added it as a pre-job script to each of the backup jobs. Since then, all backup jobs have run smoothly for a week, and all checkpoints have merged properly.

Batch file contents:

  net stop VMMS
  net start VMMS

To add the pre-execution script: Open the existing backup job, go to the Storage options, click Advanced, click the Scripts tab, checkmark "Run the following script before the job" and browse for the batch file. Click OK, etc. until the modified job is saved. Repeat for all jobs on that server.

Warning: This isn't a good solution. If a backup job restarts the Hyper-V service while something else that depends on the service (such as another backup job) is running, it will disrupt that other operation. But it was a viable workaround in my case. Hopefully the above policy changes will fix the problem properly without breaking anything else.
tomaskalabis
Novice
Posts: 5
Liked: never
Joined: Oct 19, 2016 6:36 pm
Full Name: Tomas Kalabis

Re: Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by tomaskalabis »

Hi, I have a similar problem at one of my customers.
VBR 11 and WS2019 Hyper-V with the latest updates...

https://tomaskalabis.com/wordpress/micr ... hdx-files/
HannesK
Product Manager
Posts: 14316
Liked: 2890 times
Joined: Sep 01, 2014 11:46 am
Full Name: Hannes Kasparick
Location: Austria

Re: Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by HannesK »

What did support say about the issue, @tomaskalabis? Could you please provide the support case number? Doing a restore with the agent sounds like a "creative" workaround ;-)
Monty
Lurker
Posts: 1
Liked: never
Joined: Feb 23, 2022 10:35 pm
Full Name: monty thompson

Re: Solved: Server 2019 Hyper-V new checkpoint stuck issue

Post by Monty »

Hi all,
thanks for this thread, it was extremely helpful.

I also had a client with two hosts whose Hyper-V guests weren't being backed up. Here's what fixed it:

Installed GPMC (Group Policy Management Console) on the member server from Features
Opened the policy affecting the server in question
Added NT Virtual Machine\Virtual Machines to the "Log on as a service" right
Ran gpupdate /force on the servers in question

Everything is running perfectly now.