TL;DR - Questions are at the bottom, but in short: if a VM is rebooted during a backup (i.e., while the VM is running on a snapshot), can that cause the snapshot to become orphaned? I would think not, but I'm seeing evidence to the contrary.
Long version: I know some of this isn't Veeam-specific, but rather ESXi behavior. The incident happened during a Veeam job, though, so I figured I might not be the only one to have seen it. I'm not pointing fingers at Veeam or ESXi - I know there was a lot of human error here. Here's the story.
We had a client take a snapshot on their production Exchange server a month ago, and they forgot about it. Backups had been running fine since then, three times a day, every day, so no alarms were raised (and they were not using Veeam ONE, or we would have known about the snapshot). Today they were doing maintenance on the VM, and they rebooted it during a Veeam backup - so the VM had a second active snapshot during the reboot.
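For what it's worth, the forgotten-snapshot part of this is easy to catch with a scheduled script even without Veeam ONE. Here's a minimal sketch in Python of the age check itself; the shape of the nodes matches what pyVmomi exposes in a VM's snapshot tree (each node has `name`, `createTime`, and `childSnapshotList`), but the 7-day threshold and any connection details are my own assumptions, not something from our environment:

```python
from datetime import datetime, timedelta, timezone

def old_snapshots(snapshot_nodes, max_age_days=7, now=None):
    """Walk a VM snapshot tree and return (name, age_in_days) for every
    snapshot older than max_age_days. Snapshots can nest, so recurse."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for node in snapshot_nodes:
        age = now - node.createTime
        if age > timedelta(days=max_age_days):
            stale.append((node.name, age.days))
        # child snapshots live under the parent node, not at top level
        stale.extend(old_snapshots(node.childSnapshotList, max_age_days, now))
    return stale
```

With pyVmomi you would feed this `vm.snapshot.rootSnapshotList` for each VM whose `vm.snapshot` is not None; a one-month-old snapshot like ours would have shown up in the report on day eight.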
The VM booted up, but shortly afterward it stalled completely - it was unmanageable from ESXi, had no network connectivity, etc. We logged into vCenter, which showed that a snapshot consolidation had been triggered and that the VM needed consolidation. Wondering if Veeam had triggered a snapshot, I went to open the console locally on our backup server (we are on 9.5 U4), and it sat at the loading screen. And sat there. For about 30 minutes. Usually the console opens within 30 seconds. I checked the backup repository and saw that Veeam was actively modifying the backup files at that point in time.
After 30 minutes, the console finally opened, and as we suspected, our backup job was running, but it was now on the very last step, merging backup files. Upon inspecting the job statistics we had these two lines:
2/20/2019 3:14:54 PM :: Removing VM snapshot (this was there for 33 minutes and 33 seconds)
2/20/2019 3:48:28 PM :: SERVER NAME has stuck VM snapshot, will attempt to consolidate periodically
Suddenly, as soon as our backup console opened, the Exchange server became responsive again - network connectivity returned, mail started flowing, etc. However, vCenter events now showed that the snapshot consolidation had failed, and the VM's console was still unavailable (the VM had that corrupted-icon overlay). We got in touch with VMware support, and over SSH they could see that the merge of the gigantic month-old snapshot was actually still happening in the background, despite the GUI message that it had failed.
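For anyone else stuck in the same spot: you can often tell whether a merge is still grinding away in the background without support on the line, by listing the VM's directory under /vmfs/volumes over SSH a couple of minutes apart and comparing delta-disk sizes - deltas shrinking or disappearing (while the flat/base disks grow) means the consolidation is progressing regardless of what the GUI says. A tiny sketch of that comparison (the filenames are illustrative, and how you capture the listings is up to you):

```python
def consolidation_progress(before, after):
    """Compare two successive listings of a VM directory's file sizes
    ({filename: bytes}) taken a minute or two apart, and return a
    rough verdict on whether a snapshot merge appears to be running."""
    removed = sorted(set(before) - set(after))      # delta disks that vanished
    grew = sorted(f for f in after if after[f] > before.get(f, 0))
    if removed:
        return "progressing: removed " + ", ".join(removed)
    if grew:
        return "active I/O on " + ", ".join(grew)
    return "no visible change"
```

In our case, watching the `-delta.vmdk` files would have shown exactly what VMware support confirmed: the GUI said "failed" while the files kept merging.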
To recap, here are the facts:
- We had a VM with a large, month-old snapshot that had been backing up fine the whole time.
- That VM was in the middle of a backup (and therefore had a second snapshot) when the guest was rebooted.
- The VM and the Veeam console became unresponsive at the same time.
- The VM and the Veeam console became responsive again at the same time.
- The job stats show a consolidation was attempted for about the same length of time that the VM was stalled.
Here's what I'm assuming happened:
1. The VM rebooted during the backup job, somehow causing the backup snapshot to become orphaned.
2. At the end of the job, when Veeam's "Snapshot Hunter" ran, it found the freshly orphaned snapshot. (It had never triggered on the old snapshot over the previous ~90 runs, which makes sense, since that one was not orphaned.)
3. Veeam told ESXi to perform a "Delete all snapshots" on the VM.
4. Due to some unknown issue with the very large first snapshot, this froze the VM, causing it to hang.
5. After 30 or so minutes, the attempt timed out, unfreezing the guest and producing the snapshot consolidation failure in the GUI.
6. Veeam's Snapshot Hunter then tried to consolidate again (maybe using the hard consolidation method? I know it tries three times, with a different method each time).
7. One of those later consolidation attempts succeeded, which led to VMware support seeing the (ongoing) successful merge via SSH.
There were failures on many parts here:
1. Human error - snapshots were forgotten about, and the VM was rebooted during a backup. Yes, I know Veeam ONE would have helped here.
2. Veeam error - the backup console would not load during the snapshot merge operation; as soon as the merge finished, the console opened right up. This caused additional panic, because we thought we might need to restore from backups and couldn't access them.
3. ESXi error - A snapshot merge operation caused a VM to totally stall and lose management for over half an hour.
Here are my questions:
1. What are the implications of a guest OS rebooting during a backup?
A) I *feel* like it shouldn't matter (except that application-aware processing will fail). Why would a guest-level operation matter to the hypervisor? Is there some tie-in I'm not aware of?
B) This experience, plus past experiences where rebooting a guest during a backup coincided with CBT corruption, is making me wonder what I'm not understanding about rebooting guest VMs mid-backup.
2. It seems the VM's stall was also tied to the Veeam console stalling (I was unable to get past the loading screen when launching the console locally on our Veeam server). How can that happen?
3. Has anyone seen this behavior before?
I know this is a lot of information. I'd rather give too much than too little. Can anyone provide insight here? I'm planning on opening a case tomorrow, but wanted to see what the great minds in the forums had to say. I am sure that one or more of my "assumptions" must be wrong. Please correct me so that I can fully understand what is happening.
Thanks in advance,
Cory