Felix wrote: So you are saying this is a bug in vSphere not in Veeam?
Yes, our product does not create this snapshot; it is created automatically by the ESX host during the snapshot management process. Whether the job fails due to an error or is stopped manually, Veeam Backup goes through the same (and only) cleanup procedure, which removes the snapshot we created during backup ("VEEAM BACKUP TEMPORARY SNAPSHOT"). You can easily confirm this by watching that snapshot appear first and then disappear.
Now, when the last snapshot is being removed from the VM, the ESX host creates a "consolidate helper" snapshot to host the data writes while the actual snapshot is being removed. Once that is done, ESX commits the "consolidate helper" snapshot itself into the main VMDK. Because VM I/O must be completely frozen in order to commit the last helper snapshot (for obvious reasons), the commit can only take place if both of these conditions are true:
- The helper snapshot size is less than 16MB (which is the minimum snapshot size in VMware)
- There is very little write I/O going on in the VM at the given moment
If either of these is not true, ESX will wait, iteratively creating new helper snapshots to host writes while committing the old ones (remember, it needs the smallest possible snapshot before the final commit), until there is a "good moment" to freeze the VM and commit the last snapshot. This process may obviously take quite a long time, depending on the initial "consolidate helper" snapshot size (which in turn is mostly defined by the VEEAM snapshot size: if the VEEAM snapshot is large, the "consolidate helper" will also grow large while the VEEAM snapshot is being committed), as well as on datastore and VM I/O load.
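To put rough numbers on the loop described above, here is a toy model in Python. It is only a sketch under simplifying assumptions, not how ESX actually schedules the commit: the function name, the constant write and commit rates, and the round limit are all made up for illustration, and the "very little write I/O" condition for the final freeze is not modeled at all.

```python
def simulate_consolidation(initial_helper_mb, write_mb_s, commit_mb_s,
                           threshold_mb=16.0, max_rounds=100):
    """Toy model of iterative helper-snapshot commits (hypothetical, not ESX code).

    Each round commits the current helper while a new helper collects the
    writes that arrive during that commit. Returns (rounds, seconds) once the
    helper is at or below the threshold, or None if it never gets there
    within max_rounds.
    """
    size = float(initial_helper_mb)
    elapsed = 0.0
    rounds = 0
    while size > threshold_mb:
        if rounds >= max_rounds:
            return None  # writes arrive as fast as they are committed
        commit_time = size / commit_mb_s   # seconds to commit the old helper
        elapsed += commit_time
        size = write_mb_s * commit_time    # new helper grew during the commit
        rounds += 1
    return rounds, elapsed


if __name__ == "__main__":
    # 1 GB helper, 5 MB/s of guest writes, 50 MB/s commit throughput (assumed numbers)
    print(simulate_consolidation(1024, 5, 50))
    # Writes as fast as commits: the helper never shrinks in this model
    print(simulate_consolidation(1024, 50, 50))
```

In this model each new helper is smaller than the previous one by the ratio of write rate to commit rate, which is why a larger initial helper or a busier VM means more rounds and a longer wait.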
And this actually explains why you observe the snapshot only when stopping the job manually. When you are at the console, stopping the job interactively and immediately going to check Snapshot Manager, you will almost always see the consolidate helper snapshots present.
On the other hand, jobs almost never fail during the actual backup (there is simply no reason for a job to fail in the midst of data copying, unless the network goes down or something), so in most cases jobs fail before our snapshot is even created (or because that snapshot could not be created). Thus, there are no snapshots to commit in the first place, and so "consolidate helper" snapshots simply never appear.
Now, there were also quite a few bugs around snapshot commit functionality in VMware. If you search VMware Communities for "consolidate helper", you will see about 100 threads about this problem of "consolidate helper" snapshots being left behind. Most of those issues were bugs in older versions of ESX, and they are all fixed now. There are some scenarios where this could still happen under vSphere, for instance a lack of free disk space on the datastore, and maybe some caused by other new bugs, although I am not aware of any.
Assuming that you are not facing some new snapshot management bug or issue, all you have to do is simply wait until the "consolidate helper" removes itself. Remember that on VMs with heavy I/O, or when the datastore itself is heavily loaded, this process may take quite a long time, up to an hour or more (although more typically under 20 minutes, even for heavily loaded Exchange servers). Also, keep in mind that while the actual vCenter task for snapshot removal times out in 10-15 minutes (can't remember the default setting), snapshot removal will still be processed by ESX in the background, and eventually the "consolidate helper" snapshot will be gone.
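If you would rather script the waiting than keep refreshing Snapshot Manager, a polling loop along these lines would do. This is a minimal sketch: list_snapshot_names is a hypothetical callable that you would have to supply yourself (using whatever tooling you have to read the VM's snapshot names), and the one-hour timeout, the 30-second interval, and the match on the text "consolidate helper" are assumptions based on the description above, not VMware or Veeam defaults.

```python
import time


def wait_for_helper_removal(list_snapshot_names, timeout_s=3600.0, poll_s=30.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll until no snapshot name contains "consolidate helper".

    list_snapshot_names is a hypothetical, caller-supplied function returning
    the VM's current snapshot names as a list of strings.
    Returns True if the helper disappeared, False if the timeout expired.
    """
    deadline = clock() + timeout_s
    while True:
        names = list_snapshot_names()
        if not any("consolidate helper" in name.lower() for name in names):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Stand-in data source: the helper is present for two polls, then gone.
    fake_results = iter([["consolidate helper"], ["consolidate helper"], []])
    print(wait_for_helper_removal(lambda: next(fake_results), poll_s=0.1))
```

The timeout here is deliberately much longer than the vCenter task timeout mentioned above, since the removal keeps running on the host after the task itself has given up.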
Phew, long post, hope this helps
