-
- Expert
- Posts: 203
- Liked: 12 times
- Joined: Dec 04, 2012 2:18 pm
- Full Name: Both
- Contact:
Backup job stuck at removing snapshot 99% for 7hrs
I have a backup job running under Veeam 9.0.0.1715 that has been stuck at 99% removing the snapshot for 7hrs now. There was a pending replication job for the same VM that I stopped. Not sure what to do, leave it awhile longer or kill the job? I checked and there are no other snapshots visible for the VM.
What should I do?
-
- Expert
- Posts: 203
- Liked: 12 times
- Joined: Dec 04, 2012 2:18 pm
- Full Name: Both
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Patience won out in the end: it eventually removed the snapshot and all is well.
-
- Expert
- Posts: 163
- Liked: 33 times
- Joined: Dec 05, 2015 10:19 pm
- Full Name: Michael White
- Location: Calgary, Alberta Canada
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
For what it is worth, VMware often improves the time of clearing or consolidating snapshots in each release. I hope that this issue never hits you again, but if it does, I hope you are on some newer version of VMware that minimizes the time.
Michael
Michael White
Field Product Manager
https://notesfrommwhite.net
@mwVme
-
- Expert
- Posts: 135
- Liked: 20 times
- Joined: May 31, 2011 9:11 am
- Full Name: Steven Rodenburg
- Location: Switzerland
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Sorry Mr. White, but there is no way that VMware needs 7 hours or more to delete a backup snapshot (assuming it's not a gigantic snapshot and the storage is not totally overloaded).
I may be looking at a similar situation right now at a customer's site. All the VMs in the job are completely finished, but there is one VM stuck at 99%, which finished (reading all disks) a while ago, that simply does not even enter the "removing snapshot" phase (it has a single VMDK). The API call to vCenter has not been sent yet (I checked). So there is no way that this is a problem on VMware's side.
This just happens every now and then. I got used to it. One just needs to wait long enough (many hours sometimes) and at some point Veeam finally issues the API call to vCenter to delete the snapshot (which goes fast). There is nothing else going on in Veeam. So I have no clue as to what on earth it's waiting for.
With all the other VMs, right after the last VMDK was read, it issues the delete snapshot command to vCenter and it's done. With this VM, I have no idea. The last line in its log is that it read the VM, and the line saying that it's deleting the snapshot still has not appeared after 3 hours.
-
- Expert
- Posts: 135
- Liked: 20 times
- Joined: May 31, 2011 9:11 am
- Full Name: Steven Rodenburg
- Location: Switzerland
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Update: it finally went to the Delete snapshot stage (which only took 9 seconds to do).
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Steven, I strongly recommend asking support to review this behavior to find the cause, since it's not normal.
-
- Expert
- Posts: 135
- Liked: 20 times
- Joined: May 31, 2011 9:11 am
- Full Name: Steven Rodenburg
- Location: Switzerland
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Yeah I know but this is a financial institution and this environment is isolated. Nothing goes in or out. No Internet, no nutt'n. Pain In The Butt. It happens every once in a while and always at night. I see it the next morning when the daily backup job suddenly took waaaay longer than normal and one VM was waiting forever to get started with snapshot deletion. But it always solves itself.
I suspect it has to do with the backup proxy having trouble letting go of the VMDK it attached to itself (hot-add). Because when I get impatient and nuke the job, I always end up with a VMDK from the VM that got stuck still attached to the proxy. If I then detach it myself, I can consolidate that "stuck" VM.
I gave it some more thought, and by now I strongly believe that when it happens, the proxy cannot let go for some freak reason. Only once it detaches the VMDK does the job move on with the snapshot-delete command towards vCenter, and all is well.
Maybe it's an idea to have Veeam not try to detach a VMDK until eternity, but just give up after say 5 or 10 minutes. If it gives up, at least the main job can go on, the stuck VM job-part gets an abort and a mail is sent saying so. If such detach problems happen, it holds up the entire job. It holds up a slot on the proxy, and for what? For one stupid VM that is stubborn like a donkey? No, just break off after 5 or 10 minutes, show an error but move on. Whole backup windows go to h*ll because of one VMDK detach issue with some VM. The backup administrator can manually detach that stuck VMDK from the proxy in a controlled manner. And start looking for the cause.
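To illustrate the "give up after 5 or 10 minutes" idea, here is a minimal sketch of a bounded wait. It is not how Veeam works internally; `is_detached` is a hypothetical callable standing in for whatever check tells you the VMDK is no longer attached to the proxy, and the timeout and poll values are just placeholders.

```python
import time
from typing import Callable


def wait_for_detach(is_detached: Callable[[], bool],
                    timeout_s: float = 600.0,
                    poll_s: float = 5.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll a (hypothetical) detach check until it succeeds or the timeout expires.

    Returns True if the disk was detached in time, False if we gave up.
    The clock and sleep functions are injectable so the logic can be tested
    without actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_detached():
            return True
        if clock() >= deadline:
            # Give up: the caller can fail this one VM, send a mail,
            # and let the rest of the job continue.
            return False
        sleep(poll_s)
```

On a False result the caller would mark only that VM as failed and free the proxy slot, leaving the stuck VMDK for the administrator to detach by hand.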
-
- Enthusiast
- Posts: 97
- Liked: 6 times
- Joined: Feb 24, 2009 5:02 pm
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Steven
As an aside and not directly associated with the issue you report (yes, I've also encountered your frustrating scenario in the past), deleting a snapshot post backup can be a time-consuming affair particularly where there are multiple old snapshots in the chain.
A couple of years ago I found the best way to monitor the hanging 99% was to hop onto the ESX host where the VM resides, then change directories to get to the VM in question (e.g. cd /vmfs/volumes/xxx-xxx-xxx.../VM-name) and run something along the lines of:
watch -n 10 -d 'ls -luth | grep -E "delta|flat|sesparse"' (this is a cutdown Linux remember!)
to get a regular update of snapshot deletion progress. Like I said, particularly useful for multiple snaps which was the case here.
As an added bonus this also allowed me to calculate a snapshot deletion/merge completion time to within a surprisingly accurate degree.
May help someone in the same situation.
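For anyone wanting to do the completion-time estimate less by eye: a rough sketch of the arithmetic, assuming you have noted down at a few points in time how many bytes are still left to merge. How you derive that "remaining" figure from the `ls` output is up to you and not something this sketch knows about; the function name and sample format are made up for illustration, and it simply assumes the merge rate stays constant.

```python
from typing import Optional, Sequence, Tuple

# One sample = (seconds since you started watching, bytes still to merge)
Sample = Tuple[float, float]


def estimate_seconds_remaining(samples: Sequence[Sample]) -> Optional[float]:
    """Linear extrapolation between the first and the last sample.

    Returns None when there are too few samples or no measurable progress,
    since no rate can be computed in that case.
    """
    if len(samples) < 2:
        return None
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    elapsed = t1 - t0
    merged = r0 - r1
    if elapsed <= 0 or merged <= 0:
        return None
    rate = merged / elapsed  # bytes merged per second so far
    return r1 / rate
```

For example, 100 GB remaining at the start and 90 GB remaining ten minutes later works out to roughly another 90 minutes, provided the storage keeps the same pace.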
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
stevenrodenburg1 wrote: Sorry mr. White but no way that VMware needs 7 hours or more to delete a backup-snapshot (assuming it's not a gigantic snapshot and the storage is not totally overloaded)
I would actually disagree here, as I've seen numerous reports in the past years where it was taking over 24 hours. No Veeam in the picture - just admins chatting on VMware Communities worrying about "stuck" snapshot removal task in vCenter.
It is important to realize this is not always about the size of snapshots. It could also be due to overloaded primary storage and a busy VM, which results in the snapshot removal process being unable to catch up with the new writes (certain snapshot size and storage performance thresholds must be met before the hypervisor will allow a VM stun for that final aux snapshot commit, to ensure stun time remains acceptable).
Luckily, most of that stuff is a thing of the past now, after VMware completely re-architected snapshot removal in the latest vSphere releases by reusing the code for Storage vMotion, which allows new writes to go straight into the base VMDK instead of a snapshot file.
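The catch-up problem with the older removal process can be shown with a toy model. This is a deliberately simplified sketch, not VMware's actual algorithm: it assumes each merge pass commits the current delta at a constant rate while new writes pile up in a helper snapshot at a constant rate, and that the final stun is only allowed once the remaining delta could be committed within a time limit. All parameter names and numbers are hypothetical.

```python
from typing import Optional


def passes_until_stun(delta_bytes: float,
                      merge_rate: float,
                      write_rate: float,
                      max_final_commit_s: float,
                      max_passes: int = 100) -> Optional[int]:
    """Count merge passes needed before the final commit fits the time limit.

    delta_bytes: size of the snapshot delta at the start
    merge_rate: bytes per second the storage can commit
    write_rate: bytes per second of new writes from the busy VM
    Returns None if the process has not converged after max_passes,
    which is what happens when writes arrive as fast as they are merged.
    """
    if merge_rate <= 0:
        raise ValueError("merge_rate must be positive")
    passes = 0
    remaining = delta_bytes
    while remaining / merge_rate > max_final_commit_s:
        if passes >= max_passes:
            return None
        pass_time = remaining / merge_rate
        # Writes issued during this pass become the next delta to merge.
        remaining = write_rate * pass_time
        passes += 1
    return passes
```

In this model the delta shrinks by the ratio write_rate / merge_rate on every pass, so a quiet VM converges in a couple of passes while a VM writing as fast as the storage can merge never gets there, regardless of how small the snapshot was to begin with.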
-
- Enthusiast
- Posts: 52
- Liked: 2 times
- Joined: Sep 20, 2010 4:39 am
- Full Name: David Reimers
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
I've just started noticing the same behaviour. One large VM (2.5TB) is affected. Been fine up until now.
VMware says the snapshot has been removed (which was quick) but Veeam doesn't progress past 'removing snapshot'.
I cancelled the job (gracefully) then rebooted the server and re-ran the job.
Same problem occurs on reboot.
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
Does the process monitor of vCenter list the VM snapshot removal at the same level (99%)?
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Backup job stuck at removing snapshot 99% for 7hrs
I recommend asking support to take a closer look at the logs, they should tell what's happening during this time.