Backup job stuck at removing snapshot 99% for 7hrs

Discussions specific to VMware vSphere hypervisor

by ashman70 » Sun Aug 13, 2017 12:23 am

I have a backup job running under Veeam 9.0.0.1715 that has been stuck at 99% on removing the snapshot for 7 hours now. There was a pending replication job for the same VM, which I stopped. I'm not sure what to do: leave it a while longer or kill the job? I checked, and there are no other snapshots visible for the VM.
What should I do?
ashman70
Expert
Posts: 196
Liked: 12 times
Joined: Tue Dec 04, 2012 2:18 pm
Full Name: Both

Re: Backup job stuck at removing snapshot 99% for 7hrs

by ashman70 » Sun Aug 13, 2017 3:36 am

Patience won out in the end, it eventually removed the snapshot and all is well.

Re: Backup job stuck at removing snapshot 99% for 7hrs

by mwvme » Wed Aug 16, 2017 8:49 pm

For what it is worth, VMware tends to improve the time it takes to clear or consolidate snapshots with each release. I hope this issue never hits you again, but if it does, I hope you are on a newer version of VMware that minimizes the time.

Michael
Michael White
Field Product Manager
https://notesfrommwhite.net
@mwVme
mwvme
Veeam Software
Posts: 102
Liked: 19 times
Joined: Sat Dec 05, 2015 10:19 pm
Location: Calgary, Alberta Canada
Full Name: Michael White

Re: Backup job stuck at removing snapshot 99% for 7hrs

by stevenrodenburg1 » Thu Dec 21, 2017 3:47 am

Sorry Mr. White, but there is no way that VMware needs 7 hours or more to delete a backup snapshot (assuming it's not a gigantic snapshot and the storage is not totally overloaded).

I may be looking at a similar situation right now at a customer's site. All the VMs in the job are completely finished, but one VM is stuck at 99%: it finished reading all its disks a while ago, yet it simply does not even enter the "removing snapshot" phase (it has a single VMDK). The API call to vCenter has not been sent yet (I checked), so there is no way this is a problem on VMware's side.

This just happens every now and then, and I've got used to it. One just needs to wait long enough (sometimes many hours), and at some point Veeam finally issues the API call to vCenter to delete the snapshot (which goes fast). There is nothing else going on in Veeam, so I have no clue what on earth it's waiting for.
With all the other VMs, it issues the delete snapshot command to vCenter right after the last VMDK has been read, and that's it. With this VM, I have no idea. The last line in its log says that it read the VM, and the line saying that it's deleting the snapshot still has not appeared after 3 hours.
stevenrodenburg1
Expert
Posts: 123
Liked: 18 times
Joined: Tue May 31, 2011 9:11 am
Location: Switzerland
Full Name: Steven Rodenburg

Re: Backup job stuck at removing snapshot 99% for 7hrs

by stevenrodenburg1 » Thu Dec 21, 2017 4:19 am

Update: it finally went to the Delete snapshot stage (which only took 9 seconds to do).

Re: Backup job stuck at removing snapshot 99% for 7hrs

by foggy » Thu Dec 21, 2017 4:34 pm

Steven, I strongly recommend asking support to review this behavior to find the cause, since it's not normal.
foggy
Veeam Software
Posts: 15758
Liked: 1183 times
Joined: Mon Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Backup job stuck at removing snapshot 99% for 7hrs

by stevenrodenburg1 » Thu Dec 21, 2017 4:50 pm

Yeah I know but this is a financial institution and this environment is isolated. Nothing goes in or out. No Internet, no nutt'n. Pain In The Butt. It happens every once in a while and always at night. I see it the next morning when the daily backup job suddenly took waaaay longer than normal and one VM was waiting forever to get started with snapshot deletion. But it always solves itself.
I suspect it has to do with the backup proxy having trouble letting go of the VMDK it attached to itself (hot-add). When I get impatient and nuke the job, I always end up with a VMDK from the stuck VM still attached to the proxy. If I then detach it myself, I can consolidate that "stuck" VM.
I gave it some more thought, and by now I strongly believe that when this happens, the proxy cannot let go for some freak reason. Only when it finally detaches the VMDK does the job move on with the snapshot-delete command towards vCenter, and all is well.

Maybe it would be an idea to have Veeam not try to detach a VMDK until eternity, but simply give up after, say, 5 or 10 minutes. If it gives up, at least the main job can go on: the stuck VM's part of the job gets aborted and a mail is sent saying so. As it is, such a detach problem holds up the entire job. It holds up a slot on the proxy, and for what? For one stupid VM that is as stubborn as a donkey? No, just break off after 5 or 10 minutes, show an error and move on. Whole backup windows go to h*ll because of one VMDK detach issue with some VM. The backup administrator can then manually detach that stuck VMDK from the proxy in a controlled manner and start looking for the cause.
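The give-up-after-a-timeout behaviour proposed above can be sketched in POSIX shell. This is a hypothetical illustration only, not how Veeam's proxy actually works: the function name `detach_or_give_up` is made up, the command it wraps is a placeholder for whatever really performs the hot-add detach, and it assumes a `timeout` utility with GNU coreutils syntax (`timeout SECONDS COMMAND...`).

```shell
#!/bin/sh
# Hypothetical sketch: run a detach step with a fixed time budget instead of
# waiting indefinitely. Arguments: VM name, budget in seconds, then the command
# that performs the detach (a placeholder, not a real Veeam command).
# Assumes GNU coreutils `timeout` (exit status 124 when the budget runs out).
detach_or_give_up() {
    vm="$1"
    budget="$2"
    shift 2
    if timeout "$budget" "$@"; then
        echo "$vm: detached"
        return 0
    fi
    # Timed out or failed: report it (a mail hook could go here) and return
    # non-zero so only this VM is marked failed; the caller moves on.
    echo "$vm: detach not finished within ${budget}s, aborting this VM; detach the disk manually" >&2
    return 1
}
```

In a job loop, a call such as `detach_or_give_up vm01 600 your_detach_command vm01 || failed=$((failed + 1))` (where `your_detach_command` is a placeholder) would spend at most 10 minutes on a stubborn VM and then continue with the next one, leaving the cleanup to the administrator as suggested.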

Re: Backup job stuck at removing snapshot 99% for 7hrs

by cby » Fri Dec 22, 2017 10:24 am

Steven

As an aside and not directly associated with the issue you report (yes, I've also encountered your frustrating scenario in the past), deleting a snapshot post backup can be a time-consuming affair particularly where there are multiple old snapshots in the chain.

A couple of years ago I found the best way to monitor the hanging 99% was to hop onto the ESX host where the VM resides, then change directories to get to the VM in question (e.g. cd /vmfs/volumes/xxx-xxx-xxx.../VM-name) and run something along the lines of:

watch -n 10 -d 'ls -luth | grep -E "delta|flat|sesparse"'

This gives a regular update on snapshot deletion progress (bear in mind this is a cut-down Linux-like shell!). As I said, it is particularly useful with multiple snapshots, which was the case here.

As an added bonus, this also allowed me to estimate the snapshot deletion/merge completion time with surprising accuracy.

May help someone in the same situation.
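A completion-time estimate of this kind can be done by simple linear extrapolation. The following is a hypothetical POSIX shell sketch, not necessarily the calculation used here: it assumes you note the combined size of the delta/sesparse files still shown by the `ls` output at two points in time, and that the merge rate stays roughly constant between samples. The function name `estimate_eta` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate the seconds left in a snapshot merge by linear
# extrapolation between two samples, assuming a constant merge rate.
#   $1 = total size of remaining delta/sesparse files at the first sample
#   $2 = the same total at the second sample (same unit, e.g. MB)
#   $3 = seconds between the two samples
# Prints the estimate in whole seconds, or "unknown" if no progress was measured.
estimate_eta() {
    awk -v s1="$1" -v s2="$2" -v dt="$3" 'BEGIN {
        s1 += 0; s2 += 0; dt += 0          # force numeric comparison
        if (dt <= 0 || s1 <= s2) { print "unknown"; exit }
        # remaining / rate, where rate = (s1 - s2) / dt
        printf "%d\n", s2 * dt / (s1 - s2)
    }'
}
```

For example, `estimate_eta 4096 3072 600` (4096 MB left, then 3072 MB ten minutes later) prints `1800`, i.e. roughly another 30 minutes. With several snapshots in the chain the total drops in steps as each delta disappears, so the estimate is only as good as the constant-rate assumption.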
cby
Expert
Posts: 108
Liked: 6 times
Joined: Tue Feb 24, 2009 5:02 pm

Re: Backup job stuck at removing snapshot 99% for 7hrs

by Gostev » Wed Dec 27, 2017 3:20 pm

stevenrodenburg1 wrote: Sorry mr. White but no way that VMware needs 7 hours or more to delete a backup-snapshot (assuming it's not a gigantic snapshot and the storage is not totally overloaded)

I would actually disagree here, as I've seen numerous reports over the past years where removal took over 24 hours. No Veeam in the picture, just admins chatting on VMware Communities worrying about a "stuck" snapshot removal task in vCenter.

It is important to realize this is not always about the size of snapshots. It could also be due to overloaded primary storage and a busy VM, which results in the snapshot removal process being unable to catch up with the new writes (there are certain snapshot size and storage performance thresholds that must be met before the hypervisor will allow a VM stun for that final auxiliary snapshot commit, to ensure the stun time remains acceptable).
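The catch-up problem described above can be illustrated with a toy model. This is a deliberately simplified, hypothetical sketch and not VMware's actual consolidation algorithm: it assumes constant merge and guest write rates, and a single size threshold standing in for the real size and performance checks. Each pass commits the current delta while new writes pile up in a helper delta; the final, stunned commit is only allowed once the leftover is below the threshold. The function `simulate_consolidation` and all its parameters are made up for illustration.

```shell
#!/bin/sh
# Toy model only (not VMware's algorithm). Arguments, all in the same size
# unit, rates per second:
#   $1 delta size  $2 merge rate  $3 guest write rate
#   $4 stun threshold  $5 maximum number of passes
simulate_consolidation() {
    awk -v d="$1" -v m="$2" -v w="$3" -v th="$4" -v max="$5" 'BEGIN {
        d += 0; m += 0; w += 0; th += 0; max += 0
        if (m <= 0) { print "merge rate must be positive"; exit }
        for (pass = 1; pass <= max; pass++) {
            if (d <= th) {
                # Small enough: the VM can be stunned for the final commit
                printf "stun after %d passes, %d left\n", pass - 1, d
                exit
            }
            t = d / m      # seconds needed to commit the current delta
            d = w * t      # guest writes collected in the helper delta meanwhile
        }
        printf "no stun within %d passes\n", max
    }'
}
```

With `simulate_consolidation 10000 200 50 100 20` (merging four times faster than the guest writes) it prints `stun after 4 passes, 39 left`; with `simulate_consolidation 10000 100 120 100 20` (a busy VM writing faster than the storage can merge) it prints `no stun within 20 passes`, which is the seemingly endless removal seen on overloaded storage.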

Luckily, most of that stuff is a thing of the past now: VMware completely re-architected snapshot removal in the latest vSphere releases by reusing the Storage vMotion code, which allows new writes to go straight into the base VMDK instead of a snapshot file.
Gostev
Veeam Software
Posts: 21820
Liked: 2490 times
Joined: Sun Jan 01, 2006 1:01 am
Location: Baar, Switzerland

