We are running into a slight issue during the final stages of a VM backup. The VM loses connectivity for a short period: about 5-7 pings return "request timed out", and the next reply takes about 500-550 ms. Our monitoring tool picks up this "outage" and sends out an email alert. So, in the middle of the night, the sucker on call (ME) gets a call, only to find the server responding and the application/services running normally.
The easiest solution is to raise the notification thresholds... but I was wondering if anyone out there has come across this and/or has a fix?
Thanks in advance,
Jason
-
- Lurker
- Posts: 1
- Liked: never
- Joined: Dec 03, 2009 11:32 pm
- Full Name: Jason Gagui
-
- Chief Product Officer
- Posts: 31816
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Veeam Commiting Snapshots
Veeam does not actually commit snapshots (or do any snapshot processing whatsoever). Instead, we issue a single API call asking ESX to commit the snapshot, and the rest is beyond our control. This is the same API call that the vSphere Client uses when you remove a snapshot through it.
I recommend that you investigate the VM log file. During the final snapshot commit, ESX stuns the VM completely in order to commit the data into the VMDK, and logs the stun time in the VM log file. You may want to check these numbers, and if they seem too large, check with VMware whether this is "normal". I have never seen a stun last longer than 2 sec in my lab when I did ESX4 snapshot stress testing (such as committing snapshots while copying lots of data to the VM at the same time).
One thing that may really make everything crawl is concurrent snapshot operations on the same LUN (which causes SCSI reservation conflicts).
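If it helps with checking the stun numbers mentioned above, here is a rough Python sketch that pulls stun durations out of a VM log and flags any longer than 2 sec. It assumes the stun is logged in a line of the form "Checkpoint_Unstun: vm stopped for N us"; that wording is an assumption on my part and may differ between ESX versions, so adjust the pattern to match what your log actually contains. The function names and the 2000 ms default are just illustrative.

```python
import re

# ASSUMPTION: the stun time appears in the VM log as
#   "... Checkpoint_Unstun: vm stopped for <N> us"
# Verify this against your own log and adjust the pattern if needed.
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_times_ms(lines):
    """Return every stun duration found in the log lines, in milliseconds."""
    times = []
    for line in lines:
        match = STUN_RE.search(line)
        if match:
            # The log value is in microseconds; convert to milliseconds.
            times.append(int(match.group(1)) / 1000.0)
    return times


def long_stuns(lines, threshold_ms=2000.0):
    """Return only the stun durations that exceed threshold_ms."""
    return [t for t in stun_times_ms(lines) if t > threshold_ms]
```

To use it, open the VM's log file (for example with `open("vmware.log", errors="replace")`) and pass the file object to `long_stuns`. An empty result means no stun in that log exceeded the threshold under the assumed line format.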