Discussions specific to the VMware vSphere hypervisor
foggy
Veeam Software
Posts: 16949
Liked: 1380 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Snapshot removal issues of a large VM

Post by foggy » Feb 04, 2016 3:22 pm

You can test whether more frequent backups help to avoid this. You can even test this with manual snapshot creation in the vSphere Client, without running a Veeam B&R backup job; it should produce similar behavior.

JosueM
Expert
Posts: 162
Liked: 10 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado

[MERGED] VM lost connection while removing VM snapshot.

Post by JosueM » Apr 11, 2016 9:57 pm

Good day everyone.

We have a SQL application server that was moved to SSD drives to improve performance, and overall the new storage runs great. The problem is that if we run a backup during working hours, the app server stops responding for a few seconds and users get kicked off. This happens most of the time when the job removes the temporary snapshot; it sometimes happens when the snapshot is created, but that is rare.

We thought that moving the server to SSD drives would solve the issue, but it still happens. Is there any other way to back up the data during working hours without kicking the users off?

Thanks in advance.

PTide
Veeam Software
Posts: 4423
Liked: 364 times
Joined: May 19, 2015 1:46 pm

Re: VM lost connection while removing VM snapshot.

Post by PTide » Apr 12, 2016 8:30 am

Hi,

This may happen when you back up a VM that has a highly transactional application on it. Have you considered using transaction log backup during the day instead of doing a VM backup? No snapshots are taken during that procedure, so the connection should be fine.

Thank you.

JosueM
Expert
Posts: 162
Liked: 10 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado

Re: Snapshot removal issues of a large VM

Post by JosueM » Apr 12, 2016 3:22 pm

Hello PTide,

Transaction log backup seems like a good option. The major issue I see is that restoring this VM would take about 6 from a plain backup, and we will have to measure the time added by the tlog restore. Since this is the main app server, we would like to have it up and running as soon as possible.

Do you know if the tlog backup somehow works with replicas?

foggy
Veeam Software
Posts: 16949
Liked: 1380 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Snapshot removal issues of a large VM

Post by foggy » Apr 12, 2016 3:49 pm

Transaction log backup is not available in replication jobs.

JosueM
Expert
Posts: 162
Liked: 10 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado

Re: Snapshot removal issues of a large VM

Post by JosueM » Jun 17, 2016 2:16 pm

So basically, for most of the app servers (SQL) (that's about 80% of the workload), we will have to figure out another way to back up transactional VMs instead of using Veeam, such as SQL backup and restore?

It seems this is a pretty common problem and it has been around for a long time. I wonder how difficult it would be for VMware to solve the issue.

randerson999
Influencer
Posts: 14
Liked: never
Joined: Jun 30, 2016 11:49 am
Full Name: Ross Anderson

Re: Snapshot removal issues of a large VM

Post by randerson999 » Jun 30, 2016 12:06 pm

I'm late to this party, but same issues here - doesn't matter if it's a large VM or small, during the snapshot creation AND removal, we lose a few packets. Nothing like what a lot of other people are experiencing here (ie. minutes to hours of being offline), but we do lose a few packets here and there due to the VM being "stunned". This happens with our largest and smallest VMs - we lose a few packets, which causes any remote connections to the VM in question to fail (such as SQL connections for SAP processes running from remote App servers), resulting in program dumps. As there is not technically a time-out (rather, a broken network connection), SAP doesn't have a suitable workaround for the problem.

My question is, what is causing the stun on the VM? Is it lack of IO based on VM size? If that was the case, why does even the smallest VM have the same issues (tested during a slow time when all systems are basically idle)? Nothing on our storage or vm infrastructure side indicates a lack of IOPs available, so what is actually causing the VM to be stunned?

I've read through this thread (and many, many others) but I don't know that I've seen an actual ROOT cause for the majority of the issues, other than the not-enough-IO-available conclusion.

BTW - we're on the latest version of Veeam and have vSphere 5.5 U2.

Vitaliy S.
Veeam Software
Posts: 21627
Liked: 1296 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. » Jun 30, 2016 3:54 pm

Hi Ross,

The actual root cause is the way VM snapshots are committed in vSphere. Here is a good blog post from Luca for further reading > http://www.virtualtothecore.com/en/vsph ... hing-past/

Thanks!

sajid
Lurker
Posts: 1
Liked: never
Joined: Nov 03, 2016 5:55 am
Full Name: Sajid Attar

[MERGED] snapshot removal takes long time

Post by sajid » Nov 03, 2016 6:00 am

Hi Team,

We have Veeam 9.0 in production and we are facing a snapshot removal issue: removal takes more than 8 hours, and each replication job takes approximately 14 hours to complete.

Current infra is VMware 6.0 U2.

Please advise.

v.Eremin
Veeam Software
Posts: 15227
Liked: 1146 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Snapshot removal issues of a large VM

Post by v.Eremin » Nov 03, 2016 10:13 am

Your post has been merged into the existing discussion. Kindly check the answers provided above. Thanks.

lando_uk
Expert
Posts: 282
Liked: 21 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK

Re: [MERGED] snapshot removal takes long time

Post by lando_uk » Nov 08, 2016 5:26 pm

sajid wrote: We have Veeam 9.0 version in production and we are facing issue with related to snapshot removal issue it taking more than 8 hours and each replication job complete approx 14 hours to complete

Current infra is Vmware 6.0 U2.

Not really enough information.
How big is the VM that takes 8hrs to consolidate?
What is the VM doing (change rate) during the backup/replication task?
What's the network speed between source and destination?
Is it much quicker when you replicate from your primary backup rather than the live VM?
What does your SAN/Datastore monitoring tools tell you during this process?
How many other VMs are on the datastore and are they also busy?

KeiichiKun
Enthusiast
Posts: 56
Liked: 9 times
Joined: Jul 21, 2016 3:59 pm

[MERGED] Replication job and removing snapshot too long

Post by KeiichiKun » Jan 20, 2017 4:33 pm

Hi,
I'm trying to move a VM from one cluster to another using a replication job to minimize downtime.
After the first backup, the subsequent jobs (run manually due to the problem I'm writing about) take about 10 minutes to synchronize, but removing the snapshot takes too long.
My VM has 3 disks totaling 600 GB, and after 3 hours the snapshot removal is at 54%; it will take 7 to 8 hours to complete. Every snapshot is about 80 GB (3 restore points kept).
The VM is on SSD/10k rpm disks on a Dell Compellent with auto-tiering.
Do you think this is normal? I don't think so; I will open a support case with Dell if you can confirm that.
Thanks!

v.Eremin
Veeam Software
Posts: 15227
Liked: 1146 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Snapshot removal issues of a large VM

Post by v.Eremin » Jan 23, 2017 10:52 am

Even though your VM is not large, the symptoms you are experiencing are quite similar to those described in this thread. So, kindly familiarize yourself with the answers above.

As a first investigation step, you can try to reproduce the issue with VB&R out of the equation: take a snapshot manually, keep it for long enough (about as long as the replication job takes), delete it, and see whether the problem reappears.
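To put numbers on any unresponsiveness while the manual snapshot is created and deleted, something like the following could be left running from another machine. This is only a minimal sketch under assumptions: it assumes the VM exposes a TCP port that can be probed (the host name and port 1433 below are hypothetical placeholders), and it records failed connection attempts, which is not the same thing as dropped packets on already established sessions.

```python
import socket
import sys
import time
from typing import List, Tuple

# One sample is (timestamp in seconds, whether the TCP connect succeeded).
Sample = Tuple[float, bool]


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Collapse samples into (start, end) windows of consecutive failures.

    A window ends at the first successful sample after the failures, or at
    the last failed sample if the run ended while still unreachable.
    """
    windows = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, last_fail))
    return windows


def monitor(host: str, port: int, duration_s: float,
            interval_s: float = 0.5) -> List[Sample]:
    """Probe host:port repeatedly for roughly duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Hypothetical defaults; pass your own host, port and duration instead.
    host = sys.argv[1] if len(sys.argv) > 1 else "sql-app.example.local"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 1433
    duration = float(sys.argv[3]) if len(sys.argv) > 3 else 600.0
    for start, end in outage_windows(monitor(host, port, duration)):
        print(f"unreachable from {time.ctime(start)} for {end - start:.1f} s")
```

Start it shortly before taking the snapshot and let it run past the deletion; comparing the reported windows with the snapshot task times in vSphere shows whether the gaps line up with creation, removal, or both.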

Thanks.

shlomia
Influencer
Posts: 13
Liked: never
Joined: Mar 20, 2017 3:40 pm
Full Name: Shlomi

[MERGED] VM unresponsive during removing /consolidating snap

Post by shlomia » Apr 29, 2017 11:04 am

Hi,
Only one of my VMs, which is a DB server, becomes unresponsive while vSphere is finishing the backup and trying to remove the snapshot.
It becomes unresponsive for a few minutes, and we cannot connect to it.
Our monitoring also warns us that it is down.

I'm running ESXi 5.5 and I came across this patch:
https://kb.vmware.com/selfservice/micro ... Id=2096282

I checked my ESXi hosts and they do not have this update installed.
I'm just afraid to update manually, because it says that the impact will be a reboot.
Although I have vSphere HA, do I need to be afraid of rebooting the ESXi hosts?
Also, has anyone installed this patch, and did it fix the problem?

thank you

Vitaliy S.
Veeam Software
Posts: 21627
Liked: 1296 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. » May 01, 2017 9:14 am

Hi Shlomi,

I cannot comment whether this patch can resolve your issues or not, but you may want to review the last pages of this topic for some tips on how to resolve this behavior.

Do you have the VM HA or Cluster HA feature enabled? You can try installing this patch via VUM again and see whether manual installation is required.

Thanks!
