Discussions specific to the VMware vSphere hypervisor
unsichtbarre
Expert
Posts: 130
Liked: 24 times
Joined: Mar 08, 2010 4:05 pm
Full Name: John Borhek

Re: Snapshot removal issues of a large VM

Post by unsichtbarre » Jun 22, 2015 1:57 pm

We are running ESXi Build 2638301.

Also, our storage is 10Gb iSCSI on both sides, but the brands are dissimilar and not 3PAR, so no storage snapshots as far as I know.
-The Invisible Admin-
http://www.johnborhek.com

foggy
Veeam Software
Posts: 18356
Liked: 1575 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Snapshot removal issues of a large VM

Post by foggy » Jun 22, 2015 2:01 pm

Storage snapshots are also supported on StoreVirtual and some NetApp models. You do not need similar storage on the target side.

cardendave
Service Provider
Posts: 33
Liked: 1 time
Joined: Jul 22, 2015 7:59 am
Full Name: Dave King

[MERGED] New to B&R - Setup Advice Needed

Post by cardendave » Jul 22, 2015 8:02 am

Hi Guys

I am new to Backup & Replication; we are an MSP and are trialling it with a number of clients. It seems to be a great product, but on two occasions 9am has struck while VMware was still removing a snapshot, making the server inaccessible. What considerations should be made to avoid this?

Thanks

Dave

veremin
Product Manager
Posts: 16994
Liked: 1453 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Snapshot removal issues of a large VM

Post by veremin » Jul 22, 2015 8:16 am

Hi, Dave,

Your post has been merged into the existing discussion; some recommendations can be found in this rather long thread.

Thanks.

egraeber
Novice
Posts: 3
Liked: never
Joined: Nov 12, 2015 3:35 am
Full Name: Eric Graeber

[MERGED] JDE ERP system failing during vmware snapshot conso

Post by egraeber » Nov 12, 2015 4:01 am

I am having an issue with my JDE ERP system failing during snapshot consolidation at the end of the backup due to loss of connection with my SQL database. The database is MS SQL 2008 R2. The system is MS Windows 2008 R2 hosted on VMware ESXi 5.1. It is a very busy SQL server with a lot of data, around 10TB altogether. The snapshot took around 2.5 hours to consolidate; however, the issue didn't occur until the very end of the job, which presumably corresponds to the time at which the final stun occurred. It doesn't seem that the stun was overly long, and all of the databases are on an SSD array, so I don't believe performance is an issue. Overall performance is excellent within my VMware environment, especially since we migrated the database onto the SSD array.

Two questions:
1. Is anyone else out there successfully backing up a large highly active SQL server dataset like this without incurring disconnects?
2. Is anyone else out there backing up a JDE database? If yes, have you faced or are you facing similar issues?

-- Eric

P.Tide
Product Manager
Posts: 5306
Liked: 466 times
Joined: May 19, 2015 1:46 pm

Re: JDE ERP system failing during vmware snapshot consolidat

Post by P.Tide » Nov 12, 2015 9:35 am

Hi,

Do you back up the whole server every time, or just transaction logs? Also, what's your transport mode?

Thank you.

egraeber
Novice
Posts: 3
Liked: never
Joined: Nov 12, 2015 3:35 am
Full Name: Eric Graeber

Re: Snapshot removal issues of a large VM

Post by egraeber » Nov 12, 2015 4:05 pm

So far I have only executed one full backup. Once it crashed the application I had to back off until I can identify the problem. I'm not sure what you mean by "transport mode". Please elaborate.

P.Tide
Product Manager
Posts: 5306
Liked: 466 times
Joined: May 19, 2015 1:46 pm

Re: Snapshot removal issues of a large VM

Post by P.Tide » Nov 12, 2015 4:17 pm

Your topic has been merged into an existing thread; please review the possible solutions provided and check whether any of them are applicable to your environment. The first step would be to offload the storage your SQL server resides on.

egraeber wrote: I'm not sure what you mean by "transport mode".

Please take a look at the article and check if you can use another transport mode to optimize your backup process. If any questions arise, feel free to ask.

Also, please note that you don't have to take a whole-VM snapshot every time. Instead, you can use the transaction log processing feature to meet your SQL server RPO.

Thank you.

egraeber
Novice
Posts: 3
Liked: never
Joined: Nov 12, 2015 3:35 am
Full Name: Eric Graeber

Re: Snapshot removal issues of a large VM

Post by egraeber » Nov 12, 2015 11:29 pm

Transport mode is automatic. The SQL server and Veeam server are both VMs in the same VMware cluster. The VMware cluster consists of 5 Cisco UCS blades with PureStorage SSD array on 8Gig Fiber Channel interconnect. The hotadd process seems to be working exactly as expected for the backup proxy (the Veeam server itself) and throughput is pretty good: 243.1MB/s. My backend is a SATA-based array which also has an 8Gig FC connection.

Our SQL maintenance plans make daily full backups and TRNs on the half hour, which we are using for log shipping to our DR system. I have turned off log file processing in order to avoid interfering with this process. Either way, it is my understanding that this is more of a "convenience feature" so that Veeam can manage the logfile truncation without involving a DBA. It is my understanding that this isn't relevant to the size of the VIBs that are created or the method of their creation.
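
For a rough sense of how long the snapshot stays open during a full backup, the two figures above (around 10TB of data, 243.1MB/s throughput) can be turned into an estimate. This is only a back-of-the-envelope sketch: it assumes the whole 10TB is read at the reported rate, uses binary units, and ignores changed block tracking, compression and skipped empty blocks. The function name is made up for illustration.

```python
def full_read_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read data_tb terabytes at throughput_mb_s MB/s.

    Assumption: binary units, i.e. 1 TB = 1024 * 1024 MB.
    """
    total_mb = data_tb * 1024 * 1024
    seconds = total_mb / throughput_mb_s
    return seconds / 3600


# Figures taken from the post: ~10 TB at 243.1 MB/s
hours = full_read_hours(10, 243.1)
print(f"{hours:.1f} h")  # prints "12.0 h"
```

The longer the snapshot stays open on a busy SQL server, the more changes pile up in the delta disks that have to be committed afterwards, so the read time matters for consolidation as well as for the backup window.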

P.Tide
Product Manager
Posts: 5306
Liked: 466 times
Joined: May 19, 2015 1:46 pm

Re: Snapshot removal issues of a large VM

Post by P.Tide » Nov 13, 2015 9:35 am

Ok, first you need to get your full backup done. Have you checked this article already? There are a couple of things mentioned there to check, so if you haven't done that yet, please follow the link.

egraeber wrote: PureStorage SSD array on 8Gig Fiber Channel interconnect <...>

You've mentioned FC, so you might want to use direct SAN mode, which is even faster than hotadd. You'll need a dedicated physical proxy that has direct access to your production storage via FC or iSCSI, however.

tourdemon
Influencer
Posts: 10
Liked: never
Joined: Dec 05, 2011 1:06 pm
Full Name: Joe Rubino

[MERGED] Application Error

Post by tourdemon » Feb 04, 2016 2:31 pm

Hello all, I have a tax application server that I want to back up hourly. It is not a database server, just a server that stores tax files in folders on drives that are mapped for users and accessed by the application. Anyway, when I performed the first hourly backup today, it disconnected the users for a brief second. I have app-aware processing and guest file indexing enabled. I am wondering if I need app-aware processing enabled here and if this is what caused the users to be disconnected. Your help would be greatly appreciated.

foggy
Veeam Software
Posts: 18356
Liked: 1575 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Application Error

Post by foggy » Feb 04, 2016 2:42 pm

Joe, please check whether the disconnection occurred at the time the VM snapshot was committed after the backup.

tourdemon
Influencer
Posts: 10
Liked: never
Joined: Dec 05, 2011 1:06 pm
Full Name: Joe Rubino

Re: Application Error

Post by tourdemon » Feb 04, 2016 2:48 pm

Yes, around the time the disks consolidated and the snapshot was removed.

foggy
Veeam Software
Posts: 18356
Liked: 1575 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Snapshot removal issues of a large VM

Post by foggy » Feb 04, 2016 2:59 pm

Then a short stun is expected. Please review this thread for some hints on how to eliminate this effect.

tourdemon
Influencer
Posts: 10
Liked: never
Joined: Dec 05, 2011 1:06 pm
Full Name: Joe Rubino

Re: Snapshot removal issues of a large VM

Post by tourdemon » Feb 04, 2016 3:15 pm

Odd, when I was using Acronis, this didn't happen and it was backing up every 15 minutes. If I change to every 15 minutes instead of every hour, wouldn't that make the snapshot smaller, therefore maybe eliminating the short stun? I cannot have users constantly being kicked out and having to log in to the application again; we're talking about 65 accountants.

foggy
Veeam Software
Posts: 18356
Liked: 1575 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Snapshot removal issues of a large VM

Post by foggy » Feb 04, 2016 3:22 pm

You can test whether more frequent backups allow you to avoid this. You can also simply create a snapshot manually using vSphere Client, even without running a Veeam B&R backup job; it should produce similar behavior.

JosueM
Expert
Posts: 162
Liked: 10 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado

[MERGED] VM lost connection while removing VM snapshot.

Post by JosueM » Apr 11, 2016 9:57 pm

Good day everyone.

We have a SQL application server that was moved to SSD drives to improve performance, and overall the new storage runs great. The problem is that if we run a backup during working hours, the app server stops responding for a few seconds and users get kicked off. This happens most of the time when the job removes the temporary snapshot; it sometimes happens when the snapshot is created, but that is rare.

We thought that moving the server to SSD drives would solve the issue, but it still happens. Is there any other way to back up the information during work hours without kicking the users off?

Thanks in advance.

P.Tide
Product Manager
Posts: 5306
Liked: 466 times
Joined: May 19, 2015 1:46 pm

Re: VM lost connection while removing VM snapshot.

Post by P.Tide » Apr 12, 2016 8:30 am

Hi,

Such a thing may happen when you back up a VM that has a highly transactional application on it. Have you considered using transaction log backup during the day instead of doing a VM backup? No snapshots are taken during that procedure, so the connection should be fine.

Thank you.

JosueM
Expert
Posts: 162
Liked: 10 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado

Re: Snapshot removal issues of a large VM

Post by JosueM » Apr 12, 2016 3:22 pm

Hello PTide,

Transaction log backup seems a good option. The major issue I see is that restoring this VM would take about 6 from a plain backup, and I will have to measure the time added by the tlog restore. Since this is the main app server, we would like to have it up and running as soon as possible.

Do you know if the tlog backup works somehow with replicas?

foggy
Veeam Software
Posts: 18356
Liked: 1575 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson

Re: Snapshot removal issues of a large VM

Post by foggy » Apr 12, 2016 3:49 pm

Transaction log backup is not available in replication jobs.

JosueM
Expert
Posts: 162
Liked: 10 times
Joined: Sep 01, 2012 2:53 pm
Full Name: Josue Maldonado

Re: Snapshot removal issues of a large VM

Post by JosueM » Jun 17, 2016 2:16 pm

So basically, for most of the app servers (SQL), which is about 80% of the workload, we will have to figure out another way to back up transactional VMs instead of using Veeam, such as SQL backup and restore?

It seems this is a pretty common problem and it has been around for a long time. I wonder how difficult it could be for VMware to solve the issue.

randerson999
Influencer
Posts: 14
Liked: never
Joined: Jun 30, 2016 11:49 am
Full Name: Ross Anderson

Re: Snapshot removal issues of a large VM

Post by randerson999 » Jun 30, 2016 12:06 pm

I'm late to this party, but same issues here - doesn't matter if it's a large VM or small, during the snapshot creation AND removal, we lose a few packets. Nothing like what a lot of other people are experiencing here (ie. minutes to hours of being offline), but we do lose a few packets here and there due to the VM being "stunned". This happens with our largest and smallest VMs - we lose a few packets, which causes any remote connections to the VM in question to fail (such as SQL connections for SAP processes running from remote App servers), resulting in program dumps. As there is not technically a time-out (rather, a broken network connection), SAP doesn't have a suitable workaround for the problem.

My question is, what is causing the stun on the VM? Is it lack of IO based on VM size? If that was the case, why does even the smallest VM have the same issues (tested during a slow time when all systems are basically idle)? Nothing on our storage or vm infrastructure side indicates a lack of IOPs available, so what is actually causing the VM to be stunned?

I've read through this thread (and many, many others) but I don't know that I've seen an actual ROOT cause for the majority of the issues, other than the not-enough-IO-available conclusion.

BTW - we're on the latest version of Veeam and have vSphere 5.5 U2.
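
To put a number on the stun rather than counting lost packets by eye, one option is to log the time of every successful reply from a fixed-interval probe (a ping or a TCP connect to the VM) and then look for gaps. The following is a minimal sketch of the gap-finding part only; the probe itself is left out, timestamps are assumed to be in milliseconds, and the function name and the tolerance factor are hypothetical.

```python
from typing import List, Tuple


def find_gaps(reply_times_ms: List[int], interval_ms: int,
              tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Return (last_reply_before_gap, gap_length) for each suspicious gap.

    A gap is reported when two consecutive replies are further apart
    than interval_ms * tolerance.
    """
    threshold = interval_ms * tolerance
    gaps = []
    for prev, cur in zip(reply_times_ms, reply_times_ms[1:]):
        delta = cur - prev
        if delta > threshold:
            gaps.append((prev, delta))
    return gaps


# Probes every 200 ms; no replies arrived between t=1000 and t=2400
times = [0, 200, 400, 600, 800, 1000, 2400, 2600]
print(find_gaps(times, interval_ms=200))  # prints [(1000, 1400)]
```

Running the same probe against a VM while a snapshot is created and removed manually, and again during a backup job, would show whether the gap lines up with the snapshot operations and how long it actually is.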

Vitaliy S.
Product Manager
Posts: 23062
Liked: 1582 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. » Jun 30, 2016 3:54 pm

Hi Ross,

The actual root cause is the way VM snapshots are committed in vSphere. Here is a good blog post from Luca for further reading > http://www.virtualtothecore.com/en/vsph ... hing-past/

Thanks!

sajid
Lurker
Posts: 1
Liked: never
Joined: Nov 03, 2016 5:55 am
Full Name: Sajid Attar

[MERGED] snapshot removal takes long time

Post by sajid » Nov 03, 2016 6:00 am

Hi Team,

We have Veeam 9.0 in production and are facing a snapshot removal issue: removal takes more than 8 hours, and each replication job takes approximately 14 hours to complete.

Current infra is VMware 6.0 U2.

Please advise.

veremin
Product Manager
Posts: 16994
Liked: 1453 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Snapshot removal issues of a large VM

Post by veremin » Nov 03, 2016 10:13 am

Your post has been merged into an existing discussion. Kindly check the answers provided above. Thanks.

lando_uk
Expert
Posts: 312
Liked: 27 times
Joined: Oct 17, 2013 10:02 am
Full Name: Mark
Location: UK

Re: [MERGED] snapshot removal takes long time

Post by lando_uk » Nov 08, 2016 5:26 pm

sajid wrote: We have Veeam 9.0 in production and are facing a snapshot removal issue: removal takes more than 8 hours, and each replication job takes approximately 14 hours to complete.

Current infra is VMware 6.0 U2.

Not really enough information.
How big is the VM that takes 8hrs to consolidate?
What is the VM doing (change rate) during the backup/replication task?
What's the network speed between source and destination?
Is it much quicker when you replicate from your primary backup rather than the live VM?
What do your SAN/Datastore monitoring tools tell you during this process?
How many other VMs are on the datastore and are they also busy?

KeiichiKun
Enthusiast
Posts: 62
Liked: 10 times
Joined: Jul 21, 2016 3:59 pm

[MERGED] Replication job and removing snapshot too long

Post by KeiichiKun » Jan 20, 2017 4:33 pm

Hi,
I'm trying to move a vm from a cluster to another cluster using a replication job to minimize downtime.
After the first backup, the subsequent jobs (run manually because of the problem I'm writing about) take about 10 minutes to synchronize, but removing the snapshot takes too long.
My VM has 3 disks, 600 GB in total, and snapshot removal is at 54% after 3 hours; it will take 7 to 8 hours to complete. Every snapshot is about 80 GB (3 restore points are kept).
The VM is on SSD/10k rpm disks on a Dell Compellent with auto tiering.
Do you think this is normal? I don't think so; I'll open a support case with Dell if you can confirm that.
Thanks!
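
As a sanity check on the 7 to 8 hour figure, a straight-line extrapolation from "54% after 3 hours" can be worked out. This is only a sketch: it assumes the removal progresses at a constant rate, which the progress figure shown by vSphere may not reflect, and the function name is made up for illustration.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the total duration from the progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100 / percent_done


# Figures from the post: 54% after 3 hours
total = estimate_total_hours(3, 54)
print(f"total {total:.1f} h, remaining {total - 3:.1f} h")
# prints "total 5.6 h, remaining 2.6 h"
```

If the removal really ends up taking 7 to 8 hours, the rate is dropping as it goes, which would be consistent with a VM that keeps writing while the snapshot is being committed.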

veremin
Product Manager
Posts: 16994
Liked: 1453 times
Joined: Oct 26, 2012 3:28 pm
Full Name: Vladimir Eremin

Re: Snapshot removal issues of a large VM

Post by veremin » Jan 23, 2017 10:52 am

Even though your VM is not large, the symptoms you are experiencing are quite similar to those described in this thread. So, kindly familiarize yourself with the answers above.

As a first investigation step, you can try to reproduce the issue with VB&R taken out of the equation: take a snapshot manually, keep it for long enough (about as long as the replication job takes), delete it, and see whether the problem reappears.

Thanks.

shlomia
Influencer
Posts: 13
Liked: never
Joined: Mar 20, 2017 3:40 pm
Full Name: Shlomi

[MERGED] VM unresponsive during removing /consolidating snap

Post by shlomia » Apr 29, 2017 11:04 am

Hi,
Only one of my VMs, which is a DB server, seems to be unresponsive while vSphere is finishing the backup and trying to remove the snapshot.
It becomes unresponsive for a few minutes, and we cannot connect to it.
Our monitoring also warns us that it is down.

I'm running ESXi 5.5 and I came across this patch:
https://kb.vmware.com/selfservice/micro ... Id=2096282

I checked my ESXi hosts and they do not have this update installed.
I'm just scared to update manually, because it says that the impact will be a reboot.
Although I have vSphere HA, do I need to be worried about rebooting the ESXi hosts?
Also, has anyone installed this patch and fixed the problem?

thank you

Vitaliy S.
Product Manager
Posts: 23062
Liked: 1582 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. » May 01, 2017 9:14 am

Hi Shlomi,

I cannot comment on whether this patch will resolve your issues, but you may want to review the last pages of this topic for some tips on how to resolve this behavior.

Do you have the VM HA or Cluster HA feature enabled? You can try installing this patch via VUM again and see whether manual installation is required.

Thanks!
