Discussions specific to the VMware vSphere hypervisor
gcballard
Influencer
Posts: 14
Liked: never
Joined: Aug 10, 2011 2:34 pm
Contact:

Re: Snapshot removal issues of a large VM

Post by gcballard »

I would like to add a general storage vs. VMware comment. For whatever reason, we have a hard time seeing anything on the storage array that would indicate a problem, but I know from working with VMware before and from taking some snapshots that storage is the problem. I have set up alarms that monitor VM disk latency in VMware. When one of them fires, I know there is a storage performance issue.

To set up an alarm in vCenter go to Alarms->New Alarm
Alarm Type: Virtual Machines
Triggers->Add->VM Max Total Disk Latency (ms)
I have it raise a warning if latency is above 20 ms for 2 minutes (this is a Fibre array) and an alert if it is above 75 ms for 10 minutes. You will want to experiment with the settings to find out what is appropriate for your array.
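To make the "above X for Y minutes" logic concrete, here is a minimal Python sketch of how such a sustained-threshold check could be evaluated. It is not how vCenter implements alarms; the `Rule` class, the `alarm_state` function and the 20-second sampling interval are assumptions made for illustration, and only the 20 ms / 2 min and 75 ms / 10 min thresholds come from the post.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # label reported when the rule holds
    threshold_ms: float  # latency that must be exceeded
    duration_s: int      # how long it must stay exceeded


# Thresholds from the post, ordered from least to most severe.
RULES = (
    Rule("warning", threshold_ms=20, duration_s=2 * 60),
    Rule("alert", threshold_ms=75, duration_s=10 * 60),
)


def alarm_state(samples_ms, rules=RULES, interval_s=20):
    """Return the most severe rule name that currently holds, or None.

    samples_ms: latency readings, oldest first, one per interval_s seconds
    (the 20 s interval is an assumption of this sketch).
    A rule holds when every reading in its trailing window is above
    its threshold.
    """
    state = None
    for rule in rules:
        needed = math.ceil(rule.duration_s / interval_s)
        window = samples_ms[-needed:]
        if len(window) == needed and all(s > rule.threshold_ms for s in window):
            state = rule.name
    return state
```

A short spike does not trip either rule; only latency that stays high for the whole window does, which is why the longer 10-minute window is paired with the higher threshold.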

Yesterday this alarm really helped me figure something out. I had alarms on about 6 machines. I got to looking and they shared two LUNs. I asked the Storage Engineer and found out that those LUNs were on SATA. I had thought they were on Fibre. So, now I know what's going on and I can take appropriate action.
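The "which LUNs do the alarmed VMs have in common" step can be sketched in a few lines of Python. The VM and LUN names below are made up, and the VM-to-LUN mapping is assumed to come from whatever inventory export you have; this is only an illustration of the grouping, not a vCenter API call.

```python
from collections import defaultdict


def luns_by_alarmed_vm_count(vm_to_luns, alarmed_vms):
    """Group alarmed VMs by LUN, LUNs with the most alarmed VMs first.

    vm_to_luns: dict mapping a VM name to the LUNs its disks live on
    alarmed_vms: names of the VMs whose latency alarm fired
    Returns a list of (lun, sorted list of alarmed VMs on it).
    """
    vms_on_lun = defaultdict(set)
    for vm in alarmed_vms:
        for lun in vm_to_luns.get(vm, ()):
            vms_on_lun[lun].add(vm)
    ranked = sorted(vms_on_lun.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(lun, sorted(vms)) for lun, vms in ranked]


# Hypothetical inventory: three alarmed VMs, two of them on the same LUN.
inventory = {
    "vm-a": ["lun-01"],
    "vm-b": ["lun-01", "lun-02"],
    "vm-c": ["lun-03"],
}
ranking = luns_by_alarmed_vm_count(inventory, ["vm-a", "vm-b", "vm-c"])
```

The LUNs at the top of the ranking are the ones worth asking the storage engineer about first.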

goldleader80
Influencer
Posts: 10
Liked: never
Joined: Mar 20, 2012 8:50 am
Contact:

Exchange 2010 backup every 2 hours

Post by goldleader80 »

[merged]

We are running Veeam backups of our Exchange 2010 database server every 2 hours during the working week. This works fine; however, I am seeing some Outlook clients go offline, and in some cases freeze and lose the email currently being composed, when the snapshots are taken and removed through VMware.

The setup is:

Windows Server 2008 R2 with all the latest updates bar the last 2 weeks.
Exchange Server 2010 with SP1 with 4 Databases.
VMware ESXi 4.1, unsure of patch level.
VMware Tools version 8.3.12 installed on the virtual machine.
Veeam version 6 with 2 patch fixes.
Backup Target CIFS share on a Data Domain box.
Windows server 2003 Veeam Proxy Server going straight to the Backup Target.

Backup job is as follows:

Forward incremental every 2 hours during weekdays with a synthetic full on Saturday.
Application-aware image processing.

Windows Clients:

Windows XP SP3
Outlook 2003


I'm unsure if this is a VSS problem on the Exchange server; however, I have looked in the Event Viewer and see no errors when a backup runs.

I'm thinking it might be more on the VMware side of things, because if I take a manual snapshot of the virtual machine I get the same problem. I have read at http://communities.vmware.com/docs/DOC-11987 that this could be a problem with the way VMware Tools is installed with regard to the VSS and SYNC drivers. I think both of these were installed when the latest VMware Tools were installed. I wonder if removing the tools and installing just the VSS driver, not the SYNC driver, will fix the problem. As I'm not using quiescing but Application-aware image processing in the backup job, do I need the SYNC driver? Would removing it cause other issues?

If anyone else has seen this problem please help.

Thanks.

dellock6
Veeam Software
Posts: 5926
Liked: 1743 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Exchange 2010 backup every 2 hours

Post by dellock6 »

Hi,
you didn't mention the storage you are using, but this sounds like the usual problem with high-I/O VMs when it comes to committing the snapshot.
As I wrote in other threads, you can confirm it is a VMware problem by taking a snapshot of the same VM, waiting a couple of hours, and then committing the snapshot. If the same thing happens, then it is the problem I described.

Luca.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software

@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2020
Veeam VMCE #1

goldleader80
Influencer
Posts: 10
Liked: never
Joined: Mar 20, 2012 8:50 am
Contact:

Re: Snapshot removal issues of a large VM

Post by goldleader80 »

OK, I see this is a very common problem, then, with backups and snapshots on Exchange running during busy times of the day.

My Storage is:

Dell EqualLogic PS6000 SAS Raid 50
Enclosure Firmware: 02.03
Storage Array Firmware: v5.0.7

I have read through the whole of this thread and there are a few suggestions that might help fix this problem but no one true overall fix, which I guess is right, as every environment has different hardware and setup.

From the list of steps below, please could you give me some guidance on which to start with:

- Open a support call with VMware, as I do get problems when taking manual snapshots.
- Disable the VMware Tools SYNC driver.
- Disable Changed Block Tracking for this job. What effect will this have, e.g. on the time it takes to back up?
- Use CPU reservations on the Exchange VM. None are in place at this time. What is a good starting point? Currently the VM has 2 vCPUs.
- Update ESXi and vSphere to the latest patch level.
- Update Veeam to the latest patch level.
- Update the EqualLogic to the latest firmware. I really don't want to do this, to be honest, as apart from this little issue the SAN is rock solid and I don't want to change that.


Thanks again for the help.

dellock6
Veeam Software
Posts: 5926
Liked: 1743 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Snapshot removal issues of a large VM

Post by dellock6 »

Hi,
the update steps are always good to do; they usually fix issues and add stability, so I would do them for all three elements you described.
About CBT, do not disable it: without CBT, Veeam has to scan the entire vmdk to find out which blocks have changed since the last backup, so the backup will take more time to complete. That in turn creates a larger snapshot file, which then takes more time to commit.
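The link between backup duration and snapshot size can be put into rough numbers. The Python sketch below assumes the guest writes at a steady average rate while the snapshot is open and treats every write as new data in the delta file, so it gives an upper bound rather than a prediction (blocks rewritten several times are only stored once). The function name and the example figures are illustrative assumptions, not measurements from this thread.

```python
def snapshot_delta_upper_bound_gb(write_mb_per_s, snapshot_open_minutes):
    """Upper bound on snapshot delta size for a steady guest write rate.

    write_mb_per_s: average guest write rate while the snapshot is open
    snapshot_open_minutes: how long the backup keeps the snapshot open
    """
    written_mb = write_mb_per_s * snapshot_open_minutes * 60
    return written_mb / 1024


# Hypothetical comparison: the same VM backed up in 30 minutes vs. 4 hours.
short_backup_gb = snapshot_delta_upper_bound_gb(5, 30)
long_backup_gb = snapshot_delta_upper_bound_gb(5, 240)
```

With the same write rate, a backup that runs eight times longer leaves up to eight times more data to commit, which is the reason for keeping the backup window as short as possible.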
CPU reservation has nothing to do with snapshots; the CPU is used by the VMware kernel to do snapshots, not by the VM. In theory, quite the opposite: giving less power to the Exchange server would give you better results, since it could handle fewer IOPS while running. But you need to size Exchange for production, not only for being backed up...
About the sync driver, check this VMware KB article:

http://kb.vmware.com/selfservice/micros ... Id=1009886

Luca.

goldleader80
Influencer
Posts: 10
Liked: never
Joined: Mar 20, 2012 8:50 am
Contact:

Re: Snapshot removal issues of a large VM

Post by goldleader80 »

Thanks Luca,

Yes, I will look at doing those fixes for all three areas and see if this improves things.

Thanks for the advice about CBT; I will leave that as it is.

With regard to the SYNC driver, will this have been disabled already when I installed VMware Tools 8.3.12? I didn't untick it at the time of installing those VMware Tools, but if I look in Device Manager on that VM under Non-Plug and Play Devices, it isn't listed. To be safe, do you think it's best to reinstall VMware Tools so that I know for sure it's not installed? Does the method you describe give the same result as doing an interactive install, choosing a custom install, and just unticking it as part of the install?

Thanks

dellock6
Veeam Software
Posts: 5926
Liked: 1743 times
Joined: Jul 26, 2009 3:39 pm
Full Name: Luca Dell'Oca
Location: Varese, Italy
Contact:

Re: Snapshot removal issues of a large VM

Post by dellock6 »

I would stay with the method explained in the VMware KB to be sure it succeeds.

Rohail2004
Enthusiast
Posts: 41
Liked: never
Joined: Oct 06, 2010 6:54 pm
Contact:

Big size VM backup issue.

Post by Rohail2004 »

[merged]

We have been using Veeam for over two years and are mostly satisfied, but recently, as our company grew, we built bigger SQL servers, with VMs from 700 GB to 1.2 TB in size. Backing these up through Veeam has been extremely challenging: the snapshot takes forever to remove after the backup, and during the backup it fills up the LUN and causes the VM to go down with no space left on the storage. We have over 10% free space available on those LUNs, but they continue to have snapshot removal issues.
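A quick way to see whether 10% free space is enough is to compare it with how fast the snapshot grows while the backup runs. The Python sketch below is only back-of-the-envelope arithmetic; the LUN size and growth rate in the example are hypothetical, and the function name is made up for illustration.

```python
def hours_until_datastore_full(capacity_gb, free_fraction, delta_growth_gb_per_hour):
    """Hours a snapshot can stay open before the free space is used up.

    capacity_gb: total size of the LUN/datastore
    free_fraction: share of it that is free, e.g. 0.10 for 10%
    delta_growth_gb_per_hour: observed growth of the snapshot delta files
    """
    if delta_growth_gb_per_hour <= 0:
        return float("inf")  # nothing is being written, so it never fills
    return capacity_gb * free_fraction / delta_growth_gb_per_hour


# Hypothetical numbers: a 2 TB LUN, 10% free, delta growing 25 GB per hour.
headroom_hours = hours_until_datastore_full(2000, 0.10, 25)
```

If the result is shorter than the backup takes, the LUN will fill before the snapshot is removed, regardless of how fast the commit itself would be.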

Is anyone out there experiencing the same issue, or has anyone experienced it and resolved it? If so, would you share your solution or advise me how to fix this issue? It is causing a nightmare in our environment, as we have to work nights and weekends to get the snapshot removed in order to bring the VM back online. Disabling backup on those VMs isn't an option.

We are on ESXi 4.1 with Veeam Backup 5.0. We are upgrading to Veeam 6.0 this week, but I am not sure whether that will get rid of the problem, since it has been happening on a constant basis.

Thanks in advance.

Jfmoots
Veeam Software
Posts: 214
Liked: 26 times
Joined: Oct 28, 2011 3:26 pm
Full Name: James Moots
Location: Ohio, United States
Contact:

Re: Big size VM backup issue.

Post by Jfmoots »

With the information you have here, I'd suggest increasing the amount of free space on your datastore.

What hardware version are those SQL servers?

How long are they taking to backup?

Are you running them in Incremental or Reverse Incremental?

What are the bottleneck stats for the job regarding this VM?

What's the production storage?

What mode are you backing this VM up with? (Network, Virtual Appliance, or Direct SAN)

Let's examine those details and see if we can uncover something that might speed that job up.

Rohail2004
Enthusiast
Posts: 41
Liked: never
Joined: Oct 06, 2010 6:54 pm
Contact:

Re: Big size VM backup issue.

Post by Rohail2004 »

They're SQL 64-bit.

A full backup takes over 8 hours.

I am running incrementals Monday to Friday and a full on Saturday.

I don't know the bottleneck, and that's what I am trying to find out.

We are on EMC VMAX storage.

I'm backing up through the network; we have a 10 Gb backbone.

Jfmoots
Veeam Software
Posts: 214
Liked: 26 times
Joined: Oct 28, 2011 3:26 pm
Full Name: James Moots
Location: Ohio, United States
Contact:

Re: Snapshot removal issues of a large VM

Post by Jfmoots »

Double-click the job. That will let you view the real-time statistics. There are overall bottleneck stats for the job, and as you click on each VM in the job you'll get VM-specific bottleneck stats.

What hardware version is the VM that's giving you trouble?

anorton
Influencer
Posts: 21
Liked: never
Joined: Sep 06, 2011 11:56 am
Full Name: Aaron Norton
Contact:

Re: Snapshot removal issues of a large VM

Post by anorton »

I have been running into the same issue with VMs disconnecting. This only happens during the snapshot removal process, and on the machine I am testing with it only lasts 58 seconds, but that is long enough to cause issues with some systems because of the network drop. I have done a manual snapshot and removal, and that removal takes 2 seconds directly within VMware. Any thoughts on changes or tweaks?

Vitaliy S.
Product Manager
Posts: 24243
Liked: 1859 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. »

Have you waited long enough (the same time as your backup takes) before committing the snapshot with the vSphere Client?

tsightler
VP, Product Management
Posts: 5675
Liked: 2486 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Snapshot removal issues of a large VM

Post by tsightler »

anorton wrote: I have been running into the same issue with VMs disconnecting. This only happens during the snapshot removal process, and on the machine I am testing with it only lasts 58 seconds, but that is long enough to cause issues with some systems because of the network drop. I have done a manual snapshot and removal, and that removal takes 2 seconds directly within VMware. Any thoughts on changes or tweaks?
Yes: when you took the snapshot "directly", did you actually wait the same amount of time as the backup takes, and did you take the snapshot during the same window of time? Are other backups/snapshots running when Veeam is removing the snapshot, thus adding load? Veeam makes the exact same call as what's done via the vSphere Client, so it doesn't make much sense that the behavior would be different.

anorton
Influencer
Posts: 21
Liked: never
Joined: Sep 06, 2011 11:56 am
Full Name: Aaron Norton
Contact:

Re: Snapshot removal issues of a large VM

Post by anorton »

I actually waited longer. Veeam has now told me this is not a "proper" test.

anorton
Influencer
Posts: 21
Liked: never
Joined: Sep 06, 2011 11:56 am
Full Name: Aaron Norton
Contact:

Re: Snapshot removal issues of a large VM

Post by anorton »

Maybe I am not doing a commit right. I have done a revert, a consolidate, and a removal, all of which take 2-3 seconds.

Vitaliy S.
Product Manager
Posts: 24243
Liked: 1859 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. »

You do not need to revert to the snapshot if you want to reproduce Veeam B&R behavior.

Just follow these steps:
1. Create the snapshot at the same time as you ran the backup job
2. Keep it for the same period of time
3. Delete (do not choose to "Revert/Go To") the snapshot

A Veeam backup job follows exactly the same steps. If you do not see any VM disconnections, then you shouldn't see them while running the backup job either.

Hope this helps!

carywlanders
Lurker
Posts: 1
Liked: never
Joined: Jul 05, 2012 7:35 pm
Full Name: Cary Landers
Contact:

Re: Snapshot removal issues of a large VM

Post by carywlanders »

Has anyone been able to fix this issue? I have a large Exchange and SQL server that I cannot replicate during the day. The snapshots cause timeouts on the network.

Vitaliy S.
Product Manager
Posts: 24243
Liked: 1859 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. »

Gostev wrote:Short summary of things which may help (for more information, please read this topic):

1. Make sure VM does not have any other snapshots (including hidden).
2. Increase CPU reservations in the VM settings.
3. Move snapshot location to a different datastore (via workingDir parameter), preferably backed by faster storage (for example, SSD disk).

bhwong
Enthusiast
Posts: 99
Liked: 3 times
Joined: May 24, 2012 9:57 am
Full Name: Boon Hong Wong
Contact:

Re: Snapshot removal issues of a large VM

Post by bhwong »

It would be great if Veeam could utilize SAN-level snapshots instead of just hypervisor snapshots to overcome the freezing of VMs during snapshot removal after backup jobs are done. I was told that Simpana can do this.

pacmantravis
Novice
Posts: 6
Liked: never
Joined: Dec 24, 2009 8:32 pm
Full Name: Travis Nieves

Re: Snapshot removal issues of a large VM

Post by pacmantravis »

Yes, it would be great if Veeam could have a feature similar to Simpana's SnapProtect. I would even pay more for it as an added option. Simpana is in a different price galaxy compared to Veeam, so I definitely understand having to pay for an extra feature like this.

I have run into this in my environment as well and have reverted to using BackupExec for Exchange backups. For SQL, I use Veeam during off hours and SQL dumps to our Exagrids during the day.

rkovhaev
Veeam Software
Posts: 39
Liked: 21 times
Joined: May 17, 2010 6:49 pm
Full Name: Rustam
Location: hockey night in canada
Contact:

Re: Snapshot removal issues of a large VM

Post by rkovhaev »

If you have NFS storage and you use hotadd as the backup mode, your VMs might be 'stunned' longer than usual (lose network connectivity).

Recently we had a case opened with the VMware SDK team, and they will be releasing an official document shortly. For now they have explained it as an 'issue' with the NFS locking 'mechanism'. The issue can be easily reproduced using the vSphere Client (snapshot VM, mount disks to backup proxy as independent non-persistent, dismount disks -> remove VM snapshot) on ESXi 4/ESXi 5.

Workarounds:
1. Switch to network mode
2. Migrate backup proxy to the same ESXi host where you have your VM.

cengroba
Lurker
Posts: 1
Liked: never
Joined: Aug 19, 2012 10:34 pm
Contact:

Random Guest Freeze during snapshot creation

Post by cengroba »

[merged]

Hello,

During backups, we have noticed that the guest OS freezes for a few moments during snapshot creation or removal. The VMs run on local storage (Dell Server, RAID 10, 6x300GB SAS HD, 15K RPM). The issue does not occur every time during snapshot creation or deletion; I would say about 60% of the time. It happens to both Windows and Linux VMs. We are running ESXi 5.0. We have neither the "Application aware" option selected nor "Enable VMware tools quiescence" checked.

I have read numerous VMware KB articles; however, none of them seems to describe our exact issue. The closest one would be: http://kb.vmware.com/selfservice/micros ... Id=1013163

Does Veeam take a snapshot of the VM's memory also? Is there a way to disable this? I should also note that the issue occurs if we take a snapshot using the vSphere Client and select the "copy VM's memory" option.

Any assistance would be appreciated, thanks.

Gostev
SVP, Product Management
Posts: 26679
Liked: 4268 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Random Guest Freeze during snapshot creation

Post by Gostev »

No - Veeam specifically instructs vSphere not to take snapshot of memory (only disks).

chimera
Enthusiast
Posts: 57
Liked: 3 times
Joined: Apr 09, 2009 1:00 am
Full Name: J I
Contact:

Re: Snapshot removal issues of a large VM

Post by chimera »

I'll join in the fun and say that I'm experiencing the same issues, and as noted, only with large snapshots. I am on ESXi 5.0 Build 768111 and Veeam 6.1 Patch 1, connected to an EqualLogic PS6010XV 10GbE iSCSI SAN with firmware 5.2.1 (running Direct SAN, reversed incremental backups). In the past 2 days, 2 different VMs had lengthy network timeouts while their snapshots were committed/removed, and these did actually have prior snapshots in place (in fact, they were temporary Veeam snapshots from past jobs that had run when the Veeam backup service crashed).

Monitoring backups with smaller snapshot deltas while running a constant ping, there is only a single timeout when the snapshot removal completes. With the large snapshots, pings time out for 10-30 minutes. I have also had a considerable network timeout with an Exchange 2010 VM whose snapshot was being committed/removed while no other snapshots were in place. I just wish I had taken note of what changed in the environment, because we had this issue a year ago and it disappeared, but we can't recall what changes were made (so many upgrades: ESXi patches, Veeam patches, SAN firmware upgrades, etc.), and now here I am with the same issue occurring again (and since the last time we had this problem we have deleted many large snapshots during the day, had Veeam over-runs into business hours, etc., without any network issues).
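If you log the ping results while a snapshot is being removed, the length of the stun is easy to pull out afterwards. The Python sketch below assumes you already have the probes as (timestamp in seconds, reply received) pairs; how you collect them is up to you, and the function name is made up for illustration.

```python
def longest_outage(results):
    """Return (start, end) timestamps of the longest run of failed pings.

    results: iterable of (timestamp_s, ok) pairs in time order.
    The outage starts at the first failed probe and ends at the first
    successful probe after it (or at the last probe if it never recovers).
    Returns None when no probe failed.
    """
    best = None
    start = None
    last_ts = None
    for ts, ok in results:
        if not ok and start is None:
            start = ts  # outage begins
        elif ok and start is not None:
            if best is None or ts - start > best[1] - best[0]:
                best = (start, ts)
            start = None  # outage over
        last_ts = ts
    # Handle a log that ends while pings are still failing.
    if start is not None and (best is None or last_ts - start > best[1] - best[0]):
        best = (start, last_ts)
    return best
```

Comparing this figure between a small and a large snapshot commit shows the difference between a single dropped ping and a multi-minute outage.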

For now, I have disabled "retry" on the Veeam jobs and configured them to terminate if they exceed the allowed backup window (during business hours), purely as a "please, no more screaming users" safety precaution.

Perhaps one simple suggestion for the Veeam developers: is there any chance you can append the date/time that the snapshot is taken to the description of the snapshot? So it would look something like "Please do not delete this snapshot. It is being used by Veeam Backup (21 Aug 2012, 5:56pm)". In fact, I really don't understand why VMware doesn't date/time stamp each snapshot automatically, to be honest. Anyway, something as simple as this coding change would make troubleshooting a lot easier and save having to check through the logs or browse the datastore to check the date/time stamps there.
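To show how small the suggested change is, here is a Python sketch that builds such a description. It is not Veeam's code: the function name is made up, the base text is simply the wording quoted in the post, and the date format copies the example given there.

```python
from datetime import datetime

# Fixed English abbreviations so the result does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

BASE_TEXT = "Please do not delete this snapshot. It is being used by Veeam Backup"


def snapshot_description(taken_at):
    """Append the snapshot creation time to the description text."""
    hour12 = taken_at.hour % 12 or 12  # 0 and 12 both display as 12
    suffix = "am" if taken_at.hour < 12 else "pm"
    stamp = (f"{taken_at.day} {MONTHS[taken_at.month - 1]} {taken_at.year}, "
             f"{hour12}:{taken_at.minute:02d}{suffix}")
    return f"{BASE_TEXT} ({stamp})"


example = snapshot_description(datetime(2012, 8, 21, 17, 56))
```

With the time in the description, a leftover snapshot from a crashed job can be spotted directly in the snapshot manager.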

Vitaliy S.
Product Manager
Posts: 24243
Liked: 1859 times
Joined: Mar 30, 2009 9:13 am
Full Name: Vitaliy Safarov
Contact:

Re: Snapshot removal issues of a large VM

Post by Vitaliy S. »

chimera wrote: Perhaps one simple suggestion for the Veeam developers: is there any chance you can append the date/time that the snapshot is taken to the description of the snapshot? So it would look something like "Please do not delete this snapshot. It is being used by Veeam Backup (21 Aug 2012, 5:56pm)". In fact, I really don't understand why VMware doesn't date/time stamp each snapshot automatically, to be honest. Anyway, something as simple as this coding change would make troubleshooting a lot easier and save having to check through the logs or browse the datastore to check the date/time stamps there.
Thanks for the feedback! I will pass this information along to our R&D team. On top of that, have you considered using Veeam ONE to alert on existing snapshots and review all active snapshots (via VMware Active Snapshots report) for all VMs in your VI?

JWester
Service Provider
Posts: 58
Liked: 7 times
Joined: Apr 04, 2011 8:56 am
Full Name: Joern Westermann
Contact:

Re: Snapshot removal issues of a large VM

Post by JWester »

chimera wrote: Perhaps one simple suggestion for the Veeam developers: is there any chance you can append the date/time that the snapshot is taken to the description of the snapshot? So it would look something like "Please do not delete this snapshot. It is being used by Veeam Backup (21 Aug 2012, 5:56pm)". In fact, I really don't understand why VMware doesn't date/time stamp each snapshot automatically, to be honest. Anyway, something as simple as this coding change would make troubleshooting a lot easier and save having to check through the logs or browse the datastore to check the date/time stamps there.
Use PowerCLI:
# List every named snapshot with its VM, creation time and size, oldest first
Get-VM | Get-Snapshot | Where-Object { $_.Name.Length -gt 0 } | Sort-Object Created | Select-Object VM,Name,Created,SizeMB

JWester
Service Provider
Posts: 58
Liked: 7 times
Joined: Apr 04, 2011 8:56 am
Full Name: Joern Westermann
Contact:

Re: Snapshot removal issues of a large VM

Post by JWester »

rkovhaev wrote: If you have NFS storage and you use hotadd as the backup mode, your VMs might be 'stunned' longer than usual (lose network connectivity).

Recently we had a case opened with the VMware SDK team, and they will be releasing an official document shortly. For now they have explained it as an 'issue' with the NFS locking 'mechanism'. The issue can be easily reproduced using the vSphere Client (snapshot VM, mount disks to backup proxy as independent non-persistent, dismount disks -> remove VM snapshot) on ESXi 4/ESXi 5.

Workarounds:
1. Switch to network mode
2. Migrate backup proxy to the same ESXi host where you have your VM.
Wow. We also had some VMs that were "stunned" for a very long time during snapshot removal. Support said that our storage is too slow, although we had no problem with manual snapshot creation and removal, only through Veeam backup.
But after switching to network mode, snapshot removal is now much faster and the network stun lasts only a few seconds (instead of minutes as with hotadd).
Thank you!

jbrant
Lurker
Posts: 1
Liked: never
Joined: Sep 07, 2012 5:41 am
Full Name: Jeremy Brant
Contact:

Re: Snapshot removal issues of a large VM

Post by jbrant »

Thank you guys for all your work on this. I am having the same issue on a File server. When you say switch to network mode... what exactly are you speaking of? I have added a proxy to the host.

foggy
Veeam Software
Posts: 19433
Liked: 1763 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Snapshot removal issues of a large VM

Post by foggy »

Network transport mode. Transport mode settings are available in the proxy server properties.
