Host-based backup of VMware vSphere VMs.
mikeely
Expert
Posts: 224
Liked: 69 times
Joined: Nov 07, 2016 7:39 pm
Full Name: Mike Ely
Contact:

Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

This probably isn't a Veeam-specific issue beyond the fact that Veeam creates and removes snapshots all the time, but I thought I'd ask here to see if anyone else has had the same experience. We stood up a new cluster on brand-new hardware and have been migrating VMs over to it. We're going Intel->AMD with this change, so we're having to shut down the VMs to move them. We're also upgrading the VMware hardware version on the VMs to get them current, although this doesn't seem to make any difference to the snapshot deletion time.

Anyhow, what I'm seeing is that VMs that previously took 4-6 seconds on average for snapshot deletion now consistently take 40-60 seconds. It's really bad: the VMs are stunned for long enough during snapshot consolidation that it's causing crashes, alerts, etc. Has anybody here seen this before, and if so, were you able to resolve it?
foggy
Veeam Software
Posts: 21071
Liked: 2115 times
Joined: Jul 11, 2011 10:22 am
Full Name: Alexander Fogelson
Contact:

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by foggy »

Hi Mike, what about the underlying storage where VMs reside - has it also changed?
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

Nope, same storage: Tintri with the VAAI plugin.
'If you truly love Veeam, then you should not let us do this :D' --Gostev, in a particularly Blazing Saddles moment
Gostev
Chief Product Officer
Posts: 31556
Liked: 6719 times
Joined: Jan 01, 2006 1:01 am
Location: Baar, Switzerland
Contact:

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by Gostev »

As far as I remember, Tintri was truly one of a kind in how it integrates with VMware? Some sort of very special storage virtualization technique that allowed them to improve the performance of VM snapshot management operations specifically. Sounds like this logic just does not play well with ESXi 7?
soncscy
Veteran
Posts: 643
Liked: 312 times
Joined: Aug 04, 2019 2:57 pm
Full Name: Harvey
Contact:

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by soncscy »

I'm curious, what isolation tests have you done? As long as a backup uses vADP, it's a standard call to the same type of API, so you can even reproduce it with PowerCLI if you're interested.

For your storage backing the v7 environment, how do normal snapshot deletions for snapshots that run for the same amount of time fare? Both from the host itself and from remote servers using PowerCLI?
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

I don't have great data for manual snapshot removal via either PowerCLI or the vSphere web console because we just don't do that many of those; one-offs for big software updates are about it. I did a test just now in both the web console and PowerCLI, and in both cases the VMs took a long time to snapshot but consolidated almost instantly.

Is there a specific PowerShell way to use vADP, or does something like

Code:

 Get-VM vmname | New-Snapshot -Name "foo" -Quiesce -Memory

use the API?
soncscy

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by soncscy » 1 person likes this post

Unless VMware has sneakily changed something (like Microsoft with Hyper-V...), it should be the exact same call as here: https://vdc-download.vmware.com/vmwb-re ... -guide.pdf (pg 69)

Code:

    // At this point we assume the virtual machine is identified as ManagedObjectReference vmMoRef.
    String SnapshotName = "Backup";
    String SnapshotDescription = "Temporary Snapshot for Backup";
    boolean memory_files = false;
    boolean quiesce_filesystem = true;
    ManagedObjectReference taskRef = serviceConnection.getservice().CreateSnapshot_Task(vmMoRef,
        SnapshotName, SnapshotDescription, memory_files, quiesce_filesystem);

That should be exactly what PowerCLI calls.

For your tests, just to be sure though: were the length of time on snapshot and the time since the last backup roughly equivalent to the backup situation? Time on snapshot tends to be a factor that gets overlooked, I think, since you have redo log growth even for allegedly inactive machines. A few years back I had a client who had long snapshot consolidation times on their Lotus Notes machines, and it turned out the backup window was overlapping with some replication/garbage collection procedure, which churned tons of data and made the snapshot redo logs bloat.

I'm not saying that's specifically your issue; it's more to illustrate the effect that time on snapshot can have.
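
To put rough numbers on the time-on-snapshot effect, here is a back-of-the-envelope sketch in Python. It assumes a constant write rate where every write lands on a block not yet in the redo log, which real workloads won't match exactly. The function name and the example rates are purely illustrative, not measurements from this thread.

```python
def estimated_delta_gib(change_rate_mib_s: float, minutes_on_snapshot: float) -> float:
    """Rough estimate of redo-log (delta disk) growth while a snapshot is open.

    Assumes a constant write rate and that each write touches a new block,
    so it overestimates when the guest keeps rewriting the same blocks.
    """
    seconds = minutes_on_snapshot * 60
    return change_rate_mib_s * seconds / 1024  # MiB -> GiB


# Hypothetical numbers: an "idle" VM writing 2 MiB/s versus the same VM
# during a replication or garbage-collection window writing 40 MiB/s.
for label, rate in (("idle", 2.0), ("busy", 40.0)):
    for minutes in (5, 60):
        size = estimated_delta_gib(rate, minutes)
        print(f"{label:4s} VM, {minutes:2d} min on snapshot: ~{size:.1f} GiB to consolidate")
```

The point is only that the amount of data to consolidate scales with both the change rate and the time the snapshot stays open, so a test snapshot held for seconds is not comparable to one held for a whole backup run.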
Andreas Neufert
VP, Product Management
Posts: 6748
Liked: 1408 times
Joined: May 04, 2011 8:36 am
Full Name: Andreas Neufert
Location: Germany
Contact:

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by Andreas Neufert » 1 person likes this post

As far as I know, VMware did not change the snapshot commit process.

As usual, the key factors are datastore performance, snapshot location (if it was changed), the data in the snapshot (changes made during the snapshot's lifetime), and the overall I/O load at the time the snapshot gets committed.

It does not matter whether the snapshot was triggered by Veeam, the API, the CLI or the UI; the process is always the same.
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely » 1 person likes this post

I was able to prove the issue with VMware support, and they've escalated it to engineering. Something's different, but I'm unsure whether it has to do with something VMware did, differences in behavior between Intel (our old infra) and AMD (new), or something else entirely. It's pretty ugly though. I'll report back here once I have news from VMware in case another Veeam user runs into this.
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

Yeah, they're trying to blame Veeam:

"I see that the previous engineer was able to identify that the issue is residing on your 7.0 vCenter and the snapshots are slow during the Veeam backups?

Have you gotten in touch with Veeam backupd?"

No mention of the 2+ gigs of support packages I uploaded to the ticket, either.
Gostev

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by Gostev » 1 person likes this post

But of course... it's an easy way out :D

Which is exactly why experienced users already recommended above that you try to reproduce the issue by creating, holding and removing VM snapshots without Veeam in the picture. On second thought, though, this may prove to be a challenge if your issue also requires significant concurrent load on the Tintri, such as the load from backup jobs reading data. In that case, running Iometer in a few VMs while the snapshot is being removed should do.

Also, you really need to put Tintri in the loop with VMware; honestly, they would be my primary suspect, not VMware. By now we can be confident that ESXi 7 itself does not have some major regression in snapshot deletion times. Otherwise, the Veeam forums would have had a 10+ page topic devoted to this issue by now :D
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

Yeah, we're working the Tintri angle as well. I'd open a Veeam ticket on this if I thought there was any reason to, but since Veeam is making the same API calls to both systems, there's no logic behind that.

+1 on the Iometer suggestion, and I'll do something to dirty a lot of blocks on the VM I'm testing with.
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

As we've been working through this, a new data point has emerged: the difference in snapshot consolidation times is very strongly correlated with switching over to hot-add transport instead of NBD. I can't think of what the mechanism might be, but we saw it when we moved all our VMs at a given location over to hot-add while changing nothing else on those VMs. They're on the same hardware with the same backing storage, on the same ESX servers and the same vSphere server, at the same versions of everything.

I've opened ticket 04643396 at Sev2 since the minute-plus stun times are crashing things in our environment.
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely » 1 person likes this post

Had a great call with support, and it turns out the problem is basically this:
https://kb.vmware.com/s/article/2010953

VMs which are not on the same compute host as the proxy will be stunned for at minimum about 40 seconds. Moving to NFSv4 at the ESX datastore level is not an option for us.

One possible approach would be to have one proxy per ESX host, but here are the problems I can't see a way past so far (please feel free to contribute solutions):
  1. With multiple proxies come multiple threads, and we would almost immediately overwhelm the Tintri with too many backup threads.
  2. Setting "automatically select proxy" only chooses the proxy with the fewest threads in use. AFAIK it does not support ESX host affinity, but it would be nice if it did!

Our backup model is that any VM created gets pulled in by Veeam unless it's excluded, either individually or by being placed in an excluded folder in the vSphere client. To set hard proxy-to-VM-to-host affinity we'd need to create one backup job per ESX host, set the proxy for that job, AND never migrate the proxy off that ESX host, AND never migrate the other VMs off that host. Might as well use some free virtualization platform that doesn't support live migration at that point :(
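
The host-affinity behaviour wished for in point 2, combined with a storage-wide cap for point 1, could be sketched roughly as below. This is a hypothetical Python illustration of the selection logic only, not how Veeam actually picks proxies; the `Proxy` class, the `pick_proxy` function, the host names and the task limits are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proxy:
    name: str
    esx_host: str      # ESX host the proxy VM currently runs on
    max_tasks: int     # per-proxy concurrency limit
    running: int = 0   # tasks currently assigned to this proxy


def pick_proxy(vm_host: str, proxies: List[Proxy], storage_task_cap: int) -> Optional[Proxy]:
    """Prefer a proxy on the VM's own ESX host, else the least-loaded proxy.

    Returns None when the storage-wide cap is reached or no proxy has a free
    slot, so the caller can queue the VM instead of adding another stream.
    """
    if sum(p.running for p in proxies) >= storage_task_cap:
        return None
    free = [p for p in proxies if p.running < p.max_tasks]
    if not free:
        return None
    same_host = [p for p in free if p.esx_host == vm_host]
    candidates = same_host or free  # fall back if no local proxy is available
    return min(candidates, key=lambda p: p.running)


proxies = [Proxy("proxy-a", "esx01", 4), Proxy("proxy-b", "esx02", 4, running=1)]
chosen = pick_proxy("esx02", proxies, storage_task_cap=6)
print(chosen.name if chosen else "queued")
```

The lookup uses the host each VM is on at job time, so nothing has to be pinned and live migration stays usable. A VM with no local proxy falls back to the least-loaded one and would still see the long stun described above.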
Gostev

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by Gostev »

Can you not just use Direct NFS transport mode instead of hot add? The issue above was the very reason why we added this transport mode in the first place...
tsightler
VP, Product Management
Posts: 6012
Liked: 2843 times
Joined: Jun 05, 2009 12:57 pm
Full Name: Tom Sightler
Contact:

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by tsightler » 1 person likes this post

Correct, by far the best way to address this is to use DirectNFS mode instead of hotadd and it works amazingly well (it is one of my favorite Veeam features).

However, just to let you know, ESX affinity with hotadd mode actually is supported as well, but it must be enabled via a registry key EnableSameHostHotaddMode. This regkey is documented in Veeam KB1681 in the "Known issues with NFS 3.0 Datastores" section. However, I would only do this if, for some reason, Direct NFS can't be used in your case.
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

Our VBR server is a physical host, so hotadd isn't an option with it. What's the Linux equivalent of that registry key?

Questions about DirectNFS mode:
1. Does it still require a physical host with an HBA or is that no longer a requirement?
2. Linux support?
Gostev

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by Gostev »

It's the same key.

1. DirectNFS never required a physical host or an HBA.
2. If you mean Linux proxy, then in v11 Linux proxies support DirectNFS too.
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

Thanks. Looking forward to 11 for a lot of reasons. Going to go shuffle through getting DirectNFS working.
Gostev

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by Gostev »

You should be able to just use DirectNFS from your physical backup server host?
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely »

Yeah, that's the plan, and we still hadn't decommissioned the Windows proxy at our other datacenter. It's a VM, but as I read the docs that shouldn't be a problem for DirectNFS, right?
mikeely

Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6

Post by mikeely » 2 people like this post

Ah, that's nice. I had everything set up for DirectNFS and didn't even know it - all I had to do was change the radio button on the proxy settings. Getting about 300MB/s performance on a largish backup job right now.

Thanks y'all.