-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
This probably isn't a Veeam-specific issue beyond the fact that Veeam is creating and removing snapshots all the time, but I thought I'd ask here to see if anyone else has had the same experience. We stood up a new cluster on brand-new hardware and have been migrating VMs over to it. We're going Intel->AMD with this change, so we have to shut the VMs down for the move. We're also upgrading the VMware virtual hardware version on the VMs to get them current, although this doesn't seem to make any difference to the snapshot deletion time.
Anyhow, what I'm seeing is that VMs which previously averaged 4-6 seconds for snapshot deletion now consistently take 40-60 seconds. It's really bad: the VMs are stunned long enough during snapshot consolidation that it's causing crashes, alerts, etc. Has anybody here seen this before, and if so, were you able to resolve it?
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Hi Mike, what about the underlying storage where VMs reside - has it also changed?
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Nope, same storage: Tintri with the VAAI plugin.
'If you truly love Veeam, then you should not let us do this ' --Gostev, in a particularly Blazing Saddles moment
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
As far as I remember, Tintri was truly one of a kind in how it integrates with VMware? Some sort of very special storage virtualization technique that allowed them to improve the performance of VM snapshot management operations specifically. Sounds like this logic just does not play well with ESXi 7?
-
- Veteran
- Posts: 643
- Liked: 312 times
- Joined: Aug 04, 2019 2:57 pm
- Full Name: Harvey
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
I'm curious, what isolation tests have you done? As long as a backup uses vADP, it's a standard call to the same API, so you can even reproduce it with PowerCLI if you're interested.
For your storage backing the v7 environment, how do normal snapshot deletions fare for snapshots held for the same amount of time? Both from the host itself and from remote servers using PowerCLI?
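A minimal Java sketch of that isolation test: create a snapshot, hold it for roughly as long as a backup would, then time only the removal, and report the median over several runs. The `SnapshotOps` interface and its method names are hypothetical placeholders; a real implementation would wrap the vSphere `CreateSnapshot_Task`/`RemoveSnapshot_Task` calls (or PowerCLI's `New-Snapshot`/`Remove-Snapshot`) and wait for each task to complete. The hold time and run count are assumptions to tune to your own backup window.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical abstraction: a real implementation would wrap the vSphere
// CreateSnapshot_Task / RemoveSnapshot_Task calls and wait for each task to finish.
interface SnapshotOps {
    void createSnapshot(String vmName, String snapshotName);
    void removeSnapshot(String vmName, String snapshotName);
}

final class SnapshotRemovalBenchmark {
    private SnapshotRemovalBenchmark() {}

    // Create, hold for roughly a backup's duration, then time only the removal.
    static List<Duration> measure(SnapshotOps ops, String vmName, Duration hold, int runs) {
        List<Duration> results = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            String snap = "isolation-test-" + i;
            ops.createSnapshot(vmName, snap);
            try {
                Thread.sleep(hold.toMillis()); // time on snapshot lets the delta grow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while holding snapshot", e);
            }
            long start = System.nanoTime();
            ops.removeSnapshot(vmName, snap); // the step whose duration we compare
            results.add(Duration.ofNanos(System.nanoTime() - start));
        }
        return results;
    }

    // The median is less sensitive than the mean to a single outlier run.
    static Duration median(List<Duration> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples");
        }
        List<Duration> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return sorted.get(n / 2 - 1).plus(sorted.get(n / 2)).dividedBy(2);
    }
}
```

Running it against one VM on the ESXi 6 cluster and one on the ESXi 7 cluster, with the same hold time, gives medians that can be compared directly.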
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
I don't have great data for manual snapshot removal via either PowerCLI or the vSphere web console, because we just don't do many of those; one-offs for big software updates are about it. I did a test just now, both in the web client and in PowerCLI, and in both cases the VMs I was snapshotting took a long time to snapshot but consolidated almost instantly.
Is there a specific PowerShell way to use vADP, or does something like this use the API?
Get-VM vmname | New-Snapshot -Name "foo" -Quiesce -Memory
-
- Veteran
- Posts: 643
- Liked: 312 times
- Joined: Aug 04, 2019 2:57 pm
- Full Name: Harvey
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Unless VMware has sneakily changed something (like Microsoft with Hyper-V...), it should be the exact same call as here: https://vdc-download.vmware.com/vmwb-re ... -guide.pdf (pg 69)
// At this point we assume the virtual machine is identified as ManagedObjectReference vmMoRef.
String SnapshotName = "Backup";
String SnapshotDescription = "Temporary Snapshot for Backup";
boolean memory_files = false;
boolean quiesce_filesystem = true;
ManagedObjectReference taskRef = serviceConnection.getservice().CreateSnapshot_Task(vmMoRef,
    SnapshotName, SnapshotDescription, memory_files, quiesce_filesystem);
That should be exactly what PowerCLI calls.
For your tests, just to be sure: were the VM's time on snapshot and the time since its last backup roughly equivalent to the backup situation? Time on snapshot tends to be an overlooked factor, I think, since redo logs grow even on supposedly inactive machines. A few years back I had a client with long snapshot consolidations on their Lotus Notes machines, and it turned out the backup window overlapped with a replication/garbage collection procedure that churned tons of data and bloated the snapshot redo logs.
I'm not saying that's specifically your issue, but it illustrates the effect that time on snapshot can have.
-
- VP, Product Management
- Posts: 7081
- Liked: 1511 times
- Joined: May 04, 2011 8:36 am
- Full Name: Andreas Neufert
- Location: Germany
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
As far as I know, VMware did not change the snapshot commit process.
As usual, the key factors are datastore performance, the snapshot location (if changed), the amount of data in the snapshot (changes made during the snapshot's lifetime), and the overall I/O load at the time the snapshot gets committed.
It does not matter whether the snapshot was triggered by Veeam, the API, the CLI or the UI; the process is always the same.
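To make the interplay of those factors concrete, here is a rough Java back-of-envelope model. It is a simplifying assumption for illustration, not VMware's commit algorithm: the delta grows with guest writes while the snapshot is open (capped near the disk size, since a block that is rewritten reuses its existing space in the delta), and the merge only gets whatever share of throughput other I/O leaves free. The method names and all input rates are hypothetical.

```java
// Rough model for illustration only; the inputs and simplifications are assumptions.
final class SnapshotCommitEstimate {
    private SnapshotCommitEstimate() {}

    // Delta size after the snapshot has been open for holdSeconds. Uses the raw
    // guest write rate, so it is an upper bound; it is capped at the disk size
    // because rewriting an already-changed block does not grow the delta again.
    static double deltaGiB(double guestWriteMiBps, double holdSeconds, double diskGiB) {
        double writtenGiB = guestWriteMiBps * holdSeconds / 1024.0;
        return Math.min(writtenGiB, diskGiB);
    }

    // Seconds to merge the delta back into the base disk when other I/O
    // (e.g. backup reads) already consumes busyFraction of the merge throughput.
    static double commitSeconds(double deltaGiB, double mergeMiBps, double busyFraction) {
        if (mergeMiBps <= 0 || busyFraction < 0 || busyFraction >= 1) {
            throw new IllegalArgumentException("need mergeMiBps > 0 and 0 <= busyFraction < 1");
        }
        double availableMiBps = mergeMiBps * (1.0 - busyFraction);
        return deltaGiB * 1024.0 / availableMiBps;
    }
}
```

For example, 2 MiB/s of guest writes over a 30-minute (1800 s) backup on a 100 GiB disk gives about 3.5 GiB of delta; at an assumed effective merge rate of 200 MiB/s with the datastore half busy, that is about 36 seconds to commit. Doubling either the hold time or the concurrent load roughly doubles the result, which is the point about time on snapshot and I/O load at commit.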
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
I was able to prove the issue to VMware support, and they've escalated it to engineering. Something's different; I'm unsure whether it has to do with something VMware did, differences in behavior between Intel (our old infra) and AMD (new), or something else entirely. It's pretty ugly, though. I'll report back here once I have news from VMware in case another Veeam user runs into this.
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Yeah, they're trying to blame Veeam:
"I see that the previous engineer was able to identify that the issue is residing on your 7.0 vCenter and the snapshots are slow during the Veeam backups? Have you gotten in touch with Veeam backupd?"
No mention of the 2+ gigs of support packages I uploaded to the ticket, either.
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
But of course... it's an easy way out
Which is exactly why experienced users already recommended above that you try to reproduce the issue by creating, holding and removing VM snapshots without Veeam in the picture. On second thought, though, this may prove to be a challenge if your issue also requires significant concurrent load on the Tintri, such as backup jobs reading data. In that case, running IOmeter in a few VMs while the snapshot is being removed should do.
Also, you really need to put Tintri in the loop with VMware, as honestly they would be the primary suspect for me rather than VMware. By now we can be confident that ESXi 7 itself does not have a major regression in snapshot deletion times; otherwise, the Veeam forums would have a 10+ page topic devoted to this issue.
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Yeah, we're working the Tintri angle as well. I'd open a Veeam ticket on this if I thought there was any reason to, but since Veeam makes the same API calls to both systems, there's no logic behind that.
+1 on the IOmeter suggestion, and I'll do something to dirty a lot of blocks on the VM I'm testing with.
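For the "dirty a lot of blocks" step, a small Java load generator run inside the test VM while the snapshot is held does something similar to an IOmeter random-write workload. This is a sketch under stated assumptions: the class name, 4 KiB block size, file size and write count are arbitrary choices, and the payload is re-randomized for every write on the assumption that the array's inline dedupe/compression would otherwise shrink the changes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: overwrites random block-aligned offsets in a test file
// so the VM's snapshot delta accumulates changed blocks.
final class BlockDirtier {
    private BlockDirtier() {}

    // Returns how many distinct blocks were written.
    static int dirtyRandomBlocks(Path file, long fileBytes, int blockBytes, int writes, long seed)
            throws IOException {
        long blocks = fileBytes / blockBytes;
        if (blocks <= 0) {
            throw new IllegalArgumentException("file must hold at least one block");
        }
        Random rng = new Random(seed);
        byte[] payload = new byte[blockBytes];
        Set<Long> touched = new HashSet<>();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            // Size the file up front so every block offset is a valid target.
            ch.write(ByteBuffer.wrap(new byte[1]), fileBytes - 1);
            for (int i = 0; i < writes; i++) {
                long block = Math.floorMod(rng.nextLong(), blocks);
                rng.nextBytes(payload); // fresh random data per write
                ByteBuffer buf = ByteBuffer.wrap(payload);
                long pos = block * blockBytes;
                while (buf.hasRemaining()) {
                    pos += ch.write(buf, pos);
                }
                touched.add(block);
            }
            ch.force(true); // flush to the virtual disk, not just the guest page cache
        }
        return touched.size();
    }
}
```

Point it at a file on a virtual disk that lives on the Tintri datastore, size the file and write count so the changed data is comparable to what a backup window would see, and run it between snapshot creation and removal.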
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
As we've been working through this, a new data point has emerged: the difference in snapshot consolidation times is very strongly correlated with switching from NBD to hot-add. I can't think of a mechanism that would cause this, but we saw it when we moved all our VMs at a given location over to hot-add while changing nothing else on those VMs. They're on the same hardware, with the same backing storage, on the same ESX servers, managed by the same vSphere server, at the same versions of everything.
I've opened ticket 04643396 at Sev2 since the minute-plus stun times are crashing things in our environment.
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Had a great call with support, and it turns out the problem is basically this:
https://kb.vmware.com/s/article/2010953
VMs that are not on the same compute host as the proxy get stunned for roughly 40 seconds at minimum. Moving to NFSv4 at the ESX datastore level is not an option for us.
One possible approach would be to have one proxy per ESX host, but here are the problems I can't see a way past so far (please feel free to contribute solutions to them):
- With multiple proxies come multiple threads, and we would almost immediately overwhelm the Tintri with too many backup threads.
- Setting "automatically select proxy" only chooses the proxy with the fewest threads being used - it does not AFAIK support ESX host affinity - but it would be nice if it did!
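The host-affinity idea in the second bullet, combined with the thread concern in the first, can be sketched as a selection rule: only hand a VM to a hot-add proxy running on the same ESXi host, and refuse new tasks once an array-wide cap is reached. This is a hypothetical Java illustration of the policy, not how Veeam's proxy scheduler is implemented; the `Proxy` fields, the cap and the method names are assumptions, and it treats every listed proxy as reading from the same array.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical proxy model for illustration only.
final class Proxy {
    final String name;
    final String esxHost; // ESXi host the proxy VM runs on
    final int activeTasks;
    final int maxTasks;

    Proxy(String name, String esxHost, int activeTasks, int maxTasks) {
        this.name = name;
        this.esxHost = esxHost;
        this.activeTasks = activeTasks;
        this.maxTasks = maxTasks;
    }

    boolean hasFreeSlot() {
        return activeTasks < maxTasks;
    }
}

final class SameHostProxyPicker {
    private SameHostProxyPicker() {}

    // Returns the least-busy proxy on the VM's own host, or empty when none is
    // free or the array-wide cap is hit; the caller then waits or falls back to
    // a transport that avoids cross-host hot-add (e.g. Direct NFS or NBD).
    static Optional<Proxy> pick(String vmHost, List<Proxy> proxies, int arrayTaskCap) {
        int running = proxies.stream().mapToInt(p -> p.activeTasks).sum();
        if (running >= arrayTaskCap) {
            return Optional.empty(); // protect the storage from too many backup streams
        }
        return proxies.stream()
                .filter(p -> p.esxHost.equals(vmHost))
                .filter(Proxy::hasFreeSlot)
                .min(Comparator.comparingInt((Proxy p) -> p.activeTasks));
    }
}
```

Keeping the cap separate from per-proxy slots is the design point: adding one proxy per host no longer multiplies the load on the array, because the total stays bounded by `arrayTaskCap`.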
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Can you not just use Direct NFS transport mode instead of hot-add? The issue above is the very reason we added this transport mode in the first place...
-
- VP, Product Management
- Posts: 6035
- Liked: 2860 times
- Joined: Jun 05, 2009 12:57 pm
- Full Name: Tom Sightler
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Correct, by far the best way to address this is to use DirectNFS mode instead of hotadd and it works amazingly well (it is one of my favorite Veeam features).
However, just to let you know, ESX affinity with hotadd mode actually is supported as well, but it must be enabled via a registry key EnableSameHostHotaddMode. This regkey is documented in Veeam KB1681 in the "Known issues with NFS 3.0 Datastores" section. However, I would only do this if, for some reason, Direct NFS can't be used in your case.
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Our VBR server is a physical host so hotadd isn't an option using it. What's the Linux equivalent to that registry key?
Questions about DirectNFS mode:
1. Does it still require a physical host with an HBA or is that no longer a requirement?
2. Linux support?
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
It's the same key.
1. DirectNFS never required a physical host or an HBA.
2. If you mean Linux proxy, then in v11 Linux proxies support DirectNFS too.
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Thanks. Looking forward to 11 for a lot of reasons. Going to go shuffle through getting DirectNFS working.
-
- Chief Product Officer
- Posts: 31814
- Liked: 7302 times
- Joined: Jan 01, 2006 1:01 am
- Location: Baar, Switzerland
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
You should be able to just use DirectNFS from your physical backup server host?
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Yeah, that's the plan, and we still haven't decommissioned the Windows proxy at our other datacenter. It's a VM, but as I read the docs, that shouldn't be a problem for DirectNFS, right?
-
- Expert
- Posts: 232
- Liked: 71 times
- Joined: Nov 07, 2016 7:39 pm
- Full Name: Mike Ely
Re: Seeing greatly increased snapshot deletion times in ESX7 vs ESX6
Ah, that's nice. I had everything set up for DirectNFS and didn't even know it; all I had to do was change the radio button in the proxy settings. Getting about 300 MB/s on a largish backup job right now.
Thanks y'all.