-
- Expert
- Posts: 227
- Liked: 46 times
- Joined: Oct 12, 2015 11:24 pm
- Contact:
Applying Retention Policy Slow
Hi,
I'm having a problem with a replication job not meeting RPO due to long snapshot deletion/merge times, i.e. the "Applying retention policy" segment of the job. Maybe this is more of a question for VMware, but perhaps someone has been down this path.
I have a client VM of about 3TB which has 34 VM disks attached. The snapshot removal time ranges from 40 mins to 8 hrs, and the job is supposed to run every 2 hours, so we are not meeting RPO. I have never seen the disk subsystem above about 30% busy, and it is capable of much higher IOPS/throughput, which makes me suspect the bottleneck is elsewhere. The storage is iSCSI with 10GbE networking, and again I can't see a bottleneck on the network. So, I have some questions:
1. Will VMware try to 'merge' all 34 VM disks at once?
2. If so, is there a way to limit the number of disks merged at once?
3. Is it possible this is causing odd performance limitations with iSCSI storage and 'iSCSI disk reservations', i.e. the storage being locked for write whilst it's waiting for another write to complete? (For reference, this VM is on its own datastore by itself.)
Finding this bottleneck is driving me nuts. Thanks in advance!
-
- Expert
- Posts: 227
- Liked: 46 times
- Joined: Oct 12, 2015 11:24 pm
- Contact:
Re: Applying Retention Policy Slow
Has anyone dealt with VMs that have lots of VM disks attached?
-
- VeeaMVP
- Posts: 6166
- Liked: 1971 times
- Joined: Jul 26, 2009 3:39 pm
- Full Name: Luca Dell'Oca
- Location: Varese, Italy
- Contact:
Re: Applying Retention Policy Slow
Since vSphere 6.0 disks are managed in parallel:
https://www.virtualtothecore.com/en/vsp ... hing-past/
I don't know how many disks are consolidated at once, but in the tests that Tom did, 15 disks were indeed all processed at the same time. I have no idea whether this can be limited; as you said, that is something you could ask VMware, unless someone here knows.
Luca Dell'Oca
Principal EMEA Cloud Architect @ Veeam Software
@dellock6
https://www.virtualtothecore.com/
vExpert 2011 -> 2022
Veeam VMCE #1
-
- Expert
- Posts: 227
- Liked: 46 times
- Joined: Oct 12, 2015 11:24 pm
- Contact:
Re: Applying Retention Policy Slow
Thanks Luca, that really helps.
I have engaged VMware support, and I'm scratching my head on this one. I can see the merge happening and can review the ESXTOP stats: I'm only getting 10MB/s read/write from the host. Yet if I run the VMware I/O Analyzer appliance simultaneously (same host, same datastore, no other VMs), I can get 300MB/s+ for 512k random reads/writes. It doesn't appear that the host, network or storage are running out of headroom.
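For a rough sense of what that gap means for a 2-hour job window, here is a back-of-envelope sketch in Python. The 100 GB delta is a made-up figure (the thread does not say how much changed data the snapshots hold), and the function assumes the merge runs at one steady rate, which a real consolidation may not do.

```python
def merge_hours(delta_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to move delta_gb of data at a sustained throughput in MB/s."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = delta_gb * 1024 / throughput_mb_s
    return seconds / 3600


# Hypothetical delta size; not a figure from this thread.
DELTA_GB = 100

# 10 MB/s is the rate seen during the merge, 300 MB/s the rate from the benchmark appliance.
for label, rate in (("observed merge", 10), ("benchmark appliance", 300)):
    print(f"{label}: {merge_hours(DELTA_GB, rate):.2f} h at {rate} MB/s")
```

Put another way, at a steady 10MB/s a 2-hour window only covers about 70 GB of merged data, so any larger delta would overrun the schedule regardless of how much headroom the array has.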
-
- Influencer
- Posts: 16
- Liked: 8 times
- Joined: Apr 26, 2021 3:18 pm
- Contact:
Re: Applying Retention Policy Slow
I'm curious if you ever found a solution to this, as we're facing the same issue. Our storage array is capable of much higher IOPS, yet applying the retention policy takes an incredibly long time. However, this is occurring on a VM that is no more than 200GB in size (90GB used). We are replicating to a DR site that has 2 small network monitoring VMs. Truly at a loss, and we will get VMware support involved shortly too.
-
- Veeam Software
- Posts: 21139
- Liked: 2141 times
- Joined: Jul 11, 2011 10:22 am
- Full Name: Alexander Fogelson
- Contact:
Re: Applying Retention Policy Slow
Have you tried performing the same process (i.e. creating the snapshot and consolidating it) manually? That would let you rule Veeam out of the equation completely.
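If you would rather script that manual test than click through the vSphere Client, something along these lines could time it. This is only a sketch: it assumes a pyVmomi-style `vim.VirtualMachine` object for a non-production test VM, the helper name `time_snapshot_cycle` and the `hold_seconds` parameter are made up for illustration, and connecting to vCenter and looking up the VM are left out.

```python
import time
from typing import Callable, Dict


def time_snapshot_cycle(vm, wait: Callable, name: str = "manual-merge-test",
                        hold_seconds: float = 0.0) -> Dict[str, float]:
    """Create a snapshot on vm, optionally let it age, remove it, and time both steps.

    vm is assumed to behave like a pyVmomi vim.VirtualMachine.
    wait is any callable that blocks until the task it is given has finished.
    """
    t0 = time.monotonic()
    wait(vm.CreateSnapshot_Task(name=name, description="timing test",
                                memory=False, quiesce=False))
    t1 = time.monotonic()

    # Let the delta disks grow; an immediately removed snapshot merges almost nothing.
    time.sleep(hold_seconds)

    t2 = time.monotonic()
    snap = vm.snapshot.currentSnapshot
    wait(snap.RemoveSnapshot_Task(removeChildren=False))
    t3 = time.monotonic()

    return {"create_s": t1 - t0, "remove_s": t3 - t2}
```

With pyVmomi installed, `wait` would typically be `WaitForTask` from `pyVim.task`. Setting `hold_seconds` to roughly the job interval (7200 for a 2-hour schedule) while the VM is under its normal load gives a removal time that is closer to what the job would see.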
-
- Novice
- Posts: 8
- Liked: never
- Joined: Sep 24, 2019 3:56 pm
- Full Name: Wayne Mery
- Contact:
Re: Applying Retention Policy Slow
Seeing a similar problem with Veeam v11 after putting replicas on a new SAN.
-
- Veeam Software
- Posts: 3626
- Liked: 608 times
- Joined: Aug 28, 2013 8:23 am
- Full Name: Petr Makarov
- Location: Prague, Czech Republic
- Contact:
Re: Applying Retention Policy Slow
Hi Wayne,
In fact, Veeam just sends a request to create, delete or revert a snapshot, while the process itself is fully managed by the hypervisor. I suppose you would see the same slow snapshot deletion if you performed it directly in the vSphere Client, as Foggy said above. I don't recommend carrying out such a test on your working replicas, but it definitely makes sense to ask our support engineers to have a look at the issue and possibly involve the VMware or storage vendor support team in the investigation.
Thanks!