Applying Retention Policy Slow

Post by **adapterer** » Jul 10, 2017 4:28 am

Hi,

I'm having a problem with a replication job not meeting RPO due to long snapshot deletion/merge times, i.e. the "Applying retention policy" segment of the job. Maybe this is more of a question for VMware, but perhaps someone has been down this path.

I have a client VM of about 3TB with 34 VM disks attached. Snapshot removal takes anywhere from 40 minutes to 8 hours, and the job is supposed to run every 2 hours, so we are not meeting RPO. I have never seen the disk subsystem above about 30% busy, and it is capable of much higher IOPS/throughput, which makes me suspect the bottleneck is elsewhere. The storage is iSCSI over 10GbE networking, and again I can't see a bottleneck on the network. So, I have some questions:

1. Will VMware try to 'merge' all 34 VM disks at once?
2. If so, is there a way to limit the number of disks merged at once?
3. Is it possible this is causing odd performance limitations with iSCSI storage and 'iSCSI disk reservations', i.e. the storage being locked for writes while it waits for another operation to complete? (For reference, this VM is on its own datastore by itself.)

Finding this bottleneck is driving me nuts.. thanks in advance!
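
To put the RPO gap in numbers, here is a minimal sketch using only the figures from the post above (2-hour schedule, 40 minutes to 8 hours of snapshot removal). It assumes the job is blocked for the whole removal and ignores the data transfer phase; `missed_runs` is a made-up helper name, not anything from Veeam or VMware.

```python
from datetime import timedelta

def missed_runs(removal: timedelta, interval: timedelta) -> int:
    """Count the scheduled start times that pass while snapshot removal is still running."""
    # Floor division of two timedeltas yields an int: one missed start per full interval.
    return removal // interval

interval = timedelta(hours=2)
for removal in (timedelta(minutes=40), timedelta(hours=8)):
    print(f"removal {removal} -> {missed_runs(removal, interval)} scheduled run(s) missed")
```

On these assumptions the 40-minute case fits inside the schedule, while the 8-hour case overruns four scheduled start times.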

Post by **adapterer** » Jul 11, 2017 11:57 pm

Has anyone dealt with VMs with lots of VM disks attached?

Post by **dellock6** » Jul 12, 2017 11:35 am

Since vSphere 6.0, disks are managed in parallel:
https://www.virtualtothecore.com/en/vsp ... hing-past/

I don't know how many disks are consolidated at once, but in the tests that Tom did, 15 disks were indeed all processed at the same time. I have no idea whether this can be limited; as you said, it is something you may want to ask VMware, unless someone here knows.

Post by **adapterer** » Jul 13, 2017 12:47 am

Thanks Luca, that really helps.

I have engaged VMware support; I'm scratching my head on this one. I can see the merge happening and have reviewed the ESXTOP stats: the host is only getting 10MB/s read/write. Yet if I run the VMware I/O Analyzer appliance simultaneously (same host, same datastore, no other VMs), I can get 300MB/s+ for 512k random reads/writes. It doesn't appear that the host, network or storage are running out of headroom.
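
A back-of-the-envelope sketch of what those throughput figures would imply, assuming the merge is purely throughput-bound and the 10MB/s seen in ESXTOP is sustained for the whole removal. Both are assumptions, as is the decimal conversion (1GB = 1000MB); the function names are made up for illustration.

```python
def merge_hours(delta_gb: float, mb_per_s: float) -> float:
    """Hours needed to merge delta_gb of snapshot data at a sustained mb_per_s."""
    return delta_gb * 1000 / mb_per_s / 3600

def delta_gb_for(hours: float, mb_per_s: float) -> float:
    """Amount of data (GB) that can be merged in `hours` at a sustained mb_per_s."""
    return hours * 3600 * mb_per_s / 1000

# Removal times reported earlier in the thread: 40 minutes and 8 hours.
for hours in (40 / 60, 8):
    gb = delta_gb_for(hours, 10)
    print(f"{hours:.2f} h at 10 MB/s  ~ {gb:.0f} GB merged; "
          f"same data at 300 MB/s ~ {merge_hours(gb, 300) * 60:.1f} min")
```

Under those assumptions, a 40-minute removal at 10MB/s corresponds to about 24GB of delta data and an 8-hour removal to about 288GB, which at 300MB/s would take roughly 16 minutes. If the real delta disks are much smaller than that, the merge is probably not limited by raw throughput at all.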

Post by **hexadecimal** » Nov 16, 2021 3:21 pm

I'm curious whether you ever found a solution to this, as we're facing the same issue. Our storage array is capable of much higher IOPS, yet applying the retention policy takes an incredibly long time. In our case this is occurring on a VM that is no more than 200GB in size (90GB used). We are replicating to a DR site that has 2 small network monitoring VMs. Truly at a loss, and will get VMware support involved shortly too.

Post by **foggy** » Nov 16, 2021 5:52 pm

Have you tried performing the same process (i.e. creating the snapshot and consolidating it) manually? That would allow you to rule Veeam out of the equation completely.
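
One way to time such a manual test is sketched below. It assumes `vm` is a pyVmomi `vim.VirtualMachine` you have already looked up, and that `wait_for_task` is a blocking helper such as `pyVim.task.WaitForTask`; connecting to vCenter and finding the VM are left out. The helper name `time_snapshot_cycle` and the snapshot name are made up, and it is best pointed at a test VM rather than a production replica.

```python
import time
from typing import Any, Callable, Dict

def time_snapshot_cycle(vm: Any, wait_for_task: Callable[[Any], Any],
                        name: str = "manual-timing-test") -> Dict[str, float]:
    """Create one snapshot on `vm`, delete it again, and time both steps in seconds."""
    t0 = time.monotonic()
    # No memory dump and no guest quiescing: only the disk snapshot is of interest here.
    create_task = vm.CreateSnapshot_Task(name=name, description="timing test",
                                         memory=False, quiesce=False)
    wait_for_task(create_task)
    snapshot = create_task.info.result  # the snapshot object created by the task
    t1 = time.monotonic()
    # Deleting the snapshot makes the hypervisor merge its delta disks back into the base disks.
    wait_for_task(snapshot.RemoveSnapshot_Task(removeChildren=False))
    t2 = time.monotonic()
    return {"create_s": t1 - t0, "remove_s": t2 - t1}
```

A snapshot deleted immediately has almost no delta data, so for a fair comparison with the replication job you would probably want to let the VM run under normal load for a while between the two steps.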

Post by **wsmery** » Dec 29, 2022 5:15 am

Seeing a similar problem with Veeam 11 after putting replicas on a new SAN.

Post by **PetrM** » Dec 29, 2022 2:21 pm

Hi Wayne,

In fact, Veeam just sends a request to create, delete or revert a snapshot, while the process itself is fully managed by the hypervisor. I suppose you would see the same slow snapshot deletion if you tried to perform it directly in the vSphere client, as Foggy said above. I don't recommend carrying out such a test on your working replicas, but it definitely makes sense to ask our support engineers to have a look at the issue and probably involve the VMware or storage vendor support team in the investigation.

Thanks!
