Snapshot Removal slowness

Posted: Oct 04, 2017 1:23 pm
by B.F.
We have a VM which is around 850 GB. It has lots of changes daily, which means that the VM replication can sometimes take a bit of time. The actual data transfer for the replication takes around 1 hr 20 min, but the total time for the job can be over 6 hrs! The rest of the time is spent removing the snapshot (according to vSphere), which I'm assuming is doing some merging? I'm starting to wonder if we did just a straight full replication every time if it will be quicker than doing the merge process.
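
To put the figures above in proportion, here is a minimal sketch of the time split. It assumes the job takes exactly 6 hours (the post says "over 6hrs", so this is a lower bound) and that everything outside the data transfer is snapshot removal, as the post describes.

```python
from datetime import timedelta

# Figures taken from the post; 6 h is a lower bound ("over 6hrs").
total_job = timedelta(hours=6)
data_transfer = timedelta(hours=1, minutes=20)

# Assumption: all remaining time is spent on snapshot removal.
snapshot_removal = total_job - data_transfer
share = snapshot_removal / total_job  # timedelta / timedelta gives a float

print(f"Snapshot removal: {snapshot_removal} ({share:.0%} of the job)")
```

On these numbers, snapshot removal accounts for at least 4 hrs 40 min, which is more than three quarters of the run.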

Replication Details
1 Restore Point
Data Transfer = Direct (WAN Accelerator is greyed out)

Any suggestions or alternatives on how to speed this up?

Thanks

Re: Snapshot Removal slowness

Posted: Oct 04, 2017 4:06 pm
by DGrinev
Hi,
B.F. wrote: I'm starting to wonder if we did just a straight full replication every time if it will be quicker than doing the merge process.
You cannot avoid the merge process.
You can reduce the snapshot consolidation time by integrating with your storage system (if you're running storage that is supported for integration).
Or you can try to offload the datastore where this VM runs.
Also, I'd recommend retaining more than 1 restore point for replication: if the only restore point gets corrupted (by malware, for example), failover becomes unavailable.
Please review the existing discussion; you might find useful information there. Thanks!

Re: Snapshot Removal slowness

Posted: Oct 04, 2017 4:19 pm
by B.F.
DGrinev wrote: Or you can try to offload the datastore where this VM runs.
Not sure if I understand what that means.

Another thought was to replicate more often throughout the day. That way the changes would be incorporated in smaller chunks throughout the day instead of in one 6+ hour chunk once a day?

Thanks

Re: Snapshot Removal slowness

Posted: Oct 04, 2017 6:38 pm
by Deon
Hello B.F.,

DGrinev meant that your datastore is likely quite busy with active I/O, which would be the most likely explanation for the slow snapshot removal.
If you can find a way to put this highly transactional VM on a faster or less I/O-intensive datastore, the issue may go away altogether.

Another possibility to consider is running the replication job at a different time, when the VMs on the datastores are not as active, so that the snapshot operations are faster. This only works if you have an analysis of datastore latency/IO during the day and notice that the datastore is less loaded during some periods, e.g. if the users are less active during the night. Of course, in our AlwaysOn 24/7 world it is not always possible to have this "breather".
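
As a rough illustration of that kind of analysis, the sketch below picks the quietest contiguous window from hourly average datastore latency figures. The sample values are made up, and the function name `quietest_window` is hypothetical; real numbers would have to come from your own monitoring (for example, an export of vCenter performance data).

```python
from typing import Sequence, Tuple


def quietest_window(hourly_latency_ms: Sequence[float],
                    window_hours: int) -> Tuple[int, float]:
    """Return (start_hour, mean_latency_ms) of the quietest window.

    The window may wrap past midnight, so hour 23 is followed by hour 0.
    """
    n = len(hourly_latency_ms)
    if not 0 < window_hours <= n:
        raise ValueError("window_hours must be between 1 and the sample count")

    best_start, best_mean = 0, float("inf")
    for start in range(n):
        total = sum(hourly_latency_ms[(start + i) % n]
                    for i in range(window_hours))
        mean = total / window_hours
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Made-up hourly averages (ms) for hours 0..23, for illustration only.
samples = [4, 3, 3, 2, 2, 3, 6, 9, 14, 18, 20, 19,
           17, 18, 19, 20, 18, 15, 12, 10, 8, 6, 5, 4]

# A 6-hour window, roughly matching the job duration mentioned in the thread.
start, mean = quietest_window(samples, window_hours=6)
print(f"Quietest 6-hour window starts at {start:02d}:00 (mean {mean:.1f} ms)")
```

If no window is clearly quieter than the rest, rescheduling is unlikely to help much and the datastore itself is the better place to look.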

You could also look deeper into what actually makes the disk operations slow. The underlying issue may be the fragmentation or block size of the VMFS datastore, the connection between the host and the storage holding the datastore, or poor random read performance of the disk device. However, this type of analysis usually requires a specialist who works closely with performance troubleshooting and knows VMware technology well, so it's not always trivial. And in the end you need to consider what to change, which may lead to an uncomfortable decision, e.g. having to buy faster disks or reformat the datastore, which is not always possible.

Re: Snapshot Removal slowness

Posted: Oct 06, 2017 7:48 pm
by B.F.
The datastore where this job places all the replicas is actually isolated from other active VMs.

I did a little more digging on the one particular VM that consistently takes much longer. It has SQL Server installed, and the install was done by another vendor. I looked into the scheduled SQL tasks and discovered that every night it does an Index Reorganization along with Updating the Statistics. I'm assuming this process churns a lot of data changes. I disabled that job and, sure enough, the job time dropped by 2 hrs! I've adjusted the schedule so that the re-index is only done once a week (daily is unnecessary from what I've read).

Thanks

Re: Snapshot Removal slowness

Posted: Oct 11, 2017 3:28 pm
by foggy
I also wonder whether it is the temporary snapshot created for the source VM during backup, or the replica VM restore point (snapshot) that is merged when retention is applied at the end of the job.

Re: Snapshot Removal slowness

Posted: Oct 11, 2017 4:03 pm
by B.F.
It's the replica VM doing disk consolidation when removing the snapshot, according to vCenter.

Re: Snapshot Removal slowness

Posted: Oct 11, 2017 4:09 pm
by foggy
Then look at the target datastore performance, since there's no ability to perform full replication each time.

Re: Snapshot Removal slowness

Posted: Oct 16, 2017 4:33 pm
by B.F.
Perhaps backing up to ReFS storage would be a better option in this case?

Re: Snapshot Removal slowness

Posted: Oct 16, 2017 4:41 pm
by foggy
It depends on your requirements, since backup and replication serve different scenarios.