ChrisGundry
Veteran
Posts: 259
Liked: 40 times
Joined: Aug 26, 2015 2:56 pm
Full Name: Chris Gundry
Contact:

Replica failback performance / changes not making sense

Post by ChrisGundry »

Hi,

Wondering if I can get some clarification on the issue I am seeing. I failed a test VM over to a replica, then a few minutes later I failed it back again.

Process results are below:

Code:

03/04/2019 10:19:20          Creating working snapshot on original VM
03/04/2019 10:28:20          Calculating original signature Hard disk 1 (48.0 GB) 
03/04/2019 10:28:29          Changed block tracking has been disabled for backup proxy VM.
03/04/2019 10:33:16          Replicating restore point for Hard disk 1 (48.0 GB) 7.4 GB processed
03/04/2019 10:33:33          VM TEST01_replica was powered off successfully
03/04/2019 10:33:42          Creating replica restore point
03/04/2019 10:48:50          Replicating changes Hard disk 1 (48.0 GB) 30.6 GB processed
03/04/2019 10:49:13          Removing working snapshot from original VM
03/04/2019 10:50:03          Powering on original VM
03/04/2019 10:50:20          Failback completed at 03/04/2019 10:50:20
My issue is:
I created the replica this morning, just before the failover took place. Veeam says there were 7.4GB of changes on the VM between the time I created the replica point, failed over, and the time I initiated failback. Considering the VM is simply a Win8.1 Ent machine, powered up but with no one logged in, I find that 7.4GB hard to believe. But let's say fine.
Then, once the VM was powered off and the final changes were replicated back, it says it replicated another 30.6GB...

That is not making any sense to me. The original VM has a 47GB drive, with only 29.2GB in use within the guest. The machine has not been used, just powered up. I can't see how it is possible to have anywhere near that level of changes.
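For what it's worth, my understanding is that the signature/digest comparison works on fixed-size disk blocks rather than guest files, so the 'changes' figure counts every block whose hash differs, whether or not the guest meaningfully wrote to it: pagefile churn, NTFS metadata updates, defrag and snapshot consolidation all count in full blocks. A rough sketch of that kind of comparison, purely illustrative and not Veeam's actual implementation (the block size and hash are my assumptions):

Code:

import hashlib

BLOCK_SIZE = 1024 * 1024  # assumed 1 MB blocks; the real block size will differ

def block_signatures(image_path):
    """Hash every fixed-size block of a flat disk image."""
    sigs = []
    with open(image_path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            sigs.append(hashlib.sha1(block).digest())
    return sigs

def reported_changes_gb(original_path, replica_path):
    """Bytes a block-level sync would report as 'changed'."""
    pairs = zip(block_signatures(original_path), block_signatures(replica_path))
    # A block counts in full even if only one byte differs, which is how a
    # mostly idle guest can still rack up gigabytes of "changes".
    differing = sum(1 for a, b in pairs if a != b)
    return differing * BLOCK_SIZE / 2**30

print(f"{reported_changes_gb('original-flat.vmdk', 'replica-flat.vmdk'):.1f} GB")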

The original replication job I ran to create this test replica this morning stated:
Processed 31.9GB
Read: 29.2GB
Transferred: 19.8GB (1.7x)

So the total VM was compressed and deduped down to 19.8GB and replicated.

So why did the failback copy 7.4GB + 30.6GB of changes on a machine that had been replicated, failed over, powered up, not used by anyone, then failed back, all within 1-2 hours (1hr for the original replication, as it was a one-time replication with no previous points, and 1hr for the failback)? The failback replicated more than the initial replication did! It just doesn't add up to me.
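One thing I would rule out, given the 'Changed block tracking has been disabled' line in the log above, is CBT being off on the source VM, since without it the whole disk has to be read and compared. A quick pyVmomi sketch to check (the vCenter hostname and credentials are placeholders, and I'm not certain CBT is even consulted during the failback digest calculation, so treat this as a cheap diagnostic only):

Code:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; replace with your own vCenter and account.
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.name in ("TEST01", "TEST01_replica"):
            # False means every sync must read and compare the whole disk
            # instead of asking vSphere which blocks actually changed.
            print(vm.name, "CBT enabled:", vm.config.changeTrackingEnabled)
    view.Destroy()
finally:
    Disconnect(si)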

Any ideas?

We are currently running a decent-sized VM (800GB) from replica in our DR site due to an issue with the source VM. It needs to be returned to the production site ASAP, but I can't risk using failback to do that at this time, because failback will power off the VM to do the final change replication. As shown above, we have no idea what the final changes will amount to; the numbers don't add up. If we let Veeam do the failback and it shuts the VM down for the final change replication, the VM could be down for many hours replicating a lot of 'changes', when in reality there are not many changes and it should be MUCH quicker.
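To at least put a bound on the outage, this is the back-of-the-envelope maths I'm working from (the throughput figure is a placeholder, and scaling the test VM's behaviour up to the 800GB VM is a guess, not a measurement):

Code:

def final_sync_hours(change_gb, throughput_mb_s):
    """Rough outage window for the powered-off final change replication."""
    return change_gb * 1024 / throughput_mb_s / 3600

# Test VM above: ~30.6 GB of "changes" on a 48 GB disk. Scaled to the
# 800 GB VM that would be 30.6 * 800 / 48 = 510 GB of final sync; at an
# assumed 50 MB/s over the WAN that is roughly 2.9 hours offline.
print(f"{final_sync_hours(30.6 * 800 / 48, 50):.1f} h")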

I am currently running a new replication job to do the 'manual reverse replication' method, which is in the process of calculating digests. Once the initial replication completes we can power off the VM, do a final replication (which should be quick), then swap the VMs around. This isn't ideal for us because it creates a VM with a new ID, which will break our backup chains, but it will be a lot quicker than Veeam's built-in failback, given the issue above. We can't risk replicating back over the original VM, because if that doesn't work for some reason we will have to fall back to the normal Veeam failback process, even if it takes many hours with the server offline.

But this seems like a big issue with Veeam replication right now, and it doesn't reflect my previous experience with failback from some time ago. Although it has always taken a while to calculate the digests, I have never seen it replicate back this many changes before, which makes no sense. I did some digging and found an old topic about this same issue, but that was from before a fix appeared to be added in an update. I also saw someone mention the same issue in March, but no one has replied to or acknowledged their issue, so I have started a new topic for mine.

Finally, a feature request:
The biggest issue for me with Veeam failback, even when it is working correctly, is that there is no option to confirm, by clicking a button, that you want to proceed with the final stage. As it stands, the changes between the replica restore point and the current replica VM state are calculated and replicated, then the machine is powered off and the remaining changes are calculated and replicated. This makes downtime very hard to arrange because we don't know when it will happen.

It would be good to have a confirmation step, where we can click a button or schedule the power-off for a time of our choosing before the final replication runs. Can this be added for review please? I don't see it requiring much change in the code, but it would really help us and other users! I have seen this mentioned in a couple of threads before, and Gostev mentioned it would be looked at, but several years later it is still not an option. I understand that you are probably optimising the process to fail back as quickly as possible, so a manual step might not be ideal in all scenarios, but for us and many other users it really would help with scheduling downtime in a 24/7 organisation.

Thanks!
bdufour
Expert
Posts: 206
Liked: 41 times
Joined: Nov 01, 2017 8:52 pm
Full Name: blake dufour
Contact:

Re: Replica failback performance / changes not making sense

Post by bdufour »

We are currently running a decent sized VM (800GB) from replica in our DR site due to an issue with the source VM.
This is exactly why I make sure I always have enough production storage to replicate critical VMs back to production storage! I replicate to a DR site as well, but if I have an OS-level issue with a critical VM on production, I like the idea of having a replica ready to go on the production side. Sure, you can fail over to the DR site, but it's failing those changes back to the primary site, when that time comes, where the real issues can arise, plus the issues you mentioned! You will also still need to replicate all the other VMs over the WAN to the DR site while trying to fail back the VM you failed over.
alphinantony
Influencer
Posts: 10
Liked: 1 time
Joined: Apr 04, 2017 5:03 am
Full Name: Alphin Antony
Location: Kochi, India
Contact:

Re: Replica failback performance / changes not making sense

Post by alphinantony »

I'm facing the same issue in one of my clients' prod environments. This issue was supposed to be cleared with the Update 2 release for Veeam 9.5. Can someone please shed some light on this?