Wondering if I can get some clarification on the issue I am seeing. I failed a test VM over to a replica, then a few minutes later I failed it back again.
Process results are below:
Code:
03/04/2019 10:19:20 Creating working snapshot on original VM
03/04/2019 10:28:20 Calculating original signature Hard disk 1 (48.0 GB)
03/04/2019 10:28:29 Changed block tracking has been disabled for backup proxy VM.
03/04/2019 10:33:16 Replicating restore point for Hard disk 1 (48.0 GB) 7.4 GB processed
03/04/2019 10:33:33 VM TEST01_replica was powered off successfully
03/04/2019 10:33:42 Creating replica restore point
03/04/2019 10:48:50 Replicating changes Hard disk 1 (48.0 GB) 30.6 GB processed
03/04/2019 10:49:13 Removing working snapshot from original VM
03/04/2019 10:50:03 Powering on original VM
03/04/2019 10:50:20 Failback completed at 03/04/2019 10:50:20
I created the replica this morning, just before the failover took place. Veeam is saying that there were 7.4GB of changes on the VM between the time I created the replica point and failed over, and the time I initiated failback? Considering that the VM is simply a Win8.1 Ent machine, powered up but with no one logged in, I find this 7.4GB hard to believe. But let's say fine.
Then, once the VM was powered off and the final changes were replicated back, it says it replicated another 30.6GB...
That is not making any sense to me. The original VM has a 47GB drive, with only 29.2GB in use within the guest. The machine has not been used, just powered up. I can't see how it is possible to have anywhere near that level of changes.
The original replication job I ran to create this test replica this morning stated:
Processed 31.9GB
Read: 29.2GB
Transferred: 19.8GB (1.7x)
So the total VM was compressed and deduped down to 19.8GB and replicated.
So why did the failback copy 7.4GB + 30.6GB of changes on a machine that had been replicated, failed over, powered up, not used by anyone, then failed back, all within 1-2 hours (1 hour to complete the original replication, as it was a one-time replication with no previous points, and 1 hour to complete the failback)? The failback replicated more than the initial replication did! It just doesn't add up to me.
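Laying the figures quoted above side by side (nothing new, just the same numbers):
Initial full replication transferred: 19.8GB
Failback sync before power-off: 7.4GB
Failback final sync after power-off: 30.6GB
Failback total: 38.0GB
So the failback moved 38GB for a guest with only 29.2GB in use, nearly double what the initial full replication transferred.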
Any ideas?
We are currently running a decent-sized VM (800GB) from replica in our DR site due to an issue with the source VM. This needs to be returned to the production site ASAP, but I can't risk using failback to do that at this time because it will power off the VM to do the final change replication. As shown above, we have no idea how big those final changes will be; it doesn't add up. If we allow Veeam to do the failback and it shuts the VM down for the final change replication, it could be down for many hours replicating a lot of 'changes', when in reality there are not that many changes, so it should be MUCH quicker.
I am currently running a new replication job to do the 'manual reverse replication' method, which is in the process of calculating digests. Once this initial replication completes we can power off the VM, do a final replication (which should be quick), then swap the VMs around. This isn't ideal for us because it will create a VM with a new ID, which will break our backup chains, but it will be a lot quicker than using Veeam's built-in failback due to the above issue. We can't risk replicating it back to the original VM, because if it doesn't work for some reason we will have to revert to the normal Veeam failback process, even if that takes many hours with the server offline.
But this seems like a big issue with Veeam replication right now, and it doesn't reflect my previous experience with failback some time ago. Although it always takes a while to calculate the digests, I have not seen it replicate back that many changes before, which makes no sense. I did some digging and found an old topic about this same issue, but that was from before a fix seemed to be added in an update. I also see someone mentioned the same issue in March, but no one has replied to or acknowledged their post, so I have started a new topic for mine.
Finally, a feature request:
The biggest issue with Veeam failback for me, even when it is working correctly, is that there is no option to confirm you want to proceed by clicking a button. As it stands, the changes between the replica restore point and the current replica VM state are calculated and replicated, then the machine is powered off and the remaining changes are calculated and replicated. This makes downtime very hard to arrange because we don't know when it will happen. It would be good to have a confirmation step where we can click a button, or schedule a time of our choosing, to power off the machine and do the final replication. Can this be added for review please? I don't imagine it would need much of a code change, but it would really help us and other users. I have seen this mentioned before in a couple of threads, and Gostev mentioned it would be looked at, but several years later it is still not an option. I understand that you are probably working on the process of "failing back as quickly as possible", so a manual step might not be ideal in all scenarios, but for us and many other users it really would help with scheduling downtime in a 24/7 organisation.
Thanks!